SwarmFormer: Rethinking Efficient AI

We're proud to announce our first innovation of 2025 from the research team at Takara.ai: SwarmFormer, a new transformer architecture that fundamentally rethinks how we process information in AI models.

A Different Approach to AI

The core idea behind SwarmFormer came from observing how nature handles complex problems. Just as swarms of bees or ants can solve sophisticated tasks through simple local interactions, we wondered: could we apply similar principles to make AI models more efficient?

Traditional transformer models process all information globally, with every token attending to every other token, which becomes computationally expensive as sequences get longer. SwarmFormer takes a different approach by combining local processing with strategic global communication. This seemingly simple change has profound implications - our models use up to 94% fewer parameters while matching or exceeding the accuracy of much larger conventional models.
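
To make the scaling argument concrete, here is a back-of-the-envelope comparison of the number of pairwise attention interactions for a 4,096-token sequence. The cluster size of 16 is purely illustrative and not a claim about SwarmFormer's actual configuration:

n, c = 4096, 16

full_attention = n * n                 # every token attends to every other token
local_term = n * c                     # each token attends only within its cluster
global_term = (n // c) ** 2            # cluster summaries attend to each other
clustered = local_term + global_term

print(f"full attention: {full_attention:,} interactions")   # 16,777,216
print(f"clustered:      {clustered:,} interactions")        # 131,072
print(f"reduction:      {full_attention // clustered}x")    # 128x

The local term grows linearly with sequence length, and the global term only involves cluster summaries rather than individual tokens, which is why the gap widens as sequences get longer.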

The Technical Innovation

The magic happens through what we call “hierarchical local-global processing” - think of it as a kind of swarm routing. Rather than having every part of the model interact with every other part, we organise tokens into clusters that share information efficiently, both locally and globally. This approach drastically reduces computational requirements while preserving model capabilities.
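
To give a flavour of how this kind of layer can be wired together, here is a minimal PyTorch sketch of clustered local-global attention. It is our own simplified illustration rather than the actual SwarmFormer implementation: the class name, the fixed cluster size, mean pooling for cluster summaries, and the use of standard nn.MultiheadAttention modules are all assumptions made to keep the example short.

import torch
import torch.nn as nn

class LocalGlobalLayer(nn.Module):
    # Illustrative sketch only - not the official SwarmFormer layer.
    def __init__(self, dim: int, heads: int = 4, cluster_size: int = 16):
        super().__init__()
        self.cluster_size = cluster_size
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); seq_len assumed divisible by cluster_size.
        b, n, d = x.shape
        c = self.cluster_size
        # 1) Local step: attention restricted to each cluster of c consecutive tokens.
        local = x.reshape(b * (n // c), c, d)
        local, _ = self.local_attn(local, local, local)
        local = local.reshape(b, n, d)
        # 2) Aggregation: summarise each cluster by mean-pooling its tokens.
        summaries = local.reshape(b, n // c, c, d).mean(dim=2)
        # 3) Global step: cluster summaries exchange information with one another.
        summaries, _ = self.global_attn(summaries, summaries, summaries)
        # 4) Broadcast: each token receives its cluster's updated summary.
        return local + summaries.repeat_interleave(c, dim=1)

# 128 tokens in clusters of 16: local attention spans 16 tokens,
# global attention spans only 8 cluster summaries.
layer = LocalGlobalLayer(dim=64)
out = layer(torch.randn(2, 128, 64))   # -> torch.Size([2, 128, 64])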

Figure: Swarm aggregation in SwarmFormer.

Our experiments show remarkable results. SwarmFormer-Base achieves 89% accuracy on standard benchmarks using just 6.7M parameters - compared to BERT's 108M parameters for similar performance. This represents a 94% reduction in model size without sacrificing accuracy. Even our smaller 4.3M parameter model achieves 86% accuracy, demonstrating the architecture's efficiency at various scales.
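
The 94% figure follows directly from the parameter counts quoted above:

bert_params, swarmformer_params = 108e6, 6.7e6
print(f"{1 - swarmformer_params / bert_params:.0%} fewer parameters")  # 94%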

Real-World Impact

What excites us most is what this means for AI accessibility. The models train quickly on consumer hardware and can run on a wide range of devices. This efficiency translates directly to reduced infrastructure costs - up to 70% in our initial testing.

We're seeing promising applications across several areas:

  • Processing large document collections efficiently
  • Enabling sophisticated models on edge devices
  • Making advanced AI capabilities accessible to organisations of all sizes
  • Democratising AI research by enabling breakthroughs on consumer hardware, shifting innovation from large tech companies to broader scientific communities

The architecture's efficiency could make these applications accessible to underserved communities, aligning with our goal of transforming humanity. Furthermore, SwarmFormer's reduced parameter count and computational efficiency translate to significant energy savings in AI deployments. By requiring less compute power and memory, these models can help organisations reduce their environmental impact while scaling AI capabilities. This architectural efficiency represents a step toward more sustainable AI development.

Breaking New Ground

The efficiency gains from SwarmFormer's architecture have enabled some remarkable achievements. We've successfully tested models with context windows spanning tens of millions of tokens - far beyond what's currently possible with traditional architectures. This opens up new possibilities for processing and understanding massive documents in their entirety.

In computer vision, our SwarmFormer-based models have already achieved state-of-the-art results on standard benchmarks while using significantly fewer parameters than existing approaches. Additionally, we're developing breakthrough text-to-speech models that require just a fraction of the compute and memory resources compared to current solutions.

These early results are encouraging, but we see them as just the beginning. SwarmFormer's efficient architecture is proving to be remarkably versatile across different domains, and we're excited to explore what else is possible.

We believe AI innovation should push the boundaries of what's possible while making the technology more accessible. SwarmFormer represents our first step toward that vision.

For those interested in the technical details, our research paper “SwarmFormer: Local-Global Hierarchical Attention via Swarming Token Representations” provides complete implementation details and mathematical foundations. We welcome collaboration and look forward to seeing how the community builds upon these ideas.

We also provide the model checkpoints and inference code so you can try it out and see how it works.

Feel free to reach out to us at research@takara.ai for questions or potential collaborations.


Citation

If you use SwarmFormer in your research, please cite:

@article{legg2025swarmformer,
  title={SwarmFormer: Local-Global Hierarchical Attention via Swarming Token Representations},
  author={Legg, Jordan and Sturmanis, Mikus and {Takara.ai}},
  journal={Takara.ai Research},
  year={2025},
  url={https://takara.ai/papers/SwarmFormer-Local-Global-Hierarchical-Attention-via-Swarming-Token-Representations.pdf}
}

Last modified: January 23, 2025