Mamba introduces Selective State Space Models (SSMs) with input-dependent dynamics, a linear-complexity alternative to transformers: sequences are processed in O(n) time instead of the O(n²) of self-attention, enabling efficient handling of very long sequences while remaining competitive on language, audio, and genomics tasks.
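To make the scaling claim concrete, below is a minimal sketch of the underlying state-space recurrence in illustrative NumPy (names such as ssm_scan and d_state are assumptions, and the parameters are kept fixed here; the selective, input-dependent version is sketched after the list below). Each token updates a fixed-size hidden state, so a full pass costs O(n) and never materializes an n×n attention matrix.

```python
# Minimal sketch of a (non-selective) SSM scan; not the reference implementation.
import numpy as np

def ssm_scan(x, A, B, C):
    """y_t = C @ h_t with h_t = A @ h_{t-1} + B * x_t over one input channel.

    x: (seq_len,) scalar input channel
    A: (d_state, d_state) state transition
    B: (d_state,) input projection
    C: (d_state,) output projection
    """
    h = np.zeros(A.shape[0])
    y = np.empty_like(x)
    for t in range(len(x)):        # one pass over the sequence: O(n)
        h = A @ h + B * x[t]       # constant work per token, fixed-size state
        y[t] = C @ h
    return y

x = np.random.randn(1024)
A = 0.9 * np.eye(16)               # toy stable transition
B = np.random.randn(16)
C = np.random.randn(16)
print(ssm_scan(x, A, B, C).shape)  # (1024,)
```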
Key Innovations
- Selective Mechanism: The SSM parameters (Δ, B, C) are computed from the current input rather than fixed, so the model can choose per token what to store and what to ignore (see the sketch after this list).
- Hardware-Aware: A fused scan implemented in custom CUDA kernels keeps the expanded state in fast on-chip SRAM for efficient GPU computation.
- Linear Scaling: O(n) time in sequence length, versus O(n²) for self-attention.
- No Attention: Replaces self-attention entirely with structured state spaces.
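The selection mechanism can be sketched as follows (illustrative NumPy only; the projection names W_delta, W_B, W_C and the diagonal A are assumptions made for brevity, and the real implementation fuses this scan into the CUDA kernels mentioned above). Δ, B, and C are recomputed from each token, so the recurrence decides per token how strongly to write the input into the state and what to read back out:

```python
# Minimal sketch of a selective scan with input-dependent parameters.
import numpy as np

def selective_scan(u, x, A, W_delta, W_B, W_C):
    """Scan one channel u (seq_len,) whose SSM parameters are re-derived
    from the full input x (seq_len, d_model) at every step.

    A:        (d_state,) diagonal transition (negative => decaying memory)
    W_delta:  (d_model,)          -> per-token step size Delta_t
    W_B, W_C: (d_model, d_state)  -> per-token B_t and C_t
    """
    h = np.zeros(A.shape[0])
    y = np.empty_like(u)
    for t in range(len(u)):
        delta = np.logaddexp(0.0, x[t] @ W_delta)  # softplus keeps the step size positive
        B_t = x[t] @ W_B                           # input-dependent: what to write into the state
        C_t = x[t] @ W_C                           # input-dependent: what to read out of the state
        A_bar = np.exp(delta * A)                  # discretized diagonal transition
        h = A_bar * h + delta * B_t * u[t]         # simplified Euler-style discretization of B
        y[t] = C_t @ h                             # still O(1) work per token -> O(n) overall
    return y

rng = np.random.default_rng(0)
seq_len, d_model, d_state = 512, 64, 16
x = rng.normal(size=(seq_len, d_model))
y = selective_scan(x[:, 0], x, -np.ones(d_state),
                   rng.normal(size=d_model),
                   rng.normal(size=(d_model, d_state)),
                   rng.normal(size=(d_model, d_state)))
print(y.shape)  # (512,)
```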
Performance
- Matches transformer quality on language modeling up to 1B parameters.
- Excels at very long sequences (16K-1M tokens).
- Roughly 5x higher inference throughput than similarly sized transformers, since generation carries only a constant-size recurrent state instead of a growing attention KV cache.
Models: Mamba-1, Mamba-2, and Jamba (a hybrid Mamba + Transformer model from AI21 Labs).
Mamba is currently the leading alternative to the transformer architecture, demonstrating that attention is not the only path to strong sequence modeling.