State Space Models (SSMs)

State Space Models (SSMs) such as Mamba are alternatives to the transformer architecture that process sequences in linear rather than quadratic time. They use structured state spaces and selective mechanisms to reach quality competitive with transformers, while keeping a fixed-size state during inference regardless of sequence length, which makes long sequences cheaper and generation faster.

What Are State Space Models?

Why SSMs Matter

From Transformers to SSMs

Transformer Attention:

Attention: O(n²) compute, O(n) memory per layer
Every token attends to every other token
Quality: Excellent for most tasks
Problem: Doesn't scale to very long sequences

State Space Model:

SSM: O(n) compute, O(1) memory per layer
Information flows through hidden state
Update state with each new token
Challenge: Can it match transformer quality?
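
To make the scaling difference concrete, here is a rough sketch that counts the basic operations each approach performs on a sequence of n tokens. The function names are illustrative, and the counts ignore constant factors such as head dimension and state size.

```python
def attention_score_count(n: int) -> int:
    """Number of query-key scores when every token attends to every token."""
    return n * n


def ssm_update_count(n: int) -> int:
    """Number of hidden-state updates a recurrent SSM performs over n tokens."""
    # One state update per token.
    return n


for n in (1_000, 10_000, 100_000):
    ratio = attention_score_count(n) / ssm_update_count(n)
    print(f"n={n:>7}: attention does about {ratio:,.0f}x more basic operations")
```

The ratio grows linearly with n, which is why the gap matters mainly for very long sequences.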

State Space Equations

Continuous Form:

h'(t) = Ah(t) + Bx(t)    (state update)
y(t) = Ch(t) + Dx(t)     (output)

Where:
- h: hidden state
- x: input
- y: output
- A, B, C, D: learned parameters

Discrete Form (for sequences):

h_t = Ā h_{t-1} + B̄ x_t
y_t = C h_t
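
The discrete recurrence above can be sketched in a few lines. This is a minimal illustration, assuming a diagonal A (so each state dimension evolves independently) and zero-order-hold discretization with a step size `delta`. The text here does not say how Ā and B̄ are obtained, so both choices are assumptions, as are all function names.

```python
import math


def discretize_zoh(a, b, delta):
    """Zero-order-hold discretization for a diagonal A (an assumed choice).

    a, b: lists holding the diagonal of A and the entries of B.
    Returns (a_bar, b_bar) with a_bar[i] = exp(delta * a[i]).
    """
    a_bar = [math.exp(delta * ai) for ai in a]
    # b_bar[i] = (exp(delta * a[i]) - 1) / a[i] * b[i]; requires a[i] != 0.
    b_bar = [(abi - 1.0) / ai * bi for abi, ai, bi in zip(a_bar, a, b)]
    return a_bar, b_bar


def ssm_recurrence(a_bar, b_bar, c, xs):
    """Run h_t = a_bar * h_{t-1} + b_bar * x_t and y_t = c . h_t over scalar inputs."""
    h = [0.0] * len(a_bar)  # initial state h_0 = 0
    ys = []
    for x in xs:
        h = [ab * hi + bb * x for ab, hi, bb in zip(a_bar, h, b_bar)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


a = [-1.0, -0.5]  # negative (stable) diagonal entries, chosen arbitrarily
b = [1.0, 1.0]
c = [0.5, 0.5]
a_bar, b_bar = discretize_zoh(a, b, delta=0.1)
ys = ssm_recurrence(a_bar, b_bar, c, xs=[1.0, 0.0, 0.0, 0.0])
print(ys)  # response to a single impulse, decaying over time
```

Note that the loop keeps only the current state `h`, so its memory use does not grow with the length of `xs`.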

Computed efficiently via a parallel scan during training (or, when the parameters do not vary over time, as a convolution)
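
A scan applies because each step h_t = a_t h_{t-1} + b_t is an affine map, and composing affine maps is associative. The sketch below, for a scalar state, checks that a divide-and-conquer combination of (a, b) pairs gives the same states as the sequential loop. It runs serially here; on parallel hardware the two halves could be processed concurrently. Function names are illustrative.

```python
def combine(first, second):
    """Compose two affine steps h -> a*h + b, applying `first` and then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a1 * a2, a2 * b1 + b2)


def scan(steps):
    """Inclusive scan under `combine`, by divide and conquer.

    Element t of the result is the affine map taking h_0 to h_{t+1}.
    """
    if len(steps) <= 1:
        return list(steps)
    mid = len(steps) // 2
    left = scan(steps[:mid])   # the two halves do not depend on each other
    right = scan(steps[mid:])
    carry = left[-1]           # composition of the entire left half
    return left + [combine(carry, r) for r in right]


def sequential(steps, h0=0.0):
    """Reference implementation: plain left-to-right recurrence."""
    hs, h = [], h0
    for a, b in steps:
        h = a * h + b
        hs.append(h)
    return hs


steps = [(0.9, 1.0), (0.5, 2.0), (0.8, -1.0), (0.7, 0.5), (0.6, 3.0)]
h0 = 0.0
from_scan = [a * h0 + b for a, b in scan(steps)]
assert all(abs(u - v) < 1e-12 for u, v in zip(from_scan, sequential(steps, h0)))
```

The same composition rule carries over to vector states with a diagonal transition, where the multiplications become elementwise.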

Mamba: Selective State Spaces

Key Innovation:

Mamba Block:

Input
  ↓
┌─────────────────────────────────────┐
│ Linear projection (expand dim)      │
├─────────────────────────────────────┤
│ Conv1D (local context)              │
├─────────────────────────────────────┤
│ Selective SSM                       │
│ - Input-dependent Δ, B, C           │
│ - Selective scan (parallel)         │
├─────────────────────────────────────┤
│ Linear projection (reduce dim)      │
└─────────────────────────────────────┘
  ↓
Output
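
The selective SSM stage in the block above can be sketched for a single channel as follows. This is a simplified illustration rather than the Mamba implementation: it assumes a diagonal A, derives the step size Δ and the vectors B and C from the current input through hypothetical weights (`w_delta`, `w_b`, `w_c`), and omits the projections, the convolution, gating, and the fused parallel scan.

```python
import math


def softplus(z):
    """Smooth positive function, used here to keep the step size above zero."""
    return math.log1p(math.exp(z))


def selective_ssm(xs, a, w_delta, w_b, w_c):
    """Single-channel selective SSM with a diagonal A (illustrative sketch).

    xs: scalar inputs; a: negative diagonal of A (length N);
    w_delta: scalar weight; w_b, w_c: length-N weights (all hypothetical).
    """
    h = [0.0] * len(a)
    ys = []
    for x in xs:
        delta = softplus(w_delta * x)                # input-dependent step size
        b_t = [wb * x for wb in w_b]                 # input-dependent B
        c_t = [wc * x for wc in w_c]                 # input-dependent C
        a_bar = [math.exp(delta * ai) for ai in a]   # discretized transition
        b_bar = [delta * bi for bi in b_t]           # simplified (Euler) input matrix
        h = [ab * hi + bb * x for ab, hi, bb in zip(a_bar, h, b_bar)]
        ys.append(sum(ci * hi for ci, hi in zip(c_t, h)))
    return ys


ys = selective_ssm(
    xs=[1.0, 0.0, 2.0, -1.0],
    a=[-1.0, -2.0],
    w_delta=0.5,
    w_b=[1.0, 0.5],
    w_c=[1.0, 1.0],
)
print(ys)
```

Because Δ, B, and C change with each token, the recurrence can decide per token how much of the existing state to keep and how much of the new input to write, which a fixed-parameter SSM cannot do.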

SSM vs. Transformer Comparison

Aspect            | Transformer      | Mamba/SSM
------------------|------------------|------------------
Complexity        | O(n²)            | O(n)
Memory            | O(n) KV cache    | O(1) state
Long context      | Expensive        | Cheap
In-context recall | Excellent        | Good (improving)
Ecosystem         | Mature           | Emerging
Training          | Parallel         | Parallel (scan)
Inference         | KV cache         | RNN-style
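
The Memory row can be made concrete with a back-of-the-envelope calculation. The dimensions below are hypothetical and chosen only for illustration. The formulas assume a standard KV cache (keys and values for every past token in every layer) and one fixed-size recurrent state per layer, and they ignore smaller buffers such as a convolution window.

```python
def kv_cache_bytes(n_tokens, n_layers, d_model, bytes_per_value=2):
    """Approximate KV-cache size: keys plus values for every token in every layer."""
    return 2 * n_tokens * n_layers * d_model * bytes_per_value


def ssm_state_bytes(n_layers, d_inner, d_state, bytes_per_value=2):
    """Approximate recurrent state size: one (d_inner x d_state) state per layer."""
    return n_layers * d_inner * d_state * bytes_per_value


# Hypothetical model dimensions, for illustration only.
layers, d_model, d_inner, d_state = 32, 4096, 8192, 16

for n in (1_000, 100_000):
    kv = kv_cache_bytes(n, layers, d_model)
    ssm = ssm_state_bytes(layers, d_inner, d_state)
    print(f"n={n:>7}: KV cache {kv / 2**20:,.0f} MiB, SSM state {ssm / 2**20:,.0f} MiB")
```

The KV cache grows in proportion to the number of tokens, while the SSM state stays the same size however long the sequence gets.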

Mamba Models

Model           | Params | Performance
----------------|--------|----------------------------
Mamba-130M      | 130M   | Matches 350M transformer
Mamba-370M      | 370M   | Matches 1B transformer
Mamba-1.4B      | 1.4B   | Matches 3B transformer
Mamba-2.8B      | 2.8B   | Competitive with 7B
Jamba           | 52B    | Mamba + attention hybrid (MoE, about 12B active)

Hybrid Architectures

Jamba (AI21):

Mamba-2:

Limitations

In-Context Learning:

Ecosystem:

Inference Frameworks

State Space Models are a promising alternative to transformers. Transformers dominate today, but SSMs take a fundamentally different approach whose cost grows linearly with sequence length, which makes them an important direction for long-context models.

