Hyena

Hyena is a subquadratic replacement for attention that combines long convolutions, computed via FFT, with element-wise data-dependent gating. This gives O(n log n) complexity in sequence length n, compared with attention's O(n²), while retaining the input-dependent processing that language modeling relies on. Hyena is reported to match Transformer quality on language modeling at the 1-2B parameter scale, with the operator running about 100× faster than attention on 64K-token contexts. It represents a different architectural path from the attention mechanism.

What Is Hyena?

The Hyena Operator

| Component | Function | Analogy to Attention |
|---|---|---|
| Implicit Convolution Filters | Parameterize convolution kernels with small neural networks, apply via FFT | Like the attention pattern (which tokens interact) |
| Data-Dependent Gating | Element-wise multiplication gated by the input | Like attention weights being conditioned on Q and K |
| FFT Computation | Convolution in frequency domain: O(n log n) | Replaces the O(n²) QK^T attention matrix |
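
To make the "implicit filter" idea concrete, here is a minimal sketch. It assumes a tiny one-hidden-layer network with a sine activation, applied to a sinusoidal positional encoding of each time step and multiplied by an exponential decay window. The function names, layer sizes, decay rate, and random weights are all illustrative assumptions, not the reference implementation. The point it shows is that the parameter count stays fixed while the filter can be evaluated at any sequence length.

```python
import math
import random

def positional_features(t, seq_len, n_freqs=3):
    """Hypothetical positional encoding: normalized time plus sin/cos features."""
    pos = t / max(seq_len - 1, 1)
    feats = [pos]
    for k in range(1, n_freqs + 1):
        feats.append(math.sin(2 * math.pi * k * pos))
        feats.append(math.cos(2 * math.pi * k * pos))
    return feats

def make_implicit_filter(hidden=8, n_freqs=3, decay=4.0, seed=0):
    """Build a filter generator whose parameter count is independent of length."""
    rng = random.Random(seed)
    in_dim = 1 + 2 * n_freqs
    # Random weights stand in for trained parameters (assumption for illustration).
    w1 = [[rng.gauss(0, 1) for _ in range(in_dim)] for _ in range(hidden)]
    w2 = [rng.gauss(0, 1) / hidden for _ in range(hidden)]
    n_params = hidden * in_dim + hidden

    def filter_values(seq_len):
        values = []
        for t in range(seq_len):
            feats = positional_features(t, seq_len, n_freqs)
            hidden_act = [math.sin(sum(w * f for w, f in zip(row, feats)))
                          for row in w1]
            raw = sum(w * a for w, a in zip(w2, hidden_act))
            window = math.exp(-decay * t / seq_len)  # decay window over time
            values.append(window * raw)
        return values

    return filter_values, n_params

# The same parameters produce a filter as long as the sequence requires.
filter_fn, n_params = make_implicit_filter()
print(n_params, len(filter_fn(16)), len(filter_fn(4096)))
```

An explicit filter would need one parameter per position, so this design choice is what lets the kernel span the whole sequence without the parameter count growing with context length.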

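Hyena computation (order 2): y = x₂ ⊙ (h₂ ∗ (x₁ ⊙ (h₁ ∗ v)))

Where v, x₁, and x₂ are linear projections of the input, h₁ and h₂ are implicitly parameterized long convolution filters, ∗ is causal convolution (computed via FFT), and ⊙ is element-wise multiplication. The filters themselves depend only on position. The data dependence comes from the gates x₁ and x₂.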
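
The computation above can be sketched for a single channel as follows. The sketch assumes the order-2 form y = x2 ⊙ (h2 ∗ (x1 ⊙ (h1 ∗ v))), where ∗ is causal convolution. The gates x1, x2 and the value v are passed in directly rather than produced by learned projections. The function names are hypothetical, and the pure-Python FFT is there only to keep the example dependency-free.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def causal_conv_fft(h, u):
    """Causal convolution y[t] = sum_{s<=t} h[s] * u[t-s] via zero-padded FFT.

    Assumes len(h) == len(u). Padding to at least 2x the length avoids the
    wrap-around that circular convolution would otherwise introduce.
    """
    length = len(u)
    size = 1
    while size < 2 * length:
        size *= 2
    h_f = fft(list(h) + [0.0] * (size - len(h)))
    u_f = fft(list(u) + [0.0] * (size - length))
    prod = [a * b for a, b in zip(h_f, u_f)]
    y = fft(prod, inverse=True)
    # The unnormalized inverse transform needs a 1/size factor.
    return [val.real / size for val in y[:length]]

def causal_conv_direct(h, u):
    """O(n^2) reference, used only to check the FFT path."""
    return [sum(h[s] * u[t - s] for s in range(t + 1)) for t in range(len(u))]

def hyena_order2(v, x1, x2, h1, h2):
    """Single-channel sketch: convolve, gate, convolve, gate."""
    z = causal_conv_fft(h1, v)
    z = [g * c for g, c in zip(x1, z)]
    z = causal_conv_fft(h2, z)
    return [g * c for g, c in zip(x2, z)]

# Sanity check on a short sequence: both convolution paths should agree.
h = [0.5, -0.25, 0.125, 0.0625]
u = [1.0, 2.0, 3.0, 4.0]
assert all(abs(a - b) < 1e-9
           for a, b in zip(causal_conv_fft(h, u), causal_conv_direct(h, u)))
```

In practice this would run batched over many channels with a library FFT. The structure is the same: each convolution costs O(n log n) and each gate costs O(n).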

Complexity Comparison

| Operator | Complexity | Data-Dependent? | Global Receptive Field? | Exact? |
|---|---|---|---|---|
| Full Attention | O(n²) | Yes (QK^T) | Yes | Yes |
| FlashAttention | O(n²) FLOPs, O(n) memory | Yes | Yes | Yes |
| Linear Attention | O(n) | Approximate | Yes (kernel approx) | No |
| Hyena | O(n log n) | Yes (gating) | Yes (FFT convolution) | N/A (different operator) |
| S4/Mamba | O(n) or O(n log n) | Mamba: yes (selective); S4: no (time-invariant) | Yes (SSM) | N/A (different operator) |
| Local Attention | O(n × w) | Yes | No (window only) | Yes (within window) |
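
To give a feel for how the O(n²) and O(n log n) rows diverge, the short calculation below compares the two growth rates at a few sequence lengths. It is a back-of-the-envelope sketch with hypothetical function names: it ignores constant factors, hidden dimension, and hardware effects, so the ratios are not predicted wall-clock speedups.

```python
import math

def attention_cost(n):
    """Pairwise token interactions: grows as n^2 (constants ignored)."""
    return n * n

def fft_conv_cost(n):
    """Rough n * log2(n) count for an FFT-based convolution (constants ignored)."""
    return n * math.log2(n)

def cost_ratio(n):
    """How many times larger the quadratic term is than the n log n term."""
    return attention_cost(n) / fft_conv_cost(n)

for n in (1024, 8192, 65536):
    print(f"n={n:>6}: n^2 / (n log2 n) = {cost_ratio(n):,.0f}")
```

At n = 65,536 the ratio is 65,536 / 16 = 4,096. Measured speedups are much smaller than this asymptotic gap because FFT-based convolution has larger constant factors than a well-optimized attention kernel.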

Benchmark Results

| Benchmark | Transformer (baseline) | Hyena | Notes |
|---|---|---|---|
| WikiText-103 (perplexity, lower is better) | 18.7 (GPT-2 scale) | 18.9 | About 1% higher perplexity |
| The Pile (perplexity) | Comparable | Comparable at 1-2B scale | Matches at moderate scale |
| Long Range Arena | Baseline | Competitive | Synthetic long-range benchmarks |
| Speed (64K context) | 1× (with FlashAttention) | ~100× faster | Dominant advantage at long contexts |

Hyena vs Related Subquadratic Architectures

| Model | Core Mechanism | Complexity | Maturity |
|---|---|---|---|
| Hyena | Implicit convolution + gating | O(n log n) | Research (2023) |
| Mamba (S6) | Selective State Space Model + hardware-aware scan | O(n) | Production-ready (2024) |
| RWKV | Linear attention + recurrence | O(n) | Open-source, active community |
| RetNet | Retention mechanism (parallel + recurrent) | O(n) | Research (Microsoft) |

Hyena replaces the O(n²) attention matrix with O(n log n) FFT-based implicit convolutions and data-dependent gating. It is reported to match Transformer quality at moderate scale while delivering roughly 100× speedups on long (64K-token) contexts. This suggests that attention may not be the only path to high-quality language modeling, and it opens the door to subquadratic foundation models.
