
The Compressive Transformer is a long-range transformer architecture that extends context access through a hierarchical memory system. Instead of discarding older attention memories, it compresses them into progressively smaller representations, which lets the model reference thousands of tokens of history at a bounded memory cost. The architecture demonstrated that learned compression functions can preserve long-range information that fixed-window transformers cannot access.

What Is the Compressive Transformer?

Why Compressive Transformer Matters

Compressive Transformer Architecture

Memory Management:
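The memory scheme described in the introduction (keep recent states at full resolution, compress older ones rather than dropping them) can be sketched roughly as follows. This is a minimal illustration, not the published implementation: the function name `update_memories`, its parameters, and the use of mean pooling in place of a learned compression function are all assumptions made for the example.

```python
from typing import List, Tuple

Vector = List[float]


def mean_pool(group: List[Vector]) -> Vector:
    """Average a group of vectors elementwise into a single vector."""
    n = len(group)
    return [sum(col) / n for col in zip(*group)]


def update_memories(
    memory: List[Vector],
    compressed: List[Vector],
    new_states: List[Vector],
    mem_size: int,
    comp_size: int,
    rate: int,
) -> Tuple[List[Vector], List[Vector]]:
    """Append new states to active memory and compress whatever overflows.

    Hypothetical helper: `mem_size` and `comp_size` are slot counts and
    `rate` is the number of old states folded into one compressed slot.
    """
    memory = memory + new_states
    overflow = max(0, len(memory) - mem_size)
    # The oldest states leave active memory...
    evicted, memory = memory[:overflow], memory[overflow:]
    # ...but are pooled in groups of `rate` instead of being discarded.
    # Simplification: a trailing partial group is pooled as-is.
    new_slots = [
        mean_pool(evicted[i:i + rate]) for i in range(0, len(evicted), rate)
    ]
    compressed = compressed + new_slots
    # Only the oldest compressed slots are ever dropped, keeping cost bounded.
    compressed = compressed[max(0, len(compressed) - comp_size):]
    return memory, compressed
```

Calling this once per segment keeps both lists at or below their configured sizes, so the cost of attending over them stays fixed no matter how long the input stream runs.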

Compression Functions:
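A compression function maps each group of c old states to a single state. The sketch below uses two fixed pooling functions as stand-ins; they are not learned, so they illustrate only the shape of the operation, and a learned variant would replace `pool` with a trained layer. The names `compress`, `mean_pool`, and `max_pool` are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def mean_pool(group: List[Vector]) -> Vector:
    """Elementwise average of the vectors in a group."""
    n = len(group)
    return [sum(col) / n for col in zip(*group)]


def max_pool(group: List[Vector]) -> Vector:
    """Elementwise maximum of the vectors in a group."""
    return [max(col) for col in zip(*group)]


def compress(
    states: List[Vector],
    rate: int,
    pool: Callable[[List[Vector]], Vector],
) -> List[Vector]:
    """Reduce `states` to about len(states) / rate vectors.

    Each consecutive group of `rate` states is pooled into one vector.
    """
    return [pool(states[i:i + rate]) for i in range(0, len(states), rate)]


if __name__ == "__main__":
    old_states = [[1.0, 4.0], [3.0, 2.0], [5.0, 6.0], [7.0, 0.0]]
    print(compress(old_states, rate=2, pool=mean_pool))
    print(compress(old_states, rate=2, pool=max_pool))
```

Passing the pooling function as an argument keeps the grouping logic separate from the choice of compression, which makes it easy to compare alternatives on the same states.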

Memory Hierarchy Parameters

| Tier | Size | Resolution | Age | Access |
|---|---|---|---|---|
| Active Memory | m tokens | Full | Recent | Direct attention |
| Compressed Memory | m/c tokens | Compressed | Older | Cross-attention |
| Effective Context | m + m = 2m tokens equiv. | Mixed | Full range | 2× versus Transformer-XL |
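The sizes in the table can be checked with a little arithmetic. In the sketch below, m = 512 and c = 4 are illustrative values chosen for the example, not figures from the source, and the function names are hypothetical.

```python
def stored_slots(m: int, c: int) -> int:
    """Slots actually held: m active plus m // c compressed."""
    return m + m // c


def effective_context(m: int, c: int) -> int:
    """Tokens of history covered: each compressed slot stands for c tokens."""
    return m + (m // c) * c


if __name__ == "__main__":
    m, c = 512, 4  # assumed example values
    print(stored_slots(m, c))       # 640 slots held in memory
    print(effective_context(m, c))  # 1024 tokens covered, i.e. 2m
```

With these values the model stores 640 slots but covers 1024 tokens of history, which is the 2m figure in the table, at 1.25× the storage of the active memory alone.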

The Compressive Transformer shows that memory does not have to be all-or-nothing. Learned compression of older context preserves enough information for long-range tasks while keeping the bounded compute that makes deployment practical, and it pioneered the hierarchical memory design pattern adopted by subsequent efficient transformer architectures.
