ATLAS (Izacard et al., 2022) is a few-shot learning system that jointly trains a dense passage retriever and a sequence-to-sequence generator to solve knowledge-intensive NLP tasks. It showed that an 11B-parameter model with retrieval can match or exceed the 540B-parameter PaLM on knowledge tasks despite having roughly 50× fewer parameters, and it established end-to-end retriever-generator co-training as a practical route to efficient, attributable, knowledge-grounded language models.
What Is ATLAS?
- Definition: A retrieval-augmented language model comprising two jointly trained components: (1) a dense bi-encoder retriever (based on Contriever) that selects relevant passages from a large corpus, and (2) a Fusion-in-Decoder (FiD) generator (based on T5) that produces answers conditioned on the query plus all retrieved passages.
- Joint Training: Unlike RETRO (frozen retriever), ATLAS trains the retriever and generator end-to-end — the retriever learns what information the generator needs, and the generator learns to use what the retriever provides.
- Few-Shot Capability: ATLAS achieves remarkable few-shot performance — with only 64 examples, it matches or exceeds models trained on thousands of examples, because the retrieval database provides implicit knowledge that substitutes for task-specific training data.
- Attribution: Generated outputs can be traced back to specific retrieved passages — providing source attribution that enables fact verification and trust.
Why ATLAS Matters
- 50× Parameter Efficiency: ATLAS-11B matches or exceeds PaLM-540B on Natural Questions, TriviaQA, and FEVER, demonstrating that retrieval-augmented small models can compete with massive dense models on knowledge tasks.
- End-to-End Retriever Training: Joint training enables the retriever to learn task-specific relevance — selecting passages that actually help the generator answer correctly, not just passages that match lexically.
- Updatable Knowledge: Swapping the retrieval corpus updates the model's knowledge without retraining — ATLAS can be updated to reflect new information by re-indexing the document collection.
- Source Attribution: Every generated answer is conditioned on specific retrieved passages — enabling users to verify claims against original sources.
- Sample Efficiency: In few-shot settings, retrieval supplies the context that small training sets cannot provide; ATLAS with 64 examples outperforms non-retrieval models trained on thousands of examples.
ATLAS Architecture
Retriever (Contriever-based):
- Bi-encoder: encode query q and passage p into dense vectors independently.
- Relevance score: dot product of query and passage embeddings.
- Top-k retrieval from a pre-built FAISS index over the full corpus (Wikipedia or larger); a minimal retrieval sketch follows this list.
- Jointly trained — retriever adapts to provide passages that maximize generator performance.
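To make the retrieval step concrete, here is a minimal sketch in Python. It is an illustration rather than the official ATLAS code: it assumes the publicly released facebook/contriever checkpoint from Hugging Face, mean-pools token embeddings into a single vector (as Contriever does), scores passages by dot product via a flat FAISS inner-product index, and uses a toy two-passage corpus where ATLAS would use a Wikipedia-scale collection indexed offline.

```python
# Minimal bi-encoder retrieval sketch (illustrative; not the official ATLAS code).
# Assumes: torch, transformers, faiss-cpu installed.
import faiss
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/contriever")
encoder = AutoModel.from_pretrained("facebook/contriever")

def embed(texts):
    """Encode texts into dense vectors by mean-pooling the last hidden states."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state            # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()       # (B, T, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)        # (B, H)

# Toy corpus standing in for the full document collection.
passages = [
    "The Eiffel Tower is located in Paris, France.",
    "Mount Everest is the highest mountain above sea level.",
]
passage_vecs = embed(passages).numpy()

index = faiss.IndexFlatIP(int(passage_vecs.shape[1]))  # inner product = dot-product relevance
index.add(passage_vecs)

query_vec = embed(["Where is the Eiffel Tower?"]).numpy()
scores, ids = index.search(query_vec, 2)                # top-k passages for the generator
print([(passages[i], float(s)) for i, s in zip(ids[0], scores[0])])
```

During joint training the stored passage embeddings go stale as the retriever updates, so ATLAS periodically re-embeds and re-indexes the corpus (or restricts updates to the query encoder).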
Generator (Fusion-in-Decoder):
- Based on T5 (encoder-decoder architecture).
- Each retrieved passage is encoded independently with the query by the T5 encoder.
- T5 decoder cross-attends to all encoded passage representations simultaneously.
- Fusion happens in the decoder, enabling information aggregation across multiple retrieved documents (sketched below).
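The following sketch shows the Fusion-in-Decoder pattern with a stock Hugging Face T5 model. It is a simplified illustration, not the ATLAS implementation: each (query, passage) pair is encoded independently, the encoder states are concatenated along the sequence dimension, and the decoder cross-attends over the concatenation while generating. The model size, prompt format, and variable names are assumptions for the example.

```python
# Fusion-in-Decoder sketch with a stock T5 (illustrative; not the ATLAS code).
from transformers import AutoTokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

query = "question: Where is the Eiffel Tower?"
retrieved = [
    "The Eiffel Tower is located in Paris, France.",
    "Mount Everest is the highest mountain above sea level.",
]

# 1) Encode each (query + passage) pair independently with the T5 encoder.
inputs = tokenizer(
    [f"{query} context: {p}" for p in retrieved],
    padding=True, truncation=True, return_tensors="pt",
)
encoder_states = model.get_encoder()(**inputs).last_hidden_state       # (K, T, H)

# 2) Fuse: concatenate all passage encodings into one long sequence.
fused_states = encoder_states.reshape(1, -1, encoder_states.size(-1))  # (1, K*T, H)
fused_mask = inputs["attention_mask"].reshape(1, -1)                   # (1, K*T)

# 3) The decoder cross-attends over every retrieved passage at once.
output_ids = model.generate(
    encoder_outputs=BaseModelOutput(last_hidden_state=fused_states),
    attention_mask=fused_mask,
    max_new_tokens=20,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because each passage is encoded separately, encoder cost grows linearly in the number of passages, while the decoder sees all of them at once; this is what lets FiD scale to many retrieved documents.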
Training Strategies:
- Attention Distillation: Use the generator's cross-attention scores as a supervision signal for the retriever; passages the generator attends to most should be scored highest by the retriever.
- EMDR² (End-to-end training of Multi-Document Reader and Retriever): treats the retrieved documents as latent variables and optimizes the marginal likelihood of the output, in the spirit of expectation-maximization.
- Perplexity Distillation: Train the retriever to select passages that reduce the generator's perplexity on the gold output (a simplified sketch follows below).
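As a rough illustration of how the generator can supervise the retriever, the sketch below implements a perplexity-distillation-style loss in plain PyTorch: the target distribution over the K retrieved passages is a softmax of the negated per-passage generator loss, and the retriever's score distribution is pulled toward it with a KL divergence. Tensor names, shapes, and the temperature value are assumptions for the example; details such as periodic index refresh are omitted.

```python
# Perplexity-distillation-style retriever loss (illustrative sketch, not ATLAS code).
import torch
import torch.nn.functional as F

def retriever_distillation_loss(retriever_scores, generator_nll, temperature=0.1):
    """
    retriever_scores: (B, K) dot-product scores for the K retrieved passages.
    generator_nll:    (B, K) generator negative log-likelihood of the gold answer
                      when conditioned on each passage individually.
    Passages that make the answer more likely (lower NLL) get higher target mass.
    """
    # Target: how much each passage helped the generator (detached: no grad to generator).
    target = F.softmax(-generator_nll.detach() / temperature, dim=-1)   # (B, K)
    # Retriever's current belief over the same passages.
    log_pred = F.log_softmax(retriever_scores / temperature, dim=-1)    # (B, K)
    # KL(target || pred): gradients flow into the retriever scores only.
    return F.kl_div(log_pred, target, reduction="batchmean")

# Tiny usage example with random tensors standing in for real model outputs.
scores = torch.randn(4, 8, requires_grad=True)    # retriever scores for 8 passages
nll = torch.rand(4, 8)                            # per-passage generator loss
loss = retriever_distillation_loss(scores, nll)
loss.backward()                                   # gradients reach the retriever
print(float(loss))
```

Attention distillation and EMDR² follow the same pattern but derive the target distribution from the generator's cross-attention scores or from the marginal likelihood over documents, respectively.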
ATLAS Performance
| Task | PaLM-540B | ATLAS-11B | Parameter ratio |
|------|-----------|-----------|-----------------|
| Natural Questions (EM) | 39.6 (64-shot) | 42.4 (64-shot) | 50× fewer |
| TriviaQA (EM) | 81.4 | 84.7 | 50× fewer |
| FEVER (accuracy) | 87.3 | 89.1 | 50× fewer |
ATLAS is a compelling demonstration that retrieval-augmented small models can outperform much larger dense models on knowledge tasks, suggesting that the future of knowledge-intensive NLP lies not in scaling parameters to memorize facts, but in combining efficient generators with learned retrieval systems that access external knowledge on demand.