Context Length and Context Windows
What is Context Length?
Context length (or context window) is the maximum number of tokens an LLM can process in a single request, including both the input prompt and the generated output.
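Because the prompt and the generated output share one window, the room left for output shrinks as the prompt grows. The sketch below illustrates that arithmetic. It assumes a rough heuristic of about 4 characters per token, which is not a real tokenizer, and the function names `estimate_tokens` and `max_output_tokens` are hypothetical, chosen only for illustration.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate. ASSUMPTION: ~4 characters per token (heuristic only)."""
    return max(1, (len(text) + 3) // 4)


def max_output_tokens(prompt: str, context_window: int) -> int:
    """Tokens left for generation after the prompt is counted against the window."""
    remaining = context_window - estimate_tokens(prompt)
    return max(0, remaining)


prompt = "Summarize the following report in three bullet points."
# 8,192 is the Llama 3 70B window from the table in this article.
print(max_output_tokens(prompt, context_window=8192))
```

In practice the count should come from the model's own tokenizer, since token counts vary by model and by language.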
Context Lengths by Model
| Model | Max Context (tokens) | Notes |
|---|---|---|
| GPT-4 Turbo | 128,000 | ~300 pages of text |
| GPT-4o | 128,000 | Faster and cheaper than GPT-4 Turbo |
| Claude 3.5 Sonnet | 200,000 | Larger than the GPT-4 family |
| Gemini 1.5 Pro | 1,000,000 | Largest in this table; the long window was initially experimental |
| Llama 3 70B | 8,192 | Base model; extendable with RoPE scaling |
| Mistral Large | 32,000 | Good balance |
Why Context Length Matters
1. Document processing: Longer context = more pages per request
2. Conversation history: More turns remembered
3. Few-shot learning: More examples in prompt
4. RAG applications: More retrieved chunks
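The conversation-history point can be made concrete: once a chat exceeds the window, older turns have to be dropped. The following is a minimal sketch of one simple policy, keeping the most recent turns that fit a token budget. The helper names `estimate_tokens` and `trim_history` are hypothetical, and the 4-characters-per-token estimate is an assumption, not a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate. ASSUMPTION: ~4 characters per token (heuristic only)."""
    return max(1, (len(text) + 3) // 4)


def trim_history(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose combined estimate fits the budget."""
    kept: list[str] = []
    used = 0
    # Walk backwards from the newest turn and stop at the first one that does not fit.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["a" * 40, "b" * 40, "c" * 40]  # three turns of ~10 tokens each
print(len(trim_history(history, budget=25)))
```

A larger window simply raises `budget`, so more turns survive before anything is dropped.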
Trade-offs of Long Context
| Longer Context | Implications |
|---|---|
| ✅ More information | Can include full documents |
| ❌ Higher cost | More tokens = higher API bills |
| ❌ Slower | More computation required |
| ❌ Lost in the middle | Models may miss information in the middle of long contexts |
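The cost row is simple arithmetic: the bill scales with the number of tokens sent and generated. The sketch below compares a short and a long prompt. The per-million-token prices are placeholders invented for illustration, not any provider's actual rates, and `request_cost` is a hypothetical helper.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost of one request, given prices per one million tokens."""
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000


# ASSUMPTION: placeholder prices, in arbitrary currency units per million tokens.
INPUT_PRICE = 1.0
OUTPUT_PRICE = 2.0

short_prompt = request_cost(8_000, 1_000, INPUT_PRICE, OUTPUT_PRICE)
long_prompt = request_cost(128_000, 1_000, INPUT_PRICE, OUTPUT_PRICE)
print(short_prompt, long_prompt)
```

With the same output length, filling a 128,000-token window costs roughly 13 times as much as an 8,000-token prompt under these placeholder prices.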
Extending Context
- RoPE scaling: Extend position embeddings (YaRN, NTK-aware)
- RAG: Retrieve only relevant chunks instead of full documents
- Summarization: Compress earlier context
- Sliding window: Process documents in chunks with overlap
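The sliding-window item in the list above can be sketched as follows. This is one straightforward way to split a token sequence into overlapping chunks, not a specific library's implementation; the name `sliding_window_chunks` and its parameters are hypothetical.

```python
def sliding_window_chunks(
    tokens: list[str], window: int, overlap: int
) -> list[list[str]]:
    """Split tokens into chunks of at most `window` items sharing `overlap` items."""
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")

    step = window - overlap
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        # Stop once this chunk already reaches the end of the sequence.
        if start + window >= len(tokens):
            break
    return chunks


tokens = list("abcdefghij")
for chunk in sliding_window_chunks(tokens, window=4, overlap=1):
    print("".join(chunk))
```

The overlap repeats a few tokens across neighboring chunks so that a sentence cut at a boundary still appears whole in at least one of them.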
Best Practices
- Use RAG for large document sets instead of full context
- Place important information at the start and end of prompts
- Monitor "lost in the middle" effects on long contexts
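One way to act on the placement advice above is to reorder retrieved chunks so the highest-ranked ones sit at the edges of the prompt and the weakest ones fall in the middle. The sketch below assumes the input list is already sorted from most to least relevant; `order_for_long_context` is a hypothetical helper, not part of any library.

```python
def order_for_long_context(chunks_by_relevance: list[str]) -> list[str]:
    """Alternate ranked chunks between the front and the back of the prompt.

    ASSUMPTION: the input is sorted from most relevant to least relevant.
    """
    front: list[str] = []
    back: list[str] = []
    for rank, chunk in enumerate(chunks_by_relevance):
        if rank % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    # Reverse the back half so the second-best chunk ends up last.
    return front + back[::-1]


ranked = ["r1", "r2", "r3", "r4", "r5"]  # r1 is the most relevant
print(order_for_long_context(ranked))
```

Whether this helps depends on the model and task, so it is worth measuring against the plain ranked order on your own data.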