Guidance is a constraint-based language model programming framework from Microsoft that gives precise control over LLM output structure by interleaving generation with templating. Developers define the output format with variables, conditionals, loops, and regex constraints that the model must follow during generation, which removes most format-related post-processing and prevents structurally invalid output.
What Is Guidance?
- Definition: A Python library that combines templating with constrained generation, letting developers interleave fixed text, LLM generation, and programmatic logic in a single program.
- Core Innovation: Generation happens within structural constraints — the model can only produce tokens that satisfy the specified format.
- Key Difference: Unlike prompt engineering (hoping for the right format), Guidance enforces format through constrained decoding.
- Creator: Microsoft Research, led by Scott Lundberg.
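The core idea above, restricting the model to tokens that satisfy the format, can be illustrated with a toy sketch. It does not use the Guidance library: the `logits` dictionary, the `digits` constraint, and the `constrained_argmax` helper are all hypothetical stand-ins chosen for illustration.

```python
def constrained_argmax(logits, allowed):
    """Return the highest-scoring token that the constraint permits.

    Real constrained decoders typically set the scores of disallowed
    tokens to negative infinity; filtering them out has the same effect.
    """
    candidates = {tok: score for tok, score in logits.items() if tok in allowed}
    if not candidates:
        raise ValueError("the constraint rules out every candidate token")
    return max(candidates, key=candidates.get)


# Hypothetical next-token scores for the prompt "Days in a week:".
logits = {"Seven": 2.1, "There": 1.7, "7": 1.2, "8": 0.3}

# Hypothetical format constraint: the answer must be a single digit.
digits = {str(d) for d in range(10)}

unconstrained = max(logits, key=logits.get)       # what plain decoding picks
constrained = constrained_argmax(logits, digits)  # what the constraint allows

print(unconstrained, constrained)
```

Plain decoding follows the model's top preference, while the constrained version can only ever return a token from the allowed set, whatever the model would have preferred.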
Why Guidance Matters
- Guaranteed Structure: Output always matches the specified format — no parsing failures or format errors.
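- Reduced Hallucination: Structural constraints shrink the model's generation space, so it cannot invent values outside the allowed format (for example, a label that is not in the permitted set). Content inside a valid structure can still be factually wrong.
- Efficiency: Structured output is produced in a single generation run, with no retry loops or format-repair post-processing.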
- Interleaved Logic: Mix generation with Python code execution, conditionals, and loops within a single program.
- Token Efficiency: Only the variable content is generated. Fixed template text is appended to the context directly instead of being sampled token by token.
Core Features
| Feature | Description | Benefit |
|---------|-------------|---------|
| Templates | Handlebars-style templates with generation blocks | Structured output |
| Select | Constrain output to specific choices | Guaranteed valid enum values |
| Regex | Match generation against regex patterns | Format enforcement |
| Gen | Free-form generation within constraints | Controlled creativity |
| If/For | Programmatic control flow | Dynamic output structure |
How Guidance Works
Programs are written as templates in which `{{gen}}` blocks mark where the model generates text, `{{select}}` blocks constrain the output to a fixed set of choices, and Python logic controls flow. The model generates tokens that satisfy all active constraints, producing correctly structured output in a single pass.
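The interleaving of fixed text and generation blocks can be sketched with a small offline interpreter. This is a simplified illustration, not Guidance's implementation or its exact template syntax: the `{{gen name}}` and `{{select name a|b}}` block forms, the `StubModel` class, and `run_template` are assumptions made for the example.

```python
import re


class StubModel:
    """Stand-in for an LLM that returns canned text so the sketch runs offline."""

    def generate(self, prompt):
        return "Paris"

    def choose(self, prompt, options):
        # A real backend would score every option; the stub takes the first.
        return options[0]


# Matches the hypothetical block forms {{gen name}} and {{select name a|b}}.
BLOCK = re.compile(r"\{\{(gen|select) (\w+)(?: ([^}]*))?\}\}")


def run_template(template, model):
    """Copy fixed text verbatim and call the model only at the blocks."""
    output, variables, pos = "", {}, 0
    for match in BLOCK.finditer(template):
        output += template[pos:match.start()]  # fixed text, never generated
        kind, name, arg = match.groups()
        if kind == "gen":
            value = model.generate(output)
        else:
            value = model.choose(output, arg.split("|"))
        variables[name] = value  # captured for later program logic
        output += value
        pos = match.end()
    output += template[pos:]
    return output, variables


template = "Capital of France: {{gen city}}\nConfidence: {{select level high|low}}\n"
text, captured = run_template(template, StubModel())
print(text)
print(captured)
```

Because the fixed text comes straight from the template, the surrounding structure cannot drift; only the block values depend on the model, and each one is captured by name for use in ordinary Python code.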
Example Patterns
- Structured Extraction: Force output into JSON with specific field types.
- Classification: Constrain output to valid class labels using `select`.
- Chain-of-Thought: Alternate between reasoning generation and structured answer extraction.
- Multi-Step: Use loops to generate lists of items with consistent formatting.
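The classification pattern can be sketched as a choice constraint applied one character at a time. This is a toy model of the idea, not the library's code: `constrained_select`, `stub_score`, and the label set are hypothetical, and the sketch assumes that no label is a prefix of another.

```python
def constrained_select(options, score_char):
    """Build an answer character by character, only ever extending a
    prefix that at least one option still starts with."""
    prefix = ""
    while prefix not in options:
        # Characters that keep the prefix consistent with some option.
        allowed = {o[len(prefix)] for o in options if o.startswith(prefix)}
        prefix += max(sorted(allowed), key=lambda c: score_char(prefix, c))
    return prefix


def stub_score(prefix, char):
    """Hypothetical model that 'wants' to write a free-form, invalid label."""
    wanted = "neutral-ish"
    position = len(prefix)
    return 1.0 if position < len(wanted) and wanted[position] == char else 0.0


labels = ["positive", "negative", "neutral"]
label = constrained_select(labels, stub_score)
print(label)
```

The stub would drift into an invalid label if left unconstrained, but the constraint stops it at the closest valid option. The same call could be placed inside a loop to label a list of inputs with consistent formatting.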
Guidance is a precise tool for controlling LLM output structure: it replaces the unreliability of prompt-based formatting with structural compliance enforced through constrained decoding, which makes it well suited to applications where output format correctness is non-negotiable.