Instructor

Keywords: instructor,structured,pydantic

Instructor is a Python library that makes LLMs return validated Pydantic models by patching official provider SDKs. It combines JSON mode, function calling, and automatic retry-with-error-feedback into a single interface, making structured LLM output as simple as defining a Python class and as reliable as a typed API endpoint.

What Is Instructor?

- Definition: An open-source Python library (by Jason Liu, 2023) that wraps OpenAI, Anthropic, Google, and other LLM provider SDKs to add a response_model parameter — specify any Pydantic BaseModel subclass and Instructor guarantees the LLM response parses into a valid instance of that class.
- Core Mechanism: Instructor uses the provider's native structured output mechanism (OpenAI JSON mode, function calling, or tool use) and adds Pydantic validation on top — if validation fails, it automatically re-prompts the LLM with the validation error message and retries.
- Pydantic Integration: Every field definition, validator, and description in your Pydantic model becomes a prompt signal — Field(description="Must be a positive integer") is automatically included in the schema sent to the LLM.
- Automatic Retries: Configure max_retries=3 and Instructor handles the retry loop — catching Pydantic ValidationErrors, formatting them as feedback to the LLM, and requesting a corrected response.
- Multi-Provider: Supports OpenAI, Anthropic Claude, Google Gemini, Cohere, Mistral, Ollama, and any OpenAI-compatible endpoint — same code, different providers.
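The schema-as-prompt mechanism described above can be seen with plain Pydantic and no API call. The sketch below (model and field names are illustrative) shows how field descriptions and constraints end up in the JSON schema that Instructor sends to the provider:

```python
# Sketch: how Pydantic field metadata becomes the JSON schema Instructor
# sends to the provider (pure Pydantic, no API call; names illustrative).
from pydantic import BaseModel, Field

class Invoice(BaseModel):
    total: float = Field(gt=0, description="Must be a positive amount")
    vendor: str = Field(description="Legal name of the issuing vendor")

schema = Invoice.model_json_schema()
# Both the description and the gt=0 constraint appear in the schema,
# so the LLM sees them as part of the expected output format.
```

Because the schema is derived entirely from the model definition, there is no separate prompt template to keep in sync with your types.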

Why Instructor Matters

- Developer Ergonomics: Defining a Pydantic model is already standard Python practice — Instructor makes it the complete interface for LLM structured output, requiring zero prompt engineering for format compliance.
- Validation as Specification: Pydantic validators serve as both input specification and output guarantee — a @field_validator("age") method such as age_must_be_positive becomes both documentation and enforcement.
- Streaming Support: Stream Pydantic model instances as they generate — useful for progressive UI updates where you want to show partial results as the LLM generates each field.
- Observability Integration: First-class integration with Langfuse, Logfire, and OpenTelemetry — every Instructor call is automatically traced with input schema, output, validation errors, and retry count.
- Widely Adopted: One of the most-starred structured output libraries on GitHub — used by thousands of production applications for data extraction, classification, and agent tool responses.
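The "validation as specification" idea can be sketched with plain Pydantic (Person here is a hypothetical extraction target, not Instructor API):

```python
# Sketch: a Pydantic validator acting as both specification and enforcement
# (pure Pydantic; Person is a hypothetical extraction target).
from pydantic import BaseModel, ValidationError, field_validator

class Person(BaseModel):
    name: str
    age: int

    @field_validator("age")
    @classmethod
    def age_must_be_positive(cls, v: int) -> int:
        # The documented constraint is also enforced at parse time
        if v < 0:
            raise ValueError("age must be non-negative")
        return v

ok = Person(name="Ada", age=36)   # passes validation
try:
    Person(name="Ada", age=-1)    # fails: in Instructor, this triggers a retry
except ValidationError as e:
    error_message = str(e)        # this text is what gets fed back to the LLM
```

When such a model is passed as response_model, a failing validator produces exactly the error text that Instructor forwards to the LLM on retry.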

Core Usage Pattern

```python
import instructor
from anthropic import Anthropic
from pydantic import BaseModel, Field

client = instructor.from_anthropic(Anthropic())

class Person(BaseModel):
    name: str = Field(description="Full name of the person")
    age: int = Field(ge=0, le=150, description="Age in years")
    occupation: str

person = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=512,
    messages=[{"role": "user", "content": "Extract: John Smith, 34, works as a software engineer"}],
    response_model=Person,
)
# person.name == "John Smith", person.age == 34, always a valid Person
```

Advanced Instructor Features

Nested Models:
```python
class Address(BaseModel):
    street: str
    city: str
    country: str

class Company(BaseModel):
    name: str
    headquarters: Address    # Nested Pydantic model works automatically
    employees: list[Person]  # List of models also works
```
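Why nesting "just works" is visible in the generated schema: Pydantic emits each nested model once under $defs and references it, and Instructor sends that combined schema to the provider. A minimal sketch (pure Pydantic, mirroring the models above):

```python
# Sketch: nested models compile into a single JSON schema with $defs
# references (pure Pydantic; mirrors the Address/Company models above).
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str
    country: str

class Company(BaseModel):
    name: str
    headquarters: Address

schema = Company.model_json_schema()
# Address is emitted once under $defs and referenced from headquarters
```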

Partial Streaming:
```python
from instructor import Partial

for partial_person in client.messages.create(..., stream=True, response_model=Partial[Person]):
    print(partial_person)  # Progressive output as fields generate
```
(Use response_model=Iterable[Person] instead to stream a sequence of complete objects.)

Validation with Feedback:
When the LLM outputs "age": "thirty-four", Pydantic raises a ValidationError (age must be a valid integer). Instructor automatically appends a message like "The previous response had a validation error: age must be a valid integer. Please correct and retry." and the LLM self-corrects without developer intervention.

Instructor vs Alternatives

| Feature | Instructor | Outlines | Guidance | Raw JSON mode |
|---------|-----------|---------|---------|--------------|
| Pydantic integration | Native | Good | Limited | Manual |
| API model support | Excellent | Limited | Good | Full |
| Retry on failure | Automatic | N/A | N/A | Manual |
| Learning curve | Very low | Low | Medium | Low |
| Streaming | Yes | No | Limited | Manual |
| Validation feedback | Yes (auto) | No | No | No |

Common Use Cases

- Document Extraction: Extract invoices, contracts, and reports into typed Python objects for downstream processing.
- Classification: Multi-label classification with `Literal` type hints — `category: Literal["tech", "sports", "politics"]`.
- Agent Tool Responses: Ensure tool-calling agents return well-formed tool results that downstream functions can consume without error handling.
- Data Pipeline ETL: Transform unstructured text sources into structured database records with guaranteed schema compliance.
- API Response Generation: Build LLM-powered API endpoints that always return valid JSON matching your OpenAPI schema.
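The Literal-based classification pattern can be sketched with plain Pydantic (model and label names are illustrative):

```python
# Sketch: a Literal field becomes an enum constraint in the JSON schema,
# restricting the LLM to the allowed labels (pure Pydantic; names illustrative).
from typing import Literal
from pydantic import BaseModel, ValidationError

class ArticleLabel(BaseModel):
    category: Literal["tech", "sports", "politics"]

schema = ArticleLabel.model_json_schema()
labels = schema["properties"]["category"]["enum"]  # allowed values in schema

valid = ArticleLabel(category="tech")
try:
    ArticleLabel(category="cooking")  # rejected; would trigger a retry
    rejected = False
except ValidationError:
    rejected = True
```

Passing such a model as response_model turns a free-text classification prompt into a closed-set choice enforced at parse time.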

Instructor is the simplest path from a Pydantic model to reliable structured LLM output. By leveraging the validation infrastructure Python developers already use daily, it makes LLM-powered data extraction and classification as trustworthy and maintainable as any other typed function in a production codebase.
