LiteLLM is a Python library and proxy server that provides a unified OpenAI-compatible interface to 100+ LLM providers — enabling developers to switch between GPT-4, Claude, Gemini, Llama, Mistral, and any other model by changing a single string, with built-in cost tracking, rate limiting, fallbacks, and load balancing across providers.
What Is LiteLLM?
- Definition: An open-source Python package (and optional proxy server) that maps every major LLM provider's API to the OpenAI chat.completions format. Developers write code once against the OpenAI interface, and LiteLLM handles translation to Anthropic, Google, Cohere, Mistral, Bedrock, or any other provider's native format.
- Provider Coverage: 100+ providers including OpenAI, Anthropic, Google Gemini, Azure OpenAI, AWS Bedrock, Cohere, Mistral, Together AI, Groq, Ollama, HuggingFace, Replicate, and any OpenAI-compatible endpoint.
- Proxy Server Mode: LiteLLM can run as a standalone proxy (litellm --model gpt-4) exposing an OpenAI-compatible HTTP endpoint, so existing OpenAI SDK code can route through LiteLLM with only a base_url update.
- Cost Tracking: Real-time token cost calculation across providers; response._hidden_params["response_cost"] gives the per-call cost in USD.
- Load Balancing: Distribute requests across multiple API keys or providers with configurable routing strategies to reduce rate-limit exposure and improve throughput.
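To make the "translation" idea in the definition concrete, here is a minimal sketch of reshaping an OpenAI-style chat request into the shape an Anthropic-style Messages API expects (system prompt as a top-level field, an explicit max_tokens). The helper name to_anthropic_payload and the default of 1024 tokens are assumptions made for illustration; this is not LiteLLM's actual code, which handles many more fields.

```python
from typing import Any, Dict, List


def to_anthropic_payload(openai_request: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: reshape an OpenAI-style chat request for an Anthropic-style API."""
    system_parts: List[str] = []
    chat_messages: List[Dict[str, Any]] = []
    for message in openai_request["messages"]:
        if message["role"] == "system":
            # Anthropic-style APIs take the system prompt outside the message list
            system_parts.append(message["content"])
        else:
            chat_messages.append({"role": message["role"], "content": message["content"]})

    payload: Dict[str, Any] = {
        "model": openai_request["model"],
        "messages": chat_messages,
        # max_tokens is required there; 1024 is an arbitrary illustrative default
        "max_tokens": openai_request.get("max_tokens", 1024),
    }
    if system_parts:
        payload["system"] = "\n".join(system_parts)
    return payload


openai_style_request = {
    "model": "claude-3-5-sonnet-20241022",
    "messages": [
        {"role": "system", "content": "You are terse."},
        {"role": "user", "content": "Hello!"},
    ],
}
anthropic_request = to_anthropic_payload(openai_style_request)
```

The caller keeps writing OpenAI-format requests; only the adapter knows about the provider's shape.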
Why LiteLLM Matters
- Vendor Independence: Write provider-agnostic code that can switch from OpenAI to Claude with one word — prevents vendor lock-in and enables rapid model evaluation.
- Cost Optimization: Route complex requests to GPT-4o and simple classification to GPT-4o-mini (or Haiku) based on task complexity. Cost-aware routing of this kind can reduce LLM spend substantially in mixed-workload applications; the 40-60% range sometimes quoted depends heavily on the workload mix.
- Reliability via Fallbacks: Configure automatic fallbacks — if OpenAI returns a 429 or 500, retry on Anthropic or Azure automatically, with no application code changes.
- Budget Guardrails: Set per-user, per-team, or per-project spending limits — when a user hits their monthly budget, LiteLLM blocks further requests without application-level changes.
- Observability: Built-in logging to Langfuse, Helicone, Datadog, and 20+ other platforms — every request is traced regardless of provider.
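The budget guardrail idea above can be sketched in a few lines: track spend per user and refuse new requests once a limit is reached. The names BudgetGuard and BudgetExceeded are hypothetical, and the sketch keeps state in memory only; it illustrates the concept rather than how LiteLLM implements budgets.

```python
from typing import Dict


class BudgetExceeded(Exception):
    """Raised when a user has used up their budget (hypothetical name)."""


class BudgetGuard:
    """Hypothetical in-memory per-user spend tracker; not LiteLLM's implementation."""

    def __init__(self, monthly_limit_usd: float) -> None:
        self.monthly_limit_usd = monthly_limit_usd
        self.spent_usd: Dict[str, float] = {}

    def check(self, user_id: str) -> None:
        # Call before sending a request; raises once the limit is reached
        if self.spent_usd.get(user_id, 0.0) >= self.monthly_limit_usd:
            raise BudgetExceeded(f"{user_id} has reached the monthly budget")

    def record(self, user_id: str, cost_usd: float) -> None:
        # Call after a request completes, with that request's cost
        self.spent_usd[user_id] = self.spent_usd.get(user_id, 0.0) + cost_usd


guard = BudgetGuard(monthly_limit_usd=1.00)
guard.check("alice")         # under budget, no exception
guard.record("alice", 0.75)
guard.check("alice")         # still under budget
guard.record("alice", 0.50)  # total is now 1.25
try:
    guard.check("alice")
    blocked = False
except BudgetExceeded:
    blocked = True
```

Placing the check in a gateway rather than in each application is what lets limits change without touching application code.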
Core Python Usage
Basic Unified Call:
from litellm import completion
# Same interface, different models
response = completion(model="gpt-4o", messages=[{"role":"user","content":"Hello!"}])
response = completion(model="claude-3-5-sonnet-20241022", messages=[{"role":"user","content":"Hello!"}])
response = completion(model="gemini/gemini-1.5-pro", messages=[{"role":"user","content":"Hello!"}])
response = completion(model="ollama/llama3", messages=[{"role":"user","content":"Hello!"}])
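The model strings above follow a "provider/model" convention when a prefix is present. As a rough sketch of that convention, the hypothetical helper below splits on the first slash. How LiteLLM resolves unprefixed names such as gpt-4o is internal to the library and is not reproduced here; the sketch simply returns None for the provider in that case.

```python
from typing import Optional, Tuple


def split_model_string(model: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: split 'provider/model' on the first slash."""
    provider, separator, name = model.partition("/")
    if not separator:
        # No prefix: the provider would have to be inferred some other way
        return None, model
    return provider, name


parsed = [
    split_model_string(m)
    for m in ("gpt-4o", "gemini/gemini-1.5-pro", "ollama/llama3")
]
```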
Fallbacks:
from litellm import completion

# Try gpt-4o first; if it keeps failing, move on to the fallback models
response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this document."}],
    fallbacks=["claude-3-5-sonnet-20241022", "gemini/gemini-1.5-pro"],
    num_retries=2,
)
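Conceptually, retries plus fallbacks amount to a nested loop: retry the current model a few times, then move to the next one. The sketch below shows that control flow with a fake model call so it runs offline. The function complete_with_fallbacks and the fake_call stub are hypothetical, and LiteLLM's real ordering, backoff, and error classification may differ.

```python
from typing import Callable, List, Optional, Sequence


def complete_with_fallbacks(
    call_model: Callable[[str], str],
    models: Sequence[str],
    num_retries: int = 0,
) -> str:
    """Hypothetical sketch: try each model in order, retrying each one num_retries times."""
    last_error: Optional[Exception] = None
    for model in models:
        for _attempt in range(num_retries + 1):
            try:
                return call_model(model)
            except Exception as error:  # a real client would catch narrower error types
                last_error = error
    raise RuntimeError("all models failed") from last_error


calls: List[str] = []


def fake_call(model: str) -> str:
    # Stand-in for a provider call: the primary model always fails
    calls.append(model)
    if model == "gpt-4o":
        raise ConnectionError("simulated rate limit")
    return f"answer from {model}"


result = complete_with_fallbacks(
    fake_call, ["gpt-4o", "claude-3-5-sonnet-20241022"], num_retries=2
)
```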
Async + Load Balancing:
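import asyncio
from litellm import Router

# Both deployments share the "gpt-4" alias, so the router spreads requests across the two keys
router = Router(model_list=[
    {"model_name": "gpt-4", "litellm_params": {"model": "gpt-4o", "api_key": "key1"}},
    {"model_name": "gpt-4", "litellm_params": {"model": "gpt-4o", "api_key": "key2"}},
])

async def main():
    response = await router.acompletion(model="gpt-4", messages=[{"role": "user", "content": "Hello!"}])
    print(response.choices[0].message.content)

asyncio.run(main())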
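One simple way to spread requests over several deployments of the same alias is round-robin selection, sketched below with the standard library. RoundRobinPool is a hypothetical name, and this is only one possible strategy; the Router's actual strategy is configurable and is not necessarily round-robin.

```python
from itertools import cycle
from typing import Any, Dict, Sequence


class RoundRobinPool:
    """Hypothetical sketch: hand out deployments in a fixed rotating order."""

    def __init__(self, deployments: Sequence[Dict[str, Any]]) -> None:
        if not deployments:
            raise ValueError("at least one deployment is required")
        self._rotation = cycle(deployments)

    def next_deployment(self) -> Dict[str, Any]:
        return next(self._rotation)


pool = RoundRobinPool([
    {"model": "gpt-4o", "api_key": "key1"},
    {"model": "gpt-4o", "api_key": "key2"},
])
picked_keys = [pool.next_deployment()["api_key"] for _ in range(4)]
```

Rotating keys this way halves the request rate seen by each key, which is the rate-limit benefit described earlier.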
Proxy Server Setup
# config.yaml for LiteLLM proxy
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4o
      api_key: sk-...
  - model_name: claude
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: sk-ant-...
router_settings:
  routing_strategy: least-busy
  fallbacks: [{"gpt-4": ["claude"]}]
Run with: litellm --config config.yaml --port 8000
Then existing OpenAI SDK code connects with just base_url="http://localhost:8000".
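Because the proxy speaks the OpenAI HTTP format, any HTTP client can talk to it. The sketch below builds (but does not send) a chat completion request using only the standard library, assuming the proxy from the config above is listening on localhost:8000. The helper build_chat_request and the key "sk-placeholder" are hypothetical; the model name "gpt-4" refers to the alias defined in config.yaml.

```python
import json
from typing import Any, Dict, List
from urllib.request import Request


def build_chat_request(
    base_url: str, api_key: str, model: str, messages: List[Dict[str, Any]]
) -> Request:
    """Hypothetical helper: build an OpenAI-format chat request aimed at the proxy."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # placeholder key, not a real credential
        },
        method="POST",
    )


proxy_request = build_chat_request(
    "http://localhost:8000",
    "sk-placeholder",
    "gpt-4",  # the alias from config.yaml, not a provider model name
    [{"role": "user", "content": "Hello!"}],
)
```

With the proxy running, passing proxy_request to urllib.request.urlopen would send it; the OpenAI SDK does the equivalent once its base_url points at the proxy.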
Key LiteLLM Features
- Token Counter: litellm.token_counter(model="gpt-4", messages=[...]) returns token counts before a request is sent, which helps with budget planning.
- Cost Calculator: litellm.completion_cost(completion_response=response) returns the USD cost of a completed request, for any supported provider.
- Streaming: Unified streaming interface; the same stream=True parameter works for all providers, and LiteLLM normalizes the SSE format.
- Vision: Pass image messages in OpenAI format; LiteLLM translates to the provider-specific format (Anthropic base64, Gemini inlineData, etc.).
- Function Calling: Unified tool/function calling interface — define once in OpenAI format, LiteLLM handles provider-specific translation.
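The arithmetic behind a per-request cost figure is straightforward: token counts multiplied by per-token prices for input and output. The sketch below shows that calculation with made-up prices; Pricing, estimate_cost, and the price values are assumptions for illustration and do not reflect any provider's real rates or LiteLLM's internal pricing data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-token prices in USD."""
    input_usd_per_token: float
    output_usd_per_token: float


def estimate_cost(pricing: Pricing, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost = input tokens * input price + output tokens * output price."""
    return (
        prompt_tokens * pricing.input_usd_per_token
        + completion_tokens * pricing.output_usd_per_token
    )


# Made-up prices: $2 per million input tokens, $8 per million output tokens
example_pricing = Pricing(input_usd_per_token=2e-6, output_usd_per_token=8e-6)
cost_usd = estimate_cost(example_pricing, prompt_tokens=1000, completion_tokens=500)
```

Counting tokens before sending gives the input half of this formula up front, which is why a token counter is useful for budget planning.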
LiteLLM vs Alternatives
| Feature | LiteLLM | Portkey | Direct SDK |
|---|---|---|---|
| Provider coverage | 100+ | 20+ | 1 per SDK |
| Proxy mode | Yes | Yes | No |
| Cost tracking | Built-in | Built-in | Manual |
| Open source | Yes (MIT) | Partially | Varies |
| Self-hostable | Yes | Yes | N/A |
LiteLLM is the essential abstraction layer for any LLM application that needs to work across multiple providers — by normalizing 100+ provider APIs into the single most-familiar interface in AI development, LiteLLM enables teams to evaluate models, optimize costs, and ensure reliability without writing provider-specific integration code.