LiteLLM

Keywords: litellm,proxy,unified

LiteLLM is a Python library and proxy server that provides a unified OpenAI-compatible interface to 100+ LLM providers — enabling developers to switch between GPT-4, Claude, Gemini, Llama, Mistral, and any other model by changing a single string, with built-in cost tracking, rate limiting, fallbacks, and load balancing across providers.

What Is LiteLLM?

- Definition: An open-source Python package (and optional proxy server) that maps every major LLM provider's API to the OpenAI chat.completions format — developers write code once using the OpenAI interface, LiteLLM handles translation to Anthropic, Google, Cohere, Mistral, Bedrock, or any other provider's native format.
- Provider Coverage: 100+ providers including OpenAI, Anthropic, Google Gemini, Azure OpenAI, AWS Bedrock, Cohere, Mistral, Together AI, Groq, Ollama, HuggingFace, Replicate, and any OpenAI-compatible endpoint.
- Proxy Server Mode: LiteLLM can run as a standalone proxy (litellm --model gpt-4) exposing an OpenAI-compatible HTTP endpoint — enabling existing OpenAI SDK code to route through LiteLLM without code changes, just a base_url update.
- Cost Tracking: Real-time token cost calculation across providers — response._hidden_params["response_cost"] gives per-call cost in USD (see the sketch after this list).
- Load Balancing: Distribute requests across multiple API keys or providers with configurable routing strategies — reduce rate limit exposure and improve throughput.
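
For instance, the per-call cost mentioned above can be read straight off the response. A minimal sketch, assuming an OPENAI_API_KEY is set in the environment:

```python
from litellm import completion

# Assumes OPENAI_API_KEY is set in the environment
response = completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Per-call cost in USD, exposed via LiteLLM's hidden params
print(response._hidden_params["response_cost"])
```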

Why LiteLLM Matters

- Vendor Independence: Write provider-agnostic code that can switch from OpenAI to Claude by changing a single model string — prevents vendor lock-in and enables rapid model evaluation.
- Cost Optimization: Route expensive requests to GPT-4o and simple classification to GPT-4o-mini (or Haiku) based on task complexity — cost-aware routing can reduce LLM spend by 40-60% in mixed-workload applications.
- Reliability via Fallbacks: Configure automatic fallbacks — if OpenAI returns a 429 or 500, retry on Anthropic or Azure automatically, with no application code changes.
- Budget Guardrails: Set per-user, per-team, or per-project spending limits — when a user hits their monthly budget, LiteLLM blocks further requests without application-level changes.
- Observability: Built-in logging to Langfuse, Helicone, Datadog, and 20+ other platforms — every request is traced regardless of provider.

Core Python Usage

Basic Unified Call:
```python
from litellm import completion

# Same interface, different models
response = completion(model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}])
response = completion(model="claude-3-5-sonnet-20241022", messages=[{"role": "user", "content": "Hello!"}])
response = completion(model="gemini/gemini-1.5-pro", messages=[{"role": "user", "content": "Hello!"}])
response = completion(model="ollama/llama3", messages=[{"role": "user", "content": "Hello!"}])
```
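
These calls assume provider credentials are already available; LiteLLM picks them up from each provider's standard environment variable, for example:

```python
import os

# Provider API keys are read from standard environment variables
# (values shown here are placeholders)
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."
os.environ["GEMINI_API_KEY"] = "..."
# ollama/llama3 only needs a local Ollama server, no API key
```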

Fallbacks:
```python
from litellm import completion

response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this document."}],
    fallbacks=["claude-3-5-sonnet-20241022", "gemini/gemini-1.5-pro"],
    num_retries=2,
)
```

Async + Load Balancing:
```python
import asyncio
from litellm import Router

# Both deployments share the name "gpt-4"; the Router round-robins across keys
router = Router(model_list=[
    {"model_name": "gpt-4", "litellm_params": {"model": "gpt-4o", "api_key": "key1"}},
    {"model_name": "gpt-4", "litellm_params": {"model": "gpt-4o", "api_key": "key2"}},
])

async def main():
    response = await router.acompletion(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello!"}],
    )

asyncio.run(main())
```

Proxy Server Setup

```yaml
# config.yaml for LiteLLM proxy
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4o
      api_key: sk-...
  - model_name: claude
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: sk-ant-...

router_settings:
  routing_strategy: least-busy
  fallbacks: [{"gpt-4": ["claude"]}]
```

Run with: litellm --config config.yaml --port 8000

Then existing OpenAI SDK code connects with just base_url="http://localhost:8000".
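
As a concrete sketch, assuming the proxy above is running locally and no master key is configured, the stock OpenAI SDK only needs the base_url change:

```python
from openai import OpenAI

# Point the standard OpenAI SDK at the LiteLLM proxy
client = OpenAI(
    base_url="http://localhost:8000",
    api_key="sk-anything",  # any value unless the proxy enforces virtual keys
)

# "gpt-4" resolves to the model_name entry in config.yaml
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```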

Key LiteLLM Features

- Token Counter: litellm.token_counter(model="gpt-4", messages=[...]) — accurate token counts before sending requests for budget planning.
- Cost Calculator: litellm.completion_cost(completion_response=response) — exact USD cost for any completed request across all providers (see the sketch after this list).
- Streaming: Unified streaming interface — the same stream=True parameter works for all providers; LiteLLM normalizes the SSE format.
- Vision: Pass image messages in OpenAI format — LiteLLM translates to provider-specific format (Anthropic base64, Gemini inlineData, etc.).
- Function Calling: Unified tool/function calling interface — define once in OpenAI format, LiteLLM handles provider-specific translation.
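
A minimal sketch of the token counter, cost calculator, and streaming interface, assuming an OPENAI_API_KEY is set; the model choice is arbitrary:

```python
import litellm
from litellm import completion

messages = [{"role": "user", "content": "Summarize LiteLLM in one sentence."}]

# Count prompt tokens before sending, for budget planning
print(litellm.token_counter(model="gpt-4o", messages=messages))

# Non-streaming call, then the exact USD cost of the completed request
response = completion(model="gpt-4o", messages=messages)
print(litellm.completion_cost(completion_response=response))

# The same stream=True flag works across providers
for chunk in completion(model="gpt-4o", messages=messages, stream=True):
    print(chunk.choices[0].delta.content or "", end="")
```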

LiteLLM vs Alternatives

| Feature | LiteLLM | Portkey | Direct SDK |
|---------|---------|---------|-----------|
| Provider coverage | 100+ | 20+ | 1 per SDK |
| Proxy mode | Yes | Yes | No |
| Cost tracking | Built-in | Built-in | Manual |
| Open source | Yes (MIT) | Partially | Varies |
| Self-hostable | Yes | Yes | N/A |

LiteLLM is the essential abstraction layer for any LLM application that needs to work across multiple providers — by normalizing 100+ provider APIs into the single most-familiar interface in AI development, LiteLLM enables teams to evaluate models, optimize costs, and ensure reliability without writing provider-specific integration code.
