Python is the dominant language for LLM development: its ecosystem covers API access, model serving, vector databases, and application frameworks, and its readability and extensive ML tooling make it the natural foundation for building AI applications.
Why Python for LLMs?
- Ecosystem: Most LLM tools and libraries are Python-first.
- ML Heritage: Built on PyTorch, TensorFlow, scikit-learn.
- API Clients: Official SDKs from OpenAI, Anthropic, etc.
- Rapid Prototyping: Quick iteration from idea to working code.
- Community: Largest AI/ML developer community.
Essential Libraries
API Clients:
```
Library             | Purpose             | Install
--------------------|---------------------|---------------------------------
openai              | OpenAI API          | pip install openai
anthropic           | Claude API          | pip install anthropic
google-generativeai | Gemini API          | pip install google-generativeai
together            | Together.ai API     | pip install together
```
Model & Inference:
```
Library      | Purpose             | Install
-------------|---------------------|------------------------------
transformers | Hugging Face models | pip install transformers
vllm         | Fast LLM serving    | pip install vllm
llama-cpp    | Local inference     | pip install llama-cpp-python
optimum      | Optimized inference | pip install optimum
```
Frameworks & Tools:
```
Library     | Purpose           | Install
------------|-------------------|------------------------
langchain   | LLM orchestration | pip install langchain
llama-index | RAG framework     | pip install llama-index
chromadb    | Vector database   | pip install chromadb
pydantic    | Data validation   | pip install pydantic
```
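A common use of pydantic in LLM apps is validating model output that is supposed to be JSON. The idea can be sketched dependency-free with the standard library; the `title`/`steps` schema here is hypothetical, and pydantic automates exactly this kind of check:

```python
import json

def parse_recipe(raw: str) -> dict:
    """Parse and validate an LLM's JSON reply (hypothetical schema)."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(data.get("title"), str):
        raise ValueError("'title' must be a string")
    if not isinstance(data.get("steps"), list):
        raise ValueError("'steps' must be a list")
    return data

recipe = parse_recipe('{"title": "Toast", "steps": ["Slice", "Toast"]}')
```

Validating at the boundary like this turns a model's malformed reply into a clear, catchable error instead of a crash deep in your application.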
Quick Start Examples
OpenAI API:
```python
from openai import OpenAI

client = OpenAI()  # Uses OPENAI_API_KEY env var

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)
```
Claude API:
```python
from anthropic import Anthropic

client = Anthropic()  # Uses ANTHROPIC_API_KEY env var

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)
print(message.content[0].text)
```
Streaming Responses:
```python
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell a story"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
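When you also need the complete text after streaming finishes, the usual pattern is to accumulate deltas while printing. A minimal sketch of that consume-and-accumulate step, using a plain list of deltas as a stand-in for a live stream:

```python
def consume_stream(deltas):
    """Collect streamed text deltas into the full response string."""
    parts = []
    for delta in deltas:
        if delta:  # skip None/empty deltas, as in the loop above
            parts.append(delta)
    return "".join(parts)

full_text = consume_stream(["Once", " upon", " a", " time"])
```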
Async for High Throughput:
```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def process_batch(prompts):
    tasks = [
        client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": p}]
        )
        for p in prompts
    ]
    return await asyncio.gather(*tasks)

# Run the whole batch concurrently
responses = asyncio.run(process_batch(prompts))
```
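Unbounded `asyncio.gather` over a large batch can trip provider rate limits; a common refinement is capping in-flight requests with a semaphore. A sketch with a stub coroutine standing in for the real API call:

```python
import asyncio

async def fake_llm_call(prompt):
    # Stand-in for client.chat.completions.create(...)
    await asyncio.sleep(0.01)
    return f"echo: {prompt}"

async def process_batch(prompts, limit=5):
    sem = asyncio.Semaphore(limit)

    async def bounded(p):
        async with sem:  # at most `limit` calls in flight at once
            return await fake_llm_call(p)

    # gather preserves the order of the input prompts
    return await asyncio.gather(*(bounded(p) for p in prompts))

responses = asyncio.run(process_batch([f"q{i}" for i in range(10)]))
```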
Best Practices
Environment Variables:
```python
import os
from dotenv import load_dotenv

load_dotenv()  # Load variables from a .env file
api_key = os.environ["OPENAI_API_KEY"]
# Never hardcode keys!
```
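A bare `KeyError` on a missing variable can be cryptic; a small helper (the name `require_env` is my own) fails fast with an actionable message instead:

```python
import os

def require_env(name: str) -> str:
    """Return an environment variable's value or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Missing required environment variable: {name}. "
            "Set it in your shell or .env file."
        )
    return value

# Usage: api_key = require_env("OPENAI_API_KEY")
```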
Retry Logic:
```python
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=60)
)
def call_llm_with_retry(prompt):
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
```
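If you prefer to avoid a dependency, the same exponential-backoff pattern can be sketched with the standard library; this is a simplified version of what tenacity automates, demonstrated with a stub function that fails twice before succeeding:

```python
import time

def retry_call(fn, attempts=3, base_delay=1.0, max_delay=60.0):
    """Call fn(), retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries; surface the error
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay)

# Demo: a flaky call that fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = retry_call(flaky, base_delay=0.01)
```

In production you would usually retry only on transient errors (rate limits, timeouts), not on every exception.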
Response Caching:
```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_llm_call(prompt):
    # lru_cache keys on the prompt string itself, so repeated
    # prompts are answered from memory instead of a new API call
    return call_llm(prompt)
```
Simple RAG Implementation:
```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter

# 1. Load and split documents
texts = CharacterTextSplitter().split_text(document)

# 2. Create vector store
vectorstore = Chroma.from_texts(texts, OpenAIEmbeddings())

# 3. Query
results = vectorstore.similarity_search("my question", k=3)

# 4. Generate an answer with the retrieved context
context = "\n".join([r.page_content for r in results])
answer = call_llm(f"Context: {context}\nQuestion: my question")
```
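The retrieval step at the heart of RAG can be illustrated without any dependencies by scoring chunks on word overlap with the query. Real systems use embedding similarity instead, but the shape (score every chunk, take the top k, stuff them into the prompt) is the same. A toy sketch:

```python
import re

def tokenize(text):
    """Lowercase and split into word tokens, dropping punctuation."""
    return set(re.findall(r"[a-z']+", text.lower()))

def top_k_chunks(chunks, query, k=3):
    """Rank chunks by word overlap with the query (toy retriever)."""
    q = tokenize(query)
    return sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)[:k]

chunks = [
    "Python is popular for machine learning.",
    "The Eiffel Tower is in Paris.",
    "LLM apps are often written in Python.",
]
hits = top_k_chunks(chunks, "Why is Python used for LLM apps?", k=2)
```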
Project Structure:
```
my_llm_app/
├── .env               # API keys (gitignored)
├── requirements.txt   # Dependencies
├── src/
│   ├── __init__.py
│   ├── llm.py         # LLM client wrapper
│   ├── embeddings.py  # Embedding functions
│   └── prompts.py     # Prompt templates
├── tests/
│   └── test_llm.py
└── main.py
```
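The `prompts.py` module in this layout typically holds named templates rather than strings scattered through the code. A minimal sketch (the template and `render` helper are illustrative, not a fixed convention):

```python
# prompts.py -- keep prompt templates in one place
SUMMARIZE = (
    "Summarize the following text in {max_words} words or fewer:\n\n"
    "{text}"
)

def render(template: str, **kwargs) -> str:
    """Fill a prompt template with keyword arguments."""
    return template.format(**kwargs)

prompt = render(SUMMARIZE, max_words=50, text="Python dominates LLM tooling.")
```

Centralizing templates makes prompts easy to review, version, and test independently of the code that sends them.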
Python is the gateway to building AI applications: its rich ecosystem of libraries, straightforward syntax, and extensive community resources make it the natural choice for developers entering the AI space.