Python for LLM development

Python is the dominant language for LLM development, thanks to its ecosystem, readability, and extensive ML tooling. Its libraries cover API access, model serving, vector databases, and application frameworks, which together provide the programming foundation for building AI applications.

Why Python for LLMs?

Essential Libraries

API Clients:

Library             | Purpose         | Install
--------------------|-----------------|--------------------------------
openai              | OpenAI API      | pip install openai
anthropic           | Claude API      | pip install anthropic
google-generativeai | Gemini API      | pip install google-generativeai
together            | Together.ai API | pip install together

Model & Inference:

Library      | Purpose              | Install
-------------|---------------------|------------------
transformers | Hugging Face models | pip install transformers
vllm         | Fast LLM serving    | pip install vllm
llama-cpp    | Local inference     | pip install llama-cpp-python
optimum      | Optimized inference | pip install optimum

Frameworks & Tools:

Library     | Purpose              | Install
------------|---------------------|------------------
langchain   | LLM orchestration   | pip install langchain
llamaindex  | RAG framework       | pip install llama-index
chromadb    | Vector database     | pip install chromadb
pydantic    | Data validation     | pip install pydantic

Quick Start Examples

OpenAI API:

from openai import OpenAI

client = OpenAI()  # Uses OPENAI_API_KEY env var

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

Claude API:

from anthropic import Anthropic

client = Anthropic()  # Uses ANTHROPIC_API_KEY env var

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(message.content[0].text)

Streaming Responses:

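from openai import OpenAI

client = OpenAI()  # The streaming call below uses the OpenAI client, not the Anthropic one

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell a story"}],
    stream=True
)

for chunk in stream:
    # Some chunks carry no choices or no text delta, so check before printing
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)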

Async for High Throughput:

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def process_batch(prompts):
    tasks = [
        client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": p}]
        )
        for p in prompts
    ]
    return await asyncio.gather(*tasks)

# Run batch (example prompts)
prompts = ["Summarize RAG in one sentence.", "Define a vector database."]
responses = asyncio.run(process_batch(prompts))
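
Launching every request at once can run into provider rate limits. A minimal sketch of capping the number of in-flight calls with asyncio.Semaphore is shown here. `fake_llm_call` is a hypothetical stand-in for the real `client.chat.completions.create(...)` call, and the limit of 2 is an arbitrary assumption chosen for illustration.

```python
import asyncio

async def fake_llm_call(prompt):
    # Hypothetical stand-in for an awaitable API call
    await asyncio.sleep(0.01)
    return f"echo: {prompt}"

async def process_batch_limited(prompts, max_concurrency=2):
    # The semaphore allows at most max_concurrency calls in flight at once
    semaphore = asyncio.Semaphore(max_concurrency)

    async def limited_call(prompt):
        async with semaphore:
            return await fake_llm_call(prompt)

    # gather returns results in the same order as the input prompts
    return await asyncio.gather(*(limited_call(p) for p in prompts))

results = asyncio.run(process_batch_limited(["a", "b", "c"]))
```

In a real application, the body of `limited_call` would await the API client instead of the stand-in, and the limit would be tuned to the account's rate limits.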

Best Practices

Environment Variables:

import os
from dotenv import load_dotenv

load_dotenv()  # Load from .env file

api_key = os.environ["OPENAI_API_KEY"]
# Never hardcode keys!

Retry Logic:

from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=60)
)
def call_llm_with_retry(prompt):
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
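
To make the idea behind the decorator concrete, here is a standard-library sketch of retrying with exponential backoff. It illustrates the general technique and is not tenacity's internal implementation. The names `retry_with_backoff` and `flaky`, and the delay values, are assumptions made for the example.

```python
import time

def retry_with_backoff(func, attempts=3, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    # Call func(); on failure wait base_delay * 2**attempt (capped), then try again
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(min(base_delay * 2 ** attempt, max_delay))

# Usage with a hypothetical flaky function that fails twice, then succeeds
calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

delays = []  # record the waits instead of actually sleeping
result = retry_with_backoff(flaky, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. In production code it is usually better to retry only on transient errors such as timeouts or rate limits rather than on every exception.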

Response Caching:

from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_llm_call(prompt):
    # lru_cache keys on the prompt string itself, so no manual hashing is needed
    return call_llm(prompt)  # call_llm: your own wrapper around the API call

def call_with_cache(prompt):
    # Identical prompts return the cached response instead of a new API call
    return cached_llm_call(prompt)

Simple RAG Implementation:

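# Recent LangChain releases moved these classes into separate packages
# (pip install langchain-openai langchain-community langchain-text-splitters)
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import CharacterTextSplitter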

# 1. Load and split documents
texts = CharacterTextSplitter().split_text(document)

# 2. Create vector store
vectorstore = Chroma.from_texts(texts, OpenAIEmbeddings())

# 3. Query
results = vectorstore.similarity_search("my question", k=3)

# 4. Generate answer with context
context = "\n".join([r.page_content for r in results])
answer = call_llm(f"Context: {context}\n\nQuestion: my question")
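
The retrieval in step 3 rests on comparing embedding vectors. Below is a small standard-library sketch of that idea using cosine similarity over toy 2-d vectors. Real vector stores such as Chroma use their own indexes and distance metrics, so this only illustrates the concept. The helper names `cosine_similarity` and `top_k` are made up for the example.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=3):
    # Rank document indices by similarity to the query, highest first
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy 2-d vectors standing in for real embeddings
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
best = top_k([1.0, 0.1], doc_vecs, k=2)
```

Here `best` holds the indices of the two documents whose vectors point most nearly in the same direction as the query.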

Project Structure:

my_llm_app/
├── .env                 # API keys (gitignored)
├── requirements.txt     # Dependencies
├── src/
│   ├── __init__.py
│   ├── llm.py          # LLM client wrapper
│   ├── embeddings.py   # Embedding functions
│   └── prompts.py      # Prompt templates
├── tests/
│   └── test_llm.py
└── main.py
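
As one possible shape for `src/prompts.py`, the sketch below keeps prompt text in a single place using the standard library's string.Template. The template wording and the names `RAG_PROMPT` and `build_rag_prompt` are hypothetical, chosen only to illustrate the layout above.

```python
from string import Template

# Hypothetical template; the wording is an assumption for illustration
RAG_PROMPT = Template("Context: $context\n\nQuestion: $question")

def build_rag_prompt(context, question):
    # substitute() raises KeyError if a placeholder is left unfilled
    return RAG_PROMPT.substitute(context=context, question=question)

prompt = build_rag_prompt(
    "Paris is the capital of France.",
    "What is the capital of France?",
)
```

Keeping templates in their own module lets tests check prompt construction without making any API calls.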

Python is the gateway to building AI applications: its rich ecosystem of libraries, straightforward syntax, and extensive community resources make it the natural choice for developers entering the AI space.
