Observability for LLM Applications
The Three Pillars of Observability
1. Logs: discrete events recorded over time.
- Request/response logs (with prompt/completion)
- Error logs and stack traces
- System events (model loads, scaling)
2. Metrics: aggregated numerical measurements.
- Latency percentiles (P50, P95, P99)
- Throughput (requests/sec, tokens/sec)
- Error rates
- Cost metrics (tokens consumed, $ spent)
3. Traces: request flow through distributed systems.
- End-to-end request tracing
- Time spent in each component
- Parent-child relationships between spans
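To make the span model concrete, here is a minimal sketch of a tracer that records each span's duration and its parent. The `Tracer` class and span names are hypothetical and single-threaded; production systems typically use an SDK such as OpenTelemetry (which OpenLLMetry builds on) to propagate context across threads and services.

```python
import time
import uuid
from contextlib import contextmanager


class Tracer:
    """Hypothetical minimal tracer: each span stores its id, its parent's id, and its duration."""

    def __init__(self):
        self.spans = []   # finished spans, in completion order
        self._stack = []  # ids of spans that are currently open

    @contextmanager
    def span(self, name):
        span_id = uuid.uuid4().hex[:8]
        # The innermost open span (if any) becomes this span's parent.
        parent_id = self._stack[-1] if self._stack else None
        self._stack.append(span_id)
        start = time.perf_counter()
        try:
            yield span_id
        finally:
            duration_ms = (time.perf_counter() - start) * 1000
            self._stack.pop()
            self.spans.append({
                "name": name,
                "span_id": span_id,
                "parent_id": parent_id,
                "duration_ms": duration_ms,
            })


tracer = Tracer()
with tracer.span("handle_request"):
    with tracer.span("retrieve_context"):
        time.sleep(0.01)  # stand-in for a vector-store lookup
    with tracer.span("llm_call"):
        time.sleep(0.02)  # stand-in for the model call

for s in tracer.spans:
    print(s["name"], s["parent_id"], round(s["duration_ms"], 1))
```

Because child spans close before their parent, the root span appears last, and its duration covers the time spent in every component beneath it.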
LLM-Specific Observability
Key Metrics to Track
| Metric | Description | Target |
|---|---|---|
| TTFT | Time to first token | <500 ms |
| TPOT | Time per output token (after the first) | <50 ms |
| E2E Latency | Full request time, from send to last token | <3 s for chat |
| Throughput | Tokens generated per second | Maximize |
| Error Rate | Fraction of requests that fail | <0.1% |
| Cost/Request | Dollars spent per inference | Minimize |
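The per-request metrics above can be derived from client-side timestamps of a streamed response. The sketch below assumes you record when the request was sent and when each output token arrived (in seconds). It computes TPOT as the mean gap between tokens after the first one. `StreamTiming`, `latency_metrics`, and `request_cost` are hypothetical names, and the per-1K-token prices passed to `request_cost` are placeholders, not any provider's actual pricing.

```python
from dataclasses import dataclass


@dataclass
class StreamTiming:
    request_start: float      # seconds: when the request was sent
    token_times: list[float]  # seconds: arrival time of each output token


def latency_metrics(t: StreamTiming) -> dict:
    if not t.token_times:
        raise ValueError("no tokens received")
    first, last = t.token_times[0], t.token_times[-1]
    n = len(t.token_times)
    ttft = first - t.request_start
    e2e = last - t.request_start
    # TPOT averages the inter-token gaps after the first token, so it excludes
    # the queueing/prefill time already counted in TTFT.
    tpot = (last - first) / (n - 1) if n > 1 else 0.0
    return {"ttft_ms": ttft * 1000, "tpot_ms": tpot * 1000, "e2e_ms": e2e * 1000}


def request_cost(prompt_tokens, completion_tokens, price_in_per_1k, price_out_per_1k):
    # Input and output tokens are usually billed at different rates.
    return (prompt_tokens / 1000) * price_in_per_1k + (completion_tokens / 1000) * price_out_per_1k


print(latency_metrics(StreamTiming(request_start=0.0, token_times=[0.4, 0.44, 0.48, 0.52])))
print(request_cost(150, 200, price_in_per_1k=0.01, price_out_per_1k=0.03))
```

Aggregate these per-request values into percentiles (P50/P95/P99) rather than averages, since tail latency is what users notice.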
LLM Observability Tools
| Tool | Type | Highlights |
|---|---|---|
| LangSmith | Commercial | Native LangChain integration, detailed tracing |
| Langfuse | Open Source | Self-hostable, generous free tier |
| Phoenix (Arize) | Open Source | Strong eval integration |
| Helicone | Commercial | Proxy-based, easy setup |
| Weights & Biases | Commercial | Experiment tracking |
| OpenLLMetry (Traceloop) | Open Source | OpenTelemetry for LLMs |
Logging Best Practices
What to Log
```python
log_entry = {
    "request_id": "uuid-123",
    "timestamp": "2024-01-15T10:30:00Z",
    "model": "gpt-4",
    "prompt_tokens": 150,
    "completion_tokens": 200,
    "latency_ms": 1200,
    "user_id": "user-456",  # Can be anonymized
    "prompt_hash": "abc123",  # For PII protection
    "status": "success",
}
```
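One way to produce entries like the one above is sketched below, using only the standard library. It hashes the prompt with SHA-256 instead of storing the text and writes one JSON object per line through Python's `logging` module. `build_log_entry`, `log_request`, the logger name, and the 16-character hash truncation are illustrative assumptions. Note that an unsalted hash of a short or guessable prompt can be recovered by brute force, so it hides text from casual readers but is not strong protection on its own.

```python
import hashlib
import json
import logging
import uuid
from datetime import datetime, timezone

logger = logging.getLogger("llm.requests")  # hypothetical logger name


def build_log_entry(model, prompt, prompt_tokens, completion_tokens,
                    latency_ms, user_id, status="success"):
    return {
        "request_id": str(uuid.uuid4()),
        # UTC timestamp in the same "...Z" form as the example entry.
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds").replace("+00:00", "Z"),
        "model": model,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "latency_ms": latency_ms,
        "user_id": user_id,  # pass an already-pseudonymized id
        # A digest lets you group identical prompts without retaining the text.
        "prompt_hash": hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:16],
        "status": status,
    }


def log_request(entry):
    # One JSON object per line is straightforward for log pipelines to parse.
    logger.info(json.dumps(entry, sort_keys=True))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    log_request(build_log_entry("gpt-4", "Summarize this ticket", 150, 200, 1200, "u_3f2a9c"))
```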
PII Considerations
- Hash or redact sensitive data
- Anonymize user identifiers
- Implement data retention policies
- Comply with GDPR/CCPA if applicable
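The first two practices can be sketched with the standard library: regex-based redaction of obvious identifiers, and keyed pseudonymization of user IDs with HMAC-SHA256. A secret key (unlike a plain hash) stops anyone without it from re-identifying users by hashing candidate IDs. The patterns and the `redact`/`pseudonymize` helpers are hypothetical and deliberately crude; the phone pattern, for instance, also matches dates such as `2024-01-15`, so production systems usually rely on a dedicated PII detector. Pseudonymization is also not the same as anonymization, since whoever holds the key can link records back to users.

```python
import hashlib
import hmac
import re

# Crude illustrative patterns; expect both misses and false positives.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def pseudonymize(user_id: str, key: bytes) -> str:
    """Stable, keyed pseudonym: same id and key always give the same output."""
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "u_" + digest[:16]


print(redact("Contact jane.doe@example.com or +1 (555) 123-4567."))
print(pseudonymize("user-456", key=b"load-me-from-a-secret-store"))
```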
Alerting Strategy
| Condition | Severity | Action |
|---|---|---|
| Error rate > 1% | High | Page on-call |
| P99 latency > 5s | Medium | Alert Slack |
| Cost > 2x baseline | Medium | Alert team |
| Model drift detected | Low | Create ticket |
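The first three rules can be evaluated over a window of recent requests roughly as follows. This sketch assumes each request is summarized as a hypothetical `RequestRecord`, uses the nearest-rank definition of a percentile, and interprets "cost spike > 2x" as the window's total spend exceeding twice a baseline that you supply. Drift detection is left out because it needs a separate statistical test on inputs or outputs.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    latency_s: float
    ok: bool
    cost_usd: float


def percentile(values, p):
    """Nearest-rank percentile (p in 0-100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate_alerts(window, baseline_cost_usd):
    """Return (severity, action, message) tuples for every rule that fires."""
    alerts = []
    error_rate = sum(not r.ok for r in window) / len(window)
    if error_rate > 0.01:
        alerts.append(("High", "Page on-call", f"error rate {error_rate:.2%}"))
    p99 = percentile([r.latency_s for r in window], 99)
    if p99 > 5.0:
        alerts.append(("Medium", "Alert Slack", f"P99 latency {p99:.2f}s"))
    cost = sum(r.cost_usd for r in window)
    if cost > 2 * baseline_cost_usd:
        alerts.append(("Medium", "Alert team",
                       f"cost ${cost:.2f} vs baseline ${baseline_cost_usd:.2f}"))
    return alerts


window = [RequestRecord(8.0 if i < 5 else 1.0, i >= 2, 0.01) for i in range(100)]
for severity, action, message in evaluate_alerts(window, baseline_cost_usd=0.4):
    print(severity, action, message)
```

In practice these checks usually live in the metrics backend's alerting rules rather than in application code; the sketch only shows the logic each rule encodes.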