Observability for LLM Applications

The Three Pillars of Observability

1. Logs: Discrete events recorded over time.

2. Metrics: Aggregated numerical measurements.

3. Traces: Request flow through distributed systems.

LLM-Specific Observability

Key Metrics to Track

| Metric | Description | Target |
|---|---|---|
| TTFT | Time to First Token | < 500 ms |
| TPOT | Time Per Output Token | < 50 ms |
| E2E Latency | Full request time | < 3 s for chat |
| Throughput | Tokens/second | Maximize |
| Error Rate | Failed requests | < 0.1% |
| Cost/Request | $ per inference | Minimize |
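
The latency metrics above can be derived from per-token arrival times on a streamed response. The following is a minimal sketch, not a reference implementation: `StreamTiming` and `latency_metrics` are hypothetical names, and it assumes the client records one timestamp (in seconds) for the request and one for each output token.

```python
from dataclasses import dataclass


@dataclass
class StreamTiming:
    """Timestamps in seconds for one streamed request (hypothetical structure)."""
    request_sent: float
    token_times: list[float]  # arrival time of each output token, in order


def latency_metrics(t: StreamTiming) -> dict[str, float]:
    """Compute TTFT, TPOT, end-to-end latency, and throughput for one request."""
    if not t.token_times:
        raise ValueError("no tokens received")
    n = len(t.token_times)
    first, last = t.token_times[0], t.token_times[-1]
    total_s = last - t.request_sent
    return {
        # Time from sending the request to receiving the first token.
        "ttft_ms": (first - t.request_sent) * 1000,
        # Average gap between consecutive tokens after the first one.
        "tpot_ms": (last - first) * 1000 / (n - 1) if n > 1 else 0.0,
        # Time from sending the request to receiving the last token.
        "e2e_ms": total_s * 1000,
        # Output tokens per second over the whole request.
        "tokens_per_s": n / total_s if total_s > 0 else 0.0,
    }


if __name__ == "__main__":
    timing = StreamTiming(request_sent=10.0, token_times=[10.25, 10.5, 10.75, 11.0])
    print(latency_metrics(timing))
```

Measuring TPOT over the tokens after the first keeps it separate from TTFT, which typically includes queueing and prompt processing.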

LLM Observability Tools

| Tool | Type | Highlights |
|---|---|---|
| LangSmith | Commercial | LangChain native, best tracing |
| Langfuse | Open Source | Self-hostable, generous free tier |
| Phoenix (Arize) | Open Source | Strong eval integration |
| Helicone | Commercial | Proxy-based, easy setup |
| Weights & Biases | Commercial | Experiment tracking |
| OpenLLMetry (Traceloop) | Open Source | OpenTelemetry for LLMs |

Logging Best Practices

What to Log

log_entry = {
    "request_id": "uuid-123",
    "timestamp": "2024-01-15T10:30:00Z",
    "model": "gpt-4",
    "prompt_tokens": 150,
    "completion_tokens": 200,
    "latency_ms": 1200,
    "user_id": "user-456",  # Can be anonymized or pseudonymized before logging
    "prompt_hash": "abc123",  # Hash logged in place of the raw prompt, for PII protection
    "status": "success"
}
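
One way to produce an entry of this shape is sketched below. It is an illustration under stated assumptions rather than a prescribed implementation: `build_log_entry` and `pseudonymize` are hypothetical helpers, the prompt is hashed with SHA-256, and the user ID is replaced by a keyed HMAC so the raw value never reaches the log. Only Python standard-library calls are used.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone


def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash (assumed approach, not from the source)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def build_log_entry(model: str, prompt: str, prompt_tokens: int,
                    completion_tokens: int, latency_ms: int,
                    user_id: str, key: bytes, status: str = "success") -> dict:
    """Build one structured log record without storing the raw prompt or user ID."""
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return {
        "request_id": str(uuid.uuid4()),
        "timestamp": now.replace("+00:00", "Z"),
        "model": model,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "latency_ms": latency_ms,
        "user_id": pseudonymize(user_id, key),
        # The hash lets identical prompts be grouped without logging their text.
        "prompt_hash": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "status": status,
    }


if __name__ == "__main__":
    entry = build_log_entry("gpt-4", "What is EUV lithography?", 150, 200, 1200,
                            user_id="user-456", key=b"example-secret")
    print(json.dumps(entry))  # one JSON object per line is easy to ingest
```

A plain hash of a short or guessable prompt can still be reversed by brute force, so the hash reduces exposure rather than eliminating it.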

PII Considerations

Alerting Strategy

| Condition | Severity | Action |
|---|---|---|
| Error rate > 1% | High | Page on-call |
| P99 latency > 5s | Medium | Alert Slack |
| Cost spike > 2x | Medium | Alert team |
| Model drift detected | Low | Create ticket |
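
The alert conditions above can be expressed as simple threshold checks over a window of aggregated statistics. The sketch below is illustrative only: `WindowStats` and `evaluate_alerts` are hypothetical names, the cost spike is assumed to be measured against a baseline from a comparable earlier window, and drift detection is assumed to come from a separate check that is not shown.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Aggregates for one monitoring window (hypothetical structure)."""
    error_rate: float      # fraction of failed requests, e.g. 0.02 for 2%
    p99_latency_s: float   # 99th-percentile end-to-end latency in seconds
    cost: float            # spend in this window
    baseline_cost: float   # spend in a comparable earlier window (assumption)
    drift_detected: bool   # result of a separate drift check (not shown)


def evaluate_alerts(s: WindowStats) -> list[tuple[str, str, str]]:
    """Return (condition, severity, action) for every rule that fires."""
    alerts = []
    if s.error_rate > 0.01:
        alerts.append(("Error rate > 1%", "High", "Page on-call"))
    if s.p99_latency_s > 5.0:
        alerts.append(("P99 latency > 5s", "Medium", "Alert Slack"))
    if s.baseline_cost > 0 and s.cost > 2 * s.baseline_cost:
        alerts.append(("Cost spike > 2x", "Medium", "Alert team"))
    if s.drift_detected:
        alerts.append(("Model drift detected", "Low", "Create ticket"))
    return alerts


if __name__ == "__main__":
    stats = WindowStats(error_rate=0.02, p99_latency_s=3.1, cost=12.0,
                        baseline_cost=10.0, drift_detected=False)
    for condition, severity, action in evaluate_alerts(stats):
        print(f"[{severity}] {condition}: {action}")
```

Returning every rule that fires, rather than only the most severe one, leaves routing and deduplication to whichever alerting system consumes the result.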