Prometheus

Prometheus is an open-source monitoring and alerting toolkit that collects time-series metrics by scraping HTTP endpoints in a pull-based architecture. It is widely used as the metrics backend in observability stacks for AI infrastructure, Kubernetes clusters, and GPU monitoring, at organizations ranging from startups to hyperscalers.

What Is Prometheus?

Why Prometheus Matters for AI Infrastructure

Core Concepts

Metric Types:

Data Model Example: inference_request_duration_seconds{model="llama-3-70b", status="success", quantization="awq"} = 2.34

Labels enable slicing: query by model, by status, by quantization type independently.
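The label-based slicing described above can be sketched in plain Python. This is only an illustration of the data model, not how Prometheus stores series internally; the `select` helper and every sample other than the one quoted above are hypothetical.

```python
NAME = "inference_request_duration_seconds"

def series_key(name, labels):
    """A series is identified by its metric name plus its full label set."""
    return (name, frozenset(labels.items()))

# Hypothetical samples; only the first one comes from the example above.
samples = {
    series_key(NAME, {"model": "llama-3-70b", "status": "success", "quantization": "awq"}): 2.34,
    series_key(NAME, {"model": "llama-3-70b", "status": "error", "quantization": "awq"}): 0.51,
    series_key(NAME, {"model": "mistral-7b", "status": "success", "quantization": "fp16"}): 0.80,
}

def select(samples, name, **matchers):
    """Return the samples whose name matches and whose labels equal every matcher."""
    selected = {}
    for (series_name, label_set), value in samples.items():
        labels = dict(label_set)
        if series_name == name and all(labels.get(k) == v for k, v in matchers.items()):
            selected[(series_name, label_set)] = value
    return selected

# Slice the same metric along different label dimensions.
by_model = select(samples, NAME, model="llama-3-70b")
by_status = select(samples, NAME, status="success")
by_both = select(samples, NAME, model="llama-3-70b", status="success")
```

Each distinct combination of label values is a separate time series, which is why adding a label with many possible values multiplies the number of series Prometheus has to track.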

PromQL — The Query Language

rate(inference_requests_total[5m]) → average per-second request rate over the last 5 minutes
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) → p99 latency over the last 5 minutes
sum by (model) (gpu_memory_used_bytes) → memory usage grouped by model name
increase(token_generation_total[1h]) → total tokens generated in the last hour
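To make the semantics of these functions more concrete, here is a simplified Python sketch of what `increase`, `rate`, and `histogram_quantile` compute. It is an approximation under stated assumptions: the real PromQL `rate` and `increase` also extrapolate to the edges of the time window, which this sketch omits, and the sample data is invented for illustration.

```python
import math

def increase(samples):
    """Total counter growth over (timestamp, value) samples.

    A drop in value is treated as a counter reset (for example a process
    restart), so counting resumes from the post-reset value.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total

def rate(samples):
    """Average per-second increase between the first and last sample."""
    elapsed = samples[-1][0] - samples[0][0]
    return increase(samples) / elapsed if elapsed > 0 else 0.0

def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by bound and
    ending with the +Inf bucket. Values are assumed to be spread evenly
    inside each bucket, so the result is a linear interpolation.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound) or count == lower_count:
                # No finite upper edge (or an empty bucket) to interpolate in.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound

# Hypothetical counter scraped every 60s, with one reset at t=180.
counter = [(0, 100), (60, 160), (120, 220), (180, 10), (240, 70), (300, 130)]
requests_per_second = rate(counter)

# Hypothetical latency buckets: le="0.5", le="1.0", le="2.5", le="+Inf".
latency_buckets = [(0.5, 50), (1.0, 90), (2.5, 99), (math.inf, 100)]
p95 = histogram_quantile(0.95, latency_buckets)
```

Because the quantile is interpolated inside a bucket, its accuracy depends on how well the bucket boundaries match the real latency distribution.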

Key Exporters for AI

Exporter | What It Monitors
DCGM Exporter | NVIDIA GPU metrics (temperature, memory, utilization)
node_exporter | Host CPU, memory, disk, network
kube-state-metrics | Kubernetes pod/deployment health
vLLM built-in | LLM inference queue, TTFT, throughput
postgres_exporter | Vector DB (pgvector) performance
redis_exporter | Caching layer hit rate and latency
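Every exporter in this list does the same basic job: it exposes current values as plain text on an HTTP endpoint that Prometheus scrapes. The sketch below renders that text exposition format by hand to show what a scrape returns. The `render_metric` helper and the sample values are hypothetical; a real exporter would normally use an official client library instead.

```python
def escape_label_value(value):
    """Escape backslashes, double quotes, and newlines, as the text format requires."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def render_metric(name, help_text, metric_type, samples):
    """Render one metric family in the Prometheus text exposition format.

    samples: list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            label_str = ",".join(
                f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

# Hypothetical gauge values for two models.
body = render_metric(
    "gpu_memory_used_bytes",
    "GPU memory currently in use.",
    "gauge",
    [({"model": "llama-3-70b"}, 1024), ({"model": "mistral-7b"}, 512)],
)
print(body)
```

Serving this string from a `/metrics` path over HTTP is enough for Prometheus to ingest it, which is part of why exporters exist for so many systems.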

Prometheus Architecture

The Prometheus server pulls metrics at a configurable scrape interval (15s is a common setting; the built-in default is 1m) from:

Storage: Local TSDB (time-series database) with efficient compressed blocks and a default retention of 15 days.
Remote Write: Streams metrics to long-term storage (Thanos, Cortex, Grafana Mimir) for multi-year retention.
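For capacity planning, the Prometheus storage documentation gives a rough formula: disk space ≈ retention time in seconds × ingested samples per second × bytes per sample, with compressed samples averaging roughly 1 to 2 bytes each. The sketch below applies that formula. The function name and the workload figures (100,000 active series scraped every 15s) are assumptions chosen for illustration.

```python
def estimate_tsdb_bytes(active_series, scrape_interval_s, retention_days, bytes_per_sample=2.0):
    """Rough local TSDB size estimate.

    Assumes every active series yields one sample per scrape interval and
    that compressed samples cost about `bytes_per_sample` bytes on disk.
    """
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample

# Hypothetical workload at the default 15-day retention.
estimate = estimate_tsdb_bytes(active_series=100_000, scrape_interval_s=15, retention_days=15)
print(f"{estimate / 1e9:.2f} GB")
```

The estimate grows linearly with retention, which is one reason multi-year history is usually handed off to a remote-write backend rather than kept on the Prometheus server's local disk.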

Setting Up GPU Monitoring

Deploy DCGM Exporter as a DaemonSet on all GPU nodes so that Prometheus can scrape each node. Key metrics:

Prometheus serves as the metrics backbone of much modern AI infrastructure. Its simple pull-based model, expressive query language, and large exporter ecosystem make it a common choice for monitoring everything from GPU temperatures during training runs to token throughput in production inference serving.
