Prometheus

Keywords: prometheus,mlops

Prometheus is an open-source monitoring and alerting toolkit that collects, stores, and queries time-series metrics data. It has become the de facto standard for monitoring infrastructure and applications, especially in Kubernetes environments.

Core Architecture

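- Pull-Based Collection: Prometheus periodically scrapes metrics from HTTP endpoints exposed by applications and exporters. The interval is set by scrape_interval; the built-in default is 1 minute, and the example configuration shipped with Prometheus sets it to 15 seconds.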
- Time-Series Database: Metrics are stored as time-series data — sequences of timestamped values identified by metric name and key-value labels.
- PromQL: A powerful query language for selecting, filtering, aggregating, and computing over metrics data.
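- Alertmanager: The Prometheus server evaluates alerting rules against metrics and sends firing alerts to Alertmanager, a separate component that deduplicates, groups, and silences them and routes notifications to email, Slack, PagerDuty, etc.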
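To make the pull model and the data model concrete, here is a minimal Python sketch (standard library only) of what happens on each scrape: the exposition text returned by a target is parsed, and every sample is appended to the series identified by its metric name plus its sorted label set. parse_exposition and TinyTSDB are hypothetical names for illustration, and the parser assumes simple label values with no escaped quotes or commas; the real Prometheus server does much more (sample timestamps, staleness handling, a compressed on-disk format).

```python
import re
import time
from collections import defaultdict

# Simplified pattern for one sample line of the text exposition format:
#   metric_name{label="value",...} value
# Assumption: label values contain no escaped quotes or commas.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text):
    """Yield (series_key, value); series_key = (metric name, sorted label pairs)."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP and TYPE lines
            continue
        m = SAMPLE_RE.match(line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        labels = tuple(sorted(LABEL_RE.findall(raw_labels or "")))
        yield (name, labels), float(value)


class TinyTSDB:
    """Hypothetical in-memory store: one list of (timestamp, value) per series."""

    def __init__(self):
        self.series = defaultdict(list)

    def ingest_scrape(self, text, ts=None):
        # Every sample from one scrape shares the scrape timestamp
        ts = time.time() if ts is None else ts
        for key, value in parse_exposition(text):
            self.series[key].append((ts, value))
```

Two scrapes of the same target append a second point to each existing series rather than creating new ones; a previously unseen label value, by contrast, starts a new series.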

Key Concepts

- Metrics Endpoint: Applications expose an HTTP endpoint, conventionally /metrics, that returns metrics in the Prometheus text exposition format.
- Exporters: Pre-built adapters that expose metrics from third-party systems (node_exporter for OS metrics, nvidia_gpu_exporter for GPU metrics, mysqld_exporter for MySQL).
- Labels: Key-value pairs that add dimensions to metrics, e.g. http_requests_total{method="POST", status="200", model="gpt-4"}. Each unique combination of label values is stored as a separate time series.
- Recording Rules: Pre-compute expensive queries and store results as new metrics for dashboard performance.
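The sketch below shows what an application-side metrics endpoint amounts to, using only Python's standard library: a labeled counter kept in memory and rendered in the text exposition format on each GET to /metrics. The helper names (inc_requests, render_metrics, serve) and port 8000 are assumptions for illustration, label values are not escaped, and a real service would normally use the official Python client library (prometheus_client) rather than hand-rolling this.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical counter storage: tuple of (label, value) pairs -> count
_lock = threading.Lock()
_requests = {}


def inc_requests(method, status, model):
    key = (("method", method), ("status", status), ("model", model))
    with _lock:
        _requests[key] = _requests.get(key, 0) + 1


def render_metrics():
    """Render the counter in the text exposition format (no label escaping)."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    with _lock:
        for key, count in sorted(_requests.items()):
            labels = ",".join(f'{k}="{v}"' for k, v in key)
            lines.append(f"http_requests_total{{{labels}}} {count}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    # Blocks forever; port 8000 is an arbitrary choice for this sketch
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Request handlers call inc_requests(...) and the process calls serve() once at startup; Prometheus then scrapes http://<host>:8000/metrics on its configured interval.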

Prometheus for AI/ML Monitoring

- GPU Metrics: Use DCGM Exporter to collect NVIDIA GPU utilization, memory, temperature, and power consumption.
- Inference Metrics: Track request latency, throughput, queue depth, and error rates for model serving endpoints.
- Custom Metrics: Instrument application code with Prometheus client libraries to expose model-specific metrics (token counts, cache hit rates, quality scores).
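For inference latency, Prometheus histograms are the usual choice because histogram_quantile can aggregate their buckets across replicas. The stdlib sketch below mimics what a client-library histogram exposes: cumulative _bucket series keyed by the le ("less than or equal") upper bound, always ending in a +Inf bucket, plus _sum and _count series. The class name and bucket bounds are illustrative assumptions; with the official Python client you would declare a Histogram and call its observe() method instead.

```python
import bisect
import math


class LatencyHistogram:
    """Simplified Prometheus-style histogram: cumulative buckets plus _sum and _count."""

    def __init__(self, name, bounds=(0.05, 0.1, 0.25, 0.5, 1.0, 2.5)):
        self.name = name
        self.bounds = sorted(bounds) + [math.inf]  # the +Inf bucket is always present
        self.counts = [0] * len(self.bounds)       # per-bucket (non-cumulative) counts
        self.total = 0.0
        self.n = 0

    def observe(self, seconds):
        # First bucket whose upper bound is >= the observation (le is inclusive)
        i = bisect.bisect_left(self.bounds, seconds)
        self.counts[i] += 1
        self.total += seconds
        self.n += 1

    def render(self):
        lines = [f"# TYPE {self.name} histogram"]
        cumulative = 0
        for bound, count in zip(self.bounds, self.counts):
            cumulative += count  # exposed bucket counts are cumulative
            le = "+Inf" if bound == math.inf else repr(bound)
            lines.append(f'{self.name}_bucket{{le="{le}"}} {cumulative}')
        lines.append(f"{self.name}_sum {self.total}")
        lines.append(f"{self.name}_count {self.n}")
        return "\n".join(lines) + "\n"
```

Choosing bucket bounds that bracket your latency targets matters, because quantile estimates can be no more precise than the buckets they are computed from.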

Common PromQL Queries

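- rate(http_requests_total[5m]): Per-second request rate, averaged over the last 5 minutes.
- histogram_quantile(0.99, sum by (le) (rate(request_duration_seconds_bucket[5m]))): p99 latency across all instances, estimated from histogram buckets. Summing by le first aggregates the buckets; without it you get one p99 per label set.
- avg(gpu_utilization) by (instance): Average GPU utilization per server. The actual metric name depends on the exporter; DCGM Exporter, for example, exposes DCGM_FI_DEV_GPU_UTIL.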
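The first two queries hide some arithmetic. The sketch below reimplements it in plain Python over already-selected samples: simple_rate (a hypothetical name) computes a counter's per-second increase while treating any drop as a counter reset, and a Python function named after PromQL's histogram_quantile estimates a quantile by linear interpolation inside the bucket that contains the target rank. Both are simplifications: PromQL's rate() also extrapolates to the edges of the range window, so results will not match Prometheus exactly.

```python
import math


def simple_rate(samples):
    """Per-second increase of a counter from (timestamp, value) samples.

    Simplification: no extrapolation to the window edges, unlike PromQL rate().
    """
    if len(samples) < 2:
        return None  # PromQL likewise needs at least two samples
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter reset (e.g. process restart): count from zero
        increase += cur - prev if cur >= prev else cur
    span = samples[-1][0] - samples[0][0]
    return increase / span if span > 0 else None


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets, +Inf last."""
    if not 0 < q <= 1:
        raise ValueError("this sketch only supports 0 < q <= 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower, prev_count = 0.0, 0.0  # the lowest bucket is assumed to start at 0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                # Rank falls in the +Inf bucket: return the highest finite bound
                return lower
            # Assume observations are spread linearly within the bucket
            return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)
        lower, prev_count = upper, count
    return lower
```

For buckets [(0.05, 1), (0.1, 2), (0.25, 2), (0.5, 4), (1.0, 4), (2.5, 4), (inf, 5)], the median rank 2.5 falls in the le="0.5" bucket, giving 0.25 + (0.5 − 0.25) × (2.5 − 2) / (4 − 2) = 0.3125 s. The p99 rank lands in the +Inf bucket, so the estimate is capped at 2.5 s.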

Ecosystem

- Grafana: Primary visualization tool for Prometheus metrics — dashboards, graphs, and alerts.
- Thanos / Cortex / Mimir: Long-term storage and horizontal scaling for Prometheus.
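- Kubernetes: Prometheus is the de facto monitoring solution for Kubernetes, commonly deployed with the kube-prometheus-stack Helm chart, which bundles the Prometheus Operator, Prometheus, Alertmanager, and Grafana.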

Prometheus is a foundational monitoring tool: if you run production infrastructure, especially on Kubernetes, it is very likely already part of your stack.
