AI cost monitoring provides real-time tracking and alerting for API and infrastructure expenses: measuring token usage, model costs, and cloud spending to prevent budget overruns, optimize allocation, and give visibility into the true cost of AI features across an organization.
What Is AI Cost Monitoring?
- Definition: Tracking and controlling AI-related expenditures.
- Scope: API costs, GPU compute, storage, inference serving.
- Goal: Visibility, predictability, optimization.
- Challenge: Costs can spike unexpectedly with usage.
Why Cost Monitoring Matters
- Budget Control: Prevent surprise bills.
- ROI Calculation: Understand cost per feature/user.
- Optimization: Identify expensive operations.
- Planning: Forecast future spending.
- Accountability: Allocate costs to teams/projects.
Cost Components
LLM API Costs:
```
Component          | Cost Driver           | Example (GPT-4o)
-------------------|-----------------------|------------------
Input tokens       | Context length        | $2.50/1M tokens
Output tokens      | Response length       | $10.00/1M tokens
Embeddings         | Vector generation     | $0.13/1M tokens
Fine-tuning        | Training runs         | $8/1M tokens
```
Infrastructure Costs:
```
Component          | Cost Driver           | Example
-------------------|-----------------------|------------------
GPU instances      | Hours × instance type | $2-100/hr
Vector DB          | Storage + queries     | $0.10-0.50/hr
Storage            | Data volume           | $0.023/GB/month
Networking         | Egress traffic        | $0.05-0.12/GB
```
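To see how the two tables combine, a back-of-the-envelope monthly estimate is sketched below. The traffic figures and the $4/hr instance price are made-up assumptions; the per-token rates are the GPT-4o examples above. Substitute your own numbers.
```python
# Back-of-the-envelope monthly estimate; all traffic figures are assumptions
requests_per_day = 50_000
avg_input_tokens, avg_output_tokens = 1_200, 300

# API spend, using the example GPT-4o rates from the table above
cost_per_request = (avg_input_tokens * 2.50 + avg_output_tokens * 10.00) / 1_000_000
monthly_api_cost = cost_per_request * requests_per_day * 30

# Infrastructure spend: two always-on inference instances at an assumed $4/hr
monthly_infra_cost = 2 * 24 * 30 * 4.00

print(f"API:   ${monthly_api_cost:,.0f}/month")   # ~$9,000
print(f"Infra: ${monthly_infra_cost:,.0f}/month") # ~$5,760
```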
Monitoring Implementation
Basic Cost Tracking:
```python
from dataclasses import dataclass

@dataclass
class CostTracker:
    total_tokens: int = 0
    total_cost: float = 0.0

    # Per-token rates in USD; update as provider pricing changes
    COSTS = {
        "gpt-4o": {"input": 2.50 / 1_000_000, "output": 10.00 / 1_000_000},
        "gpt-4o-mini": {"input": 0.15 / 1_000_000, "output": 0.60 / 1_000_000},
        "claude-3-5-sonnet": {"input": 3.00 / 1_000_000, "output": 15.00 / 1_000_000},
    }

    def track(self, model: str, input_tokens: int, output_tokens: int) -> float:
        rates = self.COSTS.get(model, {"input": 0, "output": 0})
        cost = input_tokens * rates["input"] + output_tokens * rates["output"]
        self.total_tokens += input_tokens + output_tokens
        self.total_cost += cost
        return cost

# Usage
tracker = CostTracker()
cost = tracker.track("gpt-4o", input_tokens=1500, output_tokens=500)
print(f"Request cost: ${cost:.4f}")
```
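In production, take the token counts from the provider's response rather than estimating them locally. A minimal sketch, assuming the OpenAI Python SDK (v1+), which reports usage on each response; the `tracked_completion` wrapper is an illustrative name, not a library function:
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def tracked_completion(model: str, messages: list) -> str:
    """Call the chat API and record the token usage the provider reports back."""
    response = client.chat.completions.create(model=model, messages=messages)
    usage = response.usage  # authoritative prompt/completion token counts
    tracker.track(model, usage.prompt_tokens, usage.completion_tokens)  # CostTracker above
    return response.choices[0].message.content

reply = tracked_completion("gpt-4o-mini", [{"role": "user", "content": "Hello"}])
print(f"Running total: ${tracker.total_cost:.6f}")
```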
Database Logging:
```python
async def log_request_cost(
    request_id: str,
    model: str,
    input_tokens: int,
    output_tokens: int,
    cost: float,
    user_id: str,
    feature: str,
):
    # `db` is an async database handle (e.g. a databases/asyncpg-style wrapper)
    await db.execute("""
        INSERT INTO ai_costs
            (request_id, model, input_tokens, output_tokens, cost,
             user_id, feature, timestamp)
        VALUES (?, ?, ?, ?, ?, ?, ?, NOW())
    """, [request_id, model, input_tokens, output_tokens,
          cost, user_id, feature])
```
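The insert above presumes an `ai_costs` table roughly like the one sketched here. The column types are illustrative assumptions; adjust them for your database and for however `db` is actually connected in your stack.
```python
# Assumed schema behind log_request_cost(); adjust types for your database
AI_COSTS_DDL = """
CREATE TABLE IF NOT EXISTS ai_costs (
    request_id     VARCHAR(64) PRIMARY KEY,
    model          VARCHAR(64)    NOT NULL,
    input_tokens   INT            NOT NULL,
    output_tokens  INT            NOT NULL,
    cost           DECIMAL(12, 6) NOT NULL,
    user_id        VARCHAR(64),
    feature        VARCHAR(64),
    timestamp      TIMESTAMP      NOT NULL
)
"""

async def ensure_schema():
    # Run once at startup against the same `db` handle used above
    await db.execute(AI_COSTS_DDL)
```
An index on `timestamp` (and optionally `user_id` and `feature`) keeps the dashboard queries below fast as the table grows.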
Alerting
Threshold Alerts:
```python
ALERTS = {
    "hourly_spend": {"threshold": 100, "action": "warn"},       # USD per hour
    "daily_spend": {"threshold": 500, "action": "alert"},       # USD per day
    "single_request": {"threshold": 1, "action": "flag"},       # USD per request
    "rate_spike": {"threshold": 2.0, "action": "investigate"},  # 2× normal
}

async def check_cost_alerts():
    # get_hourly_spend/get_daily_spend aggregate recent rows from ai_costs
    hourly = await get_hourly_spend()
    daily = await get_daily_spend()
    if hourly > ALERTS["hourly_spend"]["threshold"]:
        await send_alert(f"Hourly spend ${hourly:.2f} exceeds threshold")
    if daily > ALERTS["daily_spend"]["threshold"]:
        await send_alert(f"Daily spend ${daily:.2f} exceeds threshold")
```
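The `rate_spike` entry needs a baseline to compare against, and the checks themselves need to run on a schedule. One lightweight approach is sketched below: compare today's spend against the trailing seven-day daily average and re-run the checks from a background task. The `db.fetch_val` helper, the MySQL-style date functions, and the five-minute interval are all assumptions; a cron job works just as well.
```python
import asyncio

async def detect_spend_spike(multiplier: float = 2.0) -> bool:
    """True when today's spend exceeds `multiplier` x the trailing 7-day daily average."""
    today = float(await db.fetch_val(
        "SELECT COALESCE(SUM(cost), 0) FROM ai_costs WHERE timestamp >= CURDATE()"))
    baseline = float(await db.fetch_val("""
        SELECT COALESCE(SUM(cost), 0) / 7 FROM ai_costs
        WHERE timestamp >= CURDATE() - INTERVAL 7 DAY AND timestamp < CURDATE()"""))
    return baseline > 0 and today > multiplier * baseline

async def cost_alert_loop(interval_seconds: int = 300):
    """Re-check spend thresholds every few minutes in the background."""
    while True:
        try:
            await check_cost_alerts()
            if await detect_spend_spike(ALERTS["rate_spike"]["threshold"]):
                await send_alert("Spend rate spike: today is running well above normal")
        except Exception as exc:
            print(f"cost alert check failed: {exc}")  # never crash the app over monitoring
        await asyncio.sleep(interval_seconds)

# At application startup: asyncio.create_task(cost_alert_loop())
```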
Cost Dashboard Queries
```sql
-- Daily spend by model
SELECT
DATE(timestamp) as date,
model,
SUM(cost) as total_cost,
SUM(input_tokens + output_tokens) as total_tokens,
COUNT(*) as request_count
FROM ai_costs
WHERE timestamp > NOW() - INTERVAL 30 DAY
GROUP BY DATE(timestamp), model
ORDER BY date DESC, total_cost DESC;
-- Cost per user
SELECT
user_id,
SUM(cost) as total_cost,
COUNT(*) as requests,
AVG(cost) as avg_cost_per_request
FROM ai_costs
WHERE timestamp > NOW() - INTERVAL 30 DAY
GROUP BY user_id
ORDER BY total_cost DESC
LIMIT 20;
-- Cost by feature
SELECT
feature,
SUM(cost) as total_cost,
SUM(cost) / COUNT(DISTINCT DATE(timestamp)) as daily_avg
FROM ai_costs
WHERE timestamp > NOW() - INTERVAL 30 DAY
GROUP BY feature
ORDER BY total_cost DESC;
```
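These queries can feed a BI dashboard directly, but it also helps to surface the top-line numbers where people already look. A small sketch that renders the first query as a plain-text report, assuming the same `db` handle and a `fetch_all`-style helper:
```python
async def daily_spend_report(days: int = 7) -> str:
    """Render daily spend by model as a plain-text report for CLI or chat."""
    rows = await db.fetch_all(f"""
        SELECT DATE(timestamp) AS date, model, SUM(cost) AS total_cost
        FROM ai_costs
        WHERE timestamp > NOW() - INTERVAL {int(days)} DAY
        GROUP BY DATE(timestamp), model
        ORDER BY date DESC, total_cost DESC
    """)
    lines = [f"{row['date']}  {row['model']:<22} ${row['total_cost']:>10.2f}"
             for row in rows]
    return "\n".join(lines)
```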
Optimization Strategies
```
Strategy              | Savings          | Trade-off
----------------------|------------------|----------------------
Use smaller models    | 10-50×           | Possible quality drop
Prompt optimization   | 20-50%           | Engineering effort
Response caching      | 80-95% for hits  | Stale responses
Batch requests        | 10-30%           | Added latency
Rate limiting         | Budget-capped    | User impact
```
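Of these, response caching is often the quickest win when users repeat identical requests. A minimal exact-match sketch, reusing the `tracked_completion` wrapper from earlier; a real deployment would more likely use Redis, and possibly semantic (embedding-based) matching, rather than an in-process dict:
```python
import hashlib
import time

_cache: dict = {}         # key -> (cached_at, answer); in-process, exact-match only
CACHE_TTL_SECONDS = 3600  # accept answers up to an hour old

def cached_completion(model: str, prompt: str) -> str:
    """Serve repeated identical prompts from cache instead of paying for tokens again."""
    key = hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()
    hit = _cache.get(key)
    if hit and time.time() - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]
    answer = tracked_completion(model, [{"role": "user", "content": prompt}])
    _cache[key] = (time.time(), answer)
    return answer
```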
Tools & Services
```
Tool             | Features
-----------------|----------------------------------
Helicone         | LLM cost tracking, analytics
LangSmith        | LangChain cost monitoring
OpenAI Usage     | Native OpenAI dashboard
Custom logging   | Full control, any provider
```
AI cost monitoring is essential for sustainable AI operations — without visibility into spending, costs can escalate rapidly, and without optimization guidance, teams waste money on inefficient patterns that compound at scale.