AI Cost Monitoring

AI cost monitoring is the real-time tracking of, and alerting on, API and infrastructure expenses. It measures token usage, model costs, and cloud spending to prevent budget overruns, optimize allocation, and show the true cost of AI features across an organization.

What Is AI Cost Monitoring?

Why Cost Monitoring Matters

Cost Components

LLM API Costs:

Component          | Cost Driver          | Example price
-------------------|----------------------|-----------------------------------------
Input tokens       | Context length       | $2.50/1M tokens (GPT-4o)
Output tokens      | Response length      | $10.00/1M tokens (GPT-4o)
Embeddings         | Vector generation    | $0.13/1M tokens (text-embedding-3-large)
Fine-tuning        | Training runs        | $8/1M tokens

Infrastructure Costs:

Component          | Cost Driver          | Example
-------------------|----------------------|------------------
GPU instances      | Hours × instance type| $2-100/hr
Vector DB          | Storage + queries    | $0.10-0.50/hr
Storage            | Data volume          | $0.023/GB/month
Networking         | Egress traffic       | $0.05-0.12/GB
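
To see how these rates combine, the sketch below estimates a monthly infrastructure bill. The function name and the workload figures (one GPU at $2/hr running around the clock, a vector DB at $0.10/hr, 500 GB of storage, 1,000 GB of egress at $0.05/GB) are illustrative assumptions drawn from the low end of the ranges above, not measurements.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def monthly_infra_cost(gpu_rate_hr, gpu_count, vector_db_rate_hr,
                       storage_gb, storage_rate_gb_month,
                       egress_gb, egress_rate_gb):
    """Return a per-component breakdown of monthly cost in USD."""
    breakdown = {
        "gpu": gpu_rate_hr * gpu_count * HOURS_PER_MONTH,
        "vector_db": vector_db_rate_hr * HOURS_PER_MONTH,
        "storage": storage_gb * storage_rate_gb_month,
        "networking": egress_gb * egress_rate_gb,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Hypothetical workload using the low end of each range in the table
estimate = monthly_infra_cost(
    gpu_rate_hr=2.00, gpu_count=1, vector_db_rate_hr=0.10,
    storage_gb=500, storage_rate_gb_month=0.023,
    egress_gb=1000, egress_rate_gb=0.05,
)
for component, usd in estimate.items():
    print(f"{component:>10}: ${usd:,.2f}")
```

Even at the cheapest GPU rate, an always-on instance accounts for over 90% of this estimate, which is why GPU hours are usually the first infrastructure metric worth alerting on.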

Monitoring Implementation

Basic Cost Tracking:

from dataclasses import dataclass
from typing import ClassVar

@dataclass
class CostTracker:
    total_tokens: int = 0
    total_cost: float = 0.0

    # USD per token, derived from per-1M-token list prices.
    # Provider prices change; verify these before relying on the totals.
    COSTS: ClassVar[dict[str, dict[str, float]]] = {
        "gpt-4o": {"input": 2.50/1_000_000, "output": 10.00/1_000_000},
        "gpt-4o-mini": {"input": 0.15/1_000_000, "output": 0.60/1_000_000},
        "claude-3-5-sonnet": {"input": 3.00/1_000_000, "output": 15.00/1_000_000},
    }

    def track(self, model: str, input_tokens: int, output_tokens: int) -> float:
        # Fail loudly on unknown models: a silent zero rate would under-report spend
        if model not in self.COSTS:
            raise KeyError(f"No pricing configured for model {model!r}")
        rates = self.COSTS[model]
        cost = (input_tokens * rates["input"]) + (output_tokens * rates["output"])

        # Update running totals across all tracked requests
        self.total_tokens += input_tokens + output_tokens
        self.total_cost += cost

        return cost

# Usage
tracker = CostTracker()
cost = tracker.track("gpt-4o", input_tokens=1500, output_tokens=500)
print(f"Request cost: ${cost:.4f}")
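
The tracker above keeps one running total, whereas visibility across an organization usually means attributing spend to a feature or team. The following is a minimal sketch of that idea; `FeatureCostTracker`, `RATES`, and the feature labels are hypothetical names introduced for illustration, and the rates are the same assumed list prices used above.

```python
from collections import defaultdict

# USD per token; assumed list prices, verify before use
RATES = {
    "gpt-4o": {"input": 2.50 / 1_000_000, "output": 10.00 / 1_000_000},
    "gpt-4o-mini": {"input": 0.15 / 1_000_000, "output": 0.60 / 1_000_000},
}


class FeatureCostTracker:
    """Accumulates spend per feature label (hypothetical helper)."""

    def __init__(self):
        self.cost_by_feature = defaultdict(float)

    def track(self, feature, model, input_tokens, output_tokens):
        rates = RATES[model]  # raises KeyError for models without pricing
        cost = input_tokens * rates["input"] + output_tokens * rates["output"]
        self.cost_by_feature[feature] += cost
        return cost

    def top_features(self, n=3):
        """Return the n most expensive features as (feature, cost) pairs."""
        ranked = sorted(self.cost_by_feature.items(),
                        key=lambda item: item[1], reverse=True)
        return ranked[:n]


feature_tracker = FeatureCostTracker()
feature_tracker.track("search", "gpt-4o-mini", 2_000, 300)
feature_tracker.track("chat", "gpt-4o", 1_500, 500)
feature_tracker.track("chat", "gpt-4o", 3_000, 800)

for feature, usd in feature_tracker.top_features():
    print(f"{feature}: ${usd:.4f}")
```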

Database Logging:

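# `db` is assumed to be an async database client created elsewhere.
# The `?` placeholder style and NOW() depend on the driver and database;
# adjust both to match your stack (many MySQL drivers use %s, for example).
async def log_request_cost(
    request_id: str,
    model: str,
    input_tokens: int,
    output_tokens: int,
    cost: float,
    user_id: str,
    feature: str
):
    """Persist one request's cost so it can be aggregated by model, user, and feature."""
    await db.execute("""
        INSERT INTO ai_costs 
        (request_id, model, input_tokens, output_tokens, cost, 
         user_id, feature, timestamp)
        VALUES (?, ?, ?, ?, ?, ?, ?, NOW())
    """, [request_id, model, input_tokens, output_tokens, 
          cost, user_id, feature])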
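
The INSERT above presumes an `ai_costs` table that the article does not define. As a rough sketch, the schema below matches the columns used in that statement and runs against Python's built-in `sqlite3` so it can be tried without a database server. The column types, the index, the synchronous helper `log_request_cost_sync`, and the choice of SQLite are all assumptions; a production schema would follow your own database's conventions.

```python
import sqlite3

# Assumed schema: column names come from the INSERT above, types are a guess
SCHEMA = """
CREATE TABLE IF NOT EXISTS ai_costs (
    request_id    TEXT PRIMARY KEY,
    model         TEXT NOT NULL,
    input_tokens  INTEGER NOT NULL,
    output_tokens INTEGER NOT NULL,
    cost          REAL NOT NULL,
    user_id       TEXT,
    feature       TEXT,
    timestamp     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_ai_costs_timestamp ON ai_costs (timestamp);
"""


def log_request_cost_sync(conn, request_id, model, input_tokens,
                          output_tokens, cost, user_id, feature):
    """Insert one cost record; the timestamp column fills in by default."""
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(
            "INSERT INTO ai_costs (request_id, model, input_tokens, "
            "output_tokens, cost, user_id, feature) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (request_id, model, input_tokens, output_tokens,
             cost, user_id, feature),
        )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
log_request_cost_sync(conn, "req-001", "gpt-4o", 1500, 500,
                      0.00875, "user-42", "chat")
print(conn.execute("SELECT model, cost, feature FROM ai_costs").fetchall())
```

Making `request_id` the primary key means a retried write for the same request fails instead of double-counting its cost.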

Alerting

Threshold Alerts:

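# Thresholds are in USD, except rate_spike, which is a multiple of normal spend
ALERTS = {
    "hourly_spend": {"threshold": 100, "action": "warn"},
    "daily_spend": {"threshold": 500, "action": "alert"},
    "single_request": {"threshold": 1, "action": "flag"},
    "rate_spike": {"threshold": 2.0, "action": "investigate"},  # 2× normal
}

async def check_cost_alerts():
    # get_hourly_spend, get_daily_spend, and send_alert are assumed to be
    # defined elsewhere (for example, as queries against ai_costs)
    hourly = await get_hourly_spend()
    daily = await get_daily_spend()

    hourly_limit = ALERTS["hourly_spend"]["threshold"]
    if hourly > hourly_limit:
        await send_alert(f"Hourly spend ${hourly:.2f} exceeds threshold ${hourly_limit:.2f}")

    daily_limit = ALERTS["daily_spend"]["threshold"]
    if daily > daily_limit:
        await send_alert(f"Daily spend ${daily:.2f} exceeds threshold ${daily_limit:.2f}")

    # single_request and rate_spike are not evaluated here: they need
    # per-request costs and a spending baseline, respectively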
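
The `rate_spike` entry in `ALERTS` (2× normal) is declared but not evaluated by `check_cost_alerts`. One possible way to check it is to compare the latest hour's spend against the mean of the preceding hours. This is a sketch only: the function name, the 24-hour baseline window, and the `min_baseline` floor are assumptions, and "normal" could just as reasonably be defined as a same-hour-last-week figure.

```python
from statistics import mean


def is_rate_spike(hourly_spend, multiplier=2.0, baseline_hours=24,
                  min_baseline=0.01):
    """Return True if the latest hour exceeds multiplier x the recent average.

    hourly_spend: list of hourly USD totals, oldest first; the last
    element is the hour being checked.
    """
    if len(hourly_spend) < 2:
        return False  # no history to compare against

    # Up to `baseline_hours` values immediately before the current hour
    history = hourly_spend[-(baseline_hours + 1):-1]

    # Floor the baseline so near-zero spend does not flag every small request
    baseline = max(mean(history), min_baseline)
    return hourly_spend[-1] > multiplier * baseline


print(is_rate_spike([10.0, 12.0, 11.0, 9.0, 30.0]))  # baseline 10.5, limit 21.0
print(is_rate_spike([10.0, 12.0, 11.0, 9.0, 15.0]))
```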

Cost Dashboard Queries

-- Daily spend by model (MySQL-style date functions; last 30 days)
SELECT 
    DATE(timestamp) as date,
    model,
    SUM(cost) as total_cost,
    SUM(input_tokens + output_tokens) as total_tokens,
    COUNT(*) as request_count
FROM ai_costs
WHERE timestamp > NOW() - INTERVAL 30 DAY
GROUP BY DATE(timestamp), model
ORDER BY date DESC, total_cost DESC;

-- Cost per user
SELECT 
    user_id,
    SUM(cost) as total_cost,
    COUNT(*) as requests,
    AVG(cost) as avg_cost_per_request
FROM ai_costs
WHERE timestamp > NOW() - INTERVAL 30 DAY
GROUP BY user_id
ORDER BY total_cost DESC
LIMIT 20;

-- Cost by feature (daily_avg divides by days with at least one request, not by 30)
SELECT 
    feature,
    SUM(cost) as total_cost,
    SUM(cost) / COUNT(DISTINCT DATE(timestamp)) as daily_avg
FROM ai_costs
WHERE timestamp > NOW() - INTERVAL 30 DAY
GROUP BY feature
ORDER BY total_cost DESC;

Optimization Strategies

Strategy              | Savings         | Trade-off
----------------------|-----------------|-------------------
Use smaller models    | 10-50×          | Possible quality drop
Prompt optimization   | 20-50%          | Engineering effort
Response caching      | 80-95% for hits | Stale responses
Batch requests        | 10-30%          | Added latency
Rate limiting         | Budget-capped   | User impact
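
As an illustration of the response caching row, the sketch below wraps a model call in an in-memory cache keyed on the model and prompt, and records how much spend the cache avoided. `CachedClient` and `fake_call` are hypothetical; `fake_call` stands in for a real API call and returns a fixed cost. The savings figure in the table applies only to requests that hit the cache, so the overall saving depends on the hit rate.

```python
import hashlib


class CachedClient:
    """Caches responses by (model, prompt) and tracks avoided spend."""

    def __init__(self, call_model):
        # call_model is assumed to be a callable:
        # (model, prompt) -> (response_text, cost_usd)
        self._call_model = call_model
        self._cache = {}
        self.hits = 0
        self.misses = 0
        self.spent = 0.0
        self.saved = 0.0

    def complete(self, model, prompt):
        # Hash model and prompt together so identical prompts sent to
        # different models do not share a cache entry
        key = hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()
        if key in self._cache:
            text, cost = self._cache[key]
            self.hits += 1
            self.saved += cost  # what the call would have cost again
            return text
        text, cost = self._call_model(model, prompt)
        self._cache[key] = (text, cost)
        self.misses += 1
        self.spent += cost
        return text


def fake_call(model, prompt):
    """Stand-in for a real API call; every request costs a flat $0.01."""
    return f"echo: {prompt}", 0.01


client = CachedClient(fake_call)
for prompt in ["hi", "hi", "bye", "hi"]:
    client.complete(model="gpt-4o", prompt=prompt)

print(f"hits={client.hits} misses={client.misses} "
      f"spent=${client.spent:.2f} saved=${client.saved:.2f}")
```

This sketch never expires entries, which is exactly the stale-response trade-off listed in the table; a real cache would add a time-to-live or an invalidation rule.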

Tools & Services

Tool             | Features
-----------------|----------------------------------
Helicone         | LLM cost tracking, analytics
LangSmith        | LangChain cost monitoring
OpenAI Usage     | Native OpenAI dashboard
Custom logging   | Full control, any provider

AI cost monitoring is essential for sustainable AI operations. Without visibility into spending, costs can escalate rapidly; without optimization guidance, teams waste money on inefficient patterns that compound at scale.

