Experiment Tracking

Experiment tracking tools such as Weights & Biases (W&B) and MLflow log ML experiments systematically: they record hyperparameters, metrics, model artifacts, and visualizations so that training runs can be reproduced, compared, and shared across a team.

Why Experiment Tracking Matters

Key Concepts

What to Track:

Category           | Examples
-------------------|----------------------------------
Hyperparameters    | Learning rate, batch size, epochs
Metrics            | Loss, accuracy, F1, custom metrics
Artifacts          | Model checkpoints, plots
Code               | Git commit, dependencies
Data               | Dataset version, splits
Environment        | GPU type, library versions
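
The Code and Environment rows are the ones most often forgotten, so a small helper that gathers them automatically can be useful. The following is a minimal sketch using only the standard library; the function names (`git_commit`, `package_version`, `collect_run_metadata`) and the default package list are illustrative assumptions, and GPU type is left out because detecting it depends on the framework in use.

```python
import platform
import subprocess
import sys
from importlib import metadata


def git_commit() -> str:
    """Return the current Git commit hash, or "unknown" outside a repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # Git is not installed, or the working directory is not a repository.
        return "unknown"


def package_version(name: str) -> str:
    """Return the installed version of a package, or "not installed"."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def collect_run_metadata(packages=("torch", "numpy")) -> dict:
    """Gather code and environment details to store alongside a run's config."""
    return {
        "git_commit": git_commit(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "command": " ".join(sys.argv),
        "packages": {name: package_version(name) for name in packages},
    }
```

The resulting dictionary can be merged into the config passed to whichever tracker is in use, so every run records the commit and library versions it was produced with.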

Weights & Biases (W&B)

Basic Setup:

import wandb

# Initialize a run; the returned object exposes the logged config
run = wandb.init(
    project="my-llm-project",
    config={
        "learning_rate": 1e-4,
        "batch_size": 32,
        "epochs": 10,
        "model": "gpt2",
    }
)
config = run.config

# Training loop (train_epoch and evaluate are placeholders for your own code)
for epoch in range(config.epochs):
    loss = train_epoch()
    accuracy = evaluate()

    # Log metrics for this epoch
    wandb.log({
        "epoch": epoch,
        "loss": loss,
        "accuracy": accuracy,
    })

# Finish run
wandb.finish()

Advanced W&B Features:

# The snippets below assume an active run started with wandb.init()

# Log artifacts
artifact = wandb.Artifact("model", type="model")
artifact.add_file("model.pt")
wandb.log_artifact(artifact)

# Log tables
table = wandb.Table(columns=["input", "output", "label"])
for item in eval_data:
    table.add_data(item.input, item.output, item.label)
wandb.log({"predictions": table})

# Log custom plots
wandb.log({"confusion_matrix": wandb.plot.confusion_matrix(
    probs=probs, y_true=labels
)})

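# Hyperparameter sweeps
# train_function is a placeholder: it should call wandb.init(), read its
# hyperparameters from wandb.config, and log the "accuracy" metric.
sweep_config = {
    "method": "bayes",
    "metric": {"name": "accuracy", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-3},
        "batch_size": {"values": [16, 32, 64]},
    }
}
sweep_id = wandb.sweep(sweep_config, project="my-llm-project")
wandb.agent(sweep_id, function=train_function)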
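
To make the shape of the search space concrete, here is a rough sketch of random search over a parameter specification in the same format as `sweep_config["parameters"]`. It is not how W&B's `bayes` method works: random search is swapped in because it fits in a few lines. `sample_config`, `random_search`, and `toy_objective` are hypothetical names, and sampling the learning rate log-uniformly is a choice made for this sketch rather than a statement about W&B's defaults.

```python
import math
import random


def sample_config(parameters: dict, rng: random.Random) -> dict:
    """Draw one configuration from a sweep-style parameter specification."""
    config = {}
    for name, spec in parameters.items():
        if "values" in spec:
            # Discrete choice: pick one of the listed values.
            config[name] = rng.choice(spec["values"])
        elif "min" in spec and "max" in spec:
            # Continuous range: sample log-uniformly (an assumption of this sketch).
            low, high = math.log(spec["min"]), math.log(spec["max"])
            config[name] = math.exp(rng.uniform(low, high))
        else:
            raise ValueError(f"Unsupported spec for {name!r}: {spec}")
    return config


def random_search(parameters, objective, trials=20, seed=0):
    """Evaluate `trials` random configurations and return the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(trials):
        config = sample_config(parameters, rng)
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


parameters = {
    "learning_rate": {"min": 1e-5, "max": 1e-3},
    "batch_size": {"values": [16, 32, 64]},
}


def toy_objective(config):
    """Stand-in for a training run: the score peaks at learning_rate = 1e-4."""
    return -abs(math.log10(config["learning_rate"]) + 4)


best_config, best_score = random_search(parameters, toy_objective)
```

In a real sweep the objective would be a full training run that reports the target metric, and a Bayesian method would use earlier results to choose the next configuration instead of sampling blindly.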

MLflow

Basic Setup:

import mlflow

# Set tracking URI (assumes an MLflow tracking server is running locally)
mlflow.set_tracking_uri("http://localhost:5000")

epochs = 10

# Start run
with mlflow.start_run():
    # Log parameters
    mlflow.log_param("learning_rate", 1e-4)
    mlflow.log_param("batch_size", 32)
    mlflow.log_param("epochs", epochs)

    # Training (train_epoch is a placeholder for your own code)
    for epoch in range(epochs):
        loss = train_epoch()
        mlflow.log_metric("loss", loss, step=epoch)

    # Log model (model is a trained PyTorch module defined elsewhere)
    mlflow.pytorch.log_model(model, "model")

    # Log artifacts
    mlflow.log_artifact("config.yaml")

MLflow Model Registry:

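# Register model (run_id is the ID of the run that logged the model)
mlflow.register_model(
    f"runs:/{run_id}/model",
    "production-model"
)

# Transition model stage
# Note: model stages are deprecated in recent MLflow releases in favor of
# aliases (client.set_registered_model_alias and "models:/<name>@<alias>").
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="production-model",
    version=1,
    stage="Production"
)

# Load production model
model = mlflow.pyfunc.load_model(
    model_uri="models:/production-model/Production"
)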

Comparison

Feature             | W&B           | MLflow
--------------------|---------------|----------------
Hosting             | Cloud/Self    | Self/Managed
Visualizations      | Excellent     | Good
Collaboration       | Built-in      | Manual setup
Artifact tracking   | Yes           | Yes
Model registry      | Yes           | Yes
Sweeps/Search       | Built-in      | Basic
LLM evaluations     | Yes           | Limited
Pricing             | Freemium      | Open source

Best Practices

Naming Conventions:

# Clear run names (lr is the learning rate used for this run)
lr = 1e-4
wandb.init(
    project="llm-finetune",
    name=f"llama-lora-r16-lr{lr}",
    tags=["lora", "llama", "production"]
)
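
Formatting a float directly into a name gives inconsistent results: in Python, `1e-4` renders as `0.0001` while `1e-5` renders as `1e-05`. One possible fix is a small helper that always formats the learning rate in scientific notation. `build_run_name` and its parameters are hypothetical, shown only as a sketch of the idea.

```python
def build_run_name(model: str, method: str, rank: int, lr: float) -> str:
    """Build a run name with the learning rate in fixed scientific notation."""
    return f"{model}-{method}-r{rank}-lr{lr:.1e}"


name = build_run_name("llama", "lora", 16, 1e-4)
print(name)  # llama-lora-r16-lr1.0e-04
```

Names built this way sort and filter predictably in the tracking UI because every run uses the same field order and number format.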

Config Management:

# Use structured configs
config = {
    "model": {
        "name": "llama-3.1-8b",
        "quantization": "4bit",
    },
    "training": {
        "learning_rate": 1e-4,
        "batch_size": 16,
    },
    "data": {
        "dataset": "my-instructions",
        "version": "v2",
    }
}
wandb.init(config=config)
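
Some logging calls, such as MLflow's `mlflow.log_params`, take a flat dictionary of parameter names and values, so a nested config like the one above may need to be flattened first. A minimal sketch follows, assuming dotted keys are acceptable; `flatten_config` is a hypothetical helper, not part of either library.

```python
def flatten_config(config: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Flatten a nested dict into a single level with dotted keys."""
    flat = {}
    for key, value in config.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into sub-sections, carrying the prefix along.
            flat.update(flatten_config(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


nested = {
    "model": {"name": "llama-3.1-8b", "quantization": "4bit"},
    "training": {"learning_rate": 1e-4, "batch_size": 16},
}
flat = flatten_config(nested)
# flat["model.name"] is "llama-3.1-8b"; flat["training.batch_size"] is 16
```

Keeping the nested form as the source of truth and flattening only at logging time means the same config can be passed to training code and to the tracker.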

Artifact Versioning:

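# Always version data and models
# (date and data are placeholders; "train.jsonl" is an example file name)
artifact = wandb.Artifact(
    f"training-data-{date}",
    type="dataset",
    metadata={"rows": len(data), "source": "internal"}
)
artifact.add_file("train.jsonl")
wandb.log_artifact(artifact)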
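
A date in the artifact name says when a dataset was saved but not whether its contents changed. One option is to record a content hash in the artifact metadata as well. The sketch below uses only the standard library; `file_sha256` and `dataset_metadata` are hypothetical helpers, and the metadata keys are illustrative.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dataset_metadata(path, source: str) -> dict:
    """Build a metadata dict that identifies a dataset file by its content."""
    path = Path(path)
    return {
        "file": path.name,
        "bytes": path.stat().st_size,
        "sha256": file_sha256(path),
        "source": source,
    }
```

The returned dictionary can be passed as the `metadata` argument when creating an artifact, so two runs can be checked for identical training data by comparing hashes.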

Experiment tracking is essential infrastructure for serious ML work. Without systematic logging, teams lose hours recreating experiments, cannot compare approaches fairly, and struggle to reproduce their best results.
