Day 13: Experiment Tracking

Learning Objectives

Understand why ad-hoc experiment tracking leads to irreproducible results
Learn the MLflow workflow: log params, metrics, artifacts, model registry
Integrate MLflow (SQLite backend) with a training script

Theory (15 min)

The Reproducibility Crisis

Most ML projects fail at reproduction:
- "Which hyperparameters produced that 92% accuracy run?"
- "What data was used for this model?"
- "Where is the training script version that generated this?"

A single source of truth for experiments fixes this.

What to Track

Type	Examples	Why
Parameters	lr=3e-5, batch_size=16, model=bert-base	Reproduce training
Metrics	loss, accuracy, tokens/sec, GPU mem	Compare runs
Artifacts	model.pt, tokenizer, config.json	Deploy from any run
Source	git commit, diff, script hash	Audit trail
Environment	Python version, CUDA version, GPU type	Debug hardware issues
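
The Source and Environment rows are the easiest to forget because no training loop produces them. The sketch below shows one way to gather them with the standard library only. The function names (`file_sha256`, `git_commit`, `collect_run_context`) are illustrative and not part of MLflow, and CUDA version and GPU type are left out because they would need a framework such as PyTorch to query.

```python
import hashlib
import platform
import subprocess


def file_sha256(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def git_commit():
    """Return the current git commit hash, or "unknown" if git is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # git not installed, or the working directory is not a repository
        return "unknown"


def collect_run_context(script_path):
    """Gather source and environment facts as a flat dict of strings."""
    return {
        "git_commit": git_commit(),
        "script_sha256": file_sha256(script_path),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
    }
```

Inside an active run, a dict like this could be attached with `mlflow.set_tags(collect_run_context(__file__))`, so that every run carries its own audit trail.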

MLflow Components

MLflow Tracking ──▶ Log params, metrics, artifacts (runs)
MLflow Registry ──▶ Staging → Production transitions
MLflow Models ────▶ Standard model packaging format

Lightweight setup: MLflow with SQLite backend (no extra infrastructure).
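
One practical detail of the SQLite setup: a URI such as `sqlite:///mlruns.db` is resolved relative to the current working directory, so launching the script from a different folder quietly creates a second, empty database. A minimal sketch of a helper that pins the database to an absolute path follows; `sqlite_tracking_uri` is a hypothetical name, not an MLflow function.

```python
from pathlib import Path


def sqlite_tracking_uri(db_path):
    """Build a SQLite tracking URI that points at an absolute file path."""
    absolute = Path(db_path).expanduser().resolve()
    # Three slashes introduce the path; an absolute POSIX path begins with
    # its own slash, which is why such URIs show four in a row.
    return f"sqlite:///{absolute.as_posix()}"
```

The result can be passed to `mlflow.set_tracking_uri(...)` and to `mlflow ui --backend-store-uri ...` so that both always refer to the same file.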

Workflow

1. Run training → MLflow logs params + metrics + model weights
2. Compare runs in UI → pick best validation loss
3. Register model → promote to the "Production" stage (newer MLflow releases favor aliases over stages)
4. Deploy from registry → consistent artifact path
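
Step 2 of the workflow above can be sketched without MLflow at all, which makes the selection rule explicit: keep only runs that actually reported the metric, then take the lowest value. The run records below are hypothetical; the IDs and numbers are made up for illustration.

```python
def best_run(runs, metric="val_loss"):
    """Return the run whose recorded value of `metric` is lowest."""
    # Runs that crashed before validation have no value and are skipped.
    scored = [r for r in runs if metric in r["metrics"]]
    if not scored:
        raise ValueError(f"no run reports metric {metric!r}")
    return min(scored, key=lambda r: r["metrics"][metric])


# Hypothetical run records (not real results).
runs = [
    {"run_id": "a1", "metrics": {"val_loss": 0.71, "accuracy": 0.88}},
    {"run_id": "b2", "metrics": {"val_loss": 0.64, "accuracy": 0.90}},
    {"run_id": "c3", "metrics": {"accuracy": 0.93}},  # no validation logged
]

winner = best_run(runs)
print(winner["run_id"])  # b2
```

With MLflow itself, a comparable query would be `mlflow.search_runs(experiment_names=[...], order_by=["metrics.val_loss ASC"], max_results=1)`, which compares the last value logged for the metric in each run. Step 3 would then call `mlflow.register_model(...)` with a `runs:/<run_id>/model` URI, assuming the model was logged under the artifact path `model`.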

Hands-on (15 min)

Integrate MLflow with a Training Script

pip install mlflow

#!/usr/bin/env python3
"""mlflow-tracking.py: experiment tracking with MLflow."""
import json
import os
import random
import time

import mlflow

# Stub: Ayva will expand with:
# - Real model training logged to MLflow
# - Hyperparameter sweeps (GridSearch / Optuna)
# - Model registry: staging → production promotion
# - Artifact logging (model weights, tokenizer, config)
# - MLflow UI setup (mlflow server)
# - Compare runs and select best
# - Integration with the fault-tolerant training from Day 12

# Set tracking URI (SQLite); run metadata is stored in ./mlruns.db
mlflow.set_tracking_uri("sqlite:///mlruns.db")
mlflow.set_experiment("ai-system-design")

def train_with_tracking():
    with mlflow.start_run(run_name=f"run-{int(time.time())}") as run:
        # Log parameters
        params = {
            "learning_rate": 3e-5,
            "batch_size": 16,
            "epochs": 5,
            "model_name": "qwen2.5-3b",
            "lora_rank": 8,
            "lora_alpha": 16,
            "dataset": "code-alpaca-5k",
        }
        mlflow.log_params(params)
        print(f"Logged params: {params}")

        # Simulate training and log metrics once per epoch.
        # The loop length comes from the logged "epochs" param so the two cannot drift apart.
        for epoch in range(params["epochs"]):
            train_loss = max(0.5, 2.0 / (epoch + 1) + random.uniform(-0.1, 0.1))
            val_loss = max(0.6, 2.2 / (epoch + 1) + random.uniform(-0.1, 0.1))
            accuracy = min(0.95, 0.5 + epoch * 0.09 + random.uniform(-0.02, 0.02))

            mlflow.log_metrics({
                "train_loss": train_loss,
                "val_loss": val_loss,
                "accuracy": accuracy,
            }, step=epoch)
            print(f"  epoch {epoch}: train={train_loss:.4f}, val={val_loss:.4f}, acc={accuracy:.3f}")
            time.sleep(0.2)

        # Log a dummy artifact (real: model weights)
        artifact_path = "./artifacts"
        os.makedirs(artifact_path, exist_ok=True)
        with open(os.path.join(artifact_path, "config.json"), "w") as f:
            json.dump(params, f)
        mlflow.log_artifact(artifact_path)
        print(f"Logged artifacts from {artifact_path}")

        # Report the run ID (this stub does not register a model yet)
        run_id = run.info.run_id
        print(f"\n✅ Run complete! run_id: {run_id}")
        print("   View: mlflow ui --backend-store-uri sqlite:///mlruns.db")

        return run_id

if __name__ == "__main__":
    train_with_tracking()

    # View runs
    print("\n📊 Recent runs:")
    runs = mlflow.search_runs(experiment_names=["ai-system-design"])
    for _, run in runs.iterrows():
        print(f"   {run['run_id'][:8]}... acc={run.get('metrics.accuracy', 'N/A')}")

View the MLflow UI:

mlflow ui --backend-store-uri sqlite:///mlruns.db --port 5002

Questions for Ayva:
- How to integrate MLflow with distributed training (autologging)?
- What's the best practice for model registry promotion workflow?
- When should you use MLflow vs W&B vs Neptune?

Key Takeaways

Experiment tracking is non-negotiable for reproducible ML — log everything
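MLflow with a SQLite backend needs no extra infrastructure and is enough for solo or small-team work on one machine; concurrent multi-user setups usually call for a server-backed database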
Track: params, metrics, artifacts, source code, environment
Model registry (staging → prod transitions) enables structured deployment

🧠 AI System Design