🧠 AI System Design

Day 13: Experiment Tracking

📂 Data & Training · 📖 15 min read · Needs expansion

Learning Objectives

  • Understand why ad-hoc experiment tracking leads to irreproducible results
  • Learn the MLflow workflow: log params, metrics, artifacts, model registry
  • Integrate MLflow (SQLite backend) with a training script

Theory (15 min)

The Reproducibility Crisis

Most ML projects fail at reproduction:

  • "Which hyperparameters produced that 92% accuracy run?"
  • "What data was used for this model?"
  • "Where is the training script version that generated this?"

A single source of truth for experiments fixes this.

What to Track

Type        | Examples                                 | Why
------------|------------------------------------------|----------------------
Parameters  | lr=3e-5, batch_size=16, model=bert-base  | Reproduce training
Metrics     | loss, accuracy, tokens/sec, GPU mem      | Compare runs
Artifacts   | model.pt, tokenizer, config.json         | Deploy from any run
Source      | git commit, diff, script hash            | Audit trail
Environment | Python version, CUDA version, GPU type   | Debug hardware issues
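
The Source and Environment rows are the easiest to forget, because nothing in a training loop produces them on its own. A minimal sketch of collecting them with the standard library follows. The function names `git_commit` and `collect_run_context` are illustrative and not part of MLflow, and CUDA version and GPU type are left out because reading them depends on the framework in use.

```python
import platform
import subprocess
import sys


def git_commit(default="unknown"):
    """Return the current git commit hash, or `default` outside a repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or the working directory is not a repository
        return default


def collect_run_context():
    """Gather source and environment facts worth attaching to a run."""
    return {
        "git_commit": git_commit(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "argv": " ".join(sys.argv),
    }
```

Inside an active MLflow run, the resulting dictionary could be attached with `mlflow.set_tags(collect_run_context())`.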

MLflow Components

MLflow Tracking ──▶ Log params, metrics, artifacts (runs)
MLflow Registry ──▶ Staging → Production transitions
MLflow Models ────▶ Standard model packaging format

Lightweight setup: MLflow with SQLite backend (no extra infrastructure).

Workflow

1. Run training → MLflow logs params + metrics + model weights
2. Compare runs in UI → pick best validation loss
3. Register model → "prod" stage
4. Deploy from registry → consistent artifact path
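
Steps 2 and 3 can also be done in code rather than in the UI. The sketch below is one possible shape, under stated assumptions: the helper names are hypothetical, and `register_best` assumes every run logged a model under the artifact path `model`, which a training script has to do explicitly.

```python
def pick_best_run(runs, metric="val_loss"):
    """Return the run with the lowest `metric`; runs that never logged it are skipped."""
    scored = [r for r in runs if metric in r["metrics"]]
    if not scored:
        raise ValueError(f"no run reports {metric!r}")
    return min(scored, key=lambda r: r["metrics"][metric])


def register_best(experiment_name, model_name, metric="val_loss"):
    """Find the best run in an MLflow experiment and register its model.

    Assumption: each run logged a model under the artifact path "model".
    """
    import mlflow  # imported lazily so pick_best_run works without MLflow installed

    found = mlflow.search_runs(
        experiment_names=[experiment_name], output_format="list"
    )
    # Reduce each Run object to the two fields the selection rule needs
    runs = [{"run_id": r.info.run_id, "metrics": r.data.metrics} for r in found]
    best = pick_best_run(runs, metric)
    # "runs:/<run_id>/<artifact_path>" is MLflow's URI scheme for run artifacts
    return mlflow.register_model(f"runs:/{best['run_id']}/model", model_name)
```

After `mlflow.set_tracking_uri(...)` has been called, `register_best("ai-system-design", "code-assistant")` would return the new model version; `"code-assistant"` is a made-up registry name. Newer MLflow releases mark registry stages as deprecated in favour of model version aliases, so the promotion step itself may differ by version.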

Hands-on (15 min)

Integrate MLflow with a Training Script

pip install mlflow

#!/usr/bin/env python3
"""mlflow-tracking.py — experiment tracking with MLflow."""
import json
import random
import time

import mlflow

# Stub — Ayva will expand with:
# - Real model training logged to MLflow
# - Hyperparameter sweeps (GridSearch / Optuna)
# - Model registry: staging → production promotion
# - Artifact logging (model weights, tokenizer, config)
# - MLflow UI setup (mlflow server)
# - Compare runs and select best
# - Integration with the fault-tolerant training from Day 12

# Set tracking URI (SQLite)
mlflow.set_tracking_uri("sqlite:///mlruns.db")
mlflow.set_experiment("ai-system-design")

def train_with_tracking():
    with mlflow.start_run(run_name=f"run-{int(time.time())}"):
        # Log parameters
        params = {
            "learning_rate": 3e-5,
            "batch_size": 16,
            "epochs": 5,
            "model_name": "qwen2.5-3b",
            "lora_rank": 8,
            "lora_alpha": 16,
            "dataset": "code-alpaca-5k",
        }
        mlflow.log_params(params)
        print(f"Logged params: {params}")

        # Simulate training and log metrics once per epoch
        for epoch in range(params["epochs"]):
            train_loss = max(0.5, 2.0 / (epoch + 1) + random.uniform(-0.1, 0.1))
            val_loss = max(0.6, 2.2 / (epoch + 1) + random.uniform(-0.1, 0.1))
            accuracy = min(0.95, 0.5 + epoch * 0.09 + random.uniform(-0.02, 0.02))

            mlflow.log_metrics({
                "train_loss": train_loss,
                "val_loss": val_loss,
                "accuracy": accuracy,
            }, step=epoch)
            print(f"  epoch {epoch}: train={train_loss:.4f}, val={val_loss:.4f}, acc={accuracy:.3f}")
            time.sleep(0.2)

        # Log a small artifact (a real run would also log model weights).
        # log_dict serializes the dict to JSON and stores it under the run.
        artifact_file = "artifacts/config.json"
        mlflow.log_dict(params, artifact_file)
        print(f"Logged artifact: {artifact_file}")

        # Grab the run ID so the run can be looked up (or registered) later
        run_id = mlflow.active_run().info.run_id
        print(f"\n✅ Run complete! run_id: {run_id}")
        print("   View: mlflow ui --backend-store-uri sqlite:///mlruns.db --port 5002")

        return run_id

if __name__ == "__main__":
    train_with_tracking()

    # View runs
    print("\n📊 Recent runs:")
    runs = mlflow.search_runs(experiment_names=["ai-system-design"])
    for _, run in runs.iterrows():
        print(f"   {run['run_id'][:8]}... acc={run.get('metrics.accuracy', 'N/A')}")

View the MLflow UI:

mlflow ui --backend-store-uri sqlite:///mlruns.db --port 5002
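
The comparison view is only useful once an experiment holds more than one run. As a rough sketch of producing several, the block below expands a small grid and logs one run per combination. `expand_grid`, `simulated_val_loss`, and `run_sweep` are hypothetical names, and the loss function is a deterministic toy stand-in, not real training.

```python
import itertools


def expand_grid(grid):
    """Yield one params dict per combination of the values in `grid`."""
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def simulated_val_loss(params):
    """Toy score standing in for training; lowest at lr=3e-5 and large batches."""
    return abs(params["learning_rate"] - 3e-5) * 1e4 + 1.0 / params["batch_size"]


def run_sweep(grid, experiment_name="ai-system-design"):
    """Log one MLflow run per grid point and return the run IDs."""
    import mlflow  # imported lazily so expand_grid works without MLflow installed

    mlflow.set_tracking_uri("sqlite:///mlruns.db")
    mlflow.set_experiment(experiment_name)
    run_ids = []
    for params in expand_grid(grid):
        with mlflow.start_run() as run:
            mlflow.log_params(params)
            mlflow.log_metric("val_loss", simulated_val_loss(params))
            run_ids.append(run.info.run_id)
    return run_ids
```

Calling `run_sweep({"learning_rate": [1e-5, 3e-5], "batch_size": [8, 16]})` would create four runs that can then be sorted by `val_loss` in the UI.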

Questions for Ayva:

  • How to integrate MLflow with distributed training (autologging)?
  • What's the best practice for model registry promotion workflow?
  • When should you use MLflow vs W&B vs Neptune?


Key Takeaways

  • Experiment tracking is non-negotiable for reproducible ML — log everything
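  • MLflow with a SQLite backend needs no extra infrastructure and is enough for a single machine or a small team; heavy concurrent use usually calls for a tracking server with a database server behind it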
  • Track: params, metrics, artifacts, source code, environment
  • Model registry (staging → prod transitions) enables structured deployment

References