🧠 AI System Design Architect
A 30-day curriculum at 30 minutes per day. Each day pairs 15 minutes of theory with 15 minutes of hands-on work, designed to run on your own infrastructure.
Week 1: Foundations
7 days
Core architectural patterns: the vocabulary of AI systems
Day 1
ML System Blueprint
The three pillars: compute, storage, orchestration
Day 2
Sync vs Async Inference
Request-response, batch, streaming — tradeoffs in latency, throughput, cost
Day 3
Caching Strategies
Semantic caching, KV-cache reuse, prompt caching
Day 4
Load Balancing & Routing
Round-robin, least-connections, semantic routers
Day 5
Stateless vs Stateful Inference
KV cache, conversation history, managing state at scale
Day 6
Rate Limiting & Quotas
Token bucket, sliding window, user-tier enforcement
Day 7
Mini-Project — AI Gateway
Containerise your proxy, cache, and rate limiter into a unified gateway
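As a warm-up for the gateway, the token bucket from Day 6 can be sketched in a few lines. This is a minimal illustration, not a production limiter: the class name `TokenBucket`, the `allow` method, and the injectable `clock` parameter are all hypothetical names chosen for this example, and the sketch is single-process and not thread-safe.

```python
import time


class TokenBucket:
    """Hypothetical token-bucket limiter.

    Holds at most `capacity` tokens and refills at `rate` tokens per second.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock            # injectable so tests can fake time
        self.tokens = capacity        # start with a full bucket
        self.last = clock()           # time of the last refill

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if enough are available."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

In a gateway, one bucket per user or API key is a common arrangement, with `rate` and `capacity` chosen per user tier; passing the request's token count as `cost` is one way to limit by LLM tokens rather than by request count.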
Week 2: Data & Training
7 days
Pipelines, training infra, experiment tracking
Day 8
Data Pipelines
Extract, transform, embed, store — batch vs streaming
Day 9
Vector Databases
Index types (IVF, HNSW), tradeoffs, hybrid search
Day 10
RAG Architecture
Ingestion pipeline, retriever, reranker, generator
Day 11
Distributed Training 101
Data parallelism, model parallelism, pipeline parallelism
Day 12
Checkpointing & Fault Tolerance
Save/restore training state, preemption handling, spot instances
Day 13
Experiment Tracking
Structured logging, hyperparameter sweeps, model registry
Day 14
Mini-Project — End-to-End RAG
Integrate pipeline + vector DB + LLM into a complete RAG system
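The retrieve-then-generate flow behind the RAG mini-project can be sketched without any external services. This is a toy under stated assumptions: the function names `cosine`, `retrieve`, and `build_prompt` are hypothetical, the index is a plain list of `(text, vector)` pairs scanned by brute force rather than an IVF or HNSW index, and the embedding vectors are assumed to come from whatever embedding model you run on Day 8.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, index, k=2):
    """Return the texts of the k entries most similar to query_vec.

    `index` is a list of (text, vector) pairs; this is an exhaustive scan,
    standing in for a real vector database lookup.
    """
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, passages):
    """Assemble retrieved passages and the question into one prompt string."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

The string returned by `build_prompt` is what would be sent to the generator; a reranker, if used, would sit between `retrieve` and `build_prompt`.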
Week 3: Serving & Inference
7 days
Optimising models in production
Day 15
Inference Optimization
Quantization (GGUF, GPTQ, AWQ), throughput, quality tradeoffs
Day 16
Continuous Batching & Speculative Decoding
How vLLM/TGI achieve high throughput
Day 17
Prefill vs Decode
The two phases of transformer inference
Day 18
GPU vs CPU Offloading
Layer placement, PCIe bottlenecks, memory hierarchy
Day 19
Streaming & SSE
Token streaming fan-out, head-of-line blocking prevention
Day 20
Model Adapters & LoRA
Adapter swapping, multi-task serving, parameter-efficient fine-tuning
Day 21
Mini-Project — Inference Benchmark Suite
Script that sweeps parameters and produces a comparison table
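One possible skeleton for the benchmark script is shown below. It is a sketch, not a prescribed design: `sweep` and `format_table` are hypothetical helper names, `run` stands in for whatever function calls your inference server, and the parameter names in the grid are up to you. It records only the best wall-clock time per configuration; a fuller suite would also track throughput and output quality.

```python
import itertools
import time


def sweep(run, grid, repeats=3):
    """Call run(**params) for every combination in `grid` and time it.

    `grid` maps parameter names to lists of values. Returns one row (dict)
    per combination, with the best wall-clock time stored under "best_s".
    """
    names = list(grid)
    rows = []
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run(**params)
            timings.append(time.perf_counter() - start)
        rows.append({**params, "best_s": min(timings)})
    return rows


def format_table(rows):
    """Render rows as a pipe-separated comparison table with a header line."""
    headers = list(rows[0])
    lines = [" | ".join(headers)]
    for row in rows:
        cells = [
            f"{row[h]:.6f}" if isinstance(row[h], float) else str(row[h])
            for h in headers
        ]
        lines.append(" | ".join(cells))
    return "\n".join(lines)
```

A call such as `print(format_table(sweep(my_benchmark, {"batch": [1, 8], "quant": ["q4", "q8"]})))` would print one line per combination, where `my_benchmark` is a placeholder for your own function.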
Week 4: Production & Case Studies
9 days
Observability, safety, real-world systems
Day 22
Observability
Metrics, traces, logs — the three pillars for AI systems
Day 23
Guardrails & Safety
Input/output filtering, PII detection, prompt injection defense
Day 24
A/B Testing & Canary Deployments
Shadow traffic, gradual rollout, automated rollback
Day 25
Case Study: ChatGPT
The infrastructure behind a global chat product
Day 26
Case Study: Perplexity
Real-time web search + RAG at scale
Day 27
Case Study: GitHub Copilot
Context window management, code-specific embeddings, fast completion
Day 28
Cost Engineering
Token economics, cache hit rates, model selection by task difficulty
Day 29
Scaling Law Intuition
How data and compute affect system cost: hardware vs optimisation
Day 30
Final Project: Design a Production AI System
End-to-end architecture design, from concept to deployment