AI System Design — 30-Day Portal

🧠 AI System Design Architect

A 30-day curriculum at 30 minutes per day. Each day pairs 15 minutes of theory with 15 minutes of hands-on work, designed to run on your own infrastructure.

📆 30 days · ⏱ 30 min/day · 7 ready · 23 to expand

Week 1: Foundations

7 days

Core architectural patterns — the vocabulary of AI systems

Day 1

Week 2: Data & Training

7 days

Pipelines, training infra, experiment tracking

Day 8

Data Pipelines

Extract, transform, embed, store — batch vs streaming

Stub

Day 9

Vector Databases

Index types (IVF, HNSW), tradeoffs, hybrid search

Stub

Day 10

RAG Architecture

Ingestion pipeline, retriever, reranker, generator

Stub

Day 11

Distributed Training 101

Data parallelism, model parallelism, pipeline parallelism

Stub

Day 12

Checkpointing & Fault Tolerance

Save/restore training state, preemption handling, spot instances

Stub

Day 13

Experiment Tracking

Structured logging, hyperparameter sweeps, model registry

Stub

Day 14

Mini-Project — End-to-End RAG

Integrate pipeline + vector DB + LLM into a complete RAG system

Stub

Week 3: Serving & Inference

7 days

Optimizing models in production

Day 15

Inference Optimization

Quantization (GGUF, GPTQ, AWQ), throughput, quality tradeoffs

Stub

Day 16

Continuous Batching & Speculative Decoding

How vLLM/TGI achieve high throughput

Stub

Day 17

Prefill vs Decode

The two phases of transformer inference

Stub

Day 18

GPU vs CPU Offloading

Layer placement, PCIe bottlenecks, memory hierarchy

Stub

Day 19

Streaming & SSE

Token streaming fan-out, head-of-line blocking prevention

Stub

Day 20

Model Adapters & LoRA

Adapter swapping, multi-task serving, parameter-efficient fine-tuning

Stub

Day 21

Mini-Project — Inference Benchmark Suite

Script that sweeps parameters and produces a comparison table

Stub

Week 4: Production & Case Studies

9 days

Observability, safety, real-world systems

Day 22

Observability

Metrics, traces, logs — the three pillars for AI systems

Stub

Day 23

Guardrails & Safety

Input/output filtering, PII detection, prompt injection defense

Stub

Day 24

A/B Testing & Canary Deployments

Shadow traffic, gradual rollout, automated rollback

Stub

Day 25

Case Study: ChatGPT

The infrastructure behind a global chat product

Stub

Day 26

Case Study: Perplexity

Real-time web search + RAG at scale

Stub

Day 27

Case Study: GitHub Copilot

Context window management, code-specific embeddings, fast completion

Stub

Day 28

Cost Engineering

Token economics, cache hit rates, model selection by task difficulty

Stub

Day 29

Scaling Law Intuition

How data and compute affect system cost: hardware vs optimization

Stub

Day 30

Final Project: Design a Production AI System

End-to-end architecture design, from concept to deployment

Stub

Week 1: Foundations

ML System Blueprint

Sync vs Async Inference

Caching Strategies

Load Balancing & Routing

Stateless vs Stateful Inference

Rate Limiting & Quotas

Mini-Project — AI Gateway
