Day 30: Final Project: Design a Production AI System
Learning Objectives
- Synthesise everything from Days 1-29 into a coherent system design
- Write a one-page architecture document for a production AI system
- Present and defend your design decisions
Theory (15 min)
What You Should Know After 30 Days
By now you understand:
- How models serve inference (prefill, decode, batching, streaming)
- How to optimise (quantisation, caching, offloading)
- How to scale (horizontal, load balancing, canary deployments)
- How to observe (metrics, logs, traces)
- How to protect (guardrails, rate limiting, safety)
- How to manage (state, adapters, cost)
- How real systems do it (ChatGPT, Perplexity, Copilot)
The Final Project
Design a production AI system from scratch.
Pick one problem from below (or your own idea):
- Personal AI Research Assistant: ingests your Obsidian vault plus web search, answers questions with citations
- Multi-Model API Gateway: routes each request to the appropriate model (1.5B/3B/7B/API) based on task difficulty, with caching, rate limiting, and cost tracking
- Local Code Copilot: code completion via FIM (fill-in-the-middle), context-aware file selection, multi-language support
- Document Q&A Service: upload PDFs/markdown, get answers with citations from your documents
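To make the gateway option more concrete, here is a minimal sketch of difficulty-based routing with a response cache and cost tracking. The keyword heuristic, the tier names, and the per-token prices are all hypothetical placeholders (not measured values), and the inference call itself is stubbed out; a real router would more likely use a classifier or a confidence signal than prompt length.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical tiers and prices (USD per 1K tokens), for illustration only.
TIERS = {
    "small": {"model": "1.5B", "usd_per_1k_tokens": 0.0},
    "medium": {"model": "3B", "usd_per_1k_tokens": 0.0},
    "large": {"model": "7B", "usd_per_1k_tokens": 0.0},
    "api": {"model": "external-api", "usd_per_1k_tokens": 0.01},
}

# Assumed signal words for "hard" tasks; purely a toy heuristic.
HARD_WORDS = {"prove", "derive", "refactor", "architecture"}


def estimate_difficulty(prompt: str) -> str:
    """Pick a tier from keywords first, then from prompt length in words."""
    words = prompt.lower().split()
    if any(word in HARD_WORDS for word in words):
        return "api"
    if len(words) > 200:
        return "large"
    if len(words) > 40:
        return "medium"
    return "small"


@dataclass
class Gateway:
    cache: dict = field(default_factory=dict)
    spend_usd: float = 0.0

    def handle(self, prompt: str) -> dict:
        # Exact-match cache keyed on a hash of the prompt text.
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self.cache:
            return {**self.cache[key], "cached": True}

        tier = estimate_difficulty(prompt)
        # Stub: a real gateway would call the model here.
        # Token count is approximated by word count.
        tokens = len(prompt.split())
        cost = tokens / 1000 * TIERS[tier]["usd_per_1k_tokens"]
        self.spend_usd += cost

        result = {
            "tier": tier,
            "model": TIERS[tier]["model"],
            "cost_usd": cost,
            "cached": False,
        }
        self.cache[key] = result
        return result


gateway = Gateway()
print(gateway.handle("What is a KV cache?"))
print(gateway.handle("What is a KV cache?"))  # second call is served from cache
```

An exact-match cache only helps with repeated prompts; whether that is enough, or whether you need semantic caching, is one of the design decisions your document should justify.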
Architecture Document Format
Title: [System Name]
Date: [Today]
1. Overview (3 sentences)
What does this system do? Who uses it? What's the key design goal?
2. Architecture Diagram (ASCII or Mermaid)
Client → [Component] → [Component] → Model → Response
3. Component Breakdown
- Component 1: responsibility, technology, why this choice
- Component 2: ...
...
4. Key Design Decisions (3-5 decisions with rationale)
- Why this model? Why this cache strategy? Why this routing?
5. Data Flow (step by step)
1. User types query → 2. Gateway receives → 3. Rate limiter checks → ...
6. Scaling Plan
- At 10 users, at 100 users, at 10,000 users
- Where does it break? What's the next upgrade?
7. Cost Estimate
- Hardware, tokens, electricity per month
- Cost-per-query at each scale tier
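The scaling and cost sections mostly come down to simple arithmetic. The following is a minimal sketch, assuming a flat monthly cost, a 30-day month, and traffic spread evenly over the day (real traffic is bursty, so peak load will be higher than the average shown). The queries-per-day and dollar figures in the loop are placeholders to replace with your own estimates.

```python
SECONDS_PER_DAY = 86_400


def average_qps(queries_per_day: float) -> float:
    """Average queries per second, assuming evenly spread traffic."""
    return queries_per_day / SECONDS_PER_DAY


def cost_per_query(
    monthly_cost_usd: float,
    queries_per_day: float,
    days_per_month: int = 30,
) -> float:
    """Flat monthly cost divided by the number of queries served in a month."""
    if queries_per_day <= 0:
        raise ValueError("queries_per_day must be positive")
    return monthly_cost_usd / (queries_per_day * days_per_month)


# (label, queries per day, monthly cost in USD): all placeholder numbers.
tiers = [
    ("10 users", 50_000, 40.0),
    ("100 users", 500_000, 200.0),
    ("10,000 users", 50_000_000, 8_000.0),
]

for label, queries_per_day, monthly_cost in tiers:
    qps = average_qps(queries_per_day)
    cpq = cost_per_query(monthly_cost, queries_per_day)
    print(f"{label:>13}: {qps:8.2f} avg QPS, ${cpq:.6f} per query")
```

Comparing cost per query across tiers shows whether your design gets cheaper per request as it grows or whether some component's cost scales linearly with traffic.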
Hands-on (15 min)
Write Your Architecture Document
#!/usr/bin/env python3
"""design-doc-generator.py: framework for the final project design document."""
# Stub: Ayva will expand with:
# - Multiple design examples (Research Assistant, API Gateway, Code Copilot)
# - Architecture diagram generation (Mermaid.js)
# - Tradeoff analysis for common decisions
# - Review checklist for design documents
# - Real-world architecture reviews (compare your decisions to known systems)
SYSTEM_SKELETON = """
# ARCHITECTURE DOCUMENT
## System: {system_name}
## Author: Vijay
## Date: {date}
---
## 1. Overview
{overview}
## 2. Architecture Diagram
```mermaid
graph LR
Client -->|HTTP| Gateway
Gateway --> RateLimiter
RateLimiter --> Router
Router -->|simple| SmallModel
Router -->|complex| BigModel
Router -->|expert| ExternalAPI
SmallModel --> Response
BigModel --> Response
ExternalAPI --> Response
Gateway --> Cache
Cache --> Response
Response --> Client
```
## 3. Component Breakdown
### Gateway
- Technology: {gateway_tech}
- Responsibility: Single entry point, auth, logging
- Why: {gateway_rationale}
### Rate Limiter
- Algorithm: {rate_algorithm}
- Tiers: {rate_tiers}
- Why: {rate_rationale}
### Model Router
- Strategy: {router_strategy}
- Models: {router_models}
- Why: {router_rationale}
### Cache
- Type: {cache_type}
- TTL: {cache_ttl}
- Why: {cache_rationale}
### Inference
- Engine: {inference_engine}
- Quantisation: {inference_quant}
- Batch mode: {inference_batch}
## 4. Key Design Decisions
| Decision | Option A | Option B | Chosen | Rationale |
|---|---|---|---|---|
| {d1_name} | {d1_a} | {d1_b} | {d1_chosen} | {d1_rationale} |
| {d2_name} | {d2_a} | {d2_b} | {d2_chosen} | {d2_rationale} |
## 5. Data Flow
- {step_1}
- {step_2}
- {step_3}
- {step_4}
- {step_5}
- {step_6}
## 6. Scaling Plan
| Scale | Users | Queries/Day | Architecture | Cost/Month |
|---|---|---|---|---|
| Dev | 1 | 5,000 | Single server | ${cost_dev} |
| Team | 10 | 50,000 | + cache layer | ${cost_team} |
| Org | 100 | 500,000 | + load balancer | ${cost_org} |
| Prod | 10,000 | 50M | Kubernetes cluster | ${cost_prod} |

Bottleneck at each stage: {bottleneck_analysis}
## 7. Cost Estimate
- Server: {server_cost}
- Electricity: {power_cost}
- API calls (if any): {api_cost}
- Storage: {storage_cost}
- Total: {total_cost}
- Cost per query: {cpq}
"""
print("Final Project: Architecture Design Document\n")
print("Pick one of these scenarios and write a 1-page architecture doc:\n")

# Each scenario: (name, description, course days to revisit)
scenarios = [
    (
        "Personal AI Research Assistant",
        "Ingest Obsidian vault (2K files) + web search. Answer questions with citations.",
        "Day 10 (RAG) + Day 26 (Perplexity)",
    ),
    (
        "Multi-Model API Gateway",
        "Route requests to 1.5B/3B/7B/API based on task difficulty. Cache + rate limit + cost track.",
        "Day 4 (Router) + Day 6 (Rate Limiter) + Day 7 (Gateway) + Day 28 (Cost)",
    ),
    (
        "Local Code Copilot",
        "FIM code completion via llama.cpp. Context-aware file selection.",
        "Day 27 (Copilot) + Day 5 (Context)",
    ),
    (
        "Document Q&A Service",
        "Upload PDF/markdown -> RAG pipeline -> answers with source citations.",
        "Day 8-10 (RAG) + Day 14 (Mini-project)",
    ),
]

for name, desc, refs in scenarios:
    print(f"  {name}")
    print(f"    {desc}")
    print(f"    References: {refs}")
    print()

print("Use the skeleton above to structure your design document.")
print("Save to: /opt/data/ai-system-design-final-doc.md")
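One way to use a template like `SYSTEM_SKELETON` is to fill it with `str.format_map` and a dict subclass that leaves unfilled fields visible as TODO markers, so a half-written document still renders instead of raising `KeyError`. This is a minimal sketch using a shortened stand-in template; `TodoDict`, `MINI_SKELETON`, and `render` are hypothetical names and not part of the script above.

```python
from datetime import date


class TodoDict(dict):
    """Dict that returns a visible marker for any field not filled in yet."""

    def __missing__(self, key):
        return f"TODO({key})"


# Shortened stand-in for the full skeleton, using the same placeholder style.
MINI_SKELETON = """# ARCHITECTURE DOCUMENT
## System: {system_name}
## Date: {date}
## 1. Overview
{overview}
## 6. Scaling Plan
| Dev | 1 | 5,000 | Single server | ${cost_dev} |
"""


def render(template: str, **fields) -> str:
    """Fill known fields and mark the rest as TODO."""
    return template.format_map(TodoDict(fields))


doc = render(
    MINI_SKELETON,
    system_name="Multi-Model API Gateway",
    date=date.today().isoformat(),
)
print(doc)
```

Searching the rendered file for `TODO(` then gives a quick list of sections still to write.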
### Deliverable
Save your design doc to a file:
```bash
touch /opt/data/ai-system-design-final-doc.md
echo "# My AI System Design: Final Project" > /opt/data/ai-system-design-final-doc.md
```
Then review: Open the doc you built on Day 1 (your stack map). Compare it to this final design. How far you've come is the real metric.
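Before calling the document finished, a quick structural check can help. The following is a minimal sketch that reports which of the seven sections from the Architecture Document Format are missing from a markdown file. The function name and the heading-matching rule (markdown `#` headings, optional numeric prefix) are assumptions, and it checks structure only, not the quality of your reasoning.

```python
import re

REQUIRED_SECTIONS = [
    "Overview",
    "Architecture Diagram",
    "Component Breakdown",
    "Key Design Decisions",
    "Data Flow",
    "Scaling Plan",
    "Cost Estimate",
]


def missing_sections(markdown_text: str) -> list[str]:
    """Return the required section names with no matching heading."""
    titles = []
    for line in markdown_text.splitlines():
        if line.startswith("#"):
            heading = line.lstrip("#").strip()
            # Drop a leading "3. " style number before comparing.
            heading = re.sub(r"^\d+\.\s*", "", heading)
            titles.append(heading.lower())
    return [
        section
        for section in REQUIRED_SECTIONS
        if not any(title.startswith(section.lower()) for title in titles)
    ]


sample = "# My Design\n## 1. Overview\nText.\n## 2. Architecture Diagram\n## 5. Data Flow\n"
print(missing_sections(sample))
```

To check your own file, read it with `pathlib.Path(...).read_text()` and pass the text to `missing_sections`.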
Key Takeaways
- You now understand every layer of a production AI system, from inference internals to cost economics
- The patterns are universal: the same building blocks (gateway, routing, caching, inference, observability) show up in ChatGPT, Perplexity, Copilot, and your VPS
- The skill is knowing which tradeoff to make, not knowing "the right answer"
- Keep the design doc: it's a portfolio piece showing you can architect AI systems