Day 30: Final Project: Design a Production AI System


Learning Objectives

Synthesise everything from Days 1-29 into a coherent system design
Write a one-page architecture document for a production AI system
Present and defend your design decisions

Theory (15 min)

What You Should Know After 30 Days

By now you understand:

How models serve inference (prefill, decode, batching, streaming)
How to optimise (quantisation, caching, offloading)
How to scale (horizontal, load balancing, canary deployments)
How to observe (metrics, logs, traces)
How to protect (guardrails, rate limiting, safety)
How to manage (state, adapters, cost)
How real systems do it (ChatGPT, Perplexity, Copilot)

The Final Project

Design a production AI system from scratch.

Pick one problem from the list below (or bring your own idea):

Personal AI Research Assistant — ingests your Obsidian vault + web search, answers questions with citations
Multi-Model API Gateway — routes each request to the appropriate model (1.5B/3B/7B/API) based on task difficulty, with caching, rate limiting, and cost tracking
Local Code Copilot — fill-in-the-middle (FIM) code completion, context-aware file selection, multi-language support
Document Q&A Service — upload PDFs/markdown, get answers with citations from your documents
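To make "routes requests based on task difficulty" concrete for the Multi-Model API Gateway option, here is a minimal sketch of a heuristic router. Everything in it is an assumption for illustration: the tier names in `MODEL_TIERS`, the phrases in `HARD_HINTS`, and the one-point-per-100-words rule are hypothetical, and a real router could instead use a trained classifier or a small model to judge difficulty.

```python
import re

# Hypothetical model tiers, cheapest first (names are placeholders).
MODEL_TIERS = ["local-1.5b", "local-3b", "local-7b", "external-api"]

# Hypothetical phrases that hint at a harder task; matched as whole words.
HARD_HINTS = ["prove", "refactor", "architecture", "step by step", "compare"]
HINT_PATTERNS = [re.compile(rf"\b{re.escape(hint)}\b") for hint in HARD_HINTS]


def difficulty(prompt: str) -> int:
    """Crude score: one point per hint phrase present, plus one per 100 words."""
    text = prompt.lower()
    hits = sum(1 for pattern in HINT_PATTERNS if pattern.search(text))
    return hits + len(text.split()) // 100


def route(prompt: str) -> str:
    """Map the score to a tier index, clamping at the most capable tier."""
    return MODEL_TIERS[min(difficulty(prompt), len(MODEL_TIERS) - 1)]


for query in [
    "What is 2 + 2?",
    "Compare these two sentences",
    "Refactor this module and compare two architecture options",
]:
    print(f"{route(query):<13} <- {query}")
```

Whole-word matching keeps "improve" from counting as "prove". Caching, rate limiting, and cost tracking would wrap around `route()` in the gateway; they are left out here to keep the sketch short.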

Architecture Document Format

Title: [System Name]
Date: [Today]

1. Overview (3 sentences)
   What does this system do? Who uses it? What's the key design goal?

2. Architecture Diagram (ASCII or Mermaid)
   Client → [Component] → [Component] → Model → Response

3. Component Breakdown
   - Component 1: responsibility, technology, why this choice
   - Component 2: ...
   ...

4. Key Design Decisions (3-5 decisions with rationale)
   - Why this model? Why this cache strategy? Why this routing?

5. Data Flow (step by step)
   1. User types query → 2. Gateway receives → 3. Rate limiter checks → ...

6. Scaling Plan
   - At 10 users, at 100 users, at 10,000 users
   - Where does it break? What's the next upgrade?

7. Cost Estimate
   - Hardware, tokens, electricity per month
   - Cost-per-query at each scale tier
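Sections 6 and 7 come down to simple arithmetic, and writing it out as code makes every assumption visible. A minimal sketch, assuming a 30-day month and continuous power draw: the user and queries-per-day figures mirror the Scaling Plan table in the skeleton script of the hands-on part, while every dollar amount, the 350 W draw, and the $0.15/kWh tariff are made-up placeholders to replace with your own numbers.

```python
from dataclasses import dataclass

DAYS_PER_MONTH = 30                  # simplifying assumption
HOURS_PER_MONTH = 24 * DAYS_PER_MONTH
SECONDS_PER_DAY = 86_400


def electricity_usd_per_month(watts: float, usd_per_kwh: float) -> float:
    """Cost of drawing `watts` continuously for a whole month."""
    return watts / 1000 * HOURS_PER_MONTH * usd_per_kwh


@dataclass(frozen=True)
class ScaleTier:
    name: str
    users: int
    queries_per_day: int
    monthly_cost_usd: float  # server + electricity + API + storage, all in

    @property
    def queries_per_month(self) -> int:
        return self.queries_per_day * DAYS_PER_MONTH

    @property
    def avg_qps(self) -> float:
        return self.queries_per_day / SECONDS_PER_DAY

    @property
    def cost_per_query(self) -> float:
        return self.monthly_cost_usd / self.queries_per_month


# Users and queries/day follow the skeleton's Scaling Plan table;
# the monthly dollar figures are made-up placeholders.
TIERS = [
    ScaleTier("Dev", 1, 5_000, 50.0),
    ScaleTier("Team", 10, 50_000, 150.0),
    ScaleTier("Org", 100, 500_000, 1_200.0),
    ScaleTier("Prod", 10_000, 50_000_000, 60_000.0),
]

# Placeholder hardware: one GPU box drawing 350 W at $0.15/kWh.
print(f"Electricity: ${electricity_usd_per_month(350, 0.15):.2f}/month")
for tier in TIERS:
    print(f"{tier.name:<5} {tier.avg_qps:>7.1f} avg QPS  ${tier.cost_per_query:.6f}/query")
```

Average QPS is only a starting point for "where does it break?": traffic is bursty, so it is worth sizing for a peak above the average and stating the peak multiplier you chose as an explicit assumption in the doc.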

Hands-on (15 min)

Write Your Architecture Document

````python
#!/usr/bin/env python3
"""design-doc-generator.py — framework for the final project design document."""

# Stub — Ayva will expand with:
# - Multiple design examples (Research Assistant, API Gateway, Code Copilot)
# - Architecture diagram generation (Mermaid.js)
# - Tradeoff analysis for common decisions
# - Review checklist for design documents
# - Real-world architecture reviews (compare your decisions to known systems)

# Markdown template for the design doc. Replace each {placeholder} by hand,
# or call SYSTEM_SKELETON.format(...) once you have a value for every field.
SYSTEM_SKELETON = """
# 🏗 ARCHITECTURE DOCUMENT

## System: {system_name}
## Author: Vijay
## Date: {date}

---

## 1. Overview
{overview}

## 2. Architecture Diagram

```mermaid
graph LR
    Client -->|HTTP| Gateway
    Gateway --> RateLimiter
    RateLimiter --> Router
    Router -->|simple| SmallModel
    Router -->|complex| BigModel
    Router -->|expert| ExternalAPI
    SmallModel --> Response
    BigModel --> Response
    ExternalAPI --> Response
    Gateway --> Cache
    Cache --> Response
    Response --> Client
```

## 3. Component Breakdown

### Gateway
- Technology: {gateway_tech}
- Responsibility: Single entry point, auth, logging
- Why: {gateway_rationale}

### Rate Limiter
- Algorithm: {rate_algorithm}
- Tiers: {rate_tiers}
- Why: {rate_rationale}

### Model Router
- Strategy: {router_strategy}
- Models: {router_models}
- Why: {router_rationale}

### Cache
- Type: {cache_type}
- TTL: {cache_ttl}
- Why: {cache_rationale}

### Inference
- Engine: {inference_engine}
- Quantisation: {inference_quant}
- Batch mode: {inference_batch}

## 4. Key Design Decisions

| Decision | Option A | Option B | Chosen | Rationale |
|----------|----------|----------|--------|-----------|
| {d1_name} | {d1_a} | {d1_b} | {d1_chosen} | {d1_rationale} |
| {d2_name} | {d2_a} | {d2_b} | {d2_chosen} | {d2_rationale} |

## 5. Data Flow

1. {step_1}
2. {step_2}
3. {step_3}
4. {step_4}
5. {step_5}
6. {step_6}

## 6. Scaling Plan

| Scale | Users | Queries/Day | Architecture | Cost/Month |
|-------|-------|-------------|--------------|------------|
| Dev | 1 | 5,000 | Single server | ${cost_dev} |
| Team | 10 | 50,000 | + cache layer | ${cost_team} |
| Org | 100 | 500,000 | + load balancer | ${cost_org} |
| Prod | 10,000 | 50M | Kubernetes cluster | ${cost_prod} |

Bottleneck at each stage: {bottleneck_analysis}

## 7. Cost Estimate

- Server: {server_cost}
- Electricity: {power_cost}
- API calls (if any): {api_cost}
- Storage: {storage_cost}
- Total: {total_cost}
- Cost per query: {cpq}
"""

print("📝 Final Project — Architecture Design Document\n")
print("Pick one of these scenarios and write a 1-page architecture doc:\n")

# Each entry: (name, one-line brief, earlier lessons to revisit)
scenarios = [
    ("🧪 Personal AI Research Assistant",
     "Ingest Obsidian vault (2K files) + web search. Answer questions with citations.",
     "Day 10 (RAG) + Day 26 (Perplexity)"),
    ("🚀 Multi-Model API Gateway",
     "Route requests to 1.5B/3B/7B/API based on task difficulty. Cache + rate limit + cost track.",
     "Day 4 (Router) + Day 6 (Rate Limiter) + Day 7 (Gateway) + Day 28 (Cost)"),
    ("💻 Local Code Copilot",
     "FIM code completion via llama.cpp. Context-aware file selection.",
     "Day 27 (Copilot) + Day 5 (Context)"),
    ("📚 Document Q&A Service",
     "Upload PDF/markdown → RAG pipeline → answers with source citations.",
     "Day 8-10 (RAG) + Day 14 (Mini-project)"),
]

for name, desc, refs in scenarios:
    print(f"  {name}")
    print(f"    {desc}")
    print(f"    🎯 References: {refs}")
    print()

print("Use the skeleton above to structure your design document.")
print("Save to: /opt/data/ai-system-design-final-doc.md")
````
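The data flow in the skeleton's diagram (Gateway → Rate Limiter → Router → Model, with a Cache beside the Gateway) can be prototyped in a few dozen lines before you commit to technologies. This is a minimal single-process sketch, not a production gateway: the rate limiter is a per-user token bucket, the cache is an exact-match in-memory TTL cache keyed on the prompt text, the model is a stand-in function, and the class names `TokenBucket`, `TTLCache` and `Gateway` are hypothetical. The clock is injectable so the demo is deterministic.

```python
from __future__ import annotations

import time
from typing import Callable

Clock = Callable[[], float]


class TokenBucket:
    """Token bucket: refills `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: int, clock: Clock = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class TTLCache:
    """Exact-match cache; each entry expires `ttl` seconds after it is stored."""

    def __init__(self, ttl: float, clock: Clock = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> str | None:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # evict lazily on read
            return None
        return value

    def put(self, key: str, value: str) -> None:
        self._store[key] = (self.clock() + self.ttl, value)


class Gateway:
    """Request path: per-user rate limit, then cache lookup, then model call."""

    def __init__(self, model: Callable[[str], str], rate: float, burst: int,
                 ttl: float, clock: Clock = time.monotonic):
        self.model = model
        self.rate, self.burst, self.clock = rate, burst, clock
        self.buckets: dict[str, TokenBucket] = {}
        self.cache = TTLCache(ttl, clock)
        self.model_calls = 0  # crude counter, a hook for cost tracking

    def handle(self, user: str, prompt: str) -> str:
        bucket = self.buckets.get(user)
        if bucket is None:
            bucket = self.buckets[user] = TokenBucket(self.rate, self.burst, self.clock)
        if not bucket.allow():
            return "429 Too Many Requests"
        cached = self.cache.get(prompt)
        if cached is not None:
            return cached
        self.model_calls += 1
        answer = self.model(prompt)
        self.cache.put(prompt, answer)
        return answer


# Demo with a fake clock so the output does not depend on timing.
now = [0.0]

def fake_clock() -> float:
    return now[0]

gw = Gateway(model=lambda p: f"answer to {p!r}", rate=1.0, burst=2, ttl=60.0, clock=fake_clock)
print(gw.handle("alice", "What is RAG?"))  # miss: calls the model
print(gw.handle("alice", "What is RAG?"))  # hit: served from cache
print(gw.handle("alice", "What is RAG?"))  # bucket empty: rejected
now[0] = 1.0                               # one second later, one token refilled
print(gw.handle("alice", "What is RAG?"))  # allowed again, still a cache hit
print(gw.model_calls)                      # 1
```

Checking the rate limit before the cache follows the order in the Data Flow example of the document format (gateway receives, then rate limiter checks), so cache hits still count against a user's quota; whether they should is exactly the kind of decision worth recording in section 4. With more than one server, this state would have to live in a shared store rather than in process memory.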

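The script's stub lists a review checklist for design documents among its planned additions. As a minimal sketch of the automatable part of such a checklist, the function below looks for the seven section titles from the document format and flags any `{placeholder}` left over from `SYSTEM_SKELETON`. The name `review` and its matching rules (case-insensitive substring search for titles, lowercase `{snake_case}` for placeholders) are assumptions; it checks structure only, not whether the design is sound.

```python
from __future__ import annotations

import re

# The seven section titles from the architecture document format.
REQUIRED_SECTIONS = [
    "Overview",
    "Architecture Diagram",
    "Component Breakdown",
    "Key Design Decisions",
    "Data Flow",
    "Scaling Plan",
    "Cost Estimate",
]

# Unfilled skeleton fields look like {system_name} or {d1_rationale}.
PLACEHOLDER = re.compile(r"\{[a-z_][a-z0-9_]*\}")


def review(doc: str) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    lowered = doc.lower()
    for title in REQUIRED_SECTIONS:
        if title.lower() not in lowered:
            problems.append(f"missing section: {title}")
    leftovers = sorted(set(PLACEHOLDER.findall(doc)))
    if leftovers:
        problems.append("unfilled placeholders: " + ", ".join(leftovers))
    return problems


sample = """# My AI System Design
## 1. Overview
Answers questions over my Obsidian vault.
## 2. Architecture Diagram
## 3. Component Breakdown
Gateway technology: {gateway_tech}
"""

for problem in review(sample):
    print(problem)
```

To check your saved file, pass its contents in, for example `review(Path("/opt/data/ai-system-design-final-doc.md").read_text(encoding="utf-8"))` after `from pathlib import Path`.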
Deliverable

Save your design doc to a file:

```bash
touch /opt/data/ai-system-design-final-doc.md
echo "# My AI System Design — Final Project" > /opt/data/ai-system-design-final-doc.md
```

Then review: open the stack map you built on Day 1 and compare it with this final design. How far you've come is the real metric.

Key Takeaways

You now understand every layer of a production AI system — from inference internals to cost economics
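The patterns are universal: the same kinds of building blocks (serving, routing, caching, rate limiting, observability) show up in ChatGPT, Perplexity, Copilot, and your own VPS; what changes is the scale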
The skill is knowing which tradeoff to make, not knowing "the right answer"
Keep the design doc — it's a portfolio piece showing you can architect AI systems