🧠 AI System Design

Day 30: Final Project: Design a Production AI System

📂 Production & Case Studies · 📖 15 min read · Needs expansion

Learning Objectives

  • Synthesise everything from Days 1-29 into a coherent system design
  • Write a one-page architecture document for a production AI system
  • Present and defend your design decisions

Theory (15 min)

What You Should Know After 30 Days

By now you understand:

  • How models serve inference (prefill, decode, batching, streaming)
  • How to optimise (quantisation, caching, offloading)
  • How to scale (horizontal, load balancing, canary deployments)
  • How to observe (metrics, logs, traces)
  • How to protect (guardrails, rate limiting, safety)
  • How to manage (state, adapters, cost)
  • How real systems do it (ChatGPT, Perplexity, Copilot)

The Final Project

Design a production AI system from scratch.

Pick one of the problems below (or bring your own idea):

  1. Personal AI Research Assistant: ingests your Obsidian vault plus web search results and answers questions with citations
  2. Multi-Model API Gateway: routes each request to an appropriate model (1.5B/3B/7B/API) based on task difficulty, with caching, rate limiting, and cost tracking
  3. Local Code Copilot: code completion via FIM (fill-in-the-middle), context-aware file selection, multi-language support
  4. Document Q&A Service: upload PDFs or markdown files and get answers with citations from your documents
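For the gateway option (item 2), the core routing decision can be sketched in a few lines. This is a minimal illustration, not a recommended design: the scoring heuristic, the keyword list `HARD_HINTS`, and the thresholds are all assumptions made up for the example. A real router might use a small classifier or a confidence signal instead.

```python
# Model tiers from the project description, ordered cheapest to most capable.
MODEL_TIERS = ["1.5B", "3B", "7B", "API"]

# Assumed keywords that hint at a harder task; purely illustrative.
HARD_HINTS = ("prove", "refactor", "architecture", "step by step")


def difficulty_score(prompt: str) -> int:
    """Crude difficulty estimate in the range 0..3.

    Longer prompts and prompts containing a 'hard' keyword score higher.
    """
    score = 0
    word_count = len(prompt.split())
    if word_count > 50:
        score += 1
    if word_count > 200:
        score += 1
    if any(hint in prompt.lower() for hint in HARD_HINTS):
        score += 1
    return score


def route(prompt: str) -> str:
    """Map the difficulty score onto a tier name."""
    return MODEL_TIERS[difficulty_score(prompt)]


if __name__ == "__main__":
    for text in ("What is 2+2?", "Please refactor this module"):
        print(f"{text!r} -> {route(text)}")
```

The design question your document should answer is what replaces `difficulty_score`, and what happens when the chosen tier gives a poor answer (retry on a larger model, or return as-is).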

Architecture Document Format

Title: [System Name]
Date: [Today]

1. Overview (3 sentences)
   What does this system do? Who uses it? What's the key design goal?

2. Architecture Diagram (ASCII or Mermaid)
   Client → [Component] → [Component] → Model → Response

3. Component Breakdown
   - Component 1: responsibility, technology, why this choice
   - Component 2: ...
   ...

4. Key Design Decisions (3-5 decisions with rationale)
   - Why this model? Why this cache strategy? Why this routing?

5. Data Flow (step by step)
   1. User types query → 2. Gateway receives → 3. Rate limiter checks → ...

6. Scaling Plan
   - At 10 users, at 100 users, at 10,000 users
   - Where does it break? What's the next upgrade?

7. Cost Estimate
   - Hardware, tokens, electricity per month
   - Cost-per-query at each scale tier
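Section 7 of the template asks for a cost per query at each scale tier. The arithmetic is simple enough to sketch: add up the monthly cost components and divide by the number of queries served in the month. Every figure below is a placeholder chosen only to make the example run, not a real price or measurement. The tier names follow the Scaling Plan section; the function names are hypothetical.

```python
def cost_per_query(monthly_cost: float, queries_per_day: int, days: int = 30) -> float:
    """Spread a monthly cost evenly over all queries served in that month."""
    total_queries = queries_per_day * days
    if total_queries <= 0:
        raise ValueError("queries_per_day and days must be positive")
    return monthly_cost / total_queries


def monthly_total(tier: dict) -> float:
    """Sum the three cost components named in the template."""
    return tier["hardware"] + tier["tokens"] + tier["electricity"]


# Placeholder inputs per tier (monthly USD and daily query volume).
# These are NOT real prices; replace them with your own numbers.
TIERS = {
    "10 users": {"hardware": 20.0, "tokens": 0.0, "electricity": 5.0,
                 "queries_per_day": 500},
    "100 users": {"hardware": 80.0, "tokens": 40.0, "electricity": 20.0,
                  "queries_per_day": 5_000},
    "10,000 users": {"hardware": 4_000.0, "tokens": 2_500.0, "electricity": 500.0,
                     "queries_per_day": 500_000},
}

for name, tier in TIERS.items():
    total = monthly_total(tier)
    per_query = cost_per_query(total, tier["queries_per_day"])
    print(f"{name}: ${total:,.2f}/month, ${per_query:.5f}/query")
```

Running the same calculation for each tier makes it easy to see whether cost per query falls as volume grows, or whether a new cost component (for example external API tokens) starts to dominate.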

Hands-on (15 min)

Write Your Architecture Document

````python
#!/usr/bin/env python3
"""design-doc-generator.py: framework for the final project design document."""

# Stub: Ayva will expand with:
# - Multiple design examples (Research Assistant, API Gateway, Code Copilot)
# - Architecture diagram generation (Mermaid.js)
# - Tradeoff analysis for common decisions
# - Review checklist for design documents
# - Real-world architecture reviews (compare your decisions to known systems)

# Markdown template for the design document. The {placeholders} can be
# filled with str.format() / str.format_map(), or edited by hand.
SYSTEM_SKELETON = """
# 🏗 ARCHITECTURE DOCUMENT

## System: {system_name}
## Author: Vijay
## Date: {date}

---

## 1. Overview
{overview}

## 2. Architecture Diagram

```mermaid
graph LR
    Client -->|HTTP| Gateway
    Gateway --> RateLimiter
    RateLimiter --> Router
    Router -->|simple| SmallModel
    Router -->|complex| BigModel
    Router -->|expert| ExternalAPI
    SmallModel --> Response
    BigModel --> Response
    ExternalAPI --> Response
    Gateway --> Cache
    Cache --> Response
    Response --> Client
```

## 3. Component Breakdown

### Gateway

- Technology: {gateway_tech}
- Responsibility: Single entry point, auth, logging
- Why: {gateway_rationale}

### Rate Limiter

- Algorithm: {rate_algorithm}
- Tiers: {rate_tiers}
- Why: {rate_rationale}

### Model Router

- Strategy: {router_strategy}
- Models: {router_models}
- Why: {router_rationale}

### Cache

- Type: {cache_type}
- TTL: {cache_ttl}
- Why: {cache_rationale}

### Inference

- Engine: {inference_engine}
- Quantisation: {inference_quant}
- Batch mode: {inference_batch}

## 4. Key Design Decisions

| Decision | Option A | Option B | Chosen | Rationale |
|---|---|---|---|---|
| {d1_name} | {d1_a} | {d1_b} | {d1_chosen} | {d1_rationale} |
| {d2_name} | {d2_a} | {d2_b} | {d2_chosen} | {d2_rationale} |

## 5. Data Flow

1. {step_1}
2. {step_2}
3. {step_3}
4. {step_4}
5. {step_5}
6. {step_6}

## 6. Scaling Plan

| Scale | Users | Queries/Day | Architecture | Cost/Month |
|---|---|---|---|---|
| Dev | 1 | 5,000 | Single server | ${cost_dev} |
| Team | 10 | 50,000 | + cache layer | ${cost_team} |
| Org | 100 | 500,000 | + load balancer | ${cost_org} |
| Prod | 10,000 | 50M | Kubernetes cluster | ${cost_prod} |

Bottleneck at each stage: {bottleneck_analysis}

## 7. Cost Estimate

- Server: {server_cost}
- Electricity: {power_cost}
- API calls (if any): {api_cost}
- Storage: {storage_cost}
- Total: {total_cost}
- Cost per query: {cpq}
"""

print("Final Project: Architecture Design Document\n")
print("Pick one of these scenarios and write a 1-page architecture doc:\n")

# Each scenario: (name, short description, course days to revisit)
scenarios = [
    (
        "🧪 Personal AI Research Assistant",
        "Ingest Obsidian vault (2K files) + web search. Answer questions with citations.",
        "Day 10 (RAG) + Day 26 (Perplexity)",
    ),
    (
        "🚀 Multi-Model API Gateway",
        "Route requests to 1.5B/3B/7B/API based on task difficulty. Cache + rate limit + cost track.",
        "Day 4 (Router) + Day 6 (Rate Limiter) + Day 7 (Gateway) + Day 28 (Cost)",
    ),
    (
        "💻 Local Code Copilot",
        "FIM code completion via llama.cpp. Context-aware file selection.",
        "Day 27 (Copilot) + Day 5 (Context)",
    ),
    (
        "📚 Document Q&A Service",
        "Upload PDF/markdown → RAG pipeline → answers with source citations.",
        "Days 8-10 (RAG) + Day 14 (Mini-project)",
    ),
]

for name, desc, refs in scenarios:
    print(f"  {name}")
    print(f"    {desc}")
    print(f"    🎯 References: {refs}")
    print()

print("Use the skeleton above to structure your design document.")
print("Save to: ~/ai-system-design-final-doc.md")
````
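The script defines a template with `{placeholders}` but never fills it in. One way to do that incrementally is `str.format_map` with a `dict` subclass that leaves unknown fields untouched, so a half-finished document still renders. The sketch below uses a shortened stand-in template (`MINI_SKELETON`) so it runs on its own. The `KeepMissing` helper and the sample values are illustrative assumptions, not part of the course script.

```python
from datetime import date


class KeepMissing(dict):
    """dict that returns '{key}' for unknown keys instead of raising KeyError."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


# Shortened stand-in for the full skeleton, so this example is self-contained.
MINI_SKELETON = """## System: {system_name}
## Date: {date}

## 1. Overview
{overview}

| Scale | Cost/Month |
|---|---|
| Dev | ${cost_dev} |
"""

# Fill in what is known so far; 'overview' is deliberately left out.
values = KeepMissing(
    system_name="Document Q&A Service",
    date=date.today().isoformat(),
    cost_dev="0",
)

doc = MINI_SKELETON.format_map(values)
print(doc)
```

Because `{overview}` survives as a literal placeholder, you can see at a glance which sections still need writing. Plain `str.format` would instead raise `KeyError` on the first missing field.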

Deliverable

Save your design doc to a file:

```bash
touch /opt/data/ai-system-design-final-doc.md
echo "# My AI System Design: Final Project" > /opt/data/ai-system-design-final-doc.md
```

Then review: Open the doc you built on Day 1 (your stack map). Compare it to this final design. How far you've come is the real metric.


Key Takeaways

  • You now understand every layer of a production AI system, from inference internals to cost economics
  • The patterns are universal: the same building blocks (gateway, routing, caching, observability) show up in ChatGPT, Perplexity, Copilot, and your own VPS
  • The skill is knowing which tradeoff to make, not knowing "the right answer"
  • Keep the design doc: it's a portfolio piece showing you can architect AI systems
