Day 10: RAG Architecture

📂 Data & Training 📖 15 min read

Learning Objectives

Understand the full RAG pipeline: retrieve → rerank → generate
Learn the critical design decisions (chunking, top-k, reranking)
Build a RAG pipeline over your Obsidian vault

Theory (15 min)

The RAG Pipeline

RAG = Retrieval-Augmented Generation. Instead of expecting the LLM to have memorized everything during training, retrieve relevant documents at query time and include them in the prompt.

Query ──▶ Embedder ──▶ Vector Search ──▶ Top-K chunks
                               │
                               ▼
                    ┌─────────────────────┐
                    │    Reranker         │── Reorder by relevance
                    └─────────────────────┘
                               │
                               ▼
                    ┌─────────────────────┐
                    │  Prompt Builder     │── System + context + query
                    └─────────────────────┘
                               │
                               ▼
                           LLM ──▶ Answer
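The diagram maps onto a short chain of functions. The sketch below is a minimal, hypothetical version: `toy_embed` (lower-cased word counts) stands in for a real embedding model, and the reranker and LLM are passed in as plain callables so any model can be plugged in. For brevity it re-embeds every chunk on each query; a real pipeline stores chunk vectors in an index such as Qdrant or Chroma.

```python
import math
from collections import Counter
from typing import Callable

Vector = dict[str, float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in embedder: lower-cased word counts, not a real model."""
    return {word: float(n) for word, n in Counter(text.lower().split()).items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rag_answer(
    query: str,
    chunks: list[str],
    embed: Callable[[str], Vector],
    rerank: Callable[[str, list[str]], list[str]],
    llm: Callable[[str], str],
    top_k: int = 20,
    keep: int = 5,
) -> str:
    """Retrieve -> rerank -> build prompt -> generate, as in the diagram."""
    # Vector search: embed every chunk and sort by similarity to the query
    # (a real index would store precomputed chunk vectors instead)
    q_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)
    candidates = ranked[:top_k]
    # Reranker: reorder the candidates and keep only the best few
    context = rerank(query, candidates)[:keep]
    # Prompt builder: instruction + context + query
    prompt = (
        "Answer based on the following context. If the context doesn't contain "
        "the answer, say 'I don't know'.\n\nContext:\n"
        + "\n\n".join(context)
        + f"\n\nQuestion: {query}"
    )
    # Generation
    return llm(prompt)
```

Calling it with an identity reranker (`lambda q, cs: cs`) and an "LLM" that simply echoes its input (`lambda p: p`) is a cheap way to inspect the exact prompt before wiring in real models.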

The Critical Knobs

1. Chunking strategy
   - Fixed-size: simple, but may split a concept across two chunks
   - Semantic: split at natural boundaries (paragraphs, sections, H2 headers)
   - Agentic: use an LLM to decide where to split (expensive, but can give the best quality)
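The first two strategies can be sketched with the standard library alone. `split_by_headers` is a semantic splitter that starts a new chunk at each H2 header (a natural fit for Obsidian notes); `split_fixed` is a character-based fixed-size splitter with overlap. Both names and the default sizes are illustrative assumptions; production splitters usually count tokens rather than characters.

```python
import re


def split_by_headers(markdown: str, level: int = 2) -> list[str]:
    """Semantic chunking: start a new chunk at every header of the given level."""
    # "^## " matches H2 lines only; "### " fails because its third char is "#"
    pattern = re.compile(rf"^{'#' * level} ", re.MULTILINE)
    starts = [m.start() for m in pattern.finditer(markdown)]
    bounds = [0] + starts + [len(markdown)]
    chunks = [markdown[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [c for c in chunks if c]  # drop empty pieces (e.g. before a leading header)


def split_fixed(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Fixed-size chunking by characters.

    Consecutive chunks share `overlap` characters, which softens the
    "may split concepts" problem of plain fixed-size splitting.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```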

2. Top-K (how many chunks to retrieve)
   - Too few: missing information
   - Too many: context window overflow, distraction for the LLM
   - Sweet spot: 3-10 chunks, depending on the domain
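Top-K retrieval itself is "score every chunk, keep the K best". A minimal sketch, assuming chunk vectors are held in a plain dict (a vector database does the same job at scale). `fit_budget` is a hypothetical helper that also stops adding chunks once a character budget, used here as a rough proxy for tokens, would be exceeded.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Return the k (chunk_id, score) pairs most similar to the query, best first."""
    scored = ((cid, cosine(query_vec, vec)) for cid, vec in chunk_vecs.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


def fit_budget(ranked_chunks: list[str], max_chars: int) -> list[str]:
    """Keep chunks in rank order until the next one would exceed the budget."""
    kept: list[str] = []
    used = 0
    for chunk in ranked_chunks:
        if used + len(chunk) > max_chars:
            break
        kept.append(chunk)
        used += len(chunk)
    return kept
```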

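3. Reranking
   The retriever uses cheap, fast embeddings: the query and each document are embedded separately, so document vectors can be computed once and searched quickly. The reranker uses a cross-encoder, which reads the query and a candidate together; this is more expensive but more accurate, so it is applied only to reorder the top results.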

Retriever finds 20 docs (fast, cheap embedding search)
Reranker reorders top 20 → picks best 5 (slow, accurate cross-encoder)
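The two-stage pattern can be sketched with the reranker as a pluggable function that scores (query, passage) pairs. The function names are hypothetical, and `overlap_scorer` is a deliberately crude stand-in used only so the sketch runs without a model; it is not a cross-encoder.

```python
from typing import Callable, Sequence

# A scorer takes (query, passage) pairs and returns one relevance score per pair
PairScorer = Callable[[list[tuple[str, str]]], Sequence[float]]


def rerank(query: str, candidates: list[str], score_pairs: PairScorer, keep: int = 5) -> list[str]:
    """Second stage: score each (query, candidate) pair jointly and keep the best."""
    scores = score_pairs([(query, c) for c in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [c for c, _ in ranked[:keep]]


def overlap_scorer(pairs: list[tuple[str, str]]) -> list[float]:
    """Toy stand-in scorer: fraction of query words that appear in the passage."""
    out = []
    for query, passage in pairs:
        q_words = set(query.lower().split())
        p_words = set(passage.lower().split())
        out.append(len(q_words & p_words) / len(q_words) if q_words else 0.0)
    return out
```

With the sentence-transformers library installed, the `predict` method of a `CrossEncoder` (for example one loaded from the `cross-encoder/ms-marco-MiniLM-L-6-v2` checkpoint) takes a list of (query, passage) pairs and returns one score per pair, so it should slot in as `score_pairs`.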

4. Prompt template
   How you present retrieved context matters enormously:

Good: "Answer based on the following context. If the context doesn't contain
       the answer, say 'I don't know'. Here is the context: {context}
       Question: {query}"
Bad:  "Here's some stuff: {context}. Now answer: {query}"
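A prompt builder following the "good" template might look like this. It is a sketch: `build_prompt` is a hypothetical name, and numbering the chunks and labeling each with its source file are assumptions added so the model can cite notes; they are not part of the template above.

```python
def build_prompt(query: str, chunks: list[tuple[str, str]]) -> str:
    """Format (source, text) chunks into a grounded prompt that ends with the query."""
    # Number each chunk and show its source so answers can cite [1], [2], ...
    context = "\n\n".join(
        f"[{i}] (source: {source})\n{text}" for i, (source, text) in enumerate(chunks, start=1)
    )
    return (
        "Answer based on the following context. If the context doesn't contain "
        "the answer, say 'I don't know'. Cite sources by their [number].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```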

Common RAG Failure Modes

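Failure mode	Likely cause	Fix
Missing context	Retrieval missed relevant docs	Increase top-K; add hybrid (keyword + vector) search
Irrelevant context	Noisy retrieval results	Add a reranker; improve chunking
LLM ignores context	Strong model priors from training data	Instruct the model to answer only from the context and say "I don't know" otherwise
Lost-in-the-middle	Relevant chunks buried in the middle of a long prompt	Put the most relevant chunks at the start and end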
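Two fixes from the table are small enough to sketch. Hybrid search needs a way to merge a keyword ranking with a vector ranking; reciprocal rank fusion (RRF), which sums 1 / (k + rank) for each document across rankings with k = 60 by convention, is one common choice and is swapped in here as an example. The second function reorders an already-ranked list so the strongest chunks sit at the start and end of the context. Both function names are hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several rankings (e.g. BM25 and vector search) into one."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        # Ranks start at 1; documents ranked high in any list get a large share
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def reorder_for_long_context(ranked: list[str]) -> list[str]:
    """Counter lost-in-the-middle: best chunks at the edges, weakest in the middle."""
    front: list[str] = []
    back: list[str] = []
    # Alternate: 1st, 3rd, 5th... go to the front; 2nd, 4th... fill in from the end
    for i, chunk in enumerate(ranked):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]
```

For a ranked list `1..5`, the reordering yields `1, 3, 5, 4, 2`: the top chunk opens the context, the runner-up closes it, and the weakest lands in the middle.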

Hands-on (15 min)

Build RAG Over Your Obsidian Vault

#!/usr/bin/env python3
"""obsidian-rag.py: RAG over your Obsidian vault."""
import sys
from collections import Counter
from pathlib import Path

# Stub. Ayva will expand this with:
# - Walk Obsidian vault (/opt/obsidian-vault)
# - Parse markdown frontmatter + body
# - Chunk by headers (smart splitting)
# - Embed with local model (llama.cpp embeddings or sentence-transformers)
# - Index in Qdrant/Chroma
# - Retriever with configurable top-K
# - Cross-encoder reranker
# - Prompt builder that formats context nicely
# - Answer generation via LLM

# Default vault location; pass a different path as the first CLI argument.
VAULT_PATH = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("/opt/obsidian-vault")

def is_hidden(path: Path, root: Path) -> bool:
    """True for files under dot-folders such as .obsidian or .trash."""
    return any(part.startswith(".") for part in path.relative_to(root).parts)

def scan_vault(vault: Path = VAULT_PATH) -> list[Path]:
    """List the vault's markdown notes and print how many sit in each folder."""
    files = sorted(f for f in vault.rglob("*.md") if not is_hidden(f, vault))
    print(f"Found {len(files)} markdown files in vault")
    # Count notes per folder ("." is the vault root)
    folders = Counter(f.parent.relative_to(vault) for f in files)
    for folder, count in sorted(folders.items()):
        print(f"  📁 {folder}: {count} files")
    return files

if __name__ == "__main__":
    if VAULT_PATH.is_dir():
        scan_vault()
    else:
        print(f"Vault at {VAULT_PATH} not found; pass your vault path as the first argument")
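The "parse markdown frontmatter + body" step from the stub could start like this. It is a standard-library sketch that only understands flat `key: value` lines between the opening and closing `---` markers; YAML lists (such as a multi-line `tags:` list) and nested values would need a real parser such as PyYAML's `yaml.safe_load`. `split_frontmatter` and `load_note` are hypothetical names.

```python
from pathlib import Path


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a note into (frontmatter, body); flat `key: value` lines only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            meta: dict[str, str] = {}
            for line in lines[1:end]:
                # Split on the first colon only, so values like times survive
                key, sep, value = line.partition(":")
                if sep and key.strip():
                    meta[key.strip()] = value.strip()
            return meta, "\n".join(lines[end + 1:])
    return {}, text  # unterminated frontmatter: treat the whole file as body


def load_note(path: Path) -> dict:
    """Read one vault note into a record ready for chunking and embedding."""
    meta, body = split_frontmatter(path.read_text(encoding="utf-8"))
    return {"path": str(path), "title": path.stem, "meta": meta, "body": body}
```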

Questions for Ayva:
- Best chunking strategy for Obsidian notes (headers, tags, frontmatter)?
- How to handle file updates (incremental indexing vs full re-index)?
- Optimal prompt template for answering from personal notes?
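One possible approach to the file-update question is incremental indexing driven by content hashes: store a SHA-256 per note after each run, then re-embed only notes whose hash changed and drop the chunks of notes that disappeared. A minimal sketch, assuming the hashes are kept in a JSON state file between runs; all function names are hypothetical, and a renamed note shows up as one deletion plus one new note.

```python
import hashlib
import json
from pathlib import Path


def snapshot(vault: Path) -> dict[str, str]:
    """Map each note's vault-relative path to a SHA-256 hash of its bytes."""
    return {
        p.relative_to(vault).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in vault.rglob("*.md")
    }


def diff_states(previous: dict[str, str], current: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (new or changed notes to re-embed, removed notes to delete).

    For changed notes, delete their old chunks from the index before re-embedding.
    """
    to_index = sorted(f for f, digest in current.items() if previous.get(f) != digest)
    to_delete = sorted(f for f in previous if f not in current)
    return to_index, to_delete


def load_state(state_file: Path) -> dict[str, str]:
    """Hashes from the previous run, or an empty dict on the first run."""
    return json.loads(state_file.read_text()) if state_file.exists() else {}


def save_state(state_file: Path, state: dict[str, str]) -> None:
    """Persist the current hashes for the next incremental run."""
    state_file.write_text(json.dumps(state, indent=2, sort_keys=True))
```

A full re-index is then just the special case where the previous state is empty.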

Key Takeaways

RAG pipelines are one of the most practical ways to ground LLM responses in your own data
The three stages (retrieve, rerank, generate) each have critical configuration knobs
Reranking is often one of the most cost-effective accuracy improvements, because the expensive model only scores a short candidate list
Prompt engineering is essential — the LLM needs clear instructions on how to use context
