IndexZero. Build a search engine from scratch in Python.
You work with AI systems every day. But search, the retrieval layer that everything depends on, is still a black box.
IndexZero is a course that walks you through building a working search engine, step by step, in Python. You start with raw text and end with a FastAPI endpoint serving ranked results from your own code.
What you'll build
- Build BM25 from scratch and understand why it remains a strong baseline that often holds its own against neural retrievers in production
- Write evaluation harnesses that tell you when your system gets worse, not just better
- Debug RAG retrieval failures at the index level instead of guessing at the prompt
- Implement vector search and hybrid ranking without wrapping a black-box library
- Ship a working FastAPI endpoint that serves search results from your own code
- Walk away with a mental model of retrieval that makes every search decision clearer
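To make the evaluation-harness idea concrete, here is a minimal sketch of a regression check. It is not the course's harness. The metric (recall@k), the function names, and the toy judgments are all assumptions chosen for illustration.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of relevant doc IDs that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mean_recall(run, judgments, k=3):
    """Average recall@k over every judged query; a query missing from the run scores 0."""
    total = sum(recall_at_k(run.get(q, []), rel, k) for q, rel in judgments.items())
    return total / len(judgments)


def regressed(baseline, candidate, judgments, k=3, tolerance=0.0):
    """True when the candidate's mean recall@k falls below the baseline's."""
    return mean_recall(candidate, judgments, k) < mean_recall(baseline, judgments, k) - tolerance


# Hypothetical relevance judgments and two runs (query ID -> ranked doc IDs).
judgments = {"q1": {"a", "b"}, "q2": {"c"}}
baseline = {"q1": ["a", "b", "x"], "q2": ["c", "y", "z"]}
candidate = {"q1": ["a", "x", "y"], "q2": ["c", "y", "z"]}

print(regressed(baseline, candidate, judgments))  # True: q1 lost a relevant document
```

The point of the design is the comparison, not the metric: a harness that only reports a score cannot tell you that a change made things worse.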
The course
Three parts. Each one changes the system.
Part 1: Language into structure (M0-M1). Tokenization, inverted index, boolean retrieval. How raw text becomes something you can query. 5-8 hours. M0 and M1 are open access.
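As a rough illustration of what Part 1 covers, the sketch below builds an inverted index and answers a boolean AND query. The tokenizer is deliberately naive, and every name here (`tokenize`, `build_index`, `boolean_and`) is hypothetical rather than taken from the course code.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def boolean_and(index, query):
    """Return the sorted IDs of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Toy corpus, invented for this example.
docs = {
    1: "Search engines rank documents.",
    2: "An inverted index maps terms to documents.",
    3: "Boolean retrieval intersects posting lists.",
}
index = build_index(docs)
print(boolean_and(index, "documents index"))  # [2]
```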
Part 2: Structure into ranking (M2-M4). TF-IDF, BM25, vector embeddings. How relevance scores emerge from term statistics and dense representations. 12-16 hours.
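For a sense of how a relevance score emerges from term statistics, here is a minimal BM25 sketch. It assumes pre-tokenized documents, the common defaults `k1=1.5` and `b=0.75`, and a smoothed IDF that stays non-negative. The course's own implementation may differ on all three.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF: the 1 inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are damped.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus, invented for this example.
docs = [
    ["bm25", "ranks", "documents"],
    ["bm25", "bm25", "scores", "terms"],
    ["vector", "search"],
]
scores = bm25_scores(["bm25"], docs)
print(scores)
```

The second document mentions the term twice and scores highest, but by less than double: term frequency saturates, and the document pays a small penalty for being longer than average.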
Part 3: Ranking into production (M5-M9). Approximate nearest neighbors, hybrid search, evaluation methodology, FastAPI endpoint. How a ranked list becomes a production system. 15-20 hours.
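One common way to combine a lexical ranking with a dense one is reciprocal rank fusion (RRF). The outline above does not say which fusion method the course uses, so treat this as an illustrative assumption. The constant `k=60` is the conventional default, and the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs; each list contributes 1 / (k + rank)."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties break on doc ID so output is deterministic.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


lexical = ["d1", "d2", "d3"]  # e.g. a BM25 ranking
dense = ["d3", "d1", "d4"]    # e.g. a vector-search ranking
print(reciprocal_rank_fusion([lexical, dense]))  # ['d1', 'd3', 'd2', 'd4']
```

RRF uses only rank positions, so it sidesteps the problem that BM25 scores and cosine similarities live on different scales.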
Total: 32-44 hours of focused work across 10 modules (M0-M9). One codebase that grows with each step.
Who this is for
You should take this course if:
- You build RAG pipelines and want to understand why retrieval fails at the index level
- You work on product search or enterprise search and need deeper intuition for ranking
- You are a senior engineer who has used Elasticsearch or Vespa and wants to know what happens inside
- You want a portfolio piece that demonstrates systems thinking, not another tutorial project
Office hours
I run occasional live office hours on Discord for people working through the material. No fixed schedule. I announce them a week ahead.
Join the Discord (link coming soon)
The code
The course lives at github.com/caprion/indexzero-v2. Each module is a directory with code, a README, and exercises. Python, a terminal, and curiosity are enough. This is not a video course. It is code you read, run, modify, and break. The best way to understand a search engine is to build one.
Built by Sumit Garg. I spent years building search infrastructure at Microsoft, working on the systems behind Azure AI Search.