IndexZero

IndexZero. Build a search engine from scratch in Python.

You work with AI systems every day. But search, the retrieval layer those systems depend on, is often still a black box.

IndexZero is a course that walks you through building a working search engine, step by step, in Python. You start with raw text and end with a FastAPI endpoint serving ranked results from your own code.

What you'll build

The course

Three parts. Each one changes the system.

Part 1: Language into structure (M0-M1). Tokenization, inverted index, boolean retrieval. How raw text becomes something you can query. 5-8 hours. M0 and M1 are open access.
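To give a feel for what Part 1 covers, here is a minimal sketch of an inverted index with boolean AND retrieval. It is an illustration only, not the course's code: the function names (tokenize, build_index, boolean_and), the regex tokenizer, and the sample documents are all assumptions made for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def boolean_and(index, query):
    """Return the sorted ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # .get() avoids creating empty entries for unknown terms.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical sample documents, keyed by document id.
docs = {
    0: "Search engines rank documents",
    1: "An inverted index maps terms to documents",
    2: "Boolean retrieval intersects posting lists",
}
index = build_index(docs)
print(boolean_and(index, "documents index"))
```

Only document 1 contains both "documents" and "index", so it is the only id returned. A query term that appears nowhere yields an empty result, because its posting set is empty.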

Part 2: Structure into ranking (M2-M4). TF-IDF, BM25, vector embeddings. How relevance scores emerge from term statistics and dense representations. 12-16 hours.
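As a rough sketch of how a relevance score can emerge from term statistics, the following implements Okapi BM25 over pre-tokenized documents. It is not the course's implementation: the function name bm25_scores, the defaults k1=1.5 and b=0.75, the smoothed IDF variant, and the toy documents are assumptions chosen for this example.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical, already-tokenized documents.
docs = [
    ["fast", "search", "engine"],
    ["search", "search", "ranking", "with", "bm25"],
    ["cooking", "recipes"],
]
scores = bm25_scores(["search", "ranking"], docs)
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranked)
```

The second document matches both query terms and ranks first, the first document matches one term, and the third matches none and scores zero. The rarer term ("ranking") contributes more than the common one ("search") because its IDF is higher.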

Part 3: Ranking into production (M5-M9). Approximate nearest neighbors, hybrid search, evaluation methodology, FastAPI endpoint. How a ranked list becomes a production system. 15-20 hours.
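One common way to combine a keyword ranking with a vector ranking in hybrid search is reciprocal rank fusion (RRF). The sketch below assumes that technique purely for illustration; the course may use a different fusion method. The function name, the constant k=60, and the document ids are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. Higher fused scores rank first; ties are
    broken by document id so the output is deterministic.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


# Hypothetical outputs of a keyword ranker and a vector ranker.
keyword_ranking = ["d1", "d2", "d3"]
vector_ranking = ["d2", "d4", "d1"]
print(reciprocal_rank_fusion([keyword_ranking, vector_ranking]))
```

Documents that appear near the top of both lists (here d2 and d1) outrank documents that appear in only one. RRF uses only rank positions, so it does not require the two rankers' raw scores to be on comparable scales.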

Total: 32-44 hours of focused work across 10 modules (M0-M9). One codebase that grows with each step.

Who this is for

You should take this course if:

Office hours

I run occasional live office hours on Discord for people working through the material. No fixed schedule. I announce them a week ahead.

Join the Discord (link coming soon)

The code

The course lives at github.com/caprion/indexzero-v2. Each module is a directory with code, README, and exercises. Python, a terminal, and curiosity are enough. This is not a video course. It is code you read, run, modify, and break. The best way to understand a search engine is to build one.

Built by Sumit Garg. I spent years building search infrastructure at Microsoft, working on the systems behind Azure AI Search.