track — project series

Build the SQLite of vector databases

Every app wants on-device semantic search, photo similarity, local RAG — AI features without a network call. There is no good embedded vector database. We are going to build one, from scratch, and understand every decision along the way.

Server-side vector databases are designed for a different world — abundant RAM, persistent network connections, dedicated hardware. The moment you embed one in a mobile app, every assumption breaks. This track is about building for the environment that actually exists on a user's device.

17 lessons
intermediate → advanced
Python → C
~7 hours
This track is being built

You need to be comfortable writing code and know basic linear algebra (dot products, what a vector is). No prior database, mobile, or ML knowledge required.

The environment you are building for

RAM budget
256 MB – 1 GB
a million 512-dimension float32 vectors need about 2 GB, so quantization is not optional
Network
none
every query is local — no round trip, no fallback
Process model
embedded library
no server, no daemon — linked directly into the app
Cold start
must be fast
queries must work before the index is fully in memory
Distribution
single file
the database ships with your app, like a SQLite file
Writers
one, rarely
no distributed consensus — reads dominate everything

These constraints are not limitations to apologise for. They are the source of every interesting design decision in this track. You will learn more about memory, storage, and query execution by building for a phone than you ever would building for a server.

What you will enable

on-device photo search · local semantic search · offline RAG · face similarity · document similarity · on-device recommendations · private AI features

What you will build

Part I
Naive foundation

Flat array, linear scan, exact cosine distance. Correct, slow, your baseline forever.

Part II
Storage & mmap

Binary layout, single-file format, mmap for zero-copy reads and fast cold starts.

Part III
Quantization

Scalar and product quantization. Fitting a million vectors into the RAM you actually have.

Part IV
Filtering

Inverted indexes, compiled predicates, pre- vs post-filter tradeoffs at mobile scale.

Part V
Approximate search

HNSW built from scratch, tuned for read-heavy embedded workloads.

Part VI
The C library

Rewrite in C. Clean API. Something you could actually ship inside a mobile app.

The final artifact is a C library with a clean API — something you could actually ship inside a mobile app. Not a demo. Not a proof of concept. Something real.

vdb_t *db = vdb_open("photos.vdb");
vdb_result_t *r = vdb_search(db, embedding, 10);

Lessons

Part I — Naive Foundation
lesson 01 · Python
The problem
What similarity search is, why you can't use a B-tree, and what makes every existing vector database the wrong tool for an embedded app.
lesson 02 · Python
Vector distance from scratch
Dot product, cosine similarity, L2. Implement all three. Understand what each one measures and when to pick it.
lesson 03 · Python
Brute-force search
O(n·d) scan — check every vector. This is your correctness baseline. Everything built later is measured against it.
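The three distance functions from lesson 02 and the linear scan from lesson 03 can be sketched in plain Python. This is a minimal illustration, not the track's own code: the function names, the zero-vector handling, and the toy dataset are all assumptions made for the example.

```python
import heapq
import math


def dot(a, b):
    """Dot product: sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def l2(a, b):
    """Euclidean (L2) distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 means same direction."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot(a, b) / (norm_a * norm_b)


def brute_force_search(vectors, query, k):
    """Score every stored vector against the query: O(n*d) work.

    Returns the k best (similarity, index) pairs, best first.
    """
    scored = ((cosine_similarity(v, query), i) for i, v in enumerate(vectors))
    return heapq.nlargest(k, scored)


# Hypothetical toy data: four 3-dimensional vectors.
vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
results = brute_force_search(vectors, [1.0, 0.0, 0.0], k=2)
top_indices = [i for _, i in results]  # indices of the two nearest vectors
```

Because the scan touches every vector, its answer is exact, which is what makes it usable as the reference when later parts trade accuracy for speed.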
Part II — Storage & mmap
lesson 04 · Python
Binary layout
When the schema is known at open time, records are fixed-size, so the memory layout is computed once rather than on every read. How this eliminates per-record parsing overhead.
lesson 05 · Python
Single-file format
File header, append-only record segments, the index as a contiguous block. A format you can send someone as a file attachment and have it work immediately.
lesson 06 · C
Memory-mapped files
mmap the index. Zero-copy reads. What the OS actually does when you fault a page in. Why this is the right model for a cold-start database.
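The ideas in Part II (fixed-size records, a single file with a header, and memory-mapped reads) can be sketched with Python's standard `struct` and `mmap` modules. The header fields, the `VDB0` magic value, the record layout, and the file name here are hypothetical, chosen only for illustration; they are not the track's real format.

```python
import mmap
import os
import struct
import tempfile

DIM = 4                              # hypothetical vector dimension
HEADER = struct.Struct("<4sII")      # magic, dimension, record count
RECORD = struct.Struct(f"<I{DIM}f")  # uint32 id followed by DIM float32 values


def write_db(path, records):
    """Write the header, then one fixed-size record per (id, vector) pair."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(b"VDB0", DIM, len(records)))
        for rec_id, vec in records:
            f.write(RECORD.pack(rec_id, *vec))


def read_record(mm, i):
    """Read record i from the mapping; its offset is plain arithmetic."""
    offset = HEADER.size + i * RECORD.size
    rec_id, *vec = RECORD.unpack_from(mm, offset)
    return rec_id, vec


path = os.path.join(tempfile.mkdtemp(), "toy.vdb")
write_db(path, [(7, [1.0, 2.0, 3.0, 4.0]), (9, [0.5, 0.25, 0.0, -1.0])])

with open(path, "rb") as f:
    # Map the whole file read-only; pages are loaded only when touched.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    magic, dim, count = HEADER.unpack_from(mm, 0)
    second = read_record(mm, 1)      # jumps to record 1 without reading record 0
    mm.close()
```

Note that `unpack_from` still builds Python objects, so this sketch is not zero-copy in the strict sense. What it does show is that opening the file costs almost nothing and that locating a record needs no parsing, which are the properties a C implementation reading the mapping directly can exploit fully.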
Part III — Quantization
lesson 07 · Python
Why float32 doesn't fit
1M vectors × 512 dimensions × 4 bytes ≈ 2 GB, at least twice the RAM budget above. The memory math that makes quantization unavoidable on every device that isn't a server.
lesson 08 · Python
Scalar quantization
Compress float32 to int8. Calibration, scale factors, reconstruction error. 4× memory reduction with a small, measurable recall cost.
lesson 09 · Python
Product quantization
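Split vectors into subspaces, quantize each independently. Build a codebook. How PQ codes reach 32× compression (for example, a 512-dimension float32 vector shrinking from 2,048 bytes to 64 one-byte codes), and how asymmetric distance computation compares an uncompressed query against those codes.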
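The memory arithmetic and the float32-to-int8 step from Part III can be made concrete with a short sketch. It assumes one symmetric scale factor for the whole dataset, which is only one of several possible calibration schemes; the function names and sample vectors are hypothetical.

```python
def footprint_bytes(n_vectors, dim, bytes_per_value):
    """Raw vector storage, ignoring ids, metadata, and index overhead."""
    return n_vectors * dim * bytes_per_value


float32_bytes = footprint_bytes(1_000_000, 512, 4)  # 2_048_000_000
int8_bytes = footprint_bytes(1_000_000, 512, 1)     # 512_000_000


def calibrate(vectors):
    """Pick one symmetric scale so the largest magnitude maps to 127."""
    max_abs = max(abs(x) for v in vectors for x in v)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(vec, scale):
    """Floats to int8 codes, clamped to the range [-127, 127]."""
    return [max(-127, min(127, round(x / scale))) for x in vec]


def dequantize(codes, scale):
    """int8 codes back to approximate floats."""
    return [c * scale for c in codes]


# Hypothetical sample data.
vectors = [[0.12, -0.50, 0.33], [1.00, 0.25, -0.75]]
scale = calibrate(vectors)
codes = [quantize(v, scale) for v in vectors]
restored = [dequantize(c, scale) for c in codes]

# Largest reconstruction error over all values; rounding to the nearest
# code keeps it within about half a quantization step (scale / 2).
max_error = max(
    abs(a - b) for v, r in zip(vectors, restored) for a, b in zip(v, r)
)
```

A single global scale is the simplest choice. If a few outlier values dominate the range, per-dimension or per-vector scales would waste fewer codes, at the cost of storing more scale factors.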
Part IV — Filtering
lesson 10 · Python
Pre-filter vs post-filter
Filter before the vector search or after it. The tradeoff is not obvious — selectivity determines the winner. How to measure which strategy to use.
lesson 11 · C
Inverted indexes & compiled predicates
Build an inverted index for scalar fields. When field offsets are known at schema time, filters become direct struct member reads — no lookup tables at query time.
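To see why selectivity decides between the two filtering strategies in lesson 10, here is a small sketch of both over a toy dataset. The function names, the `overfetch` parameter, and the `year` field are assumptions for illustration, and a plain linear scan stands in for the vector index.

```python
import heapq


def score(a, b):
    """Dot product used as the similarity score."""
    return sum(x * y for x, y in zip(a, b))


def pre_filter_search(items, query, predicate, k):
    """Apply the predicate first, then score only the survivors."""
    candidates = [(score(vec, query), item_id)
                  for item_id, vec, meta in items if predicate(meta)]
    return [item_id for _, item_id in heapq.nlargest(k, candidates)]


def post_filter_search(items, query, predicate, k, overfetch=2):
    """Take the top k * overfetch by score, then drop rows failing the predicate.

    May return fewer than k results when few rows match the predicate.
    """
    scored = [(score(vec, query), item_id, meta) for item_id, vec, meta in items]
    top = heapq.nlargest(k * overfetch, scored, key=lambda t: t[0])
    return [item_id for _, item_id, meta in top if predicate(meta)][:k]


# Hypothetical data: (id, vector, metadata). Only two rows are from 2024,
# and they are the least similar to the query.
items = [
    (0, [1.0, 0.0], {"year": 2021}),
    (1, [0.9, 0.1], {"year": 2021}),
    (2, [0.8, 0.2], {"year": 2021}),
    (3, [0.7, 0.3], {"year": 2021}),
    (4, [0.1, 0.9], {"year": 2024}),
    (5, [0.0, 1.0], {"year": 2024}),
]
query = [1.0, 0.0]


def is_2024(meta):
    return meta["year"] == 2024


pre = pre_filter_search(items, query, is_2024, k=2)    # both matching rows
post = post_filter_search(items, query, is_2024, k=2)  # no rows survive
```

With this data the post-filter variant returns nothing, because every one of its four fetched candidates fails the predicate. With a predicate that most rows pass, the two strategies agree and post-filtering avoids evaluating the predicate on the whole dataset.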
Part V — Approximate Search
lesson 12 · Python
Why exact search fails at scale
The recall vs. latency tradeoff. Why every production system — even on-device — eventually trades exactness for speed, and how to measure that tradeoff honestly.
lesson 13 · Python
HNSW: the idea
Hierarchical navigable small world graphs. How layered greedy search achieves sub-linear query time. The intuition before the code.
lesson 14 · Python
HNSW: implementation
Build it. Graph construction, node insertion, layer selection, greedy search. Tuned for read-heavy workloads where writes are rare and cold-start latency matters.
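Measuring the recall side of the tradeoff in Part V needs the exact results from brute-force search as ground truth. A minimal sketch of recall@k follows; the function name and the sample id lists are hypothetical.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k that the approximate search also returned."""
    truth = set(exact_ids[:k])
    if not truth:
        return 1.0  # assumption: with nothing to find, recall counts as perfect
    found = truth.intersection(approx_ids[:k])
    return len(found) / len(truth)


# Hypothetical results for one query: ids from the exact scan and from
# an approximate index, best first.
exact = [3, 7, 1, 9]
approx = [3, 1, 8, 9]

recall = recall_at_k(approx, exact, k=4)  # 3 of the 4 true neighbors found
```

A single query says little. Averaging this value over many queries, alongside the measured latency for the same settings, gives one point on the recall versus latency curve.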
Part VI — The C Library
lesson 15 · C
Rewriting in C
Port the Python to C. Every data structure you built in Python becomes a struct. Every loop becomes a reason to think about cache lines.
lesson 16 · C
SIMD distance
AVX2 on x86, NEON on ARM. Rewrite the inner distance loop with intrinsics. Measure the speedup against scalar C. Understand what auto-vectorization misses and why.
lesson 17 · C
The finished library
Clean public API. vdb_open, vdb_search, vdb_insert, vdb_close. Benchmark against the Python baseline and against brute-force. Ship it.