Database Indexes

The single biggest speed lever in any database. Get them wrong and queries get hundreds of times slower.

The slow way. Scan every row

Your users table has 100 million rows. You run SELECT * FROM users WHERE email = '[email protected]'.

Without an index, the database does a full table scan. It reads every single row. It checks every email. It returns the matches. That is 100 million row reads. It can take seconds or minutes.

That is fine when the table has 100 rows. It is a disaster when the table has 100 million.

This is the most common database performance bug. You query a column that has no index. Suddenly your fast API call takes 30 seconds. The database pegs at 100 percent CPU. Everything that depends on it gets slow.

An index is a shortcut to the row

An index is a separate data structure. It maps a column value to the location of the row. It is kept sorted so lookups are fast. Most databases use a B-tree, which is a balanced tree.

Now add an index on email. The database looks up the email in the tree. A few hops. Microseconds. It gets the row location and reads that one row. You went from 100 million row reads to about 4 or 5. Millions of times faster.

The cost is disk space and slower writes. Every write to an indexed column has to also update the index. Reads get much faster. Writes get a bit slower. For most apps, which read more than they write, this is a great trade.

What to put an index on

Index columns that show up in your queries.

WHERE clauses like WHERE email = '...' or WHERE user_id = 42. JOIN conditions, which are the foreign keys that link tables. ORDER BY clauses. Sorting is much faster on an indexed column.

What not to index.

Columns you rarely query. The index costs space and write time for no gain. Columns with very few unique values. A gender column with 3 options barely benefits, since the index still scans many rows. Columns updated very often. Every write has to update the index.

You can also build composite indexes that cover several columns at once, like (user_id, created_at). The order matters. That index helps WHERE user_id = 42 AND created_at > ... but does not help WHERE created_at > ... on its own.

Use EXPLAIN to check your queries

Every SQL database has an EXPLAIN command. Postgres also has EXPLAIN ANALYZE. It shows the plan the database will use for your query.

The output tells you a few things. Is it doing a "Seq Scan," which is a full table scan? Or an "Index Scan," which uses an index? How many rows does it think it will read? What is the estimated cost?

Reading EXPLAIN output is how database engineers find slow queries. "Why is this query slow?" almost always boils down to "It is doing a Seq Scan on a 10 million row table because there is no index on the WHERE column."

Spend time learning to read these plans. It is the difference between guessing and knowing.

Now build it yourself →