Cache

The slow database does not need to hear every question. Most of them have the same answer.

The database is the bottleneck

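Databases are reliable, but they are slow compared with memory. A query may have to walk through disk pages, hold locks, join tables, and return rows. For this lesson, assume the database can handle about 30 reads per second before query times start to balloon out of control. Real limits vary widely with hardware, schema, and query shape. The number here is deliberately small to keep the math easy.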

Now imagine your web app is getting 150 reads per second. Every one of them goes to the database. The database drowns. Users get errors. Someone on call gets paged at 3am.

You could just buy a bigger database, but that gets expensive fast. And as traffic keeps growing, you will hit the same ceiling again and be back where you started.

There is a smarter way.

Most reads are asking for the same thing

Look at real traffic. Most reads are not for unique data. They are for the same few popular items, over and over. The homepage. The top trending post. The user's own profile.

So here is the trick. Once you have fetched a piece of data from the database, keep the answer around in memory. The next time someone asks for the same thing, you skip the database entirely.

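This in-memory store is called a cache. Common ones include Redis and Memcached. Cloudflare Workers KV is often used the same way, although it is a distributed key-value store rather than an in-memory one. They are built for one job. Take a key, return a value, very fast.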

Put the cache in front of the database

Place the cache between your server and the database. From now on, every read takes at most two steps. The second step only runs when the first one fails.

Step 1. Check the cache. Is the answer already there? If yes, this is called a cache hit. Return the value to the client in less than a millisecond. The database does not even know this request existed.

Step 2. Cache miss? Fall through to the database. Get the value. Before sending it back to the client, also store it in the cache. The next request for the same key will be a hit.

This pattern is called cache-aside (or look-aside). It is by far the most common caching pattern in web apps.
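Here is a minimal sketch of those two steps in Python. It is an illustration, not a production client. A plain dictionary stands in for Redis or Memcached, and the names `CacheAside`, `load_from_db`, and `slow_db_lookup` are made up for this example.

```python
from typing import Callable, Dict


class CacheAside:
    """Cache-aside reader: check the cache first, fall back to the database."""

    def __init__(self, load_from_db: Callable[[str], str]) -> None:
        self._cache: Dict[str, str] = {}  # stand-in for a real cache server
        self._load_from_db = load_from_db
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        # Step 1: check the cache. A hit never touches the database.
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # Step 2: cache miss. Ask the database, then remember the answer.
        self.misses += 1
        value = self._load_from_db(key)
        self._cache[key] = value
        return value


db_calls = []  # records every key that actually reached the "database"


def slow_db_lookup(key: str) -> str:
    # Hypothetical database call, faked so the example runs on its own.
    db_calls.append(key)
    return f"row-for-{key}"


reader = CacheAside(slow_db_lookup)
reader.get("homepage")  # miss: goes to the database
reader.get("homepage")  # hit: served from the cache
```

After the two calls, `db_calls` holds a single entry. The second read never reached the database.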

Let us check the math

Suppose a well-tuned cache reaches about a 90 percent hit rate. Then out of 150 reads per second:

About 135 r/s hit the cache and come back instantly. The database never sees them. The remaining 15 r/s miss the cache and fall through to the database.

15 r/s is well under the database's 30 r/s limit. The database is happy. Hits come back almost instantly, and the few misses reach a database that is no longer overloaded.

Hit rate matters a lot. At an 80 percent hit rate, the database sees 30 r/s, right at the line. At a 50 percent hit rate, it sees 75 r/s, back to overloaded. Your job as the engineer is to keep that hit rate high. That usually means caching the right things and giving them the right time-to-live.
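The arithmetic is simple enough to check in a few lines. This sketch uses the lesson's own numbers (150 reads per second, a 30 r/s limit). The function name `database_load` is made up, and hit rates are passed as whole percentages to keep the arithmetic exact.

```python
def database_load(total_reads_per_sec: int, hit_percent: int) -> float:
    """Reads per second that miss the cache and fall through to the database."""
    if not 0 <= hit_percent <= 100:
        raise ValueError("hit_percent must be between 0 and 100")
    return total_reads_per_sec * (100 - hit_percent) / 100


TRAFFIC = 150   # reads per second arriving at the app
DB_LIMIT = 30   # reads per second the example database tolerates

for hit_percent in (90, 80, 50):
    load = database_load(TRAFFIC, hit_percent)
    if load < DB_LIMIT:
        status = "under the limit"
    elif load == DB_LIMIT:
        status = "right at the limit"
    else:
        status = "over the limit"
    print(f"{hit_percent}% hit rate -> {load:.0f} r/s to the database ({status})")
```

Every point of hit rate you lose sends another 1.5 r/s to the database at this traffic level.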

The tradeoff. Caches lie

Caches lie. On purpose.

When you update a row in the database, the cache still holds the old version until you do something about it. For some number of seconds, sometimes minutes, your users see outdated information.

You have a few ways to handle this.

TTL (time to live). Each cached entry automatically expires after, say, 5 minutes. Simple. But stale data can hang around for up to 5 minutes after a write.
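A rough sketch of TTL expiry, assuming an in-process dictionary rather than a real cache server. Redis and Memcached can expire keys for you, so you would rarely write this by hand. The class `TTLCache` and its injectable `clock` parameter are hypothetical, added so the example can fake the passage of time.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Each entry is stored with an expiry time and ignored once it passes."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (value, self._clock() + self._ttl)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None  # never cached
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat it as a miss
            return None
        return value


# Fake clock so the 5-minute TTL can be tested without waiting.
now = [0.0]
cache = TTLCache(ttl_seconds=300, clock=lambda: now[0])
cache.set("post:1", "old title")

now[0] = 299.0
fresh = cache.get("post:1")    # still inside the TTL, possibly stale

now[0] = 300.0
expired = cache.get("post:1")  # TTL reached, entry is gone
```

Notice that `fresh` still returns the cached value at 299 seconds, even if the database row changed at second 1. That is the staleness window.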

Write-through. Every write goes to both the cache and the database as part of the same operation. The cache stays in step with the database, but writes are slower because you have to wait for two stores.
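One way to picture write-through, sketched with two dictionaries standing in for the real stores. The class name `WriteThroughStore` is made up. Writing the database first and the cache second is an assumption of this sketch. Real systems also have to decide what happens when one of the two writes fails.

```python
from typing import Dict


class WriteThroughStore:
    """Every write updates both the database and the cache before returning."""

    def __init__(self) -> None:
        self.database: Dict[str, str] = {}  # stand-in for the real database
        self.cache: Dict[str, str] = {}     # stand-in for the real cache

    def write(self, key: str, value: str) -> None:
        # Two stores to wait for, which is why writes get slower.
        self.database[key] = value
        self.cache[key] = value

    def read(self, key: str) -> str:
        if key in self.cache:
            return self.cache[key]
        value = self.database[key]  # miss: fall through, then populate
        self.cache[key] = value
        return value


store = WriteThroughStore()
store.write("user:7", "Ada")
store.write("user:7", "Grace")  # the cache is updated along with the database
latest = store.read("user:7")
```

The read after the second write returns the new value straight from the cache, with no expiry to wait for.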

Invalidate on write. When you update the database, immediately delete the matching cache entry. The next read will be a miss and re-populate it with fresh data.
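A small sketch of invalidate-on-write, again with dictionaries in place of the real stores. `InvalidatingStore` and the `db_reads` counter are hypothetical names used only to show that the read after a write goes back to the database.

```python
from typing import Dict


class InvalidatingStore:
    """Writes update the database and delete the matching cache entry."""

    def __init__(self) -> None:
        self.database: Dict[str, str] = {}  # stand-in for the real database
        self.cache: Dict[str, str] = {}     # stand-in for the real cache
        self.db_reads = 0

    def read(self, key: str) -> str:
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1                  # miss: the database does the work
        value = self.database[key]
        self.cache[key] = value
        return value

    def write(self, key: str, value: str) -> None:
        self.database[key] = value
        self.cache.pop(key, None)           # delete the entry, do not update it


store = InvalidatingStore()
store.database["post:1"] = "draft"
before = store.read("post:1")   # miss, now cached
store.write("post:1", "published")
after = store.read("post:1")    # miss again, re-populated with fresh data
```

The cost is one extra database read after each write. In exchange, the old value does not linger until a TTL runs out.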

Pick based on how much staleness your users can tolerate. We will go deeper on these strategies in the invalidation concept.
