Why Caches Exist

A cache trades memory for time. Once you see how much time it saves, you will want to cache everything.

The speed pyramid

Different kinds of storage have wildly different access speeds. Look at how long it takes to read data from each one.

A CPU register, well under 1 nanosecond.

A CPU L1 cache, about 1 nanosecond.

RAM (main memory), about 100 nanoseconds. That is 100 times slower than L1.

SSD, about 100 microseconds. That is 1,000 times slower than RAM.

A network call to a database, about 1 to 10 milliseconds. That is 10,000 to 100,000 times slower than RAM.

A network call across continents, about 100 to 300 milliseconds. That is 1 to 3 million times slower than RAM.

Most steps down the list are 10 to 1,000 times slower than the one above. Your goal as a system designer is simple. Keep the data that you use a lot as close to the top of this pyramid as possible.
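To make the gaps concrete, here is a small sketch that turns the rough figures above into ratios. The numbers are the approximate ones from the list, with each range represented by its lower bound. The dictionary and function names are made up for this example.

```python
# Approximate read latencies in nanoseconds, taken from the list above.
# Where the text gives a range, the lower bound is used.
LATENCY_NS = {
    "L1 cache": 1,
    "RAM": 100,
    "SSD": 100_000,
    "database call": 1_000_000,
    "cross-continent call": 100_000_000,
}


def slowdown(slower: str, faster: str) -> float:
    """Return how many times slower one tier is than another."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]


# Print every tier relative to RAM.
for tier in LATENCY_NS:
    print(f"{tier:>22}: {slowdown(tier, 'RAM'):>14,.2f}x RAM")
```

Real hardware and networks vary a lot, so treat the output as orders of magnitude, not measurements.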

Most reads are the same reads

Here is the key insight. Most workloads have huge skew. In a typical web app:

About 80 percent of requests touch only 20 percent of your data. This pattern is so common it has a name. The Pareto principle, also called the 80/20 rule.

Some examples. Your homepage is loaded a million times for every time it gets edited. The top trending post is read by everyone. Obscure posts are barely read at all. Logged-in users mostly view their own profile, the same friends, and the same recent posts.
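One way to see where a split like this can come from is to model popularity with a Zipf law, where the item at rank r gets traffic proportional to 1/r. That model is an assumption made for this sketch, not something measured from a real app, and the function name is hypothetical.

```python
def top_share(n_items: int, top_fraction: float, exponent: float = 1.0) -> float:
    """Fraction of requests that hit the most popular items.

    Assumes a Zipf law: the item at rank r receives traffic
    proportional to 1 / r**exponent.
    """
    weights = [1.0 / rank ** exponent for rank in range(1, n_items + 1)]
    top_n = round(n_items * top_fraction)
    return sum(weights[:top_n]) / sum(weights)


share = top_share(n_items=1000, top_fraction=0.2)
print(f"Top 20% of items receive {share:.0%} of requests")
```

With 1,000 items and an exponent of 1, the top 20 percent of items receive a little under 80 percent of the requests. A different exponent or catalog size shifts the split, so the 80/20 figure is a rule of thumb, not a law.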

So if you compute the answer once and remember it for the next thousand requests, you can avoid most of the work. That is the entire idea of a cache.
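Here is a minimal sketch of "compute once, remember for next time", using a plain dictionary. `render_page` is a hypothetical stand-in for any expensive render or query, and `make_cached` is a name invented for this example. Python's standard library offers `functools.lru_cache` for the same job.

```python
from typing import Callable, Dict


def make_cached(compute: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap `compute` so each distinct key is computed only once."""
    store: Dict[str, str] = {}

    def cached(key: str) -> str:
        if key not in store:           # miss: do the expensive work once
            store[key] = compute(key)
        return store[key]              # hit: return the remembered answer

    return cached


stats = {"calls": 0}  # counts how often the expensive function really runs


def render_page(path: str) -> str:
    """Hypothetical stand-in for an expensive render or database query."""
    stats["calls"] += 1
    return f"<html>{path}</html>"


cached_render = make_cached(render_page)
for _ in range(1000):
    cached_render("/home")

print(stats["calls"])  # the expensive function ran once for 1,000 requests
```

This version never forgets anything and never notices when the page changes. Both problems come up later in this concept.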

A cache lives in RAM, around 100 nanoseconds per access. A database lives on disk and behind a network call, around 1 to 10 milliseconds. A hit in a local in-memory cache is at least 10,000 times faster than a database read. If 90 percent of your reads can be served from cache, your database sees only one tenth of the read load.
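The arithmetic is worth working through, because the average read does not get 10,000 times faster. The sketch below assumes a miss pays for the cache lookup and then the database read, and it uses the rough figures from this section (100 nanoseconds and 1 millisecond). The function names are made up for the example.

```python
def average_read_ns(hit_ratio: float, cache_ns: float, db_ns: float) -> float:
    """Expected latency of one read.

    Every read checks the cache first. Only misses go on to the database.
    """
    return cache_ns + (1 - hit_ratio) * db_ns


def database_reads(total_reads: int, hit_ratio: float) -> int:
    """Number of reads that still reach the database."""
    return round(total_reads * (1 - hit_ratio))


CACHE_NS = 100        # local in-memory cache, rough figure from the text
DB_NS = 1_000_000     # 1 ms database call, low end of the range in the text

for ratio in (0.0, 0.9, 0.99):
    avg = average_read_ns(ratio, CACHE_NS, DB_NS)
    print(f"hit ratio {ratio:.0%}: average read about {avg / 1000:,.1f} microseconds")

print(database_reads(1_000_000, 0.9), "of 1,000,000 reads reach the database")
```

At a 90 percent hit ratio the average read is about 100 microseconds, roughly 10 times faster than with no cache. The misses dominate the average, which is why raising the hit ratio from 90 to 99 percent matters so much.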

There are caches at every layer

When people say "cache," they could mean many different things. Real production systems use most of them at once.

CPU cache. Managed by the hardware itself. You do not see it.

Browser cache. Your browser keeps copies of images, CSS, and scripts so it does not have to re-download them on every page load.

CDN cache. Covered in the CDN concept. Edge servers around the world hold static assets close to users.

App-server local cache. Stored in the server's own memory. For example, an object on the Java heap, a Node.js Map, or a Python dictionary. Very fast, but each server has its own copy.

Distributed cache. A shared cache like Redis or Memcached. All servers see the same data. Slower than a local cache because every lookup requires a network call.

Database cache. The database itself keeps recently used data, and in some systems recent query results, in RAM.

Materialized views. Pre-computed query results stored in the database, so reads skip the computation.

You do not pick one. You combine them. Each layer absorbs traffic so the layer below it stays fast.
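A rough sketch of how layers absorb traffic for one another. Each tier here is modeled as a plain in-memory dictionary, so the "shared" tier only stands in for something like Redis and makes no network call. `Layer`, `read_through`, and `load_from_db` are names invented for this example.

```python
from typing import Callable, Dict, List, Optional


class Layer:
    """One cache tier, modeled as a named in-memory dict."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.hits = 0

    def get(self, key: str) -> Optional[str]:
        value = self.data.get(key)
        if value is not None:
            self.hits += 1
        return value


def read_through(
    key: str,
    layers: List[Layer],
    load_from_db: Callable[[str], str],
) -> str:
    """Check each layer from fastest to slowest, then fall back to the database."""
    missed: List[Layer] = []
    for layer in layers:
        value = layer.get(key)
        if value is not None:
            break                      # found it, stop descending
        missed.append(layer)
    else:
        value = load_from_db(key)      # every layer missed
    for layer in missed:
        layer.data[key] = value        # fill the faster layers for next time
    return value


local = Layer("local")      # per-server memory
shared = Layer("shared")    # stand-in for a distributed cache
print(read_through("user:1", [local, shared], lambda k: f"row for {k}"))
print(read_through("user:1", [local, shared], lambda k: f"row for {k}"))
print(local.hits, shared.hits)
```

The first read falls all the way through to the database. The second is answered by the local layer, so neither the shared cache nor the database does any work.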

The hard part. Knowing when the cache is wrong

Phil Karlton, a famous software engineer, once said this.

"There are only two hard things in computer science. Cache invalidation and naming things."

Why is cache invalidation hard? Because once you save an answer, the world keeps changing. The user updates their name, but the cache still has the old name. The product's price drops, but the cache still shows the old price. The deleted comment keeps showing up.

You have three options for how to deal with this.

Use a timer. Every cache entry expires after, say, 5 minutes. You accept some staleness in exchange for simplicity.

Invalidate on write. Whenever the underlying data changes, you update or delete the cache entry. Cleanest. Hardest to get right.

Do not cache that thing at all. Some data is too important to risk being stale.
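The first two options can be sketched together. The class below is a minimal illustration, not a production cache. `TTLCache`, `read_name`, `write_name`, and the `database` dictionary are all hypothetical names, and the dictionary stands in for a real source of truth. The clock is passed in so that expiry can be tested without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire ttl_seconds after being stored."""

    def __init__(
        self,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:   # option 1: the timer ran out
            del self._entries[key]
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (self.clock() + self.ttl, value)

    def delete(self, key: str) -> None:
        self._entries.pop(key, None)


database: Dict[str, str] = {"user:1:name": "Ada"}  # hypothetical source of truth
cache = TTLCache(ttl_seconds=300)                   # 5 minutes, as in the text


def read_name(key: str) -> str:
    value = cache.get(key)
    if value is None:                    # miss or expired: go to the source
        value = database[key]
        cache.set(key, value)
    return value


def write_name(key: str, value: str) -> None:
    database[key] = value                # update the source of truth first
    cache.delete(key)                    # option 2: invalidate on write
```

With only the timer, a reader could see the old name for up to 5 minutes after a change. With the delete in `write_name`, the next read goes back to the source. This sketch ignores concurrent readers and writers, which is a large part of why this option is the hardest to get right.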

We will go deep on invalidation strategies in the next concept. For now, just know that caching is powerful, but every time you add a cache, you also add a question. What does the truth look like right now?
