Multi-Tier Caching

Real systems do not have one cache. They have a stack of them.

The full path, from browser to database

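When a user loads their Twitter timeline, the request can be answered at any of these layers. Each one sits closer to the user than the next. The latencies are rough figures as seen by whoever calls that layer, so they are not directly comparable end to end.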

Browser cache on the user's device, close to 0 ms. The HTML or JSON is sitting in their browser already.

CDN edge, about 10 ms. A nearby Cloudflare or CloudFront server has it.

Application cache, well under 1 ms once the request has reached the data center. The web server has the answer in its own memory.

Shared cache like Redis, about 3 ms. A Redis cluster shared across all servers.

Database, around 10 to 100 ms. The real source of truth.

The closer a layer sits to the user, the more requests it should absorb. The database at the bottom should only see what nothing above it could answer.
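The fall-through described above can be sketched in a few lines. This is a minimal illustration, not a production design: the Tier class, the tiered_get function, and the use of plain dicts in place of real stores are all assumptions made for the example.

```python
from typing import Callable, Optional


class Tier:
    """One cache layer: a name plus a dict standing in for the real store."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self.store.get(key)

    def put(self, key: str, value: str) -> None:
        self.store[key] = value


def tiered_get(
    key: str, tiers: list[Tier], load_from_db: Callable[[str], str]
) -> tuple[str, str]:
    """Return (value, name of the layer that answered)."""
    for i, tier in enumerate(tiers):
        value = tier.get(key)
        if value is not None:
            # Backfill the tiers above that missed, so the next read stops earlier.
            for upper in tiers[:i]:
                upper.put(key, value)
            return value, tier.name
    # Nothing above could answer: fall through to the source of truth.
    value = load_from_db(key)
    for tier in tiers:
        tier.put(key, value)
    return value, "database"
```

Called twice with the same key, the first read is answered by "database" and the second by the first tier in the list, because the miss populated every layer on the way back up.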

Hit rates stack up

If each layer has a 90 percent hit rate, watch what happens.

100 requests hit the edge. 90 get served there. 10 fall through.

Those 10 hit the app cache. 9 get served. 1 falls through.

That 1 hits Redis. About 0.9 gets served. 0.1 falls through.

The database sees about one tenth of one request out of the original 100. That is 1 in 1,000.

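You started with 100 requests and the database saw almost nothing. That is why large systems can serve millions of users from a surprisingly small database. The cache stack absorbs almost everything. Real hit rates are rarely this uniform, because the requests that miss one layer tend to be the rare ones that miss the next layer too, but the compounding effect still holds.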
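The same arithmetic works for any set of hit rates. The helper below is a small sketch; the name fall_through is made up for this example, and it assumes each layer's hit rate is independent of the others, which is a simplification.

```python
def fall_through(requests: float, hit_rates: list[float]) -> list[float]:
    """Requests that miss each layer in turn; the last entry is what the database sees."""
    remaining = requests
    misses = []
    for rate in hit_rates:
        remaining *= 1.0 - rate  # only the misses continue to the next layer
        misses.append(remaining)
    return misses


# Edge, app cache, Redis, each at a 90 percent hit rate.
print([round(m, 3) for m in fall_through(100, [0.9, 0.9, 0.9])])  # [10.0, 1.0, 0.1]
```

Dropping one layer to a 50 percent hit rate multiplies the database load by five, which is why watching hit rates per layer matters.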

But it only works if you take care of it. You need a good key strategy. You need to invalidate correctly. You need to watch your hit rates. You need to size every layer. The closer to perfect each layer is, the more it absorbs.

Local vs shared caches

One tradeoff worth knowing: local cache vs shared cache.

A local cache lives inside the server process. Think of a Java HashMap or a Node.js Map. Lookups take nanoseconds. But every server has its own copy. They can drift apart. Ten servers means ten copies, ten times the memory, and ten chances of stale data.

A shared cache like Redis or Memcached is one store that everyone uses. All servers see the same data. A write to it is visible to every server on its next read, at least when there is a single primary. The cost is that every read is a network call, a millisecond or more instead of nanoseconds.

Most systems use both. Local cache for very hot, rarely changing data like config values. Shared cache for user data, sessions, and computed results.
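One common way to combine the two is to give the local cache a short expiry, so a server's copy can only drift for a bounded time. The sketch below assumes that approach. LocalTTLCache and read are hypothetical names, and the shared cache is a plain dict standing in for Redis or Memcached.

```python
import time
from typing import Callable, Optional


class LocalTTLCache:
    """In-process cache whose entries expire, bounding how stale a copy can get."""

    def __init__(
        self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self.entries: dict[str, tuple[float, str]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[str]:
        entry = self.entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self.entries[key]  # expired: force a re-read from the shared cache
            return None
        return value

    def put(self, key: str, value: str) -> None:
        self.entries[key] = (self.clock() + self.ttl, value)


def read(key: str, local: LocalTTLCache, shared: dict[str, str]) -> Optional[str]:
    """Check local memory first, then the shared store."""
    value = local.get(key)
    if value is None:
        value = shared.get(key)  # in a real system this is the network call
        if value is not None:
            local.put(key, value)
    return value
```

The TTL is the knob: a longer one saves more network calls, a shorter one narrows the window in which two servers can disagree.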

How the stack breaks

Multi-tier caching has a few classic ways to fail.

Thundering herd. A hot cache entry expires. Suddenly 10,000 requests all miss at the same time. They all hit the database at once. The database melts. The fix is to put a lock around the refresh, or to serve the stale value while a single worker fetches a fresh one.
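The lock-around-the-refresh fix might look like the sketch below. It covers only the locking, within a single process, and leaves out expiry. SingleFlightCache is a hypothetical name, and the loader function stands in for the database query. Across many servers the same idea needs a distributed lock, which this example does not attempt.

```python
import threading
from typing import Callable


class SingleFlightCache:
    """On a miss, only one thread per key calls the loader; the rest wait for it."""

    def __init__(self, loader: Callable[[str], str]) -> None:
        self.loader = loader
        self.values: dict[str, str] = {}
        self.key_locks: dict[str, threading.Lock] = {}
        self.guard = threading.Lock()  # protects key_locks itself

    def _lock_for(self, key: str) -> threading.Lock:
        with self.guard:
            return self.key_locks.setdefault(key, threading.Lock())

    def get(self, key: str) -> str:
        value = self.values.get(key)
        if value is not None:
            return value  # fast path: no locking on a hit
        with self._lock_for(key):
            # Re-check: another thread may have refreshed while we waited.
            value = self.values.get(key)
            if value is None:
                value = self.loader(key)  # exactly one caller reaches the database
                self.values[key] = value
            return value
```

With this in place, 10,000 simultaneous misses on one key become one database query and 9,999 short waits.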

Cold start. You just deployed. The servers are fresh. The caches are empty. The first minute of traffic hits the database hard. The fix is to warm the cache before sending real traffic to the new servers.
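Warming can be as simple as loading a list of known hot keys before the server is added to the load balancer. The sketch below assumes such a list exists, for example pulled from recent access logs. The warm_cache function and its dict-based cache are illustrative only.

```python
from typing import Callable, Iterable


def warm_cache(
    cache: dict[str, str],
    hot_keys: Iterable[str],
    load_from_db: Callable[[str], str],
) -> int:
    """Preload hot keys before the server takes traffic; return how many were loaded."""
    loaded = 0
    for key in hot_keys:
        if key not in cache:  # skip anything already present
            cache[key] = load_from_db(key)
            loaded += 1
    return loaded
```

Because the warm-up runs before real traffic arrives, its database reads can be paced instead of arriving all at once.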

Mismatched invalidation. You cleared the CDN but not the app cache. Or the other way around. Users see old data on some pages and new data on others. The fix is to clear all layers from a single trigger.
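A single trigger can be modeled as a registry of purge functions, one per layer, fired from one call. This is a sketch under that assumption. The Invalidator class is hypothetical, and each purge function would wrap whatever that layer really exposes, such as a CDN purge request or a Redis delete.

```python
from typing import Callable

Purger = Callable[[str], None]


class Invalidator:
    """Registry of per-layer purge functions, fired together from one call."""

    def __init__(self) -> None:
        self.purgers: dict[str, Purger] = {}

    def register(self, layer: str, purge: Purger) -> None:
        self.purgers[layer] = purge

    def invalidate(self, key: str) -> list[str]:
        """Purge key from every layer; return the layers that failed, for retry."""
        failed = []
        for layer, purge in self.purgers.items():
            try:
                purge(key)
            except Exception:
                # Keep going: one broken layer should not leave the others stale.
                failed.append(layer)
        return failed
```

Returning the failed layers matters as much as the fan-out, since a purge that silently fails recreates the mismatch this is meant to prevent.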

Cache becomes the source of truth. A year goes by. Nobody checks the database. The cache holds writes that never made it back. One Redis crash and the data is gone. The fix is to remember that caches are caches. Always have a durable store behind them.
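One way to keep the cache from holding the only copy is to write to the durable store first and update the cache second, a pattern usually called write-through. The sketch below uses two dicts as stand-ins for the database and the cache. The names durable_write and durable_read are made up for this example.

```python
from typing import Optional


def durable_write(
    key: str, value: str, database: dict[str, str], cache: dict[str, str]
) -> None:
    """Persist first, then update the cache, so the cache never holds the only copy."""
    database[key] = value  # if this raises, the cache is left untouched
    cache[key] = value


def durable_read(
    key: str, database: dict[str, str], cache: dict[str, str]
) -> Optional[str]:
    """Serve from the cache when possible; rebuild from the database on a miss."""
    value = cache.get(key)
    if value is None:
        value = database.get(key)
        if value is not None:
            cache[key] = value
    return value
```

With writes ordered this way, losing the whole cache costs latency while it refills, not data.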
