Health Checks

A load balancer pointed at dead servers is just a slow way to fail.

The basic idea

You have a load balancer with three servers behind it. One of them crashes. It could be out of memory. It could be a hung process. The network cable could be unplugged. The reason does not matter. What matters is that the load balancer has to know not to send any more requests there.

The way it knows is called a health check. Every few seconds, the load balancer sends a tiny request to each server. The request usually hits a special URL like GET /health. The server is supposed to reply with a quick "I am fine" if everything is working.

If the server replies with 200 OK, it is considered healthy and the LB keeps sending traffic to it.

If the server replies with an error or does not reply at all within a few seconds, it is marked unhealthy. The LB stops sending traffic to it.

When the server starts responding successfully again, the LB marks it healthy and resumes sending traffic. This loop typically runs every 1 to 30 seconds in nearly every load-balanced production system you have used.
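
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular load balancer is implemented: the function names probe and check_round, the 2-second default timeout, and the example URLs are all assumptions made for this sketch.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def probe(url: str, timeout_s: float = 2.0) -> bool:
    """Send one GET and return True only for a 200 reply within timeout_s."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connection, timeout, malformed reply, or an HTTP error
        # status such as 500: all of these count as a failed check.
        return False


def check_round(
    urls: Iterable[str],
    probe_fn: Callable[[str], bool] = probe,
) -> Dict[str, bool]:
    """Probe every backend once and report which ones passed."""
    return {url: probe_fn(url) for url in urls}
```

A load balancer would run something like check_round once per interval, for example on ["http://10.0.0.1/health", "http://10.0.0.2/health"] (hypothetical addresses), and route new requests only to the backends that came back True.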

What does "healthy" actually mean?

There is more than one kind of "healthy." The three most common are below.

Liveness checks ask "is the process running at all?" The health endpoint returns 200 as long as the server can respond to anything. It does not check whether the server is actually able to do useful work.

Readiness checks ask "is the server ready to handle real traffic?" The endpoint only returns 200 if startup finished, the database connection works, and the cache is reachable. A server can be alive but not ready, for example if it just started up and is still warming up.

Deep health checks actually run a real request through the server. They might query the database, render a small template, and return real data. They catch more problems but they put more load on the server.

Production systems often run two checks at the same time. Liveness tells the platform whether to restart the container. Readiness tells the load balancer whether to send it traffic. Kubernetes formalizes this distinction with separate liveness and readiness probes, so a pod can be "alive" without being "ready."
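
To make the liveness and readiness split concrete, here is a small sketch of the decision logic behind two separate endpoints. The paths /livez and /readyz, the AppState fields, and the choice of 503 for "not ready" are assumptions for illustration. A real service would wire this into its web framework and update the flags from actual startup and dependency checks.

```python
from dataclasses import dataclass


@dataclass
class AppState:
    """Flags the application keeps up to date (hypothetical names)."""
    startup_done: bool = False
    db_ok: bool = False
    cache_ok: bool = False


def handle(path: str, state: AppState) -> int:
    """Return the HTTP status code for a health-check request."""
    if path == "/livez":
        # Liveness: if this code runs at all, the process is up.
        return 200
    if path == "/readyz":
        # Readiness: only accept traffic once every dependency is usable.
        ready = state.startup_done and state.db_ok and state.cache_ok
        return 200 if ready else 503
    return 404
```

A freshly started server with a default AppState answers 200 on /livez and 503 on /readyz, which is exactly the "alive but not ready" case.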

Two knobs you can turn

When you set up a health check, you get to decide two things. Both have real consequences.

How often to check. Every 1 second means fast detection. If a server dies, the LB notices and stops sending traffic within a few seconds. But the checks themselves add a small constant load on every backend. Every 30 seconds is lighter on the backends. But a dead server can keep getting traffic for 30 seconds or more before the LB notices.

How many failures in a row before marking a server unhealthy. If you remove a server after just one bad response, a brief network hiccup can cause traffic to flap around uselessly. If you require five failures in a row, detection takes roughly five check intervals: about 5 seconds at a 1-second interval, but about 150 seconds at a 30-second interval. Most setups use 2 or 3.
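
The failures-in-a-row rule is a small state machine. The sketch below is one way to write it. The class name HealthTracker and the healthy_threshold parameter are assumptions for this example. With healthy_threshold left at 1, a single successful check brings the server back, matching the simple behavior described earlier.

```python
class HealthTracker:
    """Tracks one backend and flips state only after enough consecutive
    results that disagree with the current state."""

    def __init__(self, unhealthy_threshold: int = 2, healthy_threshold: int = 1):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True
        self._streak = 0  # consecutive results contradicting self.healthy

    def record(self, ok: bool) -> bool:
        """Feed in one check result; return the (possibly updated) state."""
        if ok == self.healthy:
            # Result agrees with the current state, so any streak is broken.
            self._streak = 0
        else:
            self._streak += 1
            needed = self.healthy_threshold if ok else self.unhealthy_threshold
            if self._streak >= needed:
                self.healthy = ok
                self._streak = 0
        return self.healthy
```

With unhealthy_threshold=2, the sequence fail, pass, fail, fail keeps the server in rotation through the first isolated failure and removes it only after the two failures in a row at the end.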

By default, an AWS Application Load Balancer checks every 30 seconds and marks a server unhealthy after 2 failures in a row. Tune both values based on how quickly you need failures detected.
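
It helps to turn the two knobs into a rough detection time. The function below is a simplified model, not a description of any vendor's exact behavior. It assumes checks fire at a fixed interval and that a failed check is reported after timeout_s seconds (zero if the failure is immediate). The name detection_window_s is made up for this sketch.

```python
def detection_window_s(interval_s, unhealthy_threshold, timeout_s=0.0):
    """Return (best_case, worst_case) seconds from server death to removal.

    Best case: the server dies just before a check, so the first failure
    is almost immediate. Worst case: it dies just after a passing check,
    so a full interval passes before the first failure is seen.
    """
    best = (unhealthy_threshold - 1) * interval_s + timeout_s
    worst = unhealthy_threshold * interval_s + timeout_s
    return best, worst
```

Under this model, an interval of 30 seconds with a threshold of 2 gives a window of roughly 30 to 60 seconds, while an interval of 1 second with a threshold of 3 gives roughly 2 to 3 seconds.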

Wrong settings cause one of two problems. Either you flag healthy servers as dead because of brief hiccups (false positives, which show up as flapping, where servers bounce in and out of rotation). Or you keep sending traffic to actually-dead servers for too long (false negatives, which users see as errors).
