Load Balancing Algorithms

There are several different ways a load balancer can choose which server to send each request to. Pick the one that fits your traffic.

Round-robin. Take turns

The simplest strategy. The load balancer just keeps a counter and rotates. Request number 1 goes to Server A. Request 2 to Server B. Request 3 to Server C. Request 4 back to A. And so on.
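Here is a minimal sketch of that counter in Python. The class name RoundRobinBalancer and the server labels are made up for illustration. A real load balancer would also handle health checks and concurrency, which this sketch ignores.

```python
class RoundRobinBalancer:
    """Hypothetical round-robin picker: a counter that rotates through servers."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.counter = 0  # counts how many requests have been routed so far

    def pick(self):
        # The counter modulo the pool size gives the next server in rotation.
        server = self.servers[self.counter % len(self.servers)]
        self.counter += 1
        return server


lb = RoundRobinBalancer(["A", "B", "C"])
order = [lb.pick() for _ in range(4)]
print(order)  # ['A', 'B', 'C', 'A']
```

Notice that pick() never looks at the request or at how busy a server is. That is the whole algorithm.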

The good part. It is dead simple. And if every request takes the same amount of work, traffic ends up perfectly even.

The bad part. It treats every server as equal. It treats every request as equal. Real life is usually not like that.

For example, a request to /api/search might take 800 milliseconds. A request to /api/ping takes 5 milliseconds. Round-robin does not know the difference. Server A could be sitting idle while Server B is buried under three slow searches.

Round-robin is the default for most simple setups. It works fine when your requests are roughly uniform.

Least-connections. Send to whoever is least busy

In this strategy the load balancer keeps track of how many active requests each server is currently handling. When a new request comes in, the LB sends it to the server with the fewest in flight.
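The bookkeeping can be sketched in a few lines of Python. The names LeastConnectionsBalancer, acquire, and release are assumptions for this example, not any product's API. The sketch also assumes the caller reports when each request finishes.

```python
class LeastConnectionsBalancer:
    """Hypothetical picker that tracks in-flight requests per server."""

    def __init__(self, servers):
        # Map each server to its number of active requests.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Choose the server with the fewest in-flight requests.
        # Ties go to whichever server was listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call this when the request completes.
        self.active[server] -= 1


lb = LeastConnectionsBalancer(["A", "B", "C"])
first, second, third = lb.acquire(), lb.acquire(), lb.acquire()
lb.release(second)     # Server B finishes its request early
fourth = lb.acquire()  # B now has the fewest in flight, so it is picked again
print(first, second, third, fourth)  # A B C B
```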

The good part. It adapts to actual load. If Server B is stuck on a slow query, the LB notices and stops piling more work on it.

The bad part. The LB now has to keep state about every backend. It is more complex than round-robin. And the number of open connections is not always a perfect measure of load. One big slow request counts as a single connection, the same as one quick easy one, even though it might cost as much as ten of them.

Least-connections is a strong default for APIs where the cost of each request varies a lot. Most cloud load balancers, like AWS ALB and Google Cloud Load Balancer, support this mode or a close variant. AWS ALB, for example, calls its version "least outstanding requests".

IP hash. Same user, same server

In this strategy the load balancer takes a hash of the client's IP address and uses that hash to pick a server. The math always gives the same result for the same IP. So as long as the set of servers stays the same, user A always ends up on Server 2. Every single time.
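A rough sketch of the idea in Python follows. The function name pick_server, the server labels, and the example IP are all made up. It uses SHA-256 from hashlib because Python's built-in hash() for strings changes between runs, which would defeat the point. Real load balancers may use a different hash function.

```python
import hashlib


def pick_server(client_ip, servers):
    """Hypothetical IP-hash picker: same IP and same pool give the same server."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Turn the first 8 bytes of the digest into an integer,
    # then map it onto the server list with modulo.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["Server 1", "Server 2", "Server 3"]
picks = {pick_server("203.0.113.7", servers) for _ in range(5)}
print(picks)  # a set with exactly one server in it
```

One caveat with the modulo step in this sketch: if you add or remove a server, the pool size changes and many IPs map to a different server than before.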

Why would you want this? Sometimes a particular server has something useful for that user already loaded in memory. Maybe it has the user's active WebSocket connection. Maybe it has cached some user-specific data. Routing the user back to the same server means you do not have to redo that work.

Be careful though. This breaks the "every server is interchangeable" rule from the stateless concept. If Server 2 dies, user A's next request goes to a different server that has no memory of them. So only use IP hash when the stickiness is a nice-to-have optimization, not when it is required for things to work.

Weighted. Not all servers are the same size

Sometimes the servers behind your load balancer are not equal. Maybe you have two small machines with 4 CPU cores and one large machine with 16 cores. The big one can handle four times the work of a small one.

With weighted round-robin, you assign each server a weight. The load balancer sends requests in proportion. For example, out of every six requests, one goes to each small server and four go to the big one.
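The six-request example can be sketched in Python like this. The function name weighted_schedule and the server names are assumptions for illustration. This is the naive version, which repeats each server in the rotation as many times as its weight. Production load balancers typically interleave the picks more evenly so the big server does not get four requests in a row.

```python
from collections import Counter
from itertools import cycle


def weighted_schedule(weights):
    """Hypothetical naive weighted round-robin: repeat each server by its weight."""
    slots = []
    for server, weight in weights.items():
        slots.extend([server] * weight)
    return cycle(slots)  # rotate through the expanded list forever


# Two small machines (weight 1 each) and one large machine (weight 4).
weights = {"small-1": 1, "small-2": 1, "big": 4}
schedule = weighted_schedule(weights)

counts = Counter(next(schedule) for _ in range(6))
print(counts["small-1"], counts["small-2"], counts["big"])  # 1 1 4
```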

This is useful for a few situations. Gradual rollouts, where you send 10 percent of traffic to a new server build to test it before going all in. Mixed hardware. Or running A/B tests at the infrastructure level.

Most production load balancers support weighted routing. It is often combined with least-connections, giving you weighted least-connections, which takes both server capacity and current load into account.
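One common way to combine the two is to divide each server's active connections by its weight and pick the lowest ratio. The Python sketch below assumes that rule. The function name and the numbers are made up, and real load balancers may score servers differently.

```python
def pick_weighted_least_connections(active, weights):
    """Hypothetical picker: lowest active-connections-per-unit-of-weight wins."""
    return min(active, key=lambda server: active[server] / weights[server])


# In-flight requests right now, and the capacity weight of each server.
active = {"small-1": 2, "small-2": 3, "big": 6}
weights = {"small-1": 1, "small-2": 1, "big": 4}

# Ratios: small-1 = 2.0, small-2 = 3.0, big = 1.5
choice = pick_weighted_least_connections(active, weights)
print(choice)  # big
```

The big server has the most connections in raw numbers, but relative to its size it is the least loaded, so it gets the next request.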
