Horizontal vs Vertical Scaling

When traffic grows, you have two options. They lead to very different futures.

Vertical scaling. Make the server bigger

Vertical scaling, also called "scaling up," means giving your one server more power. More CPU. More RAM. Faster disk. More network bandwidth.

It is the easiest fix when you are small. Your app outgrew a 2-core, 4 GB machine? Upgrade to 8-core, 32 GB. No code changes. No architecture changes. Just bigger hardware.

This works until it does not. There are three reasons it stops working.

Hardware has a ceiling. Even the biggest cloud machine has a fixed number of cores and a fixed amount of memory. Once you are on it, there is no bigger size to buy.

Cost can grow faster than capacity. At the high end, a 64-core box can cost more than 32 times what a 2-core box does, especially once you need specialized high-memory hardware.

And no matter how big it gets, it is still one server. If it crashes, your whole site is down. There is no backup.

Horizontal scaling. Add more servers

Horizontal scaling, also called "scaling out," means putting more servers behind a load balancer. Each server is identical to the others.

Traffic doubled? Add more servers. Need to deploy new code without downtime? Roll out new servers next to the old ones, then switch over. One server crashed? The load balancer stops sending it traffic and you spin up a replacement.
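Here is a minimal sketch of that behavior. It is a toy round-robin balancer written for illustration, not how any real load balancer is implemented. The class name, method names, and server names (web-1, web-2, and so on) are all made up for this example.

```python
from itertools import count


class LoadBalancer:
    """Toy round-robin balancer; a sketch, not a real proxy."""

    def __init__(self, servers):
        self.servers = list(servers)      # names of identical backends
        self.healthy = set(self.servers)  # backends currently passing health checks
        self._counter = count()           # ever-increasing request counter

    def mark_down(self, server):
        # A failed health check takes the server out of rotation.
        self.healthy.discard(server)

    def add_server(self, server):
        # Scaling out, or replacing a crashed server, is just registration.
        self.servers.append(server)
        self.healthy.add(server)

    def route(self):
        # Pick the next healthy server in round-robin order.
        candidates = [s for s in self.servers if s in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy servers")
        return candidates[next(self._counter) % len(candidates)]


lb = LoadBalancer(["web-1", "web-2", "web-3"])
first = [lb.route() for _ in range(3)]           # one request per server

lb.mark_down("web-2")                            # simulate a crash
after_crash = {lb.route() for _ in range(10)}    # web-2 gets no traffic

lb.add_server("web-4")                           # spin up a replacement
after_replace = {lb.route() for _ in range(10)}  # web-4 joins the rotation
```

Notice that nothing about the servers themselves changes when one is removed or added. That only works because they are interchangeable.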

This is how most of the modern web runs, including large services like Netflix, Twitter, and Stripe. Cloud platforms are built around it. Auto Scaling groups on AWS, Kubernetes pods, ECS services. They all assume you scale by adding instances.

The catch is that your code has to support it. Your servers must be stateless (see the previous concept). All state has to live in a shared place like a database or cache. If a server keeps sessions or user data in its own memory, the next request may land on a different server that has never seen that data.
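A small sketch makes the failure concrete. Both classes below are hypothetical, and a plain Python dict stands in for the shared database or cache. A real system would use something like a database or a cache server instead.

```python
class StatefulServer:
    """Keeps the cart in its own process memory (breaks behind a load balancer)."""

    def __init__(self):
        self.carts = {}  # local state: invisible to other servers

    def add_item(self, user, item):
        self.carts.setdefault(user, []).append(item)

    def get_cart(self, user):
        return self.carts.get(user, [])


class StatelessServer:
    """Keeps nothing locally; every read and write goes to the shared store."""

    def __init__(self, store):
        self.store = store  # stand-in for a shared database or cache

    def add_item(self, user, item):
        self.store.setdefault(user, []).append(item)

    def get_cart(self, user):
        return self.store.get(user, [])


# Two stateful servers: the write lands on a, the next request lands on b.
a, b = StatefulServer(), StatefulServer()
a.add_item("alice", "book")
lost_cart = b.get_cart("alice")   # b never saw the write, so the cart is empty

# Two stateless servers sharing one store.
shared_store = {}
c, d = StatelessServer(shared_store), StatelessServer(shared_store)
c.add_item("alice", "book")
kept_cart = d.get_cart("alice")   # d reads the same shared state
```

With the stateful pair, the user's cart appears to vanish depending on which server answers. With the stateless pair, it does not matter which server answers.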

When each one is the right choice

Vertical scaling makes sense for:

Early apps where simplicity matters more than huge scale.

Databases. Most databases are hard to scale horizontally. You usually grow the database server first before splitting it.

Workloads where one process needs huge amounts of memory or CPU at the same time.

Horizontal scaling makes sense for:

Web servers and API backends. They are stateless, so you can clone them easily.

Workers pulling from a queue. Add more workers when there is more work.

Any system that has to stay up. Multiple servers means if one fails, the others keep going.
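The uptime point can be put in rough numbers. The sketch below assumes servers fail independently and that any single server can carry the load alone. Real failures are often correlated (a bad deploy, a shared network), so treat the result as an optimistic upper bound. The 99% figure is an illustrative assumption, not a measurement.

```python
def availability(per_server: float, servers: int) -> float:
    """Chance that at least one of `servers` is up, assuming independent failures."""
    return 1 - (1 - per_server) ** servers


HOURS_PER_YEAR = 24 * 365

single = availability(0.99, 1)   # one server that is up 99% of the time
triple = availability(0.99, 3)   # three such servers behind a load balancer

# Expected hours per year with no server available.
downtime_single = (1 - single) * HOURS_PER_YEAR
downtime_triple = (1 - triple) * HOURS_PER_YEAR
```

Under these assumptions, one server is down about 87.6 hours a year. Three servers are all down at once for well under a minute a year.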

In real production systems, you usually do both. You scale the database vertically to a reasonable size, and you scale the web tier and the workers horizontally across many smaller machines.
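Sizing the horizontal tier is mostly arithmetic. The sketch below is one simple way to estimate it. The function name, the traffic numbers, and the one-spare-server policy are all assumptions made up for illustration.

```python
import math


def servers_needed(peak_rps: float, per_server_rps: float, spare: int = 1) -> int:
    """Servers to carry peak load, plus `spare` extra so one failure does not overload the rest."""
    return math.ceil(peak_rps / per_server_rps) + spare


# Hypothetical numbers: each small server handles about 500 requests per second.
today = servers_needed(peak_rps=1200, per_server_rps=500)
doubled = servers_needed(peak_rps=2400, per_server_rps=500)
```

Here traffic doubling takes the web tier from 4 servers to 6. No machine gets bigger. You just add more of the same.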

The rule of thumb. Scale the stateless tier horizontally. Scale the stateful tier vertically. When even that is not enough on the data side, you reach for replication or sharding (covered in later concepts).
