My name is Rex Morgan and I'm a Staff Software Engineer in Austin, TX working at Indeed.

I love learning new languages and tinkering with new technologies. I'm currently having fun with Elixir, Java, C#, and Kubernetes.

Scaling Web Applications

When scaling an application horizontally that wasn't built with that in mind, you'll undoubtedly be tasked with moving state around so that it's available to every instance of the application.

It doesn't matter how you're orchestrating each instance of your application: whether each instance runs on its own virtual machine, inside a container, on its own physical server, or some combination of the above, you'll still run into the issue of moving state to a centralized place where all instances can access it.

The application probably already does this for most of its state. For example, most of the application's data probably lives in some sort of centralized database, like PostgreSQL or MySQL. If that isn't the case, centralizing that data would be step one for scaling the application across multiple instances.

I believe that any running instance should be able to handle any request. When a client uses a web application, successive requests from the browser may be handled by different backend servers, and the client application shouldn't know or care.

Session Data

What about session data that's stored on the server that handled the request, unavailable to other instances? Many technology stacks are configured to work this way by default.

I prefer configuring the framework's session storage to keep that data in something like Redis. The Redis server then becomes the central location that every instance hits to load the data it needs for a user's session.
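In ASP.NET Core, for example, this is just a few lines of configuration. Here's a minimal sketch, assuming the Microsoft.Extensions.Caching.StackExchangeRedis package and a Redis server at localhost:6379 (adjust both for your environment; the key prefix is made up):

    // Program.cs -- a minimal sketch, not a drop-in config.
    var builder = WebApplication.CreateBuilder(args);

    // Back the session store with Redis instead of per-instance memory,
    // so any instance can load any user's session.
    builder.Services.AddStackExchangeRedisCache(options =>
    {
        options.Configuration = "localhost:6379";
        options.InstanceName = "myapp-session:"; // hypothetical key prefix
    });
    builder.Services.AddSession();

    var app = builder.Build();
    app.UseSession();

    app.MapGet("/", (HttpContext context) =>
    {
        // Session reads and writes now round-trip through Redis.
        var visits = (context.Session.GetInt32("visits") ?? 0) + 1;
        context.Session.SetInt32("visits", visits);
        return $"You've visited {visits} time(s).";
    });

    app.Run();

With that in place, it no longer matters which instance the load balancer picks for any given request.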

Sticky Sessions

When a load balancer says it can do sticky sessions, it means the load balancer will do its best to ensure that all requests for a user's session go to the same backend instance. This is often done by injecting a cookie into the first response for that user; the load balancer reads that cookie back on subsequent requests and routes them to the same backend instance that serviced the first one.
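As a concrete sketch, here's roughly what that cookie-based approach looks like in an HAProxy config; the backend name, server names, and addresses are made up for illustration:

    backend webapp
        balance roundrobin
        # Inject a SERVERID cookie into the first response, then route any
        # request carrying it back to the instance that set it.
        cookie SERVERID insert indirect nocache
        server app1 10.0.0.11:8080 check cookie app1
        server app2 10.0.0.12:8080 check cookie app2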

At first glance, this solves the issue of user-specific data living on a single instance. Users are stuck to the instance that has their data, great! But in a way, it kicks the problem down the road and raises new questions.

What happens when you deploy new code and the instance comes down for updates? What happens if the instance crashes for any reason? You can't guarantee that an instance will be available for the duration of a user's session. Because of that, you still have to consider situations where requests for the same user's session go to different instances.

I don't think sticky sessions are always a bad thing, but they'd be a last resort for me. There are some advanced situations where you might want to ensure a user gets pinned to a specific instance. You could do this to ensure the instance has a hot cache of user-specific, frequently accessed, difficult-to-compute data. However, I would only go that route after having ruled out denormalizing that data in the main datastore or caching it in something like Redis, where any instance would have access.
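To sketch what that Redis-backed alternative might look like with the StackExchange.Redis client (the key format, TTL, and the "expensive" computation below are all stand-ins for illustration):

    using System;
    using System.Threading.Tasks;
    using StackExchange.Redis;

    // Cache expensive, user-specific data in Redis so that every
    // instance shares one cache instead of each warming its own.
    public class UserStatsCache
    {
        private readonly IDatabase _redis;

        public UserStatsCache(IConnectionMultiplexer connection) =>
            _redis = connection.GetDatabase();

        public async Task<string> GetStatsAsync(int userId)
        {
            var key = $"user-stats:{userId}"; // hypothetical key format
            var cached = await _redis.StringGetAsync(key);
            if (cached.HasValue)
                return cached.ToString(); // hot value, served by any instance

            var stats = await ComputeExpensiveStatsAsync(userId);
            await _redis.StringSetAsync(key, stats, TimeSpan.FromMinutes(10));
            return stats;
        }

        // Stand-in for the difficult-to-compute work mentioned above.
        private Task<string> ComputeExpensiveStatsAsync(int userId) =>
            Task.FromResult($"stats for user {userId}");
    }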

User Uploads

If the application allows user uploads (a user's avatar, for example), then we don't want to store those files on the filesystem of the server that received the request. If we did, they wouldn't be available to the other running instances that need to serve the files back at some point. I prefer to store the data in an object storage service.

If you're hosting your application in the cloud, that'd be Amazon S3, Cloud Storage on Google Cloud, Azure Blob Storage, or Spaces on DigitalOcean, depending on your provider.
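Here's a rough sketch of an avatar upload using the AWSSDK.S3 package; the bucket name and key format are assumptions you'd replace with your own, and credentials come from the standard AWS configuration chain:

    using System.IO;
    using System.Threading.Tasks;
    using Amazon.S3;
    using Amazon.S3.Model;

    // Store uploads in object storage rather than on the local filesystem.
    public class AvatarStore
    {
        private readonly IAmazonS3 _s3 = new AmazonS3Client();

        public async Task UploadAsync(int userId, Stream avatar)
        {
            await _s3.PutObjectAsync(new PutObjectRequest
            {
                BucketName = "myapp-user-uploads", // hypothetical bucket
                Key = $"avatars/{userId}.png",     // hypothetical key format
                InputStream = avatar,
                ContentType = "image/png",
            });
            // Any instance can now serve the file back (or hand the client
            // a pre-signed URL) by reading the same bucket and key.
        }
    }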

However, if you're hosting the application in a datacenter and have ruled out using one of the above storage services, then you'd need to set up some shared space for storage. That could mean using a NAS (network-attached storage) device, building your own Ceph storage cluster, or setting up OpenStack Swift.

Third Party Libraries

Scaling your application also requires knowing how the third-party libraries your application uses work. For example, does your .NET application use NHibernate second-level caching? For ease, it may have been set up to use one of the built-in memory caches, such as CoreMemoryCache, RtMemoryCache, SysCache, or SysCache2. If you were previously hosting on a single instance, that's not a problem. But now that you're trying to spread the load across multiple instances, you may start seeing really odd caching behavior. The cache might be updated on one instance but not the others. Maybe your application begins hitting your database more than expected because each instance starts with a cold cache.
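The usual fix is to point the second-level cache at a store every instance shares. Here's a rough sketch using NHibernate's configuration API; the provider class string below is an assumption based on the NHibernate.Caches.StackExchangeRedis package, so verify it against the docs of whichever provider you pick:

    // Point NHibernate's second-level cache at a shared, distributed
    // provider instead of an in-process memory cache.
    var cfg = new NHibernate.Cfg.Configuration();
    cfg.SetProperty(NHibernate.Cfg.Environment.UseSecondLevelCache, "true");
    cfg.SetProperty(NHibernate.Cfg.Environment.UseQueryCache, "true");
    // Swap a per-instance provider like CoreMemoryCache for one that all
    // instances share (provider class name is an assumption -- check the
    // package docs):
    cfg.SetProperty(NHibernate.Cfg.Environment.CacheProvider,
        "NHibernate.Caches.StackExchangeRedis.RedisCacheProvider, " +
        "NHibernate.Caches.StackExchangeRedis");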

This is a single example, but you'll want to be knowledgeable about the third-party libraries your application uses, which of their features you rely on, and whether any of that will cause trouble when you try to scale your application.