block by mrchrisadams 38a5b74bb5517fc15d113c2d15ab7ef6

How to do application level scale to zero with Gunicorn

What’s this?

This is an experiment in making Gunicorn gracefully scale down to zero after X seconds without traffic, as a way to get application-level scale-to-zero behaviour in applications that run behind a web server like Gunicorn. The idea is that you do not need to mess much with the internal logic of an existing application, nor put it in a container.

Instead, you use the web server’s own support for handling SIGTERM signals to gracefully scale down processes when they are not in use.

If you’re using Linux to run a server, the chances of Systemd being used to manage your processes are fairly high, as it’s the default option for a number of Linux distributions now.

It also means you might not need a complicated “serverless” system to orchestrate scaling up and down in order to reclaim memory on a server for use in other tasks, if you have a website or web service that isn’t continually serving traffic.

This is nice, but how would you spin up processes as requests come in?

You would typically combine this with something like Systemd’s existing socket activation (wake on socket request) functionality, which spins up processes as soon as traffic is detected on the port that Systemd is listening on.

You’d rely on Systemd to wake gunicorn back up when there is new inbound traffic.
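
For a feel of what happens on the process side of that handover, here is a rough sketch of how a socket-activated process can discover the listening sockets it was given. Systemd sets the LISTEN_PID and LISTEN_FDS environment variables and passes the sockets as file descriptors starting at 3. Gunicorn takes care of this itself, so you would not normally write it; the function name inherited_socket_fds and its parameters are made up for illustration.

```python
import os

SD_LISTEN_FDS_START = 3  # systemd passes inherited sockets starting at fd 3


def inherited_socket_fds(environ=None, pid=None):
    """Return the file descriptors systemd handed to this process, if any.

    Hypothetical helper for illustration; `environ` and `pid` are
    injectable only so the logic can be exercised without systemd.
    """
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid

    # LISTEN_PID names the process the sockets were meant for. If it does
    # not match, the variables were inherited by accident and are ignored.
    if environ.get("LISTEN_PID") != str(pid):
        return []

    try:
        count = int(environ.get("LISTEN_FDS", "0"))
    except ValueError:
        return []

    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + max(count, 0)))


if __name__ == "__main__":
    fds = inherited_socket_fds()
    print("socket-activated fds:", fds or "none (started directly)")
```

Run directly from a shell, this reports no inherited sockets; started by Systemd through a socket unit, it would list the descriptors the process should accept connections on.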

This is nice, but how would you spin down processes after they’re no longer needed?

You would use a config like the one below to tell Gunicorn to gracefully exit after a given period of time with zero traffic, which handles scaling down.
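
Stripped of Gunicorn specifics, the idle-timeout idea might look something like the following. This is only a sketch and not the config file itself: IdleTracker and shut_down_if_idle are hypothetical names, and how you call them (for example from Gunicorn’s server hooks plus a timer) is left open. The one Gunicorn fact it relies on is that the master process treats SIGTERM as a request for graceful shutdown.

```python
import os
import signal
import time


class IdleTracker:
    """Remembers when the last request was seen (hypothetical helper)."""

    def __init__(self, idle_seconds, clock=time.monotonic):
        self.idle_seconds = idle_seconds
        self._clock = clock
        self._last_seen = clock()

    def touch(self):
        """Call on every request to reset the idle countdown."""
        self._last_seen = self._clock()

    def is_idle(self):
        """True once idle_seconds have passed without a touch()."""
        return self._clock() - self._last_seen >= self.idle_seconds


def shut_down_if_idle(tracker, master_pid, kill=os.kill):
    """Send SIGTERM to the Gunicorn master once the idle period has passed.

    `kill` is injectable so the logic can be tried out without
    signalling a real process.
    """
    if tracker.is_idle():
        kill(master_pid, signal.SIGTERM)  # TERM asks Gunicorn to shut down gracefully
        return True
    return False
```

Something would need to call shut_down_if_idle periodically, and touch() on every request. With more than one worker process, the last-seen time also has to be shared between them somehow, which this sketch ignores.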

This is nice, but how do you scale processes up and down if you have just a little bit of traffic, or larger surges of traffic?

Telling Gunicorn to scale to a different number of workers / threads beyond the default number is an exercise left to the reader.
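
As a starting point for that exercise: Gunicorn’s master process adds a worker when it receives SIGTTIN and removes one when it receives SIGTTOU. A minimal sketch that builds on this is below. The function resize_workers is a made-up name, the caller is assumed to know the current worker count, and deciding when to resize (the actual hard part) is not covered.

```python
import os
import signal


def resize_workers(master_pid, current, desired, kill=os.kill):
    """Nudge the Gunicorn master from `current` to `desired` workers.

    Hypothetical helper: sends one SIGTTIN (add a worker) or SIGTTOU
    (remove a worker) per step. `kill` is injectable for dry runs.
    """
    if desired < 1:
        raise ValueError("keep at least one worker; use SIGTERM to stop entirely")

    step = signal.SIGTTIN if desired > current else signal.SIGTTOU
    for _ in range(abs(desired - current)):
        # A real version would probably want to pace these signals and
        # check that the worker count actually changed.
        kill(master_pid, step)
    return desired
```

For example, resize_workers(master_pid, current=2, desired=4) would send SIGTTIN twice to the given master process.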

scale_to_zero_gunicorn.py