Concurrency limits and kernel settings when running NGINX & Gunicorn

A few weeks ago, the team I work on at Stylight encountered an unexpected concurrency issue with one of our services. While this specific issue turned out to be simple, we didn't find much information online tying it all together and thought our experience would be worth sharing.

The problem

After going to production and coming under increased load, one of our web services used for financial reporting started dropping requests with 502 (Bad Gateway) response codes alongside the following error message from NGINX:

connect() to unix:/run/gunicorn.socket failed (11: Resource temporarily unavailable)

A quick 10-second load test performed with vegeta confirmed that the problem started appearing at around 20 req/s, even though both the NGINX and Gunicorn configurations were set up to handle much more than that, and did so when run locally against the production database.
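
For reference, the test was roughly the following (the URL is illustrative, not our actual endpoint):

    # fire a constant 20 req/s at the service for 10 seconds and print
    # latency / status-code statistics
    echo "GET http://localhost:8000/reports" | \
        vegeta attack -rate=20 -duration=10s | \
        vegeta report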

Running the service locally in Docker, however, exhibited the same problem as production. After some head-scratching, we were tipped off to the real cause by this paragraph from Gunicorn's documentation:

How can I increase the maximum socket backlog?
Listening sockets have an associated queue of incoming connections that are waiting to be accepted. If you happen to have a stampede of clients that fill up this queue new connections will eventually start getting dropped.

It turns out the issue is quite simple: when NGINX reverse-proxies to an upstream through a unix socket, the socket's connection queue is the bottleneck, and its size is capped by the net.core.somaxconn kernel setting on Linux regardless of the capacity NGINX and/or the upstream are configured for (in our case, Gunicorn's backlog queue size). In practice, NGINX hands requests over to the socket, and once the socket's queue is full, connect() attempts start failing with EAGAIN (the "Resource temporarily unavailable" above), leading NGINX to answer the corresponding requests with 502 (Bad Gateway). Adding workers (at either the NGINX or Gunicorn level) doesn't help, as everything goes through the same socket.
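
For context, the relevant part of our setup looks roughly like the following (paths, module and worker counts are illustrative); NGINX's upstream simply points at the same socket with server unix:/run/gunicorn.socket;, so every request funnels through that single queue:

    # Gunicorn binds to the unix socket NGINX proxies to and asks for a
    # 2048-entry backlog -- but the kernel silently caps the effective
    # backlog at net.core.somaxconn (128 by default).
    gunicorn --bind unix:/run/gunicorn.socket --backlog 2048 --workers 4 myapp.wsgi:application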

You can find code and instructions to reproduce the problem in a minimal way in this GitHub repository.

The solution

As we run our services in Docker on AWS infrastructure, we needed to figure out where the net.core.somaxconn setting was being limited. It turns out it defaults to 128 both inside Docker containers and on Amazon's default Ubuntu AMIs. It can be tweaked with commands along the following lines (the value 4096 below is only an example; pick one suited to your workload):
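
    # on the host (e.g. the Ubuntu AMI): raise the limit for the running kernel ...
    sudo sysctl -w net.core.somaxconn=4096
    # ... and persist it across reboots
    echo "net.core.somaxconn = 4096" | sudo tee /etc/sysctl.d/99-somaxconn.conf

    # net.core.somaxconn is namespaced, so a container needs its own value,
    # passed at run time (<image> stands for your service's image)
    docker run --sysctl net.core.somaxconn=4096 <image>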

Disclaimer: The default setting of 128 queued connections per socket should work for most applications with fast transactions; the problem only affects very high-concurrency servers and/or applications that expect to wait on blocking I/O and queue up a lot of requests. That was our case, with reporting queries potentially expected to run for minutes, where being able to queue and delay some users was preferable to dropping their requests. A higher setting should not cause problems for most applications with fast transactions either; however, increasing the listen queue size can hide downstream latency issues, and failing early may be the better behaviour. As always, consider your specific use case and whether this is an unavoidable problem to be solved or a symptom of a deeper issue (design, architecture, etc.).
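
If you do raise the limit, it is worth double-checking that the new value is actually in effect where the socket lives (inside the container, in our case) and keeping an eye on the socket's queue. A quick sketch, assuming the container is named reporting-service:

    # confirm the limit inside the container's network namespace
    docker exec reporting-service sysctl net.core.somaxconn

    # list listening unix sockets to eyeball the gunicorn socket's queue counters
    # (requires ss / iproute2 inside the container)
    docker exec reporting-service ss -xln | grep gunicorn.socket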

In the end, while the issue turned out to be simple and straightforward once we knew where to look, it served as a good reminder that, even when running on cloud infrastructure, understanding the underlying technology is as important as ever.

Further reading

Here are some interesting links to dive deeper into related topics: