Summary: In my opinion, the gRPC server libraries should include a mechanism for limiting the number of concurrent requests and connections by default. This is necessary to ensure that servers stay within their memory limits during overload. The core libraries currently do not support this, which forces everyone using gRPC in production to reinvent the wheel (see below).
- Is this something the core gRPC team agrees with? There is a discussion on grpc-java that suggests maybe it is: https://github.com/grpc/grpc-java/issues/1886
- If so, I have a very bad Go implementation I would be happy to contribute.
- If we are waiting for the "right" implementation, maybe we should create a "best practices" document suggesting that people implement limits themselves?
Without these limits, every gRPC server is a few hundred extra connections away from a cascading failure. I personally would like the core library to set some "reasonable" limits for both requests and connections that are not infinite, and allow me to configure them. This is similar to how the library sets a maximum message size by default. However, I can understand that might be controversial, and potentially a significant behaviour change. At the very least, the libraries should include the right pieces to make it easy for me to configure these limits, without needing to implement my own semaphore interceptor and listener.
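To make concrete what the "semaphore interceptor" involves, here is a minimal sketch of its core piece: a counting semaphore that a unary interceptor could consult at the start of every request, failing fast (e.g. with codes.ResourceExhausted) when the server is saturated. The type names and the limit value below are my own placeholders, not an existing gRPC API:

```go
package main

import (
	"errors"
	"fmt"
)

// requestLimiter is a counting semaphore built on a buffered channel.
// A gRPC unary interceptor would call TryAcquire before invoking the
// handler and Release when the handler returns.
type requestLimiter struct {
	sem chan struct{}
}

func newRequestLimiter(max int) *requestLimiter {
	return &requestLimiter{sem: make(chan struct{}, max)}
}

var errLimitExceeded = errors.New("concurrent request limit exceeded")

// TryAcquire reserves a slot, or fails immediately if none are free.
func (l *requestLimiter) TryAcquire() error {
	select {
	case l.sem <- struct{}{}:
		return nil
	default:
		return errLimitExceeded
	}
}

// Release returns a slot reserved by a successful TryAcquire.
func (l *requestLimiter) Release() {
	<-l.sem
}

func main() {
	limiter := newRequestLimiter(2)
	fmt.Println(limiter.TryAcquire()) // <nil>
	fmt.Println(limiter.TryAcquire()) // <nil>
	fmt.Println(limiter.TryAcquire()) // concurrent request limit exceeded
	limiter.Release()
	fmt.Println(limiter.TryAcquire()) // <nil>
}
```

This is the easy half; the harder half is the listener-side connection limit, which is why I would prefer the library to provide both.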
Counter-argument: This is an application policy, not a gRPC policy
An argument against this is that the "right" settings are going to be different for each application, so gRPC should not set something that will be wrong. Additionally, some applications will want sophisticated policies, such as allowing a large number of "cheap" requests, but only a small number of "expensive" requests. Today, gRPC provides the necessary hooks to implement these limits yourself (e.g. I implemented a version at https://github.com/evanj/concurrentlimit).
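As an illustration of the "sophisticated policy" case, a per-method limiter is one way to allow many cheap requests while capping expensive ones. This is only a sketch; the method names and limits are made-up examples, not anything gRPC provides:

```go
package main

import "fmt"

// methodLimiter gives each gRPC method its own concurrency budget,
// so many "cheap" requests can proceed while only a few "expensive"
// ones are allowed at once. An interceptor would look up the limit
// by the full method name it receives.
type methodLimiter struct {
	sems map[string]chan struct{}
}

func newMethodLimiter(limits map[string]int) *methodLimiter {
	sems := make(map[string]chan struct{}, len(limits))
	for method, max := range limits {
		sems[method] = make(chan struct{}, max)
	}
	return &methodLimiter{sems: sems}
}

// TryAcquire reports whether a slot was reserved for the method.
func (l *methodLimiter) TryAcquire(method string) bool {
	sem, ok := l.sems[method]
	if !ok {
		return false // unknown method: reject by default
	}
	select {
	case sem <- struct{}{}:
		return true
	default:
		return false
	}
}

// Release returns a slot reserved by a successful TryAcquire.
func (l *methodLimiter) Release(method string) {
	<-l.sems[method]
}

func main() {
	limiter := newMethodLimiter(map[string]int{
		"/pkg.Service/CheapLookup":     100, // hypothetical methods
		"/pkg.Service/ExpensiveReport": 2,
	})
	fmt.Println(limiter.TryAcquire("/pkg.Service/ExpensiveReport")) // true
	fmt.Println(limiter.TryAcquire("/pkg.Service/ExpensiveReport")) // true
	fmt.Println(limiter.TryAcquire("/pkg.Service/ExpensiveReport")) // false
}
```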
Personally: this is true, but gRPC should at least include a "simple" implementation that covers the "basic" use case. Advanced users could still override the default if necessary.
Context
At work, we recently had a classic cascading failure of a gRPC service because the service ran out of memory. We believe one of the root causes is that during overload, our Go gRPC server accepted an unlimited amount of work, causing it to exceed its memory limits and get OOM killed. In testing, I can easily cause the server to run out of memory either by sending too many concurrent requests, or by establishing a very large number of idle gRPC connections. To be able to stay within memory limits, even in overload scenarios, I need this server to limit both the number of connections and the number of requests.
Related Issues
Searching the closed gRPC issues turns up a large number that may have been caused by a lack of limits.
It turns out that no gRPC implementation, in any language, supports out of the box the limits necessary for a server to survive this type of overload. Our service is in Go, but after a quick review, the same limitations appear to exist in the Java and C++ implementations.
People who have solved the same problem