Thanks for adding that additional context.
It reinforces my hunch. The Hub bundles several other distributed components into a single process, and I am not very familiar with the Grid4 distributed architecture.
Quoting the documentation:
A Hub is the union of the following components:
- Router
- Distributor
- Session Map
- New Session Queue
- Event Bus
In Grid3, as far as I know, the Hub merely acts as a dispatcher, routing requests for new or existing sessions to the appropriate nodes.
Grid4 does the same job, but in a different way.
I briefly dug into the codebase and realised that Grid4 now seems to use Netty under the hood, where earlier it was Jetty. In the Jetty world there was a JVM argument that let you control the number of threads. In Netty, that JVM argument is "-Dio.netty.eventLoopThreads". By default it is n*2, where n is the number of processors on your machine.
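To check what that default works out to on your Hub machine, here is a minimal sketch in plain Java. It only approximates the rule described above (use the override if it parses, otherwise processors * 2, never less than 1) and does not call Netty itself. The class and method names are made up for illustration.

```java
public final class NettyEventLoopDefaults {

    // Hypothetical helper: approximates how the event loop thread count is chosen.
    // override   = value of -Dio.netty.eventLoopThreads, or null if not set
    // processors = number of processors visible to the JVM
    public static int effectiveThreads(String override, int processors) {
        int fallback = processors * 2;
        if (override == null || override.trim().isEmpty()) {
            return fallback;
        }
        try {
            // Assumption: a configured value below 1 is clamped to 1.
            return Math.max(1, Integer.parseInt(override.trim()));
        } catch (NumberFormatException e) {
            // Assumption: an unparsable value falls back to the default.
            return fallback;
        }
    }

    public static void main(String[] args) {
        String override = System.getProperty("io.netty.eventLoopThreads");
        int processors = Runtime.getRuntime().availableProcessors();
        System.out.println("processors=" + processors
                + ", override=" + override
                + ", eventLoopThreads=" + effectiveThreads(override, processors));
    }
}
```

Run it with the same "-Dio.netty.eventLoopThreads" value (if any) that you pass to the Hub JVM, on the same machine or container, since the processor count the JVM sees can differ from the host's.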
Now with respect to each of the components that are part of the same JVM:
- The New Session Queue is basically just an executor service with a thread pool size of 1.
- The Distributor also uses just an executor service with a thread pool size of 1.
- The Event Bus uses ZeroMQ under the hood, and I wasn't able to figure out what its thread utilisation is.
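To illustrate why a thread pool of size 1 matters here, the sketch below is a standalone demo, not Grid code. It submits several slow tasks to a single-thread executor and measures how long the queue takes to drain. The class name, task count and sleep duration are arbitrary choices for the demo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public final class SingleThreadBacklog {

    // Submits `tasks` jobs that each sleep for `taskMillis`, then returns the
    // elapsed milliseconds until the last one has finished.
    public static long drainTimeMillis(int tasks, long taskMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            long start = System.nanoTime();
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                futures.add(executor.submit(() -> {
                    try {
                        Thread.sleep(taskMillis); // stands in for slow work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }));
            }
            for (Future<?> future : futures) {
                future.get(); // wait for every task to complete
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // With one worker thread the tasks run strictly one after another,
        // so the total is roughly tasks * taskMillis.
        System.out.println("drained in ~" + drainTimeMillis(5, 100) + " ms");
    }
}
```

With a single worker, one slow task delays everything queued behind it. That is the kind of backlog to look for in the New Session Queue and the Distributor if they really are limited to one thread each.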
So I am going to reiterate what I mentioned earlier. To work out which of the Hub components is running into thread starvation, you might want to run the Hub in distributed mode (i.e., instead of a single JVM for the Hub, you would have 5 JVMs, one for each of the components listed above). This will give you better visibility into what is causing the thread starvation, and you can take it forward from there.
Hope that helps.