I have an issue with a Thorntail 2.6.0.Final javax.websocket.server.ServerEndpoint that streams data to clients. It has 4 threads, each receiving messages from a distinct Redis pub/sub channel. Each thread forwards its messages to the client WebSocket Sessions asynchronously as text messages.
The issue I am seeing is that from time to time the Undertow I/O threads allocate large ByteBuffer arrays that are considered 'Humongous' (more than half the G1 Heap Region Size) and result in heavy GC/CPU load.
Note that the problem is not the size of the ByteBuffers themselves but the length of the ByteBuffer[] arrays. The Heap Region Size is 2M, so the humongous threshold is 1M. With 64-bit (8-byte) references that corresponds to roughly 1M / 8 = 131,072 (128K) entries, or about twice that if compressed oops are in use. Profiling directly shows array sizes in excess of 20,000 entries.
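For reference, the threshold arithmetic can be sketched as below. This is only a rough estimate: it works in bytes, it assumes a 16-byte array header (the real header size depends on JVM settings), and it treats "half a region" as the cut-off. The class and method names are made up for illustration.

```java
/** Rough estimate of when a reference array becomes a G1 humongous object. */
final class HumongousThreshold {

    // Assumption: typical HotSpot array header size in bytes; varies with JVM flags.
    static final long ARRAY_HEADER_BYTES = 16;

    /** Smallest array length whose total size reaches half of one heap region. */
    static long minHumongousLength(long regionBytes, long refBytes) {
        long thresholdBytes = regionBytes / 2;
        long payloadBytes = thresholdBytes - ARRAY_HEADER_BYTES;
        // Round up to the first length whose payload fills the remaining space.
        return (payloadBytes + refBytes - 1) / refBytes;
    }

    public static void main(String[] args) {
        long region = 2L * 1024 * 1024; // 2M heap regions, as in my setup
        System.out.println("8-byte refs: " + minHumongousLength(region, 8));
        System.out.println("4-byte refs (compressed oops): " + minHumongousLength(region, 4));
    }
}
```

With 2M regions this gives a little under 128K entries for 8-byte references and a little under 256K for 4-byte references.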
The allocation occurs under io.undertow.server.protocol.framed.AbstractFramedChannel.flushSenders() and from a code inspection appears to be due to the number of pending frames to be sent.
Is there anything I can do to tune this behaviour?
I could increase the Heap Region Size, but I suspect that would only mask the problem. There is also a downstream issue with sun.nio.ch.IOVecWrapper, which keeps discarding its ThreadLocal cached entry because each call asks for a slightly larger size than the previous one, i.e. the ByteBuffer arrays keep growing in length.
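To illustrate why a steadily growing request size defeats that kind of cache, here is a simplified model. It is not the JDK code: `GrowOnlyCache` is a hypothetical class that only mimics the pattern I believe I am seeing (reuse the cached entry if it is large enough, otherwise throw it away and allocate a new one of exactly the requested size).

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified model of a per-thread cache that is replaced whenever it is too small. */
final class GrowOnlyCache {

    private final ThreadLocal<long[]> cached = new ThreadLocal<>();
    private final AtomicInteger allocations = new AtomicInteger();

    /** Returns a buffer with at least {@code size} slots, reusing the cached one if possible. */
    long[] get(int size) {
        long[] current = cached.get();
        if (current != null && current.length < size) {
            current = null; // too small: the cached entry is discarded
        }
        if (current == null) {
            current = new long[size]; // sized exactly to the request, no headroom
            cached.set(current);
            allocations.incrementAndGet();
        }
        return current;
    }

    int allocations() {
        return allocations.get();
    }

    public static void main(String[] args) {
        GrowOnlyCache growing = new GrowOnlyCache();
        for (int size = 20_000; size < 21_000; size++) {
            growing.get(size); // each request is one slot larger than the last
        }
        GrowOnlyCache steady = new GrowOnlyCache();
        for (int i = 0; i < 1_000; i++) {
            steady.get(20_000); // constant size: the first allocation is reused
        }
        System.out.println("growing requests: " + growing.allocations() + " allocations");
        System.out.println("steady requests:  " + steady.allocations() + " allocation(s)");
    }
}
```

In this model every slightly larger request costs a fresh allocation, while a constant or shrinking request size is served from the cache.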
I'm assuming the problem is due to the throughput of messages being sent; however, the system does not appear to be under any significant load before the problem occurs (that is, before the array size breaches 1M). CPU (6%) and GC (1 young collection per minute) load are not high, and network traffic (~ 200K/sec) is also not high. The network load does not increase when the issue occurs.
I've also considered slow-consumer back-pressure but found no evidence of that either. Plus I don't see how that would necessarily increase the number of buffers to be flushed.
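One way I could check this more directly is to count in-flight sends per session. The following is a minimal sketch under assumptions: `AsyncSender` is a hypothetical stand-in for an asynchronous send with a completion callback (in javax.websocket that role would be played by `RemoteEndpoint.Async.sendText(String, SendHandler)`), and `InFlightTracker` is a name I made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Counts sends that have been submitted but whose completion callback has not yet run. */
final class InFlightTracker {

    /** Hypothetical stand-in for an async send that reports completion via a callback. */
    interface AsyncSender {
        void sendText(String text, Runnable onComplete);
    }

    private final AsyncSender sender;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    InFlightTracker(AsyncSender sender) {
        this.sender = sender;
    }

    void send(String text) {
        int now = inFlight.incrementAndGet();
        peak.accumulateAndGet(now, Math::max); // remember the highest backlog seen
        sender.sendText(text, inFlight::decrementAndGet);
    }

    int inFlight() {
        return inFlight.get();
    }

    int peak() {
        return peak.get();
    }

    public static void main(String[] args) {
        // Fake "slow consumer": completions are parked until we release them.
        List<Runnable> parked = new ArrayList<>();
        InFlightTracker tracker = new InFlightTracker((text, done) -> parked.add(done));
        for (int i = 0; i < 5; i++) {
            tracker.send("message " + i);
        }
        System.out.println("in flight before completion: " + tracker.inFlight());
        parked.forEach(Runnable::run);
        System.out.println("in flight after completion:  " + tracker.inFlight());
        System.out.println("peak backlog:                " + tracker.peak());
    }
}
```

A per-session peak that keeps climbing would point to a slow consumer; a peak that stays small would suggest the pending frames are accumulating somewhere else.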
Do I need more Undertow I/O threads to (hopefully) reduce the number of pending frames processed by each thread? I currently have 16, which I suspect is the default for an 8 core system. 8 threads appear to be for inbound and 8 for outbound. I don't know how thread affinity works in Undertow, but it appears that most of the work is being done by just 1 of the outbound I/O threads.
Or is there a way to limit the number of pending frames allocated to each array?
Any guidance appreciated.