Hello,
We are currently trying to replace a vendor-supplied pooling solution with HikariCP. The vendor solution has a memory leak, and although they have given us a patch, they have yet to port it to the most recent versions.
My goal was to swap pools and optimize the HikariCP settings at the same time. However, management has asked that we roll out with the existing pool configuration to minimize risk and accelerate the migration. Our current settings are maxConnections: 200, minIdle: 25.
We have frequent issues (about once every two weeks) where we lose customer concurrency and have a subsequent mass login event, which stresses the pool for a brief period. With the vendor-based solution, we see the active connections immediately grow to 200 and then, over a few minutes, taper back down to ~15-20 active connections. During the growth window, there are no timeouts (connection timeout = 5s, query timeout = 2s).
When we load-tested HikariCP, we noted that, again, connections immediately grow to 200 and then taper back down to ~15-20 active connections. During the growth window, however, there are multiple timeouts, e.g. "Connection is not available, request timed out after 5002ms" (connection timeout/query timeout in Hikari is set to 5s).
Since the steady state is ~15-20 active connections, I attempted to alleviate the issue by using a fixed pool size of 25 connections. Unfortunately, the system exhibited the same issues/timeouts.
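For reference, here is a rough sketch of the two configurations I tried, written as plain properties. The class and method names are made up for illustration, and mapping our vendor settings (maxConnections, minIdle) onto HikariCP's maximumPoolSize and minimumIdle is my assumption. The real setup also needs the JDBC URL and credentials, which I have left out.

```java
import java.util.Properties;

// Hypothetical helper for illustration only. The keys are HikariCP property
// names; the values are the settings described in this post.
class LoginPoolSettings {

    static Properties poolProperties(int maximumPoolSize, int minimumIdle) {
        Properties props = new Properties();
        props.setProperty("maximumPoolSize", Integer.toString(maximumPoolSize));
        props.setProperty("minimumIdle", Integer.toString(minimumIdle));
        // 5s connection timeout, expressed in milliseconds.
        props.setProperty("connectionTimeout", "5000");
        return props;
    }

    // Settings carried over from the vendor pool: max 200, min idle 25.
    static Properties carriedOver() {
        return poolProperties(200, 25);
    }

    // Fixed-size experiment: min idle equals max, so the pool stays at 25.
    static Properties fixedSize() {
        return poolProperties(25, 25);
    }

    public static void main(String[] args) {
        System.out.println("carried over: " + carriedOver());
        System.out.println("fixed size:   " + fixedSize());
    }
}
```

Either Properties object would then be passed to `new HikariConfig(props)` together with the data source settings.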
I wanted to check my understanding of pool initialization here. I did a fair amount of research and read a number of past issues, but I'm not 100% sure. My assumption is that although the pool is expected to have 25 minimum connections, those connections are not established until they are actually needed. Since we have a flood of incoming requests all at once, there is a large number of requests for pool connections all at once. Since physical connection requests are blocking, the requests stack up and eventually time out. My further assumption is that the blocking portion includes not only the getConnection call, but the actual query execution itself (I agree this is potentially an application issue, and we have plans to fix it, but loading user data for our system is fairly complex and this task is pretty far out).
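To put rough numbers on that reasoning, here is a back-of-the-envelope sketch. It assumes every request in the burst arrives at the same instant, each one holds its connection for a fixed time, and waiters are served in arrival order. It ignores the time needed to open physical connections, so it is optimistic. The burst size of 1000 is a made-up figure; the 2s hold time is our query timeout taken as a worst case.

```java
// Hypothetical estimate for illustration; not a model of HikariCP internals.
class BurstEstimate {

    /**
     * Returns how many requests in a simultaneous burst would still be waiting
     * for a connection when the connection timeout expires.
     */
    static long timedOut(long burstSize, int poolSize, long holdMillis, long timeoutMillis) {
        if (holdMillis <= 0 || poolSize <= 0) {
            throw new IllegalArgumentException("holdMillis and poolSize must be positive");
        }
        // Requests are served in waves of poolSize. Wave k waits k * holdMillis,
        // so waves 0..floor(timeout / hold) obtain a connection in time
        // (a wait exactly equal to the timeout is counted as served).
        long servedWaves = timeoutMillis / holdMillis + 1;
        long served = Math.min(burstSize, servedWaves * poolSize);
        return burstSize - served;
    }

    public static void main(String[] args) {
        System.out.println("pool of 25:  " + timedOut(1000, 25, 2000, 5000));
        System.out.println("pool of 200: " + timedOut(1000, 200, 2000, 5000));
    }
}
```

Under these assumptions a burst of 1000 leaves 925 requests timing out with a pool of 25 and 400 with a pool of 200, which is at least consistent with seeing timeouts in both of my tests.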
Finally, is this a scenario that would benefit from setting the undocumented "com.zaxxer.hikari.blockUntilFilled" to true?
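For clarity, this is how I would expect to turn the flag on. It is a JVM system property rather than a pool setting, so it can be passed as `-Dcom.zaxxer.hikari.blockUntilFilled=true` or set in code before the pool is constructed. The class below is a made-up helper. My reading of the HikariPool source is that the flag is only honored when initializationFailTimeout is greater than 1; please treat that as my assumption, not a documented guarantee.

```java
// Hypothetical helper for illustration; only the property name comes from HikariCP.
class BlockUntilFilledFlag {

    static final String KEY = "com.zaxxer.hikari.blockUntilFilled";

    // Must run before the HikariDataSource / pool is created, since the
    // property is (as far as I can tell) read when the pool is constructed.
    static void enable() {
        System.setProperty(KEY, "true");
    }

    // Boolean.getBoolean reads the system property with the given name.
    static boolean isEnabled() {
        return Boolean.getBoolean(KEY);
    }

    public static void main(String[] args) {
        enable();
        System.out.println(KEY + " = " + isEnabled());
    }
}
```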
Sorry for the wall of text here, and thanks in advance,
Greg Haase