MAJOR Problems with Hikari Connection Pool

3,354 views
Skip to first unread message

William Karavites

unread,
Sep 20, 2016, 6:49:44 PM9/20/16
to Sakai Development, devs

We've had two downtimes in the last 2 weeks. After some investigation, we found that there may be a problem with the connection pool(s) coming out of the front-ends. 


When we look in the logs after the system goes down, we notice that we get this error message:


java.sql.SQLException: Timeout of 30000ms encountered waiting for connection


After some Google-Fu we came across a few threads that seem to suggest that there is a problem with the implementation of the Hikari Connection Pool. My guess is that the pool somehow gets exhausted or incorrectly times-out.


https://github.com/brettwooldridge/HikariCP/issues/104

https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing


https://github.com/brettwooldridge/HikariCP/issues/256


https://www.postgresql.org/message-id/1395487594923...@n5.nabble.com


I've also printed a copy of our relevant JDBC connection details from sakai.properties and our my.cnf file from the MariaDB database.

# mysql setup
hibernate.dialect=org.hibernate.dialect.MySQL5InnoDBDialect
ven...@org.sakaiproject.db.api.SqlService=mysql
driverC...@javax.sql.BaseDataSource=com.mysql.jdbc.Driver
u...@javax.sql.BaseDataSource=***OVERRIDE IN LOCAL.PROPERTIES***
user...@javax.sql.BaseDataSource=***OVERRIDE IN LOCAL.PROPERTIES***
pass...@javax.sql.BaseDataSource=***OVERRIDE IN LOCAL.PROPERTIES***
testOn...@javax.sql.BaseDataSource=false
validat...@javax.sql.BaseDataSource=select 1 from DUAL
#validat...@javax.sql.BaseDataSource=select 1 from DUAL
defaultTransacti...@javax.sql.BaseDataSource=TRANSACTION_READ_COMMITTED
initi...@javax.sql.BaseDataSource=300
maxA...@javax.sql.BaseDataSource=300
max...@javax.sql.BaseDataSource=300
min...@javax.sql.BaseDataSource=0


I double-checked our properties and the bottom 4 properties (initialSize,maxActive, maxIdle, minIdle) haven't changed in a long time, at least since Sakai 10.



Any help on this issue would be greatly appreciated!


Thank you,

William Karavites






------------------------------------
William Karavites
Team Leader - Development & Sakai
Office: 848-445-8726
------------------------------------

my.cnf

William Karavites

unread,
Sep 20, 2016, 7:05:59 PM9/20/16
to Sakai Development, devs

Here's also some of our InnoDB statuses. I've also attached the FULL innodb status output from MariaDB.


MariaDB [sakai]> show status like '%onn%';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| Aborted_connects         | 0     |
| Connections              | 542   |
| Max_used_connections     | 72    |
| Ssl_client_connects      | 0     |
| Ssl_connect_renegotiates | 0     |
| Ssl_finished_connects    | 0     |
| Threads_connected        | 71    |
+--------------------------+-------+


and other pertinent mysql ones


MariaDB [sakai]> show status where `variable_name` = 'Threads_connected';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Threads_connected | 69    |
+-------------------+-------+
1 row in set (0.00 sec)



From: saka...@apereo.org <saka...@apereo.org> on behalf of William Karavites <will...@oit.rutgers.edu>
Sent: Tuesday, September 20, 2016 6:49:39 PM
To: Sakai Development
Cc: devs
Subject: [sakai-dev] MAJOR Problems with Hikari Connection Pool
 
--
You received this message because you are subscribed to the Google Groups "Sakai Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sakai-dev+...@apereo.org.
To post to this group, send email to saka...@apereo.org.
Visit this group at https://groups.google.com/a/apereo.org/group/sakai-dev/.
innodb.csv

Matthew Jones

unread,
Sep 20, 2016, 7:32:20 PM9/20/16
to William Karavites, Sakai Development, devs
Similar issues with the poll were discussed on the sakai-user list a few weeks ago and that increasing maxActive fixed it. I would connect with JMX to verify that maxActive is actually set correctly and not being set to the default and you can monitor the JMX value of TotalConnections over time too. Just use visualvm and look at the com.zaxxer.hikari.Pool.  [2]

I mean if the value really is 300 and it's going up to 300 connections that seems pretty unusual. I'd make sure to upgrade to the latest hikaricp (which right now looks to be 2.5.0 but I haven't tried that yet as Sakai is still at 2.4.6) as it has better options for leak detection too.


It says 2.4.8 and 2.5.0 are identical but 2.5.0 just is named different and requires Java 8.

William Karavites

unread,
Sep 20, 2016, 7:42:07 PM9/20/16
to Matthew Jones, Sakai Development, devs

Matt,


Thanks for the advice. I'll doublecheck our connections shortly and see what we find.


What would you recommend we increase maxActive to? 450?


-Will


From: saka...@apereo.org <saka...@apereo.org> on behalf of Matthew Jones <mat...@longsight.com>
Sent: Tuesday, September 20, 2016 7:32:06 PM
To: William Karavites; Sakai Development
Cc: devs
Subject: Re: [sakai-dev] Re: MAJOR Problems with Hikari Connection Pool
 

Matthew Jones

unread,
Sep 20, 2016, 7:55:12 PM9/20/16
to William Karavites, Sakai Development, devs
I'd read the article about pool sizing [1] that's linked somewhere there. Unless your database is over 32 cores it's should max out at at around 50 connections. Oracle recommended decreasing connections to increase performance. You have to monitor it to see, but I don't think we have any over 100, and with yours at 300 seems like it should be sufficient and if it's still hitting that limit there's something else that's wrong.

Mark Triggs

unread,
Sep 20, 2016, 8:03:50 PM9/20/16
to Matthew Jones, William Karavites, Sakai Development, devs

Hi there,

How does `maximumPoolSize` factor into this? The Hikari docs say:

maximumPoolSize

This property controls the maximum size that the pool is allowed to reach, including both idle and in-use connections. Basically this value will determine the maximum number of actual connections to the database backend. A reasonable value for this is best determined by your execution environment. When the pool reaches this size, and no idle connections are available, calls to getConnection() will block for up to connectionTimeout milliseconds before timing out. Default: 10

And the Sakai default looks to be 10 as well. Does Hikari pay attention to the maxActive setting at all?

Mark

Matthew Jones <mat...@longsight.com> writes:

Similar issues with the poll were discussed on the sakai-user list a few weeks ago and that increasing maxActive fixed it. I would connect with JMX to verify that maxActive is actually set correctly and not being set to the default and you can monitor the JMX value of TotalConnections over time too. Just use visualvm and look at the com.zaxxer.hikari.Pool. [2]

I mean if the value really is 300 and it's going up to 300 connections that seems pretty unusual. I'd make sure to upgrade to the latest hikaricp (which right now looks to be 2.5.0 but I haven't tried that yet as Sakai is still at 2.4.6) as it has better options for leak detection too.

2.4.7 had 47 commits
https://github.com/brettwooldridge/HikariCP/compare/HikariCP-2.4.7...dev 2.4.8 had 30 commits
https://github.com/brettwooldridge/HikariCP/compare/HikariCP-2.4.8...dev

It says 2.4.8 and 2.5.0 are identical but 2.5.0 just is named different and requires Java 8.

[1]
https://groups.google.com/a/apereo.org/forum/#!topic/sakai-user/YDhwK8ICqHc [2] https://jira.sakaiproject.org/browse/KNL-1437

On Tue, Sep 20, 2016 at 4:05 PM William Karavites <will...@oit.rutgers.edu> wrote:

Here's also some of our InnoDB statuses. I've also attached the FULL innodb status output from MariaDB.

-- 
Mark Triggs
<ma...@dishevelled.net>

Matthew Jones

unread,
Sep 20, 2016, 8:16:12 PM9/20/16
to Mark Triggs, William Karavites, Sakai Development, devs
That's a good point Mark, I didn't even notice that very important detail. You really want to be setting something like

maximumPoolSize@@javax.sql.BaseDataSource=100

The default is 10 and would certainly cause problems for production.

That's why I was wanting some information from JMX, my bet is the connections weren't going over 10.

I thought I'd written support to map the old parameters to the new but doesn't look like that one has a setter? I'd kind of expect there would be an error on startup? Hmm!

Matthew Jones

unread,
Sep 20, 2016, 8:16:49 PM9/20/16
to Mark Triggs, William Karavites, Sakai Development, devs
That's just one @

maximum...@javax.sql.BaseDataSource=100

William Karavites

unread,
Sep 20, 2016, 9:03:47 PM9/20/16
to Matthew Jones, Mark Triggs, Sakai Development, devs

Matt,


How would you recommend setting up JMX? I've never done it myself.


I'm trying to follow this thread and access the machine using Visual VM (over a VPN). It seems to be timing out and when I try accessing it using jconsole I'm getting a java memory exception. I'll have to give it a shot when I get into the actual office tomorrow.


-Will


From: saka...@apereo.org <saka...@apereo.org> on behalf of Matthew Jones <mat...@longsight.com>
Sent: Tuesday, September 20, 2016 8:16:36 PM
To: Mark Triggs
Cc: William Karavites; Sakai Development; devs
Subject: Re: [sakai-dev] Re: MAJOR Problems with Hikari Connection Pool
 

Matthew Jones

unread,
Sep 20, 2016, 9:26:11 PM9/20/16
to William Karavites, Mark Triggs, Sakai Development, devs
I think it's pretty easy in modern versions of Java. You have to make sure the ports are opened by passing appropriate JVM options then you either can VPN or SSH forward to the machine so you can directly connect.

Basically the same idea here. 

Most default options turn off the authentication which is fine if the ports are firewalled out and the only way you can get to the machine is through VPN or SSH anyway. Those have always worked for me. In the past I remember needing an RMI agent as described here. (Not updated for awhile) but I think you can just go direct now if the ports are opened through the setenv.sh or whatever method you use.


Matthew Buckett

unread,
Sep 21, 2016, 8:13:00 AM9/21/16
to Matthew Jones, William Karavites, Mark Triggs, Sakai Development, devs
Could it be that a local tool or contrib tool isn't returning
connections to the pool correctly in some situations and so you're
losing connections over time. HikariCP does allow you to get stack
traces when it notices a connection living outside the pool for a long
time, adding this to your connection URL

&leakDetectionThreshold=15000

will warn if a connection doesn't get returned to the pool after 15 seconds.

Do you have lots of connections that haven't been used for a long time:

SHOW FULL PROCESSLIST;


On 21 September 2016 at 02:25, Matthew Jones <mat...@longsight.com> wrote:
> I think it's pretty easy in modern versions of Java. You have to make sure
> the ports are opened by passing appropriate JVM options then you either can
> VPN or SSH forward to the machine so you can directly connect.
>
> Basically the same idea here.
> http://stackoverflow.com/a/32418821/3708872
>
> Most default options turn off the authentication which is fine if the ports
> are firewalled out and the only way you can get to the machine is through
> VPN or SSH anyway. Those have always worked for me. In the past I remember
> needing an RMI agent as described here. (Not updated for awhile) but I think
> you can just go direct now if the ports are opened through the setenv.sh or
> whatever method you use.
>
> https://confluence.sakaiproject.org/display/QA/Remote+JVM+profiling+via+SSH+tunnels
>
>
> On Tue, Sep 20, 2016 at 6:03 PM William Karavites <will...@oit.rutgers.edu>
--
Matthew Buckett
VLE Developer

WebLearn, Academic IT
IT Services, University of Oxford
13 Banbury Road, OX2 6NN
Tel: 01865 283349

Earle Nietzel

unread,
Sep 21, 2016, 10:21:38 AM9/21/16
to Matthew Buckett, Matthew Jones, William Karavites, Mark Triggs, Sakai Development, devs
Yep definitely need to consider that some threads in sakai never release a connection.
For example ClusterEventTracking is like this until recently.

Its always been my feeling that Sakai operates on about 20 connections and when the db is busy this will increase up to about 50 and then typically anything past that means that there is significant pressure on the database.

So while a connection per core is probably the most performant it doesn't account for threads that hold on to connections for the life of the app or any long running queries which sakai has both!

Reply all
Reply to author
Forward
0 new messages