Recovery after connection problem

3,269 views
Skip to first unread message

Luis Ángel Vicente Sánchez

unread,
Apr 1, 2014, 8:18:44 AM4/1/14
to hika...@googlegroups.com
We have an app that have been running without problems during the last month but in the last few days we have had some problems with our DWH (connection problems, DNS problems, ...). I was expecting HikariCP to recover from that but it's not; we are using HikariCP 1.3.3 and my log files have been full of messages like these for the last few hours even if the DWH started to work normally after a few minutes:

01:41:45.663 [scalike-14] DEBUG com.zaxxer.hikari.HikariPool - Connection returned to pool is broken, or the pool is shutting down. Closing connection.
01:41:45.663 [scalike-14] WARN  com.zaxxer.hikari.proxy.ConnectionProxy - Connection org.postgresql.jdbc4.Jdbc4Connection@1e3a07ec marked as broken because of SQLSTATE(08003), ErrorCode(0): null
01:41:45.663 [scalike-14] DEBUG com.zaxxer.hikari.HikariPool - Connection returned to pool is broken, or the pool is shutting down. Closing connection.
01:41:51.621 [Hikari Housekeeping Timer] DEBUG com.zaxxer.hikari.HikariPool - Before pool cleanup Pool stats (total=16, inUse=16, avail=0, waiting=16)
01:41:51.622 [Hikari Housekeeping Timer] DEBUG com.zaxxer.hikari.HikariPool - After pool cleanup Pool stats (total=16, inUse=16, avail=0, waiting=16)
01:42:21.621 [Hikari Housekeeping Timer] DEBUG com.zaxxer.hikari.HikariPool - Before pool cleanup Pool stats (total=16, inUse=16, avail=0, waiting=16)
01:42:21.621 [Hikari Housekeeping Timer] DEBUG com.zaxxer.hikari.HikariPool - After pool cleanup Pool stats (total=16, inUse=16, avail=0, waiting=16)
01:42:51.621 [Hikari Housekeeping Timer] DEBUG com.zaxxer.hikari.HikariPool - Before pool cleanup Pool stats (total=16, inUse=16, avail=0, waiting=16)
01:42:51.622 [Hikari Housekeeping Timer] DEBUG com.zaxxer.hikari.HikariPool - After pool cleanup Pool stats (total=16, inUse=16, avail=0, waiting=16)
01:43:21.621 [Hikari Housekeeping Timer] DEBUG com.zaxxer.hikari.HikariPool - Before pool cleanup Pool stats (total=16, inUse=16, avail=0, waiting=16)
01:43:21.621 [Hikari Housekeeping Timer] DEBUG com.zaxxer.hikari.HikariPool - After pool cleanup Pool stats (total=16, inUse=16, avail=0, waiting=16)
01:43:51.621 [Hikari Housekeeping Timer] DEBUG com.zaxxer.hikari.HikariPool - Before pool cleanup Pool stats (total=16, inUse=16, avail=0, waiting=16)
01:43:51.622 [Hikari Housekeeping Timer] DEBUG com.zaxxer.hikari.HikariPool - After pool cleanup Pool stats (total=16, inUse=16, avail=0, waiting=16)
01:44:16.761 [scalike-4] DEBUG com.zaxxer.hikari.HikariPool - Pool stats (total=16, inUse=16, avail=0, waiting=15)
01:44:16.762 [scalike-4] ERROR com.zaxxer.hikari.HikariPool - Timeout of 150000ms encountered waiting for connection.
01:44:16.762 [scalike-4] DEBUG com.zaxxer.hikari.HikariPool - Timeout failure Pool stats (total=16, inUse=16, avail=0, waiting=15)
01:44:16.762 [scalike-15] DEBUG com.zaxxer.hikari.HikariPool - Pool stats (total=16, inUse=16, avail=0, waiting=14)
01:44:16.762 [scalike-15] ERROR com.zaxxer.hikari.HikariPool - Timeout of 150000ms encountered waiting for connection.
01:44:16.762 [scalike-15] DEBUG com.zaxxer.hikari.HikariPool - Timeout failure Pool stats (total=16, inUse=16, avail=0, waiting=14)
01:44:16.762 [scalike-12] DEBUG com.zaxxer.hikari.HikariPool - Pool stats (total=16, inUse=16, avail=0, waiting=13)
01:44:16.762 [scalike-12] ERROR com.zaxxer.hikari.HikariPool - Timeout of 150000ms encountered waiting for connection.
01:44:16.762 [scalike-12] DEBUG com.zaxxer.hikari.HikariPool - Timeout failure Pool stats (total=16, inUse=16, avail=0, waiting=13)
01:44:16.763 [scalike-10] DEBUG com.zaxxer.hikari.HikariPool - Pool stats (total=16, inUse=16, avail=0, waiting=12)
01:44:16.763 [scalike-10] ERROR com.zaxxer.hikari.HikariPool - Timeout of 150000ms encountered waiting for connection.
01:44:16.763 [scalike-10] DEBUG com.zaxxer.hikari.HikariPool - Timeout failure Pool stats (total=16, inUse=16, avail=0, waiting=12)
01:44:16.763 [scalike-11] DEBUG com.zaxxer.hikari.HikariPool - Pool stats (total=16, inUse=16, avail=0, waiting=11)
01:44:16.763 [scalike-11] ERROR com.zaxxer.hikari.HikariPool - Timeout of 150000ms encountered waiting for connection.
01:44:16.763 [scalike-11] DEBUG com.zaxxer.hikari.HikariPool - Timeout failure Pool stats (total=16, inUse=16, avail=0, waiting=11)
01:44:16.763 [scalike-8] DEBUG com.zaxxer.hikari.HikariPool - Pool stats (total=16, inUse=16, avail=0, waiting=10)
01:44:16.763 [scalike-8] ERROR com.zaxxer.hikari.HikariPool - Timeout of 150000ms encountered waiting for connection.
01:44:16.763 [scalike-8] DEBUG com.zaxxer.hikari.HikariPool - Timeout failure Pool stats (total=16, inUse=16, avail=0, waiting=10)
01:44:16.764 [scalike-6] DEBUG com.zaxxer.hikari.HikariPool - Pool stats (total=16, inUse=16, avail=0, waiting=9)
01:44:16.764 [scalike-6] ERROR com.zaxxer.hikari.HikariPool - Timeout of 150000ms encountered waiting for connection.
01:44:16.764 [scalike-6] DEBUG com.zaxxer.hikari.HikariPool - Timeout failure Pool stats (total=16, inUse=16, avail=0, waiting=9)
01:44:16.764 [scalike-2] DEBUG com.zaxxer.hikari.HikariPool - Pool stats (total=16, inUse=16, avail=0, waiting=8)
01:44:16.764 [scalike-2] ERROR com.zaxxer.hikari.HikariPool - Timeout of 150000ms encountered waiting for connection.
01:44:16.764 [scalike-2] DEBUG com.zaxxer.hikari.HikariPool - Timeout failure Pool stats (total=16, inUse=16, avail=0, waiting=8)
01:44:16.764 [scalike-16] DEBUG com.zaxxer.hikari.HikariPool - Pool stats (total=16, inUse=16, avail=0, waiting=7)
01:44:16.764 [scalike-16] ERROR com.zaxxer.hikari.HikariPool - Timeout of 150000ms encountered waiting for connection.
01:44:16.764 [scalike-16] DEBUG com.zaxxer.hikari.HikariPool - Timeout failure Pool stats (total=16, inUse=16, avail=0, waiting=7)
01:44:16.765 [scalike-3] DEBUG com.zaxxer.hikari.HikariPool - Pool stats (total=16, inUse=16, avail=0, waiting=6)
01:44:16.765 [scalike-3] ERROR com.zaxxer.hikari.HikariPool - Timeout of 150000ms encountered waiting for connection.
01:44:16.765 [scalike-3] DEBUG com.zaxxer.hikari.HikariPool - Timeout failure Pool stats (total=16, inUse=16, avail=0, waiting=6)
01:44:16.767 [scalike-9] DEBUG com.zaxxer.hikari.HikariPool - Pool stats (total=16, inUse=16, avail=0, waiting=5)
01:44:16.767 [scalike-9] ERROR com.zaxxer.hikari.HikariPool - Timeout of 150000ms encountered waiting for connection.
01:44:16.767 [scalike-9] DEBUG com.zaxxer.hikari.HikariPool - Timeout failure Pool stats (total=16, inUse=16, avail=0, waiting=5)
01:44:16.767 [scalike-5] DEBUG com.zaxxer.hikari.HikariPool - Pool stats (total=16, inUse=16, avail=0, waiting=5)
01:44:16.767 [scalike-5] ERROR com.zaxxer.hikari.HikariPool - Timeout of 150000ms encountered waiting for connection.
01:44:16.767 [scalike-5] DEBUG com.zaxxer.hikari.HikariPool - Timeout failure Pool stats (total=16, inUse=16, avail=0, waiting=5)
01:44:16.767 [scalike-7] DEBUG com.zaxxer.hikari.HikariPool - Pool stats (total=16, inUse=16, avail=0, waiting=5)
01:44:16.767 [scalike-7] ERROR com.zaxxer.hikari.HikariPool - Timeout of 150000ms encountered waiting for connection.
01:44:16.767 [scalike-7] DEBUG com.zaxxer.hikari.HikariPool - Timeout failure Pool stats (total=16, inUse=16, avail=0, waiting=5)
01:44:16.768 [scalike-14] DEBUG com.zaxxer.hikari.HikariPool - Pool stats (total=16, inUse=16, avail=0, waiting=5)
01:44:16.768 [scalike-14] ERROR com.zaxxer.hikari.HikariPool - Timeout of 150000ms encountered waiting for connection.
01:44:16.768 [scalike-14] DEBUG com.zaxxer.hikari.HikariPool - Timeout failure Pool stats (total=16, inUse=16, avail=0, waiting=5)
01:44:16.769 [scalike-1] DEBUG com.zaxxer.hikari.HikariPool - Pool stats (total=16, inUse=16, avail=0, waiting=5)
01:44:16.769 [scalike-1] ERROR com.zaxxer.hikari.HikariPool - Timeout of 150000ms encountered waiting for connection.
01:44:16.769 [scalike-1] DEBUG com.zaxxer.hikari.HikariPool - Timeout failure Pool stats (total=16, inUse=16, avail=0, waiting=5)
01:44:16.769 [scalike-13] DEBUG com.zaxxer.hikari.HikariPool - Pool stats (total=16, inUse=16, avail=0, waiting=5)
01:44:16.769 [scalike-13] ERROR com.zaxxer.hikari.HikariPool - Timeout of 150000ms encountered waiting for connection.
01:44:16.769 [scalike-13] DEBUG com.zaxxer.hikari.HikariPool - Timeout failure Pool stats (total=16, inUse=16, avail=0, waiting=5)
01:44:21.621 [Hikari Housekeeping Timer] DEBUG com.zaxxer.hikari.HikariPool - Before pool cleanup Pool stats (total=16, inUse=16, avail=0, waiting=16)
01:44:21.621 [Hikari Housekeeping Timer] DEBUG com.zaxxer.hikari.HikariPool - After pool cleanup Pool stats (total=16, inUse=16, avail=0, waiting=16)

The application is throttling the database queries so when everything is working normally, waiting is always 0... and so far anytime I saw waiting <> 0 there was either a temporary DNS problem or a temporary connection problem. I'm upgrading to 1.3.5 but I don't know if that would solve my problem as I don't really know if HikariCP is able to recover from that kind of failures.

Kind regards,

Luis

Brett Wooldridge

unread,
Apr 2, 2014, 2:45:46 AM4/2/14
to hika...@googlegroups.com
Looking at the logs, this is what I think occurred.  When the database connections failed and an exceptions were thrown, the calling code (either your code, or scalike) failed to close the connection.  Because Scala is asynchronous in nature, I'm not sure what the execution flow would look like, but conceptually similar in Java would be a try...finally which always ensures that connections are closed whether an error occurs or not.  If an exception is thrown, and it is not trapped by the callee and the connection is not closed (returned to the pool), then the pool will max. out and eventually when there are no more free connections everything will stop.

One useful thing would be to configure leak detection in this case.  If you never expect a query or use of a connection to take more than, for example, 5 seconds then set the leak detection at 2x or 3x that value -- say 10 seconds or 15 seconds.  When connection breaks, and a SQLException is thrown, if there are code paths where connection closes are not occurring the leak detection will log a stacktrace pointing to the source of allocation.

So to answer your question directly, yes HikariCP can recover from these situations, but only if the application itself properly handles closing connections even when exceptions occur.

Luis Ángel Vicente Sánchez

unread,
Apr 2, 2014, 5:14:08 AM4/2/14
to hika...@googlegroups.com
Hi Brett, 

Thank you for your answer! I'm using monads and monadic composition instead of a try/catch to handle exceptions, therefore the connection should be closed whether there is an exception or not. I'm going to check scalike source code to verify they are not leaking connections but as long as they raise an exception when there is a problem, it should be ok.

I'm going to just remove my ethernet cable while testing the app locally to see if I can reproduce the problem that way. I will let you know.

Luis Ángel Vicente Sánchez

unread,
Apr 2, 2014, 10:15:54 AM4/2/14
to hika...@googlegroups.com
I have been doing some more tests and using /etc/hosts I have been able to reproduce the problem; I added the database hostname and IP to /etc/hosts and, after allowing hikaricp to create all connections, I change the IP to a wrong one. The maxLifetime property is set to 5min; when hikaricp tried to create new connections I got this error from the connecionpool all the time:

[info] 04/02 13:31:23 DEBUG[Hikari Housekeeping Timer] c.z.h.HikariPool - Before pool cleanup Pool stats (total=4, inUse=4, avail=0, waiting=0)
[info] 04/02 13:31:23 DEBUG[Hikari Housekeeping Timer] c.z.h.HikariPool - After pool cleanup Pool stats (total=4, inUse=4, avail=0, waiting=0)
[info] 04/02 13:31:30 WARN [HikariCP connection filler] c.z.h.HikariPool - Connection attempt to database failed (not every attempt is logged): Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
[info] org.postgresql.util.PSQLException: Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
[info] at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:207) ~[postgresql-9.3-1101-jdbc41.jar:na]
[info] at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:64) ~[postgresql-9.3-1101-jdbc41.jar:na]
[info] at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:138) ~[postgresql-9.3-1101-jdbc41.jar:na]
[info] at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>(AbstractJdbc3Connection.java:29) ~[postgresql-9.3-1101-jdbc41.jar:na]
[info] at org.postgresql.jdbc3g.AbstractJdbc3gConnection.<init>(AbstractJdbc3gConnection.java:21) ~[postgresql-9.3-1101-jdbc41.jar:na]
[info] at org.postgresql.jdbc4.AbstractJdbc4Connection.<init>(AbstractJdbc4Connection.java:31) ~[postgresql-9.3-1101-jdbc41.jar:na]
[info] at org.postgresql.jdbc4.Jdbc4Connection.<init>(Jdbc4Connection.java:24) ~[postgresql-9.3-1101-jdbc41.jar:na]
[info] at org.postgresql.Driver.makeConnection(Driver.java:410) ~[postgresql-9.3-1101-jdbc41.jar:na]
[info] at org.postgresql.Driver.connect(Driver.java:280) ~[postgresql-9.3-1101-jdbc41.jar:na]
[info] at java.sql.DriverManager.getConnection(DriverManager.java:571) ~[na:1.7.0_25]
[info] at java.sql.DriverManager.getConnection(DriverManager.java:215) ~[na:1.7.0_25]
[info] at org.postgresql.ds.common.BaseDataSource.getConnection(BaseDataSource.java:99) ~[postgresql-9.3-1101-jdbc41.jar:na]
[info] at org.postgresql.ds.common.BaseDataSource.getConnection(BaseDataSource.java:82) ~[postgresql-9.3-1101-jdbc41.jar:na]
[info] at com.zaxxer.hikari.HikariPool.addConnection(HikariPool.java:331) [HikariCP-1.3.5.jar:na]
[info] at com.zaxxer.hikari.HikariPool.access$300(HikariPool.java:51) [HikariCP-1.3.5.jar:na]
[info] at com.zaxxer.hikari.HikariPool$1AddConnection.run(HikariPool.java:248) ~[HikariCP-1.3.5.jar:na]
[info] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_25]
[info] at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) ~[na:1.7.0_25]
[info] at java.util.concurrent.FutureTask.run(FutureTask.java:166) ~[na:1.7.0_25]
[info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_25]
[info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_25]
[info] at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
[info] Caused by: java.net.ConnectException: Operation timed out
[info] at java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:1.7.0_25]
[info] at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) ~[na:1.7.0_25]
[info] at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) ~[na:1.7.0_25]
[info] at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) ~[na:1.7.0_25]
[info] at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[na:1.7.0_25]
[info] at java.net.Socket.connect(Socket.java:579) ~[na:1.7.0_25]
[info] at java.net.Socket.connect(Socket.java:528) ~[na:1.7.0_25]
[info] at org.postgresql.core.PGStream.<init>(PGStream.java:60) ~[postgresql-9.3-1101-jdbc41.jar:na]


scalike is waiting for a new connection but the connection pool can't create a new one and is retrying again and again and again. After putting back the right IP the connection pool recovered after a while but in the meantime I saw this log messages:

[info] 04/02 13:50:21 DEBUG[Hikari Housekeeping Timer] c.z.h.HikariPool - Before pool cleanup Pool stats (total=4, inUse=4, avail=0, waiting=5)

I don't know why it was stuck yesterday... maybe there was some other problem in the database side and hikaricp was able to create connections but the database was not responding... :S
Reply all
Reply to author
Forward
0 new messages