First time User: Many Doubts: Hikari CP 2.4.6

Yogendra Joshi

unread,

Jun 10, 2016, 11:03:18 PM6/10/16

to HikariCP

Hello Brett!

Firstly, congratulations on releasing this great library! I've been seeing results out from many many users and felt so excited that I would too have a fantastic experience. I have downloaded HikariCP 2.4.6 from Maven Central and started using the library to replace the ages old Apache DBCP 1.4.that we were using.

Let me first give you the context: We have a batch process in our business which processes 250 batch jobs every night starting from 7.00PM to 7.00AM and one of the batch jobs reads/processes and writes data in around 38-40 tables. Since this batch always took 4H.35m to complete with Apache, we thought of using HikariCP as an alternative as its FAST has great features and very well reviewed by users! We are utilizing Spring Batch's partition strategy whereby we are kicking off 30 parallel instances of the same job. There is only 1 Java process but 30 internal threads each processing some records. We are using IBM JDK 8.5.5 with Java 7 as our version. This batch is executed as part of a shell script.

Problem: We are trying to process 10,000 records with a commit count of 200 which means after every 200 records read/processed, it would be committed (200 all at once) to the DB. The Batch always processes around ~9500 records and goes into hang mode. At this moment, the batch does not do anything, no jvm activity, no database processing but HikariCP throws an error :

Caused by: java.sql.SQLException: Connection is closed

at com.zaxxer.hikari.pool.ProxyConnection$ClosedConnection$1.invoke(ProxyConnection.java:468)

We reduced the commit interval to 100 and the batch processes perfectly fine. We gained 15 mins of batch processing time with HikariCP! We again changed the commit interval to 200 and re-ran the batch. Our database is behind a firewall and is currently having SQLNET.EXPIRE_TIME set as 10 minutes.

We are using Spring Batch with JSR 352 approach and below is our Spring XML configuration:

<props>

<prop key="driverClassName">${batch.jdbc.driver}</prop>

<prop key="user">${batch.jdbc.user}</prop>

<prop key="password">${batch.jdbc.password}</prop>

</props>

</property>

</bean>

Doubt 1: We are using log4j for logging stuff and to satisfy hikari dependency, we are having slf4j-log4j12-1.7.7, slf4j-api-1.7.7.jar on the classpath. In our log4j.properties, we are having log4j.logger.com.zaxxer.hikari=DEBUG set but I never see ANY logs from HikariCP in my logging. Below is the excerpt from log4j.properties:

log4j.rootLogger = ALL, stdout, debug, exception

layoutPattern=%d{dd MMM yyyy HH:mm:ss,SSS} %-5p [%l] - %m %n

otherLayoutPattern=%d{dd MMM yyyy HH:mm:ss SSS}, %m %n

log4j.logger.com.zaxxer.hikari=DEBUG

Doubt 2: In order to understand the pool usage: idle, active, inuse I thought of using registerMbeans property of HikariCP and view it using JConsole as its an JMX MBean. However, the moment I added the property to "true", I got an error saying "java.lang.IllegalArgumentException: Pool with the same name already exists". I only have 1 Batch Job running at the moment so I dont know why it would give this error? Is it because we are running 30 internal threads it is saying this?

Doubt 3: We think the issue could be the firewall is causing the connections to close after 30mins ( current firewall timeout). We set the maximumPool to 30, maxLifeTIme to 10mins and idle to 5mins. Is there any other setting that you could help us with that could make the batch run successful without being hung?

Doubt 4: Whenever I give 30 as the maximumPoolSize, it somehow creates 60 connections. I verified this with GV$session. It doesnt make any sense to me, but ideally it should not cross beyond 30. To be double sure, I changed the setting to 60 and GV$session showed 120! Can you explain this behavior a bit please?

Doubt 5: When we set the leakDetectionThreshold to 30000 ( 30 seconds), I was assuming it would give me a stacktrace for all the threads which may be a leak suspects. But what I identified is, it always gives stacktrace for the Spring specific transaction package ( doBegin method to be more precise) where our application is beginning the transaction. Is there a way to print out a more detailed error trace to identify the real leak issue?

Hope this information is sufficient enough for you to help us out with. I am really really excited about making this successful and use this library as a permanent library for our application!

Thanks,

Yogendra

Brett Wooldridge

unread,

Jun 11, 2016, 2:46:11 AM6/11/16

to HikariCP

Ok, let's take these one by one.

Logging. I can't help you much there, the setup you describe should work. I will note that your rootLogger may be misconfugured (but not sure). As far as I remember, the level comes first followed by appenders, like so:

log4j.rootLogger=info, SERVER, CONSOLE

"Doubt 2" and "Doubt 4" are caused by the same thing. Somehow you are creating two instances of the pool. The "already registered" JMX error reflects this, as does the 60 connections when 30 are configured.

The exception you pasted implies that some action was executed against a connection that was already closed. Possibly closed due to timeout?

Before drilling into that issue further the most important thing is to figure out how two pool instances are getting created, and getting logging working properly. Log4j and slf4j may themselves offer debug logging for troubleshooting logging problems.

Yogendra Joshi

unread,

Jun 11, 2016, 10:19:59 AM6/11/16

to HikariCP

Thanks Brett. We never faced this double connection issue with Apache DBCP. It always used to be in the max range of maxActive attribute of Apache DBCP. No code has changed so I am really not sure what could cause this. In one of the posts I saw earlier (that I am not finding anymore) I saw, you had also mentioned that we can setup a auto incrementer for the pool ID during startup. Can I give that a try so that even if 2 pools are getting created due to some weird issue, it would still enable the MBeans and I can view what is happening on the pool? If yes, can you please suggest how that should be done?

Also, have you ever faced an issue where the query goes in the hang mode? Yesterday night I tried increasing the pool size to 70 instead of 30, adding connectionTimeout property and reducing the idleTimeout to 2mins from 10mins but nothing has helped so far. Can you suggest what should be the ideal settings for DB connections to not timeout? considering the expire time is set to 10mins from DB side?

Thanks,

Yogendra

Brett Wooldridge

unread,

Jun 11, 2016, 11:36:07 AM6/11/16

to HikariCP

If you set no poolName, the pool name will be generated automatically and auto-increment. I have never seen a hang that wasn't caused by application-side code, not to say that it is not possible, but there are tens of thousands (possibly hundreds of thousands) of HikariCP users not encountering this issue. I suggest getting a thread-dump of the VM while the application appears to be hung. On top of that, debug logs will be nearly indispensible in troubleshooting. Debug logs will give you more visbility than just the MBean will give.

Regarding timeout ... 2 minutes, 10 minutes, whatever should all be fine. If your pool is constant size (no minimumIdle set), the idleTimeout has no effect (is not used). What is important is that the maxLifetime is set below the DB maximum lifetime -- with a margin of at least 1 minute, and that the DB maximum lifetime should be set below whatever firewall max lifetime may be in effect.

-Brett

Brett Wooldridge

unread,

Jun 11, 2016, 11:43:38 AM6/11/16

to HikariCP

As a footnote to the previous post, while it may be true that nothing changed except the pool (from DBCP to HikariCP), HikariCP is notably less tolerant to application bugs than other pools. Just one example but, for example, DBCP can close "abandoned" connections. HikariCP will never do that. An abandoned connection represents a leak in the application (which does lead to a hang, btw) and HikariCP would rather highlight that than simple cover it up by clearing the connection. Threading, connection keep-alive, connection retirement, are all different than DBCP and can therefore lead to different application behavior that was tolerated by another pool.

Having said that, there is always the possibility of a race condition in the pool that is unique to your application's usage pattern. Both a thread dump and debug logs can shed more light on that possibility.

-Brett

Yogendra Joshi

unread,

Jun 11, 2016, 5:28:26 PM6/11/16

to HikariCP

Thanks Brett.

I'll give the debug logs a shot to see why they are not working. Btw, do you think adding a validationQuery and setting something similar to testWhileIdle ( Apache DBCP) will help to get rid of the connection closed issue? I am currently profiling my job using JProfiler to see what the application is doing, but my intention is get your help in identifying if all the settings of HikariConfig are correct.

Thanks for all your comments uptil now.

Regards,

Yogendra

Brett Wooldridge

unread,

Jun 12, 2016, 5:14:50 AM6/12/16

to HikariCP

Hi Yogendra,

Running periodic validation queries (testWhileIdle) defects the database-side wait_timeout configuration, because it artifically creates traffic on the connection which prevents the database from ever retiring it. Sure, the pool can be configured to retire idle connections by itself, but we don't believe in taking power away from the DBA. If the DBA determines what he believes is the best idle timeout for the database, a single application should not be able to override that decision. That is what a construct like testWhileIdle does.

HikariCP will always test connections before handing it out from the pool, as long as it has sat in the pool for more than 500ms. If a connection is returned, and then borrowed again, within a 500ms window, we assume it is alive and skip the validation in that one case.

My feeling is that JProfiler is not likely to shed much light -- except maybe showing where threads are hung. However, JConsole/VisualVM can provide you with the same information in that case.

-Brett

hazeh...@gmail.com

unread,

Jun 15, 2016, 10:53:34 AM6/15/16

to HikariCP

Hi Brett,

Let me join to your conversation. Generally, Hikari looks perfect, but I'm worrying about recovering after any outages. I have some doubts about this 500ms period when a connection is returned without a validation. What is happened if an old invalid connection is touched so quickly that it never be validated? It is possible to write a test that hungs because of it, but I don't really know if it is a real problem.

What do you think about it?

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;
import static org.mockito.Mockito.mock;

import java.sql.Connection;
import java.sql.Statement;
import java.util.Hashtable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

import javax.naming.Context;
import javax.naming.Name;
import javax.naming.Reference;
import javax.naming.StringRefAddr;
import javax.sql.DataSource;

import org.h2.tools.Server;
import org.junit.Before;
import org.junit.Test;

import com.zaxxer.hikari.HikariJNDIFactory;

public class HirakiPoolTest {
    private Server server;
    private DataSource pooledDataSource;
    private AtomicInteger errors = new AtomicInteger();

    @Before
    public void setup() throws Exception {
        server = Server.createTcpServer().start();

HikariJNDIFactory jndiDataSourceFactory = new HikariJNDIFactory();

        Reference ref = new Reference(DataSource.class.getName());
        ref.add(new StringRefAddr("jdbcUrl", "jdbc:h2:tcp://localhost/mem:server_test"));
        ref.add(new StringRefAddr("username", "test"));
        ref.add(new StringRefAddr("password", "test"));
        ref.add(new StringRefAddr("driverClassName", "org.h2.Driver"));
        ref.add(new StringRefAddr("maximumPoolSize", "10"));
        ref.add(new StringRefAddr("connectionTimeout", "0"));
        ref.add(new StringRefAddr("maxLifetime", "0"));
        ref.add(new StringRefAddr("idleTimeout", "0"));
        ref.add(new StringRefAddr("poolName", "db"));

pooledDataSource = (DataSource)jndiDataSourceFactory.getObjectInstance(ref, mock(Name.class), mock(Context.class), new Hashtable<Object, Object>());
}

    @Test
    public void shouldReconnectAfterAnOutageWhenConnectionsArentChecked() throws Exception {
        errors.set(0);
        assertTrue(server.isRunning(false));
        executeSqlOnConnections();
        assertEquals(0, errors.get());

        errors.set(0);
        server.stop();
        assertFalse(server.isRunning(false));
        executeSqlOnConnections();
        assertEquals(10, errors.get());

        server.start();
        assertTrue(server.isRunning(false));
        do {
            errors.set(0);
            executeSqlOnConnections();
        }
        while (0 != errors.get());
    }

    private void executeSqlOnConnections() throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(10);
        IntStream.range(0, 10).forEach(i -> executor.execute(() -> executeSql()));
        executor.shutdown();
        executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
    }

    private void executeSql() {
        try (Connection conn = pooledDataSource.getConnection(); Statement stmt = conn.createStatement()) {
            stmt.executeQuery("select 1 from dual");
        } catch (Exception e) {
            errors.incrementAndGet();
        }
    }
}

Brett Wooldridge

unread,

Jun 15, 2016, 11:03:35 AM6/15/16

to HikariCP

That is a possibility. It is a balancing act between performance in the 99.9% case (no outage) and reliability in the 0.01% case (outage). If you validate the connection on every getConnection() call you double the number of queries and wire-traffic to the database. Almost nobody that I know of does this. If you truly want to force validation of every connection all the time, you can set the undocumented system property "com.zaxxer.hikari.aliveBypassWindowMs" to 0. If your application has relatively low DB queries per-second requirements, and the need for a 100% reliability guarantee, this may be an option you can consider.

-Brett

p.s. as an undocumented system property there is no guarantee that this property will continue to be honored in the future, but I can say that it has existed since the beginnings of the project and is unlikely to disappear anytime soon.

Yogendra Joshi

unread,

Jun 15, 2016, 12:38:55 PM6/15/16

to HikariCP

Thanks Brett.

Some updates from last we spoke:

1. I was able to turn on debugging for Hikari by fixing the log4j.properties. The rootLogger was set to WARN and hence I was not getting the logs, all good now. Also, as recommended by you, I removed the poolName property from XML and it is showing 2 Pools, both Hikari with name HikariPool-2 and HikariPool-1. I see that HikariPool-2 and HikariPool-1 shows the below stats:

HikariPool-2 - Pool stats (total=20, active=0, idle=20, waiting=0)

HikariPool-1 - Pool stats (total=20, active=0, idle=20, waiting=0)

HikariPool-2 - Pool stats (total=15, active=0, idle=15, waiting=0)

HikariPool-2 - Pool stats (total=16, active=0, idle=16, waiting=0)

HikariPool-1 - Pool stats (total=20, active=2, idle=18, waiting=0)

HikariPool-1 - Pool stats (total=13, active=1, idle=12, waiting=0)

Above is just for example, as there are too many instances. But does HikariPool-2 denote that it is NEVER used as the active count is always 0 and I verified this for all the pool stats for Pool2. Its always active=0.

Also, the DBA's set a property named SQLNET.EXPIRE_TIME=10 which denotes 10mins. (As per them, after every 10mins, they send a packet to DB and re-activate the connection). After they set this property, my batch program is no more going into hang state. However, After a point in time, 2Hrs, I get the following exceptions in the log:

java.sql.SQLException: Connection is closed

The above error started happening after I got this below error in the log:

HikariPool-1 - Closing connection oracle.jdbc.driver.T4CConnection@5c9e5c15: (connection is evicted or dead)

Is it something to do with idleTimeOut, maxLifeTime parameters that I have set in HikariConfig? For your reference, here is the updated Spring XML file

<props>

<prop key="driverClassName">${batch.jdbc.driver}</prop>

<prop key="user">${batch.jdbc.user}</prop>

<prop key="password">${batch.jdbc.password}</prop>

</props>

</property>

</bean>

If you can shed some light on what direction the changes should happen, that would be helpful.

Thanks,

Yogendra

Yogendra Joshi

unread,

Jun 15, 2016, 12:52:03 PM6/15/16

to HikariCP

1 more thing I noticed from the logs is the below error. There are bunch of these errors. Both for Pool 1 and Pool 2.

HikariPool-2 - Executed rollback on connection oracle.jdbc.driver.T4CConnection@d55c254e due to dirty commit state on close().

HikariPool-1 - Executed rollback on connection oracle.jdbc.driver.T4CConnection@61998126 due to dirty commit state on close().

Also, I verified the code and found that in the legacy code there is java code written to create Apache DBCP connections, do you think this Apache DBCP Pool is wrapped into HikariPool-2? Other than this, I did not find anything in the code OR XML which explains why 2 pools are being created?

Yogendra Joshi

unread,

Jun 18, 2016, 8:51:13 PM6/18/16

to HikariCP

Hello Brett,

Do you think you have any insights on the below couple queries I sent?

Thanks,

Yogendra

Brett Wooldridge

unread,

Jul 6, 2016, 9:59:05 AM7/6/16

to hika...@googlegroups.com

If somehow the HikariDataSource is being passed as the database for DBCP, or the other way around, that would be bad news in general. I can't really predict what the effect would be.

The log message mentioned below is basically not good. This indicates that the pool is autoCommit=false, some query was executed, and Connection.commit() was never called. If the query in question was an insert or update, it just got rolled back. If it was a simple select, probably no harm done, but its messy. The other possibility is that the pool is autoCommit=true, but some code somewhere is calling Connection.setAutoCommit(false), executing a query, and then calling close() without committing. Again, you'll want to try to chase these down in the code.

-Brett

Reply all

Reply to author

Forward