DB pool exhaustion condition


David Hagan

Apr 21, 2015, 4:21:22 AM
to lif...@googlegroups.com
Hi,

I'm trying to track down a poorly described bug in our deployment, which involves a large shared database (Oracle). The description of the bug is that "during the week when the database was patched, the app became flaky and needed to be restarted a number of times". To make matters worse, I'm relatively confident that another business unit decided to load-test the application while that was happening. Worse still, the application logs were discarded when it was restarted, so I'm afraid I can't include stack traces from the event. I know that none of that description is very helpful, and I'm not looking for assistance in diagnosing a poorly described and unlogged event.

While investigating the issue, I started looking at the database connection pool. Reading through StandardDBVendor's connection pooling behavior, it occurred to me that there may be a condition under which the pool becomes starved. I'm not sure whether that's the cause of the issue I'm trying to fix, but it fits the symptoms, so I'm currently experimenting with slightly different logic to address it, and I'll include my current thoughts below. What I'm seeing is in the following file:


at line 1170, and relates to the incrementing of the "poolSize" counter when creating a new connection. It seems that the poolSize counter is increased whether the createOne call returned a Full or Empty box. I'm not sure whether this is related to the issue my deployment was having, but my hypothesis is this: while the database was being patched and the load test was running, the createOne method kept returning failures, but the poolSize counter kept increasing. Eventually the system had no live connections, no connections in the idle pool (the private "pool" variable), and had reached the "doNotExpandBeyond" limit. Once the poolSize counter is at the doNotExpandBeyond limit, it seems to me that no new connections will be created, and if there are no connections in the idle pool, then the application may be incapable of establishing new connections to the database.
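To make the hypothesis concrete, here is a minimal, self-contained sketch of the counter behavior I think I'm seeing. All names here (PoolLeakSketch, borrow, dbUp, the String stand-in for a connection) are mine, not Lift's, and I've dropped the real implementation's Box type and synchronization; the only point is that the budget counter is bumped even when connection creation fails:

```scala
// Hypothetical model of the suspected leak: poolSize counts *attempts*,
// not live connections, so repeated failures exhaust the budget.
object PoolLeakSketch {
  val doNotExpandBeyond = 3        // stand-in for the pool's expansion limit
  var poolSize = 0                 // counter under suspicion
  var pool: List[String] = Nil     // idle connections

  // stand-in for createOne: returns None while the database is down
  def createOne(dbUp: Boolean): Option[String] =
    if (dbUp) Some("conn") else None

  def borrow(dbUp: Boolean): Option[String] = pool match {
    case c :: rest =>
      pool = rest; Some(c)
    case Nil if poolSize < doNotExpandBeyond =>
      val ret = createOne(dbUp)
      poolSize = poolSize + 1      // incremented even when ret is None
      ret
    case Nil =>
      None                         // budget spent, nothing idle: starved
  }
}
```

If the database is down for three borrow attempts, poolSize reaches the limit with zero real connections, and every later borrow returns None even after the database comes back.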

Am I correctly understanding the flow of logic in the db connection pool implemented in StandardDBVendor?

My experimental implementation in my own codebase is to change the behavior of the:
 
case Nil if poolSize < tempMaxSize

branch, to only increase the poolSize counter if the createOne returned a full box.  To be precise, I've changed:

ret.foreach(_.setAutoCommit(false))
poolSize = poolSize + 1

to
ret.foreach(conn => {
  conn.setAutoCommit(false)
  poolSize = poolSize + 1
})
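Applying the same change to my toy model above shows the behavior I'm hoping for. Again, all names here are hypothetical stand-ins, not Lift's actual code; the only difference from the unpatched version is that poolSize is incremented inside the foreach, so failed creation attempts no longer consume the budget:

```scala
// Same hypothetical model, with the counter moved inside the foreach:
// poolSize now counts only connections that were actually created.
object PoolFixSketch {
  val doNotExpandBeyond = 3
  var poolSize = 0
  var pool: List[String] = Nil

  def createOne(dbUp: Boolean): Option[String] =
    if (dbUp) Some("conn") else None

  def borrow(dbUp: Boolean): Option[String] = pool match {
    case c :: rest =>
      pool = rest; Some(c)
    case Nil if poolSize < doNotExpandBeyond =>
      val ret = createOne(dbUp)
      ret.foreach(_ => poolSize = poolSize + 1) // only count real connections
      ret
    case Nil =>
      None
  }
}
```

With this version, any number of failed attempts during an outage leaves poolSize at zero, so the first borrow after the database recovers can still create a connection.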


I don't have a lot of evidence yet for the benefit of this. Right now my reason to pursue this particular change is that I've been stress-testing my application, and it seems that after a significant database network outage (or when the database admins forcefully terminate my sessions), the connection pool's throughput drops and doesn't recover. This particular change has, so far, yielded a better result: the pool appears to return to its pre-outage throughput once the outage is resolved.

Can you provide any advice on my understanding of the database connection pool, my hypothesis or my workaround?

Thanks,

-Dave