ActiveRecord connection pool management


Michael Vigilante

Sep 26, 2015, 2:11:17 PM
to Ruby on Rails: Core
Hi, just a small question on the way connections are managed in ActiveRecord.

From the best I can gather reading the code (and based on behaviour), ActiveRecord will check a connection out of the pool the first time a thread asks for one, then that thread will continue to hang on to the connection until it is closed. What is the reasoning behind this approach as opposed to checking out a connection for each DB operation and letting go of it once the operation is completed?

My team has had a couple of issues with this (leading to us needing to use a connection pool equal to the number of threads being used by our server, which is not really a big problem, but feels a bit counter-intuitive). This post is on much the same subject. I'm really just curious about the way this works.

Aaron Patterson

Sep 26, 2015, 4:54:16 PM
to rubyonra...@googlegroups.com
On Thu, Sep 24, 2015 at 10:42:39PM -0700, Michael Vigilante wrote:
> Hi, just a small question on the way connections are managed in
> ActiveRecord.
>
> From the best I can gather reading the code (and based on behaviour),
> ActiveRecord will check a connection out of the pool the first time a
> thread asks for one, then that thread will continue to hang on to the
> connection until it is closed. What is the reasoning behind this approach
> as opposed to checking out a connection for each DB operation and letting
> go of it once the operation is completed?

Constantly checking out a connection then checking it back in seems like
it would cause a performance bottleneck. The assumption is that lock
contention will become the bottleneck if we have to constantly check in
/ check out connections. The other side of that is that it would add more
complexity throughout AR, as we'd have to change every place that touches
the connection.

Doing it once per thread, and giving the user control over checking in
connections (see `release_connection` on the connection pool) seemed
like a good compromise for performance and code complexity.
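
For what it's worth, here's a rough sketch of that manual control; `release_connection` and `with_connection` are the pool methods I mean, everything else is just illustrative:

    # Give the current thread's connection back to the pool explicitly:
    ActiveRecord::Base.connection_pool.release_connection

    # Or scope a checkout to a block; the connection goes back when the block
    # exits (if it wasn't already checked out) and is checked out lazily
    # again on the next query:
    ActiveRecord::Base.connection_pool.with_connection do |conn|
      conn.execute('SELECT 1')
    end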

> My team has had a couple of issues with this (leading to us needing to use
> a connection pool equal to the number of threads being used by our server,
> which is not really a big problem, but feels a bit counter-intuitive).

Is it counter-intuitive? If you're doing database intensive work, then
you'd probably want the same number of connections as you have request
threads in order to maximize throughput, otherwise you'll still be
blocking other requests at some point (see Amdahl's law).

> This post
> <https://groups.google.com/d/msg/rubyonrails-core/Z4eiyPnGjwk/21a5KSL_YlIJ>
> is on much the same subject. I'm really just curious about the way this
> works.

If we can prove that the concerns above are unwarranted, then it would
be fine to change.

--
Aaron Patterson
http://tenderlovemaking.com/

Michael Vigilante

Sep 27, 2015, 4:23:44 AM
to Ruby on Rails: Core, tende...@ruby-lang.org
On Sunday, 27 September 2015 06:54:16 UTC+10, Aaron Patterson wrote:
Is it counter-intuitive? If you're doing database intensive work, then
you'd probably want the same number of connections as you have request
threads in order to maximize throughput, otherwise you'll still be
blocking other requests at some point (see Amdahl's law).

Yes, if you're doing heavy database work you certainly want enough connections available to avoid contention. In our case, we had a couple of short database transactions wrapped around other computational work, including a call to an external service (we know). The thing we found counter-intuitive was that when the external call got slow (as they always do), we started being unable to check out database connections. Not a particularly common or desirable use-case, I know, and I now get that we can work around it by manually checking out connections, so that part is good.

I'm certainly not suggesting that optimising for the above case is a good idea; like I said, I was just interested. I don't really come from a web or Ruby background, and I'm used to resource-sharing systems either forcing you to check resources out and in explicitly, or checking them out and in automatically each time you request the resource. I guess threads in web servers are pretty short-lived, so it wouldn't normally be an issue.

On Sunday, 27 September 2015 06:54:16 UTC+10, Aaron Patterson wrote:
Constantly checking out a connection then checking it back in seems like
it would cause a performance bottleneck.  The assumption is that lock
contention will become the bottleneck if we have to constantly check in
/ check out connections.
 
I dunno, it feels to me like under normal circumstances you'd not see any more lock contention. I would guess that under the current scheme, the first few web transactions would be fast (no locking at all), but after that you'd end up waiting on average about the same amount of time (since each thread has to wait for a connection before it can get started and then needs to do all its operations before it checks it back in). I'm curious, I'll see if I can mock something up to test this using the existing checkout/checkin functionality.

Aaron Patterson

Sep 28, 2015, 10:46:14 AM
to Michael Vigilante, Ruby on Rails: Core, tende...@ruby-lang.org
On Sun, Sep 27, 2015 at 01:23:44AM -0700, Michael Vigilante wrote:
> > Constantly checking out a connection then checking it back in seems like
> > it would cause a performance bottleneck. The assumption is that lock
> > contention will become the bottleneck if we have to constantly check in
> > / check out connections.
> >
>
> I dunno, it feels to me like under normal circumstances you'd not see any
> *more* lock contention. I would *guess* that under the current scheme, the
> first few web transactions would be fast (no locking at all), but after
> that you'd end up waiting on average about the same amount of time (since
> each thread has to wait for a connection before it can get started

I don't really follow. If you have 5 request threads, and 5 connections
available, no request thread will contend on a lock for a connection
(unless you are spawning additional threads yourself, which means you'd have
1 + new_threads.count connections used, which I said is off the beaten path).

> and then
> needs to do *all* its operations before it checks it back in). I'm curious,
> I'll see if I can mock something up to test this using the existing
> checkout/checkin functionality.

Sounds good! :D

Michael Vigilante

Sep 28, 2015, 6:37:43 PM
to Ruby on Rails: Core, vigilant...@gmail.com, tende...@ruby-lang.org
On Tuesday, 29 September 2015 00:46:14 UTC+10, Aaron Patterson wrote:
I don't really follow. If you have 5 request threads, and 5 connections
available, no request thread will contend on a lock for a connection
(unless you are spawning additional threads yourself, which means you'd have
1 + new_threads.count connections used, which I said is off the beaten path).
 
Sorry, I meant the case where you have, say, a pool of five connections and ten request threads (which is entirely possible with the default Rails and, say, Puma configurations). Where you have a connection per thread, I'd expect the existing model to perform very slightly better, because you only go through the process of acquiring a connection once per thread (rather than going through the locking/synchronising process multiple times). I'd expect the two approaches to spend about the same time waiting in this situation when all (or at least a large majority) of the work being done by the app is in the database; the more time you spend processing in the app, the more you could benefit from releasing the connection between database work.

Working on that test case right now, will get back to you :)

Michael Vigilante

Sep 29, 2015, 1:01:23 AM
to Ruby on Rails: Core, vigilant...@gmail.com, tende...@ruby-lang.org

OK, mock-up is here:
https://github.com/stranger-zedd/activerecord-connection-pool-tests

Aaron Patterson

Sep 30, 2015, 1:12:25 PM
to Michael Vigilante, Ruby on Rails: Core, tende...@ruby-lang.org
On Mon, Sep 28, 2015 at 10:01:23PM -0700, Michael Vigilante wrote:
> OK, mock-up is here:
> https://github.com/stranger-zedd/activerecord-connection-pool-tests

Ah, I see what you're saying now. TBQH, it really seems fine if we
internally check connections back in. I just don't know how much of an
effort it would require. I suspect there are maybe 1 or 2 places that
we would have to change to get the most benefit. The main thing I worry
about is checking out a connection where we don't actually need to do
any database work (like getting the schema cache).

If you're interested in finding the places we need to fix, I'd be happy
to merge the patch! :)

Xavier Noria

Sep 30, 2015, 1:30:55 PM
to rubyonrails-core, Michael Vigilante, tende...@ruby-lang.org
"it checks out a connection separately for each database action"

Transactions are per connection, how does that approach handle them?

James Coleman

Sep 30, 2015, 6:54:41 PM
to rubyonra...@googlegroups.com, Michael Vigilante, tende...@ruby-lang.org
Xavier,

As far as I know, Rails already doesn't wrap the entire request in a transaction, so that shouldn't be an issue.

Unless what you're saying is that operations already wrapped in a transaction block would be affected.

On Wed, Sep 30, 2015 at 1:29 PM, Xavier Noria <f...@hashref.com> wrote:
"it checks out a connection separately for each database action"

Transactions are per connection, how does that approach handle them?


Michael Vigilante

Sep 30, 2015, 9:35:27 PM
to Ruby on Rails: Core, vigilant...@gmail.com, tende...@ruby-lang.org
On Thursday, 1 October 2015 03:30:55 UTC+10, Xavier Noria wrote:

Transactions are per connection, how does that approach handle them?

From what I can tell, ActiveRecord only provides one way to manually open a transaction, which is a `transaction do ... end` block; in that case, if the connection is checked out at the start of `transaction` and checked back in at the end, everything will still work OK; you'll have the same connection and the same transaction for everything within that block.
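
For example (just a sketch; the models here are made up), everything inside the block runs on whichever connection `transaction` grabbed:

    # One connection, one transaction for the whole block, even if checkout
    # were scoped to the block rather than to the thread.
    ActiveRecord::Base.transaction do
      account = Account.lock.find(1)
      account.update!(balance: account.balance - 100)
      AuditEntry.create!(account: account, delta: -100)
    end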

Aaron, I have no idea if I'll be able to work that out, but I'll give it a shot.

Xavier Noria

Oct 1, 2015, 4:26:28 AM
to rubyonrails-core
The only time I've seen one connection per thread being an issue was in one app that ran many processes and started to hit the connection limit of its DB server at traffic peaks. Even in that scenario, contention is also a risk if you reduce the pool.

Other than the mental image that you're using fewer resources (at the price of contention), I am not sure there is going to be any significant practical win. It could be, I am not saying it wouldn't (I have not done your study), but at first sight the cost/benefit is not clear to me in practical terms.

Regarding transactions, #execute is part of the public AR interface, and

    AR::Base.connection.execute('START TRANSACTION')

is valid AR code. I am not sure whether drivers know the connection is in a transaction and have an API to check, but #transaction_open? returns false as of this writing. Why would anybody do that? I don't know, maybe because they are writing a heavy SQL-oriented script and doing it manually feels natural... it doesn't matter, it can be done.
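
Concretely, something like this (a sketch; the table name is made up) opens a transaction that AR's bookkeeping never sees:

    conn = ActiveRecord::Base.connection
    conn.execute('START TRANSACTION')
    conn.transaction_open?   # => false, AR did not record it
    conn.execute("UPDATE widgets SET name = 'x' WHERE id = 1")
    conn.execute('COMMIT')

If the pool handed the thread a different connection between those calls, the COMMIT could land on a connection with no transaction open at all.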

Another practical point is that when a connection is checked out, the connection pool tests whether it is alive by issuing, for example, SELECT 1. That would mean that if a request performs 200 queries today, they would become 400 queries. I don't know if the alive check could be rethought to run less frequently, but it probably should be, to avoid that 2x.
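
In user-facing terms the check amounts to roughly this (a paraphrase, not the actual pool code):

    conn = ActiveRecord::Base.connection
    # active? typically issues a cheap probe such as SELECT 1;
    # the pool does the equivalent of this on every checkout.
    conn.reconnect! unless conn.active?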



Matt Jones

Oct 1, 2015, 10:45:47 AM
to rubyonrails-core
If the concern is long-running threads holding DB connections they don't need, wouldn't a simpler solution be to explicitly return connections to the pool (via release_connection) before doing a long-running non-DB operation? Checking out and back in on most operations seems like optimizing for that uncommon case at a runtime cost for the common one.
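
Something along these lines, I mean (a sketch; the model, column and external endpoint are made up):

    require 'net/http'

    def charge_order
      order = Order.find(42)                        # connection checked out lazily here

      # Hand the connection back before the slow, non-DB part of the work...
      ActiveRecord::Base.connection_pool.release_connection

      receipt = Net::HTTP.get(URI('https://payments.example.com/charge'))

      order.update!(receipt: receipt)               # ...and it's checked out again here
    end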

--Matt Jones

richard....@gmail.com

Oct 1, 2015, 10:57:27 AM
to rubyonra...@googlegroups.com, rubyonrails-core
We run into problems with connection pools on Heroku.

In Postgres a connection is very expensive. The cheap/free plans on Heroku Postgres all have very small connection limits; the hobby plan is limited to 20 connections: https://devcenter.heroku.com/articles/heroku-postgres-plans.

Some people will build really small Rack apps that use Active Record and want to max out their processes/threads. It's not that common, but it happens.

More commonly, people don't know about this behavior and get ActiveRecord::ConnectionTimeoutError because they're using default Puma, which is set to 16 threads, and default AR, which is set to 5 connections. I wrote these docs to help people understand the connection requirements of AR: https://devcenter.heroku.com/articles/concurrency-and-database-connections
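
The failure mode is easy to reproduce in isolation with something like this (a sketch; it assumes the sqlite3 gem, and the numbers are picked to mirror those defaults):

    require 'active_record'

    ActiveRecord::Base.establish_connection(
      adapter: 'sqlite3',
      database: ':memory:',
      pool: 5,                # AR's default pool size
      checkout_timeout: 1     # fail fast for the demo
    )

    # 16 "request" threads (Puma's default) against a 5-connection pool:
    threads = Array.new(16) do
      Thread.new do
        begin
          ActiveRecord::Base.connection_pool.with_connection do |conn|
            conn.execute('SELECT 1')
            sleep 2           # hold the connection past checkout_timeout
          end
        rescue ActiveRecord::ConnectionTimeoutError => e
          puts e.message
        end
      end
    end
    threads.each(&:join)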

Our larger customers end up hitting the max connection limit of our "production" plans, which is 500 connections. They end up having to run PgBouncer, which does per-machine connection multiplexing: https://devcenter.heroku.com/articles/concurrency-and-database-connections#limit-connections-with-pgbouncer



---
Richard Schneeman

James Coleman

Oct 1, 2015, 11:00:55 AM
to rubyonra...@googlegroups.com
Agreed.

The biggest benefit is going to come to people using a DB like Postgres, where each connection uses an entire backend (essentially a forked worker) and, given the high cost of backends, it's recommended to keep the number of connections low.

Checking connections in/out of the pool as necessary would, I think, significantly benefit many medium or larger applications using Postgres, as it would make it easier to scale the number of Rails backends without impacting the number of connections to PG required.

Matt Jones

Oct 1, 2015, 3:28:01 PM
to rubyonrails-core
On Thu, Oct 1, 2015 at 4:25 AM, Xavier Noria <f...@hashref.com> wrote:
Good point. That's an additional overhead, especially for systems with DBs that are farther away.

One other point, possibly MySQL-specific: there are connection-specific SQL variables. Under the current system, code can rely on them staying set once they have been set in the same thread:

ActiveRecord::Base.connection.execute("SET SQL_MODE = ''")
# now SQL_MODE for the connection is set to empty string
SomeModel.create(...)

With per-DB-interaction checkin/checkout, it doesn't appear to be possible to reliably manipulate these variables, as the connection used in the first statement may not match the one used in the second. More amusing / worrisome, somebody ELSE gets a connection with an altered SQL_MODE...

--Matt Jones

Michael Vigilante

Oct 4, 2015, 11:37:56 PM
to Ruby on Rails: Core
On Friday, 2 October 2015 05:28:01 UTC+10, Matt jones wrote:

With per-DB-interaction checkin/checkout, it doesn't appear to be possible to reliably manipulate these variables, as the connection used in the first statement may not match the one used in the second. More amusing / worrisome, somebody ELSE gets a connection with an altered SQL_MODE...

I would personally prefer to have to open and close a connection explicitly around variables like this, if only because it would serve as a reminder to set, say, SQL_MODE back to what it was; in the current system it only feels slightly harder to hand somebody else a connection with an altered SQL_MODE anyway.
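
Something like this is the shape I'd want to be pushed towards (a sketch; the SQL is MySQL-specific and the block form is just one option):

    ActiveRecord::Base.connection_pool.with_connection do |conn|
      original = conn.select_value('SELECT @@SESSION.sql_mode')
      begin
        conn.execute("SET SESSION sql_mode = ''")
        # ... work that needs the relaxed mode ...
      ensure
        # Put the connection back the way we found it before returning it.
        conn.execute("SET SESSION sql_mode = #{conn.quote(original)}")
      end
    end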


On Thu, Oct 1, 2015 at 4:25 AM, Xavier Noria <f...@hashref.com> wrote:
Other than the mental image that you're using fewer resources (at the price of contention), I am not sure there is going to be any significant practical win. It could be, I am not saying it wouldn't (I have not done your study), but at first sight the cost/benefit is not clear to me in practical terms.

It's not so much the mental image of using fewer resources as the actual ability to use fewer resources. The practical benefit is that for applications which don't do all their grunt work in the database (which seems to be most of them; rendering seems to take at least as long as a DB round trip for most things), you don't have to saturate your threads with database connections, which means you can scale your app further.

I guess my theory was that there would be negligibly more contention, and my research seems to have backed that up, especially given the below. Happy to be proven wrong!

On Thu, Oct 1, 2015 at 4:25 AM, Xavier Noria <f...@hashref.com> wrote:
Regarding transactions, #execute is part of the public AR interface, and
 
     AR::Base.connection.execute('START TRANSACTION')
 
is valid AR code. I am not sure whether drivers know the connection is in a transaction and have an API to check, but #transaction_open? returns false as of this writing. Why would anybody do that? I don't know, maybe because they are writing a heavy SQL-oriented script and doing it manually feels natural... it doesn't matter, it can be done.

This is certainly an issue; I'm not sure how to get around it without either connection-per-thread or #transaction_open?. It does feel like a bad thing to do, though; is keeping support for this really that important?


On Thu, Oct 1, 2015 at 4:25 AM, Xavier Noria <f...@hashref.com> wrote:
Another practical point is that when a connection is checked out, the connection pool tests whether it is alive by issuing, for example, SELECT 1. That would mean that if a request performs 200 queries today, they would become 400 queries. I don't know if the alive check could be rethought to run less frequently, but it probably should be, to avoid that 2x.

This is also an excellent point; I wasn't aware this was the case. It certainly feels like a waste to be doing this every time a database operation is run. I'm not sure what an appropriate level of alive-checking is; I'll see if I can find out how other systems do this.

Michael Vigilante

Oct 8, 2015, 6:04:51 PM
to Ruby on Rails: Core

On Monday, 5 October 2015 14:37:56 UTC+11, Michael Vigilante wrote:
This is also an excellent point; I wasn't aware this was the case. It certainly feels like a waste to be doing this every time a database operation is run. I'm not sure what an appropriate level of alive-checking is; I'll see if I can find out how other systems do this.

OK; it looks like other systems either do an alive-check on each checkout (I don't think this necessarily needs to run a query against the database; for example, JDBC provides a `Connection#isValid` method, though I can't find any source for what it actually does, so maybe it does just run a query), or they do it on a schedule in a background thread, which probably isn't an option here because of the GIL.

My guess is that because my tests were run with the database so close to the app (on the same machine), this wasn't really an issue; I guess latency would be the real performance-killer in this case. I'll rerun the tests with an external database just to be sure.

The other issue I noticed (which I just realised I hadn't mentioned) was that I had to disable the query cache middleware in order to make checking connections in and out possible; I'm not entirely sure what that middleware is for (it appears to just temporarily enable the query cache for the duration of each request?), but it poses a problem for checkin/checkout (even in the manual sense).