[RFC] Connection pools for AR

42 views
Skip to first unread message

Nick Sieger

unread,
Apr 20, 2008, 12:37:13 PM4/20/08
to rubyonra...@googlegroups.com
I've started a refactoring working towards creating a proper
connection pool class/object for ActiveRecord. Progress can be found
in my connection_pool branch on github [1]. Some notes about the work:

- One connection pool object created and cached per
AR::Base.establish_connection call.

- Connection pool manages the individual connections. Currently it's
still a hash of connections per thread but it should be easy to move
to a proper fixed pool with connection acquire/release functionality.

- At some point, in order to leverage fixed-size connection pools,
we'll need to find a way for application code to notify the pool that
it's done with the connection. A controller after_filter is one
possibility, I'm interested to hear other thoughts as well.

- Connection API has not changed significantly, but I hope to at least
make the connection pool API more sane and deprecate and/or remove a
lot of the cruft like
active_connection/clear_active_connections!/clear_reloadable_connections!/verify_active_connections!
etc. Anyone who uses these or knows the original intent of these,
please let me hear more about it. Their purpose seems less clear in
light of the refactoring so far.

- Synchronization monitors have been introduced around both connection
pool and connection access through a new active_support Module
extension I wrote that lets you easily apply synchronization at the
method level.

- allow_concurrency now simply flips between a real and null monitor
for connection pool access.

- The synchronization overhead is small, enough that I'm wondering
what people think about possibly making allow_concurrency default to
true (favoring safety over a small boost in speed of connection
acquisition). I ran the AR tests for mysql, postgresql and sqlite3
with master and my branch:

master
======
test_mysql: 43.058775 seconds.
test_sqlite3: 39.885374 seconds.
test_postgresql: 36.002755 seconds.

nicksieger/connection_pool
======
test_mysql: 44.72238 seconds.
test_sqlite3: 40.68397 seconds.
test_postgresql: 37.23015 seconds.

nicksieger/connection_pool w/ default allow_concurrency = true
======
test_mysql: 45.641348 seconds.
test_sqlite3: 43.405034 seconds.
test_postgresql: 37.850598 seconds.

(This is of course not an exhaustive benchmark, but at least gives you an idea.)

This might seem like YAGNI for a lot of you, but I think it has the
dual benefit of cleaning up the connection_specification code, plus
supporting the endgame for my standpoint, which is making the
connection pool implementation pluggable so that I can use the Java
application server's connection pool when running with JRuby.

Comments appreciated!
/Nick

[1]: http://github.com/nicksieger/rails/commits/connection_pool

Michael Koziarski

unread,
Apr 20, 2008, 7:39:16 PM4/20/08
to rubyonra...@googlegroups.com
> - One connection pool object created and cached per
> AR::Base.establish_connection call.

Excellent!

> - Connection pool manages the individual connections. Currently it's
> still a hash of connections per thread but it should be easy to move
> to a proper fixed pool with connection acquire/release functionality.
>
> - At some point, in order to leverage fixed-size connection pools,
> we'll need to find a way for application code to notify the pool that
> it's done with the connection. A controller after_filter is one
> possibility, I'm interested to hear other thoughts as well.

This will be the biggest change, we can do implicit checkout of a
connection when trying to access the database for the first time in a
thread, but implicit checkin seems to be asking for trouble. We can
handle it for ActionController easily enough by checking the
connections back in after the request has been dispatched.

So my first thought is:

Foo.find(:first) # retrieves connection
Thread.new do
Foo.find(:first) # retrieves another
Bar.find(:first) # uses existing
ActiveRecord::Base.release_connection
Foo.find(:first) # retrieves another
end

With retrieving blocking until the connection is available.

> - Connection API has not changed significantly, but I hope to at least
> make the connection pool API more sane and deprecate and/or remove a
> lot of the cruft like
> active_connection/clear_active_connections!/clear_reloadable_connections!/verify_active_connections!
> etc. Anyone who uses these or knows the original intent of these,
> please let me hear more about it. Their purpose seems less clear in
> light of the refactoring so far.

There are two cases that I'm aware of that lead to those 'interesting' methods.

* Reloading in development mode (the classes get undefined, so the
connection is gone)
* re-establishing timed out connections.

By moving the connections to the pool rather than the classes that get
reloaded, we avoid the first problem. Obviously checking out a
connection should imply that its been verified.

So with a proper 'pool' we can avoid all those methods, and improve
the code for the simple one-thread case too.

> This might seem like YAGNI for a lot of you, but I think it has the
> dual benefit of cleaning up the connection_specification code, plus
> supporting the endgame for my standpoint, which is making the
> connection pool implementation pluggable so that I can use the Java
> application server's connection pool when running with JRuby.

The code cleanup is pretty desperately needed, it's some hokey nasty
stuff ;). The dual benefit of getting jruby connection handling a
little easier is a definite win.

> Comments appreciated!

Rather than swapping in and out a different monitor, why not have two
different connection handler classes, one of which returns and manages
a single connection, and another which manages a pool of a fixed size.
The single connection version can expose the same API (checkout /
check in / release) and perhaps raise exceptions if accessed from
threads other than the one which created the connection.

new-connection-per-thread isn't a particularly great idea and I'd be
open to dropping it entirely.

Thanks for your work on this, it'll be good to tidy this up.

There are still a bunch of outstanding issues to be decided,
especially relating to objects retrieved in one thread and being
worked-on in another. This messes up :lock=>true etc. But let's tidy
up the code first, then decide what level on concurrent use we want to
support.

> /Nick
>
> [1]: http://github.com/nicksieger/rails/commits/connection_pool
>
> >
>

--
Cheers

Koz

Josh Peek

unread,
Apr 20, 2008, 11:17:05 PM4/20/08
to Ruby on Rails: Core
Fucking Awesome!

Now what the hell am I going to work on for GSoC? O yeah, AP is still
full of cruft.

Nick Sieger

unread,
Apr 21, 2008, 1:06:53 PM4/21/08
to rubyonra...@googlegroups.com
On Sun, Apr 20, 2008 at 6:39 PM, Michael Koziarski
<mic...@koziarski.com> wrote:
> This will be the biggest change, we can do implicit checkout of a
> connection when trying to access the database for the first time in a
> thread, but implicit checkin seems to be asking for trouble. We can
> handle it for ActionController easily enough by checking the
> connections back in after the request has been dispatched.

Agreed. Implicit check-in seems bad. Pratik suggested in
#rails-contrib that we might use a Dispatcher hook instead of a
controller filter, which makes sense. Get it out of the way of the
application code so it doesn't get touched inadvertently.

> So my first thought is:
>
> Foo.find(:first) # retrieves connection
> Thread.new do
> Foo.find(:first) # retrieves another
> Bar.find(:first) # uses existing
> ActiveRecord::Base.release_connection
> Foo.find(:first) # retrieves another
> end
>
> With retrieving blocking until the connection is available.

I realized we might also be able to make
ActiveRecord::Base.transaction do its own checkout/checkin of the
connection. There will probably be more opportunities to optimize this
later too. For example, making ActiveRecord::Base.connection accept a
block and start to change places in the framework to use the block
form instead of the return the connection/implicit checkout version. I
haven't looked too closely to see what the potential is there, though.

/Nick

Nick Sieger

unread,
Apr 21, 2008, 1:07:08 PM4/21/08
to rubyonra...@googlegroups.com
On Sun, Apr 20, 2008 at 10:17 PM, Josh Peek <josh...@gmail.com> wrote:
>
> Fucking Awesome!
>
> Now what the hell am I going to work on for GSoC? O yeah, AP is still
> full of cruft.

LOL, thanks! Heh, sorry to steal your thunder. I'm still very much in
favor of your proposal, and the rest of the guys on the JRuby team are
too, as it's a big deal for JRuby w/ non-GIL native threading. There's
still a lot to do, and this work in AR is actually pretty small in
comparison.

Cheers,
/Nick

Josh Peek

unread,
Apr 21, 2008, 7:27:01 PM4/21/08
to Ruby on Rails: Core
You're doing a better job than I would of. You know what you want from
the connection pool to do the JRuby voodoo you were talking about.
That makes you more qualified ;)

On Apr 21, 12:07 pm, "Nick Sieger" <nicksie...@gmail.com> wrote:

Tom Ward

unread,
Apr 22, 2008, 3:24:39 AM4/22/08
to rubyonra...@googlegroups.com

You might want to look at this very old patch, which includes another
(old) adapter pool implementation and some tests:

http://dev.rubyonrails.org/attachment/ticket/2162/connection_pool_test.rb

It used the approach of acting as an actual adapter, using
method_missing to handle checkin/checkout before delegating to the
underlying connection. The disadvantage of this is clearly that the
overhead of checkin/checkout occurs for every transaction. The
advantage though is that the rest of the application need not be aware
of connection management at all.

Tom

Reply all
Reply to author
Forward
0 new messages