Notification on new database connections (ticket #6064)

Matthew D. Hancher

unread,

Nov 16, 2008, 8:14:21 PM11/16/08

to django-developers digest subscribers

Hi all,

I've been playing with adding sqlite3 back-end support to GeoDjango,
using the SpatiaLite extension. This requires executing some magic
SQL each time you connect to the database, to enable the spatial
extensions. Ticket #6064 seems like this right way to do this, by
causing the database to send a signal each time a new connection is
opened, which GeoDjango can catch. This works like a charm, and I
just uploaded an updated patch to that ticket, since it hadn't been
touched for a year and had gotten quite stale.

I'm curious if anyone thinks there's a better way to do this (i.e.
cause magic SQL to be executed on each database connection). I'm also
curious whether the test in my patch passes for other back-ends, like
oracle and whatnot.

Finally, it occurred to me that a signal on *cursor* creation could
also be useful, and could easily be added at the same time. (In fact
in the old patch the connection_created signal actually erroneously
behaved like a cursor_created signal for some back-ends.)

Thoughts?

Matt

Matt Hancher
Intelligent Systems Division
NASA Ames Research Center
Matthew....@nasa.gov

Malcolm Tredinnick

unread,

Nov 16, 2008, 8:55:11 PM11/16/08

to django-d...@googlegroups.com

On Sun, 2008-11-16 at 17:14 -0800, Matthew D. Hancher wrote:
> Hi all,
>
> I've been playing with adding sqlite3 back-end support to GeoDjango,
> using the SpatiaLite extension. This requires executing some magic
> SQL each time you connect to the database, to enable the spatial
> extensions. Ticket #6064 seems like this right way to do this, by
> causing the database to send a signal each time a new connection is
> opened, which GeoDjango can catch. This works like a charm, and I
> just uploaded an updated patch to that ticket, since it hadn't been
> touched for a year and had gotten quite stale.

Modulo my question about signal overhead below, this seems like a decent
approach. It's kind of the reason for signals existing.

A random thought: is there any other information worth sending along
with the signal? Right now, the receiver is told "a connection was
created". Anything that's likely to vary and that could be useful as a
trigger for other actions? I can't immediately think of anything, but
I'll throw it out there in case I've overlooked something.

> I'm curious if anyone thinks there's a better way to do this (i.e.
> cause magic SQL to be executed on each database connection). I'm also
> curious whether the test in my patch passes for other back-ends, like
> oracle and whatnot.
>
> Finally, it occurred to me that a signal on *cursor* creation could
> also be useful, and could easily be added at the same time. (In fact
> in the old patch the connection_created signal actually erroneously
> behaved like a cursor_created signal for some back-ends.)

I'm not up to speed these days on the overhead for signal emission. We
create a lot of database connections, for better or worse (as you no
doubt realise, our connection management strategy is "one per request").
Is the impact noticeable for those doing nothing with the signals? I
suspect it's not hard to test, but I'm going to be lazy and not do it
myself (I suspect my laptop isn't quite production-quality hardware in
any case).

I'm generally in favour of the idea, though. Looks like a reasonable use
for signals and your use-case seems like a typical situation that will
need this.

Regards,
Malcolm

alex....@gmail.com

unread,

Nov 16, 2008, 9:11:47 PM11/16/08

to Django developers

In terms of signal overhead, the most likely case here is probably no
receivers, and in the signal refactor, that particular case is now
hugely faster.

On Nov 16, 8:55 pm, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:

Malcolm Tredinnick

unread,

Nov 16, 2008, 9:20:54 PM11/16/08

to django-d...@googlegroups.com

On Sun, 2008-11-16 at 18:11 -0800, alex....@gmail.com wrote:
> In terms of signal overhead, the most likely case here is probably no
> receivers, and in the signal refactor, that particular case is now
> hugely faster.

"Hugely" being, of course, a highly scientific measurement lending
itself to accurate comparison against future changes. :-)

Yes, I know it's faster now and I realise that's the common case. But
what's the impact of adding one or two new signal calls per request. I
suspect it's minimal, but I haven't measured it and this is the first
time it's come up since the refactor, so it's probably time to know what
we're agreeing to for each new signal.

Regards,
Malcolm

Russell Keith-Magee

unread,

Nov 16, 2008, 9:49:09 PM11/16/08

to django-d...@googlegroups.com

This sort of benchmarking would be helpful to have kicking around. One
of the features on the table for v1.1 is to add signals for activity
on m2m operations. The biggest impediment to introducing these signals
is the overhead (or, at least, the perception that there will be an
overhead) associated with adding a signal for a common operation.

I would like to see m2m signals introduced (overhead permitting), so
I'll probably have to work up a set of benchmarks at some point so
that we can have the debate over some concrete numbers rather than
perceptions and speculation. If someone else were to save me the
effort of having to write these benchmarks myself, I would buy them a
lollipop* :-)

Russ Magee %-)

* Lollipop offer valid while stocks last. ISO standard size and
flavor. Offer void where prohibited.

alex....@gmail.com

unread,

Nov 16, 2008, 10:21:07 PM11/16/08

to Django developers

Well, Jeremy Dunck was kind enough to do a benchmark against the old
system: http://groups.google.com/group/django-developers/browse_thread/thread/815f76ad7e823cbf/b008a757fbdefa2b

On Nov 16, 9:49 pm, "Russell Keith-Magee" <freakboy3...@gmail.com>
wrote:

> On Mon, Nov 17, 2008 at 11:20 AM, Malcolm Tredinnick
>
>
>

> <malc...@pointy-stick.com> wrote:

Malcolm Tredinnick

unread,

Nov 16, 2008, 10:53:02 PM11/16/08

to django-d...@googlegroups.com

On Sun, 2008-11-16 at 19:21 -0800, alex....@gmail.com wrote:
> Well, Jeremy Dunck was kind enough to do a benchmark against the old
> system: http://groups.google.com/group/django-developers/browse_thread/thread/815f76ad7e823cbf/b008a757fbdefa2b

Since we're no longer using the old system, that doesn't tell us how
much slower adding a new signal to the new system is. You seem to be
missing the point: It's not a huge deal, but we might as well work out
this impact now since, as Russell and I have both noted, it's the first
time we're looking at adding a new signal and there are other cases on
the table (I'm also in favour of the many-to-many case, for example).

Malcolm

Russell Keith-Magee

unread,

Nov 16, 2008, 11:03:44 PM11/16/08

to django-d...@googlegroups.com

On Mon, Nov 17, 2008 at 12:21 PM, alex....@gmail.com
<alex....@gmail.com> wrote:
>
> Well, Jeremy Dunck was kind enough to do a benchmark against the old
> system: http://groups.google.com/group/django-developers/browse_thread/thread/815f76ad7e823cbf/b008a757fbdefa2b

I knew about these. While these are great for establishing the
relative improvement offered by the new signals, they don't really
address the question of absolute performance. For an signal with no
listeners, the new signals may be 67% faster, but faster than what?

I know this is a horribly nebulous question (like all benchmarking),
and it's completely dependent on the speed of your machine and a
million other factors. However, if we are going to start adding
signals to very common operations (like m2m and opening connections),
we need to know what sort of overhead we are adding in absolute terms.

Yours,
Russ Magee %-)

alex....@gmail.com

unread,

Nov 16, 2008, 11:10:54 PM11/16/08

to Django developers

Right, I understand the need for absolute numbers, I just meant that
we don't need to start rewriting new benchmarks if Jeremy still has
the benchmark script.

Alex

On Nov 16, 11:03 pm, "Russell Keith-Magee" <freakboy3...@gmail.com>
wrote:
> On Mon, Nov 17, 2008 at 12:21 PM, alex.gay...@gmail.com

>
> <alex.gay...@gmail.com> wrote:
>
> > Well, Jeremy Dunck was kind enough to do a benchmark against the old

> > system:http://groups.google.com/group/django-developers/browse_thread/thread...

Matthew D. Hancher

unread,

Nov 17, 2008, 3:01:01 PM11/17/08

to django-developers digest subscribers

On Nov 16, 11:03 pm, "Russell Keith-Magee" <freakboy3...@gmail.com> wrote:

I know this is a horribly nebulous question (like all benchmarking),

and it's completely dependent on the speed of your machine and a

million other factors. However, if we are going to start adding

signals to very common operations (like m2m and opening connections),

we need to know what sort of overhead we are adding in absolute terms.

Okay, I decided to do a bit of profiling to keep the conversation moving. None of this is particularly rigorous stuff, but I figure it's a start.

(The uncertainties shown below are the standard deviations across repeated runs divided by the square root of the number of runs, with ten-percentile thresholds for outlier rejection, all of which makes for a rough but useful measure. This was all on my development machine, not a pristine test environment (which I don't have), so lord knows what external factors may have been involved.)

I began by examining the impact of adding the connection_created signal in a toy end-to-end test, with a local client requesting a reasonably normal page from a simple blog app, using sqlite3 and the development server. The signal listeners here were pretty simple, just incrementing a connection counter. Here is what I saw.

  Page request performance, in seconds per 100 queries:
  trunk (no listeners): 2.624 +/- 0.009
  6064 (no listeners):   2.624 +/- 0.005
  6064 (1 listener):     2.624 +/- 0.006
  6064 (100 listeners): 2.680 +/- 0.005

The differences (or lack thereof) between the first three timings here are not statistically significant, but it is clear that by the time I get to 100 listeners I'm seeing a real performance hit. In order to better measure the impact of adding the signal even when no listeners are present, I zoomed in on the database connection/disconnection cycle itself (again using sqlite3 with a database on disk).

  Database connect/close performance, in seconds per 10000 cycles:
  trunk (no listeners): 1.5664 +/- 0.0010
  6064 (no listeners):   1.5895 +/- 0.0011
  6064 (1 listener):     1.7115 +/- 0.0015
  6064 (100 listeners): 4.5087 +/- 0.0036

So, adding the connection_created signal had a roughly 1% impact on the speed of the connection/disconnection cycle, and adding a fairly simple listener to that signal resulted in a total of a 9% impact.

Now we can put the pieces together to get a little bit of perspective. Evidently the database connection/disconnection cycle accounts for roughly 0.6% of the processing time for the query in question (on this computer, with this configuration). We can use this to estimate the impact on the page request performance.

Estimated impact of 6064 on page request performance:
6064 (no listeners): 0.009%

6064 (1 listener): 0.055%
6064 (100 listeners): 1.12% (measured 2.13% +/- 0.54%)

There's a fair amount of hand-waving involved in those estimates, but by looking at the 100-listener case (where the impact on page requests was measurable) we can see that the estimate lands us within a factor of two of the measured value. The difference could be explained in any number of ways, but the upshot is that a true impact of 0.02% or more on page requests for the no-listener case would be very plausible.

Matt

alex....@gmail.com

unread,

Nov 17, 2008, 3:27:08 PM11/17/08

to Django developers

Matthew, would you mind sticking the script you used to test this up
on dpaste?

Alex

On Nov 17, 3:01 pm, "Matthew D. Hancher" <Matthew.D.Hanc...@nasa.gov>
wrote:

> Matthew.D.Hanc...@nasa.gov

Jacob Kaplan-Moss

unread,

Nov 17, 2008, 3:37:41 PM11/17/08

to django-d...@googlegroups.com

On Mon, Nov 17, 2008 at 2:01 PM, Matthew D. Hancher
<Matthew....@nasa.gov> wrote:
> Okay, I decided to do a bit of profiling to keep the conversation moving.

I did too; I took a stab at measure the raw speed of calling signals.
My code's at http://gist.github.com/25892; the output looks like:

Nothing : 0.00283 (0.000000028 percall)
Plain func call : 0.07054 (0.000000705 percall)
Signal; 0 handlers: 0.09331 (0.000000933 percall)
Signal; 1 handler : 0.79125 (0.000007912 percall)
Signal; 10 handlers: 3.36051 (0.000033605 percall)
Signal; 100 handlers: 27.56269 (0.000275627 percall)

The raw numbers are pretty much useless (run on a machine while doing
about seventy other things), but you can see that:

* In Python, calling a function (``handle()``) takes 25 times as long
as doing nothing (``pass``).

* Dispatching a signal when there's no handlers is about 1.3 times a
function call.

* Dispatching a signal when there's one handler is about 11 times the
cost of a function call.

* Addition of additional receivers scales linearly.

This is about what I'd expected: calling a un-handled signal is
extremely cheap -- 1.3x the function overhead is as close to free as
you can get. The first listener is expensive; remaining ones cost O(N)
time.

Given that, I'm generally going to be -1 on adding any non-essential
signal to Django that's *connected by default* -- the overhead is too
much, so internal uses should just use plain old function calls.
However, I don't see that it's *too* bad to add a signal (like
connection-created) with no listeners... but we should be careful in
the documentation to clearly explain the substantial overhead
involved.

Jacob

Matthew D. Hancher

unread,

Nov 17, 2008, 6:15:14 PM11/17/08

to django-developers digest subscribers

alex....@gmail.com wrote:
> Matthew, would you mind sticking the script you used to test this up
> on dpaste?

I'd love to, but it wasn't really a script per se, so much as a hodge-
podge that involved twiddling the server, restarting it, running some
tests, changing the server config again, and so forth. If I get a
moment to tidy it up into something dpasteable I'll do so.

Jacob Kaplan-Moss wrote:
> <Matthew....@nasa.gov> wrote:
>> Okay, I decided to do a bit of profiling to keep the conversation
>> moving.
> I did too; I took a stab at measure the raw speed of calling signals.

Sweet. To the limited extent that our tests are comparable, they
appear to be in rough agreement. For example, they both show a signal
with a single trivial listener costing about nine times as much as a
signal with no listeners.

Jacob Kaplan-Moss wrote:
> Given that, I'm generally going to be -1 on adding any non-essential
> signal to Django that's *connected by default* -- the overhead is too
> much, so internal uses should just use plain old function calls.
> However, I don't see that it's *too* bad to add a signal (like
> connection-created) with no listeners... but we should be careful in
> the documentation to clearly explain the substantial overhead
> involved.

Okay. Given all this, how do people feel about a connection_created
signal? What about a cursor_created signal, either instead or in
addition? (I have no use case for that, but if for some reason people
prefer it to connection_created it will still be sufficient to solve
my immediate problem.)

Malcolm Tredinnick wrote:
> A random thought: is there any other information worth sending along
> with the signal? Right now, the receiver is told "a connection was
> created". Anything that's likely to vary and that could be useful as a
> trigger for other actions?

I was thinking about this, too. Right now the most important thing is
the type of database connection being created, which you can determine
from the sender, and which you can determine from settings anyway.
However, the big question in my mind is how all of this relates to the
multiple-database support that folks seem to be working on. Does
anyone from that camp want to chime in?

Matt

Matt Hancher
Intelligent Systems Division
NASA Ames Research Center

Matthew....@nasa.gov

alex....@gmail.com

unread,

Nov 17, 2008, 7:23:32 PM11/17/08

to Django developers

As long as you get the connection itself, i would think that would be
sufficient, obviously the receiver should know which connection it is
in the multi-db scenario, as it won't be able to just infer that based
on the settings.

On Nov 17, 6:15 pm, "Matthew D. Hancher" <Matthew.D.Hanc...@nasa.gov>
wrote:

> alex.gay...@gmail.com wrote:
> > Matthew, would you mind sticking the script you used to test this up
> > on dpaste?
>
> I'd love to, but it wasn't really a script per se, so much as a hodge-
> podge that involved twiddling the server, restarting it, running some
> tests, changing the server config again, and so forth. If I get a
> moment to tidy it up into something dpasteable I'll do so.
>
> Jacob Kaplan-Moss wrote:

> Matthew.D.Hanc...@nasa.gov

Malcolm Tredinnick

unread,

Nov 18, 2008, 8:24:30 PM11/18/08

to django-d...@googlegroups.com

Dude, how broken is you mail client? It's attaching this thread
continuation
On Mon, 2008-11-17 at 15:15 -0800, Matthew D. Hancher wrote:
[...]

> Okay. Given all this, how do people feel about a connection_created
> signal? What about a cursor_created signal, either instead or in
> addition? (I have no use case for that, but if for some reason people
> prefer it to connection_created it will still be sufficient to solve
> my immediate problem.)

I'm in favour of connection_created. Since we don't have any compelling
use-case for it, I'm not in favour of cursor_created. There's stuff you
need to do when connecting to the database, so connection_created is
indeed useful. But until there's really a good idea of things that need
to be done when a new cursor is made, let's leave it out. We have a
fairly consistent policy of not including things just because we can.

> Malcolm Tredinnick wrote:
> > A random thought: is there any other information worth sending along
> > with the signal? Right now, the receiver is told "a connection was
> > created". Anything that's likely to vary and that could be useful as a
> > trigger for other actions?
>
> I was thinking about this, too. Right now the most important thing is
> the type of database connection being created, which you can determine
> from the sender, and which you can determine from settings anyway.
> However, the big question in my mind is how all of this relates to the
> multiple-database support that folks seem to be working on. Does
> anyone from that camp want to chime in?

I was contemplating this a bit more in the interim and realised the
multi-db stuff will probably want to send through the name (or
identifier -- whatever that means. I've been playing with a few ideas
and what the ident is varies from thought to thought) when the
connection is made.

However, I also realised my question was a bit silly. We've set things
up (by requiring **kwargs in the signal receiving functions) precisely
so that we can add parameters later. This doesn't need to be resolved
now, because it's not going to cause any compatibility issues. I
withdraw even the random thought; it's really irrelevant to this
situation. We can punt this until it becomes a requirement.

Regards,
Malcolm

Reply all

Reply to author

Forward