I've been playing with adding sqlite3 back-end support to GeoDjango,
using the SpatiaLite extension. This requires executing some magic
SQL each time you connect to the database, to enable the spatial
extensions. Ticket #6064 seems like the right way to do this, by
causing the database to send a signal each time a new connection is
opened, which GeoDjango can catch. This works like a charm, and I
just uploaded an updated patch to that ticket, since it hadn't been
touched for a year and had gotten quite stale.
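
For anyone following along without the patch open, the idea can be
sketched without Django at all. Everything here is a hypothetical
stand-in rather than the code in the ticket: the tiny Signal class,
the open_connection() helper and the enable_extensions() receiver are
made up for illustration, and the PRAGMA stands in for the real
SpatiaLite setup SQL.

```python
import sqlite3


class Signal:
    """Tiny stand-in for a signal dispatcher (hypothetical, not Django's)."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def send(self, sender, **kwargs):
        # Call every receiver and collect (receiver, result) pairs.
        return [(r, r(sender=sender, **kwargs)) for r in self.receivers]


connection_created = Signal()


def open_connection(path=":memory:"):
    """Open a database and announce the new connection to any listeners."""
    conn = sqlite3.connect(path)
    connection_created.send(sender=sqlite3, connection=conn)
    return conn


def enable_extensions(sender, connection, **kwargs):
    # Placeholder for the per-connection setup SQL; a real receiver would
    # enable the spatial extension here instead.
    connection.execute("PRAGMA foreign_keys = ON")


connection_created.connect(enable_extensions)

conn = open_connection()
# The setting was applied by the receiver, not by the caller.
print(conn.execute("PRAGMA foreign_keys").fetchone()[0])
```

The point of the indirection is that the code opening the connection
never needs to know which setup steps, if any, are registered.
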
I'm curious if anyone thinks there's a better way to do this (i.e.
cause magic SQL to be executed on each database connection). I'm also
curious whether the test in my patch passes for other back-ends, like
Oracle and whatnot.
Finally, it occurred to me that a signal on *cursor* creation could
also be useful, and could easily be added at the same time. (In fact
in the old patch the connection_created signal actually erroneously
behaved like a cursor_created signal for some back-ends.)
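
The difference between the two signals is easiest to see with a
wrapper that connects lazily. This is only a sketch under that
assumption: the Wrapper class and the events list are hypothetical,
and it does not claim to reproduce how any particular back-end in the
old patch was written.

```python
import sqlite3

events = []  # records which hypothetical signal fired, in order


class Wrapper:
    """Hypothetical back-end wrapper that connects lazily on first cursor."""

    def __init__(self):
        self.connection = None

    def cursor(self):
        if self.connection is None:
            self.connection = sqlite3.connect(":memory:")
            # Fires once per real connection.
            events.append("connection_created")
        # Fires on every cursor() call. Sending connection_created from
        # this spot would make it behave like cursor_created.
        events.append("cursor_created")
        return self.connection.cursor()


db = Wrapper()
db.cursor()
db.cursor()
print(events)
```

Two cursors on one wrapper record one connection event and two cursor
events, which is the behaviour a connection_created test should pin
down.
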
Thoughts?
Matt
Matt Hancher
Intelligent Systems Division
NASA Ames Research Center
Matthew....@nasa.gov
Modulo my question about signal overhead below, this seems like a decent
approach. It's kind of the reason for signals existing.
A random thought: is there any other information worth sending along
with the signal? Right now, the receiver is told "a connection was
created". Anything that's likely to vary and that could be useful as a
trigger for other actions? I can't immediately think of anything, but
I'll throw it out there in case I've overlooked something.
> I'm curious if anyone thinks there's a better way to do this (i.e.
> cause magic SQL to be executed on each database connection). I'm also
> curious whether the test in my patch passes for other back-ends, like
> Oracle and whatnot.
>
> Finally, it occurred to me that a signal on *cursor* creation could
> also be useful, and could easily be added at the same time. (In fact
> in the old patch the connection_created signal actually erroneously
> behaved like a cursor_created signal for some back-ends.)
I'm not up to speed these days on the overhead for signal emission. We
create a lot of database connections, for better or worse (as you no
doubt realise, our connection management strategy is "one per request").
Is the impact noticeable for those doing nothing with the signals? I
suspect it's not hard to test, but I'm going to be lazy and not do it
myself (I suspect my laptop isn't quite production-quality hardware in
any case).
I'm generally in favour of the idea, though. Looks like a reasonable use
for signals and your use-case seems like a typical situation that will
need this.
Regards,
Malcolm
"Hugely" being, of course, a highly scientific measurement lending
itself to accurate comparison against future changes. :-)
Yes, I know it's faster now and I realise that's the common case. But
what's the impact of adding one or two new signal calls per request? I
suspect it's minimal, but I haven't measured it and this is the first
time it's come up since the refactor, so it's probably time to know what
we're agreeing to for each new signal.
Regards,
Malcolm
This sort of benchmarking would be helpful to have kicking around. One
of the features on the table for v1.1 is to add signals for activity
on m2m operations. The biggest impediment to introducing these signals
is the overhead (or, at least, the perception that there will be an
overhead) associated with adding a signal for a common operation.
I would like to see m2m signals introduced (overhead permitting), so
I'll probably have to work up a set of benchmarks at some point so
that we can have the debate over some concrete numbers rather than
perceptions and speculation. If someone else were to save me the
effort of having to write these benchmarks myself, I would buy them a
lollipop* :-)
Russ Magee %-)
* Lollipop offer valid while stocks last. ISO standard size and
flavor. Offer void where prohibited.
Since we're no longer using the old system, that doesn't tell us how
much slower adding a new signal to the new system is. You seem to be
missing the point: It's not a huge deal, but we might as well work out
this impact now since, as Russell and I have both noted, it's the first
time we're looking at adding a new signal and there are other cases on
the table (I'm also in favour of the many-to-many case, for example).
Malcolm
I knew about these. While these are great for establishing the
relative improvement offered by the new signals, they don't really
address the question of absolute performance. For a signal with no
listeners, the new signals may be 67% faster, but faster than what?
I know this is a horribly nebulous question (like all benchmarking),
and it's completely dependent on the speed of your machine and a
million other factors. However, if we are going to start adding
signals to very common operations (like m2m and opening connections),
we need to know what sort of overhead we are adding in absolute terms.
Yours,
Russ Magee %-)
I did too; I took a stab at measuring the raw speed of calling signals.
My code's at http://gist.github.com/25892; the output looks like:
Nothing              :  0.00283 (0.000000028 percall)
Plain func call      :  0.07054 (0.000000705 percall)
Signal;   0 handlers :  0.09331 (0.000000933 percall)
Signal;   1 handler  :  0.79125 (0.000007912 percall)
Signal;  10 handlers :  3.36051 (0.000033605 percall)
Signal; 100 handlers : 27.56269 (0.000275627 percall)
The raw numbers are pretty much useless (run on a machine while doing
about seventy other things), but you can see that:
* In Python, calling a function (``handle()``) takes 25 times as long
as doing nothing (``pass``).
* Dispatching a signal when there are no handlers costs about 1.3
times as much as a function call.
* Dispatching a signal when there's one handler is about 11 times the
cost of a function call.
* Adding further receivers scales linearly.
This is about what I'd expected: calling an unhandled signal is
extremely cheap -- 1.3x the function overhead is as close to free as
you can get. The first listener is expensive; remaining ones cost O(N)
time.
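
A measurement along these lines can be reproduced with the standard
timeit module. This is a rough sketch, not the code from the gist: the
Signal class is a hypothetical minimal dispatcher, so the absolute
numbers will not match Django's dispatcher and only the shape of the
comparison carries over.

```python
import timeit


class Signal:
    """Minimal dispatcher used only for timing (hypothetical)."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def send(self, sender, **kwargs):
        if not self.receivers:
            return []  # fast path: nothing is listening
        return [(r, r(sender=sender, **kwargs)) for r in self.receivers]


def handle(sender=None, **kwargs):
    pass


def bench(n_handlers, number=100_000):
    """Time `number` sends of a signal with `n_handlers` receivers."""
    signal = Signal()
    for _ in range(n_handlers):
        signal.connect(handle)
    return timeit.timeit(lambda: signal.send(sender=None), number=number)


# Baseline: the cost of calling the handler directly.
baseline = timeit.timeit(handle, number=100_000)
results = {n: bench(n) for n in (0, 1, 10)}

print("Plain func call      : %.5f" % baseline)
for n, total in results.items():
    print("Signal; %3d handlers : %.5f" % (n, total))
```

Dividing each total by the baseline gives ratios comparable to the
ones listed above, which is more portable than the raw timings.
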
Given that, I'm generally going to be -1 on adding any non-essential
signal to Django that's *connected by default* -- the overhead is too
much, so internal uses should just use plain old function calls.
However, I don't see that it's *too* bad to add a signal (like
connection-created) with no listeners... but we should be careful in
the documentation to clearly explain the substantial overhead
involved.
Jacob
I'd love to, but it wasn't really a script per se, so much as a hodge-
podge that involved twiddling the server, restarting it, running some
tests, changing the server config again, and so forth. If I get a
moment to tidy it up into something dpasteable I'll do so.
Jacob Kaplan-Moss wrote:
> <Matthew....@nasa.gov> wrote:
>> Okay, I decided to do a bit of profiling to keep the conversation
>> moving.
> I did too; I took a stab at measuring the raw speed of calling signals.
Sweet. To the limited extent that our tests are comparable, they
appear to be in rough agreement. For example, they both show a signal
with a single trivial listener costing about nine times as much as a
signal with no listeners.
Jacob Kaplan-Moss wrote:
> Given that, I'm generally going to be -1 on adding any non-essential
> signal to Django that's *connected by default* -- the overhead is too
> much, so internal uses should just use plain old function calls.
> However, I don't see that it's *too* bad to add a signal (like
> connection-created) with no listeners... but we should be careful in
> the documentation to clearly explain the substantial overhead
> involved.
Okay. Given all this, how do people feel about a connection_created
signal? What about a cursor_created signal, either instead or in
addition? (I have no use case for that, but if for some reason people
prefer it to connection_created it will still be sufficient to solve
my immediate problem.)
Malcolm Tredinnick wrote:
> A random thought: is there any other information worth sending along
> with the signal? Right now, the receiver is told "a connection was
> created". Anything that's likely to vary and that could be useful as a
> trigger for other actions?
I was thinking about this, too. Right now the most important thing is
the type of database connection being created, which you can determine
from the sender, or from settings anyway.
However, the big question in my mind is how all of this relates to the
multiple-database support that folks seem to be working on. Does
anyone from that camp want to chime in?
Matt
Matt Hancher
Intelligent Systems Division
NASA Ames Research Center
I'm in favour of connection_created. Since we don't have any compelling
use-case for it, I'm not in favour of cursor_created. There's stuff you
need to do when connecting to the database, so connection_created is
indeed useful. But until there's really a good idea of things that need
to be done when a new cursor is made, let's leave it out. We have a
fairly consistent policy of not including things just because we can.
> Malcolm Tredinnick wrote:
> > A random thought: is there any other information worth sending along
> > with the signal? Right now, the receiver is told "a connection was
> > created". Anything that's likely to vary and that could be useful as a
> > trigger for other actions?
>
> I was thinking about this, too. Right now the most important thing is
> the type of database connection being created, which you can determine
> from the sender, or from settings anyway.
> However, the big question in my mind is how all of this relates to the
> multiple-database support that folks seem to be working on. Does
> anyone from that camp want to chime in?
I was contemplating this a bit more in the interim and realised the
multi-db stuff will probably want to send through the name (or
identifier -- whatever that means. I've been playing with a few ideas
and what the ident is varies from thought to thought) when the
connection is made.
However, I also realised my question was a bit silly. We've set things
up (by requiring **kwargs in the signal receiving functions) precisely
so that we can add parameters later. This doesn't need to be resolved
now, because it's not going to cause any compatibility issues. I
withdraw even the random thought; it's really irrelevant to this
situation. We can punt this until it becomes a requirement.
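
The compatibility argument can be shown in a few lines. This is a
minimal sketch with made-up names: the send() helper, both receivers
and the alias argument are hypothetical, standing in for whatever
identifier multi-db support might eventually pass along.

```python
def old_receiver(sender, **kwargs):
    # Written before any extra arguments existed; ignores unknown ones.
    return "setup for %s" % sender


def new_receiver(sender, alias=None, **kwargs):
    # `alias` is a hypothetical later addition, e.g. a multi-db identifier.
    return "setup for %s (%s)" % (sender, alias)


def send(receivers, sender, **named):
    """Call each receiver with the sender plus whatever named args exist."""
    return [r(sender=sender, **named) for r in receivers]


receivers = [old_receiver, new_receiver]

# Today's signal: no extra information.
print(send(receivers, "sqlite3"))

# A later version adds an argument; the old receiver keeps working.
print(send(receivers, "sqlite3", alias="default"))
```

Because every receiver accepts **kwargs, the sender can grow new named
arguments without breaking receivers written against the older
signature.
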
Regards,
Malcolm