Logging instead of connection.queries

6 views
Skip to first unread message

Ivan Sagalaev

unread,
Aug 3, 2009, 7:48:25 AM8/3/09
to django-d...@googlegroups.com
Hello!

A couple of days ago a sudden thought has struck me while thinking on
the matter of logging in Python libraries. I consider a good practice
for a library to log what it's doing into a named logger without setting
up logging by itself. It's then a responsibility of an application that
would use the library so set up logging as it see fit.

In this vein I propose to teach CursorDebugWrapper to log SQL queries
into a named logger instead of storing them into connection.queries.
Before writing a patch I want to discuss it.

This would look like this:

import logging
log = logging.getLogger('django.sql')

class CursorDebugWrapper(object):
def execute(self, sql, params=()):
# ....
log.debug('%.3f %s' % ((stop - start), sql))

Or actually this will allow to eliminate CursorDebugWrapper altogether,
do the logging in the standard cursor and control it by setting a
logging level on a handler in this fashion:

import logging
logging.basicConfig(
filename = '...',
level = settings.DEBUG and logging.DEBUG or logging.WARNING,
)

Some advantages of this approach are:

- Logs can be directed to several handlers. Tools like
django-debug-toolbar and firebug can have their handlers to catch SQL on
the fly.
- Logging happen in real-time. Currently most debugging tools analyze
connection.queries after a request has finished.
- Several logs can be combined into one handler. This helps to see what
code triggers SQL execution.
- We can log queries also before execution. This helps catching
killer-queries that take forever to execute.
- Such logging will eliminate a typical newbie question "Why Django
leaks memory" when people don't know that SQL is stored in memory in
DEBUG mode.

For backward compatibility there may be a custom logging handler that
would maintain connection.queries as is. But the user will have to
explicitly switch it on.

Russell Keith-Magee

unread,
Aug 3, 2009, 10:02:05 AM8/3/09
to django-d...@googlegroups.com

For the record, there is one notable use for the legacy behavior -
debugging at the console. i.e., issue a query at the Python prompt,
then print connection.queries[-1] to see what was executed. This is a
small thing, but it is a feature that I have used myself on more than
one occasion. If we're going to abandon or deprecate
connection.queries, we need to have a substitute for this use case.

However, more broadly, my concern with this proposal is the extent to
which it overlaps with logs already provided by the actual database.
Most databases have extremely configurable logging and analysis tools
- if Django started moving into this territory for logging queries, we
would be duplicating functionality pretty much from day 1. To pick
just one example: slow query analysers are a feature offered by most
serious databases, and provide much more detail than a single line in
a log file.

I'm not saying that what you are proposing this isn't worth doing -
just that we need to be careful that we're not duplicating effort on
tools that are already available (and better) elsewhere. The "memory
leak" issue, for example, is definitely something that we should
address. Ensuring that django-debug-toolbar (and others) have access
to adequate internal data is another admirable goal. Is logging the
way to fix these problems? I'm not necessarily convinced. I suspect
more discussion is required.

On a vaguely related topic, I believe that there is a big hole in the
Django documentation regarding logging - yes, you should be using the
Python standard library, but there is room for some solid
Django-specific Howto documentation on using logging in your Django
project.

Yours,
Russ Magee %-)

Yuri Baburov

unread,
Aug 3, 2009, 5:26:24 PM8/3/09
to django-d...@googlegroups.com
Hi Russell,

On Mon, Aug 3, 2009 at 9:02 PM, Russell
Keith-Magee<freakb...@gmail.com> wrote:
>
> On Mon, Aug 3, 2009 at 7:48 PM, Ivan Sagalaev<man...@softwaremaniacs.org> wrote:
>> In this vein I propose to teach CursorDebugWrapper to log SQL queries
>> into a named logger instead of storing them into connection.queries.
>> Before writing a patch I want to discuss it.

>> For backward compatibility there may be a custom logging handler that
>> would maintain connection.queries as is. But the user will have to
>> explicitly switch it on.
>
> For the record, there is one notable use for the legacy behavior -
> debugging at the console. i.e., issue a query at the Python prompt,
> then print connection.queries[-1] to see what was executed. This is a
> small thing, but it is a feature that I have used myself on more than
> one occasion. If we're going to abandon or deprecate
> connection.queries, we need to have a substitute for this use case.

Actually, for me, connection.queries caused more harm than good.

Django is lacking simple logger and logging client, that's why
developers are using connection.queries now as a replacement, not vice
versa as you're talking.
95% of cases, all I need is just to be able to see few last queries
from the dev server in console, in the same way as I see server
requests. I also want to be able to set up some builtin smtp server
for the same reason if i have no real server on my dev box. And I
believe these should be builtin features, since nearly all users will
use them.
tail -f queries.log would be ok, if someone logged it down for me already.

And filling all PC memory with connection.queries and
crashing/swapping when you do mass insert of few million records from
django shell into database is awful thing that needs to be dealt with
somehow!

> However, more broadly, my concern with this proposal is the extent to
> which it overlaps with logs already provided by the actual database.
> Most databases have extremely configurable logging and analysis tools
> - if Django started moving into this territory for logging queries, we
> would be duplicating functionality pretty much from day 1. To pick
> just one example: slow query analysers are a feature offered by most
> serious databases, and provide much more detail than a single line in
> a log file.

1) On shared hosting, you are not able to alter database
configuration, that means,
this option is not even possible.
2) On real world projects, you want different projects to use
different database logging levels, while having single database.
What I'm saying, is that despite the fact you are able to set up
logging on the database level, this is very rarely used, since not
always can be used, hard in customization and not always can be
customized to the level you need.

> I'm not saying that what you are proposing this isn't worth doing -
> just that we need to be careful that we're not duplicating effort on
> tools that are already available (and better) elsewhere. The "memory
> leak" issue, for example, is definitely something that we should
> address. Ensuring that django-debug-toolbar (and others) have access
> to adequate internal data is another admirable goal. Is logging the
> way to fix these problems? I'm not necessarily convinced. I suspect
> more discussion is required.

I see it in the way that django-debug-toolbar should set up a hook if
it needs some queries data.
connection.queries looks like a hack and can be easily eliminated and
substituted without losing any features. It's reliance on
settings.DEBUG option is also a bit weird. There's no option
QUERY_DEBUG, it's being set up in static way, and what we are supposed
to do with this option after django multidb integration?

--
Best regards, Yuri V. Baburov, ICQ# 99934676, Skype: yuri.baburov,
MSN: bu...@live.com

Malcolm Tredinnick

unread,
Aug 3, 2009, 6:53:42 PM8/3/09
to django-d...@googlegroups.com
On Mon, 2009-08-03 at 15:48 +0400, Ivan Sagalaev wrote:
> Hello!
>
> A couple of days ago a sudden thought has struck me while thinking on
> the matter of logging in Python libraries. I consider a good practice
> for a library to log what it's doing into a named logger without setting
> up logging by itself. It's then a responsibility of an application that
> would use the library so set up logging as it see fit.

Adrian, in particular, has been historically against adding logger
module hooks in Django. So you have to work around that.

>
> In this vein I propose to teach CursorDebugWrapper to log SQL queries
> into a named logger instead of storing them into connection.queries.
> Before writing a patch I want to discuss it.

[... snip discussion of how Python's logger module works...]

I have a patch I've been holding over for 1.2 that reduces the current
debug wrapper's log method to a ring buffer to avoid the accidental
massive memory usage case. The number of entries in that ring buffer is
configurable. Adding a hook to make it use a logger instead is about
three lines of code, I suspect (and one or two lines for people who want
to use it that way).

I'll send you what I have later on today, Ivan, and you can have a look.
Then we can polish it and see if it looks appropriate for 1.2.

Regards,
Malcolm

Ivan Sagalaev

unread,
Aug 4, 2009, 3:36:26 AM8/4/09
to django-d...@googlegroups.com
Russell Keith-Magee wrote:
> For the record, there is one notable use for the legacy behavior -
> debugging at the console. i.e., issue a query at the Python prompt,
> then print connection.queries[-1] to see what was executed.

Yes, and we can keep it. As I said a simple custom log handler can do this:

class QueriesHandler(logging.Handler):
def emit(self, record):
time, sql = record.split(' ', 1)
connection.queries.add({
'time': time,
'sql': sql,
})

Then we can invent a structure in settings to easily define handlers:

LOGGING_HANDLERS = [
('django.sql', 'django.logging.QueriesHandler'),
]

Or something more configurable with levels, params, etc. But these are
details... From this point my proposal is more like "let's make internal
implementation of SQL logging configurable".

> Most databases have extremely configurable logging and analysis tools
> - if Django started moving into this territory for logging queries, we
> would be duplicating functionality pretty much from day 1.

Apart from what Yuri said, there's another thing you cannot do with DB
logs: you cannot have it in one file with other logs from your Python code.

(Mostly agreed with your other notes.)

Ivan Sagalaev

unread,
Aug 4, 2009, 3:40:13 AM8/4/09
to django-d...@googlegroups.com
Malcolm Tredinnick wrote:
> Adrian, in particular, has been historically against adding logger
> module hooks in Django. So you have to work around that.

Interesting... Adrian, can you elaborate?

> I have a patch I've been holding over for 1.2 that reduces the current
> debug wrapper's log method to a ring buffer to avoid the accidental
> massive memory usage case.

Actually this is not the main issue... I saw the discussion about
limiting the buffer. My main concern is that logging is way more
flexible there. Having ability to watch queries before execution or log
failed queries in non-debug mode -- things like that is what I really
pursue here.

Ales Zoulek

unread,
Aug 4, 2009, 4:58:20 AM8/4/09
to django-d...@googlegroups.com


> I have a patch I've been holding over for 1.2 that reduces the current
> debug wrapper's log method to a ring buffer to avoid the accidental
> massive memory usage case.

Actually this is not the main issue... I saw the discussion about
limiting the buffer. My main concern is that logging is way more
flexible there. Having ability to watch queries before execution or log
failed queries in non-debug mode -- things like that is what I really
pursue here.


Plus having ability to combine SQL logs with the application-level logs. It would be great to have them both in the same timeline in order of execution. (That could ease tracking down some SQLs called "magicaly" somewhere outside the views eg. in template rendering.)

------------------------------------------------------
Ales Zoulek
+420 604 332 515
Jabber: ales....@gmail.com
------------------------------------------------------

George Vilches

unread,
Aug 4, 2009, 9:22:41 AM8/4/09
to django-d...@googlegroups.com

On Aug 3, 2009, at 6:53 PM, Malcolm Tredinnick wrote:


On Mon, 2009-08-03 at 15:48 +0400, Ivan Sagalaev wrote:
Hello!

A couple of days ago a sudden thought has struck me while thinking on
the matter of logging in Python libraries. I consider a good practice
for a library to log what it's doing into a named logger without setting
up logging by itself. It's then a responsibility of an application that
would use the library so set up logging as it see fit.

Adrian, in particular, has been historically against adding logger
module hooks in Django. So you have to work around that.

I'd like to point at a ye olde ticket that Adrian did support for exactly this purpose: http://code.djangoproject.com/ticket/5415

Putting signals on a replacement CursorWrapper would give the same functionality from a user standpoint (the ticket outright says "This will enable all sorts of interesting and useful things, such as logging and debugging functionality"), and already has some blessing.  The patch will have to be brought up to date, but I did address all of Malcolm's concerns at the time with the most recent version of the patch, and the worries about signal performance in #4561 have since been resolved.  

What say you, Malcolm? :)

Malcolm Tredinnick

unread,
Aug 6, 2009, 2:24:16 AM8/6/09
to django-d...@googlegroups.com
On Tue, 2009-08-04 at 09:22 -0400, George Vilches wrote:
[...]

>
> Putting signals on a replacement CursorWrapper would give the same
> functionality from a user standpoint (the ticket outright says "This
> will enable all sorts of interesting and useful things, such as
> logging and debugging functionality"),

It's like he's trying to sell you a used car. Further interrogation
revealed that he wants one single feature, but dressed up this stepping
stone as good for all mankind. You have to watch out for people like
that. They're sneaky. :)

> and already has some blessing. The patch will have to be brought up
> to date, but I did address all of Malcolm's concerns at the time with
> the most recent version of the patch, and the worries about signal
> performance in #4561 have since been resolved.
>
>
> What say you, Malcolm? :)

I agree that things have moved on since the original ticket was opened
and your patch looks quite decent and worth working out how to get it
in. I'm on the record as agreeing with having a way to implement
assertNumQueries() in the test framework.

Unfortunately, this ticket doesn't solve Ivan's problem of the moment,
as far as I can see.

I can see after we create the 1.1.X, I'm going to be spending a bunch of
time reading and applying patches with G. Vilches as the author.

Regards,
Malcolm

Ivan Sagalaev

unread,
Aug 6, 2009, 4:22:50 PM8/6/09
to django-d...@googlegroups.com
Malcolm Tredinnick wrote:
> Unfortunately, this ticket doesn't solve Ivan's problem of the moment,
> as far as I can see.

Yeah, it sends signals in addition to maintaining connection.queries. My
point (or one of them) is to replace current hard-coded implementation
with something lower-level and less intrusive and then reimplement
connection.queries on top of it. And the disable it by default.

I'm happy with signals by the way. This is consistent with the way
Django reports unhandled exception in core.handlers.base.

Ivan Sagalaev

unread,
Aug 13, 2009, 6:33:46 PM8/13/09
to django-d...@googlegroups.com
Hello!

I'd like to revive the discussion about handling debug SQL log since I
don't feel we've come to any conclusion. To summarize:

- Ticket [5415] deals with implementing signals upon SQL queries. Nobody
had any objections to it and it helps in solving my original problem of
logging.

- Malcolm has a patch that limits connection.queries to some finite size
which will solve perceived "memory leak" problem for newbies. After
additional thinking I decided to refrain my proposal of disabling
populating connection.queries by default since it wouldn't pose
immediate problems in this case.

If I haven't missed anything are we agree then that those two changes
should be in 1.2 on principal?

If yes, I have another little thing to discuss about ticket 5415
implementation. It introduces another cursor wrapper (apart from
CursorDebugWrapper) that actually sends signals. I think we could just
put this code into the Cursor itself and get rid of the wrapper. It
makes sense to have signals for both debug and production modes and the
common knowledge is that signals are cheap enough. Right?

[5415]: http://code.djangoproject.com/ticket/5415

Reply all
Reply to author
Forward
0 new messages