Simon Willison wrote:
> 1. We'll be able to de-emphasise the current default "e-mail all
> errors to someone" behaviour, which doesn't scale at all well.
In a recent thread[1] on a similar topic Russell also emphasized that
we should improve the documentation on logging.
> 3. Logging in Django core rather than a 3rd party app will encourage
> reusable applications to log things in a predictable way, standard
> way.
Speaking of a predictable and standard way: I want to be sure that we
don't break existing logging.
I.e. we at Yandex now have many reusable Django apps that don't set up
loggers themselves but just log into named loggers and expect them to
be set up by the project that uses them.
What I gather from your proposal is that you want the same model ("an
app logs, a project sets up") plus a nice declarative syntax in
settings.py instead of the boring creation of handlers, formatters and
loggers. Right?
> - We could replace (or complement) django.connection.queries with a
> log of executed SQL. This would make the answer to the common question
> "how do I see what SQL is being executed" much more obvious.
In the thread that I was referring to[1] we kind of agreed on using a
signal there. Then hooking a logger onto the signal is simple.
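Hooking a logger onto such a signal could look something like this minimal sketch (pure Python; the `connect`/`send_sql_executed` pair is a hypothetical stand-in for a real django.dispatch.Signal, since the actual signal was never specified in the thread):

```python
import logging

logger = logging.getLogger("myproject.sql")

# Hypothetical stand-in for a Django signal: a plain list of receivers.
# In real code this would be a django.dispatch.Signal instance.
_receivers = []

def connect(receiver):
    _receivers.append(receiver)

def send_sql_executed(sql, params):
    for receiver in _receivers:
        receiver(sql=sql, params=params)

def log_sql(sql, params, **kwargs):
    """Receiver that forwards each executed query to a logger."""
    logger.debug("executed: %s params=%r", sql, params)

connect(log_sql)
send_sql_executed("SELECT 1", ())
```

The point is that the signal only reports the event; whether and where it gets logged stays entirely in the receiver's hands.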
> 5. People might leave logging on, then find their server disk has
> filled up with log files and caused their site to crash.
We had this problem with standard logging. We then switched to a
RotatingFileHandler, which wasn't very good either, because its
behavior is simplistic and isn't controllable by admins through an
interface they know, namely logrotate. Setting up logrotate wasn't
without problems either: when it rotates a file it has to let the app
know, and it uses SIGHUP for that. However, that instantly terminates
Django's flup-based FastCGI server, which we use.
Now we've settled on a WatchedFileHandler ported from the Python 2.6
logging module. It watches for file descriptor changes and doesn't
require SIGHUP to pick up a new file. Maybe we should port it to Django
and use it as the default handler for logging to the file system.
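For illustration, here is roughly how the behaviour described above plays out with the stdlib WatchedFileHandler (Python 2.6+); the rename stands in for what logrotate does:

```python
import logging
import logging.handlers
import os
import tempfile

# WatchedFileHandler notices that the file it writes to has been moved
# (as logrotate does) and reopens it, so no SIGHUP is needed.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "app.log")

logger = logging.getLogger("watched-demo")
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.WatchedFileHandler(path))

logger.info("before rotation")
os.rename(path, path + ".1")   # simulate logrotate moving the file
logger.info("after rotation")  # handler reopens a fresh app.log
```

After this runs, the rotated file holds the first message and a freshly created app.log holds the second, with no signal handling involved.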
> 6. Logging levels are confusing - what exactly is the difference
> between warn, info, error, debug, critical and fatal? We would need to
> document this and make decisions on which ones get used for what
> within the framework.
Maybe we can just leave it to users to decide. It depends so much on
how much an app actually wants from logging.
The only standard thing I can think of is to have DEBUG = True imply
level = logging.DEBUG (which includes everything more severe). DEBUG =
False would then imply logging.INFO. What do you think?
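A tiny sketch of the proposed convention (the mapping itself is the proposal, not existing Django behaviour):

```python
import logging

def default_level(debug):
    """Map Django's DEBUG setting to a default logging level
    (proposed convention, not existing behaviour)."""
    return logging.DEBUG if debug else logging.INFO
```

With DEBUG = True everything from logging.DEBUG up would be emitted; with DEBUG = False, debug() calls are dropped while info() and above still get through.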
> # If you want custom log handlers - not sure how these would interact
> with
> # channels and log levels yet
> LOG_HANDLERS = (
> 'django.core.log.handlers.LogToDatabase',
> 'django.core.log.handlers.LogToEmail',
> )
This is a hard problem, really. Most handlers require a different set
of arguments: file names, email credentials, system idents for SysLog,
etc. There should also be different formatters. For example, there's no
point wasting space in a syslog message on a timestamp, since syslog
tracks that itself...
[1]: http://groups.google.com/group/django-developers/browse_frm/thread/9d0992e800cf7d68#
No disagreement here with any of these assertions.
In the absence of specifics, this makes me a little bit nervous. The
Python logging interface may be very Java-heavy and complex, but it is
a thoroughly known quantity, and it houses a lot of features.
I've seen several attempts to wrap Java loggers in a "nicer"
interface, and every one of them ended up hobbling some of the power
features of the logger. There is also the issue of our wrapper playing
nicely with the loggers already being used in the wild.
I'm also not entirely convinced that the answer here isn't just
documentation. The documentation for log4j has historically been
pretty awful, and while Python's documentation is an improvement, it
could certainly be better IMHO. Good documentation for how to use
logging in the context of Django could go a long way.
> 3. We risk people using logging where signals would be more
> appropriate.
This may be a better way to approach the problem - more details below.
Details notwithstanding, I'm +1 on the idea of adding logging to the
core framework - or, at least, making it easier to use logs for
reporting internal state and error conditions (instead of email).
As for likely roadblocks: I've been led to believe that Adrian has
objections to framework-level logging. I have no idea as to the nature
of his objection, but ticket #5415 indicates that he is (or has been,
historically) in favor of adding signals that could be used for
logging or debugging purposes.
Yours,
Russ Magee %-)
To clarify - I think that documentation is the very least we should
do. As your comments indicate, there are a lot of things you need to
do in order to get logging right, so we should at the very least
provide some documentation on how to do it right.
Yours,
Russ Magee %-)
> As for likely roadblocks: I've been led to believe that Adrian has
> objections to framework-level logging. I have no idea as to the nature
> of his objection, but ticket #5415 indicates that he is (or has been,
> historically) in favor of adding signals that could be used for
> logging or debugging purposes.
I'm in favor of the signals approach as part of the core framework. In
addition, a django.contrib.logging providing a generic solution, good
for the 80 or 90% case, would really rock.
First off - let me reinforce that I'm in your camp here - I like
Python's logger, and I think we should be adding logging to Django.
Any hesitation I have expressed is mostly a function of institutional
inertia, especially with regards to Adrian's historical position on
logging.
However, I would point out that IMHO, FUD is an accurate description
of the state of play - though probably not in the way you meant.
Python's logging API _looks_ a lot like log4j in parts. This is at
least partially because there's a limit to how many ways you can
express 'log.debug()' before you start to copy. As a result, however,
there's a lot of Fear, Uncertainty and Doubt as to whether a framework
that apparently has Java heritage is going to be any good in Python.
Don't forget that a lot of us (myself included) got into writing
Python to get away from the stupidities of the Java world. Those scars
are deep, and aren't going away in a hurry. Speaking personally, log4j
is responsible for a lot of those scars, due in no small part to the
abysmal documentation for that project.
>> I'm also not entirely convinced that the answer here isn't just
>> documentation. The documentation for log4j has historically been
>> pretty awful, and while Python's documentation is an improvement, it
>> could certainly be better IMHO. Good documentation for how to use
>> logging in the context of Django could go a long way.
>>
>
> I'm working with Doug Hellmann (PyMOTW) to try and improve the layout
> of the logging documentation in Python. I'm not asking for patches
> (though it would be nice), but if you can give *specific* criticisms
> (e.g. what you think is missing, or unclear) then that will focus our
> efforts.
My comment was actually directed at Django's documentation, which is
currently silent on the issue of logging - and probably shouldn't be.
However, since you're interested in feedback, my suggestion would be
to look at every defense you've made of logging in this thread (and
any other threads where you've had similar arguments), and work out
why the current docs have allowed those viewpoints to be established
as apparent fact. Some examples:
* Acknowledge that there is some Java heritage, but point out that
this doesn't mean it's a bad thing, and that there is a lot that
_isn't_ Java based about Python's logger.
* Highlight the important architectural picture. As you noted in
another reply - the logger and the handler are quite separate, and
this gives a lot of power. However, the existence and significance of
that architectural separation isn't really a major feature of the
current docs. At present, the architectural bits are buried inside API
discussion, but understanding this architecture is important if you're
going to understand why logging works the way it does, and understand
that logging isn't just putting lines into a file.
* Make the simple example actually simple. IMHO, a single-file simple
logging example is good for exactly 2 things:
- showing how to configure the simplest possible case of logging
- explaining the "why don't I have any output" problem.
Tasks like configuring the logger to use a rotating file handler are
important, but can wait for much later - once issues of basic usage
and architecture have been established.
* Better examples of how logging works in the real world. All the
examples focus on single file projects. Most of the complexities I've
had with logging stem from how to use it in a multiple-file project,
yet as far as I can make out, there is very little discussion of how
logging should be used in a real multiple-file project.
- Should I have one logger instance per module? One per conceptual "task"?
- You've used "logging.getLogger(__name__)" in this thread, but this
pattern isn't mentioned once in the docs. Is this best practice, or a
quick-and-dirty hack?
- When I have multiple loggers across multiple files, how do I
configure logging? Should I be putting logging.config.fileConfig() at
the start of every Python file, or should I put the logging config
into a single module somewhere and import that module as needed?
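One common answer to the questions above, sketched here as a widespread convention rather than an official recommendation (the `myapp` names are illustrative):

```python
import logging

# Each module asks for a logger named after itself; configuration is
# done once, at startup, on an ancestor logger -- not in every file.
logger = logging.getLogger(__name__)   # e.g. "myapp.views" in a package

# Startup code (run once):
app_logger = logging.getLogger("myapp")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
app_logger.addHandler(handler)
app_logger.setLevel(logging.INFO)

# Child loggers propagate records up the dotted hierarchy automatically:
logging.getLogger("myapp.views").info("request handled")
```

The dotted-name hierarchy is what makes the per-module getLogger(__name__) pattern work: records flow up to the ancestor that actually has handlers attached.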
>> Details notwithstanding, I'm +1 to the idea of adding logging to the
>> core framework - or, at least, making it easier to use logs for
>> reporting internal state and error conditions (instead of email).
>>
>
> You can have your cake and eat it. It's perfectly feasible in Python
> logging to send only certain events to nominated email addresses (all
> configurable at run-time, so emails can be turned on/off, sent to
> different/additional destinations etc.) as well as e.g. logging
> tracebacks to file for the same events.
Agreed.
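That cake-and-eat-it setup is just one logger with two handlers at different thresholds; a minimal sketch (StringIO streams stand in here for the file and email destinations; in production the second handler would be an SMTPHandler):

```python
import io
import logging

logger = logging.getLogger("request-demo")
logger.setLevel(logging.DEBUG)

everything = io.StringIO()
errors_only = io.StringIO()

all_handler = logging.StreamHandler(everything)
all_handler.setLevel(logging.DEBUG)       # full log, incl. tracebacks

err_handler = logging.StreamHandler(errors_only)
err_handler.setLevel(logging.ERROR)       # an SMTPHandler would sit here

logger.addHandler(all_handler)
logger.addHandler(err_handler)

logger.info("page served")
logger.error("page crashed")
```

Each handler applies its own level filter, so the "email" destination only sees errors while the "file" sees everything, and either can be reconfigured at run time.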
>> As for likely roadblocks: I've been led to believe that Adrian has
>> objections to framework-level logging. I have no idea as to the nature
>> of his objection, but ticket #5415 indicates that he is (or has been,
>> historically) in favor of adding signals that could be used for
>> logging or debugging purposes.
>>
>
> They (logging and signals) are two different things. Python logging
> allows you to consider the dimensions "What happened?", "Where did it
> happen?", "How important is it?" and "Who wants to know?"
> intelligently, and in particular it treats "What happened" and "Who
> wants to know?" orthogonally. You get a lot of ways of getting
> information to *people* whereas signals is more about letting *code*
> know what's going on.
Granted, although the two aren't mutually exclusive. After all, the
way you let someone know "what happened" is with code; it isn't hard
to think of a setup where we emit a signal everywhere we might want
to log, and then attach a logging signal handler to those signals.
I'm not suggesting that this would be a good architecture - merely a
possible one.
> Unfortunately, a lot of people have got the impression that Python
> logging is "Java-like" and "not Pythonic" just because I acknowledged
> some good ideas in log4j. It's not as if Python people have a monopoly
> on good ideas, is it? This "Java heritage" perception sometimes leads
> to prejudice against logging, for no good reason that I can see.
An idea isn't bad just because it comes from the Java world, but in
fairness - the Java world does have a history of producing some pretty
dumb ideas. "Based on a Java API" isn't generally a compliment in the
Python world :-)
I understand entirely the frustration of having a project perceived
"the wrong way" by the public - Django Evolution has been stuck with a
"magic" moniker for reasons that I can't begin to fathom. However, at
the end of the day, you can't blame the community for their
perceptions. One or two people might accidentally misunderstand
something, but when it starts happening systemically, you need to
start looking inward for the cause.
> Of
> course there might be grievances - for example, people complain about
> "slow". As a single logging call which just has a file handler is of
> the order of some tens of microseconds, I don't know how bad this is -
> what do we compare against? The Tornado webserver (used by FriendFeed)
> is a high-performance solution which uses Python logging. SQLAlchemy
> uses Python logging. They are careful to consider performance and as a
> consequence logging doesn't present a problem in practice.
For the record, speed actually isn't one of my major concerns.
Yours,
Russ Magee %-)
The hard part is that basicConfig only works like that in Python 2.4
and later, yet Django supports 2.3. When I added logging to
Python-Markdown, this was the hardest part: figuring out how to
configure logging so that it works in 2.3 as well. The documentation
is not exactly helpful in that regard.
In fact, it was for this very reason that we added our own wrapper
around logging. It didn't seem reasonable for our users to go through
the same pain that we did. Sure we got a few things wrong at first,
but with the help of a few people in the community we worked those out
and our wrapper seems to work ok now. Yes - ok - I get the sense it
could be better.
Ever since then, any mention of logging leaves a bad taste in my
mouth. Perhaps if I was working only in 2.6 or such, this wouldn't be
an issue, but we have promised support back to 2.3.
Of course, it is possible that I'm missing something obvious.
--
----
\X/ /-\ `/ |_ /-\ |\|
Waylan Limberg
As luck would have it, you are :-)
Django 1.2 will drop formal support for Python 2.3.
Yours,
Russ Magee %-)
Thanks! I didn't know that. However, see my further comment below.
>> Now we've settled on a WatchedFileHandler ported from the Python 2.6
>> logging module. It watches for file descriptor changes and doesn't
>> require SIGHUP to pick up a new file. Maybe we should port it to
>> Django and use it as the default handler for logging to the file system.
>
> Why "port it to Django"? Do you mean, copy it into Django? I'm not
> sure it should be the default - not everybody uses logrotate. I'd
> leave this sort of decision for code in settings.py.
Using WatchedFileHandler is a safe default because it works like
FileHandler, just without breaking under logrotate. I don't know of any
disadvantage of WatchedFileHandler compared to the old FileHandler, so
I don't think there's much value in giving people this choice in
settings: non-default behavior will be rare (and still possible anyway).
One of the reasons I propose Django's own settings structure for
logging is that we can choose better defaults and a more compact syntax
for them. Standard Python logging configuration has a noticeable gap
between the very simplistic basicConfig, which configures only the root
channel, and a verbose imperative definition of handler, formatter and
logger objects. I've found that my usage of logging inevitably falls in
between: I often need a few logging channels, but I almost never, say,
reuse handler objects between them.
Here's a variant of a simple config that I had in mind lately:
LOGGING = {
    'errors': {
        'handler': 'django.logging.FileLogger',  # WatchedFileHandler copy
        'filename': '...',
        'level': 'debug',
    },
    'maintenance': {
        'handler': 'logging.handlers.HTTPHandler',
        'host': '...',
        'url': '...',
        'format': '...',
    },
}
Top-level keys are logger names. Values are dicts describing handlers.
These dicts have several keys that Django knows about:
- 'handler': a handler class, imported like any other stringified
class in settings.
- 'level': a level keyword that is translated into logging.* constants,
so users don't have to import logging by hand.
- 'format': a format string for the logging.Formatter object. We can
have a more sensible default for this than the one in Python logging.
Or not :-)
These keys are popped out of the dict, and the rest is passed as
**kwargs to the handler class instantiation.
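A rough sketch of what that configuration step could look like (with the handler given as a class rather than a dotted string, to keep the example self-contained; the real version would import the string the way other settings do):

```python
import logging

LEVELS = {
    "debug": logging.DEBUG, "info": logging.INFO,
    "warning": logging.WARNING, "error": logging.ERROR,
    "critical": logging.CRITICAL,
}

def configure_logging(settings):
    """Configure loggers from a dict of the form described above."""
    for name, conf in settings.items():
        conf = dict(conf)                   # don't mutate the setting
        handler_cls = conf.pop("handler")   # a class here; a dotted string in the proposal
        level = LEVELS[conf.pop("level", "info")]
        fmt = conf.pop("format", None)
        handler = handler_cls(**conf)       # leftovers become constructor kwargs
        if fmt is not None:
            handler.setFormatter(logging.Formatter(fmt))
        logger = logging.getLogger(name)
        logger.addHandler(handler)
        logger.setLevel(level)
```

The `LEVELS` table and the `configure_logging` name are illustrative; the point is just how little machinery the declarative form needs.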
Django's default setup may look like this:
LOGGING = {
    '': {'handler': 'logging.StreamHandler'},
}
This has the advantage of always configuring a root logger, which
avoids the infamous warning from Python logging when a logger doesn't
have any handlers defined. Users wanting to configure all of the
logging themselves can null this out with `LOGGING = {}`.
Forgot one thing. As it stands now, Django has a "historic" behavior
of importing and executing the settings module twice. This breaks
things when you set up loggers and handlers by hand. We currently
circumvent it with a helper function, called from settings.py, that
memoizes the loggers it has configured. With a declarative config we
can hide this inside Django and not scare users.
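The memoizing helper described above might be sketched like this (names are illustrative):

```python
import logging

# Remember which loggers have already been configured, so that a second
# execution of the settings module doesn't attach duplicate handlers.
_configured = set()

def setup_logger(name, make_handler):
    logger = logging.getLogger(name)
    if name not in _configured:          # second settings import: no-op
        logger.addHandler(make_handler())
        _configured.add(name)
    return logger
```

Calling it twice with the same name leaves the logger with exactly one handler, which is the whole point.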
Runserver currently outputs each request to the command line. Any
reason that couldn't be handled by a logger? I seem to recall people
complaining, when logging discussions came up before, that they
couldn't redirect that output elsewhere.
[snip]
>
> My philosophy here is that log messages should be ignored unless
> explicitly told otherwise. Django gets run in various different ways -
> runserver, test and in deployment under mod_wsgi/FastCGI/etc - and
> it's not at all obvious what the default log output behaviour should
> be. As long as we make it extremely easy to see log messages if we
> want them I think having them off by default makes sense. It also
> ensures no surprises for people upgrading from 1.1 to 1.2.
Yeah, this was an issue we ran into. People will deploy your code in
all sorts of situations and you can't make any assumptions about what
would constitute an appropriate handler; they have to define those
themselves. Even management commands can be called from scripts and
may need output logged someplace other than stderr.
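A sketch of that point: a management command that logs through a handler it attaches itself, so scripted callers can substitute their own destination (all names here are made up for illustration):

```python
import logging
import sys

logger = logging.getLogger("myapp.management.refresh_cache")

def run_command(handler=None):
    """Hypothetical management command body: logs rather than prints,
    and lets the caller supply the handler."""
    logger.addHandler(handler or logging.StreamHandler(sys.stderr))
    logger.setLevel(logging.INFO)
    logger.info("refreshed 42 entries")
```

A script calling run_command() can pass a file or in-memory handler instead of accepting output on stderr.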
> 4. Add a way to configure loggers in Django's settings.py file, in
> particular for use in production. I'm not sure if these settings
> should be ignored when running under other circumstances (./manage.py
> test etc) in favour of the command line option, or if they should
> still take effect.
Well, a logger can have multiple handlers. I would think we would want
to leave the handlers defined in settings.py intact and add the
manage.py handler (most likely output to stderr - but it could be
different - see my comment above).
I was thinking that a calming, motherly "there there, it's all right,
the boogeymonster isn't real" would do the trick :-)
>> * Highlight the important architectural picture. As you noted in
>> another reply - the logger and the handler are quite separate, and
>> this gives a lot of power. However, the existence and significance of
>> that architectural separation isn't really a major feature of the
>> current docs. At present, the architectural bits are buried inside API
>
> That's because the docs are really pitched mainly as a reference
> guide.
I suppose this is a big part of the problem. The logging module isn't
a trivial API, and you really need to understand how it works before
you can use it effectively. What is needed is a really good tutorial;
what we have is a reference guide with a little bit of a tutorial
tacked on the front. The reference is needed, but the tutorial is much
more important from the point of view of getting the message out and
encouraging people to use the API.
>> discussion, but understanding this architecture is important if you're
>> going to understand why logging works the way it does, and understand
>> that logging isn't just putting lines into a file.
>
> I've recently created a blog about Python logging, where I talk about
> logging from first principles and try to show why the design of Python
> logging is as it is. It's not perfect, but it's a start.
>
> http://plumberjack.blogspot.com/2009/09/python-logging-101.html
I saw this, and yes, it's a good start. However, it (or a version of
it) needs to be formalized and in the official docs, not lingering on
a blog somewhere.
>> * Make the simple example actually simple. IMHO, a single-file simple
>> logging example is good for exactly 2 things:
>> - showing how to configure the simplest possible case of logging
>> - explaining the "why don't I have any output" problem.
>
> I'm not sure what you're getting at. Sometimes, those two things is
> all that people want to know at that time.
Agreed, it's important that the simple use case is demonstrated.
However, in the docs, the simple example is then used as a staging
ground for demonstrating rotating file loggers and other things that a
simple example doesn't need.
Logging is a complex topic. A simple example doesn't provide the
complexity that is required to demonstrate the strengths of the API.
The simple example is useful for demonstrating the "hello world" case,
but not for demonstrating more complex features.
On the subject of examples, another suggestion: I'm wary of examples
that try to be too smart. Take the logging level demonstration (the
5th code snippet in 16.6.1.1). This code very cleverly writes a script
that can take a command line argument and turn it into a logging
level, but in being clever, it obscures the point of the example -
that you can output log messages at any level, but the configuration
determines which ones are output. When you're trying to demonstrate
the simple stuff, the less extraneous detail, the better. The next
example (which demonstrates multiple loggers) is much clearer - it's
obvious from first inspection what the code is doing.
>> > They (logging and signals) are two different things. Python logging
>> > allows you to consider the dimensions "What happened?", "Where did it
>> > happen?", "How important is it?" and "Who wants to know?"
>> > intelligently, and in particular it treats "What happened" and "Who
>> > wants to know?" orthogonally. You get a lot of ways of getting
>> > information to *people* whereas signals is more about letting *code*
>> > know what's going on.
>>
>> Granted, although the two aren't mutually exclusive. After all, the
>> way you let someone know "what happened" is with code; It isn't hard
>> to think of a setup where we emit a signal everywhere that we might
>> want to log, and then attach a logging signal handler to those signal.
>>
>> I'm not suggesting that this would be a good architecture - merely a
>> possible one.
>
> I agree - it's possible, but perhaps not the most simple or intuitive.
> If you can log something directly from where you are, why send a
> signal to some other code which, while processing that signal, will
> log something for you?
To avoid a dependency on the logging module, thereby keeping Adrian happy :-)
Seriously - I really do want to find out the nature of Adrian's
objection to logging (if it is even current). I'd rather have an
actual discussion than dance around a strawman.
Yours,
Russ Magee %-)
Vinay Sajip wrote:
> It's similar to Django's support for, say, simplejson. I think it's
> reasonable for Django to alias WatchedFileHandler so that it's
> available either bound to logging's own implementation (in
> sufficiently recent versions of Python) or else a copy in Django's own
> code. Then people can use it if they want to, even in older Python
> versions.
Agreed.
> I have no big problem with a configuration scheme such as you suggest
> - if it's felt that a lot of Django users are not Python-savvy enough
> and need some hand-holding
Uhm... no, this is not about hand-holding or anything like that. I
propose a format that is shorter and more convenient than
ConfigParser's. That is the point.
> My reservation with Django's own take on it is simply
> that it goes against TOOWTDI and the Zen of Python, a little at least.
Yes, that's the bad part about it. However, the Zen of Python is not
dogma, and its application is subjective. I think my proposal still
stands, and the decision is left to the core devs. As for me, I'll most
probably play with an implementation of parsing this format for our own
projects anyway. I'll file it as a ticket if I still like it by then :-).
I see one problem with explicitly calling a function from settings.py:
it will instantiate handlers and all the user's objects very early,
when settings aren't fully configured yet. This prevents a user from
having a handler that uses Django models, for example. In general,
settings should be as dumb as possible for easy bootstrapping. This is
why all those middleware/context processors/backends etc. are strings
and not actual callables.
Is there some easy way to see the patch? Preferably a link which will
just show the diff between the latest of your branch and trunk, but
failing that, a recipe of commands using git (for those of us who
haven't bothered to learn git properly yet).
Cheers,
Luke
--
"We may not return the affection of those who like us, but we
always respect their good judgement." -- Libbie Fudim
Luke Plant || http://lukeplant.me.uk/
One big question I have is about performance: have you done any A/B
testing against trunk to see how much adding logging -- especially on
every SQL query -- impacts performance?
For me, at least, performance is going to be the critical factor. It's
clear that even a do-nothing logger will add some overhead. If we're
talking fractions of percents here, fine... but if there's a
measurable performance hit that's going to be a big problem, and it's
probably best to be tracking speed as we go along here.
Jacob
> master: 314.442s
> logging: 317.096s
>
> Since there's nothing at all in my logging code that will speed
> anything up, I'm putting that down to chance (I did run the suite a
> few times first to "warm up" the VM I used, but like I said, this
> is pretty much the opposite of science).
Those numbers are the times in seconds, right? In that case your
branch is slower, not faster, as I think you are implying. It could
still be down to chance, but it's only about 1%, which would be an
acceptable hit for me anyway.
Luke
--
"Where a person wishes to attract, they should always be ignorant."
(Jane Austen)
Luke Plant || http://lukeplant.me.uk/
It is perfectly possible to set up logging in such a way that all the
logs from several programs wind up in the same file, or so that there
are duplicate entries. The first happens if several programs use the
same interpreter; the second is user error. I've been stumped by both.
Users need to be protected from both, somehow, but how... A standard
way of doing it in Django might suffice for case 2. Will a warning in
the docs be needed for case 1?
When I first used the logging library (2002?) I had a problem in
threaded programs: some exceptions were swallowed that shouldn't have
been, leading to rather mysterious behaviour. The exception hierarchy
has been changed (a couple of times ;) ) since then, so it might no
longer be an issue, but I'd sleep better if there were tests for this
too. Vinay?
HM