logging

924 views
Skip to first unread message

David Nemeskey

unread,
Mar 23, 2011, 6:12:44 AM3/23/11
to gensim
Hi,

I have just stumbled upon this library, and thought of giving it a go.
Thus far, the API looks very nice, it is very easy to understand and
use. Thanks for the great work!

That said, I feel there is one point that could be improved: logging.
I think it is not the best decision to have it on by default. It may
be my unfamiliarity with Python logging, but I found it hard to turn
it off. I tried

logging.root.setLevel(logging.CRITICAL)
and
logging.basicConfig(level = logging.CRITICAL)

to surpass the info messages going out to the console, but to no
avail. I had to manually remove the handler from the root logger.

Also, as far as I could see, the logging uses the "utils" domain,
which does not really tell where the message is coming from (esp. if
the application also has a "utils" module), and so it is not possible
to selectively surpass (or grep, for that matter) logging messages
coming from gensim. I believe it would be better define a "gensim"
upper domain, and then optionally have "gensim.utils",
"gensim.corpus", etc.

Please note that the suggested way according to the Python
documentation is, instead of using the root logger, to have a separate
logger for the library with a NullHandler on it. In this way, the user
can have full control over what log messages are sent and where. It
would be great if gensim followed this idea as well.

Thanks,
David

Radim

unread,
Mar 23, 2011, 2:25:28 PM3/23/11
to gensim
Hello David,

thanks for sharing your thoughts! I believe this is the first time
logging in gensim was brought up :-)

> Also, as far as I could see, the logging uses the "utils" domain,
> which does not really tell where the message is coming from (esp. if
> the application also has a "utils" module), and so it is not possible

Not sure what you mean by "domain". What appears in the message is
determined by the formatter (which you can/should configure yourself).
Each module in gensim has its own logger, and gensim doesn't define
any handlers or formatters -- that's up to the application using
gensim. So actually, by default you shouldn't see any messages! Only a
line automatically emitted by Python that says "No handlers could be
found for logger xyz", which means the gensim library uses logging but
your application doesn't.

Now if you see logging messages, like you say you do, it probably
means you already added handlers to your root logger (the parent of
all loggers) and the messages are getting through via that root.

> to selectively surpass (or grep, for that matter) logging messages
> coming from gensim. I believe it would be better define a "gensim"
> upper domain, and then optionally have "gensim.utils",
> "gensim.corpus", etc.

Excellent point! That way, we'll be able to remove (or rather, not
add) handlers only to the top `gensim` logger, without affecting the
root.

David, would you be willing to add this functionality to gensim? That
is, change all the loggers from e.g. `getLogger('ldamodel')` to
`getLogger('gensim.models.ldamodel')`. Or maybe even
`getLogger(__name__)`, but I'm not sure if that wouldn't break things
in modules that can also act as scripts (__name__ == '__main__' then).

That idea with a fake NullHandler is also nice. It could be added to
the root logger `gensim`, in the top-level __init__.py, to get rid of
the "No handlers could be found for logger xyz" warnings.

I raised this issue on github: https://github.com/piskvorky/gensim/issues/17
.

Cheers,
Radim


Radim

unread,
Mar 23, 2011, 2:33:43 PM3/23/11
to gensim
> Each module in gensim has its own logger, and gensim doesn't define
> any handlers or formatters -- that's up to the application using
> gensim. So actually, by default you shouldn't see any messages! Only a

I told a lie. I did a quick grep and some of the very old modules
(bleicorpus.py, mmcorpus.py, rpmodel.py) and some very new modules
(indexedcorpus.py) actually don't have their own logger, they use
plain logging.

So this will be an excellent opportunity to clean up logging and make
it consistent.

-rr

David Nemeskey

unread,
Mar 25, 2011, 7:59:50 AM3/25/11
to gensim
Hi Radim,

> Now if you see logging messages, like you say you do, it probably
> means you already added handlers to your root logger (the parent of
> all loggers) and the messages are getting through via that root.
Well, I thought I haven't, but apparently I had to, since I cannot
reproduce the problem now. Even though I certainly have not issued any
logging-related commands... strange.

> Not sure what you mean by "domain". What appears in the message is
> determined by the formatter (which you can/should configure yourself).
I guess I saw some kind of console handler in action. The point is,
what I saw on the console was "utils" and not "gensim.utils" or
something like that.

> David, would you be willing to add this functionality to gensim? That
> is, change all the loggers from e.g. `getLogger('ldamodel')` to
> `getLogger('gensim.models.ldamodel')`. Or maybe even
> `getLogger(__name__)`, but I'm not sure if that wouldn't break things
> in modules that can also act as scripts (__name__ == '__main__' then).
I'll see if I have time on the weekend. As for a static string vs.
__name__, maybe at least __package__ could be used?

> That idea with a fake NullHandler is also nice. It could be added to
> the root logger `gensim`, in the top-level __init__.py, to get rid of
> the "No handlers could be found for logger xyz" warnings.

> I raised this issue on github:https://github.com/piskvorky/gensim/issues/17
OK, thanks.

David
Message has been deleted

David Nemeskey

unread,
Mar 25, 2011, 9:46:37 AM3/25/11
to gensim
I could reproduce the problem after all.

$ python
Python 2.6.6 (r266:84292, Dec 26 2010, 22:31:48)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging
>>> logging.root.handlers
[]
>>> from gensim import corpora
>>> logging.root.handlers
[]
>>> d = corpora.Dictionary()
>>> logging.root.handlers
[]
>>> d.addDocuments(['aha', 'baha'])
INFO:dictionary:built Dictionary(3 unique tokens) from 2 documents
(total 7 corpus positions)
>>> logging.root.handlers
[<logging.StreamHandler instance at 0x2d56758>]
>>>

As you can see, no manual setting of logging anywhere: apparently
gensim installs it when I call Dictionary.addDocuments().

David

Radim

unread,
Mar 25, 2011, 12:32:31 PM3/25/11
to gensim
Yep! You're right David: there is a forgotten `logging.info` call
there (should be `logger.info`).

I checked the source of the standard `logging` module, and actually
any call to logging.debug/info/error/etc. automatically adds a handler
to the root logger if it's missing (by calling `basicConfig`
internally). So all these `logging.*` calls have to go. On the other
hand, they should be very easy to find and fix.

Cleaning up & making logging consistent will be a very commendable
effort :-) Let's continue with any potential technicalities on the
issue page on github.

Cheers,
Radim

Dieter Plaetinck

unread,
Mar 30, 2011, 8:35:38 AM3/30/11
to gen...@googlegroups.com
Cool that this is brought up.
I also have a bunch of issues to get logging working how I want (i.e. say I only want errors/critical messages); and I actually find the python logging module very hard to understand what's going on.

I added calls like
        logging.root.setLevel(logging.ERROR)
        logging.basicConfig(level=logging.ERROR)
to my code at various points (right after importing the logging module, after importing gensim modules, in my main function etc), but all to no avail, I keep getting info messages from gensim libraries, for example:
INFO:gensim.corpora.indexedcorpus:loaded corpus index from doc__corpus.index
INFO:gensim.corpora.bleicorpus:loading corpus from doc__corpus
INFO:gensim.utils:loading TfidfModel object from doc__tfidf

from what I read on http://docs.python.org/library/logging.html#logging.basicConfig, I would think logging.basicConfig takes precendence as long as I do it early enough, but that's not how it seems to work.

And I have this problem with the "old" code, as well as the latest from develop branch.
So, how do I best disable all messages (other then error/critical) from both my own code and gensim?

Dieter

Radim

unread,
Mar 30, 2011, 12:09:45 PM3/30/11
to gensim
On Mar 30, 7:35 pm, Dieter Plaetinck <die...@plaetinck.be> wrote:
> Cool that this is brought up.
> I also have a bunch of issues to get logging working how I want (i.e. say I
> only want errors/critical messages); and I actually find the python logging
> module very hard to understand what's going on.

It's based on a java library (log4j), so what do you expect :)

>
> I added calls like
>         logging.root.setLevel(logging.ERROR)
>         logging.basicConfig(level=logging.ERROR)
> to my code at various points (right after importing the logging module,
> after importing gensim modules, in my main function etc), but all to no
> avail, I keep getting info messages from gensim libraries, for example:
> INFO:gensim.corpora.indexedcorpus:loaded corpus index from doc__corpus.index
> INFO:gensim.corpora.bleicorpus:loading corpus from doc__corpus
> INFO:gensim.utils:loading TfidfModel object from doc__tfidf
>
> from what I read onhttp://docs.python.org/library/logging.html#logging.basicConfig, I would
> think logging.basicConfig takes precendence as long as I do it early enough,
> but that's not how it seems to work.
>
> And I have this problem with the "old" code, as well as the latest from
> develop branch.
> So, how do I best disable all messages (other then error/critical) from both
> my own code and gensim?

I checked source of the Python logging module (again!) and I see 3
ways:

1) you set the level of the *handler* that prints your messages. The
reason is, level of *loggers* only applies when the message is first
seen (in `getLogger('gensim.utils')` or whatever), and not where it's
actually emitted (which is prolly `logging.root`). Confusingly,
handlers also have their own levels, and this level applies to every
message they process, so that's what you want:
`logging.root.handlers[0].level = logging.WARNING`
2) we change all loggers' levels in gensim to NOTSET; according to the
source code, when a logger's level is not set, its parent's level is
used (recursively) when the message is first seen. So it would then be
enough to set `logging.root.level = logging.WARNING` and that would
automatically apply to all gensim loggers, unless their level (or the
level of any logger on the path to root) is set manually. It feels
more like a hack though, too brittle?
3) there was something else but i forgot now and the place is getting
flooded, electricity may get cut off soon, better send now. i'll
repost.

-rr

Dieter Plaetinck

unread,
Apr 1, 2011, 8:27:15 AM4/1/11
to gen...@googlegroups.com
thanks Radim,
this seems to work:
logging.basicConfig(level=logging.WARNING)

logging.root.handlers[0].level = logging.WARNING

note that if you leave out the first line, you get:

    logging.root.handlers[0].level = logging.WARNING
IndexError: list index out of range

I don't quite get it, but I think I don't even _want_to know... I'll leave it like this, as long as it works..
Good luck with the flood situation, I hope things are okay and there isn't too much damage.?

Dieter

Radim

unread,
Apr 3, 2011, 5:18:08 AM4/3/11
to gensim
Yay internet works again! :F

The third way I had in mind was you changing all logger levels
manually, but that is ugly.

I will implement option 2) in gensim in the next release and leave it
like that for now -- until someone reports another problem with that
approach.

> Good luck with the flood situation, I hope things are okay and there isn't
> too much damage.?

Well about 40 people died here... and tens of thousands had to leave
home. It didn't affect us personally that much, luckily.

Cheers,
Radim
Reply all
Reply to author
Forward
0 new messages