Printed output on clusters


Ian Goodfellow

Oct 20, 2012, 6:00:15 PM
to lisa_labo Labo, pylearn-dev
Does anyone have a good solution to the problem of logs on the cluster not appearing / not being in sync, etc?
It seems like most of the time I have a problem on the cluster, it is hard to debug, because a good portion of
the output from the program doesn't get copied into the log file, or the log file just never appears.
This usually seems to be the fault of the job management software, since the filesystem is generally working just
fine: if I change my code to write out files at a pre-specified location, the writes show up.
I'm thinking maybe we should make a logging module for pylearn2, so that the stuff pylearn2 currently prints to
stdout could be easily redirected to a file when running on the cluster.
Before I do that, I thought I should check if there is any existing logging system we should aim to be compatible with,
or just use. For example, is this something that jobman solves?

David Warde-Farley

Oct 20, 2012, 11:49:02 PM
to pylea...@googlegroups.com
http://docs.python.org/library/logging.html

I've been meaning to move all print statements over to appropriate log
levels on a logger from the stdlib logging module, anyway. Multiple
destinations can be set up using the addHandler mechanism. I'll try to
find a tutorial-style introduction; I've read or watched a few in the
past.

IIRC there are other, more specialized modules that implement this API
as well, so we can always move to one of those.
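As a rough illustration of the addHandler idea, here is a minimal sketch assuming Python 3. The logger name "pylearn2", the log path, and the helper make_logger are hypothetical, not existing pylearn2 code. One named logger sends INFO and above to the console and everything down to DEBUG to a file at a pre-specified location:

```python
import logging
import os
import sys
import tempfile


def make_logger(log_path, name="pylearn2"):
    """Hypothetical helper: INFO+ goes to stdout, DEBUG+ goes to log_path."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # let each handler do its own filtering
    fmt = logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")

    # Destination 1: the console.
    console = logging.StreamHandler(sys.stdout)
    console.setLevel(logging.INFO)
    console.setFormatter(fmt)

    # Destination 2: a file at a pre-specified location.
    to_file = logging.FileHandler(log_path, mode="w")
    to_file.setLevel(logging.DEBUG)
    to_file.setFormatter(fmt)

    logger.addHandler(console)
    logger.addHandler(to_file)
    return logger


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "train.log")
    log = make_logger(path)
    log.info("epoch 1 finished")   # console and file
    log.debug("weight norm: 3.2")  # file only
```

Calling make_logger twice with the same name would attach duplicate handlers, so a real version would want to guard against that.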

David

Ian Goodfellow

Oct 21, 2012, 10:48:09 AM
to pylea...@googlegroups.com
If the client is also using the logging module, would we overwrite their settings?
I seem to remember there being a discussion of something like that happening
on one of the theano mailing lists.

David Warde-Farley

Oct 21, 2012, 1:54:32 PM
to pylea...@googlegroups.com
On Sun, Oct 21, 2012 at 10:48 AM, Ian Goodfellow
<goodfel...@gmail.com> wrote:
> If the client is also using the logging module, would we overwrite their
> settings?
> I seem to remember there being a discussion of something like that happening
> on one of the theano mailing lists.

I think that can only happen if we modify the "global logger", which
we may want to but we needn't necessarily. I'll have a look at the
mailing list to see if I can find the issue.

Razvan Pascanu

Oct 21, 2012, 1:57:35 PM
to pylea...@googlegroups.com
I think the problem was that in certain places in Theano people did exactly that: changed the global logger. At least, that was the case in the email I'm thinking of, which is rather old. The problem was fixed, and Theano now defines its own logger; at times, submodules of Theano create their own loggers as well.
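To illustrate the pattern of a package logger plus submodule loggers, here is a minimal sketch assuming Python 3; the package names "mylib" and "mylib.models" and the ListHandler class are hypothetical. Loggers named with dots form a hierarchy, so a handler that the application attaches to the package logger also receives messages from submodule loggers, and the root ("global") logger is never touched:

```python
import logging

# Library side: each module asks for a named logger and never configures root.
pkg_logger = logging.getLogger("mylib")         # hypothetical package logger
sub_logger = logging.getLogger("mylib.models")  # child, by its dotted name


class ListHandler(logging.Handler):
    """Collects formatted messages in a list so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


# Application side: attach a handler to the package logger only.
collector = ListHandler()
pkg_logger.addHandler(collector)
pkg_logger.setLevel(logging.INFO)

sub_logger.info("built model")     # propagates up to the "mylib" handler
sub_logger.debug("layer details")  # below INFO, so it is dropped
```

The submodule logger has no level of its own, so it inherits INFO from "mylib"; nothing here changes what other packages' loggers do.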

As far as I know the logger system works pretty well, if used properly. But probably Fred or Pascal know more about it. 

Razvan

David Warde-Farley

Oct 21, 2012, 2:02:38 PM
to pylea...@googlegroups.com
So here's the issue: the logging module has a concept of a "log
level", so messages are classified in a severity ordering. INFO is
higher priority than DEBUG, WARNING is higher than INFO, ERROR is
higher than WARNING, and CRITICAL is higher than all. Then you tell
the logger the minimum level you want to see.

What had happened in Theano was that someone had modified the global
logger state from minimum level NOTSET to minimum level DEBUG, so a
bunch of third party modules also started spewing debug messages. This
is pretty easily avoided.
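A minimal sketch of that failure mode, assuming Python 3 and two hypothetical packages, "ourpkg" and "thirdparty", that both emit DEBUG messages. Lowering the root logger's level (its default is WARNING) lets every package's debug output through, while setting the level on our own package logger affects only that package. The helper run_scenario is illustrative only and restores the original state afterwards:

```python
import logging


class ListHandler(logging.Handler):
    """Records 'logger name:LEVEL' for every record it receives."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def emit(self, record):
        self.seen.append(f"{record.name}:{record.levelname}")


def run_scenario(root_level, ours_level):
    """Return the debug records that reach a handler on the root logger."""
    root = logging.getLogger()
    ours = logging.getLogger("ourpkg")        # hypothetical: our library
    theirs = logging.getLogger("thirdparty")  # hypothetical: someone else's

    collector = ListHandler()
    saved_root_level = root.level
    root.addHandler(collector)
    root.setLevel(root_level)
    ours.setLevel(ours_level)
    try:
        ours.debug("our debug message")
        theirs.debug("their debug message")
    finally:
        # Undo every change so the global state is left as we found it.
        root.removeHandler(collector)
        root.setLevel(saved_root_level)
        ours.setLevel(logging.NOTSET)
    return collector.seen


if __name__ == "__main__":
    # Changing the global (root) logger: both packages start spewing.
    print(run_scenario(logging.DEBUG, logging.NOTSET))
    # prints ['ourpkg:DEBUG', 'thirdparty:DEBUG']
    # Changing only our package's logger: the third-party package stays quiet.
    print(run_scenario(logging.WARNING, logging.DEBUG))
    # prints ['ourpkg:DEBUG']
```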

Frédéric Bastien

Oct 22, 2012, 1:27:33 PM
to pylea...@googlegroups.com
Hi,

The easy hack would be to replace sys.stdout with a file that you open
for writing.
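A minimal sketch of that hack, assuming Python 3.5+ (for contextlib.redirect_stdout and contextlib.redirect_stderr); the helper name run_with_output_to is hypothetical. Opening the file line-buffered means each completed line is flushed as it is printed rather than sitting in a buffer. This only captures output that goes through Python's sys.stdout and sys.stderr, not writes made directly to file descriptors 1 and 2 by C code or child processes:

```python
import contextlib
import os
import tempfile


def run_with_output_to(path, fn, *args, **kwargs):
    """Hypothetical helper: call fn with sys.stdout/sys.stderr sent to path."""
    # buffering=1 selects line buffering for a text-mode file, so each
    # completed line is flushed as soon as it is printed.
    with open(path, "w", buffering=1) as f, \
            contextlib.redirect_stdout(f), contextlib.redirect_stderr(f):
        return fn(*args, **kwargs)


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "job.log")

    def train():
        print("starting training")
        return 0

    run_with_output_to(log_path, train)
    with open(log_path) as f:
        print(f.read(), end="")
```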

A logger could be good, but it should be used only for what is inside
pylearn2. It doesn't fix the problem of the things users print
themselves.

jobman already redirects stdout/stderr to a file correctly. You do
not need a database to use it: "jobman cmdline" just does the
redirection ("jobman cmdline --help" for details). It also adds
some debug information that we gather automatically, such as the job
scheduler's job id, the host name, etc. I think it is better to collect
debug info automatically in only one system if possible. We can
discuss where.
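For readers without jobman, here is a rough standalone sketch of the kind of wrapper described above. It is not jobman's implementation; the helper run_logged and the scheduler environment variable names (PBS_JOBID for Torque/PBS, SLURM_JOB_ID for Slurm) are assumptions to check against the local cluster. It writes the host name and job id at the top of the log, then lets the child process write its stdout and stderr directly into the same file:

```python
import os
import socket
import subprocess
import sys
import tempfile

# Assumption: environment variables that some schedulers set for the job id.
JOB_ID_VARS = ("PBS_JOBID", "SLURM_JOB_ID")


def run_logged(cmd, log_path):
    """Hypothetical wrapper: run cmd with stdout/stderr going to log_path.

    A short header with debug info is written first. Returns the exit code.
    """
    job_id = next((os.environ[v] for v in JOB_ID_VARS if v in os.environ),
                  "unknown")
    with open(log_path, "w") as log:
        log.write(f"host: {socket.gethostname()}\n")
        log.write(f"job id: {job_id}\n")
        log.write(f"command: {' '.join(cmd)}\n")
        log.flush()  # header must reach the file before the child writes
        # The child inherits the file descriptor, so everything it prints,
        # including output from C extensions, goes straight into the file.
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "job.log")
    code = run_logged([sys.executable, "-c", "print('hello from the job')"], path)
    print("exit code:", code)
    with open(path) as f:
        print(f.read(), end="")
```

Handing the file descriptor to the child, rather than piping output through the parent, means the log keeps whatever the job wrote even if the parent dies.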

Can you tell me what you did finally?

Also, on which cluster did you have the problem?

Fred

Ian Goodfellow

Oct 22, 2012, 1:57:54 PM
to pylea...@googlegroups.com
On Mon, Oct 22, 2012 at 1:27 PM, Frédéric Bastien <no...@nouiz.org> wrote:
> Can you tell me what you did finally?

In this most recent case, I told the admins about some of the problems I was having,
and they fixed something related to the "username server".
 

> Also, on which cluster did you have the problem?

briaree. Pretty much any time I have a problem on briaree, the logs never appear,
or are missing a lot of output.
 

Frédéric Bastien

Oct 22, 2012, 2:46:51 PM
to pylea...@googlegroups.com
OK.

If you have the problem again, can you tell me whether "jobman cmdline" fixes
the problem?

Fred

Ian Goodfellow

Oct 22, 2012, 3:25:33 PM
to pylea...@googlegroups.com
Yeah, I'll try it next time.