Scoobi run-time information and logging


Alex Cozzi

Jan 5, 2012, 6:56:11 PM
to scoob...@googlegroups.com
Scoobi currently tells the user very little about what is being run. According to Ben:


Ben Lever
3:37 PM (5 minutes ago)
I've put zero effort into the Scoobi output to date, so I'm very open 
to improving it. 

Could you start a thread on scoobi-dev just outlining what you would 
like to see? Most of the output would be generated in Executor.scala 
and MapReducer.scala, so if you want to play with adding some better 
output, that might be a good way to start. 

Cheers. 


In general I have found run-time messages useful in the following ways:
1) reassuring the user that things are OK while a long job is running
2) helping to optimize jobs by exposing inappropriate or inefficient configurations
3) helping to debug by locating the source of errors

At a minimum I'd like to see a line or two of output for each map-reduce job being executed, probably with the following information:

1) input paths and their sizes
2) number of mappers/reducers
3) the step's position in the job DAG, e.g. Task 3 of 14

and a job name that helps identify it in the task manager.
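
To make the request concrete, here is a rough sketch of what such a per-job summary could look like. Everything in it is hypothetical: `InputPath`, `JobSummary` and `JobLog` are names made up for illustration and are not part of Scoobi, and a real version would take the paths, sizes and task counts from the Hadoop job configuration rather than from hand-built values.

```scala
import java.util.Locale

// Hypothetical types for illustration only; none of these names come from Scoobi.
final case class InputPath(path: String, sizeBytes: Long)

final case class JobSummary(
    name: String,          // job name as it should appear in the task manager
    step: Int,             // 1-based position of this job in the plan
    totalSteps: Int,       // total number of map-reduce jobs in the plan
    inputs: Seq[InputPath],
    mappers: Int,
    reducers: Int
)

object JobLog {
  /** Render a byte count with a binary unit suffix, e.g. 1536 -> "1.5 KiB". */
  def humanSize(bytes: Long): String = {
    val units = Seq("B", "KiB", "MiB", "GiB", "TiB")
    var value = bytes.toDouble
    var i = 0
    while (value >= 1024 && i < units.size - 1) { value /= 1024; i += 1 }
    if (i == 0) s"$bytes B"
    else "%.1f %s".formatLocal(Locale.ROOT, value, units(i))
  }

  /** One header line for the job, followed by one line per input path. */
  def render(s: JobSummary): String = {
    val header =
      s"Task ${s.step} of ${s.totalSteps}: ${s.name} " +
        s"(mappers=${s.mappers}, reducers=${s.reducers})"
    val inputLines =
      s.inputs.map(in => s"  input: ${in.path} (${humanSize(in.sizeBytes)})")
    (header +: inputLines).mkString("\n")
  }
}
```

The string returned by `JobLog.render` could then be passed to whatever logger we settle on, so the formatting stays independent of the logging backend.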

Beyond this I'd love to have some more information:
1. a way to dump out a text representation of the entire job DAG, i.e. a graph of map-reduce steps. Extra points if the representation is one we can easily plot, with graphviz for example.
2. a warning if any of the map-reduce steps is affected by strongly skewed key distributions (these lead to very inefficient jobs).
3. some statistics about memory consumption and I/O load for each job.
4. a way to go from a job ID to the lines of code that produced it.
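
For the first item, a graphviz-friendly dump could be as simple as emitting DOT text. The sketch below assumes a made-up model of the plan (`Step`, `JobDag` and `DotDump` are hypothetical names, and Scoobi's internal representation may look quite different); it only shows the shape of the output.

```scala
// Hypothetical model of a job DAG for illustration; not Scoobi's own types.
final case class Step(id: Int, label: String)
final case class JobDag(steps: Seq[Step], edges: Seq[(Int, Int)])

object DotDump {
  /** Escape characters that would break a double-quoted DOT string. */
  private def escape(s: String): String =
    s.replace("\\", "\\\\").replace("\"", "\\\"")

  /** Render the DAG as a graphviz "digraph": one node per step, one arrow per edge. */
  def toDot(dag: JobDag): String = {
    val nodes = dag.steps.map { st =>
      "  s" + st.id + " [label=\"" + escape(st.label) + "\"];"
    }
    val arrows = dag.edges.map { case (from, to) =>
      "  s" + from + " -> s" + to + ";"
    }
    (Seq("digraph jobs {") ++ nodes ++ arrows ++ Seq("}")).mkString("\n")
  }
}
```

If the result is written to a file such as `jobs.dot`, running `dot -Tpng jobs.dot -o jobs.png` should produce a picture of the plan.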


What do you think? Which kinds of output would you like to see? And should we just start adding log4j (or similar) logging statements to Executor.scala and MapReducer.scala, or do we need something more than that?

All the best
Alex

Age Mooij

Jan 11, 2012, 5:59:04 PM
to scoob...@googlegroups.com
Definitely +1 on being able to produce a visualization of the DAG. Diagrams like that are very useful for understanding what Scoobi is doing and where you might be doing something wrong.

Age

Joseph Beynon

May 2, 2012, 6:08:00 PM
to scoob...@googlegroups.com
I guess I'm a bit late in the conversation, but I would also appreciate being able to view the execution plan for the job. Has any headway been made on this in the last few months?

Ben Lever

May 6, 2012, 8:50:17 PM
to scoob...@googlegroups.com
Hi Joseph,

Some minor improvements have been made. The total number of MR jobs and which job is currently being executed are now logged, as are the input and output paths. The next step would be to log the entire DAG and which part of the DAG is being executed in each job. That will probably be a DEBUG log. It won't be in the next release (0.4.0), but we'll try for the one after.

Cheers,
Ben.