I've put zero effort into the Scoobi output to date, so I'm very open to improving it. Could you start a thread on scoobi-dev outlining what you would like to see? Most of the output would be generated in Executor.scala and MapReducer.scala, so if you want to play with adding some better output, that might be a good way to start.

Cheers.

Generally, I have found run-time information messages useful in the following ways:

1) reassuring the user that things are OK while a long job is running
2) helping to optimize jobs by showing inappropriate or inefficient configurations
3) helping with debugging by locating the source of errors.

At a minimum, I'd like to see some output for each map-reduce job being executed, probably with the following information:

1) input paths and their sizes
2) number of mappers/reducers
3) which step in the job DAG is running (e.g. "Task 3 of 14"), and a job name that helps identify it in the task manager.

Beyond this, I'd love to have some more information:

1. a way to dump out a text representation of the entire job DAG, i.e. a graph of map-reduce steps. Extra points if we could use a representation that we can easily plot, with Graphviz for example.
2. a warning if any of the map-reduce steps is affected by strongly skewed key distributions (this leads to very inefficient jobs).
3. some statistics about memory consumption and I/O load for each job.
4. a way to go from a job ID to the lines of code.

What do you think? Which kind of output would you like to see? And should we just start putting some log4j or similar logging statements in Executor.scala or MapReducer.scala, or do we need something more than that?

All the best,
Alex
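
To make the per-job summary line and the Graphviz dump a bit more concrete, here is a rough sketch. It is only an illustration: `Step`, `JobReport` and all of their fields are hypothetical names made up for this example, not anything that exists in Scoobi, and the real information would have to come from whatever Executor.scala and MapReducer.scala know about each step. The functions return plain strings so they could be handed to log4j or any other logger.

```scala
// Hypothetical model of one map-reduce step; none of these names come from Scoobi.
final case class Step(
  id: Int,
  name: String,                    // job name shown in the task manager
  inputs: Seq[(String, Long)],     // (input path, size in bytes)
  mappers: Int,
  reducers: Int,
  dependsOn: Seq[Int] = Seq.empty  // ids of the steps this one reads from
)

object JobReport {
  /** One-line summary for a step, e.g. "Task 3 of 14 [name]: ...". */
  def summary(step: Step, position: Int, total: Int): String = {
    val inputs = step.inputs
      .map { case (path, bytes) => s"$path ($bytes bytes)" }
      .mkString(", ")
    s"Task $position of $total [${step.name}]: inputs=$inputs " +
      s"mappers=${step.mappers} reducers=${step.reducers}"
  }

  /** Renders the DAG in Graphviz DOT format: one node per step, one edge per dependency.
    * Assumption: step names contain no double quotes (labels are not escaped here). */
  def toDot(steps: Seq[Step]): String = {
    val nodes = steps.map(s => s"""  s${s.id} [label="${s.name}"];""")
    val edges = for (s <- steps; d <- s.dependsOn) yield s"  s$d -> s${s.id};"
    (Seq("digraph jobs {") ++ nodes ++ edges ++ Seq("}")).mkString("\n")
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Made-up two-step job, purely for illustration.
    val steps = Seq(
      Step(1, "load", Seq(("/data/in", 1024L)), mappers = 4, reducers = 0),
      Step(2, "group", Seq.empty, mappers = 4, reducers = 2, dependsOn = Seq(1))
    )
    steps.zipWithIndex.foreach { case (s, i) =>
      println(JobReport.summary(s, i + 1, steps.size))
    }
    println(JobReport.toDot(steps))
  }
}
```

The DOT text could be saved to a file and rendered with something like `dot -Tpng jobs.dot -o jobs.png`. Skew warnings, memory and I/O statistics, and the mapping from job ID to lines of code are left out of this sketch, since they would need data from the running jobs.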