Task Progress

Augusto Souza

Nov 4, 2015, 7:46:09 PM

to disc...@googlegroups.com

Hello,

I have been working with Disco and trying to profile some jobs on it. My problem is that I need to see how the tasks that make up a job progress while the job is running. Something like:

=> N-th printout:

JobA:

Timestamp1 - Map1 - 80%

Timestamp2 - Map2 - 70%

Timestamp3 - Map3 - 80%

=> (N+1)-th printout:

JobA:

Timestamp1' - Map1 - 85%

Timestamp2' - Map2 - 75%

Timestamp3' - Map3 - 85%

Is it possible to write a Python MapReduce job that prints this kind of per-task progress?

Thanks in advance!

Best regards,

Augusto Souza

Erik Dubbelboer

Nov 5, 2015, 2:34:23 AM

to Disco-development

It's a bit ugly, but for this you could use the same JSON endpoint that the web interface uses.

For example:
http://example.com:8988/disco/ctrl/jobinfo?name=1c4e6ec5f23d9d2239f5a06e6bc10779@5a6:ace30:3319d

The "1c4e6ec5f23d9d2239f5a06e6bc10779@5a6:ace30:3319d" part is the .name property of the Job object after you call .run() on it.

The JSON contains a pipeline object with information about each stage and how many workers are Pending, Waiting, Running, Done and Failed, just like in the interface.
In theory you can calculate the total % from these counts.
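
A minimal sketch of that calculation, in Python 3. The URL layout comes from the example above; the shape of the per-stage counts (a dict keyed by lowercase state name) is an assumption made for illustration, so adapt `percent_done` to whatever the pipeline object looks like on your Disco version. The helper names `jobinfo_url`, `fetch_jobinfo` and `percent_done` are hypothetical, not part of Disco.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def jobinfo_url(master, job_name):
    # urlencode percent-escapes the '@' and ':' characters in the job name.
    return "{}/disco/ctrl/jobinfo?{}".format(master, urlencode({"name": job_name}))


def fetch_jobinfo(master, job_name):
    # Returns the decoded JSON document served by the master.
    with urlopen(jobinfo_url(master, job_name)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def percent_done(counts):
    # ASSUMPTION: counts maps a state name to a number of tasks for one
    # stage, e.g. {"pending": 1, "waiting": 1, "running": 2, "done": 5,
    # "failed": 1}. The real JSON layout may differ.
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * counts.get("done", 0) / total
```

Here failed tasks count toward the total, so a stage with failures never reaches 100%; drop them from the sum if you would rather measure progress over the surviving tasks only.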

If you also want the per-worker % and you actually know this percentage inside your worker (that is, you know how many lines your map function has processed and how many it still needs to process, which can be known using a custom map_reader), you can just print this information and use the jobevents endpoint to fetch the log for the specific job:

http://example.com:8988/disco/ctrl/jobevents?name=1c4e6ec5f23d9d2239f5a06e6bc10779%405a6%3Aace30%3A3319d&num=100&filter=

This will return a JSON feed with log messages (including things you print) that you can then parse to get a more specific %.
This endpoint already contains lines like "MSG: [map:2] 1000000 entries mapped" produced by disco itself.
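
As a rough sketch of the parsing step, the snippet below pulls the task id and entry count out of messages shaped like the "entries mapped" line quoted above. It assumes you have already extracted the message strings from the JSON feed (the exact layout of each event is not shown here and may vary), and the names `parse_mapped` and `latest_counts` are hypothetical helpers.

```python
import re

# Matches lines such as "MSG: [map:2] 1000000 entries mapped".
MAPPED_RE = re.compile(r"\[(\w+):(\d+)\]\s+(\d+) entries mapped")


def parse_mapped(message):
    # Returns (stage, task_id, entries) or None if the line does not match.
    match = MAPPED_RE.search(message)
    if match is None:
        return None
    stage, task_id, entries = match.groups()
    return stage, int(task_id), int(entries)


def latest_counts(messages):
    # Keeps the last count seen per (stage, task_id).
    # ASSUMPTION: messages are ordered oldest first; reverse the list
    # first if your feed is newest first.
    counts = {}
    for message in messages:
        parsed = parse_mapped(message)
        if parsed is not None:
            stage, task_id, entries = parsed
            counts[(stage, task_id)] = entries
    return counts
```

A progress line that you print yourself from the worker can be matched the same way with its own regular expression; dividing the count by the known total for that task gives the per-task %.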

Hope that helps,
Erik

Augusto Souza

Nov 5, 2015, 5:55:33 PM

to disc...@googlegroups.com

Hello Erik,

I think this is a great way to monitor task progress, thank you! I started working with your solution and it seems to work.

Best regards!

Augusto Souza
