Running a very large number of tasks with Ruffus over a long period of time

Bernard James Pope

Sep 13, 2016, 8:35:51 PM
to ruffus_discuss
Hi everyone,

I've got a very large computation to do in the next few months (a few million CPU hours in total).

It is broken up into tens of thousands of tasks, and each task probably takes a couple of hundred CPU hours.

I was thinking of using Ruffus to orchestrate the computation, but I've only ever run Ruffus pipelines with hundreds of parallel tasks (not tens of thousands of parallel tasks).

So I was wondering if there was anything to watch out for?

One thing I can think of:

- I plan to use multithread instead of multiprocess, because I do not want to create thousands of processes at once. Will multithread be able to handle thousands of threads simultaneously?

Cheers,
Bernie

Leo Goodstadt 顧維斌

Sep 15, 2016, 6:07:03 PM
to ruffus_...@googlegroups.com
Do not use multiprocess! Definitely. I have certainly used a thousand threads.
If you don't get any other recommendations from the community, why don't you write some dummy tasks?
I foresee that completion / startup storms might be one source of problems. This is where a huge number of tasks all completed at the same time and overloaded the cluster master.
They were a problem for multiprocessing, and went away with multithreading. I think the Python GIL turns out to be a friend in disguise, because it serialises everything.
Leo
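
A minimal sketch of the dummy-task idea suggested above, using only the Python standard library rather than Ruffus itself. Each dummy task just sleeps, standing in for a thread that waits on a remote cluster job. The names dummy_task and run_dummy_tasks are hypothetical, and the sketch only checks that the machine can hold that many mostly idle threads; it does not reproduce Ruffus scheduling or cluster submission.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def dummy_task(task_id, duration, counter, lock):
    """Hypothetical stand-in for a thread waiting on a long-running cluster job."""
    time.sleep(duration)  # sleeping releases the GIL, much like waiting on a remote job
    with lock:
        counter["done"] += 1  # record completion under a lock
    return task_id


def run_dummy_tasks(n_tasks, n_threads, duration):
    """Run n_tasks dummy tasks on at most n_threads threads.

    Returns (number of completed tasks, elapsed seconds).
    """
    counter = {"done": 0}
    lock = threading.Lock()
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [
            pool.submit(dummy_task, i, duration, counter, lock)
            for i in range(n_tasks)
        ]
        # result() re-raises any exception from a worker, e.g. a thread-limit error
        results = [f.result() for f in futures]
    elapsed = time.perf_counter() - start
    assert results == list(range(n_tasks))
    return counter["done"], elapsed
```

Calling run_dummy_tasks(2000, 2000, 5.0), for example, would try to keep up to 2000 threads alive at once; whether that succeeds depends on the operating system's thread and memory limits. Because every task uses the same duration, the tasks also finish at nearly the same moment, which gives a rough local analogue of a completion storm.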


--
You received this message because you are subscribed to the Google Groups "ruffus_discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ruffus_discuss+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Bernard James Pope

Sep 28, 2016, 9:21:59 PM
to ruffus_...@googlegroups.com
Hi Leo,

Thanks for your reply. Writing some dummy tasks is a good idea.

After much consideration, I've decided to implement a bespoke job-submission system for this particular project, mainly because I'm quite interested in producing lots of job monitoring statistics.

Cheers,
Bernie