SGEGraph job dependencies

Dave W

Jul 21, 2014, 11:59:57 AM
to nipy...@googlegroups.com
Hi all,

I'm running a workflow using the SGEGraph plugin, and my cluster admin has noticed that qmaster intermittently stops responding to requests, which can be caused by a large number of job dependencies.

The memory and processing needed by the qmaster scales exponentially with the number of dependencies, so I was wondering if there is a way to set the job limit from Nipype?

Cheers,
Dave

Satrajit Ghosh

Jul 21, 2014, 1:09:10 PM
to nipy-user
hi dave,

one thing to consider is to submit this as a job array. i've never done that in the past, but it is feasible, since we know the number of jobs ahead of time.
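
A rough sketch of the job-array idea, for illustration only: instead of one qsub call per session, a single array job covers all of them. The helper name `build_array_qsub` and the script name `run_session.sh` are hypothetical. `-t` is the Grid Engine array-job option; `-tc` (a cap on concurrently running tasks) is assumed to be supported by the installed Grid Engine version. Nothing in this thread says the SGEGraph plugin does this.

```python
def build_array_qsub(script, n_tasks, max_concurrent=None):
    """Return the qsub argument list for submitting `script` as one array job.

    Hypothetical helper for illustration; it only builds the command and
    does not submit anything.
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be >= 1")
    # -t 1-N asks the scheduler for N tasks under a single job entry.
    cmd = ["qsub", "-t", f"1-{n_tasks}"]
    if max_concurrent is not None:
        # Assumption: this Grid Engine version supports -tc, which limits
        # how many tasks of the array may run at the same time.
        cmd += ["-tc", str(max_concurrent)]
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    # run_session.sh is a placeholder script name.
    print(" ".join(build_array_qsub("run_session.sh", 1000, max_concurrent=50)))
```

Inside the script, each task can read its index from the SGE_TASK_ID environment variable to decide which session to process.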

the other question (knowing a little about your scenario) is whether you are submitting this per participant or across all participants.

cheers,

satra


Hans Johnson

Jul 21, 2014, 1:21:48 PM
to nipy...@googlegroups.com
Satra,

We were submitting per scan session (i.e. 1 time point for a given subject).  I think there are two things we can do manually:

1) make the workflow more shallow for a first pass (i.e. only run 1/2 of the workflow, then run the whole workflow).  This will still run 1 job for each node that was previously finished, but they should run relatively quickly.

2) Only run a subset of sessions at a time.  Unfortunately this requires baby-sitting the cluster and submitting jobs slowly.
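
Option 2 can be sketched as follows. This is a minimal illustration that only splits the session list into fixed-size batches; the session names and the batch size are made up, and the submit-then-wait step for each batch is left out because it depends on the local cluster setup.

```python
def batches(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    # Hypothetical session identifiers, for illustration only.
    sessions = [f"session_{i:03d}" for i in range(10)]
    for batch in batches(sessions, 4):
        # In practice: submit this batch, then wait for it to finish
        # before submitting the next one.
        print(batch)
```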

Hans

Satrajit Ghosh

Jul 21, 2014, 4:52:34 PM
to nipy-user
hi hans,

from what i remember your workflow is not that complex for a single subject. both chris and i have run quite complex workflows as graphs. perhaps it's something we can tune with SGE options?

cheers,

satra

Johnson, Hans J

Jul 22, 2014, 4:53:38 PM
to nipy...@googlegroups.com, Johnson, Glenn P, Kim, Eun Young, Welch, David M (UI Health Care)
Nipype Users,

We found a serious bug with SGE graph-based job submission.  There is a pull request at https://github.com/nipy/nipype/pull/881

The problem is that job dependencies were being created using the job names at the time of submission.  This is OK if only 1 subject is run at a time.  However, it is common for many subjects to be submitted sequentially (e.g. from a bash script), and when the second subject was submitted, it would automatically generate job dependencies based on all matching job names currently running in SGE.  While SGE was creating its internal dependencies, the second subject would match the jobs for both the first and second subjects, creating extra unnecessary dependencies.  The third subject was even worse, and by the time 1000 subjects were submitted, the problem was unsustainable and would tax the SGE server into complete uselessness.

This also created unnecessary dependencies between subjects, and would prevent some of the nodes from running even if they were ready to complete.
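
The growth described above can be illustrated with a toy model. This is not nipype's code and not the implementation in the pull request. It assumes a two-node workflow (node_A then node_B) submitted once per subject, and that no job finishes while submissions are still going on. The function name `count_dependencies` and the node names are hypothetical.

```python
from itertools import count


def count_dependencies(n_subjects, hold_by_name):
    """Count the dependencies a scheduler records in a toy submission model.

    Each subject submits node_A, then node_B, which must wait for node_A.
    If hold_by_name is True, node_B waits on every queued job named
    "node_A" (including other subjects' jobs); otherwise it waits only on
    the job ID of its own node_A.
    """
    job_ids = count(1)
    queue = []  # (job_id, job_name) for every job the scheduler still knows
    total = 0
    for _ in range(n_subjects):
        a_id = next(job_ids)
        queue.append((a_id, "node_A"))
        if hold_by_name:
            # Name matching picks up node_A from all earlier subjects too.
            deps = [jid for jid, name in queue if name == "node_A"]
        else:
            # A unique job ID matches exactly one job.
            deps = [a_id]
        total += len(deps)
        queue.append((next(job_ids), "node_B"))
    return total


if __name__ == "__main__":
    print(count_dependencies(1000, hold_by_name=True))
    print(count_dependencies(1000, hold_by_name=False))
```

In this model, 1000 subjects produce 500500 dependencies when matching by name and 1000 when matching by job ID, so the name-based count grows quadratically with the number of subjects, and each subject ends up waiting on jobs that belong to other subjects.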

Please consider reviewing and accepting this patch set ASAP.

Regards,
Hans J. Johnson
