All subjobs going to one node of a PBS cluster


Emilio Gallicchio

Jan 24, 2014, 1:07:07 PM
to bigjob...@googlegroups.com
Hi

I set up a simple PBS-based cluster and tried to run a BigJob application (ASyncRE) on it. The problem is that all of the subjobs are dispatched to just one of the nodes, which becomes oversubscribed and overloaded.

I see that a similar problem has been discussed recently:

https://github.com/saga-project/BigJob/issues/107

but I don't understand what the fix was, if one was found. ASyncRE sets processes_per_node, and I have tried different combinations of settings; none of them seemed to have any effect.
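One way to narrow this down (a diagnostic sketch, not part of BigJob or ASyncRE) is to check whether PBS itself allocated slots on more than one node. Torque/PBS writes one hostname per allocated slot to the file named by `$PBS_NODEFILE`; the hypothetical helper `slots_per_host` below simply counts those lines per host. If only one host shows up when run inside the job, the issue likely lies in the PBS resource request rather than in how BigJob places subjobs.

```python
import os
from collections import Counter


def slots_per_host(nodefile_path):
    """Count how many slots PBS allocated on each host.

    Hypothetical helper: PBS lists one hostname per allocated slot in the
    file named by $PBS_NODEFILE, so repeated lines mean several slots on
    the same node. Blank lines are ignored.
    """
    with open(nodefile_path) as f:
        hosts = [line.strip() for line in f if line.strip()]
    return Counter(hosts)


if __name__ == "__main__":
    path = os.environ.get("PBS_NODEFILE")
    if path is None:
        print("PBS_NODEFILE is not set; run this inside a PBS job.")
    else:
        # Print each allocated host with its slot count, sorted by hostname.
        for host, n in sorted(slots_per_host(path).items()):
            print(f"{host}: {n} slot(s)")
```

Submitted as (or from within) the same PBS job that starts the BigJob agent, this shows the allocation the agent had to work with, which can then be compared against where the subjobs actually ran.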

I can share log files etc. if someone can point me in the right direction.

Thanks

Emilio


--
Dr. Emilio Gallicchio
Assistant Professor
Department of Chemistry
CUNY Brooklyn College
