Hi,
I set up a simple PBS-based cluster and tried to run a BigJob application (ASyncRE) on it. The problem is that all of the subjobs go to a single node, which becomes oversubscribed and overloaded.
I see that a similar problem has been discussed recently:
https://github.com/saga-project/BigJob/issues/107
but I don't understand the fix, if one was found. ASyncRE sets processes_per_node, and I have tried different values for it, but none seemed to have any effect.
I can share log files and other details if someone can point me in the right direction.
Thanks,
Emilio
--
Dr. Emilio Gallicchio
Assistant Professor
Department of Chemistry
CUNY Brooklyn College