Trouble with makeflow + slurm partition

Willem Marais
May 2, 2024, 7:12:43 AM
to Cooperative Computing Tools
Hi

I'm using makeflow to manage slurm jobs, and the jobs are submitted to a specific slurm partition via the command:

makeflow --jx makeflow.jx -T slurm --retry-count=0 --batch-options "--partition=aces"

For whatever reason, only one job runs at a time, whereas when I run the same makeflow.jx file on my laptop (via -T local), multiple jobs run in parallel.

Is there a way to diagnose why only one job at a time is executed on the slurm partition?

I am using makeflow version 7.10.1 FINAL. 

Willem

bto...@nd.edu
May 2, 2024, 7:13:01 AM
to Cooperative Computing Tools
Willem,
Do you see a line in makeflow's output that says "max running remote jobs:"?
You can also run with -dall to check whether the jobs are being created but not executed by slurm. If that is the case, it may be a configuration issue that can be fixed by adding the needed parameters to --batch-options.

The maximum number of remote jobs can be set with the -J option or the MAKEFLOW_MAX_REMOTE_JOBS environment variable. Although it is unlikely to be the cause, I'd check what echo $MAKEFLOW_MAX_REMOTE_JOBS prints, just to make sure it is not set.
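One caveat with a plain echo: it prints an empty line both when the variable is unset and when it is set to an empty string. A small POSIX sh sketch that tells those cases apart is below; the function name check_max_remote_jobs is just an illustration, not part of makeflow.

```shell
#!/bin/sh
# Report whether MAKEFLOW_MAX_REMOTE_JOBS is set in the current shell.
# ${VAR+x} expands to "x" when VAR is set (even to an empty string)
# and to nothing when VAR is unset, so the two cases can be distinguished.
check_max_remote_jobs() {
  if [ -n "${MAKEFLOW_MAX_REMOTE_JOBS+x}" ]; then
    echo "MAKEFLOW_MAX_REMOTE_JOBS is set to '${MAKEFLOW_MAX_REMOTE_JOBS}'"
  else
    echo "MAKEFLOW_MAX_REMOTE_JOBS is not set"
  fi
}

# Run the check in the shell you will launch makeflow from.
check_max_remote_jobs
```

If it reports a small value such as 1, running unset MAKEFLOW_MAX_REMOTE_JOBS in that shell before starting makeflow again would be the first thing to try.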

Ben