makeflow does not complete job with slurm


sprock

Apr 11, 2022, 7:57:04 AM
to Cooperative Computing Tools
Hello,

Running this:

URL="/home/rmason/Scratch/elkjobs"
WD="1x1x1_220411_075247"
CP=/bin/cp

$(URL)/$(WD)/runspecies.sh: $(WD)
    LOCAL $(CP) -r $(WD) $(URL)

$(URL)/$(WD)/runspecies.txt: $(URL)/$(WD)/runspecies.sh
    LOCAL cd $(URL)/$(WD) && ./runspecies.sh > runspecies.txt

$(URL)/$(WD)/dirs: $(URL)/$(WD)/runspecies.txt
    LOCAL cd $(URL)/$(WD) && find . -type d -depth 1 > dirs

$(URL)/$(WD)/slurm-$(WD).out: $(URL)/$(WD)/dirs
    cd $(URL)/$(WD) && ./job.sh && touch slurm-$(WD).out

$(URL)/$(WD).tgz: $(URL)/$(WD)/slurm-$(WD).out
    LOCAL cd $(URL) && tar czf $(WD).tgz

with this command line:

makeflow -T slurm runslurm_1x1x1_220411_075247.mfl

makeflow does not detect completion of the slurm job and does not run the last rule.  Indeed, makeflow has to be killed from the command line.

Version:
makeflow version 8.0.0 DEVELOPMENT (released 2022-03-17 12:19:10 -0230)
        Built by rmason on 2022-03-17 12:19:10 -0230
        System: FreeBSD pyrope 12.2-RELEASE FreeBSD 12.2-RELEASE r366954 GENERIC  amd64
        Configuration: --with-swig-path /usr/local/bin/swig --with-readline-path /usr/local --prefix=/home/rmason/.local/cctools

Any ideas?

Thanks,
Roger

Ben Tovar

Apr 11, 2022, 8:11:49 AM
to cctoo...@googlegroups.com
Roger,

Could you post the output of running:

makeflow -dall -T slurm runslurm_1x1x1_220411_075247.mfl


Ben

--
You received this message because you are subscribed to the Google Groups "Cooperative Computing Tools" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cctools-nd+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cctools-nd/dd0b9401-d85b-43d3-9474-6c58db2d8f38n%40googlegroups.com.

Roger Mason

Apr 11, 2022, 9:30:00 AM
to cctoo...@googlegroups.com
Hi Ben,

Ben Tovar <bto...@nd.edu> writes:

> Could you post the output of running:
>
> makeflow -dall -T slurm runslurm_1x1x1_220411_075247.mfl

Attached.

Roger

fail.log.gz
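
For anyone reproducing this, here is a minimal sketch of one way to capture debug output into a compressed log like the attachment above. The helper name capture_log is hypothetical, and the sketch assumes the debug messages go to stdout or stderr, so both streams are merged before compression.

```shell
# capture_log LOGFILE COMMAND [ARGS...]
# Run COMMAND, merge its stderr into stdout, and gzip the combined stream
# into LOGFILE. The function name is hypothetical, not part of cctools.
# Note: the pipeline's exit status is gzip's, not the command's.
capture_log() {
    log="$1"
    shift
    "$@" 2>&1 | gzip > "$log"
}

# Example invocation (commented out so the sketch runs without makeflow):
# capture_log fail.log.gz makeflow -dall -T slurm runslurm_1x1x1_220411_075247.mfl
```

The resulting file can be read back with `gzip -dc fail.log.gz | less`.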

Ben Tovar

Apr 11, 2022, 10:04:29 AM
to cctoo...@googlegroups.com
Roger,

How long did you wait before canceling the workflow? I'm not sure that the slurm job got a chance to run at all. The message 'batch: could not open status file "slurm.status.77"' may be printed while the job is queued in slurm, but not yet assigned to a compute node.

Could you re-run the workflow, and in some other terminal type:

squeue -u ${USER}

I want to see if the job was scheduled at all, etc.


Ben
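
The squeue check suggested above can also be run in a loop from the second terminal, so the job's state is visible for the whole run. This is a rough sketch only: the function name watch_queue and the 10-second default interval are assumptions, and it expects squeue to be on the PATH.

```shell
# watch_queue [INTERVAL_SECONDS]
# Print the current user's Slurm queue every INTERVAL seconds until it is empty.
# The function name and the default interval are hypothetical choices.
watch_queue() {
    interval="${1:-10}"
    # -h suppresses the header line, so empty output means no jobs remain
    while [ -n "$(squeue -h -u "${USER}")" ]; do
        squeue -u "${USER}"
        sleep "$interval"
    done
    echo "no jobs left in queue for ${USER}"
}
```

Calling `watch_queue 5` while makeflow is running would show whether the job ever reaches the R (running) state and when it leaves the queue.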

Roger Mason

Apr 11, 2022, 10:13:54 AM
to cctoo...@googlegroups.com

Ben Tovar <bto...@nd.edu> writes:

> How long did you wait before canceling the workflow?

Until I saw the load drop on the server as the slurm job completed.

> I'm not sure that the slurm job got a chance to run at all. The message 'batch: could not open status file "slurm.status.77"'
> may be printed while the job is queued in slurm, but not yet assigned to a compute node.
>
> Could you re-run the workflow, and in some other terminal type:
>
> squeue -u ${USER}
>
> I want to see if the job was scheduled at all, etc.

JOBID  PARTITION  NAME      USER    STATUS  TIME  NODES  NODELIST(REASON)
78     imac       makeflow  rmason  R       0:11  1      braidperthite

Thanks,
Roger

Roger Mason

Apr 20, 2022, 1:56:00 PM
to cctoo...@googlegroups.com
Hi Ben,

Ben Tovar <bto...@nd.edu> writes:

> On Mon, Apr 11, 2022 at 10:13 AM Roger Mason <rma...@mun.ca> wrote:
>
> > Ben Tovar <bto...@nd.edu> writes:
> >
> > > How long did you wait before canceling the workflow?
> >
> > Until I saw the load drop on the server as the slurm job completed.
>
> Could you send me the -dall output of such a run?
>
> Thanks!
>
> Ben

I sent this last week but I think it must have gone astray.

Best wishes,
Roger

fail_79.log.gz

Ben Tovar

Apr 20, 2022, 1:59:00 PM
to cctoo...@googlegroups.com
Got it, thanks Roger.

Ben
