Hi all,
I am relatively new to Nextflow and very new to running jobs on a cluster. My Nextflow script sits idle for hours at a time on the cluster, doing nothing. Also, Nextflow submits jobs to nodes other than the one I reserved, so I cannot request much before I hit resource limits, which really dampens the parallelization.
A bit more detail: my job does heavy I/O with light processing on a large number of files, each of which takes roughly 1-3 minutes. I removed all directives from my processes to simplify things.
IT is not thrilled with my script as is, since it bogs down the scheduler. I think this should be addressable. Due to these scheduling issues, my script actually runs faster on my laptop than on the HPC: about 18 hours on my laptop versus 2 days on the cluster.
Where can I find more information on controlling how Nextflow submits jobs to SLURM?
Thanks in advance,
Matt
Here is my SLURM submission script:
############################################################
#!/bin/bash
#SBATCH -n 7
#SBATCH -N 1
#SBATCH --partition=general
#SBATCH --mem=8GB
#SBATCH -t 01-00:00:00
#SBATCH --mail-type=BEGIN,REQUEUE,FAIL,END
#SBATCH --mail-user= foo bar
work_dir=/pine/scr/m/j/mjrich
image_set=$1
$work_dir/nextflow -c $work_dir/nextflow.config run $work_dir/image_processing.nf --folder $image_set
###########################################################
Here is my Nextflow config:
############################################################
process.container = './longleaf.sif'
singularity.enabled = true
singularity.autoMounts = true
executor {
    name = 'slurm'
    queueSize = 25
    pollInterval = '10 sec'
    dumpInterval = '10 min'
    exitReadTimeout = '5 min'
    killBatchSize = 50
}
#############################################################
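In case it helps, here is a simplified sketch of what one of my processes looks like after stripping the directives (the process name, file pattern, and script are placeholders, not my actual code):

```nextflow
// Hypothetical example process: light per-file work, no directives.
// Each task is one SLURM job under the 'slurm' executor above.
process processImage {
    input:
    path image_file

    output:
    path "${image_file.baseName}.out"

    script:
    """
    # placeholder for the real 1-3 minute I/O-heavy step
    process_one_image.sh ${image_file} > ${image_file.baseName}.out
    """
}
```

Since each task is so short, I suspect the per-job scheduling overhead is what bogs things down, but I am not sure what the recommended fix is.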