Juicer on SLURM failing early steps without error


Josh Wheaton

Apr 15, 2019, 11:34:21 AM
to 3D Genomics
Hello Juicer team,

I am trying to run Juicer on our SLURM cluster using the provided scripts. I am currently using the MiSeq test sets for validation, and have been able to get the "CPU" version to work without issue. However, I am having some confusing problems with the cluster version. 

Specifically, the head and alignment jobs do not seem to run properly, even though they are queued and run, and SLURM reports a 'COMPLETED' job status. I have checked the associated .out and .err files in the debug directory and these files are blank. The split job does complete successfully, but that's about it as far as I can tell.

Here is my submission command, which seems pretty straightforward:

~/juicer/scripts/juicer.sh -D ~/juicer -t 4 \
    -g hg19 \
    -s MboI \
    -d $PWD


Here is the output, no errors yet:


```
(-: Looking for fastq files...fastq files exist
(-: Aligning files matching /hpchome/ciofanilab/jdw54/projects/juicer_test/fastq/*_R*.fastq*
in queue scavenger to genome hg19 with site file /dscrhome/jdw54/juicer/restriction_sites/hg19_MboI.txt
(-: Created /hpchome/ciofanilab/jdw54/projects/juicer_test/splits and /hpchome/ciofanilab/jdw54/projects/juicer_test/aligned.
(-: Starting job to launch other jobs once splitting is complete
(-: Finished adding all jobs... Now is a good time to get that cup of coffee..
```

But now, jobs are instantly executed and fail, while sacct shows them as having been completed:

       JobID                        JobName      State
------------ ------------------------------ ----------
52642742                    a1555341576_cmd  COMPLETED
52642742.ba+                          batch  COMPLETED
52642743     a1555341576HIC003_S2_L001_001+  COMPLETED
52642743.ba+                          batch  COMPLETED
52642744     a1555341576_align1_HIC003_S2_+  COMPLETED
52642744.ba+                          batch  COMPLETED
52642745     a1555341576_align2_HIC003_S2_+  COMPLETED
52642745.ba+                          batch  COMPLETED
52642746     a1555341576_merge_HIC003_S2_L+  COMPLETED
52642746.ba+                          batch  COMPLETED
52642747                  a1555341576_check  COMPLETED
52642747.ba+                          batch  COMPLETED
52642748              a1555341576_fragmerge     FAILED
52642748.ba+                          batch     FAILED
52642749            a1555341576_dedup_guard    PENDING
52642750                  a1555341576_dedup    PENDING
52642751             a1555341576_post_dedup    PENDING
52642752               a1555341576_dupcheck    PENDING
52642753                  a1555341576_stats    PENDING
52642754                    a1555341576_hic    PENDING
52642755                  a1555341576_hic30    PENDING
52642756         a1555341576_arrowhead_wrap    PENDING
52642757              a1555341576_prep_done    PENDING


Here are the contents of the debug dir:

a1555341576_alignfail  align2-52642745.out          count_ligation-52642743.out  head-52642742.out
align1-52642744.err    aligncheck-52642747.err      fragmerge-52642748.err       merge-52642746.err
align1-52642744.out    aligncheck-52642747.out      fragmerge-52642748.out       merge-52642746.out
align2-52642745.err    count_ligation-52642743.err  head-52642742.err


If I cat the results of all files in the debug directory, here is the result (output is after each filename):

a1555341576_alignfail
align1-52642744.err
align1-52642744.out
align2-52642745.err
align2-52642745.out
aligncheck-52642747.err
aligncheck-52642747.out
    Mon Apr 15 11:20:13 EDT 2019
    Checking /hpchome/ciofanilab/jdw54/projects/juicer_test/HIC_tmp/HIC003_S2_L001_001.fastq.gz3
    ***! Error in job a1555341576_merge_HIC003_S2_L001_001.fastq.gz Type squeue -j 52642746 to see what happened
    Mon Apr 15 11:20:13 EDT 2019
count_ligation-52642743.err
count_ligation-52642743.out
fragmerge-52642748.err
fragmerge-52642748.out
    Mon Apr 15 11:20:31 EDT 2019
    ***! Found errorfile. Exiting.
head-52642742.err
head-52642742.out
merge-52642746.err
merge-52642746.out


At this point, my squeue looks like this, which is due to a failure of the merge step because there was no alignment:

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
         52642750 scavenger a1555341    jdw54 PD       0:00      1 (DependencyNeverSatisfied)
         52642751 scavenger a1555341    jdw54 PD       0:00      1 (Dependency)
         52642752 scavenger a1555341    jdw54 PD       0:00      1 (Dependency)
         52642753    common a1555341    jdw54 PD       0:00      1 (Dependency)
         52642754    common a1555341    jdw54 PD       0:00      1 (Dependency)
         52642755    common a1555341    jdw54 PD       0:00      1 (Dependency)
         52642756 scavenger a1555341    jdw54 PD       0:00      1 (Dependency)
         52642757 scavenger a1555341    jdw54 PD       0:00      1 (Dependency)
         52642749 scavenger a1555341    jdw54 PD       0:00      1 (JobHeldUser)



Why am I not seeing any error messages from the alignment jobs, or even the output of the various "echo" calls within those jobs? The SLURM .out and .err files are clearly generated, but they are completely blank.

I am really puzzled by this, and cannot come up with a reason why this would occur. I have tested my 'bwa_load' statement in a separate sbatch script and it is definitely functional, so that is not the issue.

Any ideas or answers on this would be greatly appreciated!

All the best,

Josh
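
The manual `cat` of the debug directory above can be condensed into a quick check of which log files are empty. The following is only a sketch: the function name `report_logs` is made up for illustration, and the default directory name `debug` is assumed from the layout described in this post.

```shell
#!/bin/bash
# Sketch: report which log files in a Juicer debug directory are empty.
# report_logs is a hypothetical helper, not part of Juicer.
report_logs() {
    for log in "$1"/*.out "$1"/*.err; do
        [ -e "$log" ] || continue          # skip unmatched glob patterns
        if [ -s "$log" ]; then             # -s is true for a non-empty file
            echo "non-empty: ${log##*/}"
        else
            echo "EMPTY:     ${log##*/}"
        fi
    done
}

# Assumed default: a directory called "debug" under the current directory.
debug_dir=${1:-debug}
if [ -d "$debug_dir" ]; then
    report_logs "$debug_dir"
fi
```

A job that SLURM reports as COMPLETED but whose .out file is empty, even though the script starts with `date` or `echo`, suggests that the script body never ran.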



Neva Durand

Apr 15, 2019, 12:31:50 PM
to Josh Wheaton, 3D Genomics
It’s hard to know what’s wrong, but the fact that you don’t get anything in the header file is very strange and indicates to me that there’s something wrong with the setup (i.e., perhaps there are flags you need for your system that aren’t getting sent in). Did you try running qacct -j on some of those jobs?

--
You received this message because you are subscribed to the Google Groups "3D Genomics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to 3d-genomics...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/3d-genomics/fe4bf8b8-00c7-4713-b6b9-104e651743bd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Neva Cherniavsky Durand, Ph.D.
Staff Scientist, Aiden Lab

Josh Wheaton

Apr 15, 2019, 12:53:13 PM
to 3D Genomics
Hi Neva,

Thanks for your reply. Not sure what you mean by qacct (I think that's from SGE) but sacct doesn't really show any useful debugging info, only that the alignment jobs have an elapsed time of 0. That said, the SBATCH options are being properly set, it just seems like the body of the script is not being executed.

I have been iteratively changing options - focusing on the HEADER script for simplicity - and have now found that removing the "-l" option from the hashbang seems to correct the problem! Why this is the case I cannot say, but I think you must be right that it is some system-specific setting. I personally never use that flag when running batch scripts which is why I decided to try removing it. I have confirmed this works for the header and now also for the alignment scripts. So, I'd say my issue is technically resolved although I would love to know why this is happening so that I can avoid future issues - do you know why it is included to begin with? I understand that it forces a login shell but the logic behind doing so I am less clear on.

Thanks again,

Josh
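
The effect of the `-l` flag described above can be isolated with two probe jobs that differ only in their first line. This is a sketch under assumptions, not a diagnosis: `PROBE_ROOT` and `make_probe` are hypothetical names, and the `sbatch` step is skipped when SLURM is not installed. With `-l`, bash starts as a login shell and reads /etc/profile and then the first of ~/.bash_profile, ~/.bash_login, or ~/.profile that it finds, so one possible explanation is that something in those files ends the shell before the job body runs.

```shell
#!/bin/bash
# Sketch: build two probe job scripts that differ only in their shebang line.
# PROBE_ROOT is a hypothetical variable; on a cluster, point it at a directory
# that the compute nodes can also see.
probe_dir=$(mktemp -d "${PROBE_ROOT:-${TMPDIR:-/tmp}}/shebang_probe.XXXXXX")

make_probe() {
    # $1 = shebang line, $2 = path of the script to write
    printf '%s\n' "$1" 'echo "probe body ran"' 'date' > "$2"
}

make_probe '#!/bin/bash'    "$probe_dir/probe_plain.sh"
make_probe '#!/bin/bash -l' "$probe_dir/probe_login.sh"

# Local sanity check of the script body ("bash file" ignores the shebang).
plain_output=$(bash "$probe_dir/probe_plain.sh")
echo "$plain_output"

# If SLURM is available, submit both probes and compare the .out files later.
if command -v sbatch >/dev/null 2>&1; then
    for probe in plain login; do
        sbatch -o "$probe_dir/$probe-%j.out" -e "$probe_dir/$probe-%j.err" \
            "$probe_dir/probe_$probe.sh"
    done
fi
echo "probe scripts and logs: $probe_dir"
```

If only the `login` probe produces an empty .out file, the login startup files on the compute nodes are the place to look.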

Neva Durand

Apr 26, 2019, 11:14:40 AM
to Josh Wheaton, 3D Genomics
This is a bit of a relic and I'm not 100% sure why we do it. Something to examine.



thapap...@gmail.com

Apr 14, 2021, 7:33:45 PM
to 3D Genomics
Hi, I am getting a similar issue at the end.

My command: bash /home/HiC/Juicer/scripts/juicer.sh -g PB440.asm.p_ctg -s none -y /home/HiC/Juicer/restriction_sites/PB440.asm.p_ctg.fasta_Arima.txt -p /home/HiC/Juicer/restriction_sites/PB440.asm.p_ctg.fasta_Arima.chrom.sizes -z /home/HiC/Juicer/reference/PB440.asm.p_ctg.fasta -D /home/HiC/Juicer -t 4 -q high -l high

Everything runs OK, but at the end the queue looks like this and does not change:

32954501      high a1618423  PD        0:00      1 1   4G     (DependencyNeverSatisfied)
32954502      high a1618423  PD        0:00      1 1   8G     (DependencyNeverSatisfied)
32954503      high a1618423  PD        0:00      1 1   2G     (Dependency)

I then checked juicer.sh and found that these are the last three jobs (lines 1200-1275):


    if [[  "$isNots" -eq 1 ]] 
    then
sbatch_req="#SBATCH --gres=gpu:kepler:1"
    fi
    jid=`sbatch <<- HICCUPS | egrep -o -e "\b[0-9]+$"
#!/bin/bash -l
#SBATCH -p $queue
#SBATCH --mem-per-cpu=4G
${sbatch_req}
#SBATCH -o $debugdir/hiccups_wrap-%j.out
#SBATCH -e $debugdir/hiccups_wrap-%j.err
#SBATCH -t $queue_time
#SBATCH --ntasks=1
#SBATCH -J "${groupname}_hiccups_wrap"
${sbatch_wait}
        $userstring

#${load_gpu} #Ben
#echo "load: $load_gpu" #Ben
${load_java}
date
nvcc -V
        if [ -f "${errorfile}" ]
        then 
            echo "***! Found errorfile. Exiting." 
            exit 1 
        fi 
${juiceDir}/scripts/juicer_hiccups.sh -j ${juiceDir}/scripts/juicer_tools -i $outputdir/inter_30.hic -m ${juiceDir}/references/motif -g $genomeID
date
HICCUPS`
    dependhiccups="afterok:$jid"
else
    dependhiccups="afterok"
fi

jid=`sbatch <<- ARROWS | egrep -o -e "\b[0-9]+$"
#!/bin/bash -l
#SBATCH -p $queue
#SBATCH --mem-per-cpu=8G #Ben 8G
#SBATCH -o $debugdir/arrowhead_wrap-%j.out
#SBATCH -e $debugdir/arrowhead_wrap-%j.err
#SBATCH -t $queue_time
#SBATCH --ntasks=1
#SBATCH -J "${groupname}_arrowhead_wrap"
${sbatch_wait}
        $userstring

${load_java}
date
        if [ -f "${errorfile}" ]
        then 
            echo "***! Found errorfile. Exiting." 
            exit 1 
        fi 
${juiceDir}/scripts/juicer_arrowhead.sh -j ${juiceDir}/scripts/juicer_tools -i $outputdir/inter_30.hic
date;
ARROWS`
dependarrows="${dependhiccups}:$jid"

jid=`sbatch <<- FINCLN1 | egrep -o -e "\b[0-9]+$"
#!/bin/bash -l
#SBATCH -p $queue
#SBATCH --mem-per-cpu=2G
#SBATCH -o $debugdir/fincln-%j.out
#SBATCH -e $debugdir/fincln-%j.err
#SBATCH -t 1200
#SBATCH -c 1
#SBATCH --ntasks=1
#SBATCH -J "${groupname}_prep_done"
#SBATCH -d $dependarrows
        $userstring

date
export splitdir=${splitdir}; export outputdir=${outputdir}; ${juiceDir}/scripts/check.sh
date
FINCLN1`


Any suggestion would be great.


Thanks

Neva Durand

Apr 14, 2021, 7:53:42 PM
to thapap...@gmail.com, 3D Genomics
What does your aligned folder look like?



--
Neva Cherniavsky Durand, Ph.D. | she, her, hers
Assistant Professor |  Molecular and Human Genetics
Aiden Lab | Baylor College of Medicine

Neva Durand

Apr 14, 2021, 7:54:08 PM
to thapap...@gmail.com, 3D Genomics
And what does your debug/finclean say?

thapap...@gmail.com

Apr 16, 2021, 12:43:12 PM
to 3D Genomics
Hi Neva,

My aligned folder: See picture 1
There is no finclean file in debug.

My hic360033680.hic error: 
/home/Juicer/scripts/juicer_tools: line 24:  7105 Killed                  java -Djava.awt.headless=true -Djava.library.path=`dirname $0`/lib64 -Ddevelopment=false -jar `dirname $0`/juicer_tools.jar $*
slurmstepd: error: Detected 1 oom-kill event(s) in step 33003680.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

Thanks
Picture1.jpg
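
The out-of-memory message quoted above can be searched for across all job logs at once. A minimal sketch, assuming the logs live in a directory called `debug` and that the kernel message contains the string `oom-kill` as in the excerpt; `find_oom_logs` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: list SLURM error logs that mention an out-of-memory kill.
# find_oom_logs is a hypothetical helper, not part of Juicer or SLURM.
find_oom_logs() {
    # -l prints only the names of files that contain a match.
    grep -l 'oom-kill' "$1"/*.err 2>/dev/null
}

# Assumed default: a directory called "debug" under the current directory.
debug_dir=${1:-debug}
if [ -d "$debug_dir" ]; then
    find_oom_logs "$debug_dir" || echo "no OOM messages found in $debug_dir"
fi
```

For a job ID found this way, `sacct -j <jobid> --format=JobID,State,ReqMem,MaxRSS` shows the requested memory next to the peak usage that SLURM recorded.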

Neva Durand

Apr 16, 2021, 2:20:13 PM
to thapap...@gmail.com, 3D Genomics
Hello,

Your hic file creation is failing because it runs out of memory. The other jobs in the queue are HiCCUPS and Arrowhead, which won't run because you don't have a hic file.

By any chance are you running Juicer in order to run 3D-DNA? 
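
Jobs left behind in this state can be picked out by filtering the reason column of the queue listing. The sketch below assumes the job ID is the first column and the reason is the last one, as in the listings in this thread; `stuck_jobs` is a hypothetical helper, and the sample lines are copied from the earlier post.

```shell
#!/bin/bash
# Sketch: print the IDs of jobs whose dependency can never be satisfied.
# Reads squeue-style text on stdin; assumes job ID first, reason last.
stuck_jobs() {
    awk '$NF == "(DependencyNeverSatisfied)" { print $1 }'
}

# Sample lines copied from the squeue output posted earlier in the thread.
sample='32954501      high a1618423  PD        0:00      1 1   4G     (DependencyNeverSatisfied)
32954502      high a1618423  PD        0:00      1 1   8G     (DependencyNeverSatisfied)
32954503      high a1618423  PD        0:00      1 1   2G     (Dependency)'

printf '%s\n' "$sample" | stuck_jobs
```

On a cluster the same filter can read live data with `squeue -u "$USER" | stuck_jobs`. Jobs that merely wait on a stuck job show `(Dependency)` and are not listed, so review the queue before passing any IDs to `scancel`.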

thapap...@gmail.com

Apr 16, 2021, 3:21:05 PM
to 3D Genomics
Hi Neva, 

I have increased the memory and the job is running now. Yes, I am running Juicer in order to run 3D-DNA (run-asm-pipeline.sh) on a heterozygous genome.

Thank you so much for the quick response.

Best

Neva Durand

Apr 16, 2021, 3:44:45 PM
to thapap...@gmail.com, 3D Genomics
You do not need the hic file for the next steps of 3D-DNA and indeed hic file creation will fail with hundreds of small contigs.

Generally run Juicer with the early exit flag (-e) when running 3D-DNA and then proceed with the merged_nodups file.
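
As a rough illustration of that advice, the earlier command can be repeated with `-e` added and followed by the 3D-DNA pipeline. This is a dry-run sketch that only prints the commands. The `run` helper is hypothetical, the paths are copied from the command posted earlier in the thread, and both the location `aligned/merged_nodups.txt` and the argument order for `run-asm-pipeline.sh` (draft FASTA first, then the merged_nodups file) are assumptions to check against each tool's own help output.

```shell
#!/bin/bash
# Dry-run helper (hypothetical): prints a command instead of executing it.
# Replace the function body with "$@" to actually run the commands.
run() { printf '+ %s\n' "$*"; }

# Paths taken from the command posted earlier in the thread.
juice_dir=/home/HiC/Juicer
genome=PB440.asm.p_ctg
sites=$juice_dir/restriction_sites/$genome.fasta_Arima

# Same Juicer invocation as before, plus -e for the early exit.
juicer_line=$(run bash "$juice_dir/scripts/juicer.sh" -e \
    -g "$genome" -s none \
    -y "$sites.txt" -p "$sites.chrom.sizes" \
    -z "$juice_dir/reference/$genome.fasta" \
    -D "$juice_dir" -t 4 -q high -l high)

# Assumed 3D-DNA call: draft FASTA first, then the merged_nodups file.
asm_line=$(run run-asm-pipeline.sh \
    "$juice_dir/reference/$genome.fasta" aligned/merged_nodups.txt)

echo "$juicer_line"
echo "$asm_line"
```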

Olga Dudchenko

Apr 23, 2021, 11:54:08 AM
to 3D Genomics
For more info on how to use our tools for assembly purposes please see the Genome Assembly Cookbook (dnazoo.org/methods). -Olga