juicer error on SLURM


Daofeng Li

May 17, 2017, 4:21:03 PM
to 3D Genomics
Hi group,

I am trying to run Juicer on our Hi-C data on a SLURM cluster, but I get the following error.
Could anyone please tell me how to fix this? Thanks a lot.

(-: Looking for fastq files...fastq files exist
(-: Aligning files matching ./work/nkcell//fastq/*_R*.fastq*
 in queue debug to genome mm10 with site file restriction_sites/mm10_MboI.txt
(-: Created ./work/nkcell//splits and ./work/nkcell//aligned.
 Splitting files
srun: job 5738765 queued and waiting for resources
srun: job 5738765 has been allocated resources
(-: Starting job to launch other jobs once splitting is complete
Submitted batch job 5738798
Submitted batch job 5738802
Submitted batch job 5738806
Submitted batch job 5738810
Submitted batch job 5738814
Submitted batch job 5738818
Submitted batch job 5738822
Submitted batch job 5738826
Submitted batch job 5738830
Submitted batch job 5738834
sbatch: error: Batch job submission failed: Invalid generic resource (gres) specification
sbatch: error: Batch job submission failed: Job dependency problem
(-: Finished adding all jobs... Now is a good time to get that cup of coffee..

Muhammad Saad Shamim

May 17, 2017, 11:05:16 PM
to Daofeng Li, 3D Genomics
Hey Daofeng,

Hope you are doing well!

That error relates to the GPU requirement for HiCCUPS.
The HiCCUPS SLURM job requires a GPU, which we set with:

#SBATCH --gres=gpu:kepler:1

This line will need to be customized for your cluster.
The rest of the jobs should run fine, though; it's just that HiCCUPS can't run without CUDA/NVIDIA GPUs.
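
A minimal sketch of how that directive could be adapted on a cluster that does have GPUs. The replacement value `gpu:1` is only an assumption; the valid string depends on how the site configured its generic resources (on many clusters `sinfo -o "%N %G"` lists them), so check with your admins. The helper name `rewrite_gres` is hypothetical and not part of Juicer.

```shell
#!/usr/bin/env bash
# Sketch: swap the hard-coded GPU request for one that matches the local cluster.
# "gpu:1" below is an assumed value, not a recommendation.

rewrite_gres() {
    local new_gres="$1"
    # Rewrite only the value of an existing "#SBATCH --gres=..." directive;
    # every other line read from stdin passes through unchanged.
    sed -E "s|^(#SBATCH[[:space:]]+--gres=).*|\1${new_gres}|"
}

# Demo on the directive quoted above. In real use you would pipe in the
# script text that submits the HiCCUPS job.
printf '%s\n' '#SBATCH --gres=gpu:kepler:1' | rewrite_gres 'gpu:1'
```

The demo prints `#SBATCH --gres=gpu:1`. On a cluster with no GPUs at all, no value will be accepted and the HiCCUPS job cannot be scheduled.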

Best,

--
You received this message because you are subscribed to the Google Groups "3D Genomics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to 3d-genomics+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/3d-genomics/161e4cf1-353e-4129-aa9f-9ea65667e748%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Daofeng Li

May 17, 2017, 11:18:27 PM
to Muhammad Saad Shamim, 3D Genomics
Hi Muhammad,

Thank you so much for the reply.
Yeah, I am doing well.

I also checked with our sysadmin, and we don't have any GPUs on our clusters.
Is it OK to just remove that line? Should the rest of the code still run fine?

Thanks again.

Daofeng

Muhammad Saad Shamim

May 17, 2017, 11:21:46 PM
to Daofeng Li, 3D Genomics
Yes, that should be fine. In that case, you may want to remove both the HiCCUPS job and its dependent job (MotifFinder, which runs on the loop calls from HiCCUPS).

Daofeng Li

May 17, 2017, 11:48:57 PM
to Muhammad Saad Shamim, 3D Genomics
Thanks Muhammad. I am trying to remove the HiCCUPS-related code.
Is Arrowhead one of the jobs that depend on HiCCUPS?

Daofeng

Muhammad Saad Shamim

May 18, 2017, 12:04:41 AM
to Daofeng Li, 3D Genomics
Arrowhead is independent of HiCCUPS and does not require a GPU.
So you should still be able to run it fine.

Daofeng Li

May 18, 2017, 12:28:29 AM
to Muhammad Saad Shamim, 3D Genomics
Thanks Muhammad. I submitted the job and will see if it works.

Best,

Daofeng

Muhammad Saad Shamim

May 18, 2017, 12:30:28 AM
to Daofeng Li, 3D Genomics
Just to clarify, the previous run should have worked as well and created the .hic files as well as the Arrowhead results.
The error just indicated that HiCCUPS wouldn't run, but everything else upstream should have run fine.

Neva Durand

May 18, 2017, 4:14:52 AM
to Muhammad Saad Shamim, Daofeng Li, 3D Genomics
Yes, in fact everything should have finished despite that error.





--
Neva Cherniavsky Durand, Ph.D.
Staff Scientist, Aiden Lab

Daofeng Li

May 18, 2017, 11:29:13 AM
to Neva Durand, Muhammad Saad Shamim, 3D Genomics
Thanks Neva and Muhammad.
Quick question: I specified paired-end reads, but it seems the BWA alignment was done in single-end mode, right? If so, why?
Best,

Daofeng

Neva Durand

May 18, 2017, 11:40:36 AM
to Daofeng Li, Muhammad Saad Shamim, 3D Genomics
Hello Daofeng,

Paired-end alignment makes assumptions about the insert size that are not appropriate for Hi-C data. Since we expect a ligation product, the two read ends may be quite far from one another. We align each read end separately and then combine them.
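
As an illustration of the idea only (not Juicer's actual commands), here is a dry-run sketch that prints one independent `bwa mem` call per read end. The reference and fastq names are hypothetical placeholders, and the later step that pairs the two alignments by read name is not shown.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one independent bwa mem command per read end.
# File names and the reference path are hypothetical placeholders.

per_end_alignment_cmds() {
    local ref="$1" r1="$2" r2="$3"
    # Each end is treated like a single-end library, so the aligner never
    # applies paired-end insert-size assumptions to the ligation product.
    printf 'bwa mem %s %s > %s.sam\n' "$ref" "$r1" "${r1%%.fastq*}"
    printf 'bwa mem %s %s > %s.sam\n' "$ref" "$r2" "${r2%%.fastq*}"
}

# Nothing is executed here; the commands are only printed for inspection.
per_end_alignment_cmds mm10.fa sample_R1.fastq.gz sample_R2.fastq.gz
```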

Best
Neva


Daofeng Li

May 22, 2017, 11:18:02 AM
to Neva Durand, Muhammad Saad Shamim, 3D Genomics
Hi Neva and Muhammad,

I got the following error:

Unrecognized option: -Xgcthreads1
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

My Java version:
$ java -version
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)

Could I remove this option?
How could I resume the job without re-alignment? With the stage parameter, like -S final?

Thanks a lot.


Daofeng

Muhammad Saad Shamim

May 22, 2017, 11:41:31 AM
to Daofeng Li, Neva Durand, 3D Genomics
It should be fine to remove that option.
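
One way to confirm this on a given machine is to ask the JVM directly whether it accepts the flag. This is a small sketch; `jvm_accepts_flag` and the `JAVA` override variable are hypothetical names introduced here for illustration.

```shell
#!/usr/bin/env bash
# Sketch: ask the JVM itself whether it accepts a startup flag.
# JAVA is a hypothetical override so a specific java binary can be tested.

jvm_accepts_flag() {
    # A JVM that rejects the flag exits non-zero before printing its version.
    # A missing java binary also lands in the "rejected" branch below.
    "${JAVA:-java}" "$1" -version >/dev/null 2>&1
}

if jvm_accepts_flag -Xgcthreads1; then
    echo "-Xgcthreads1 is accepted; no change needed"
else
    echo "-Xgcthreads1 is rejected; drop it from the java command line"
fi
```
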
See this thread as well:

And yes, -S final should be fine for resuming after the merged_nodups.txt has already been created.
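
A hedged sketch of a pre-check before resuming. It assumes merged_nodups.txt lives in the aligned/ folder under the top directory (the log earlier in the thread shows that folder being created) and that the top directory is passed with `-d`; confirm both against `juicer.sh -h` for your version. The command is printed, not run.

```shell
#!/usr/bin/env bash
# Sketch: only resume at the final stage if merged_nodups.txt already exists.
# The aligned/merged_nodups.txt location is an assumption; adjust if needed.

can_resume_final() {
    local topdir="$1"
    # -s is true only when the file exists and is not empty.
    [ -s "${topdir}/aligned/merged_nodups.txt" ]
}

topdir="${1:-./work/nkcell}"
if can_resume_final "$topdir"; then
    # Printed rather than executed; add the same genome, site, and queue
    # options that were used for the original run.
    echo "juicer.sh -d ${topdir} -S final"
else
    echo "no merged_nodups.txt under ${topdir}/aligned; cannot resume at -S final" >&2
fi
```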

Best,

Daofeng Li

May 22, 2017, 11:46:58 AM
to Muhammad Saad Shamim, Neva Durand, 3D Genomics
Thanks Muhammad.

Another question: I probably did something wrong by putting the fastq files from all 3 samples in one folder, and I only see one merged_nodups.txt.
I guess I should create 3 separate work folders for them?

Best,

Daofeng

Muhammad Saad Shamim

May 22, 2017, 11:51:07 AM
to Daofeng Li, Neva Durand, 3D Genomics
Are the fastqs from different Hi-C libraries, or just additional sequencing runs of one Hi-C library? If the latter, then you're fine.
If not, then yes, you'd need to build each library separately and then combine them into a mega map with the mega.sh script.
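
A minimal sketch of the per-library layout this implies: one top directory per library, each with its own fastq/ folder, so each Juicer run writes its own merged_nodups.txt. The library names are hypothetical, and a temporary directory stands in for the real experiment root. My understanding is that mega.sh is then pointed at the parent directory and picks up the per-library results, but please verify that against the script's usage text.

```shell
#!/usr/bin/env bash
# Sketch: give every Hi-C library its own top directory with a fastq/ subfolder.
# Library names (lib1..lib3) are hypothetical placeholders.

make_library_dirs() {
    local root="$1"; shift
    local lib
    for lib in "$@"; do
        mkdir -p "${root}/${lib}/fastq"
    done
}

# Demo in a throwaway directory; use your real experiment root instead.
root="$(mktemp -d)"
make_library_dirs "$root" lib1 lib2 lib3
ls "$root"
# Next steps (not run here): move each library's fastq files into its own
# fastq/ folder and run Juicer once per library directory.
```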

Daofeng Li

May 22, 2017, 12:06:06 PM
to Muhammad Saad Shamim, Neva Durand, 3D Genomics
Aha, they are from 3 libraries (samples), so I need to re-run them.

What does mega.sh do? I checked that script, and there does not seem to be much documentation.
I think I need to find the domains/contacts that differ between the 3 samples. Which script could do that? :)
Thanks.

Daofeng

Neva Durand

May 23, 2017, 10:22:13 AM
to Daofeng Li, Muhammad Saad Shamim, 3D Genomics
Hello Daofeng,

"mega.sh" combines different technical and biological samples into "mega maps".  It first creates a new "merged_nodups.txt" that is a merge of the samples, then runs stats and Juicer Tools to create the .hic files and annotate loops and domains.

For differences in loops, you can use HiCCUPSDiff:  https://github.com/theaidenlab/juicer/wiki/HiCCUPSDiff

For differences in domains, you can use the optional feature and control lists in Arrowhead to look at the corner scores for putative domains in a different dataset:  https://github.com/theaidenlab/juicer/wiki/Arrowhead

Best
Neva


Daofeng Li

May 24, 2017, 12:26:58 AM
to Neva Durand, Muhammad Saad Shamim, 3D Genomics
Thanks Neva.

This is my first time processing Hi-C data with Juicer.
Why does the Below MAPQ Threshold line have 2 percentage values?
And how does my data look in terms of quality? Thanks.

Experiment description: 
Sequenced Read Pairs:  29,826,364
 Normal Paired: 27,319,575 (91.60%)
 Chimeric Paired: 285 (0.00%)
 Chimeric Ambiguous: 1,305 (0.00%)
 Unmapped: 2,505,199 (8.40%)
 Ligation Motif Present: 3,260 (0.01%)
Alignable (Normal+Chimeric Paired): 27,319,860 (91.60%)
Unique Reads: 12,082,032 (40.51%)
PCR Duplicates: 15,236,263 (51.08%)
Optical Duplicates: 1,565 (0.01%)
Library Complexity Estimate: 14,123,240
Intra-fragment Reads: 90,093 (0.30% / 0.75%)
Below MAPQ Threshold: 11,666,303 (39.11% / 96.56%)
Hi-C Contacts: 325,636 (1.09% / 2.70%)
 Ligation Motif Present: 43  (0.00% / 0.00%)
 3' Bias (Long Range): 91% - 9%
 Pair Type %(L-I-O-R): 25% - 25% - 26% - 25%
Inter-chromosomal: 213,779  (0.72% / 1.77%)
Intra-chromosomal: 111,857  (0.38% / 0.93%)
Short Range (<20Kb): 74,500  (0.25% / 0.62%)
Long Range (>20Kb): 37,357  (0.13% / 0.31%)

Daofeng

Neva Durand

May 24, 2017, 12:33:23 AM
to Daofeng Li, Muhammad Saad Shamim, 3D Genomics
Hello Daofeng,

The number of duplicates is alarmingly high, and the number of chimeric paired reads and ligation junctions is alarmingly low.  There are other numbers off as well in the Hi-C contacts, but overall you don't have very many Hi-C contacts to begin with (1% of sequenced reads - we expect more like 80%).  You lost half of the sequenced reads to duplicates and the other half to MAPQ 0.  The percentages for the numbers after duplicate removal are (% sequenced reads / % unique reads).

So overall this looks like a failed library.
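
To make the two-percentage convention concrete, this sketch recomputes both values for two lines of the statistics posted above, using only the counts from that report. The helper name `pct` is introduced here for illustration.

```shell
#!/usr/bin/env bash
# Sketch: recompute (% of sequenced reads / % of unique reads) from the
# counts in the statistics posted earlier in this thread.

pct() {
    # Prints 100 * part / total with two decimals.
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.2f", 100 * part / total }'
}

sequenced=29826364   # Sequenced Read Pairs
unique=12082032      # Unique Reads
below_mapq=11666303  # Below MAPQ Threshold
contacts=325636      # Hi-C Contacts

echo "Below MAPQ Threshold: $(pct "$below_mapq" "$sequenced")% / $(pct "$below_mapq" "$unique")%"
echo "Hi-C Contacts: $(pct "$contacts" "$sequenced")% / $(pct "$contacts" "$unique")%"
```

This reproduces the reported 39.11% / 96.56% and 1.09% / 2.70%. The first number in each pair is relative to all sequenced read pairs and the second to the reads left after duplicate removal.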

Best
Neva

Daofeng Li

May 24, 2017, 1:14:07 AM
to Neva Durand, Muhammad Saad Shamim, 3D Genomics
Thanks, Neva, for the fast response.
It looks like my other 2 samples have similar statistics... too bad.
Our Hi-C protocol is actually capture Hi-C, so only the regions of interest were sequenced. Would that be fine?

Daofeng