juicer error on SLURM

Daofeng Li

May 17, 2017, 4:21:03 PM

to 3D Genomics

Dear group,

I am trying to run Juicer on our Hi-C data on a SLURM cluster, but I get the following error:

Could anyone please tell me how to fix this? Thanks a lot.

(-: Looking for fastq files...fastq files exist

(-: Aligning files matching ./work/nkcell//fastq/*_R*.fastq*

in queue debug to genome mm10 with site file restriction_sites/mm10_MboI.txt

(-: Created ./work/nkcell//splits and ./work/nkcell//aligned.

Splitting files

srun: job 5738765 queued and waiting for resources

srun: job 5738765 has been allocated resources

(-: Starting job to launch other jobs once splitting is complete

Submitted batch job 5738798

Submitted batch job 5738802

Submitted batch job 5738806

Submitted batch job 5738810

Submitted batch job 5738814

Submitted batch job 5738818

Submitted batch job 5738822

Submitted batch job 5738826

Submitted batch job 5738830

Submitted batch job 5738834

sbatch: error: Batch job submission failed: Invalid generic resource (gres) specification

sbatch: error: Batch job submission failed: Job dependency problem

(-: Finished adding all jobs... Now is a good time to get that cup of coffee..

Muhammad Saad Shamim

May 17, 2017, 11:05:16 PM

to Daofeng Li, 3D Genomics

Hey Daofeng,

Hope you are doing well!

That error relates to the GPU requirement for HiCCUPS.

The HiCCUPS SLURM job requires a GPU, which we set with:

#SBATCH --gres=gpu:kepler:1

This line will need to be customized for your cluster.

The rest of the jobs should run fine, though; it's just that HiCCUPS can't run without CUDA/NVIDIA GPUs.
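To find the right value for your cluster, you can ask SLURM which generic resources (GRES) its nodes advertise and copy the matching string into the --gres line. A minimal sketch, assuming the standard sinfo client is available on the login node; the helper name list_gres is made up for illustration:

```shell
# Print each partition together with the generic resources (GRES) it offers.
# list_gres is a hypothetical helper name; %P and %G are sinfo format fields
# for the partition name and the GRES string.
list_gres() {
    if ! command -v sinfo >/dev/null 2>&1; then
        echo "sinfo not found; run this on a SLURM login node" >&2
        return 1
    fi
    sinfo --noheader --format="%P %G" | sort -u
}

# On a cluster without GPUs the GRES column typically reads "(null)".
list_gres || true
```

If no partition lists a gpu resource, there is nothing valid to put on that line, and the HiCCUPS job cannot be scheduled on that cluster.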

Best,

- Muhammad Saad Shamim

--
You received this message because you are subscribed to the Google Groups "3D Genomics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to 3d-genomics+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/3d-genomics/161e4cf1-353e-4129-aa9f-9ea65667e748%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Daofeng Li

May 17, 2017, 11:18:27 PM

to Muhammad Saad Shamim, 3D Genomics

Hi Muhammad,

Thank you so much for the reply.

Yes, I am doing well.

I also checked with our sysadmin, and we don't have any GPUs on our clusters.

Is it OK to just remove that line? Should the rest of the code still run fine?

Thanks again.

Daofeng

Muhammad Saad Shamim

May 17, 2017, 11:21:46 PM

to Daofeng Li, 3D Genomics

Yes that should be fine. In that case, you may want to remove both the HiCCUPS job and its dependent job (MotifFinder on loop calls from HiCCUPS).

- Muhammad Saad Shamim

Daofeng Li

May 17, 2017, 11:48:57 PM

to Muhammad Saad Shamim, 3D Genomics

Thanks Muhammad. I am trying to remove the HiCCUPS-related code.

Is Arrowhead one of the jobs that depend on HiCCUPS?

Daofeng

Muhammad Saad Shamim

May 18, 2017, 12:04:41 AM

to Daofeng Li, 3D Genomics

Arrowhead is independent of HiCCUPS and does not require a GPU.

So you should still be able to run it fine.

- Muhammad Saad Shamim

Daofeng Li

May 18, 2017, 12:28:29 AM

to Muhammad Saad Shamim, 3D Genomics

Thanks Muhammad. I submitted the job and will see if it works.

Best,

Daofeng

Muhammad Saad Shamim

May 18, 2017, 12:30:28 AM

to Daofeng Li, 3D Genomics

Just to clarify, the previous run should also have worked and created the .hic files as well as the Arrowhead results.

The error just indicated that HiCCUPS wouldn't run, but everything else upstream should have run fine.

- Muhammad Saad Shamim

Neva Durand

May 18, 2017, 4:14:52 AM

to Muhammad Saad Shamim, Daofeng Li, 3D Genomics

Yes, in fact everything should have finished despite that error.

--

Neva Cherniavsky Durand, Ph.D.

Staff Scientist, Aiden Lab

www.aidenlab.org

Daofeng Li

May 18, 2017, 11:29:13 AM

to Neva Durand, Muhammad Saad Shamim, 3D Genomics

Thanks Neva and Muhammad.

Quick question: I specified paired-end reads, but it seems the alignment by BWA was done in single-end mode, right? If so, why?

Best,

Daofeng

Neva Durand

May 18, 2017, 11:40:36 AM

to Daofeng Li, Muhammad Saad Shamim, 3D Genomics

Hello Daofeng,

Paired end alignment makes assumptions about the insert size that are not appropriate for Hi-C data. Since we expect a ligation product, the read ends may be quite far from one another. We align each read end separately and then combine them.

Best

Neva

Daofeng Li

May 22, 2017, 11:18:02 AM

to Neva Durand, Muhammad Saad Shamim, 3D Genomics

Hi Neva and Muhammad,

I got the following error:

Unrecognized option: -Xgcthreads1

Error: Could not create the Java Virtual Machine.

Error: A fatal exception has occurred. Program will exit.

My Java version:

$ java -version

java version "1.8.0_31"

Java(TM) SE Runtime Environment (build 1.8.0_31-b13)

Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)

Could I remove this option?

How can I resume the job without re-aligning? With the stage parameter, e.g. -S final?

Thanks a lot.

Daofeng

Muhammad Saad Shamim

May 22, 2017, 11:41:31 AM

to Daofeng Li, Neva Durand, 3D Genomics

It should be fine to remove that option.

See this thread as well:

https://groups.google.com/d/msg/3d-genomics/4kCmZzjfHuY/6s-Kb0RoAwAJ
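If you would rather script the removal than edit by hand, a small filter like the one below works. It is only a sketch: the helper name strip_gcthreads is made up, and you need to point it at whichever Juicer script on your installation passes -Xgcthreads1 to java. A backup copy is kept next to the original.

```shell
# strip_gcthreads FILE: remove every "-Xgcthreads<N>" JVM option from FILE,
# leaving a backup in FILE.bak. Hypothetical helper, not part of Juicer.
strip_gcthreads() {
    file="$1"
    cp -- "$file" "$file.bak" || return 1
    # Drop the option together with the blanks in front of it; writing back
    # into the existing file keeps its permissions (e.g. the executable bit).
    sed -E 's/[[:space:]]+-Xgcthreads[0-9]+//g' "$file.bak" > "$file"
}

# Usage (the path is a placeholder):
#   strip_gcthreads path/to/script_that_calls_java.sh
```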

And yes, -S final should be fine for resuming after the merged_nodups.txt has already been created.
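For the run at the top of this thread, the resume command might look like the sketch below. The directory, genome, and queue are taken from that log, and the enzyme is inferred from the site file name; the location of juicer.sh is an assumption, so adjust it to your installation. The command is only printed unless RUN=1 is set, so you can check it first.

```shell
# Build the command for resuming at the "final" stage, which reuses the
# existing aligned/merged_nodups.txt instead of re-aligning.
# JUICER_SH is an assumed location; adjust it to your installation.
JUICER_SH="${JUICER_SH:-scripts/juicer.sh}"

resume_final() {
    topdir="$1"
    # Reuse the function's positional parameters as the argument list.
    set -- "$JUICER_SH" -S final -d "$topdir" -g mm10 -s MboI -q debug
    if [ "${RUN:-0}" = "1" ]; then
        "$@"         # actually submit the jobs
    else
        echo "$*"    # dry run: just show the command
    fi
}

resume_final ./work/nkcell
```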

Best,

- Muhammad Saad Shamim

Daofeng Li

May 22, 2017, 11:46:58 AM

to Muhammad Saad Shamim, Neva Durand, 3D Genomics

Thanks Muhammad.

Another question: I probably made a mistake by putting all the fastq files from my 3 samples in one folder, and I only see one merged_nodups.txt.

I guess I should create 3 work folders, one per sample?

Best,

Daofeng

Muhammad Saad Shamim

May 22, 2017, 11:51:07 AM

to Daofeng Li, Neva Durand, 3D Genomics

Are the fastqs from different Hi-C libraries or just additional sequencing runs on one Hi-C library? If the latter, then you're fine.

If not, then yes, you'd need to build each library separately and then megamap them with the mega.sh script.
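A sketch of that reorganisation, assuming each library's fastq files share a sample prefix followed by an underscore (the sample names in the usage line are placeholders). It gives every library its own top-level directory with a fastq/ subfolder, the layout Juicer reported in the log at the top of the thread (./work/nkcell//fastq/), so each library can then be processed on its own with -d.

```shell
# split_by_library SRC_FASTQ_DIR WORK_ROOT SAMPLE...
# Move SAMPLE_*.fastq* from SRC_FASTQ_DIR into WORK_ROOT/SAMPLE/fastq/.
# Hypothetical helper; the prefix-based file naming is an assumption.
split_by_library() {
    src="$1"
    root="$2"
    shift 2
    for sample in "$@"; do
        mkdir -p "$root/$sample/fastq"
        for f in "$src/${sample}"_*.fastq*; do
            [ -e "$f" ] || continue   # glob matched nothing for this sample
            mv "$f" "$root/$sample/fastq/"
        done
    done
}

# Usage (placeholder sample names):
#   split_by_library ./work/nkcell/fastq ./work sampleA sampleB sampleC
```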

- Muhammad Saad Shamim

Daofeng Li

May 22, 2017, 12:06:06 PM

to Muhammad Saad Shamim, Neva Durand, 3D Genomics

Aha, they are from 3 libraries (samples). I need to re-run them.

What does mega.sh do? I checked that script, but there does not seem to be much documentation.

I think I need to find the domains/contacts that differ between the 3 samples. Which script could do that? :)

Thanks.

Daofeng

Neva Durand

May 23, 2017, 10:22:13 AM

to Daofeng Li, Muhammad Saad Shamim, 3D Genomics

Hello Daofeng,

"mega.sh" combines different technical and biological samples into "mega maps". It first creates a new "merged_nodups.txt" that is a merge of the samples, then runs stats and Juicer Tools to create the .hic files and annotate loops and domains.

For differences in loops, you can use HiCCUPSDiff: https://github.com/theaidenlab/juicer/wiki/HiCCUPSDiff

For differences in domains, you can use the optional feature and control lists in Arrowhead to look at the corner scores for putative domains in a different dataset: https://github.com/theaidenlab/juicer/wiki/Arrowhead

Best

Neva

Daofeng Li

May 24, 2017, 12:26:58 AM

to Neva Durand, Muhammad Saad Shamim, 3D Genomics

Thanks Neva.

This is my first time processing Hi-C data using Juicer.

Why does the Below MAPQ Threshold line have 2 percentage values?

And how does my data look in terms of quality? Thanks.

Experiment description:

Sequenced Read Pairs: 29,826,364

Normal Paired: 27,319,575 (91.60%)

Chimeric Paired: 285 (0.00%)

Chimeric Ambiguous: 1,305 (0.00%)

Unmapped: 2,505,199 (8.40%)

Ligation Motif Present: 3,260 (0.01%)

Alignable (Normal+Chimeric Paired): 27,319,860 (91.60%)

Unique Reads: 12,082,032 (40.51%)

PCR Duplicates: 15,236,263 (51.08%)

Optical Duplicates: 1,565 (0.01%)

Library Complexity Estimate: 14,123,240

Intra-fragment Reads: 90,093 (0.30% / 0.75%)

Below MAPQ Threshold: 11,666,303 (39.11% / 96.56%)

Hi-C Contacts: 325,636 (1.09% / 2.70%)

Ligation Motif Present: 43 (0.00% / 0.00%)

3' Bias (Long Range): 91% - 9%

Pair Type %(L-I-O-R): 25% - 25% - 26% - 25%

Inter-chromosomal: 213,779 (0.72% / 1.77%)

Intra-chromosomal: 111,857 (0.38% / 0.93%)

Short Range (<20Kb): 74,500 (0.25% / 0.62%)

Long Range (>20Kb): 37,357 (0.13% / 0.31%)

Daofeng

Neva Durand

May 24, 2017, 12:33:23 AM

to Daofeng Li, Muhammad Saad Shamim, 3D Genomics

Hello Daofeng,

The number of duplicates is alarmingly high, and the number of chimeric paired reads and ligation junctions is alarmingly low. There are other numbers off as well in the Hi-C contacts, but overall you don't have very many Hi-C contacts to begin with (1% of sequenced reads - we expect more like 80%). You lost half of the sequenced reads to duplicates and the other half to MAPQ 0. The percentages for the numbers after duplicate removal are (% sequenced reads / % unique reads).

So overall this looks like a failed library.
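The two percentages can be reproduced from the counts in the statistics above. A small sketch using awk; the helper name pct is made up for illustration:

```shell
# pct PART WHOLE: print PART as a percentage of WHOLE with two decimals.
pct() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.2f%%", 100 * part / whole }'
}

sequenced=29826364   # Sequenced Read Pairs
unique=12082032      # Unique Reads
below_mapq=11666303  # Below MAPQ Threshold
contacts=325636      # Hi-C Contacts

# First value: share of sequenced read pairs; second: share of unique reads.
echo "Below MAPQ Threshold: $(pct "$below_mapq" "$sequenced") / $(pct "$below_mapq" "$unique")"
echo "Hi-C Contacts: $(pct "$contacts" "$sequenced") / $(pct "$contacts" "$unique")"
# prints:
#   Below MAPQ Threshold: 39.11% / 96.56%
#   Hi-C Contacts: 1.09% / 2.70%
```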

Best

Neva

Daofeng Li

May 24, 2017, 1:14:07 AM

to Neva Durand, Muhammad Saad Shamim, 3D Genomics

Thanks, Neva, for the fast response.

It looks like my other 2 samples have similar statistics...too bad.

Our Hi-C protocol is actually capture Hi-C, so only the regions of interest were sequenced. Would these statistics be fine in that case?

Daofeng
