hic problems running Juicer on SGE

Remi Coux

Sep 21, 2018, 8:49:10 AM
to 3D Genomics
Hi Juicer gang, first thanks a lot for developing and maintaining Juicer.

I'm trying to run Juicer on an SGE cluster, so I tweaked the script a little (see attached). However, I'm facing a problem that resembles the previously reported stats bug for low-complexity data, although I believe it starts during generation of the hic file:

The run doesn't complete: two jobs remain queued and are not listed in jobs.out. I get a very small inter_30.hic file (16M), while the dups, merged_nodups and merged_sorted files are 10, 17 and 28G respectively.

I get the following hic30***.err 
Picked up _JAVA_OPTIONS: -Xmx32g
Error while reading graphs file: java.io.FileNotFoundException: /ifs/data/lehmannlab/couxr01/aligned/inter_30_hists.m (No such file or directory)
java.lang.NumberFormatException: For input string: "chrUn_DS484226v1"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:580)
        at java.lang.Integer.parseInt(Integer.java:615)
        at juicebox.tools.utils.original.AsciiPairIterator.advance(AsciiPairIterator.java:203)
        at juicebox.tools.utils.original.AsciiPairIterator.next(AsciiPairIterator.java:247)
        at juicebox.tools.utils.original.Preprocessor.computeWholeGenomeMatrix(Preprocessor.java:498)
        at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:376)
        at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:286)
        at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:105)
        at juicebox.tools.HiCTools.main(HiCTools.java:98)

Here's the stats30**.err:
id: cannot find name for user ID 1877
id: cannot find name for group ID 1877
id: cannot find name for user ID 1877
/cm/local/apps/environment-modules/3.2.10/Modules/3.2.10/bin/modulecmd: error while loading shared libraries: libc.so.6: failed to map segment from shared object: Cannot allocate memory
/cm/local/apps/environment-modules/3.2.10/Modules/3.2.10/bin/modulecmd: error while loading shared libraries: libc.so.6: failed to map segment from shared object: Cannot allocate memory
/cm/local/apps/environment-modules/3.2.10/Modules/3.2.10/bin/modulecmd: error while loading shared libraries: libc.so.6: failed to map segment from shared object: Cannot allocate memory
/cm/local/apps/sge/var/spool/node026/job_scripts/5671321: line 1: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
/cm/local/apps/sge/var/spool/node026/job_scripts/5671321: line 1: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)

My data shouldn't be that low complexity: my fastq files contain ~500M reads per sample, 95% of which are >=MAPQ30 (per Illumina flowcell stats and the HiCUP report). However, the inter.txt file reports only ~143M read pairs, and none of the align reports show errors. None of the other report files contain errors either; the chimeric and count_ligations files are all empty (I don't know if they should be).

Experiment description: Juicer version 1.5.6; BWA 0.7.7-r441; 1 threads; splitsize 45000000; openjdk version "1.8.0_144"; Juicer Tools Version 1.7.6; /ifs/data/lehmannlab/couxr01/juicer/scripts/juicer.sh -g dm6 -s DpnII -z /ifs/data/lehmannlab/couxr01/juicer/reference/dm6.fa -p /ifs/data/lehmannlab/couxr01/juicer/restriction_sites/dm6.chrom.sizes -y /ifs/data/lehmannlab/couxr01/juicer/restriction_sites/dm6_DpnII.txt -D /ifs/data/lehmannlab/couxr01/juicer
Sequenced Read Pairs:  143,139,799
 Normal Paired: 111,560,517 (77.94%)
 Chimeric Paired: 36,111 (0.03%)
 Chimeric Ambiguous: 12,364,736 (8.64%)
 Unmapped: 19,178,408 (13.40%)
 Ligation Motif Present: 147,987,325 (103.39%)
Alignable (Normal+Chimeric Paired): 111,596,628 (77.96%)
Unique Reads: 68,465,944 (47.83%)
PCR Duplicates: 41,651,617 (29.10%)
Optical Duplicates: 282,534 (0.20%)
Library Complexity Estimate: 105,911,879
Intra-fragment Reads: 18,678,354 (13.05% / 27.28%)
Below MAPQ Threshold: 26,037,549 (18.19% / 38.03%)
Hi-C Contacts: 23,750,041 (16.59% / 34.69%)
 Ligation Motif Present: 11,216,925  (7.84% / 16.38%)
 3' Bias (Long Range): 76% - 24%
 Pair Type %(L-I-O-R): 25% - 25% - 25% - 25%
Inter-chromosomal: 2,812,642  (1.96% / 4.11%)
Intra-chromosomal: 20,937,399  (14.63% / 30.58%)
Short Range (<20Kb): 16,860,143  (11.78% / 24.63%)
Long Range (>20Kb): 4,077,037  (2.85% / 5.95%)

Could you please let me know if this is the stats bug? Or if you have any idea on what could be causing this - I don't really know where to look for more detailed reports.

Thanks in advance
Remi 
juicer.sh
Capture d’écran 2018-09-21 à 14.37.23.png
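
For readers comparing their own inter.txt output: the two percentages printed after each count in the statistics above appear to be the count relative to Sequenced Read Pairs and relative to Unique Reads, respectively. That reading is inferred from the numbers in this post, not taken from Juicer's documentation. A minimal Python sketch that reproduces three of the rows (the `pct` helper is hypothetical, written only for this illustration):

```python
# Counts copied from the inter.txt block quoted in the post above.
sequenced_pairs = 143_139_799
unique_reads = 68_465_944

rows = {
    "Intra-fragment Reads": 18_678_354,
    "Below MAPQ Threshold": 26_037_549,
    "Hi-C Contacts": 23_750_041,
}


def pct(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Assumption: first percentage is relative to sequenced pairs,
# second percentage is relative to unique reads.
for name, count in rows.items():
    print(f"{name}: {pct(count, sequenced_pairs):.2f}% / {pct(count, unique_reads):.2f}%")
```

Under that assumption, the printed values agree with the percentages reported in the post for these three rows.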

Muhammad Saad Shamim

Sep 21, 2018, 11:10:03 AM
to rx....@gmail.com, 3D Genomics
Hey Remi,

Are you running juicer.sh on all the samples in one fastq folder?
Each Hi-C library should be processed as separate runs, and later combined into megamaps as appropriate.
Is the data above for just one library or the aggregation of all of them?

Some of this also looks like an error on the cluster, which would require contacting your cluster administrator (looks like permissions for your account?).
id: cannot find name for user ID 1877
id: cannot find name for group ID 1877
id: cannot find name for user ID 1877
That may be why the stats jobs don't reflect an accurate count.

Also, can you share your changes to juicer.sh as a fork of github.com/theaidenlab/juicer on GitHub so that we can track/compare changes?

Best,



Remi Coux

Sep 21, 2018, 11:40:54 AM
to 3D Genomics
Thanks for the prompt reply. I'm contacting my cluster's admins and will update if they manage to fix the problem.

I'm running samples one by one and plan on merging them using the mega script afterwards, thanks. 

I created the fork as requested; you can find it here: https://github.com/RXCoux/juicer/blob/master/UGER/scripts/juicer_SGE.sh

Have a great day
Remi

Remi Coux

Sep 27, 2018, 3:51:33 PM
to 3D Genomics
Hi, the cluster admins replied that it was likely due to a RAM shortage on one of the nodes. I increased the memory requested for the hic and stats jobs, and Juicer ran without a single error. However, I still get a small hic file (170 MB) and few domains called (200 vs ~500 published). The number of mapped read pairs in the inter.txt file is roughly 140M vs 190M given by HiCUP. Interestingly, two jobs remain queued but are not listed in debug/jobs*.out.

These experiments were done on Drosophila, which has a very compact genome. I know that settings need to be modified to run other Hi-C packages, but I don't really see what I could tune here.


Would you guys have any suggestion/idea/comments please?

Thanks in advance
Remi

Muhammad Saad Shamim

Sep 27, 2018, 4:38:47 PM
to rx....@gmail.com, 3D Genomics
Hey Remi, 


Which jobs remain queued?
Is it 140M read pairs or Hi-C contacts?
Which map are you referring to with the 500 domains / how many reads did they have?

Remi Coux

Sep 27, 2018, 5:20:32 PM
to Muhammad Saad Shamim, 3D Genomics
Hi Muhammad, thanks for the prompt reply, 

The two jobs that remain queued (I've left them for > 48h in the past and they never completed) are a1538058429_hic (I get an inter_30.hic file but not inter.hic) and a1538058429_prep_finalize (see attached). Weirdly, they do not appear in the jobs.out list.

I also attach the inter.txt stats, which show Sequenced Read Pairs: 140,742,562. Are these valid pairs? In comparison, HiCUP (report attached as well) gives 186,546,648 total read pairs but 147,622,223 valid pairs.

Regarding the domains, papers such as PMC5389536 found ~500 TADs (and many more A/B compartments) but with ~1B reads and 600k contacts.

I will combine my replicates (I have 2 per condition, roughly 300M sequenced pairs) and will report back.

What could explain such a small hic file?
--

RX Coux

5685788.txt
WT_DpnII_rep2_R1_2.HiCUP_summary_report.html
inter.txt
5685792.txt

Remi Coux

Oct 4, 2018, 5:35:35 PM
to Muhammad Saad Shamim, 3D Genomics
Hi, I had a min_vmem option set super high for the hic job, which left it queued, my bad. I changed it and the script ran to completion. I'm now getting 400-500M hic files, and ~450-500 domains are called, which is reasonable.

However, I played with the split size and obtained very different results:

Experiment description: Juicer version 1.5.6; BWA 0.7.7-r441; 1 threads; splitsize 45000000; openjdk version "1.8.0_144"; Juicer Tools Version 1.7.6; /ifs/data/lehmannlab/couxr01/juicer/scripts/juicer.sh -g dm6 -s DpnII -z /ifs/data/lehmannlab/couxr01/juicer/references/dm6.fa -p /ifs/data/lehmannlab/couxr01/juicer/restriction_sites/dm6.chrom.sizes -y /ifs/data/lehmannlab/couxr01/juicer/restriction_sites/dm6_DpnII.txt
Sequenced Read Pairs:  325,032,995
 Normal Paired: 250,489,698 (77.07%)
 Chimeric Paired: 24,535 (0.01%)
 Chimeric Ambiguous: 17,007,191 (5.23%)
 Unmapped: 57,511,545 (17.69%)
 Ligation Motif Present: 270,587,500 (83.25%)
Alignable (Normal+Chimeric Paired): 250,514,233 (77.07%)
Unique Reads: 142,892,308 (43.96%)
PCR Duplicates: 106,909,089 (32.89%)
Optical Duplicates: 712,836 (0.22%)
Library Complexity Estimate: 200,706,330
Intra-fragment Reads: 6,623,902 (2.04% / 4.64%)
Below MAPQ Threshold: 77,068,008 (23.71% / 53.93%)
Hi-C Contacts: 59,200,398 (18.21% / 41.43%)
 Ligation Motif Present: 24,759,348  (7.62% / 17.33%)
 3' Bias (Long Range): 82% - 18%
 Pair Type %(L-I-O-R): 25% - 25% - 25% - 25%
Inter-chromosomal: 14,519,672  (4.47% / 10.16%)
Intra-chromosomal: 44,680,726  (13.75% / 31.27%)
Short Range (<20Kb): 20,582,814  (6.33% / 14.40%)
Long Range (>20Kb): 24,097,713  (7.41% / 16.86%)

543 domains called, finalcheck.out has the following error message
***! Error! The statistics do not add up. Alignment likely failed to complete on one or more files. Run relaunch_prep.sh
Stats don't add up.  Check /ifs/data/lehmannlab/couxr01/aligned for results (no other error messages)

If I change the split size to 22500000 (same fastq files):

Experiment description: Juicer version 1.5.6; BWA 0.7.7-r441; 1 threads; splitsize 22500000; openjdk version "1.8.0_144"; Juicer Tools Version 1.7.6; /ifs/data/lehmannlab/couxr01/2/juicer/scripts/juicer.sh -g dm6 -s DpnII -z /ifs/data/lehmannlab/couxr01/2/juicer/references/dm6.fa -p /ifs/data/lehmannlab/couxr01/2/juicer/restriction_sites/dm6.chrom.sizes -y /ifs/data/lehmannlab/couxr01/2/juicer/restriction_sites/dm6_DpnII.txt -C 22500000
Sequenced Read Pairs:  582,455,100
 Normal Paired: 473,366,678 (81.27%)
 Chimeric Paired: 46,517 (0.01%)
 Chimeric Ambiguous: 304,712 (0.05%)
 Unmapped: 108,737,193 (18.67%)
 Ligation Motif Present: 270,587,500 (46.46%)
Alignable (Normal+Chimeric Paired): 473,413,195 (81.28%)
Unique Reads: 220,188,897 (37.80%)
PCR Duplicates: 250,812,721 (43.06%)
Optical Duplicates: 932,004 (0.16%)
Library Complexity Estimate: 264,992,336
Intra-fragment Reads: 9,871,762 (1.69% / 4.48%)
Below MAPQ Threshold: 135,181,270 (23.21% / 61.39%)
Hi-C Contacts: 75,135,865 (12.90% / 34.12%)
 Ligation Motif Present: 30,950,965  (5.31% / 14.06%)
 3' Bias (Long Range): 82% - 18%
 Pair Type %(L-I-O-R): 25% - 25% - 25% - 25%
Inter-chromosomal: 18,437,030  (3.17% / 8.37%)
Intra-chromosomal: 56,698,835  (9.73% / 25.75%)
Short Range (<20Kb): 26,233,809  (4.50% / 11.91%)
Long Range (>20Kb): 30,464,761  (5.23% / 13.84%)

595 domains called

and I get 
***! Error! The sorted file and dups/no dups files do not add up, or were empty. Merge or dedupping likely failed, restart pipeline with -S merge or -S dedup
Dups don't add up.  Check /ifs/data/lehmannlab/couxr01/2/aligned for results

I understand from this post that this can be due to BWA issues. However, in Helen's case she only got small variations in the number of read pairs, whereas I get almost double the read pairs and a difference of ~15M contacts. Have you ever seen something like this, and if so, do you have any idea what could explain it?

Thanks
--

RX Coux
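
A quick arithmetic cross-check of the two stats blocks above may help show which stage lost reads in each run. The Python sketch below is not Juicer's finalcheck logic, which works on the underlying files. It assumes that the four alignment categories should sum to Sequenced Read Pairs, and that Unique Reads plus PCR and optical duplicates should sum to Alignable. The `gaps` function and the dictionary keys are hypothetical names used only for this illustration.

```python
# Counts copied from the two inter.txt blocks in the message above,
# keyed by the splitsize reported in each "Experiment description" line.
runs = {
    45_000_000: dict(sequenced=325_032_995, normal=250_489_698, chimeric=24_535,
                     ambiguous=17_007_191, unmapped=57_511_545,
                     unique=142_892_308, pcr_dups=106_909_089, optical_dups=712_836),
    22_500_000: dict(sequenced=582_455_100, normal=473_366_678, chimeric=46_517,
                     ambiguous=304_712, unmapped=108_737_193,
                     unique=220_188_897, pcr_dups=250_812_721, optical_dups=932_004),
}


def gaps(run):
    """Return (alignment_gap, dedup_gap) under the assumptions stated above.

    alignment_gap: sequenced pairs not covered by the four alignment categories.
    dedup_gap: alignable pairs not covered by unique reads plus duplicates.
    """
    categorized = run["normal"] + run["chimeric"] + run["ambiguous"] + run["unmapped"]
    alignable = run["normal"] + run["chimeric"]
    deduped = run["unique"] + run["pcr_dups"] + run["optical_dups"]
    return run["sequenced"] - categorized, alignable - deduped


for splitsize, run in runs.items():
    alignment_gap, dedup_gap = gaps(run)
    print(f"splitsize {splitsize}: alignment gap {alignment_gap}, dedup gap {dedup_gap}")
```

Under these assumptions, the splitsize 45000000 run is 26 pairs short in the alignment categories with no dedup gap, while the splitsize 22500000 run has no alignment gap but is 1,479,573 pairs short after deduplication. That is consistent with which finalcheck message each run produced ("statistics do not add up" versus "dups/no dups files do not add up").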

Muhammad Saad Shamim

Oct 4, 2018, 6:54:07 PM
to rx....@gmail.com, 3d-ge...@googlegroups.com
Can you check the debug folder for errors in the alignment jobs?
Most likely several jobs with the larger splitsize failed to finish aligning in the time limit, hence more reads with the smaller split size.
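
One way to do that check across many job logs at once is sketched below in Python. It is only an illustration: the `scan_debug` function is hypothetical, the `.err`/`.out` extensions are taken from the file names quoted in this thread, and the search pattern is a guess based on the error text seen here (for example "Cannot allocate memory" and Java exceptions), so it may miss other failure modes such as jobs killed at the time limit without a message.

```python
import re
from pathlib import Path

# Assumed pattern, based only on error text quoted in this thread.
ERROR_PATTERN = re.compile(r"error|exception|cannot allocate memory", re.IGNORECASE)


def scan_debug(debug_dir):
    """Return {file name: [matching lines]} for .err/.out files with at least one match."""
    hits = {}
    for path in sorted(Path(debug_dir).iterdir()):
        # Only look at regular files that look like scheduler job logs.
        if not path.is_file() or path.suffix not in {".err", ".out"}:
            continue
        text = path.read_text(errors="replace")
        matching = [line for line in text.splitlines() if ERROR_PATTERN.search(line)]
        if matching:
            hits[path.name] = matching
    return hits
```

Calling `scan_debug("debug")` from the run's top-level directory would return a dictionary that can be printed or filtered by job name. The directory name `debug` follows the paths mentioned in this thread.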

Remi Coux

Oct 5, 2018, 4:40:55 AM
to Muhammad Saad Shamim, 3d-ge...@googlegroups.com
Hi, the only errors are in finalcheck

I reran with -C 33750000 and got read and contact numbers in between those for 45000000 and 22500000:

Experiment description: Juicer version 1.5.6; BWA 0.7.7-r441; 1 threads; splitsize 33750000; openjdk version "1.8.0_144"; Juicer Tools Version 1.7.6; /ifs/data/lehmannlab/couxr01/juicer/scripts/juicer.sh -g dm6 -s DpnII -z /ifs/data/lehmannlab/couxr01/juicer/references/dm6.fa -p /ifs/data/lehmannlab/couxr01/juicer/restriction_sites/dm6.chrom.sizes -y /ifs/data/lehmannlab/couxr01/juicer/restriction_sites/dm6_DpnII.txt -C 33750000
Sequenced Read Pairs:  411,763,433
 Normal Paired: 320,653,357 (77.87%)
 Chimeric Paired: 31,593 (0.01%)
 Chimeric Ambiguous: 17,495,842 (4.25%)
 Unmapped: 73,582,610 (17.87%)
 Ligation Motif Present: 270,587,500 (65.71%)
Alignable (Normal+Chimeric Paired): 320,684,950 (77.88%)
Unique Reads: 169,688,696 (41.21%)
PCR Duplicates: 150,207,488 (36.48%)
Optical Duplicates: 788,766 (0.19%)
Library Complexity Estimate: 222,558,484
Intra-fragment Reads: 7,798,990 (1.89% / 4.60%)
Below MAPQ Threshold: 96,131,962 (23.35% / 56.65%)
Hi-C Contacts: 65,757,744 (15.97% / 38.75%)
 Ligation Motif Present: 27,333,504  (6.64% / 16.11%)
 3' Bias (Long Range): 82% - 18%
 Pair Type %(L-I-O-R): 25% - 25% - 25% - 25%
Inter-chromosomal: 16,148,224  (3.92% / 9.52%)
Intra-chromosomal: 49,609,520  (12.05% / 29.24%)
Short Range (<20Kb): 22,847,795  (5.55% / 13.46%)
Long Range (>20Kb): 26,761,504  (6.50% / 15.77%)

578 domains called

debug/finalcheck-a1538689117.out:***! Error! The statistics do not add up. Alignment likely failed to complete on one or more files. Run relaunch_prep.sh
--

RX Coux

Muhammad Saad Shamim

Oct 5, 2018, 8:21:58 AM
to Remi Coux, 3d-ge...@googlegroups.com
Did you try running relaunch_prep.sh, as the finalcheck file says, on the directory where the stats didn't add up? It'll figure out which alignments didn't run, if that's the case.