Hi Juicer gang, first of all, thanks a lot for developing and maintaining Juicer.
I'm trying to run Juicer on an SGE cluster, so I tweaked the script a little (see attached). However, I'm facing the following problem. It resembles the previously reported stats bug for low-complexity data, but I believe it starts during generation of the .hic file:
The run doesn't complete: two jobs remain queued and are not listed in jobs.out. I get a very small inter_30.hic file (16M), while dups, merged_nodups and merged_sorted are 10G, 17G and 28G respectively.
I get the following in hic30***.err:
Picked up _JAVA_OPTIONS: -Xmx32g
Error while reading graphs file: java.io.FileNotFoundException: /ifs/data/lehmannlab/couxr01/aligned/inter_30_hists.m (No such file or directory)
java.lang.NumberFormatException: For input string: "chrUn_DS484226v1"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at juicebox.tools.utils.original.AsciiPairIterator.advance(AsciiPairIterator.java:203)
at juicebox.tools.utils.original.AsciiPairIterator.next(AsciiPairIterator.java:247)
at juicebox.tools.utils.original.Preprocessor.computeWholeGenomeMatrix(Preprocessor.java:498)
at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:376)
at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:286)
at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:105)
at juicebox.tools.HiCTools.main(HiCTools.java:98)
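The NumberFormatException suggests that Juicer Tools reached a line in the pairs file given to pre (merged_nodups.txt here, I assume) where a chromosome name (chrUn_DS484226v1) sits in a column that should hold an integer, for example a truncated or shifted line. One way to look for such lines is a quick scan like the following. It is a hypothetical helper, not part of Juicer, and it assumes the long merged_nodups layout (str1 chr1 pos1 frag1 str2 chr2 pos2 frag2 mapq1 cigar1 seq1 mapq2 cigar2 seq2 name1 name2), where strands, positions and fragment numbers should all be integers.

```python
import sys

# Assumed 0-based indices of integer columns in merged_nodups
# (strand, position and fragment id of each read end).
INT_COLUMNS = (0, 2, 3, 4, 6, 7)


def find_malformed(lines, int_columns=INT_COLUMNS, limit=10):
    """Return (line_number, reason, line) tuples for lines whose integer columns do not parse.

    Stops after `limit` hits so a huge file does not have to be read to the end.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) <= max(int_columns):
            # Too few fields: likely a truncated line.
            bad.append((lineno, f"only {len(fields)} fields", line.rstrip("\n")))
        else:
            for col in int_columns:
                try:
                    int(fields[col])
                except ValueError:
                    # Non-integer where an integer is expected: likely a shifted line.
                    bad.append((lineno, f"column {col} is {fields[col]!r}", line.rstrip("\n")))
                    break
        if len(bad) >= limit:
            break
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python find_malformed.py aligned/merged_nodups.txt
    with open(sys.argv[1]) as handle:
        for lineno, reason, line in find_malformed(handle):
            print(f"{lineno}: {reason}: {line[:200]}")
```

If the scan reports nothing, the problem is more likely in how the file is read than in the file itself.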
Here's the stats30**.err:
id: cannot find name for user ID 1877
id: cannot find name for group ID 1877
id: cannot find name for user ID 1877
/cm/local/apps/environment-modules/3.2.10/Modules/3.2.10/bin/modulecmd: error while loading shared libraries: libc.so.6: failed to map segment from shared object: Cannot allocate memory
/cm/local/apps/environment-modules/3.2.10/Modules/3.2.10/bin/modulecmd: error while loading shared libraries: libc.so.6: failed to map segment from shared object: Cannot allocate memory
/cm/local/apps/environment-modules/3.2.10/Modules/3.2.10/bin/modulecmd: error while loading shared libraries: libc.so.6: failed to map segment from shared object: Cannot allocate memory
/cm/local/apps/sge/var/spool/node026/job_scripts/5671321: line 1: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
/cm/local/apps/sge/var/spool/node026/job_scripts/5671321: line 1: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
My data shouldn't be that low-complexity: my FASTQ files contain ~500M reads per sample, 95% of which are at MAPQ >= 30 (per the Illumina flowcell stats and the HiCUP report). However, inter.txt reports only ~143M read pairs, even though none of the align reports show errors. None of the other report files contain errors either; the chimeric and count_ligations files are all empty (I don't know whether they should be).
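To check whether all reads actually reached the pipeline, one could count the records in each input FASTQ and compare that count with the Sequenced Read Pairs line in the stats. The following is a minimal sketch that assumes the standard four-line FASTQ record layout. The script and file names are hypothetical. If the ~500M figure counts R1 and R2 reads together, the expected number of pairs would be about half of that.

```python
import gzip
import sys


def count_records(lines):
    """Count FASTQ records in an iterable of lines, assuming 4 lines per record."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4:
        raise ValueError(f"{n_lines} lines is not a multiple of 4; file may be truncated")
    return n_lines // 4


def count_fastq_records(path):
    """Open a plain or gzipped FASTQ file and count its records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_records(handle)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python count_fastq.py fastq/sample_R1.fastq.gz fastq/sample_R2.fastq.gz
    for path in sys.argv[1:]:
        print(f"{path}: {count_fastq_records(path):,} records")
```

For paired-end data each mate file holds one record per read pair, so either file's count is the number to compare against Sequenced Read Pairs.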
Experiment description: Juicer version 1.5.6; BWA 0.7.7-r441; 1 threads; splitsize 45000000; openjdk version "1.8.0_144"; Juicer Tools Version 1.7.6; /ifs/data/lehmannlab/couxr01/juicer/scripts/juicer.sh -g dm6 -s DpnII -z /ifs/data/lehmannlab/couxr01/juicer/reference/dm6.fa -p /ifs/data/lehmannlab/couxr01/juicer/restriction_sites/dm6.chrom.sizes -y /ifs/data/lehmannlab/couxr01/juicer/restriction_sites/dm6_DpnII.txt -D /ifs/data/lehmannlab/couxr01/juicer
Sequenced Read Pairs: 143,139,799
Normal Paired: 111,560,517 (77.94%)
Chimeric Paired: 36,111 (0.03%)
Chimeric Ambiguous: 12,364,736 (8.64%)
Unmapped: 19,178,408 (13.40%)
Ligation Motif Present: 147,987,325 (103.39%)
Alignable (Normal+Chimeric Paired): 111,596,628 (77.96%)
Unique Reads: 68,465,944 (47.83%)
PCR Duplicates: 41,651,617 (29.10%)
Optical Duplicates: 282,534 (0.20%)
Library Complexity Estimate: 105,911,879
Intra-fragment Reads: 18,678,354 (13.05% / 27.28%)
Below MAPQ Threshold: 26,037,549 (18.19% / 38.03%)
Hi-C Contacts: 23,750,041 (16.59% / 34.69%)
Ligation Motif Present: 11,216,925 (7.84% / 16.38%)
3' Bias (Long Range): 76% - 24%
Pair Type %(L-I-O-R): 25% - 25% - 25% - 25%
Inter-chromosomal: 2,812,642 (1.96% / 4.11%)
Intra-chromosomal: 20,937,399 (14.63% / 30.58%)
Short Range (<20Kb): 16,860,143 (11.78% / 24.63%)
Long Range (>20Kb): 4,077,037 (2.85% / 5.95%)
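In the stats above, the two percentages on each line appear to be relative to Sequenced Read Pairs and to Unique Reads respectively. A small arithmetic check, using only the numbers above, reproduces them. It also shows that Intra-fragment Reads, Below MAPQ Threshold and Hi-C Contacts sum exactly to Unique Reads, so the drop from ~68M unique reads to ~24M contacts is fully accounted for by those two categories.

```python
# Counts copied verbatim from the inter_30 stats above.
SEQUENCED = 143_139_799
UNIQUE = 68_465_944
FILTERED = {
    "Intra-fragment Reads": 18_678_354,
    "Below MAPQ Threshold": 26_037_549,
    "Hi-C Contacts": 23_750_041,
}
INTER, INTRA = 2_812_642, 20_937_399


def pct(count, total):
    """Percentage rounded to two decimals, as printed in the stats file."""
    return round(100 * count / total, 2)


for name, count in FILTERED.items():
    print(f"{name}: {pct(count, SEQUENCED)}% of sequenced / {pct(count, UNIQUE)}% of unique")

# The three categories sum exactly to Unique Reads ...
assert sum(FILTERED.values()) == UNIQUE
# ... and Hi-C Contacts equal inter- plus intra-chromosomal contacts.
assert INTER + INTRA == FILTERED["Hi-C Contacts"]
```

Running it prints the same 13.05% / 27.28%, 18.19% / 38.03% and 16.59% / 34.69% pairs as the stats file.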
Could you please let me know whether this is the stats bug, or whether you have any idea what could be causing it? I don't really know where to look for more detailed reports.
Thanks in advance
Remi