Sorted file and dups/no dups files do not add up


Josh Grey

Oct 4, 2021, 12:12:14 PM
to 3D Genomics
Hi all,

Thank you for this very helpful forum. I've looked through the posts and other resources, but can't figure out why I keep getting this error.

I am running Juicer on a Hi-C experiment and get the following error in the final output file:

***! Error! The sorted file and dups/no dups files do not add up, or were empty. Merge or dedupping likely failed, restart pipeline with -S merge or -S dedup
Dups don't add up.  Check /juicedir/scripts/aligned for results

Below are the resulting inter_30.txt stats:

Sequenced Read Pairs:  1,112,743,359
 Normal Paired: 451,669,235 (40.59%)
 Chimeric Paired: 536,976,217 (48.26%)
 Chimeric Ambiguous: 114,213,210 (10.26%)
 Unmapped: 9,884,697 (0.89%)
 Ligation Motif Present: 1,038,936,627 (93.37%)
Alignable (Normal+Chimeric Paired): 988,645,452 (88.85%)
Unique Reads: 674,967,732 (60.66%)
PCR Duplicates: 309,379,282 (27.80%)
Optical Duplicates: 3,301,021 (0.30%)
Library Complexity Estimate: 1,216,966,025
Intra-fragment Reads: 4,830,913 (0.43% / 0.72%)
Below MAPQ Threshold: 186,952,810 (16.80% / 27.70%)
Hi-C Contacts: 483,184,009 (43.42% / 71.59%)
 Ligation Motif Present: 447,610,743  (40.23% / 66.32%)
 3' Bias (Long Range): 88% - 12%
 Pair Type %(L-I-O-R): 25% - 25% - 25% - 25%
Inter-chromosomal: 171,921,541  (15.45% / 25.47%)
Intra-chromosomal: 311,262,468  (27.97% / 46.12%)
Short Range (<20Kb): 131,757,217  (11.84% / 19.52%)
Long Range (>20Kb): 179,505,057  (16.13% / 26.59%)

I have tried restarting at both the merge and dedup steps, but still run into the same issue. It does not happen with the test data, nor with similar samples that have fewer reads. I'm not sure whether this is a problem for the results, since the pipeline still runs to completion and the .hic files seem mostly okay. Is there something I am missing here?

Thanks,
Josh

Neva Durand

Oct 4, 2021, 12:31:13 PM
to Josh Grey, 3D Genomics
Hello, 

If you sum the statistics in your inter_30.txt file, Unique Reads + PCR Duplicates + Optical Duplicates = 987,648,035, but Alignable is 988,645,452, so roughly 1M reads (997,417) are unaccounted for. You're free to use what you've got, but those reads are missing from the .hic file.
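
The sum can be reproduced with a short script. This is only a minimal sketch: the helper names `parse_count` and `missing_reads` are made up for illustration and are not part of Juicer, and it assumes the stats text uses the same `Label: 1,234,567 (..%)` layout as the inter_30.txt excerpt posted above.

```python
import re


def parse_count(stats_text, label):
    """Return the integer that follows 'label:' at the start of a stats line."""
    pattern = rf"^\s*{re.escape(label)}:\s*([\d,]+)"
    match = re.search(pattern, stats_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError(f"label not found in stats: {label!r}")
    # Counts are written with thousands separators, e.g. 988,645,452.
    return int(match.group(1).replace(",", ""))


def missing_reads(stats_text):
    """Alignable minus (unique + PCR duplicates + optical duplicates)."""
    alignable = parse_count(stats_text, "Alignable (Normal+Chimeric Paired)")
    accounted = sum(
        parse_count(stats_text, label)
        for label in ("Unique Reads", "PCR Duplicates", "Optical Duplicates")
    )
    return alignable - accounted


# The four relevant lines copied from the inter_30.txt excerpt in this thread.
stats = """\
Alignable (Normal+Chimeric Paired): 988,645,452 (88.85%)
Unique Reads: 674,967,732 (60.66%)
PCR Duplicates: 309,379,282 (27.80%)
Optical Duplicates: 3,301,021 (0.30%)
"""

print(missing_reads(stats))  # 997417
```

A result of 0 would mean every alignable read ended up as either unique or duplicate; anything else is the number of reads that went missing.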

I would look in the debug folder for errors during the merge step. You can also count the lines yourself (wc -l merged_sort.txt); the count should equal Alignable.
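
The line-count check can also be scripted. The sketch below rests on an assumption not stated in this thread: that the dedup step splits merged_sort.txt into merged_nodups.txt, dups.txt and opt_dups.txt in the same aligned directory. If your Juicer version names these files differently, adjust the names. `count_lines` and `check_dedup` are hypothetical helpers, not Juicer functions.

```python
from pathlib import Path

# Assumed names of the files that dedup writes; verify against your aligned directory.
DEDUP_OUTPUTS = ("merged_nodups.txt", "dups.txt", "opt_dups.txt")


def count_lines(path):
    """Count lines by streaming the file, so very large files fit in memory."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def check_dedup(aligned_dir):
    """Compare merged_sort.txt against the combined dedup outputs.

    Returns (sorted_total, per_file_counts, difference); the difference
    is 0 when the files add up.
    """
    aligned = Path(aligned_dir)
    sorted_total = count_lines(aligned / "merged_sort.txt")
    per_file = {name: count_lines(aligned / name) for name in DEDUP_OUTPUTS}
    return sorted_total, per_file, sorted_total - sum(per_file.values())
```

For example, `check_dedup("/juicedir/scripts/aligned")` would inspect the directory named in the error message. On files with around a billion lines, `wc -l` will be considerably faster than Python; the script is mainly useful for seeing all four counts side by side.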

Best
Neva

--
Neva Cherniavsky Durand, Ph.D. | she, her, hers
Senior Scientist |  Gene Regulation Observatory
Broad Institute of MIT and Harvard