Juicer 2


Edward Gilding

Oct 28, 2022, 1:28:27 AM
to 3D Genomics
Hello,

I am having issues with the DNAzoo chromosome-level assembly pipeline. The latest appears to be a problem near the end of the Juicer part, where compression and decompression happen while the .hic files are being created. I did change juicer.sh to point to juicer_tools.2.18.00.jar.

The output to STDERR balloons and continues into the hundreds of GB in size. It repeats ad nauseam with:

java.util.zip.DataFormatException: incorrect header check
        at java.util.zip.Inflater.inflateBytes(Native Method)
        at java.util.zip.Inflater.inflate(Inflater.java:259)
        at org.broad.igv.util.CompressionUtils.decompress(CompressionUtils.java:87)
        at org.broad.igv.util.CompressionUtils.decompress(CompressionUtils.java:56)
        at juicebox.data.DatasetReaderV2.decompress(DatasetReaderV2.java:1032)
        at juicebox.data.DatasetReaderV2.readBlock(DatasetReaderV2.java:980)
        at juicebox.data.DatasetReaderV2.readNormalizedBlock(DatasetReaderV2.java:922)
        at juicebox.data.iterator.ContactRecordIterator.hasNext(ContactRecordIterator.java:85)
        at juicebox.tools.utils.norm.GenomeWideNormalizationVectorUpdater.getWGVectors(GenomeWideNormalizationVectorUpdater.java:182)
        at juicebox.tools.utils.norm.GenomeWideNormalizationVectorUpdater.updateHicFileForGWfromPreAddNormOnly(GenomeWideNormalizationVectorUpdater.java:132)
        at juicebox.tools.utils.norm.NormalizationVectorUpdater.updateHicFile(NormalizationVectorUpdater.java:159)
        at juicebox.tools.clt.old.AddNorm.launch(AddNorm.java:83)
        at juicebox.tools.clt.old.AddNorm.run(AddNorm.java:137)
        at juicebox.tools.HiCTools.main(HiCTools.java:97)
java.util.zip.DataFormatException: incorrect header check
        at java.util.zip.Inflater.inflateBytes(Native Method)
        at java.util.zip.Inflater.inflate(Inflater.java:259)
        at org.broad.igv.util.CompressionUtils.decompress(CompressionUtils.java:87)
        at org.broad.igv.util.CompressionUtils.decompress(CompressionUtils.java:56)
        at juicebox.data.DatasetReaderV2.decompress(DatasetReaderV2.java:1032)
        at juicebox.data.DatasetReaderV2.readBlock(DatasetReaderV2.java:980)
        at juicebox.data.DatasetReaderV2.readNormalizedBlock(DatasetReaderV2.java:922)
        at juicebox.data.iterator.ContactRecordIterator.hasNext(ContactRecordIterator.java:85)
        at juicebox.tools.utils.norm.GenomeWideNormalizationVectorUpdater.getWGVectors(GenomeWideNormalizationVectorUpdater.java:182)
        at juicebox.tools.utils.norm.GenomeWideNormalizationVectorUpdater.updateHicFileForGWfromPreAddNormOnly(GenomeWideNormalizationVectorUpdater.java:132)
        at juicebox.tools.utils.norm.NormalizationVectorUpdater.updateHicFile(NormalizationVectorUpdater.java:159)
        at juicebox.tools.clt.old.AddNorm.launch(AddNorm.java:83)
        at juicebox.tools.clt.old.AddNorm.run(AddNorm.java:137)
        at juicebox.tools.HiCTools.main(HiCTools.java:97)

.......and so on.

merged_dedups.bam has been created, and the run appears to be at the .hic creation step.

Any idea what is going on?

One more thing: is there a flag to run Juicer silently?

Best wishes
Eddie :)

Olga Dudchenko

Nov 2, 2022, 11:44:12 AM
to 3D Genomics
Hi,

If you want to run Juicer2 for assembly purposes, please add the --assembly flag to the juicer command. This will 1) generate the merged_nodups.txt file and 2) skip the parts of the Juicer pipeline that can only be run on an assembled genome (such as building the sandboxed hic map and annotating loops and domains).
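
For anyone launching Juicer from a wrapper script, here is a rough Python sketch of making sure the flag is always passed. It is only an illustration: the helper name and both paths are hypothetical, and -d is assumed to be the usual juicer.sh option for the top-level working directory.

```python
import shlex

ASSEMBLY_FLAG = "--assembly"


def with_assembly_flag(argv):
    """Return a copy of argv with --assembly appended unless it is already present."""
    args = list(argv)
    if ASSEMBLY_FLAG not in args:
        args.append(ASSEMBLY_FLAG)
    return args


# Hypothetical paths; replace them with your own install and working directory.
command = with_assembly_flag(
    ["/path/to/juicer/scripts/juicer.sh", "-d", "/path/to/topdir"]
)

# Print the command line for inspection before actually running it.
print(shlex.join(command))
```

The resulting list can be handed to subprocess.run(command, check=True) once the printed command line looks right.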

Best,
Olga

Edward Gilding

Nov 5, 2022, 11:19:22 PM
to 3D Genomics
Hello Olga,

Thank you for the helpful information. I must have missed that detail between git and DNAzoo. It made my day to get it working :)

Thanks again, especially for taking the time for the community
Eddie

Olga Dudchenko

Nov 7, 2022, 4:59:33 PM
to 3D Genomics

Rafael Domínguez

May 29, 2024, 8:37:50 PM
to 3D Genomics
I also experienced this, with a very conventional genome (Drosophila dm6), even trimmed to only have the main chromosomes 2L, 2R, 3L, 3R, X. The error messages are identical. 

Rafael Domínguez

May 29, 2024, 10:18:35 PM
to 3D Genomics
I narrowed it down to the statistics step failing and not generating the inter_hists.m file. The .hic file is still generated even though there is an issue with -g not working, but it explodes when addNorm is used. The error I get when running statistics is the following:

Exception in thread "main" java.lang.NumberFormatException: For input string: "146.081.088" 

Rafael Domínguez

May 31, 2024, 5:18:34 AM
to 3D Genomics
Update: 

I fixed the generation of the inter_hists.m file on my end. It was the locale settings: the new cluster I am using is set to Spanish, so the numbers in the inter.txt file are grouped with . instead of , . I modified the stats_sub.awk script and replaced every instance of %'d with %d.
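
For anyone hitting the same locale problem, the %'d substitution can be sketched in Python as below. This is only an illustration, not part of Juicer: the function names are made up, and the sample line is hypothetical rather than copied from stats_sub.awk. The ' flag in a printf format requests locale-dependent digit grouping, which is how a Spanish locale ends up writing 146.081.088.

```python
import shutil
from pathlib import Path

GROUPED = "%'d"  # printf conversion with the locale-dependent grouping flag
PLAIN = "%d"     # same conversion without digit grouping


def strip_grouping_flag(script_text: str) -> str:
    """Return the script text with every %'d replaced by %d."""
    return script_text.replace(GROUPED, PLAIN)


def patch_script(path: str) -> int:
    """Patch a script in place, keeping a .bak copy; return the number of replacements."""
    script = Path(path)
    original = script.read_text()
    count = original.count(GROUPED)
    if count:
        # Keep the untouched script next to the patched one.
        shutil.copy2(script, script.with_name(script.name + ".bak"))
        script.write_text(strip_grouping_flag(original))
    return count


# Hypothetical awk line, NOT taken from stats_sub.awk.
sample = r'''printf "Reads: %'d\n", total'''
print(strip_grouping_flag(sample))
```

Calling patch_script on a copy of the script first is the safer way to try it, since the helper rewrites the file it is given.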

Then statistics worked fine, but I still get the same header error. I found out that it goes away when using just 1 core for the .hic creation step, so it might be a threading issue or something in index_by_chr.awk. I don't have the time or the ability to dig any further, so I will stick to single-threaded .hic creation for now. If you know something else or take a look, please tell me.
