Empty .hic file using Juicer pre


Samuel Kean

Feb 21, 2025, 6:38:45 PM
to 3D Genomics
Hi,

I've been having errors with the final steps of juicer, which I initially thought were due to a lack of memory, but now I'm not so sure.

I've run the pipeline on a test dataset, and it successfully generates a .hic file. When I run it on my real dataset, the .hic is empty. However, I know this data has been used to generate a .hic file via juicer (it's available online).

The command I'm running (now separate from the juicer.sh pipeline, for testing) is this:

java -Xmx128G -jar scripts/juicer_tools.jar pre -j 48 --threads 48 -s aligned/inter.txt -g aligned/inter_hists.m -q 1 aligned/merged_nodups.txt aligned/inter_test.hic restriction_sites/GCF_043380555.1_NfurGRZ-RIMD1_genomic.chrom.sizes

(I originally had the references/genome in place of restriction_sites/chrom.sizes, as that's what's written in the documentation - but this seems to be correct?)

Error:

-----------------------------------------------------------------------------------------------------------------------

(base) x@x~> cat juicer_pre.sh.*

java.lang.NumberFormatException: For input string: "38M"

at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)

at java.base/java.lang.Integer.parseInt(Integer.java:668)

at java.base/java.lang.Integer.parseInt(Integer.java:786)

at juicebox.tools.utils.original.AsciiPairIterator.advance(AsciiPairIterator.java:241)

at juicebox.tools.utils.original.AsciiPairIterator.next(AsciiPairIterator.java:287)

at juicebox.tools.utils.original.Preprocessor.computeWholeGenomeMatrix(Preprocessor.java:571)

at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:658)

at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:425)

at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:139)

at juicebox.tools.HiCTools.main(HiCTools.java:94)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.

WARN [2025-02-19T14:23:13,307]  [Globals.java:138] [main]  Development mode is enabled

Using 48 CPU thread(s)

Not including fragment map

Start preprocess

Writing header

Writing body

-----------------------------------------------------------------------------------------------------------------------

The output ends there, and the run generates a very small .hic (17 KB).

Is this something to do with the 38M tag? I don't know much about CIGAR strings, but this tag appears on every line, which seems a little strange.

Here's the inter.txt

-----------------------------------------------------------------------------------------------------------------------

Sequenced Read Pairs:  159,623,344

 Normal Paired: 120,132,388 (75.26%)

 Chimeric Paired: 0 (0.00%)

 Chimeric Ambiguous: 22 (0.00%)

 Unmapped: 39,490,934 (24.74%)

 Ligation Motif Present: 0 (0.00%)

Alignable (Normal+Chimeric Paired): 120,132,388 (75.26%)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.

WARN [2025-02-19T13:41:43,155]  [Globals.java:138] [main]  Development mode is enabled

Unique Reads: 118,840,720 (74.45%)

PCR Duplicates: 1,291,667 (0.81%)

Optical Duplicates: 0 (0.00%)

Library Complexity Estimate: 5,546,381,767

Intra-fragment Reads: 0 (0.00% / 0.00%)

Below MAPQ Threshold: 88,884,552 (55.68% / 74.79%)

Hi-C Contacts: 29,956,168 (18.77% / 25.21%)

 Ligation Motif Present: 0  (0.00% / 0.00%)

 3' Bias (Long Range): 50% - 50%

 Pair Type %(L-I-O-R): 25% - 25% - 25% - 25%

Inter-chromosomal: 7,135,166  (4.47% / 6.00%)

Intra-chromosomal: 22,821,002  (14.30% / 19.20%)

Short Range (<20Kb): 7,159,066  (4.48% / 6.02%)

Long Range (>20Kb): 15,661,045  (9.81% / 13.18%)

-----------------------------------------------------------------------------------------------------------------------

Here's the merged_nodups.txt from the real data

-----------------------------------------------------------------------------------------------------------------------

0 NC_011814.1 1 0 0 NC_011814.1 7007 1 54 38M GTTAACGTAGCTTAAATAAAGCATGACACTGAAGCTGT 60 38M TGTCCTCCTCCTTATCACACTTTTGAAGAACCTGCATC SRR22443557.137681932 SRR22443557.137681932

0 NC_011814.1 8 0 0 NC_011814.1 6970 1 60 38M TAGCTTAAATAAAGCATGACACTGAAGCTGTTAAGATA 60 38M AGAATTAACTCAGTCAAATGTTGAATGGCTTCATGGCT SRR22443558.92016041 SRR22443558.92016041

0 NC_011814.1 11 0 0 NC_011814.1 11172 1 60 38M CTTTAATAAAGCATGACACTGAAGCTGTTAAGATAAAC 60 38M GCACTATGAGGAGTAATTATAACAGGATTAATCAGTCT SRR22443557.90173957 SRR22443557.90173957

0 NC_011814.1 16 0 0 NC_011814.1 18221 1 60 38M ATAAAGCATGACACTGAAGCTGTTAAGATAAACCTTAG 0 38M TCAAGCACTCTGCAAGTCAGTACCGTTGCACAGTAAGA SRR22443557.338720936 SRR22443557.338720936

0 NC_011814.1 18 0 0 NC_011814.1 1379 1 60 38M AAAGCATGACACTGAAGCTGTTAAGATAAACCTTAGCC 39 38M AGCCTAAAAAAGGGCAAACACGTCTCTGTGGCAAAAGA SRR22443558.206479359 SRR22443558.206479359

0 NC_011814.1 18 0 0 NC_011814.1 9553 1 60 38M AAAGCATGACACTGAAGCTGTTAAGATAAACCTTAGCC 60 38M GTAGTGTGACTTTTCCTCTATGTTTCTATTTATTGATG SRR22443558.45025694 SRR22443558.45025694

0 NC_011814.1 21 0 0 NC_011814.1 5237 1 60 38M ACATGACACTGAAGCTGTTAAGATAAACCTTAGCCTGG 60 38M AAAACAGACACTTTTATTAAGCTAAAGCCTTCTAGACG SRR22443557.316961428 SRR22443557.316961428

0 NC_011814.1 22 0 0 NC_011814.1 16933 1 60 38M CATGACACTGAAGCTGTTAAGATAAACCTTAGCCTGGT 60 28M1D10M AAAAAACACCTCATCACCTACAACTACACCCCCCCATA SRR22443557.48829574 SRR22443557.48829574

0 NC_011814.1 22 0 0 NC_011814.1 9279 1 60 38M CATGACACTGAAGCTGTTAAGATAAACCTTAGCCTGGT 60 38M ACGAAAACAAGCAATTCAATCATTAACTCTGACTATTA SRR22443557.270762499 SRR22443557.270762499

0 NC_011814.1 23 0 0 NC_011814.1 17026 1 60 38M ATGACACTGAAGCTGTTAAGATAAACCTTAGCCTGGTT 60 32M6S CCTGCTACTCCAGTAAATTAACACATCAGATCGATCAA SRR22443558.362115815 SRR22443558.362115815

-----------------------------------------------------------------------------------------------------------------------
And merged_nodups.txt from the test data, that generates a working .hic
-----------------------------------------------------------------------------------------------------------------------

0 NC_011814.1 269 0 0 NC_011814.1 269 1 31 80M AAGCTTGACCTAGTTATAGTTATTAGGGCCGGTAAAACTCGTGCCAGCCACCGCGGTTATACGAGAGGCTCAAATTGATC 31 80M AAGCTTGACCTAGTTATAGTTATTAGGGCCGGTAAAACTCGTGCCAGCCACCGCGGTTATACGAGAGGCTCAAATTGATC MG01HX07:650:H5JNMCCX2:7:2224:19918:9027.BXCCGTTAAT MG01HX07:650:H5JNMCCX2:7:2224:19918:9027.BXCCGTTAAT

0 NC_011814.1 4042 0 0 NC_011814.1 4042 1 52 129M23S AAGCTTTTGGGCCCATACCCCGAACACGTTGGTTAAAGTCCTTCCTCTACTAATGAGCCCTTTGACCCTTCTACCTATCGCATTCACCTTAATTCTTGGAACCACCATTACACTCATAAGTACCCATTGCTCCGGTGTCAATTTAAAGAAGG 52 129M23S AAGCTTTTGGGCCCATACCCCGAACACGTTGGTTAAAGTCCTTCCTCTACTAATGAGCCCTTTGACCCTTCTACCTATCGCATTCACCTTAATTCTTGGAACCACCATTACACTCATAAGTACCCATTGCTCCGGTGTCAATTTAAAGAAGG MG01HX07:650:H5JNMCCX2:7:2223:32086:1872.BXACCATAGA MG01HX07:650:H5JNMCCX2:7:2223:32086:1872.BXACCATAGA

0 NC_011814.1 4624 0 0 NC_011814.1 5165 1 60 120H32M GATTAATAAGACAACCAATATTAACTCCTTAT 60 123M29S AAGCTTTATATAAGAGTGCAAATCTCTTAACCCTTAAGACCTACAGGATACTAACCCACATCTTCTGCATGCAAAACAGACACTTTTATTAAGCTAAAGCCTTCTAGACGAGTAGGCCTCGATTAATAAGACAACCAATATTAACTCCTTAT MG01HX07:650:H5JNMCCX2:7:2222:17929:58708.BXTTTACTAT MG01HX07:650:H5JNMCCX2:7:2222:17929:58708.BXTTTACTAT

0 NC_011814.1 4811 0 0 NC_011814.1 5165 1 60 120H32M GCTCTATCAGCCCTACTCAGCTTGTACTTCTA 60 121M31S AAGCTTTATATAAGAGTGCAAATCTCTTAACCCTTAAGACCTACAGGATACTAACCCACATCTTCTGCATGCAAAACAGACACTTTTATTAAGCTAAAGCCTTCTAGACGAGTAGGCCTCGCTCTATCAGCCCTACTCAGCTTGTACTTCTA MG01HX07:650:H5JNMCCX2:7:2224:10795:11400.BXAATAAATG MG01HX07:650:H5JNMCCX2:7:2224:10795:11400.BXAATAAATG

0 NC_011814.1 5165 0 0 NC_011814.1 5165 1 49 50M AAGCTTTATATAAGAGTGCAAATCTCTTAACCCTTAAGACCTACAGGATC 49 50M AAGCTTTATATAAGAGTGCAAATCTCTTAACCCTTAAGACCTACAGGATC MG01HX07:650:H5JNMCCX2:7:2222:17878:60940.BXCGTATCCA MG01HX07:650:H5JNMCCX2:7:2222:17878:60940.BXCGTATCCA

0 NC_011814.1 5815 0 0 NC_011814.1 5815 1 60 113M AAGCTTCTGACTTCTGCCCCCCTCCTTCCTTCTTCTATTAGCCTCCTCAGGGGTTGAAGCAGGCGCAGGGACAGGCTGGACAGTCTATCCCCCATTAGCAGGAAACATAGATC 60 113M AAGCTTCTGACTTCTGCCCCCCTCCTTCCTTCTTCTATTAGCCTCCTCAGGGGTTGAAGCAGGCGCAGGGACAGGCTGGACAGTCTATCCCCCATTAGCAGGAAACATAGATC MG01HX07:650:H5JNMCCX2:7:2223:6634:13721.BXTCGAGTAC MG01HX07:650:H5JNMCCX2:7:2223:6634:13721.BXTCGAGTAC

-----------------------------------------------------------------------------------------------------------------------
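
For reference, both listings above appear to follow the 16-column "long" merged_nodups layout. The sketch below labels the columns of the first real-data line to show where the 38M token sits. The column names are an assumption taken from the Juicer documentation for the long format, and `label_fields` is a hypothetical helper, not part of Juicer.

```python
# Column names of the 16-field "long" merged_nodups layout.
# Assumption: names follow the Juicer documentation; they are not read from the file.
LONG_FORMAT_FIELDS = [
    "str1", "chr1", "pos1", "frag1",
    "str2", "chr2", "pos2", "frag2",
    "mapq1", "cigar1", "sequence1",
    "mapq2", "cigar2", "sequence2",
    "readname1", "readname2",
]


def label_fields(line):
    """Split one whitespace-delimited line and pair each token with its column name."""
    tokens = line.split()
    if len(tokens) != len(LONG_FORMAT_FIELDS):
        raise ValueError(
            f"expected {len(LONG_FORMAT_FIELDS)} fields, got {len(tokens)}"
        )
    return dict(zip(LONG_FORMAT_FIELDS, tokens))


# First line of the real-data listing quoted in the post.
example = (
    "0 NC_011814.1 1 0 0 NC_011814.1 7007 1 54 38M "
    "GTTAACGTAGCTTAAATAAAGCATGACACTGAAGCTGT 60 38M "
    "TGTCCTCCTCCTTATCACACTTTTGAAGAACCTGCATC "
    "SRR22443557.137681932 SRR22443557.137681932"
)

record = label_fields(example)
print(record["mapq1"], record["cigar1"], record["mapq2"], record["cigar2"])
# prints: 54 38M 60 38M
```

Under that layout, 38M is the CIGAR string for a 38 bp read that aligned end to end, so seeing it on most lines of a 38 bp library is expected. The exception in the log means a numeric column was being read where a CIGAR token was found, which would be consistent with some line not matching the expected column layout.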

Thanks for the help!

Sam

Beisi Xu

Apr 3, 2025, 11:43:29 AM
to 3D Genomics
Maybe merged_nodups.txt is corrupted.

Does each line have the same number of fields?

awk '{print NF}' merged_nodups.txt | sort -n | uniq -c
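
If a line number for the first offending record would help, the same field-count check can be sketched in Python. This is a minimal sketch: `field_count_report` is a hypothetical helper, the expected count of 16 assumes the long merged_nodups format, and the two sample lines are made up for illustration (the second deliberately lacks one sequence column).

```python
import io
from collections import Counter


def field_count_report(lines, expected=16):
    """Tally whitespace-delimited field counts and locate the first odd line.

    Returns (counts, first_bad) where counts maps field count -> number of
    lines, and first_bad is (line_number, field_count) or None.
    """
    counts = Counter()
    first_bad = None
    for lineno, line in enumerate(lines, start=1):
        n = len(line.split())  # same splitting rule as awk's default NF
        counts[n] += 1
        if n != expected and first_bad is None:
            first_bad = (lineno, n)
    return counts, first_bad


# Made-up sample input: line 1 has 16 fields, line 2 is missing sequence1.
sample = io.StringIO(
    "0 chrA 1 0 0 chrA 7007 1 54 38M ACGT 60 38M TGCA read1 read1\n"
    "0 chrA 8 0 0 chrA 6970 1 60 38M 60 38M TGCA read2 read2\n"
)

counts, first_bad = field_count_report(sample)
print(dict(counts))  # {16: 1, 15: 1}
print(first_bad)     # (2, 15)
```

To run it on the real file, pass an open file handle instead of the sample, for example `with open("aligned/merged_nodups.txt") as fh: print(field_count_report(fh))`. The file is read line by line, so memory use stays small even for a large merged_nodups.txt.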
