speeding up processing for a large input file


Zihao Zhu

Apr 30, 2025, 7:12:34 PM
to 3D Genomics
Hi, 
I am working with a very large input file generated from HiC-Pro (.allValidPairs file, ~260 GB). 
I used the standard hicpro2juicebox.sh script to generate a .hic file with juicer_tools.2.20.00.jar. However, the process appears to be stuck at the 'Writing body' step and may take weeks to complete.

To avoid OOM, I used the following java options:
export _JAVA_OPTIONS="-XX:+UseG1GC -Xmx1800G -XX:ParallelGCThreads=20"

I am wondering if there is a way to speed up the process.
Would it be possible to run juicer_tools pre directly on the generated 77251_allValidPairs.pre_juicebox_sorted file? Does it support multithreading?

Here are some logs:

HiC-Pro format > 2.7.5 detected ...
Generating Juicebox input files ...
Running Juicebox ...
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARN [2025-04-30T01:14:38,420]  [Globals.java:138] [main]  Development mode is enabled
Using 1 CPU thread(s) for primary task
Using 10 CPU thread(s) for secondary task
Start preprocess
Writing header
Writing body
.

Thanks,
Zihao


Moshe Olshansky

Apr 30, 2025, 7:20:00 PM
to 3D Genomics
Hi Zihao,

If you are not interested in interchromosomal interactions, you can split your file into separate files (one per chromosome) and then run the pre commands in parallel, provided that you have enough processors and RAM; otherwise limit the number of parallel processes with parallel -j. You will get individual hic files, but you can work with them and visualise them in Juicebox.
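
The split described above could be sketched roughly as follows. This is only an illustration, assuming the 11-column layout of the pre_juicebox_sorted file (chromosome names in columns 3 and 7). The sample data, the directory names, the jar path, the chrom sizes path and the -Xmx value are all placeholders, and other options used later in this thread (-t, -f) are left out for brevity. The script only writes the pre commands to a file; it does not run them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical locations for illustration; adjust to your own files.
workdir=$(mktemp -d)
pairs="$workdir/sample.pre_juicebox_sorted"
outdir="$workdir/by_chrom"
mkdir -p "$outdir"

# Tiny stand-in for the real pairs file. Columns:
# read, strand1, chr1, pos1, frag1, strand2, chr2, pos2, frag2, mapq1, mapq2
cat > "$pairs" <<'EOF'
r1 0 chr1H 100 1 0 chr1H 900 5 42 42
r2 0 chr1H 200 2 1 chr2H 300 3 42 42
r3 1 chr2H 400 4 0 chr2H 800 7 42 42
r4 1 chr1H 500 3 1 chr1H 700 4 42 42
EOF

# Keep only intrachromosomal pairs (column 3 == column 7) and write each
# chromosome to its own file. Line order is preserved, so a sorted input
# gives sorted per-chromosome files.
awk -v dir="$outdir" '$3 == $7 { print > (dir "/" $3 ".txt") }' "$pairs"

# Write one pre command per chromosome to a file instead of running it here.
# juicer_tools.jar, chrom_sizes.txt and -Xmx64g are placeholders.
for f in "$outdir"/*.txt; do
    chrom=$(basename "$f" .txt)
    printf 'java -Xmx64g -jar juicer_tools.jar pre -d -c %s %s %s.hic chrom_sizes.txt\n' \
        "$chrom" "$f" "$outdir/$chrom"
done > "$workdir/pre_commands.txt"

cat "$workdir/pre_commands.txt"
```

The resulting command list could then be passed to GNU parallel, for example with parallel -j 4 < pre_commands.txt, choosing the -j value so that the combined -Xmx settings fit in the available RAM.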

Zihao Zhu

May 7, 2025, 7:16:28 PM
to 3D Genomics
Hi, thank you for this suggestion!

I used '77251_allValidPairs.pre_juicebox_sorted' as input for Pre and passed '-d' and '-c chr1H' to build the map for that chromosome only.
However, I encountered these errors. I’m fairly certain the chromosome names are consistent across all files, but I’m unsure what might have gone wrong.

java.lang.NullPointerException: Cannot invoke "java.util.Iterator.hasNext()" because "this.currentIterator" is null
at juicebox.data.iterator.ListOfListIterator.hasNext(ListOfListIterator.java:44)
at juicebox.data.iterator.IteratorContainer.getNumberOfContactRecords(IteratorContainer.java:54)
at juicebox.data.iterator.ListOfListIteratorContainer.getIsThereEnoughMemoryForNormCalculation(ListOfListIteratorContainer.java:56)
at juicebox.tools.utils.norm.NormalizationCalculations.<init>(NormalizationCalculations.java:59)
at juicebox.tools.utils.norm.GenomeWideNormalizationVectorUpdater.getWGVectors(GenomeWideNormalizationVectorUpdater.java:167)
at juicebox.tools.utils.norm.GenomeWideNormalizationVectorUpdater.updateHicFileForGWfromPreAddNormOnly(GenomeWideNormalizationVectorUpdater.java:132)
at juicebox.tools.utils.norm.NormalizationVectorUpdater.updateHicFile(NormalizationVectorUpdater.java:159)
at juicebox.tools.clt.old.AddNorm.launch(AddNorm.java:83)
at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:185)
at juicebox.tools.HiCTools.main(HiCTools.java:97)


These are the logs:

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARN [2025-05-03T08:58:59,819]  [Globals.java:138] [main]  Development mode is enabled

Using 1 CPU thread(s) for primary task
Using 10 CPU thread(s) for secondary task
Start preprocess
Writing header
Writing body
..
Writing footer
nBytesV5: 19333354
masterIndexPosition: 2624099745

Finished preprocess

Binning contact matrices took: 351285902 milliseconds
No normalization vectors

Calculating norms for zoom BP_2500000Now Doing INTER_SCALE


Thanks,
Zihao

Moshe Olshansky

May 7, 2025, 7:19:09 PM
to 3D Genomics
Hi Zihao,

Please run the following:

grep -c chr1H 77251_allValidPairs.pre_juicebox_sorted

Also, could you please send your full command?

Thank you,
Moshe.


Zihao Zhu

May 8, 2025, 7:03:56 AM
to 3D Genomics
Hi Moshe,

grep -c chr1H 77251_allValidPairs.pre_juicebox_sorted
394923208
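
Note that grep -c counts every line containing the string chr1H anywhere, so pairs with only one end on chr1H are included. A minimal sketch, assuming the 11-column layout with chromosome names in columns 3 and 7, that also counts only the pairs with both ends on chr1H (the sample file is made up for illustration):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Made-up sample in the same column layout as the pairs file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
r1 0 chr1H 100 1 0 chr1H 900 5 42 42
r2 0 chr1H 200 2 1 chr2H 300 3 42 42
r3 1 chr2H 400 4 0 chr2H 800 7 42 42
EOF

# Lines that mention chr1H anywhere (what grep -c reports).
any_match=$(grep -c chr1H "$sample")

# Lines where both ends are on chr1H; "n + 0" prints 0 when nothing matches.
intra=$(awk -v c=chr1H '$3 == c && $7 == c { n++ } END { print n + 0 }' "$sample")

echo "lines mentioning chr1H: $any_match"
echo "intrachromosomal chr1H pairs: $intra"
```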

full command:
module load jdk/21.0.6
java -Xmx400g -jar /filer-5/agruppen/GGR/zhuz/hic/juicer_tools/juicer_tools.2.20.00.jar pre \
-t tmp -d -v \
-c chr1H \
-f 77251_resfrag.juicebox \
77251_allValidPairs.pre_juicebox_sorted \
Morex_chr1H.hic \
/filer-5/agruppen/GGR/zhuz/hic/Morex/chrom_sizes.txt


The 77251_* files were generated by hicpro2juicebox.sh.

This is the content of chrom_sizes.txt:
chr1H 516505932
chr2H 665585731
chr3H 621516506
chr4H 610333535
chr5H 588218686
chr6H 561794515
chr7H 632540561
chrUn 29110253
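
One way to check that the chromosome names agree between the pairs file and the chrom sizes file is sketched below. It assumes chromosome names are in columns 3 and 7 of the pairs file and in column 1 of the chrom sizes file; the sample files are made up for illustration, with chr3H deliberately missing from the sizes file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Made-up sample files for illustration.
tmp=$(mktemp -d)
printf 'chr1H 516505932\nchr2H 665585731\n' > "$tmp/chrom_sizes.txt"
cat > "$tmp/pairs.txt" <<'EOF'
r1 0 chr1H 100 1 0 chr1H 900 5 42 42
r2 0 chr1H 200 2 1 chr2H 300 3 42 42
r3 1 chr3H 400 4 0 chr3H 800 7 42 42
EOF

# First file: remember the known chromosome names.
# Second file: collect any name in column 3 or 7 that is not known.
missing=$(awk 'NR == FNR { known[$1]; next }
               !($3 in known) { bad[$3] }
               !($7 in known) { bad[$7] }
               END { for (c in bad) print c }' \
              "$tmp/chrom_sizes.txt" "$tmp/pairs.txt" | sort)

if [ -n "$missing" ]; then
    echo "not in chrom sizes file: $missing"
else
    echo "all chromosome names found"
fi
```

On a file of several hundred GB this is a full pass over the data, so it may be worth running it on a sample first.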

Thanks,
Zihao

Moshe Olshansky

May 9, 2025, 1:08:36 AM
to 3d-ge...@googlegroups.com
Hi Zihao,

Could you please send me a few top lines of 77251_resfrag.juicebox (do: head -5 77251_resfrag.juicebox) and of 77251_allValidPairs.pre_juicebox_sorted?

P.S. Are you sure that the hic file was not created?


Zihao Zhu

May 9, 2025, 6:02:33 AM
to 3D Genomics
Hi Moshe,

There is a hic file created for chr1H and I am able to visualize that file in Juicebox. 
However, I am not sure if it was correctly normalized, as there were errors during the normalization step.

For 77251_resfrag.juicebox, I only display the first 10 columns:
head -5 77251_resfrag.juicebox | cut -f1-10

chr1H 12 247 605 945 1298 1322 1415 1533 1640
chr2H 198 290 433 668 985 1220 1420 1655 1891
chr3H 110 226 342 460 577 695 788 813 976
chr4H 10 128 482 576 601 818 936 1172 1287
chr5H 228 357 464 582 700 819 936 1054 1289


head -5 77251_allValidPairs.pre_juicebox_sorted

A00550:446:HLWTTDRX5:1:2101:10004:11428 0 chr1H 503583755 2246360 0 chr1H 508325805 2267835 35 42
A00550:446:HLWTTDRX5:1:2101:10004:13182 0 chr1H 155391316 679889 1 chr1H 155415478 679986 42 34
A00550:446:HLWTTDRX5:1:2101:10004:15186 1 chr1H 308748822 1334871 0 chr1H 340726907 1480726 42 35
A00550:446:HLWTTDRX5:1:2101:10004:23265 1 chr1H 468987982 2086644 1 chr1H 481295330 2145592 31 42
A00550:446:HLWTTDRX5:1:2101:10004:27179 1 chr1H 427273815 1886948 1 chr1H 429124025 1895261 42 42

Thanks,
Zihao

Moshe Olshansky

May 17, 2025, 3:02:30 AM
to 3d-ge...@googlegroups.com
Hi Zihao,

Sorry for the delay.

Your files look right.

To see which normalisations are available, you can run:

java -jar your_juicer_tools.jar validate your_hic_file

Best regards,
Moshe.
