Hello, and thank you for your support and for the tool.
I found some differences between Juicer Tools versions 1 and 2, especially when using 'pre'.
I'm working with the hg19 genome and used the DdeI and DpnII enzymes.
I used HiC-Pro 3.0 for mapping and for generating the allValidPairs file, and I have tried several times to convert it into .hic format.
When I run 'hicpro2juicebox.sh' from HiC-Pro, the script generates the "resfrag.juicebox" and "allValidPairs.pre_juicebox_sorted" files in the 'tmp' directory.
The "resfrag.juicebox" file looks like a Juicer restriction site file, except that it is tab-delimited.
When I compared it with the restriction site BED file for HiC-Pro, all values were shifted by +1.
cut -f 1,2,3,4,5,6,7,8,9 /SAMPLE/tmp/resfrag.juicebox | head
chr1 10479 10508 10600 11140 11160 11268 11501 11931
chr2 10590 10713 10759 11514 11874 12160 12180 12227
chr3 60138 60184 60662 60696 60734 60788 61031 61331
chr4 10224 10334 10426 11729 12250 12275 12508 12670
chr5 12275 12333 13064 13188 13366 13424 13430 13559
The "allValidPairs.pre_juicebox_sorted" file follows the short format:
ST-E00127:1189:HH3VFCCX2:7:1101:10003:24655 1 chr1 85348526 680274 0 chr1 99440210 779321 3 23
ST-E00127:1189:HH3VFCCX2:7:1101:10003:24901 0 chr1 225128578 1535272 0 chr1 240786795 1653830 37 23
ST-E00127:1189:HH3VFCCX2:7:1101:10003:37629 1 chr1 105381415 817078 0 chr1 107472201 829606 8 42
My chrom.sizes file looks like this. Every input file uses the 'chr' prefix in front of the chromosome name.
chr1 249250621
chr2 243199373
chr3 198022430
chr4 191154276
chr5 180915260
chr6 171115067
chr7 159138663
chr8 146364022
chr9 141213431
chr10 135534747
chr11 135006516
chr12 133851895
chr13 115169878
chr14 107349540
chr15 102531392
chr16 90354753
chr17 81195210
chr18 78077248
chr19 59128983
chr20 63025520
chr21 48129895
chr22 51304566
chrX 155270560
chrY 59373566
chrM 16571
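A minimal sketch of how the naming claim above could be checked, assuming the site file has one line per chromosome (name followed by positions) and chrom.sizes has one `name length` pair per line. The function names are hypothetical and not part of Juicer or HiC-Pro.

```python
def parse_chrom_sizes(lines):
    """Map chromosome name -> length from 'name<whitespace>length' lines."""
    sizes = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            sizes[fields[0]] = int(fields[1])
    return sizes


def check_site_file(lines, sizes):
    """Return a list of human-readable problems found in a site file."""
    problems = []
    for line in lines:
        fields = line.split()  # splits on tabs and spaces alike
        if not fields:
            continue
        chrom = fields[0]
        positions = [int(p) for p in fields[1:]]
        if chrom not in sizes:
            problems.append(f"{chrom}: missing from chrom.sizes")
        elif positions and max(positions) > sizes[chrom]:
            problems.append(
                f"{chrom}: site {max(positions)} exceeds length {sizes[chrom]}"
            )
    return problems


# Toy input: the second line deliberately lacks the 'chr' prefix.
sizes = parse_chrom_sizes(["chr1 249250621", "chrM 16571"])
print(check_site_file(["chr1\t10479\t10508", "1 10479 10508"], sizes))
```

An empty list would mean every chromosome in the site file is present in chrom.sizes and no site lies beyond the chromosome end.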
With those files, I ran Juicer Tools 'pre' with both juicer_tools_1.22.01.jar and juicer_tools.2.13.05.jar:
java -Xms512m -Xmx20480m -jar /TOOL/juicer_tools.'$version'.jar pre \
-f /SAMPLE/tmp/resfrag.juicebox \
/SAMPLE/tmp/allValidPairs.pre_juicebox_sorted \
/OUTPUT/pre_sample.hic \
/Basic_data/hg19.chrom.sizes --threads 12
The command works successfully with version 1, and I checked all normalization types for the fragment-delimited maps, including 1 kb, in Juicebox 2.13.05.
But I ran into trouble with version 2.
I used the same input, but the pre tool failed while calculating normalizations for the fragment-delimited maps (at FRAG_20 in the log below).
Because of this failure, the resulting .hic file doesn't contain any normalization information.
Output and error:
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARN [2021-11-19T15:35:12,108] [Globals.java:138] [main] Development mode is enabled
Using 12 CPU thread(s) for primary task
Using 10 CPU thread(s) for secondary task
No mndIndex provided
Using single threaded preprocessor
Start preprocess
Writing header
Writing body
......................................................................................................................................................................................................................................................................................................................................
Writing footer
nBytesV5: 13645422
masterIndexPosition: 1334130509
Finished preprocess
Calculating norms for zoom BP_2500000
Calculating norms for zoom BP_1000000
Calculating norms for zoom BP_500000
Calculating norms for zoom BP_250000
Calculating norms for zoom BP_100000
Calculating norms for zoom BP_50000
Calculating norms for zoom BP_25000
Calculating norms for zoom BP_10000
Calculating norms for zoom BP_5000
Calculating norms for zoom BP_1000
Calculating norms for zoom FRAG_500
Calculating norms for zoom FRAG_200
Calculating norms for zoom FRAG_100
Calculating norms for zoom FRAG_50
Calculating norms for zoom FRAG_20java.lang.NullPointerException
at juicebox.data.iterator.ListOfListIterator.hasNext(ListOfListIterator.java:44)
at juicebox.data.iterator.IteratorContainer.getNumberOfContactRecords(IteratorContainer.java:54)
at juicebox.data.iterator.ListOfListIteratorContainer.getIsThereEnoughMemoryForNormCalculation(ListOfListIteratorContainer.java:56)
at juicebox.tools.utils.norm.NormalizationCalculations.<init>(NormalizationCalculations.java:59)
at juicebox.tools.utils.norm.NormalizationVectorUpdater.updateHicFile(NormalizationVectorUpdater.java:184)
at juicebox.tools.clt.old.AddNorm.launch(AddNorm.java:83)
at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:174)
at juicebox.tools.HiCTools.main(HiCTools.java:94)
The restriction site file was tab-delimited, so I thought that could be the reason.
So I generated a restriction site file with 'generate_site_positions.py'.
It is space-delimited, but its values differ from those in 'resfrag.juicebox'.
In the generated file, only the DpnII sites are shifted by +1; the DdeI sites have the same values as the restriction site BED file for HiC-Pro.
That seems odd. In any case, I switched the pre -f input to this file and re-ran, but got the same error.
chr1 10478 10507 10599 11139 11160 11267 11500 11930 12053 12344 12371 12411
chr2 10589 10712 10758 11514 11874 12160 12179 12226 12371 12588 12600 12636
chr3 60138 60183 60662 60695 60733 60788 61030 61330 61781 62134 62191 62460
chr4 10224 10333 10425 11728 12249 12274 12507 12669 12770 12779 12792 12840
chr5 12274 12332 13063 13187 13365 13423 13429 13558 14004 14125 14198 14207
chr6 60001 60026 60231 60295 60421 60641 60682 61228 61260 61350 61540 61933
chr7 10254 10569 10625 10853 10885 10891 11007 11159 11288 11354 11423 11521
chrX 60101 60256 60295 60437 60711 60789 61170 61227 61274 61533 61619 61741
chr8 10169 10285 10493 10815 10848 10892 11057 11133 11192 11540 11728 11894
chr9 10761 10807 10840 11273 11380 11613 12043 12114 12166 12457 12484 12524
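For reference, a comparison between two site files like the one described above can be sketched as follows. This is only an illustration: it assumes both files list the same sites in the same order, so that positions can be paired by index, and the helper names are hypothetical. The chr1 values are the ones shown in this post.

```python
from collections import Counter


def load_sites(lines):
    """Parse 'chrom pos pos ...' lines (tab- or space-delimited)."""
    sites = {}
    for line in lines:
        fields = line.split()
        if fields:
            sites[fields[0]] = [int(p) for p in fields[1:]]
    return sites


def offset_counts(sites_a, sites_b):
    """Per chromosome, count the differences a[i] - b[i] over paired sites."""
    result = {}
    for chrom in sites_a.keys() & sites_b.keys():
        # zip stops at the shorter list, so extra trailing sites are ignored
        pairs = zip(sites_a[chrom], sites_b[chrom])
        result[chrom] = Counter(a - b for a, b in pairs)
    return result


resfrag = load_sites(
    ["chr1\t10479\t10508\t10600\t11140\t11160\t11268\t11501\t11931"]
)
generated = load_sites(
    ["chr1 10478 10507 10599 11139 11160 11267 11500 11930"]
)
print(offset_counts(resfrag, generated))
# {'chr1': Counter({1: 7, 0: 1})}
```

For the first eight chr1 sites, seven differ by 1 and one (11160) is identical, which is the mixed pattern described above.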
Here are my questions:
1. Is it OK to use a tab-delimited restriction site file? Version 1 works well with it.
2. Do you know why the DpnII and DdeI sites are calculated differently (+1)?
The BED file for HiC-Pro takes the cut offset into account, as in C^TNAG (DdeI) or ^GATC (DpnII), but 'generate_site_positions.py' doesn't seem to consider the exact cut position. This might affect the result, so I'd like to ask your opinion.
3. I searched previous discussions and found some comments recommending dropping the -f option and skipping the fragment-delimited maps. If I do that, can I still view normalized contact maps below 10 kb in Juicebox? I need higher-resolution maps.
4. What are the major differences between Juicer Tools versions 1 and 2?
I'm currently using version 1, so I would have to upgrade and set up the latest Juicer pipeline, but I cannot find documentation about it. I would appreciate any information you can provide.
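To make question 2 concrete, here is a minimal sketch of one way such a +1 difference could arise. It is a hypothesis, not the actual code of either tool: it assumes one convention reports the 1-based start of the recognition sequence, while the other reports the 0-based start plus the cut offset (0 for ^GATC, 1 for C^TNAG). The function names and the toy sequence are made up for illustration.

```python
import re


def motif_starts(seq, pattern):
    """Return the 0-based start index of every match of `pattern` in `seq`."""
    return [m.start() for m in re.finditer(pattern, seq)]


def one_based_motif_start(seq, pattern):
    """Assumed convention A: 1-based position of the first motif base."""
    return [s + 1 for s in motif_starts(seq, pattern)]


def zero_based_cut_position(seq, pattern, cut_offset):
    """Assumed convention B: 0-based motif start plus the cut offset."""
    return [s + cut_offset for s in motif_starts(seq, pattern)]


# Toy sequence with one GATC (DpnII) and one CTGAG (DdeI, CTNAG) site.
seq = "AAGATCTTTCTGAGAA"

dpnii_a = one_based_motif_start(seq, "GATC")
dpnii_b = zero_based_cut_position(seq, "GATC", cut_offset=0)  # ^GATC
ddei_a = one_based_motif_start(seq, "CT.AG")
ddei_b = zero_based_cut_position(seq, "CT.AG", cut_offset=1)  # C^TNAG

print("DpnII:", dpnii_a, dpnii_b)  # the two conventions differ by 1
print("DdeI: ", ddei_a, ddei_b)    # the two conventions agree
```

Under these assumed conventions, the DpnII positions differ by 1 while the DdeI positions coincide, which would match what I see. I have not confirmed that this is what the two scripts actually do.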
Sincerely,
Sang-ah Park