Problem using HiCKRy.py to generate bias files


Colin Kern

Feb 25, 2020, 3:53:14 PM
to Fit-Hi-C
Hello,

I'm trying to use HiCKRy.py to create the bias files. I'm working from a hic file generated by the Juicer pipeline.

First, I used the createFitHiCFragments-fixedsize.py script to generate a fragments file, which creates a file like this:

chr1    0       5000    1       1
chr1    10000   15000   1       1
chr1    20000   25000   1       1

I had to modify it to make the third column the fragment start instead of the midpoint, since that's what the Juicer dump tool outputs. Then I used the createFitHiCContacts-hic.sh script to make the contact files, which now look like this:
chr1    0       chr1    0       429.0
chr1    0       chr1    10000   1.0
chr1    10000   chr1    10000   545.0
chr1    0       chr1    20000   5.0
chr1    10000   chr1    20000   544.0
chr1    20000   chr1    20000   4945.0
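
An alternative to editing the fragment script is to shift the contact coordinates instead. The following is only a minimal sketch, assuming the dump output gives bin start positions in the five-column layout shown above and that the fragments file keeps bin midpoints. The function name `starts_to_midpoints` and the `resolution` argument are hypothetical and not part of the Fit-Hi-C utilities.

```python
def starts_to_midpoints(line, resolution):
    """Shift both bin-start coordinates of one contact line to bin midpoints.

    Assumes five whitespace-separated columns:
    chrom1, start1, chrom2, start2, count (an assumption based on the
    contact lines posted in this thread).
    """
    chrom1, start1, chrom2, start2, count = line.split()
    half = resolution // 2
    return "\t".join(
        [chrom1, str(int(start1) + half), chrom2, str(int(start2) + half), count]
    )


if __name__ == "__main__":
    # Example line taken from the contact file above, at an assumed 10 kb bin size.
    print(starts_to_midpoints("chr1\t0\tchr1\t10000\t1.0", 10000))
```

Whichever file is adjusted, the point is that the fragment positions and the contact positions must use the same convention.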

Then I ran HiCKRy.py with these two files, but I got this:
Creating sparse matrix...
Sparse matrix creation took 5.946332931518555 seconds
Removing 0.05 percent of most sparse bins
... corresponds to 5159 total rows
... corresponds to all bins with less than or equal to 0.0 total interactions
Sparse rows removed
Initial matrix size: 103191 rows and 103191 columns
New matrix size: 2026 rows and 2026 columns
Normalizing with KR Algorithm
WARNING
... Bias vector has a mean outside of typical range (0.5, 2).
Consider running with a larger -x option if problems occur
Mean    -0.9607330096616953
Median  -1.0
Std. Dev.       3.164438318044982
Almost all the values in the output file are -1.

What is the issue here? Do I need to pre-filter my contact files to remove the diagonal and near-diagonal contacts? Should the 4th column of the fragment file not be all 1s?

Alternatively, can I get these biases from the .hic file using Juicer dump? I'm not sure what exactly I should dump with the command, though. "dump observed KR"? "dump norm KR"? Something else?

Arya Kaul

Feb 26, 2020, 11:38:47 AM
to Fit-Hi-C
Hey Colin!

So the error you are getting typically has to do with an excessively sparse contact matrix. Looking at your output, it seems that only ~2,000 of the initial ~103,000 bins have non-zero total contact counts. This is abnormally sparse, so I would check that the .hic files generated by Juicer look correct. I'd also recommend bumping up the -x option (default is 0.05, try 0.1). If you're already generating the KR normalization through Juicer, you could try dumping its bias vector. You would want `dump norm KR`, since `dump observed KR` gives you the normalized matrix.
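
One quick way to tell a genuinely sparse matrix from a coordinate mismatch between the two input files is to count how many fragment bins are ever hit by a contact. This is a minimal sketch, assuming the whitespace-separated layouts posted above (fragment position in the third column, contact positions in the second and fourth columns). `bin_coverage` is a hypothetical helper, not part of Fit-Hi-C.

```python
def bin_coverage(fragment_lines, contact_lines):
    """Return (total_bins, bins_with_contacts, unmatched_contact_ends).

    fragment_lines: lines like 'chr1 0 5000 1 1' (position read from column 3).
    contact_lines:  lines like 'chr1 0 chr1 10000 1.0'.
    The column layout is an assumption based on the files shown in this thread.
    """
    bins = set()
    for line in fragment_lines:
        fields = line.split()
        if fields:
            bins.add((fields[0], int(fields[2])))

    hit = set()
    unmatched = 0
    for line in contact_lines:
        fields = line.split()
        if not fields or float(fields[4]) <= 0:
            continue
        # Each contact has two ends; check both against the fragment bins.
        for key in ((fields[0], int(fields[1])), (fields[2], int(fields[3]))):
            if key in bins:
                hit.add(key)
            else:
                unmatched += 1
    return len(bins), len(hit), unmatched


if __name__ == "__main__":
    fragments = ["chr1 0 5000 1 1", "chr1 10000 15000 1 1", "chr1 20000 25000 1 1"]
    contacts = ["chr1 0 chr1 0 429.0", "chr1 0 chr1 10000 1.0"]
    print(bin_coverage(fragments, contacts))
```

If the third number is large while the second is small, the contact coordinates probably do not line up with the fragment file (for example starts versus midpoints), which would look like extreme sparsity to HiCKRy.py even when the data are fine.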

Best,
Arya

Hammad Farooq

Apr 9, 2025, 6:23:56 PM
to Fit-Hi-C

Hello,

I’m encountering a similar issue. For most chromosomes, I’m not using a bias file since I rely on the KR normalization output from Juicer. However, Juicer does not provide KR normalization for certain chromosomes. For those cases, I’m attempting to generate a bias file using HiCKRy.py, following the guidelines provided above.

Even when I increase the -x parameter to 0.1, I still receive the same warning:

python fithic/utils/HiCKRy.py -i chr3_interactions_file.txt.gz -f /hg38_fragments_file_for_FitHiC.gz -o chr3_HiCKRy.out.gz -x 0.1
Creating sparse matrix...
Sparse matrix creation took 70.43 seconds
Removing 0.1 percent of most sparse bins
... corresponds to 61766 total rows


... corresponds to all bins with less than or equal to 0.0 total interactions
Sparse rows removed

Initial matrix size: 617,669 rows and 617,669 columns
New matrix size: 39,258 rows and 39,258 columns


Normalizing with KR Algorithm
WARNING... Bias vector has a mean outside of typical range (0.5, 2).
Consider running with a larger -x option if problems occur

Mean    -0.8729
Median  -1.0
Std. Dev.   22.3384

As a result, the p-values and q-values in the FitHiC output are all 1.

Do you have any suggestions for resolving this issue?
Would it be reasonable to use VC normalization from Juicer for these specific chromosomes instead?

Thank you for your help.

Thanks,

Hammad

Ferhat Ay

Apr 10, 2025, 3:23:06 AM
to fit...@googlegroups.com
Hi,
HiCKRy.py also has issues with some chromosomes. Using VC normalization from Juicer is perfectly fine. Another option is to use ICE normalization from HiC-Pro.
Not much can be done when the KR algorithm doesn't converge.
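
For anyone going the Juicer route, the dumped normalization vector still has to be turned into a bias file. The sketch below is not an official converter and rests on several assumptions: that the vector holds one value per bin in order starting at bin 0, with NaN for bins Juicer could not normalize; that the bias file has three tab-separated columns (chromosome, bin midpoint, bias); that -1 marks unusable bins, mirroring what HiCKRy.py wrote in the outputs reported above; and that rescaling to a mean of 1 is appropriate, which is motivated only by the (0.5, 2) range mentioned in the warning. `norm_vector_to_bias_lines` is a hypothetical name. Please check the Fit-Hi-C documentation before relying on any of this.

```python
import math


def norm_vector_to_bias_lines(chrom, values, resolution):
    """Convert a per-bin normalization vector into bias-file lines.

    values: list of floats, one per bin (NaN where no normalization exists).
    Bins with NaN or non-positive values get -1 (assumed 'discard' marker).
    Remaining values are rescaled so their mean is 1 (an assumption).
    """
    valid = [v for v in values if not math.isnan(v) and v > 0]
    mean = sum(valid) / len(valid) if valid else 1.0

    lines = []
    for i, v in enumerate(values):
        midpoint = i * resolution + resolution // 2
        if math.isnan(v) or v <= 0:
            bias = -1.0
        else:
            bias = v / mean
        lines.append(f"{chrom}\t{midpoint}\t{bias:.6g}")
    return lines


if __name__ == "__main__":
    # Toy vector for illustration only; real values would come from the dump.
    for line in norm_vector_to_bias_lines("chr3", [2.0, float("nan"), 4.0], 10000):
        print(line)
```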
