Majority of p-values and q-values are either 1 or nan


Mark Mackiewicz

Sep 28, 2020, 2:32:47 AM
to Fit-Hi-C
Greetings,
I have run FitHiC v2.0.7 on human whole-genome Hi-C data that we generated in-house.  Raw matrix files were generated with HiC-Pro, and the utility scripts HiCPro2FitHiC.py and createFitHiCFragments-fixedsize.py were used to generate the corresponding input files.  HiCKRy was used to generate bias files with -x 0.1 (10% of all bias values were -1 or <0.5).

The following settings were employed:
-r 10000  -p 2  -U 10000000 -L 20000 -x intraOnly -tL 0.4 -tU 3

When examining the significances.txt.gz file, the p-values and q-values are either nan or 1.000000e+00 for many different chromosomes and regions (if not the entire file), including regions where mappability is not an issue.
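To put a number on how widespread the problem is, a tally along the following lines may help. This is only a rough sketch: it assumes the significances file is whitespace-delimited with one header line and that the p-value sits in the sixth column (index 5), which should be checked against the actual header. The names tally_column and tally_file are made up for illustration and are not part of FitHiC.

```python
import gzip
import math
from collections import Counter


def tally_column(lines, column, has_header=True):
    """Count nan, exactly-1, and other values in one column of a text table."""
    counts = Counter()
    rows = iter(lines)
    if has_header:
        next(rows, None)  # skip the header line (assumed to be present)
    for line in rows:
        fields = line.split()
        if len(fields) <= column:
            continue  # blank or short line
        try:
            value = float(fields[column])  # float() parses both "nan" and "1.000000e+00"
        except ValueError:
            counts["unparsable"] += 1
            continue
        if math.isnan(value):
            counts["nan"] += 1
        elif value == 1.0:
            counts["one"] += 1
        else:
            counts["other"] += 1
    return counts


def tally_file(path, column=5):
    """Apply tally_column to a gzipped file; the column index is an assumption."""
    with gzip.open(path, "rt") as handle:
        return tally_column(handle, column)
```

Calling tally_file on significances.txt.gz returns a Counter whose "nan", "one", and "other" entries show how much of the file is affected.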

Any suggestions on where I can begin troubleshooting or what the problem might be would be greatly appreciated.

Mark
 

Kaul, Arya

Oct 1, 2020, 12:52:20 PM
to fit...@googlegroups.com
Hey Mark,

This seems like an issue with the -x choice in HiCKRy. Do you find a large number of '-1' values in the bias file? If so, I would recommend rerunning HiCKRy with different values of -x (0.05, 0.15). The Fit-Hi-C readme has more info on this.

Best,
Arya

--
Harvard Medical School PhD Candidate Bioinformatics and Genomics
UC San Diego B.S. Bioinformatics 2019

Mark Mackiewicz

Oct 2, 2020, 2:49:31 PM
to fit...@googlegroups.com
Greetings Arya,
Thank you for your response.  I can try increasing the -x flag to 0.15 and see what happens.  At 0.1, 10% of my bias values were -1 and <0.5 combined.  If one uses the bias file generated by ICE normalization through the HiC-Pro processing pipeline, and no significant contacts are detected with the intraOnly analysis, do you suggest that the ICE bias file be discarded and a HiCKRy-generated bias file be used instead, after determining a suitable -x value (e.g., 0.1 or 0.15) that minimizes the number of -1 values?

Thank you, and I look forward to your comments/suggestions.
Mark



--
Mark Mackiewicz, Ph.D.
Senior Scientist, Myers Lab
HudsonAlpha Institute for Biotechnology
601 Genome Way, Huntsville, AL 35806
voice:  256-327-0440

Kaul, Arya

Oct 2, 2020, 7:11:56 PM
to fit...@googlegroups.com
Hey Mark,

You can certainly try the ICE normalization bias file if you already have it. We provide the Knight-Ruiz algorithm implementation because for certain datasets performing ICE is prohibitively computationally expensive.

Just a point of clarification: you're not trying to minimize the number of -1 values in the Fit-Hi-C output. When HiCKRy is run, it should output a warning if it detects that the median bias value falls outside of the range 0.5-2. If so, this is an indication that KR failed to converge because the matrix lacks total support. Fit-Hi-C then reads the bias values, and any value that falls outside of the specified range (default is 0.5-2, in your case 0.4-3) is thrown out. This results in a -1 value. To make sure HiCKRy is converging properly, you can examine the histogram of the observed bias values. If it looks like something approximating the normal distribution with a mean ~1, then the problem could be the input data instead.
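As a rough way to run that check, something like the sketch below could summarize a bias file before it goes into Fit-Hi-C. It assumes the bias file is gzipped, has no header, and has three whitespace-separated columns (chromosome, midpoint, bias). The function names are hypothetical, and the bounds are just the -tL/-tU values passed in, so this only approximates the filtering described above rather than reproducing Fit-Hi-C's own code.

```python
import gzip
import math
import statistics


def summarize_bias(lines, lower=0.5, upper=2.0):
    """Summarize bias values from lines of 'chrom midpoint bias' (assumed layout)."""
    values = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        values.append(float(fields[2]))
    # Bins already marked -1 (or not finite) carry no usable bias value.
    usable = [v for v in values if v != -1 and math.isfinite(v)]
    # Values outside [lower, upper] are the ones expected to be thrown out.
    out_of_range = [v for v in usable if v < lower or v > upper]
    return {
        "total": len(values),
        "minus_one_or_invalid": len(values) - len(usable),
        "out_of_range": len(out_of_range),
        "median": statistics.median(usable) if usable else float("nan"),
    }


def summarize_bias_file(path, lower=0.5, upper=2.0):
    """Apply summarize_bias to a gzipped bias file."""
    with gzip.open(path, "rt") as handle:
        return summarize_bias(handle, lower, upper)
```

With the settings from this thread the call would use lower=0.4 and upper=3. A median far from 1, or a large out_of_range count, would point at the KR convergence issue rather than at the input data.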

Hope this helps!

Best,
Arya


Mark Mackiewicz

Oct 5, 2020, 2:26:10 PM
to fit...@googlegroups.com
Thank you for this information, Arya.  One final question as a sanity check:  I used the HiCPro2FitHiC utility script to generate the interaction and fragment files from my HiC-Pro .matrix files without using a bias file as an input parameter.  After getting those files, I used the HiCKRy utility script to generate a bias file from those interaction and fragment files.  At this point, should one then proceed straight to the FitHiC analysis with these three files (interaction, fragment and bias), or should new interaction and fragment files be generated (now that I have a bias file) before performing the FitHiC analysis?

Mark

Kaul, Arya

Oct 5, 2020, 5:35:00 PM
to fit...@googlegroups.com
Hey Mark,

Yep, that's correct! You can either generate all 3 files (bias, interaction, fragment) using HiCPro2FitHiC and proceed to analysis or generate interaction and fragment with the script, compute bias from HiCKRy, and then go to analysis.

Best,
Arya

