All GM12878 p-values and q-values for intra-chromosomal interactions have nan value

98 views
Skip to first unread message

Savio Chow

unread,
Feb 28, 2023, 3:28:08 AM2/28/23
to Fit-Hi-C
Hi,

I downloaded the GM12878 .hic file from 4DN (4DNFI1UEG1HD.hic).
I used juicer dump for each chromosome pair:
java -jar juicer_tools.jar dump observed NONE 4DNFI1UEG1HD.hic {1} {2} BP 50000 chr{1}_{2}.txt

then ran createFitHiCContacts-hic.sh to generate contacts:
createFitHiCContacts-hic.sh chr{1}_{2}.txt chr{1} chr{2} chr{1}_{2}.FitHiCContacts.gz

shifted the positions by 25000, and aggregated all the pairs:
zcat *.FitHiCContacts.gz | awk '{print $1,$2+25000,$3,$4+25000,$5}' | gzip -c > chrall.FitHiCContacts.gz

created a fragments file:
python3 createFitHiCFragments-fixedsize.py --chrLens hg38.chrom.sizes --resolution 50000 --outFile hg38.fixed.frags.gz

and ran FitHiC:
fithic -f hg38.fixed.frags.gz -i chrall.FitHiCContacts.gz -o GM12878 -r 50000 -x All -v -p 2 -l GM12878

Even without any bias file, I am getting all nan for p-values and q-values (entire column) in intra-chromosomal interactions, I have just printed a few to illustrate:
zcat GM12878.spline_pass1.res50000.significances.txt.gz | head
chr1 fragmentMid1 chr2 fragmentMid2 contactCount p-value q-value bias1 bias2 ExpCC
chr10 25000 chr10 25000 266 nan nan 1.000000e+00 1.000000e+00 10791.241750
chr10 25000 chr10 75000 503 nan nan 1.000000e+00 1.000000e+00 4699.268221
chr10 75000 chr10 75000 11920 nan nan 1.000000e+00 1.000000e+00 10791.241750
chr10 25000 chr10 125000 166 nan nan 1.000000e+00 1.000000e+00 2326.428827
chr10 75000 chr10 125000 4794 nan nan 1.000000e+00 1.000000e+00 4699.268221
chr10 125000 chr10 125000 11579 nan nan 1.000000e+00 1.000000e+00 10791.241750
chr10 25000 chr10 175000 96 nan nan 1.000000e+00 1.000000e+00 1684.900987
chr10 75000 chr10 175000 1799 nan nan 1.000000e+00 1.000000e+00 2326.428827
chr10 125000 chr10 175000 4892 nan nan 1.000000e+00 1.000000e+00 4699.268221

while inter-chromosomal interactions are not affected:
zcat GM12878.spline_pass1.res50000.significances.txt.gz | awk '$1!=$3{print}' | head
chr1 fragmentMid1 chr2 fragmentMid2 contactCount p-value q-value bias1 bias2 ExpCC
chr10 75000 chr11 125000 4 7.654125e-02 1.000000e+00 1.000000e+00 1.000000e+00 1.583393
chr10 125000 chr11 125000 5 2.277809e-02 1.000000e+00 1.000000e+00 1.000000e+00 1.583393
chr10 225000 chr11 125000 1 7.947226e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.583393
chr10 275000 chr11 125000 1 7.947226e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.583393
chr10 475000 chr11 125000 1 7.947226e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.583393
chr10 525000 chr11 125000 1 7.947226e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.583393
chr10 625000 chr11 125000 2 4.696878e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.583393
chr10 725000 chr11 125000 1 7.947226e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.583393
chr10 775000 chr11 125000 1 7.947226e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.583393

I got the same results even if I set -x intraOnly. I also observed the same nan problem in an H1 dataset (4DNFIIMZB6Y9.hic), while another HepG2 (4DNFICSTCJQZ.hic), H1 (4DNFIQYQWPF5.hic) and HeLa-S3 (4DNFICEGAHRC.hic) dataset report p-values and q-values normally (no nan found in the entire column, both intra- and inter- interactions).

Do you have any thoughts on a possible problem in the dataset / the way I ran fithic? Thank you very much!

Best,
Savio
Reply all
Reply to author
Forward
0 new messages