Hi Dr. Ay,
I ran Fit-Hi-C on a set of Hi-C samples from different conditions. When applying the same FDR cutoff (say, < 1e-6), I noticed that some conditions have dramatically more loops than others. This could be due to differences in the read count of the input Hi-C matrices, or it could be biological.
I am wondering:
1) How much does a difference in read count bias the final number of loop calls in Fit-Hi-C? Have you ever run down-sampling trials within the same condition to evaluate this?
2) Is there an easier way to reconcile large differences in the final loop call numbers without subsampling the Hi-C data at the .bam file level? I start from a .cool file for the Fit-Hi-C calls.
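For question 2, one matrix-level alternative I could imagine is binomial thinning of the raw pixel counts, so that every sample ends up at roughly the same depth. Below is a minimal sketch in Python with NumPy, assuming the raw counts have already been loaded from the .cool file into an integer array. The function name `thin_counts` and the toy numbers are hypothetical, and I do not know how such thinning interacts with Fit-Hi-C's bias handling, so please treat it as an illustration of the idea only.

```python
import numpy as np

def thin_counts(counts, target_total, seed=0):
    """Binomially thin integer contact counts to roughly target_total contacts.

    Each contact is kept independently with probability target_total / total,
    so the result matches target_total only in expectation.
    """
    counts = np.asarray(counts, dtype=np.int64)
    total = int(counts.sum())
    if target_total >= total:
        # Already at or below the target depth: nothing to remove.
        return counts.copy()
    keep_fraction = target_total / total
    rng = np.random.default_rng(seed)
    return rng.binomial(counts, keep_fraction)

# Toy pixel counts (hypothetical), thinned to about half of their total.
toy = np.array([120, 45, 0, 300, 7, 18])
thinned = thin_counts(toy, target_total=245)
print(toy.sum(), thinned.sum())
```

The thinned counts would then need to be written back to a new .cool file before rerunning Fit-Hi-C; that step is not shown here.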
Here are the read counts and loop call numbers for the 3 conditions I have run at 4 kb resolution.
condition      read #           loop #
cell type A    3,202,881,486    124,634
cell type B    2,975,312,234    278,357
cell type C    4,364,008,271    781,486
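As a rough sanity check, the numbers above can be normalized to loops per million reads. Cell type B has fewer reads than cell type A but more than twice as many loops, so depth alone does not seem to explain the difference. A small Python sketch of that arithmetic, using only the values from the table (the variable names are mine):

```python
# (read count, loop count) per condition, copied from the table above.
samples = {
    "cell type A": (3_202_881_486, 124_634),
    "cell type B": (2_975_312_234, 278_357),
    "cell type C": (4_364_008_271, 781_486),
}

# Loops called per million reads for each condition.
rates = {
    name: loops / (reads / 1e6)
    for name, (reads, loops) in samples.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} loops per million reads")
```

I realize the number of significant calls need not scale linearly with depth, so this ratio is only a crude indicator and not a proper normalization.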
Thanks,
Chun