Compare Hi-C loop calls across multiple cell types

Chun Su

Jun 2, 2021, 6:09:04 PM
to Fit-Hi-C
Hi Dr. Ay,

I ran Fit-Hi-C on a set of Hi-C samples across different conditions. When applying the same FDR cutoff (say < 1e-6), I noticed that some conditions have dramatically more loops than others. This could be due to differences in the read count of the input Hi-C matrix, or it could be biological.

I am wondering:
1) How much will a difference in read count bias the final number of loop calls in Fit-Hi-C? Have you ever run any down-sampling trials within the same condition to evaluate this?
2) Is there an easier way to handle large differences in the final number of loop calls without subsampling the Hi-C data at the .bam file level? I start from a .cool file for the Fit-Hi-C calls.

Here are the read counts and loop call numbers for the 3 conditions I have run at 4 kb resolution.

condition      read #           loop #
cell type A    3,202,881,486    124,634
cell type B    2,975,312,234    278,357
cell type C    4,364,008,271    781,486

Thanks,
Chun


Ferhat Ay

Jun 13, 2021, 10:54:12 AM
to Fit-Hi-C
Read count has quite a strong impact on the loop calls. But as your A vs. B comparison shows, read count is not always the only factor.
Downsampling the reads is generally a good solution. You can work out the math to downsample directly from the counts (.cool or otherwise).
You may want to check whether other factors (copy number, rearrangements, etc.) are impacting these counts.
Visualizing the data side by side (A vs. C) together with their loop calls would also give you some idea of whether there are systematic biases.
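
One way to "work out the math" for downsampling from counts is binomial thinning: keep each contact independently with probability p = target total / current total, so each pixel count n becomes a draw from Binomial(n, p). The sketch below is only an illustration of that idea, not the Fit-Hi-C authors' procedure. The function name `downsample_counts` and its parameters are hypothetical, and it assumes the counts are raw (unbalanced) integers, for example the count column of a pixel table exported from a .cool file.

```python
import numpy as np


def downsample_counts(counts, target_total, seed=0):
    """Binomially thin integer contact counts to about target_total contacts.

    Hypothetical helper for illustration. Each contact is kept independently
    with probability target_total / counts.sum(), so the resulting total is
    close to, but not exactly, target_total.
    """
    counts = np.asarray(counts, dtype=np.int64)
    total = int(counts.sum())
    if target_total >= total:
        # Nothing to remove: the matrix is already at or below the target depth.
        return counts.copy()
    keep_prob = target_total / total
    rng = np.random.default_rng(seed)
    # One binomial draw per pixel; pixels with zero counts stay zero.
    return rng.binomial(counts, keep_prob)


# Toy pixel counts (made up), thinned to roughly 2000 contacts in total.
toy_counts = np.array([10, 0, 250, 4000, 37])
thinned = downsample_counts(toy_counts, target_total=2000, seed=1)
print(thinned, thinned.sum())
```

With the read counts quoted in the first post, matching A and C to B's depth would mean keep probabilities of roughly 0.93 for A and 0.68 for C, assuming those read counts correspond to the contact totals that actually go into Fit-Hi-C. Because thinning is random, repeating it with a few different seeds gives a sense of how stable the loop counts are at the matched depth.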