Will running fithic chromosome by chromosome at 1 kb resolution bias the result?


Chun Su

Apr 12, 2021, 7:11:22 PM
to Fit-Hi-C
Hi, I created fragment and interaction files from a high-resolution (1 kb) .cool file, fed them into HiCKRy to generate a bias file, and then ran fithic. Since I am only interested in intra-chromosomal interactions, I want to reduce memory use and run time by performing the whole process chromosome by chromosome, then merging the results at the end and applying FDR correction to the p-values. The code to run the process chromosome by chromosome is listed below.

```
# cooler dump (restricting to $chr yields intra-chromosomal pairs only)
cooler dump --join -r "$chr" "$cool_file" > cooler_dump.bedpe

# create int.txt.gz: chr1, start1, chr2, start2, count
awk 'BEGIN{OFS="\t"; FS="\t"}{print $1,$2,$4,$5,$7}' cooler_dump.bedpe | gzip > "${chr}_int.txt.gz"

# create frag.txt.gz: chr, 0, start, marginalized count, mappable flag
# (sorted by chromosome, then numerically by coordinate)
zcat "${chr}_int.txt.gz" | awk 'BEGIN{OFS="\t"; FS="\t"}{a[$1"\t0\t"$2]+=$5; a[$3"\t0\t"$4]+=$5;}END{for (coord in a) { print coord, a[coord],1 }}' | sort -k1,1 -k3,3n | gzip > "${chr}_frag.txt.gz"

# create KRbias.txt.gz
python /mnt/isilon/sfgi/programs/fithic/fithic/utils/HiCKRy.py -i "${chr}_int.txt.gz" -f "${chr}_frag.txt.gz" -o "${chr}_KRbias.txt.gz"

# run fithic on intra-chromosomal contacts at 1 kb resolution
fithic -i "${chr}_int.txt.gz" \
-f "${chr}_frag.txt.gz" \
-o ./ \
-r 1000 \
-t "${chr}_KRbias.txt.gz" \
-x intraOnly
```

I have two questions:
1) Will excluding inter-chromosomal contacts from the fragment and interaction files bias the values that HiCKRy generates? Is it wrong to do so?

2) If 1) is wrong, would it be acceptable to create the full interaction, fragment, and bias files including inter-chromosomal contacts, and then run fithic chromosome by chromosome?

Thank you,
Chun
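
For the merge-then-FDR step described in the post above, the following is a minimal sketch of a Benjamini-Hochberg adjustment over pooled p-values. It is not fithic's own code; `bh_adjust` is a hypothetical helper, and the two-column `id<TAB>pvalue` input is an assumed intermediate format that would have to be built from the per-chromosome outputs.

```shell
# bh_adjust (hypothetical helper): read "id<TAB>pvalue" lines on stdin and
# print "id<TAB>pvalue<TAB>qvalue", sorted by ascending p-value.
bh_adjust() {
  # -g sorts general numeric values, so scientific notation (1e-12) is handled
  sort -t "$(printf '\t')" -k2,2g | awk 'BEGIN{FS=OFS="\t"}
    { id[NR]=$1; p[NR]=$2 }
    END {
      n = NR; running_min = 1
      # walk from the largest p-value down, keeping q-values monotone
      for (i = n; i >= 1; i--) {
        q = p[i] * n / i
        if (q < running_min) running_min = q
        adj[i] = running_min
      }
      for (i = 1; i <= n; i++) printf "%s\t%s\t%.6g\n", id[i], p[i], adj[i]
    }'
}

# Assumed usage; the file names and the p-value column (6) are assumptions
# that should be checked against the actual fithic output:
# for f in chr*_significances.txt.gz; do zcat "$f" | tail -n +2; done \
#   | awk 'BEGIN{FS=OFS="\t"}{print $1":"$2"-"$4, $6}' | bh_adjust
```

Pooling before adjusting means the number of tests is the total across all chromosomes, so the resulting q-values will generally differ from those reported by each per-chromosome run.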

Ferhat Ay

Apr 13, 2021, 3:55:05 PM
to Fit-Hi-C
Hi. I don't see any issue in running HiCKRy chr by chr. Maybe Arya can comment. Again, the critical part may be to make sure you filter out a sufficient number of low-coverage bins for normalization so that the resulting bias values are meaningful.
You may still want to transform them to have a mean value of 1 per chr.
You can also consider just computing coverage for each 1 kb bin (per chr or genome-wide), then transforming those values to have a mean of 1 and using them instead of bias values.
One thing I should mention: if these are high-sequencing-depth 1 kb contact maps, you may want to give Mustache a try: https://github.com/ay-lab/mustache
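
The mean-of-1 rescaling suggested above can be sketched as follows. This is an illustration only: `scale_to_mean_one` is a hypothetical helper, the three-column `chr<TAB>coordinate<TAB>value` input is an assumed layout, and the mean is taken over every listed bin, including any zeros.

```shell
# scale_to_mean_one (hypothetical helper): read "chr<TAB>coord<TAB>value" on
# stdin and divide each value by the mean value of its chromosome.
scale_to_mean_one() {
  awk 'BEGIN{FS=OFS="\t"}
    { chr[NR]=$1; pos[NR]=$2; val[NR]=$3; sum[$1]+=$3; n[$1]++ }
    END {
      for (i = 1; i <= NR; i++) {
        m = sum[chr[i]] / n[chr[i]]
        # leave values untouched if a chromosome has zero total
        printf "%s\t%s\t%.6g\n", chr[i], pos[i], (m > 0 ? val[i] / m : val[i])
      }
    }'
}

# Assumed usage for coverage-based values, taking columns 1, 3 and 4
# (chr, coordinate, marginalized count) of the fragment file:
# zcat "${chr}_frag.txt.gz" | cut -f1,3,4 | scale_to_mean_one | gzip > "${chr}_covbias.txt.gz"
```

For a genome-wide mean instead of a per-chromosome one, the same idea applies with a single running sum rather than one per chromosome.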

Chun Su

Apr 14, 2021, 4:19:51 PM
to Fit-Hi-C
Thank you for the quick reply, Dr. Ay!

When filtering low-coverage bins before the HiCKRy bias calculation (as you suggested above), should I filter by bin-pair contact counts (interactions) or by marginalized contact counts per bin (fragments)? Also, what is an appropriate contact count cutoff?

Another way to filter would be to keep only the contacts within the lower-bound and upper-bound distances, which could match the fithic settings. Would this be more appropriate?

I did use Mustache to call loops at 1 kb before running fithic :) Since Mustache is based on a local background model, I know I can call loops chromosome by chromosome. Here I want to do a quick comparison of loop calls from tools based on different algorithms.
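
The distance-range filter raised in this post could look roughly like the sketch below. `filter_by_distance` is a hypothetical helper operating on the five-column interaction format used earlier in the thread (chr1, coord1, chr2, coord2, count); the bounds in the usage line are placeholders, not recommended values.

```shell
# filter_by_distance LO HI (hypothetical helper): keep intra-chromosomal
# interaction rows whose genomic distance lies within [LO, HI] bp.
filter_by_distance() {
  awk -v lo="$1" -v hi="$2" 'BEGIN{FS=OFS="\t"}
    $1 == $3 {
      d = $4 - $2
      if (d < 0) d = -d          # distance is orientation-independent
      if (d >= lo && d <= hi) print
    }'
}

# Assumed usage with placeholder bounds:
# zcat "${chr}_int.txt.gz" | filter_by_distance 5000 1000000 | gzip > "${chr}_int.filtered.txt.gz"
```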

Ferhat Ay

Apr 15, 2021, 2:35:48 AM
to Fit-Hi-C
By the marginalized counts (fragment). Certainly any row/column with zero counts needs to go. You may want to look at the distribution/histogram of such counts to decide on a better threshold if needed.
You can apply the same idea to a specific distance range; that would work too.
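
One way to inspect the distribution of marginalized counts and then apply a cutoff is sketched below. Both helpers are hypothetical and assume the five-column fragment layout from the first post (chr, 0, coordinate, marginalized count, mappable flag); the power-of-two bucketing is just one convenient choice for a quick look.

```shell
# marginal_histogram (hypothetical helper): bucket marginalized counts
# (column 4) by powers of two, with zero kept as its own bucket.
# Prints "bucket_lower_bound<TAB>number_of_bins".
marginal_histogram() {
  awk 'BEGIN{FS=OFS="\t"}
    { c = $4; b = 0
      if (c > 0) { b = 1; while (b * 2 <= c) b *= 2 }
      h[b]++ }
    END { for (b in h) print b, h[b] }' | sort -k1,1n
}

# drop_low_coverage MIN (hypothetical helper): keep only fragment rows whose
# marginalized count is at least MIN (MIN=1 removes the all-zero bins).
drop_low_coverage() {
  awk -v min="$1" 'BEGIN{FS=OFS="\t"} ($4 + 0) >= (min + 0)'
}

# Assumed usage:
# zcat "${chr}_frag.txt.gz" | marginal_histogram
# zcat "${chr}_frag.txt.gz" | drop_low_coverage 1 | gzip > "${chr}_frag.filtered.txt.gz"
```

Whether interactions touching the dropped bins also need to be removed from the interaction file is not settled in this thread, so that should be checked before using the filtered fragments downstream.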

Kaul, Arya

Apr 15, 2021, 11:10:49 AM
to fit...@googlegroups.com
Just to add on, HiCKRy's -x option is used to select the bottom x% of sparsest rows/cols to remove. You can tweak that as you see fit, or just filter beforehand yourself. In terms of running HiCKRy per chr, I see no reason why that shouldn't work.


--
Harvard Medical School PhD Candidate Bioinformatics and Genomics
UC San Diego B.S. Bioinformatics 2019
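
For filtering beforehand by hand, a percentile-based cutoff on marginalized counts can be estimated as in the sketch below. This is a rough stand-in and does not reproduce HiCKRy's internal logic; `sparsest_cutoff` is a hypothetical helper that assumes the five-column fragment layout with the marginalized count in column 4.

```shell
# sparsest_cutoff PCT (hypothetical helper): print the marginalized count
# found at the PCT-th percentile of all bins; bins at or below this value
# make up roughly the sparsest PCT percent.
sparsest_cutoff() {
  cut -f4 | sort -n | awk -v pct="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit
      k = int(NR * pct / 100)
      if (k < 1) k = 1           # always report at least the smallest count
      print v[k]
    }'
}

# Assumed usage: find the count at the 5th percentile
# zcat "${chr}_frag.txt.gz" | sparsest_cutoff 5
```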

Chun Su

Apr 15, 2021, 12:12:13 PM
to Fit-Hi-C

This is very helpful! Thank you both!