Hi,
I am trying to calculate the bisulfite conversion rate for a whole-genome bisulfite sequencing (WGBS) dataset.
The bisulfite conversion rate at each non-CpG base can be calculated as T / (T + C) * 100, where T and C are the numbers of reads showing thymine and cytosine at that base.
Below is an excerpt from the file for one sample (CHH context), aligned with Bismark and methylation-called with methylKit.
chrBase chr base strand coverage freqC freqT
scaffold1.1005 scaffold1 1005 F 12 0.00 100.00
scaffold1.1006 scaffold1 1006 F 13 0.00 100.00
scaffold1.1016 scaffold1 1016 F 17 0.00 100.00
scaffold1.1024 scaffold1 1024 F 18 0.00 100.00
scaffold1.1039 scaffold1 1039 F 17 11.76 88.24
scaffold1.1046 scaffold1 1046 F 16 0.00 100.00
scaffold1.1067 scaffold1 1067 F 23 0.00 100.00
.....
To calculate the overall conversion rate of this sample, I think I should do the following:
1. all C = sum of (freqC / 100 * coverage) over all bases in this sample
2. all T = sum of (freqT / 100 * coverage) over all bases in this sample
3. overall conversion rate = all T / (all C + all T) * 100
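To make the steps above concrete, here is a minimal Python sketch of what I have in mind. The function name overall_conversion_rate and the inline SAMPLE text are only for illustration (SAMPLE is just the seven rows shown above, not the whole file). I am assuming the file is whitespace-separated with the header shown above, and that freqC and freqT are percentages, so I divide by 100 to get approximate read counts.

```python
import io

# The seven example rows from the table above (illustration only).
SAMPLE = """chrBase chr base strand coverage freqC freqT
scaffold1.1005 scaffold1 1005 F 12 0.00 100.00
scaffold1.1006 scaffold1 1006 F 13 0.00 100.00
scaffold1.1016 scaffold1 1016 F 17 0.00 100.00
scaffold1.1024 scaffold1 1024 F 18 0.00 100.00
scaffold1.1039 scaffold1 1039 F 17 11.76 88.24
scaffold1.1046 scaffold1 1046 F 16 0.00 100.00
scaffold1.1067 scaffold1 1067 F 23 0.00 100.00
"""


def overall_conversion_rate(handle):
    """Return all T / (all C + all T) * 100 for a whitespace-separated table.

    Assumes a header line containing 'coverage', 'freqC' and 'freqT',
    with freqC and freqT given as percentages of the coverage.
    """
    header = handle.readline().split()
    cov_i = header.index("coverage")
    c_i = header.index("freqC")
    t_i = header.index("freqT")

    all_c = 0.0
    all_t = 0.0
    for line in handle:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip blank or malformed lines
        coverage = float(fields[cov_i])
        # Percentages are rounded to two decimals in the file,
        # so these read counts are approximate.
        all_c += float(fields[c_i]) / 100 * coverage
        all_t += float(fields[t_i]) / 100 * coverage

    if all_c + all_t == 0:
        raise ValueError("no C or T reads found")
    return all_t / (all_c + all_t) * 100


rate = overall_conversion_rate(io.StringIO(SAMPLE))
print(f"overall conversion rate: {rate:.2f}%")
```

For the real data I would pass an open file handle instead of the io.StringIO wrapper. Since every base is weighted by its coverage, this is not the same as simply averaging the freqT column.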
Is this correct?
Also, what range of non-conversion rates is considered acceptable?
Thank you very much!