Query about CpG, CHG and CHH contexts

Dr Muktesh Chandra

Apr 13, 2023, 4:06:20 PM

to methylkit_discussion

Hi,

I am analyzing plant WGBS data. I ran Bismark, and the percentage of methylated C was highest in the CpG context, followed by CHG and then CHH. I used awk '($6=="CG")' filename.CX.report.txt to extract the CpG context report, and did the same for the CHG and CHH contexts. My question: when I counted the lines in each context, the order was CHH > CHG > CpG. Is this normal?

Since methylation is highest in the CpG context, I expected the CpG context to have the most lines. Any suggestions?
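For illustration, the three awk filters can be combined into a single pass over the report. The Python sketch below assumes the standard tab-separated Bismark CX-report layout (chromosome, position, strand, count methylated, count unmethylated, C-context, trinucleotide context); the function name `split_cx_report` and the output file naming are hypothetical. It writes one file per context and returns the number of lines per context, which is the quantity being compared here.

```python
from typing import Dict


def split_cx_report(report_path: str, out_prefix: str) -> Dict[str, int]:
    """Split a Bismark CX report into one file per C-context in one pass.

    Does the same job as running awk '($6=="CG")', '($6=="CHG")' and
    '($6=="CHH")' separately, and returns the number of lines per context.
    Output files are named <out_prefix>.<context>.txt (hypothetical naming).
    """
    handles = {}
    counts: Dict[str, int] = {}
    try:
        with open(report_path) as report:
            for line in report:
                fields = line.rstrip("\n").split("\t")
                if len(fields) < 6:
                    continue  # skip blank or malformed lines
                ctx = fields[5]  # column 6 (1-based) holds CG, CHG or CHH
                if ctx not in handles:
                    handles[ctx] = open(f"{out_prefix}.{ctx}.txt", "w")
                handles[ctx].write(line if line.endswith("\n") else line + "\n")
                counts[ctx] = counts.get(ctx, 0) + 1
    finally:
        # close every per-context output file, even if parsing failed
        for handle in handles.values():
            handle.close()
    return counts
```

Every covered line is counted whether or not its cytosine is methylated, so these counts say nothing about methylation levels on their own.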

I would appreciate any suggestions or explanations.

Thank you

-

Muktesh

Alexander Blume

Apr 19, 2023, 10:48:46 AM

to methylkit_discussion

Hi Muktesh,

The awk command on the cytosine report returns all lines for a given context, regardless of whether the cytosines are methylated, so there is no linear relationship between %C methylation and the number of lines per context.
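To make the point concrete: the number of lines per context and the methylation level of that context are two separate quantities computed from the same report. This sketch reports both per context; the function name is hypothetical, and it assumes the same 7-column, tab-separated Bismark CX-report layout (count methylated in column 4, count unmethylated in column 5, context in column 6).

```python
from typing import Dict, Iterable, Tuple


def summarize_contexts(lines: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Return {context: (number of sites, weighted % methylation)}.

    Weighted % methylation = 100 * sum(methylated reads) / sum(all reads)
    over the cytosines of that context.
    """
    sites: Dict[str, int] = {}
    meth: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 6:
            continue  # skip blank or malformed lines
        ctx, m, u = f[5], int(f[3]), int(f[4])
        sites[ctx] = sites.get(ctx, 0) + 1  # one line = one site
        meth[ctx] = meth.get(ctx, 0) + m
        total[ctx] = total.get(ctx, 0) + m + u
    return {
        ctx: (sites[ctx], 100.0 * meth[ctx] / total[ctx] if total[ctx] else 0.0)
        for ctx in sites
    }
```

With toy input in which CHH has the most lines but the fewest methylated reads, the two rankings come out in opposite orders, which matches what was observed.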

Concerning your other question, for mammalian genomes the number of non-CpG sites is far higher than the number of CpG sites, so your ordering of CHH > CHG > CpG makes sense.
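Why CHH usually dominates can be seen by counting contexts directly in a sequence. H stands for A, C or T, so with no sequence bias roughly 9/16 of cytosines fall in the CHH context, 4/16 in CG and 3/16 in CHG; real genomes deviate from this, notably because CpG dinucleotides are under-represented in many genomes. The sketch below uses a hypothetical helper, `count_c_contexts`, and counts both strands, since Bismark reports cytosines on both.

```python
def count_c_contexts(seq: str) -> dict:
    """Count CG, CHG and CHH cytosines on both strands of a DNA sequence.

    H means A, C or T. Cytosines too close to the end to classify, or next
    to an N, are skipped.
    """
    seq = seq.upper()
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    revcomp = "".join(comp.get(b, "N") for b in reversed(seq))
    counts = {"CG": 0, "CHG": 0, "CHH": 0}
    for strand in (seq, revcomp):
        for i, base in enumerate(strand):
            if base != "C" or i + 1 >= len(strand):
                continue
            nxt = strand[i + 1]
            if nxt == "G":
                counts["CG"] += 1
            elif nxt in "ACT" and i + 2 < len(strand):
                after = strand[i + 2]
                if after == "G":
                    counts["CHG"] += 1
                elif after in "ACT":
                    counts["CHH"] += 1
    return counts
```

On a uniformly random sequence, CHH comes out well ahead of both CG and CHG simply because two positions can each take one of three bases.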

Best,

Alex

Dr Muktesh Chandra

Apr 19, 2023, 2:05:25 PM

to methylkit_...@googlegroups.com

Hi Alex,

Thank you for your reply.

After consulting a few online resources, I loaded the Bismark coverage file from the Bismark methylation extractor into methylKit. I also want to analyze the CHH and CHG contexts, but how do I specify them in the methRead function? The Bismark .cov report does not contain any context information, so setting context = CHH or CHG in methRead() has nothing to act on. I therefore used another method: I generated the --CX report, split it into CpG, CHH and CHG context reports, and converted each one into a methylKit-supported format separately. When I used these converted files I ran into a memory issue; removing the duplicates resolved it (I consulted the thread "https://github.com/al2na/methylKit/issues/63").
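One way to do the per-context conversion, sketched in Python under assumptions: the input is a tab-separated Bismark CX report, and the target is methylKit's plain-text layout with columns chrBase, chr, base, strand, coverage, freqC, freqT (strand as F/R, frequencies as percentages), as in the package's example files. The function name is hypothetical. Cytosines with zero coverage are dropped, since a genome-wide cytosine report can list uncovered positions.

```python
from typing import Optional

# Assumed header of methylKit's plain-text format.
HEADER = "chrBase\tchr\tbase\tstrand\tcoverage\tfreqC\tfreqT"


def cx_to_methylkit(line: str) -> Optional[str]:
    """Convert one CX-report line into a methylKit-style text line.

    Returns None for cytosines without any read coverage.
    """
    chrom, pos, strand, meth, unmeth, *_ = line.rstrip("\n").split("\t")
    m, u = int(meth), int(unmeth)
    cov = m + u
    if cov == 0:
        return None  # uncovered cytosine: nothing to report
    freq_c = 100.0 * m / cov
    return "\t".join([
        f"{chrom}.{pos}",            # chrBase id
        chrom,
        pos,
        "F" if strand == "+" else "R",
        str(cov),
        f"{freq_c:.2f}",             # % methylated reads
        f"{100.0 - freq_c:.2f}",     # % unmethylated reads
    ])
```

Running each per-context report through this line by line, after writing `HEADER`, gives one methylKit-style file per context without loading the whole report into memory.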

Looking deeper at the CpG context: there were 28 million Cs before duplicate removal, but only 7 million afterwards. Is it correct that I am losing three-quarters of the Cs? Second, I was wondering where the duplicates come from, since we already ran the deduplication step in Bismark to remove PCR duplicates.
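Before deleting rows, it may help to check whether any cytosine really appears more than once in the table. This hypothetical helper counts how often each (chromosome, position, strand) key occurs in a CX-report-style file; it is only a diagnostic and assumes the duplicates in question are repeated site rows. Row-level duplicates of this kind are a different thing from the read-level PCR duplicates removed by Bismark's deduplication step.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Key = Tuple[str, str, str]  # (chromosome, position, strand)


def find_duplicate_sites(lines: Iterable[str]) -> List[Tuple[Key, int]]:
    """Return site keys that occur on more than one line, with their counts."""
    keys: Counter = Counter()
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) >= 3:
            keys[(f[0], f[1], f[2])] += 1
    return [(key, n) for key, n in keys.items() if n > 1]
```

An empty result would suggest that a large drop in site numbers comes from some other filter rather than from duplicate removal.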

I still want to figure out the correct approach for analyzing all three contexts (CpG, CHH and CHG).

I would appreciate your suggestions.

Thank you

_

Muktesh

--
You received this message because you are subscribed to a topic in the Google Groups "methylkit_discussion" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/methylkit_discussion/zvqi1_vF72w/unsubscribe.
To unsubscribe from this group and all its topics, send an email to methylkit_discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/methylkit_discussion/a84dde11-2ed8-48d7-9c1d-04ad13a10a2bn%40googlegroups.com.

Alexander Blume

Apr 25, 2023, 7:34:30 AM

to methylkit_...@googlegroups.com

Hi Muktesh,

We can only filter by context when using methRead with the bismarkCytosineReport pipeline. However, you would still need to load the complete file into memory, so I think your approach of using a separate report file per context is fine.
The Bismark coverage file does not contain any context information, so again you have to provide a separate file for each context to use it.

I don't actually understand what "duplicate" issue you are referring to, but perhaps what you are seeing is that methRead keeps fewer CpG sites than expected. This can happen because we apply a default coverage filter requiring a minimum of 10 reads (argument 'mincov') to remove uninformative CpG sites.
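The effect of such a coverage filter is easy to see in a sketch. The function below is hypothetical and only mimics the idea of methylKit's 'mincov' argument (default 10), assuming a site is kept when its total read count is at least mincov; it is not methylKit's implementation.

```python
from typing import Iterable, List, Tuple

# (chromosome, position, methylated reads, unmethylated reads)
Site = Tuple[str, int, int, int]


def filter_by_coverage(sites: Iterable[Site], mincov: int = 10) -> List[Site]:
    """Keep sites whose total read coverage is at least ``mincov``."""
    return [s for s in sites if s[2] + s[3] >= mincov]
```

In low-coverage WGBS data most cytosines have only a handful of reads, so a 10-read threshold can remove a large share of sites; lowering mincov keeps more sites at the cost of noisier per-site estimates.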

I hope this helps.

Best,
Alex

Dr Muktesh Chandra

Apr 25, 2023, 8:50:12 AM

to methylkit_...@googlegroups.com

Hi Alex,

Thank you for your insight.

So it seems I was following the correct approach.

Thank you!

-best

Muktesh

Mohan Singh

Apr 25, 2023, 5:10:26 PM

to methylkit_discussion

Hi Muktesh,

It is normal, because the number of CHH sites is far greater than the number of CHG and CG sites, irrespective of methylation status. Therefore, depending on the plant species, the overall number of mCs can be higher in the CHH context. In contrast, the proportion of mCs within each context will be highest in CG, followed by CHG and then CHH. I hope this clears it up.
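The distinction between the number of mCs and the proportion of mCs within a context can be sketched as follows. The site-calling rule (a cytosine counts as methylated when at least half of its reads are methylated) and the function name are hypothetical choices for illustration, and the input is assumed to be the 7-column, tab-separated Bismark CX-report layout.

```python
from typing import Dict, Iterable, Tuple


def mc_per_context(lines: Iterable[str], min_level: float = 0.5,
                   mincov: int = 1) -> Dict[str, Tuple[int, int, float]]:
    """Return {context: (mC sites, covered sites, proportion of mC sites)}.

    A cytosine counts as methylated (mC) when methylated / total reads is at
    least ``min_level``; both cut-offs are illustrative, not standard values.
    """
    n_mc: Dict[str, int] = {}
    n_sites: Dict[str, int] = {}
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 6:
            continue  # skip blank or malformed lines
        m, u, ctx = int(f[3]), int(f[4]), f[5]
        cov = m + u
        if cov == 0 or cov < mincov:
            continue  # too few reads to call this site
        n_sites[ctx] = n_sites.get(ctx, 0) + 1
        if m / cov >= min_level:
            n_mc[ctx] = n_mc.get(ctx, 0) + 1
    return {ctx: (n_mc.get(ctx, 0), n, n_mc.get(ctx, 0) / n)
            for ctx, n in n_sites.items()}
```

With many mostly unmethylated CHH sites and a few highly methylated CG sites, CHH can have more mCs in absolute terms while CG still has the higher proportion.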
