Expected methylation distribution when using --CX (all cytosine contexts) in bismark_methylation_extractor + methylKit?


Drielli Canal

Jun 9, 2026, 2:21:55 PM
to methylkit_discussion
Hello,

I am analyzing whole-genome bisulfite sequencing data from a plant species using Bismark v0.21.0 and methylKit in R. My extraction command includes the --CX and --comprehensive flags, which extract methylation information for all cytosine contexts (CpG, CHG, and CHH) into a single .bismark.cov.gz file:

    bismark_methylation_extractor \
      --multicore 3 \
      --comprehensive \
      --bedGraph \
      --CX \
      --cytosine_report \
      --report \
      --genome_folder $GENOME \
      --output $OUTDIR \
      sample.bam

I then read the files into methylKit using:

    myobj <- methRead(file.list,
                      sample.id = as.list(sample.id),
                      assembly = "my_assembly",
                      treatment = treatment,
                      context = "CpG",
                      mincov = 7,
                      pipeline = "bismarkCoverage")

After tiling (1 kb windows) and running getMethylationStats(), the percent methylation histogram shows a flat/uniform distribution across 0-100%, rather than the expected bimodal pattern described in the methylKit documentation.

My questions are:

1. Is this flat distribution expected when all three cytosine contexts (CpG + CHG + CHH) are mixed in a single file and read with pipeline = "bismarkCoverage"? In plants, CHH methylation is predominantly low (~0-5%) and CHG is intermediate, which I suspect dilutes the bimodal CpG signal.

2. Is context = "CpG" the correct parameter when reading a file containing all three contexts, or should a different approach be used?

3. Is it valid to perform differential methylation analysis across all contexts simultaneously using this approach, or should each context be analyzed separately?

For reference, samples extracted without --CX (CpG only) show the expected bimodal distribution.

Thank you.

[Attachment: methylation.png]

Alexander Blume

Jun 10, 2026, 9:13:37 AM
to methylkit_...@googlegroups.com
Hi,

Thanks for the detailed question and for sharing the plots.

In general, the three contexts should be analyzed separately. Coverage levels, effect sizes, and variance structures differ substantially between CpG, CHG, and CHH, and methylKit's models assume a homogeneous context. Also in terms of biology they serve different purposes, so you really should not combine them.

The coverage files have to be generated and stored per context, because they do not carry the context information in the file, so there is no way to separate the contexts after loading. That being said, I cannot say anything definite about your observed distribution, but it is likely caused by the presence of all contexts, as you described.
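
If per-context coverage files are wanted, one option is to derive them from the CX cytosine report, which does carry the context. The following is only a rough sketch, not part of Bismark or methylKit: split_cx_report is a hypothetical helper, and it assumes an uncompressed, tab-separated report with the columns chromosome, position, strand, count methylated, count unmethylated, context (CG/CHG/CHH), and trinucleotide. It writes one coverage-style file (chromosome, start, end, percent methylation, count methylated, count unmethylated) per context.

```shell
#!/usr/bin/env bash
# Sketch: split a Bismark CX cytosine report into one coverage-style file
# per context. split_cx_report is a hypothetical helper name.
# Assumed input columns (tab-separated):
#   1 chr, 2 position, 3 strand, 4 count methylated, 5 count unmethylated,
#   6 context (CG/CHG/CHH), 7 trinucleotide context
split_cx_report() {
    local report="$1"   # path to an uncompressed CX report
    local prefix="$2"   # output prefix; files are <prefix>.<context>.cov
    awk -F'\t' -v OFS='\t' -v prefix="$prefix" '
        ($4 + $5) > 0 {                       # skip cytosines without coverage
            pct = 100 * $4 / ($4 + $5)        # percent methylation at this site
            out = prefix "." $6 ".cov"        # one output file per context
            print $1, $2, $2, pct, $4, $5 > out
        }' "$report"
}
```

Called as split_cx_report sample.CX_report.txt sample, this would produce sample.CG.cov, sample.CHG.cov, and sample.CHH.cov. A gzipped report would need to be decompressed first (for example with gunzip -c into a temporary file).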

The simplest option is to use the cytosine report directly and switch to pipeline = "bismarkCytosineReport" when calling methRead. With this pipeline, changing the context argument filters the report down to the lines of the respective context.
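
Before loading each context, a quick per-context summary of the report can show how different the three contexts are in a given sample. This is a minimal sketch under the same assumptions about the report layout as Bismark's seven-column, tab-separated CX report; summarize_cx_report is a hypothetical helper, and the default minimum coverage of 7 simply mirrors the mincov value from the question. It prints the context, the number of sites passing the coverage filter, and their mean percent methylation.

```shell
#!/usr/bin/env bash
# Sketch: per-context site count and mean percent methylation from a
# Bismark CX cytosine report. summarize_cx_report is a hypothetical helper.
# Assumed input columns (tab-separated):
#   1 chr, 2 position, 3 strand, 4 count methylated, 5 count unmethylated,
#   6 context (CG/CHG/CHH), 7 trinucleotide context
summarize_cx_report() {
    local report="$1"        # path to an uncompressed CX report
    local mincov="${2:-7}"   # minimum read coverage per site (assumed default)
    awk -F'\t' -v OFS='\t' -v mincov="$mincov" '
        ($4 + $5) >= mincov {                  # keep sites with enough coverage
            n[$6]++                            # sites per context
            sum[$6] += 100 * $4 / ($4 + $5)    # running sum of percent methylation
        }
        END {
            # output: context, number of sites, mean percent methylation
            for (ctx in n) print ctx, n[ctx], sum[ctx] / n[ctx]
        }' "$report" | sort
}
```

A mean is only a coarse check and says nothing about bimodality; the per-context histograms from getMethylationStats() remain the better diagnostic once each context is loaded separately.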

Hope that helps.

Best,
Alex


'Drielli Canal' via methylkit_discussion <methylkit_...@googlegroups.com> schrieb am Di., 9. Juni 2026, 20:21:
Hello, I am analyzing whole-genome bisulfite sequencing data from a plant species using Bismark v0.21.0 and methylKit in R. My extraction command includes --CX and --comprehensive flags, which extract methylation information for all cytosine contexts (CpG, CHG, and CHH) into a single .bismark.cov.gz file: bismark_methylation_extractor \ --multicore 3 \ --comprehensive \ --bedGraph \ --CX \ --cytosine_report \ --report \ --genome_folder $GENOME \ --output $OUTDIR \ sample.bam I then read the files into methylKit using: myobj <- methRead(file.list, sample.id = as.list(sample.id), assembly = "my_assembly", treatment = treatment, context = "CpG", mincov = 7, pipeline = "bismarkCoverage") After tiling (1kb windows) and getMethylationStats(), the percent methylation histogram shows a flat/uniform distribution across 0-100%, rather than the expected bimodal pattern described in the methylKit documentation. My questions are: 1. Is this flat distribution expected when all three cytosine contexts (CpG + CHG + CHH) are mixed in a single file and read with pipeline = "bismarkCoverage"? In plants, CHH methylation is predominantly low (~0-5%) and CHG is intermediate, which I suspect dilutes the bimodal CpG signal. 2. Is context = "CpG" the correct parameter when reading a file containing all three contexts, or should a different approach be used? 3. Is it valid to perform differential methylation analysis across all contexts simultaneously using this approach, or should each context be analyzed separately? For reference, samples extracted without --CX (CpG only) show the expected bimodal distribution. Thank you.methylation.png

--
You received this message because you are subscribed to the Google Groups "methylkit_discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to methylkit_discus...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/methylkit_discussion/95037d89-f5c7-45a8-aeae-27e9411f097fn%40googlegroups.com.

Drielli Canal

Jun 10, 2026, 10:05:08 AM
to methylkit_discussion

Thank you very much for the detailed explanation.

That makes perfect sense. I will regenerate the analyses using CpG, CHG, and CHH separately as recommended. Thank you again for your help and for pointing me in the right direction.
