min.per.group and cov.bases

Kristina Santucci

May 13, 2026, 9:05:06 PM
to methylkit_discussion
Hi,

I was hoping to get some clarification on the min.per.group and cov.bases parameters (for a CpG-only analysis). Let's say I have 4 controls and 4 treatments and I unite them with min.per.group = 3. From my understanding, a CpG must then be covered in 3/4 samples in both groups. When I then tile regions of 200 bp and set cov.bases = 2, does this mean that at least 2 CpGs must be present in the region for DMR calling (both with coverage in 3/4 samples, as filtered earlier), OR that the CpG sites in that 200 bp region only need to be covered by two samples from either group to be considered for DMR calling? In either case, on what grounds would a 200 bp region be discarded?

Thank you in advance for the clarification.

alex....@gmail.com

Jun 23, 2026, 4:50:18 AM
to methylkit_discussion
Hi Kristina,

Thanks for asking this question; it confuses a lot of people.

Your first assumption about min.per.group is correct. During uniting/merging of samples, a CpG or region is only kept if it is covered in at least the minimum number of samples in each group (min.per.group = 3) (see code). Please note that by default we keep only sites that are covered in all samples, irrespective of their groups.
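To make the merge rule concrete, here is a minimal toy model in Python. It is not methylKit code; the sample names, CpG labels, and the helper keep_site are made up for illustration, and it only assumes the behaviour described above (all samples by default, otherwise at least min.per.group samples in every group).

```python
# Toy model of the min.per.group filter; NOT methylKit code.
# All sample names and CpG labels are hypothetical.
groups = {
    "control": {"c1", "c2", "c3", "c4"},
    "treatment": {"t1", "t2", "t3", "t4"},
}

# coverage[site] = set of samples in which that CpG is covered
coverage = {
    "chr1:100": {"c1", "c2", "c3", "c4", "t1", "t2", "t3", "t4"},  # all samples
    "chr1:150": {"c1", "c2", "c3", "t1", "t2", "t3"},              # 3/4 in each group
    "chr1:260": {"c1", "c2", "c3", "c4", "t1", "t2"},              # only 2/4 treatments
}


def keep_site(samples, groups, min_per_group=None):
    """Return True if a CpG survives the merge (hypothetical helper)."""
    if min_per_group is None:
        # Default described in the thread: covered in all samples.
        return all(members <= samples for members in groups.values())
    # Otherwise every group needs at least min_per_group covering samples.
    return all(len(members & samples) >= min_per_group
               for members in groups.values())


kept_default = [s for s, cov in coverage.items() if keep_site(cov, groups)]
kept_min3 = [s for s, cov in coverage.items()
             if keep_site(cov, groups, min_per_group=3)]

print(kept_default)  # ['chr1:100']
print(kept_min3)     # ['chr1:100', 'chr1:150']
```

The site covered by only two treatment samples is dropped in both cases, even though all four controls cover it, because the threshold has to be met in each group separately.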

For the regional analysis (tileMethylCounts, regionCounts), your first interpretation is the correct one. cov.bases is used to filter regions based on the number of CpGs they cover (see code). This allows us to keep low-coverage regions where samples may not cover the exact same CpGs, but do cover nearby CpGs within the same window.
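The window filter can be sketched the same way. This is again a toy model rather than methylKit's implementation: the CpG positions and the helper tile_filter are hypothetical, and for simplicity it assumes non-overlapping 200 bp tiles (step size equal to window size) with 1-based coordinates.

```python
from collections import defaultdict


def tile_filter(positions, win_size=200, cov_bases=2):
    """Group CpG positions into fixed tiles and keep tiles that
    contain at least cov_bases CpGs (hypothetical helper)."""
    tiles = defaultdict(list)
    for pos in positions:
        # 1-based start of the tile containing this position
        start = ((pos - 1) // win_size) * win_size + 1
        tiles[start].append(pos)
    return {start: sorted(cpgs)
            for start, cpgs in sorted(tiles.items())
            if len(cpgs) >= cov_bases}


# Hypothetical CpG positions that are present in the input data
cpgs = [105, 180, 420, 610, 640, 790]
kept = tile_filter(cpgs, win_size=200, cov_bases=2)

print(kept)  # {1: [105, 180], 601: [610, 640, 790]}
```

The tile starting at 401 contains a single CpG (position 420), so with cov.bases = 2 it is the one that gets discarded.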

To answer your last question, the order of the two steps is important here.
a) If you first merge and then tile, your samples need to have even coverage over the genome to retain a large number of CpGs. Any CpG that is not covered in three out of the four samples per group will be discarded. Then, during the tiling step, the genome is chunked using sliding windows, and any window that does not overlap at least two CpGs from the merged data will be discarded as well. The tiling should increase the per-region coverage and can improve the statistical testing during DMR calling. As a note, if you recover a lot of CpGs during the merging, I would also recommend doing the per-CpG DMC analysis and then using methSeg to aggregate them into DMRs.

b) You can also reverse the order of those two steps. Doing the tiling before merging will allow you to recover more regions if the per-CpG coverage is variable or you are dealing with low-coverage samples. The chance of missing a region during the merging will be lower, and you should go into DMR calling with more regions. However, this is usually only an issue when dealing with low-coverage samples, such as cfDNA or similar, not so much when dealing with tissue samples etc.
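The difference between the two orders in a) and b) can be illustrated with a small toy comparison. None of this is methylKit code: the sample data and all helper names are hypothetical, tiles are assumed to be non-overlapping 200 bp windows, and a CpG counts as "covered" simply by being listed for a sample.

```python
from collections import Counter

groups = {"control": {"c1", "c2", "c3", "c4"},
          "treatment": {"t1", "t2", "t3", "t4"}}

# Hypothetical data: every sample covers CpGs 50 and 90, but in the
# second window (201-400) each sample covers a different pair of CpGs.
pairs = [{210, 250}, {250, 300}, {300, 350}, {210, 350}]
sample_cpgs = {}
for i, extra in enumerate(pairs, start=1):
    sample_cpgs[f"c{i}"] = {50, 90} | extra
    sample_cpgs[f"t{i}"] = {50, 90} | extra


def tile_start(pos, win_size=200):
    return ((pos - 1) // win_size) * win_size + 1


def passes_groups(samples, min_per_group):
    # True if enough samples from every group have the feature
    return all(len(members & samples) >= min_per_group
               for members in groups.values())


def merge_then_tile(data, min_per_group=3, cov_bases=2):
    # Step 1: keep CpGs covered in enough samples per group
    sites = set().union(*data.values())
    merged = [p for p in sites
              if passes_groups({s for s, c in data.items() if p in c},
                               min_per_group)]
    # Step 2: keep tiles holding at least cov_bases merged CpGs
    counts = Counter(tile_start(p) for p in merged)
    return sorted(t for t, n in counts.items() if n >= cov_bases)


def tile_then_merge(data, min_per_group=3, cov_bases=2):
    # Step 1: per sample, keep tiles with at least cov_bases CpGs
    per_sample = {}
    for s, cpgs in data.items():
        counts = Counter(tile_start(p) for p in cpgs)
        per_sample[s] = {t for t, n in counts.items() if n >= cov_bases}
    # Step 2: keep tiles present in enough samples per group
    tiles = set().union(*per_sample.values())
    return sorted(t for t in tiles
                  if passes_groups({s for s, ts in per_sample.items()
                                    if t in ts}, min_per_group))


print(merge_then_tile(sample_cpgs))  # [1]
print(tile_then_merge(sample_cpgs))  # [1, 201]
```

In this made-up data set each CpG in the second window is covered by only two samples per group, so merging first removes all of them and the window is lost. Tiling first keeps the window, because every sample covers two CpGs somewhere inside it.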

I hope that helps. 

Best,
Alex
