Inquiry regarding joint analysis of two breeds using fixed effects in methylKit

Guilherme Oselame

Mar 17, 2026, 9:46:57 AM
to methylkit_discussion

Hi,

I am using methylKit for a DNA methylation analysis and I would like to clarify the best way to include 'Breed' as a fixed effect in my model.

Currently, I have a dataset with 19 animals. My experimental design consists of two groups (Affected vs. Control). I am processing them together in methRead using the treatment vector to identify cases and controls. The samples are structured as follows:

  • Affected Group (10 animals): 5 animals from Breed 1 and 5 animals from Breed 2.

  • Control Group (9 animals): 5 animals from Breed 1 and 4 animals from Breed 2.

In my preliminary attempt, I added the breed information as a covariate using numeric values (1 and 2). However, I want to ensure that the model treats 'Breed' strictly as a fixed effect and not as a continuous numeric covariate, especially since the sample sizes per breed are slightly different between groups.

I am relatively new to this type of statistical analysis in bioinformatics and would appreciate help with the following:

  1. Does calculateDiffMeth support treating categorical variables like 'Breed' as fixed effects?

  2. Since I used numbers (1 and 2) to represent the breeds, should I explicitly convert this column to a factor in R before passing it to the covariates argument to ensure it is treated as a fixed effect?

  3. Given my specific sample distribution (5+5 affected, 5+4 control), is the standard calculateDiffMeth approach with covariates the most robust way to handle this, or is there a better alternative in methylKit for multi-breed designs?

Thank you for your time and for the support!

Best regards,

Guilherme.

alex....@gmail.com

Apr 8, 2026, 6:14:01 PM
to methylkit_discussion
Hi Guilherme,

Thank you for your detailed question. Happy to help clarify each point.

1. Yes, calculateDiffMeth supports categorical fixed effects. The covariates data.frame is used to fit a logistic regression with R's GLM routines, which respect the standard factor/numeric distinction. There is a section in Altuna Akalin's book, Computational Genomics with R, which explicitly states that covariate columns can be factor variables: https://compgenomr.github.io/book/extracting-interesting-regions-differential-methylation-and-segmentation.html

2. Yes, convert it to a factor. A numeric column tells the model to fit a single slope across the breed codes. With only two breeds that happens to give the same fit as a two-level factor, but with three or more breeds it would impose a linear trend across arbitrary codes, so a factor is the safe choice. You could define string labels for the breeds and use as.factor() to create the factor. With two breed levels, R will create one dummy variable (Breed2 vs. Breed1 as the reference), which is correct. The row order of the covariates data.frame must exactly match the sample order used in methRead.
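
As a minimal sketch of point 2 in base R: the breed codes below are hypothetical and assume the 10 affected animals were listed first and the 9 controls second, so adjust them to your actual sample order. The names breed_numeric, covariates_df and meth are placeholders, not anything methylKit requires.

```r
# Hypothetical numeric breed codes, in the same order as the samples were
# given to methRead (ASSUMED: 10 affected first, then 9 controls).
breed_numeric <- c(rep(1, 5), rep(2, 5),   # affected: 5 x Breed 1, 5 x Breed 2
                   rep(1, 5), rep(2, 4))   # control:  5 x Breed 1, 4 x Breed 2

# Recode to string labels and convert to a factor.
# "Breed1" is listed first, so it becomes the reference level.
breed <- factor(paste0("Breed", breed_numeric),
                levels = c("Breed1", "Breed2"))

# One row per sample, one column per covariate.
covariates_df <- data.frame(breed = breed)

# Inspect the result: the column should be a factor with 2 levels.
str(covariates_df)

# The design matrix R builds from a two-level factor has a single
# dummy column for the non-reference level.
print(colnames(model.matrix(~ breed, data = covariates_df)))

# With methylKit loaded and `meth` being your united methylBase object
# (assumed name), the call would then look like:
# diff <- calculateDiffMeth(meth, covariates = covariates_df)
```

Printing the column names is a quick way to confirm that breed entered the model as a dummy variable rather than as a numeric column.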

3. For your design (5+5 affected, 5+4 control), calculateDiffMeth with Breed as a factor covariate is the standard methylKit approach. Your breed distribution is reasonably balanced across groups, which is good. The main limitation to be aware of is that calculateDiffMeth does not support interaction terms, so if you hypothesise that the Affected/Control effect differs between breeds, this model will not capture that. If you need a richer model (e.g., interactions, random effects for animal), you'd need to step outside methylKit and use something like DSS (which supports multi-factor designs with a formula interface) or limma/edgeR on summarized methylation values.
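
To make the difference between the additive model and an interaction model concrete, here is a small base R illustration. It only builds design matrices for a hypothetical sample table matching your group sizes; it does not reproduce methylKit's internals, and the object names are made up for the example.

```r
# Hypothetical design table matching the described layout
# (ASSUMED order: 10 affected animals, then 9 controls).
design <- data.frame(
  treatment = factor(c(rep("Affected", 10), rep("Control", 9)),
                     levels = c("Control", "Affected")),
  breed     = factor(c(rep("Breed1", 5), rep("Breed2", 5),
                       rep("Breed1", 5), rep("Breed2", 4)),
                     levels = c("Breed1", "Breed2"))
)

# Cell counts: every treatment x breed combination is populated.
print(table(design$treatment, design$breed))

# Additive model: one treatment effect shared by both breeds,
# plus a breed offset.
additive <- model.matrix(~ treatment + breed, data = design)

# Interaction model: adds a term that lets the treatment effect
# differ between breeds.
with_interaction <- model.matrix(~ treatment * breed, data = design)

print(colnames(additive))
print(colnames(with_interaction))
```

The extra column in the second matrix is the term the additive covariate model leaves out. A design table and formula of this kind are what a formula-based tool would take as input (in DSS, as far as I know, DMLfit.multiFactor; please check its documentation for the exact arguments).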

One important note: the row order of covariates must exactly match the sample order in your methylBase object. A mismatch will not throw an error but will silently assign breeds to the wrong samples.
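
One way to guard against that is to reorder the sample sheet by sample ID before building the covariates, and to stop if anything is missing. The sketch below uses made-up IDs and a hypothetical helper, align_covariates; in a real session the ID vector would come from your methylBase object (I believe getSampleID(meth) returns it, but please verify against your methylKit version).

```r
# Hypothetical sample sheet, deliberately in a different order
# from the methylation object.
sample_sheet <- data.frame(
  sample_id = c("A03", "A01", "A02"),
  breed     = c("Breed2", "Breed1", "Breed1"),
  stringsAsFactors = FALSE
)

# Sample IDs in the order used by the methylation object.
# ASSUMPTION: with methylKit this would be something like getSampleID(meth).
meth_sample_ids <- c("A01", "A02", "A03")

# Hypothetical helper: reorder the sheet to match `ids`, failing loudly
# if any sample is missing instead of silently misassigning breeds.
align_covariates <- function(sheet, ids) {
  idx <- match(ids, sheet$sample_id)
  if (anyNA(idx)) {
    stop("Samples missing from sheet: ",
         paste(ids[is.na(idx)], collapse = ", "))
  }
  aligned <- sheet[idx, , drop = FALSE]
  rownames(aligned) <- NULL
  # Final sanity check: the orders must now be identical.
  stopifnot(identical(aligned$sample_id, ids))
  aligned
}

aligned <- align_covariates(sample_sheet, meth_sample_ids)

# Build the covariates only after the order has been verified.
covariates_df <- data.frame(breed = factor(aligned$breed))
print(cbind(sample_id = aligned$sample_id, covariates_df))
```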

Please feel free to follow up if anything is unclear.

Best,
Alex