RnBeads interpretation question.

Katie Kerr

Jun 15, 2020, 6:41:15 AM
to Epigenomics forum
Hi all, 

I'm new to R and RnBeads, so apologies in advance for questions which are likely very basic.

Here are some details of the analysis we ran for context:
  • EPIC data (Sample group: 49 cases, 28 controls)
  • Covariate adjusted (sentrix ID, age, sex, smoking status)
  • Predefined option profile 450k_full
  • Normalisation method BMIQ, no background normalisation
I have a few questions about the interpretation of the results:

(1) More of a stats question: when examining the gene-level results in the .csv file, I noticed that one miRNA has a very large mean methylation difference (MIR3678, -0.75, screenshot below). Despite this large difference, the FDR-adjusted p-value is nowhere close to significance (p = 0.68). Could anyone shed some light on why this might be?



(2) I understand that the combined rank is a measure of the strength of evidence for differential methylation. However, again in the gene-level results file, when I sort from smallest to largest, the smallest rank is 672. In fact, the combined rank does not start from 1 in any of the output results files (tiling, sites, CpG islands or promoters). Why would this be?

(3) On a more general note, is there documentation available which clearly describes each of the column headings for each results file? 

Any advice at all is appreciated, as well as general tips on the interpretation. 

Thanks, 
Katie
 

Michael Scherer

Jun 16, 2020, 2:50:35 AM
to Epigenomics forum
Dear Katie,

Thanks a lot for using RnBeads. I will try to answer your questions as well as I can.

1) The mean difference can be driven by only a few outliers in each of the groups, which could point to technical problems or genotypic variation. In that case the within-group variability is high, so the p-value is not significant even though the mean difference is large. So, this gene is probably not a truly differentially methylated gene.
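
To illustrate the point about outliers, here is a minimal sketch in Python. It uses Welch's t statistic purely as a stand-in and is not the test RnBeads itself runs; the methylation values and variable names are made up for illustration. Both case groups have the same mean difference to the controls, but only the consistent group gives strong evidence.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (illustrative stand-in)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical beta values for one CpG site (not taken from the thread).
controls = [0.10, 0.11, 0.12, 0.10, 0.09, 0.11]
cases_outliers = [0.10, 0.12, 0.11, 0.09, 0.95, 0.97]    # two extreme samples
cases_consistent = [0.38, 0.40, 0.39, 0.41, 0.37, 0.39]  # every sample shifted

# Both case groups have the same mean, hence the same mean difference.
diff_outliers = mean(cases_outliers) - mean(controls)
diff_consistent = mean(cases_consistent) - mean(controls)

# The test statistic also accounts for the spread within each group.
t_outliers = welch_t(cases_outliers, controls)
t_consistent = welch_t(cases_consistent, controls)

print(f"outlier-driven: diff={diff_outliers:.3f}, t={t_outliers:.2f}")
print(f"consistent:     diff={diff_consistent:.3f}, t={t_consistent:.2f}")
```

The outlier-driven group yields a much smaller test statistic because its variance is large, which is the pattern described above: a large mean difference alongside a weak p-value.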

2) The combined rank is computed as the worst rank from the three criteria: p-value, mean difference, and log ratio. The combined ranks are not re-ranked afterwards (i.e. ranked from 1 to the number of sites); we keep the original, i.e. worst, rank of the three criteria. This is why the smallest combined rank can be larger than 1.
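
As a rough sketch of this scheme (in Python, with made-up numbers; the helper `ranks` and all values are hypothetical and this is not RnBeads code), each criterion is ranked separately and the maximum of the three ranks is kept without re-ranking:

```python
def ranks(values, reverse=False):
    """Rank values from 1 (best) upward; ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


# Hypothetical statistics for four regions (illustration only).
p_values = [0.001, 0.20, 0.03, 0.50]
abs_mean_diff = [0.02, 0.30, 0.10, 0.01]
abs_log_ratio = [0.10, 0.90, 1.20, 0.05]

rank_p = ranks(p_values)                          # smaller p-value is better
rank_diff = ranks(abs_mean_diff, reverse=True)    # larger difference is better
rank_log = ranks(abs_log_ratio, reverse=True)     # larger log ratio is better

# Combined rank: the worst (maximum) of the three ranks, not re-ranked.
combined = [max(triple) for triple in zip(rank_p, rank_diff, rank_log)]

print(combined)
print(min(combined))
```

In this toy data no region is ranked first on all three criteria, so the best combined rank is 2 rather than 1. The same effect on a full dataset gives a smallest combined rank such as 672.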

3) The column headings are best described in the RnBeads report, specifically in the differential_methylation.html. Here is what is shown there (for the region level):
  • id: region id
  • Chromosome: chromosome of the region
  • Start: start coordinate of the region
  • End: end coordinate of the region
  • [symbol]: associated gene symbol to the given region [only valid for gene associated regions]
  • [entrezID]: Entrez ID of the gene associated with the region [only valid for gene associated regions]
  • mean.mean.g1,mean.mean.g2: (where g1 and g2 are replaced by the respective group names in the table) mean of mean methylation levels for group 1 and 2 across all sites in a region
  • mean.mean.diff: Mean difference in means across all sites in a region
  • mean.mean.quot.log2: log2 of the mean quotient in means across all sites in a region
  • comb.p.val: Combined p-value aggregating p-values of all sites in the region using a generalization of Fisher's method [1]
  • comb.p.adj.fdr: FDR adjusted combined p-value
  • combinedRank: mean.mean.diff, mean.mean.quot.log2 and comb.p.val are ranked for all regions. This column aggregates them using the maximum, i.e. worst rank of a region among the three measures
  • num.sites: number of sites associated with the region
  • mean.num.na.g1,mean.num.na.g2: Mean number of NA methylation values across all sites in group 1 and group 2 respectively
  • mean.mean.covg.g1,mean.mean.covg.g2: Mean value of mean coverage values (across all samples in a group) across all sites in a region
  • mean.nsamples.covg.thresh.g1,mean.nsamples.covg.thresh.g2: mean number of samples (across all considered sites) that have a coverage larger than 5 for the site in group 1 and group 2 respectively
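
To give a feel for how comb.p.val depends on the number of sites, here is a minimal Python sketch of the classic Fisher's method. Note the assumptions: RnBeads uses a generalization of Fisher's method, whereas this sketch is the plain version that treats the site p-values as independent, and the function names and p-values are made up for illustration.

```python
from math import exp, log


def chi2_sf_even_df(x, df):
    """Chi-square survival function for an even number of degrees of freedom."""
    half_df = df // 2
    term = 1.0
    total = 1.0
    # Closed form: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, half_df):
        term *= (x / 2.0) / i
        total += term
    return exp(-x / 2.0) * total


def fisher_combined_p(p_values):
    """Plain Fisher's method: -2 * sum(log p) follows chi-square with 2k df."""
    statistic = -2.0 * sum(log(p) for p in p_values)
    return chi2_sf_even_df(statistic, 2 * len(p_values))


# A region with a single site: the combined p-value is just that site's p-value.
single_site = fisher_combined_p([0.04])

# A region with five sites that each show the same moderate evidence.
five_sites = fisher_combined_p([0.04] * 5)

print(f"one site:   {single_site:.4f}")
print(f"five sites: {five_sites:.6f}")
```

A region with only one site cannot gain evidence from aggregation, while several sites pointing the same way give a much smaller combined p-value.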
You might also want to check out our website (rnbeads.org) for further explanations and tutorials.

Hope that helps,

Michael

Katie Kerr

Jun 16, 2020, 4:47:51 PM
to Epigenomics forum
Hi Michael, 

Thank you for getting back to me, that's very helpful. We just have two follow-up questions, please:

  • With regard to outliers and the non-significant p-value, we have checked the range of methylation values in cases and controls, and they are very similar (0.05 for cases and 0.04 for controls). Is there another reason, other than outliers, that the p-value could be non-significant? E.g. would the fact that MIR3678 only contains one CG site affect this?
  • Could you tell us if the combined ranking is additive or if the rankings must follow a similar pattern? For example, we'd be more convinced by a marker that ranked 21, 22, and 23 in the different rankings than by one that ranked 1, 2, and 63: both total 66 under an additive combined ranking, but the first set is more consistent.

Many thanks,
Katie

Michael Scherer

Jun 17, 2020, 2:44:11 AM
to Epigenomics forum
Dear Katie,

I am happy to help.

  • Yes, the number of CpGs per region also has an influence on the computed p-value, so this could also be the reason.
  • The ranking is not additive, but the maximum (i.e. worst) rank of the three criteria. In your case this would be 23 for 21, 22, 23 and 63 for 1, 2, 63. Please note that the ranking criteria should be highly correlated, since they essentially describe different sides of the same coin.
Hope that helps,

Michael