How to begin to interpret significant differential methylation output from RnBeads

S

Jan 28, 2016, 4:14:29 PM

to Epigenomics forum

Do you have suggestions for how to proceed after obtaining a large number of differentially methylated sites?

I have successfully obtained differential methylation output from RnBeads. I would like to map these CpG sites to genes, so I can find out how many unique genes the significant CpG sites fall on and what the known functions of those genes are.

There are gene set analysis packages available, but I am unsure where to go first after obtaining an RnBeads result, as there seem to be many different options.

I recognize that RnBeads has built-in features in the pipeline (enrichment analysis, word clouds, etc.), but I am interested in learning how to customize this. I am also having trouble understanding why the report is split into "tiling", "genes", "promoters" and "cpgislands". Should I only be focusing on the "genes" ones?

I understand that I could also look at differentially methylated regions to narrow things down, but I would like to start with sites.

Thanks very much for any advice from the more experienced. I am a new R user, so any code and ideas you have for navigating this would be great.

Fabian

Jan 29, 2016, 3:02:39 AM

Hi,
In RnBeads we use the GOstats package and are quite happy with it. With a bit of R programming experience, you can compute the tables yourself using the RnBeads function "performEnrichment.diffMeth", which returns a list of enrichment results for the various comparisons, region types with gene association, directions of differential methylation, and cutoffs.
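
To give an idea of what such an enrichment table contains: GOstats is built around a hypergeometric test, which asks whether a GO term is over-represented among the selected genes compared with a background set. The following is a minimal, library-independent sketch in Python of that test only; it is not RnBeads or GOstats code, and the function name and arguments are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_in_term, n_selected, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    n_universe: number of genes in the background set
    n_in_term:  background genes annotated with the GO term
    n_selected: genes linked to differentially methylated sites/regions
    n_overlap:  selected genes that are annotated with the GO term
    (all names are illustrative, not taken from any package)
    """
    upper = min(n_in_term, n_selected)
    total = comb(n_universe, n_selected)
    # Sum the probabilities of observing n_overlap or more annotated genes.
    tail = sum(
        comb(n_in_term, i) * comb(n_universe - n_in_term, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 4 in the term, 3 selected, 2 of them in the term.
p = hypergeom_enrichment_p(10, 4, 3, 2)
print(round(p, 4))  # 0.3333
```

A small p-value for a term means that more of the selected genes carry that annotation than expected by chance. In a real analysis the choice of background set matters a great deal, and the p-values need correction for the number of terms tested.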

Best regards,
Fabian

S

Jan 29, 2016, 9:22:41 AM

Thanks, I will start there. Do you have an example of a paper where they used RnBeads GO to do all the interpretation of differentially methylated sites/regions? I am trying to follow some sort of standard practice for conducting these analyses.

Fabian

Feb 11, 2016, 3:16:40 AM

Hi,
0) I am currently not aware of a published paper that used the RnBeads GO analysis.
1) If you really want to reduce your dataset to genes of interest, you can use the remove.sites(rnb.set, ...) method. You could select all sites outside of your genes of interest and remove them this way. However, in general I would recommend the more unbiased approach of looking at all of them.
2) This is currently not available in RnBeads. Try to see whether the GOstats package can do it.
Hope that helps.

Best,
Fabian

S

Feb 11, 2016, 1:18:45 PM

Thank you!

I have some follow up questions.

1. Can you give me an example of how the automatic rank cutoff works? I have read the description below, but I don't think I quite understand it. What is the top list?

"automatically select a rank cutoff for given ranks and p-values current implementation: sort the p-values according to rank. select as rank cutoff the rank for which the worst (i.e. max) p-value in the top list is still smaller than the best (i.e. min) p-value of the group of worst-ranking p-values of equal size as the top-list"

2. The pre-defined regions include gene, promoter, CpG islands, and tiling regions. Can you explain what exactly the CpG islands and tiling regions are? I realize they are 5000 base pair windows - but where are they in relation to the genes and promoters? Would it be adequate to just focus on the gene and promoter regions since those are the ones linked to gene names and hence functions?

Fabian

Feb 12, 2016, 3:03:02 AM

Hi, sure.
1) The automatic rank cutoff is something that we have been experimenting with in order to find a threshold for our combined rank criterion. It has not been validated statistically yet, but we have had good experiences with it on our datasets. It works as follows: sort the p-values according to rank. For a given k, compare the k best-ranking sites or regions (the "top list") with the k worst-ranking sites or regions. Select as rank cutoff the rank for which the worst (i.e. max) p-value in the top list of k p-values is still smaller than the best (i.e. min) p-value of the group of k worst-ranking p-values.

2) Tiling windows are generated by chopping up the genome into adjacent, non-overlapping 5kb windows. For CpG islands, we use the definition from UCSC.
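
The rank cutoff rule in 1) can be sketched in a few lines. This is a Python illustration of the description as written, not the RnBeads implementation: the function name is hypothetical, and choosing the largest k that satisfies the condition is an assumption, since the description does not say which k is picked.

```python
def auto_rank_cutoff(ranks, p_values):
    """Return the largest k for which the worst (max) p-value among the
    k best-ranked items is smaller than the best (min) p-value among the
    k worst-ranked items. Returns 0 if no k satisfies this.

    Assumption: "largest k" is one reading of the description,
    not necessarily what RnBeads does internally.
    """
    if len(ranks) != len(p_values):
        raise ValueError("ranks and p_values must have the same length")
    # Sort the p-values according to rank (best rank first).
    ordered = sorted(zip(ranks, p_values), key=lambda pair: pair[0])
    p_sorted = [p for _, p in ordered]
    n = len(p_sorted)

    best_k = 0
    worst_in_top = float("-inf")
    best_in_bottom = float("inf")
    for k in range(1, n // 2 + 1):
        worst_in_top = max(worst_in_top, p_sorted[k - 1])
        best_in_bottom = min(best_in_bottom, p_sorted[n - k])
        if worst_in_top < best_in_bottom:
            best_k = k
        else:
            # The condition cannot become true again for a larger k.
            break
    return best_k


# Toy example: six sites, already ranked 1 (best) to 6 (worst).
ranks = [1, 2, 3, 4, 5, 6]
p_values = [0.001, 0.01, 0.2, 0.03, 0.5, 0.9]
print(auto_rank_cutoff(ranks, p_values))  # 2
```

In the toy example the top two sites (p = 0.001 and 0.01) are both better than anything in the bottom two (p = 0.5 and 0.9), but with k = 3 the top list would contain p = 0.2 while the bottom list contains p = 0.03, so the cutoff stops at 2.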

Stephanie

Feb 12, 2016, 12:02:49 PM

1) Are there other alternatives to looking at the automatic cutoff, looking at the top 100, or the top 500?

1a) Using the regions, is there a way to tell whether more regions are differentially methylated when you stratify by another factor (such as sex)? For example, for males there were 50 DMRs with p-value < 0.05 vs. only 2 DMRs for females.

2) Does that mean that tiling windows may be inclusive of the gene, promoter, and CpG island regions, so that the 4 categories are not mutually exclusive?

Thanks

Stephanie

Fabian

Feb 13, 2016, 3:29:21 AM

1) The idea behind the ranks is that you can be more or less strict depending on your problem. For some problems, for instance environmental studies, it might make sense to be strict and only look at the top few DMRs. In other cases, like cancer, we see large epigenome-wide changes. Here it might make sense to be less strict.
1a) You can include covariates such as gender directly in the differential methylation analysis. If you want to look at males and females separately, make different comparisons for them and see whether the effect sizes and p-values differ.
2) Yes. In fact, regions might also overlap within the same region type. For instance, genes can overlap one another.
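
To make point 2) concrete, here is a small Python sketch of how adjacent 5kb tiling windows relate to other regions. It is not RnBeads code; the chromosome length and gene coordinates are made up, and intervals are treated as half-open [start, end).

```python
def tiling_windows(chrom_length, window_size=5000):
    """Chop 0..chrom_length into adjacent, non-overlapping windows."""
    return [
        (start, min(start + window_size, chrom_length))
        for start in range(0, chrom_length, window_size)
    ]


def overlapping(region, candidates):
    """Return the candidate intervals that overlap the given region."""
    start, end = region
    return [c for c in candidates if c[0] < end and start < c[1]]


# Hypothetical coordinates, for illustration only.
windows = tiling_windows(20000)
gene_a = (4000, 12000)
gene_b = (11000, 13000)

# A single gene can span several tiling windows.
print(overlapping(gene_a, windows))
# [(0, 5000), (5000, 10000), (10000, 15000)]

# Two regions of the same type (here genes) can also overlap each other.
print(overlapping(gene_a, [gene_b]))
# [(11000, 13000)]
```

So a CpG site in the shared stretch would be counted under both genes and under the tiling window that covers it, which is why the region types should not be read as a partition of the sites.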
