Enrichment ratio for Overrepresentation analysis


aparajitha89

Nov 19, 2018, 9:59:03 AM
to webgestalt
Hi there, 

I have been using DEGs from an RNA-Seq experiment to predict pathway and process enrichment with the overrepresentation method. The results show the top 10 significant categories, ranked by their enrichment scores. How are these scores calculated?

Thanks 

Yuxing Liao

Nov 19, 2018, 5:48:56 PM
to aparaj...@gmail.com, webge...@googlegroups.com
ORA uses the hypergeometric test to calculate a p-value for the observed number of genes in a gene set versus the number of genes expected in that set based on the reference. FDR is the p-value corrected for multiple testing with the Benjamini-Hochberg (BH) method. The current default sorting in the bar chart is by enrichment ratio, which is simply the observed number of genes divided by the expected number. You can change the parameter to show more top results or use an FDR cutoff.
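
For readers who want to see the arithmetic, the calculation described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not WebGestalt's actual code: the function names `ora_stats` and `benjamini_hochberg` and the example counts are hypothetical, and the p-value is taken as the upper tail of the hypergeometric distribution (the chance of seeing at least the observed overlap).

```python
from math import comb


def ora_stats(n_reference, n_in_set, n_interest, n_overlap):
    """Return (expected, enrichment_ratio, p_value) for one gene set.

    n_reference: genes in the reference list
    n_in_set:    reference genes annotated to the gene set
    n_interest:  genes in the input list (e.g. DEGs) found in the reference
    n_overlap:   input genes that are also in the gene set (observed)
    """
    # Expected overlap if the input genes were drawn at random from the reference.
    expected = n_interest * n_in_set / n_reference
    enrichment_ratio = n_overlap / expected

    # Hypergeometric upper tail: P(overlap >= n_overlap).
    upper = min(n_in_set, n_interest)
    favourable = sum(
        comb(n_in_set, k) * comb(n_reference - n_in_set, n_interest - k)
        for k in range(n_overlap, upper + 1)
    )
    p_value = favourable / comb(n_reference, n_interest)
    return expected, enrichment_ratio, p_value


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values (FDR) in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: 20000 reference genes, 100 in the set,
# 200 input genes, 5 of which fall in the set.
expected, ratio, p = ora_stats(20000, 100, 200, 5)
print(expected, ratio, p)
```

With these made-up counts the expected overlap is 1 gene, so 5 observed genes gives an enrichment ratio of 5. In a real analysis the BH correction would be applied across the p-values of all gene sets tested.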

Yuxing


aparajitha89

Nov 19, 2018, 6:37:54 PM
to webgestalt
Thank you. Do you recommend sorting by enrichment ratio or by FDR cut-offs? Sorry, I am currently writing a paper, and wondered what would be the best way to present my data.

Yuxing Liao

Nov 20, 2018, 1:41:04 PM
to aparajitha89, webgestalt
Well, first you need to make sure the results you got, say the top 10, are significant. If they all have small FDRs, then it is better to present the results sorted by enrichment ratio, since that is more biologically relevant.
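
The filter-then-sort advice above could be applied to an exported result table along these lines. This is a minimal sketch: the helper `top_enriched`, the field names `fdr` and `enrichment_ratio`, the 0.05 cutoff, and the example rows are all assumptions made for illustration, not WebGestalt's output format.

```python
def top_enriched(results, fdr_cutoff=0.05, top_n=10):
    """Keep gene sets passing the FDR cutoff, then rank by enrichment ratio."""
    significant = [row for row in results if row["fdr"] <= fdr_cutoff]
    ranked = sorted(significant, key=lambda row: row["enrichment_ratio"], reverse=True)
    return ranked[:top_n]


# Made-up rows for illustration only.
results = [
    {"gene_set": "set_A", "enrichment_ratio": 6.2, "fdr": 0.20},
    {"gene_set": "set_B", "enrichment_ratio": 3.1, "fdr": 0.001},
    {"gene_set": "set_C", "enrichment_ratio": 4.5, "fdr": 0.03},
]

for row in top_enriched(results):
    print(row["gene_set"], row["enrichment_ratio"], row["fdr"])
```

Here set_A has the largest ratio but does not pass the cutoff, so only set_C and set_B are reported, in that order.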

Also, if the data came from RNA-Seq, it is better to use protein-coding genes as the reference set instead of all genes.
