Discrepancy between the most significant categories depicted in the GSEA bar chart and those reported in the enrichment result file

Chiara MAGRI

Dec 6, 2023, 9:35:58 AM

to webgestalt

Hi all,

I’m running WebGestaltR with the following parameters for the GSEA enrichment analysis:
Minimum number of IDs in the category: 10
Maximum number of IDs in the category: 500
Significance Level: FDR < 0.1
Number of permutations: 1000
The output states: “Based on the above parameters, 13 positive related categories and 82 negative related categories are identified as enriched categories, in which 20 most significant categories are shown in this report”. What exactly does "the 20 most significant categories" mean? In the bar chart I can see 42 gene sets: 13 up-regulated and 29 down-regulated.

Moreover, there seems to be a discrepancy between the categories depicted in the bar chart and the categories listed in the file "enrichment_results.txt". The depicted categories are neither those with the best False Discovery Rate (FDR) nor those with the best Normalized Enrichment Score (NES), which makes it difficult to understand how they were chosen as the most significant categories.

Best

Chiara

Yuxing Liao

Dec 7, 2023, 3:27:53 PM

to Chiara MAGRI, webgestalt

Hi Chiara,

In most cases it is simple: a report number of 20 shows the top 10 positive and the top 10 negative categories, sorted by FDR. It can be 10 positive and 5 negative if there are only 5 significant gene sets in the negative direction.
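The default selection described above can be sketched as follows. This is a minimal illustration, not WebGestaltR's actual code: the function name `select_report_sets` and the dictionary keys `geneSet`, `NES`, and `FDR` are hypothetical. It splits the significant sets by the sign of the NES, sorts each direction by FDR, and keeps up to half of the report number from each side, without back-filling one direction from the other.

```python
from typing import Dict, List


def select_report_sets(
    results: List[Dict], report_num: int = 20, fdr_threshold: float = 0.1
) -> List[Dict]:
    """Hypothetical sketch: keep up to report_num // 2 significant sets per direction."""
    per_direction = report_num // 2
    # Only categories passing the FDR threshold are candidates for the report.
    significant = [r for r in results if r["FDR"] < fdr_threshold]
    # Positive NES = enriched at the top of the ranked list, negative = at the bottom.
    positive = sorted((r for r in significant if r["NES"] > 0), key=lambda r: r["FDR"])
    negative = sorted((r for r in significant if r["NES"] < 0), key=lambda r: r["FDR"])
    # A direction with fewer significant sets simply contributes fewer rows.
    return positive[:per_direction] + negative[:per_direction]
```

With 12 significant positive sets and 5 significant negative sets, this returns 15 rows: 10 positive and 5 negative.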

In your case, I think it is related to something I had to add later to make sure that the representatives of the clusters found by the redundancy reduction methods are also in the reported list; otherwise those methods break when affinity propagation or weighted set cover is chosen. So if the database is highly redundant, like GO, and the reportNum parameter is small, the number of reported categories can exceed the set threshold. It also seems that, after these representatives are rescued, the result list is not sorted again. The latter can be fixed quickly, but we are going to change the redundancy reduction in a new version soon. I will open a GitHub issue for this problem.
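The effect described above, where rescuing cluster representatives both inflates the number of reported categories and leaves the list unsorted, can be illustrated with a small sketch. Everything here is hypothetical (the function `rescue_representatives`, the `resort` flag, and the `geneSet`/`NES`/`FDR` keys); it only models the behavior as described in this reply, and the re-sort step shows what the proposed quick fix would amount to.

```python
from typing import Dict, List, Set


def rescue_representatives(
    reported: List[Dict],
    significant: List[Dict],
    representatives: Set[str],
    resort: bool = True,
) -> List[Dict]:
    """Hypothetical sketch: append cluster representatives missing from the top-N list."""
    reported_ids = {r["geneSet"] for r in reported}
    # Representatives that did not make the top-N cut are appended at the end,
    # so the final list can be longer than the report number.
    rescued = [
        r for r in significant
        if r["geneSet"] in representatives and r["geneSet"] not in reported_ids
    ]
    combined = reported + rescued
    if resort:
        # Re-sort: positive NES first, then negative, each ordered by FDR.
        combined = sorted(combined, key=lambda r: (r["NES"] < 0, r["FDR"]))
    return combined
```

With `resort=False`, a rescued representative sits at the end of the list even if its FDR is better than that of every other reported set, which matches the unsorted output described in the question.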

The bottom line is that the text file, or the data frame returned by the R function call, is always the correct result. reportNum is only used to select some top results to show in the HTML report, with some caveats.
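Since the text file is the authoritative result, one way to get an FDR-ranked view is to read it directly. This is a sketch under assumptions: it assumes the file is tab-separated with a header row and that the columns are named `FDR` and `normalizedEnrichmentScore`. Those names are assumptions, so check them against your own file header and pass different names if they differ. The function name `load_significant` is hypothetical.

```python
import csv
from typing import Dict, List


def load_significant(
    path: str,
    fdr_threshold: float = 0.1,
    fdr_col: str = "FDR",  # assumed column name
    nes_col: str = "normalizedEnrichmentScore",  # assumed column name
) -> Dict[str, List[Dict[str, str]]]:
    """Hypothetical helper: significant categories split by direction, sorted by FDR."""
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    # Keep rows passing the threshold, best (lowest) FDR first.
    significant = sorted(
        (r for r in rows if float(r[fdr_col]) < fdr_threshold),
        key=lambda r: float(r[fdr_col]),
    )
    return {
        "positive": [r for r in significant if float(r[nes_col]) > 0],
        "negative": [r for r in significant if float(r[nes_col]) < 0],
    }
```

The lengths of the two returned lists give the per-direction counts of significant categories (13 and 82 in the run above), independent of what the HTML report chooses to display.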

Yuxing


Chiara MAGRI

Dec 11, 2023, 2:51:38 AM

to Yuxing Liao, webgestalt

Thank you very much for your kind reply.

Best regards

Chiara
