noRedundant list of terms

seb...@gmail.com

Aug 8, 2024, 11:25:24 AM
to webgestalt
Hi,

Trying to get my hands on the complete list of terms populating the algorithm-reduced Biological Process (noRedundant) used by WebGestalt. Thanks!

John Elizarraras

Aug 8, 2024, 4:45:08 PM
to webgestalt
Hello,

Attached are the GO Biological Process noRedundant files. The DES file contains each GO ID and its corresponding name. The GMT file contains each GO ID and the genes in that set. You can also download the GMT file from the results page. I attached a screenshot showing where the download link is located, in case you want to download more gene sets in the future. To get the DES file instead of the GMT file, change the last part of the URL in that download link.
shapes at 24-08-08 15.41.29.png

Let me know if you have any questions.

Best,
John
go_bp_no_redundant.tar.gz
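
For anyone loading the two files programmatically, here is a minimal Python sketch. It assumes the standard GMT layout (tab-separated: set ID, a description or link column, then the genes) and assumes the DES file is tab-separated with the ID first and the name second. That DES layout is inferred from the description above and has not been checked against the attachment. The function names and the example rows are made up for illustration.

```python
def parse_gmt(text):
    """Map each set ID to its set of genes (GMT: ID, description, genes...)."""
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        set_id, _description, *genes = fields
        gene_sets[set_id] = {g for g in genes if g}
    return gene_sets


def parse_des(text):
    """Map each set ID to its name (assumed layout: ID <tab> name)."""
    names = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue
        set_id, name = line.split("\t", 1)
        names[set_id] = name.strip()
    return names


# Made-up rows, not taken from the attached files.
gmt_text = (
    "GO:EXAMPLE1\thttp://example.org/1\tGENE1\tGENE2\tGENE3\n"
    "GO:EXAMPLE2\tNA\tGENE2\tGENE4\n"
)
des_text = "GO:EXAMPLE1\tfirst example term\nGO:EXAMPLE2\tsecond example term\n"

gene_sets = parse_gmt(gmt_text)
names = parse_des(des_text)
for set_id, genes in gene_sets.items():
    print(set_id, names.get(set_id, "?"), len(genes))
```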

Rüçhan Ekren

Oct 1, 2024, 7:58:16 AM
to webgestalt
Hi,

I am curious what kind of processing you performed to obtain this non-redundant gene set list.

If possible, I would like to replicate the procedure to reduce the redundancy of other GMT gene sets (in R).

Thanks,
Rüçhan

ilian atanassov

Nov 21, 2024, 6:40:47 AM
to webgestalt
Hello,

I would like to second this request. It would be very beneficial to be able to reproduce the generation of non-redundant sets for GSEA, since the list of organisms available on WebGestalt is still very limited.

Please include a button/option/menu to transform a full .gmt file into a non-redundant .gmt file. This would enable more users to use WebGestalt with their desired but less popular organisms.

Best wishes,

Ilian

John Elizarraras

Jan 14, 2025, 3:29:17 PM
to webgestalt
Hello,

The algorithm for the GO terms is based on the hierarchical structure of the Gene Ontology. We essentially start from the leaves of the hierarchy and move up the structure until we find a node that has between 20 and 500 genes. The nodes found this way make up the non-redundant set. Since most databases do not have a hierarchical structure, we can't apply this technique to most gene sets.
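
A rough Python sketch of one possible reading of that description follows. It is not WebGestalt's code. The assumptions are: `parents` maps each term to its direct parents, every parent of a too-small term is followed (GO is a DAG, so a term can have several), and a path is abandoned as soon as a term exceeds the upper bound, on the further assumption that annotations are propagated upward so ancestors are never smaller. The function name, parameter names, and toy data are hypothetical.

```python
def select_non_redundant(parents, gene_sets, min_size=20, max_size=500):
    """Climb from the leaves and keep the first in-range term on each path."""
    all_terms = set(parents) | {p for ps in parents.values() for p in ps}
    has_child = {p for ps in parents.values() for p in ps}
    stack = sorted(all_terms - has_child)  # leaves: terms that are nobody's parent
    selected, visited = set(), set()
    while stack:
        term = stack.pop()
        if term in visited:
            continue
        visited.add(term)
        size = len(gene_sets.get(term, ()))
        if min_size <= size <= max_size:
            selected.add(term)  # in range: keep it and stop climbing this path
        elif size < min_size:
            stack.extend(parents.get(term, ()))  # too small: try the parents
        # size > max_size: assumption, abandon this path entirely
    return selected


def fake_genes(n):
    """Build a set of n placeholder gene names."""
    return {f"g{i}" for i in range(n)}


# Toy hierarchy: leafA and leafB sit under mid, mid and leafC under root.
parents = {"leafA": ["mid"], "leafB": ["mid"], "mid": ["root"], "leafC": ["root"]}
gene_sets = {
    "leafA": fake_genes(5),
    "leafB": fake_genes(25),
    "mid": fake_genes(30),
    "leafC": fake_genes(10),
    "root": fake_genes(600),
}
kept = select_non_redundant(parents, gene_sets)
print(sorted(kept))
```

In this toy case leafB is kept directly, leafA is too small so its parent mid is kept, and leafC is dropped because its only parent is above the upper bound. How the real pipeline treats that last case is not stated in this thread.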

WebGestalt does have redundancy removal in the Advanced Parameters section, below where you enter your gene lists. It does not depend on a hierarchy and can be used for any organism or database. If you would like to remove redundancy from a GMT file before analysis, you could try applying one of these methods yourself. k-Medoid and affinity propagation would be my recommendations. The R packages cluster and apcluster implement these methods and could be a good starting point.
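
To illustrate the k-medoid idea outside R, here is a small dependency-free Python sketch. It assumes Jaccard distance between gene sets as the dissimilarity and uses a simple alternating assign/update loop rather than the full PAM procedure in the cluster package. It does not reproduce WebGestalt's implementation, and the function names, the starting rule, and the toy sets are made up. Each returned medoid is the representative gene set to keep for its cluster.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def k_medoids(gene_sets, k, max_iter=100):
    """Return {medoid_id: [member_ids]} using Jaccard distance."""
    ids = sorted(gene_sets)
    dist = {i: {j: 1.0 - jaccard(gene_sets[i], gene_sets[j]) for j in ids}
            for i in ids}
    # Deterministic start: the k largest sets (arbitrary choice for this sketch).
    medoids = sorted(ids, key=lambda i: (-len(gene_sets[i]), i))[:k]
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: each set joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in ids:
            if i in clusters:
                nearest = i  # a medoid always stays in its own cluster
            else:
                nearest = min(medoids, key=lambda m: (dist[i][m], m))
            clusters[nearest].append(i)
        # Update step: the member with the smallest total distance becomes medoid.
        new_medoids = sorted(
            min(members, key=lambda c: (sum(dist[c][o] for o in members), c))
            for members in clusters.values()
        )
        if new_medoids == sorted(medoids):
            break  # converged
        medoids = new_medoids
    return clusters


# Toy gene sets: A/B overlap heavily, and so do C/D.
toy_sets = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3},
    "C": {10, 11, 12, 13},
    "D": {10, 11, 12},
}
clusters = k_medoids(toy_sets, k=2)
representatives = sorted(clusters)
print(representatives, clusters)
```

The number of clusters k has to be chosen by the user here. Affinity propagation avoids that choice, which is one reason it may be the more convenient of the two for large GMT files.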

Let me know if you have any questions.

ilian atanassov

Sep 19, 2025, 10:20:43 AM
to webgestalt
Dear John,

Thank you very much for the clarification. How do you deal with situations where the leaf node has fewer than 20 genes and the parent node has more than 500? Also, what about situations where the leaf node already has more than 500 genes? Do you keep the GO term or do you discard it?

Best wishes,

Ilian
