MSigDB GMT files with Ensembl gene ids instead of Entrez IDs or Symbols


Edoardo “Dado” Marcora

Mar 7, 2024, 2:10:57 PM
to gsea-help
Does the GSEA team provide MSigDB GMT files with Ensembl gene ids instead of Entrez IDs or Symbols? If not, why not... especially since the source gene annotation is Ensembl to begin with?

Thanks for the clarification

Anthony Castanza

Mar 7, 2024, 2:45:06 PM
to gsea...@googlegroups.com
Hi Edoardo,

We do not provide a version of MSigDB in Ensembl IDs. NCBI/Entrez Gene IDs are provided largely for legacy reasons, but they are also used in building the gene annotation files: we require that each gene has been annotated, in some form, by both Ensembl and NCBI. This helps ensure that large numbers of computationally annotated sequences with no corroborating evidence aren't added to MSigDB's files.
We generally recommend performing analysis in the Gene Symbols namespace (which is what our chip files are designed to map to) as this typically provides the most informative output for users.


-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego

Edoardo “Dado” Marcora

Mar 8, 2024, 6:57:26 PM
to gsea-help
Thanks for the clarification.

Even though Gene Symbols may provide the most informative output for most users, they are not very good unique identifiers, especially when working across datasets and across time. Anyhow, since you provide a version with Entrez IDs, I thought it wouldn't be much work to also provide a version with ENSG IDs for users who prefer them over Gene Symbols.
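
For anyone who wants ENSG IDs in the meantime, here is a minimal sketch of remapping a GMT file on the user side. It is not an MSigDB tool. It assumes the usual tab-separated GMT layout (set name, description, then gene IDs) and a user-supplied table of (Entrez ID, Ensembl ID) pairs, for example one exported from Ensembl BioMart. The function names `load_mapping` and `remap_gmt_line` are hypothetical, and the two ID pairs are only illustrative.

```python
from collections import defaultdict


def load_mapping(pairs):
    """Build {source_id: sorted Ensembl IDs} from (source_id, ensembl_id) pairs.

    One source ID may map to several Ensembl IDs, so values are lists.
    """
    mapping = defaultdict(set)
    for source_id, ensembl_id in pairs:
        if source_id and ensembl_id:
            mapping[source_id].add(ensembl_id)
    return {key: sorted(values) for key, values in mapping.items()}


def remap_gmt_line(line, mapping):
    """Remap one GMT line (name, description, genes...) to Ensembl IDs.

    Returns the remapped line and the list of source IDs with no mapping,
    so that dropped genes can be reviewed rather than silently lost.
    """
    name, description, *genes = line.rstrip("\n").split("\t")
    remapped, unmapped, seen = [], [], set()
    for gene in genes:
        targets = mapping.get(gene)
        if not targets:
            unmapped.append(gene)
            continue
        for ensembl_id in targets:
            if ensembl_id not in seen:  # keep each Ensembl ID once per set
                seen.add(ensembl_id)
                remapped.append(ensembl_id)
    return "\t".join([name, description, *remapped]), unmapped


# Illustrative input: a made-up gene set with two mapped IDs and one unmapped ID.
demo_mapping = load_mapping([
    ("7157", "ENSG00000141510"),
    ("672", "ENSG00000012048"),
])
demo_line = "DEMO_SET\thttps://example.org/demo\t7157\t672\t999999"
new_line, missing = remap_gmt_line(demo_line, demo_mapping)
print(new_line)
print("unmapped:", missing)
```

Returning the unmapped IDs matters here because any gene without an Ensembl counterpart in the chosen mapping table is dropped from the set, which changes gene set sizes and can affect enrichment results.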

Best regards,

Edoardo
