GSEA Gene Identifiers


kelven

Aug 10, 2022, 2:51:49 PM
to gsea-help
Hello, I am running GSEA on larval and adult RNA-seq samples. I ran the Tuxedo pipeline: mapping the reads against the genome with TopHat, assembling the transcripts with Cufflinks and Cuffmerge, and then computing differential expression with Cuffdiff. GSEA requires specific gene identifiers (Ensembl ID, NCBI ID, ...), but the gene identifiers in our data are not among the accepted ones. How should we annotate our gene IDs with, for example, the correct NCBI IDs? Should we use BLASTx and take the ID of the best match?
Please note that the species of interest is novel, so no known protein IDs exist for it.

Thank you very much in advance.

Kelven

Anthony Castanza

Aug 10, 2022, 5:40:14 PM
to gsea-help
Hi Kelven,

The gene sets we offer through MSigDB only officially support analysis of human, mouse, and rat data. Data from a non-mammalian model probably wouldn't be particularly meaningful when mapped across such a wide evolutionary gap. However, if you can find gene sets for a model organism closer to your novel model, some sort of orthology mapping would generally be the way to go.
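
As a rough illustration of what such an orthology mapping could look like, here is a minimal Python sketch. It assumes you already have two things that are not provided in this thread: gene sets keyed by the reference organism's gene IDs, and an ortholog table from reference genes to your own gene IDs. The function name, the `min_size` parameter, and the `XLOC_...` style IDs are hypothetical.

```python
def map_gene_sets(gene_sets, orthologs, min_size=1):
    """Translate gene sets from a reference species to a target species.

    gene_sets: dict of set name -> iterable of reference gene IDs
    orthologs: dict of reference gene ID -> iterable of target gene IDs
    min_size:  sets with fewer mapped genes than this are dropped
    """
    mapped = {}
    for name, ref_genes in gene_sets.items():
        target_genes = set()
        for gene in ref_genes:
            # Reference genes without a known ortholog are skipped.
            target_genes.update(orthologs.get(gene, ()))
        if len(target_genes) >= min_size:
            mapped[name] = sorted(target_genes)
    return mapped


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    reference_sets = {"SET_A": ["ref1", "ref2"], "SET_B": ["ref3"]}
    ortholog_table = {"ref1": ["XLOC_000001"], "ref2": ["XLOC_000002"]}
    print(map_gene_sets(reference_sets, ortholog_table))
```

Sets that lose most of their members in the mapping are probably not worth testing, which is why the sketch includes a minimum size cut-off.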

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego


kelven

Aug 16, 2022, 1:31:06 PM
to gsea-help
Dear Anthony,

Thank you for your prompt reply, and sorry for not getting back to you sooner. I understand that I have to find gene sets for a closely related model organism; however, I've searched and haven't been able to find any available for download other than the MSigDB ones (which I obviously can't use). My species is a hymenopteran, so gene sets for any hymenopteran species would work. Is there another publicly available database of gene sets?
Alternatively, can I create my own gene sets by grouping Gene Ontology terms? Or can I create a gene set manually, simply by using BLAST? For example, I have the ABC transporter genes of my species, which I found by BLASTing other species' sequences against my own. Can I consider this a gene set?
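
For concreteness, the BLAST-based grouping described above could be sketched like this in Python. This is only an illustration under assumptions: it expects BLAST's default 12-column tabular output (`-outfmt 6`), with the other species' sequences as queries and my own genes as subjects, and the function name and the `1e-5` e-value cut-off are arbitrary choices, not recommended settings.

```python
def hits_to_gene_set(blast_tabular, max_evalue=1e-5):
    """Collect subject IDs that have at least one hit passing the cut-off.

    blast_tabular: text in BLAST tabular format (-outfmt 6), whose columns
    are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend,
    sstart, send, evalue, bitscore.
    """
    genes = set()
    for line in blast_tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id = fields[1]
        evalue = float(fields[10])
        if evalue <= max_evalue:
            genes.add(subject_id)
    return sorted(genes)
```

The returned list would be the candidate members of one gene set, for example the ABC transporters.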

Thank you.
Kelven

Anthony Castanza

Aug 16, 2022, 1:38:43 PM
to gsea...@googlegroups.com

Hi Kelven,

Unfortunately, I don’t know of any specific resources that provide sets already in the formats needed for GSEA. However, yes, you can create your own gene sets using the assignments of genes to Gene Ontology terms, or any other method for creating groups of genes that are likely to function together as a signature of a particular process.
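
To make that concrete, here is a minimal Python sketch of turning gene-to-term assignments into the tab-delimited GMT gene set format that GSEA reads (one set per line: name, description, then the member genes). The input dictionary, the function names, and the `XLOC_...` IDs are assumptions for illustration; whatever identifiers you use in the gene sets need to match the ones in your expression data or ranked list.

```python
from collections import defaultdict


def invert_annotations(gene_to_terms):
    """Turn gene -> terms assignments into term -> set of genes."""
    term_to_genes = defaultdict(set)
    for gene, terms in gene_to_terms.items():
        for term in terms:
            term_to_genes[term].add(gene)
    return dict(term_to_genes)


def format_gmt(term_to_genes, descriptions=None, min_size=1):
    """Render gene sets as GMT text: name, description, genes, tab-separated."""
    descriptions = descriptions or {}
    lines = []
    for term in sorted(term_to_genes):
        genes = sorted(term_to_genes[term])
        if len(genes) < min_size:
            continue  # drop sets that are too small to be useful
        lines.append("\t".join([term, descriptions.get(term, "na"), *genes]))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Toy assignments, invented for illustration only.
    annotations = {
        "XLOC_000001": ["GO:0005524"],
        "XLOC_000002": ["GO:0005524", "GO:0016020"],
    }
    with open("custom_sets.gmt", "w") as handle:
        handle.write(format_gmt(invert_annotations(annotations)))
```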

You might want to reach out to the Hymenoptera Genome Database (https://hymenoptera.elsiklab.missouri.edu/), since they seem to have Gene Ontology annotations (https://hymenoptera.elsiklab.missouri.edu/hgd-go-annotation) and ortholog data available.

They might be able to assist you with things like propagation of assignments up the ontology term tree.
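
By propagation I mean that a gene annotated to a specific term should also count as a member of every ancestor of that term. A minimal Python sketch of the idea follows; it assumes you have already extracted a child-to-parents mapping from the ontology, and the function names and term IDs are hypothetical placeholders rather than real GO terms.

```python
def ancestors(term, parents):
    """Return all ancestors of a term, given a child -> parents mapping."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def propagate(gene_to_terms, parents):
    """Add every ancestor term to each gene's direct annotations."""
    propagated = {}
    for gene, terms in gene_to_terms.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, parents)
        propagated[gene] = full
    return propagated


if __name__ == "__main__":
    # Toy ontology: T:leaf -> T:mid -> T:root (placeholder IDs).
    toy_parents = {"T:leaf": ["T:mid"], "T:mid": ["T:root"]}
    print(propagate({"XLOC_000001": ["T:leaf"]}, toy_parents))
```

Which relationship types to follow when building the parent mapping is a separate decision that this sketch does not address.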

Hope this helps,

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego