Help with error 1005: The collapsed dataset was empty when used with chip


Da-Sol Kuen (LJI)

May 3, 2023, 12:47:12 PM
to gsea-help
I loaded both my .gct and .cls files successfully, but when I tried to run GSEA using the following gene IDs and the "Human_Ensembl_Gene_ID_MsigDB.v2023.1.Hs.chip" platform, I got error 1005. I think it's because I need to trim the version suffix (".##" format) from the gene IDs. Here is a snippet of my data for your reference.


ENSG00000012124.16 NA 0.894477968 0.102426381 8.043164408
ENSG00000142798.17 NA 3.376654328 1.075476998 19.28391711
ENSG00000164938.13 NA 1.520612545 0.358492333 7.182275251
ENSG00000183508.4 NA 9.481466457 1.664428688 22.03876241
ENSG00000087250.8 NA 3.175396785 2.330200163 28.92587567
ENSG00000119922.9 NA 7.938491962 2.714299091 31.16418748
ENSG00000110852.4 NA 26.65544343 8.578209391 84.12116903
ENSG00000204677.10 NA 0.424877035 0.409705523 5.854046266
ENSG00000185745.9 NA 1.162821358 0.332885738 5.017753942
ENSG00000124785.8 NA 3.756807464 2.944758448 26.07264303
ENSG00000145287.10 NA 28.13133208 10.1402117 83.45705454

Is this the right solution? If not, please advise on how I should change my Ensembl gene IDs, or whether I should select a different platform option.
[Attachment: Screen Shot 2023-05-03 at 9.45.51 AM.png]

Castanza, Anthony

May 3, 2023, 12:49:29 PM
to gsea...@googlegroups.com

Hello,

Yes, you’re correct. To use the MSigDB Ensembl Gene ID chip files, you need to trim the .## version suffixes so that only the base gene IDs remain.

 

-Anthony

 

Anthony S. Castanza, PhD

Department of Medicine

University of California, San Diego


Yuhan Zhang

Feb 6, 2024, 12:47:10 AM
to gsea-help
Hi,

May I ask how I can trim the .## versions from the base gene IDs? I'm getting the same error. Why do we need to do that?

Thank you!

Anthony Castanza

Feb 6, 2024, 5:25:25 PM
to gsea...@googlegroups.com
Hello,

The Ensembl ID chip file that we provide does not include the version numbers that are appended as a suffix to Ensembl gene IDs. The chip file contains only the "base" ID and uses it for mapping to gene symbols, so a versioned ID such as ENSG00000012124.16 will not match its entry.
The best way to trim the gene version depends on your technical abilities. The least technical way would be to use Excel's column-splitting feature (Text to Columns) to split the ID column on the "." character. A more efficient way would be to perform a string split with, for example, R's strsplit function.
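For anyone more comfortable in Python than R, the same string split might be sketched as below. This is only an illustration, not an official GSEA tool: the function names are made up for this example, the sample names S1 to S3 are placeholders, and it assumes a standard tab-delimited GCT file (a "#1.2" line, a dimensions line, a column-header line, then data rows with the gene ID in the first column). Note that it drops everything after the first "." in the ID.

```python
def strip_ensembl_version(gene_id: str) -> str:
    """Return the part of the ID before the first '.', e.g.
    'ENSG00000012124.16' -> 'ENSG00000012124'. IDs without a '.' are unchanged."""
    base, _, _ = gene_id.partition(".")
    return base


def trim_gct_lines(lines):
    """Yield GCT lines with the version suffix removed from the first column.

    Assumption: the first three lines are GCT headers and are passed through
    untouched; every later line is a tab-separated data row.
    """
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        if index < 3:
            yield line
            continue
        name, sep, rest = line.partition("\t")
        yield strip_ensembl_version(name) + sep + rest


# Two rows from the snippet earlier in this thread, wrapped in a minimal
# GCT header. The sample names S1..S3 are hypothetical.
sample = [
    "#1.2",
    "2\t3",
    "Name\tDescription\tS1\tS2\tS3",
    "ENSG00000012124.16\tNA\t0.894477968\t0.102426381\t8.043164408",
    "ENSG00000142798.17\tNA\t3.376654328\t1.075476998\t19.28391711",
]
trimmed = list(trim_gct_lines(sample))
for row in trimmed[3:]:
    print(row.split("\t")[0])
```

To apply this to a real file, you could read it with open(path) and write the yielded lines to a new .gct file, each followed by a newline. It is worth checking afterwards whether any trimmed IDs now appear more than once.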

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego