GeneSets should have unique names


hoor.a...@googlemail.com

Aug 28, 2019, 10:25:04 AM
to gsea-help
Hello,

I'm running GSEA on a Windows machine and have already tested it with some of the data available online (the P53 dataset), so I moved on to analysing my own data. However, the error below keeps occurring. My data has unique gene IDs, and I'm analysing it using the full list of available gene sets. The online version of the tool produces results, so I assumed the downloaded version would have the same capability.
The error message is too long to post in full, so I have attached the first 20 lines.

Thank you for your help

best

-----------------------------------------------------------------------------------

<Error Details>

---- Full Error Message ----
There were errors: ERROR(S) #:23746
GeneSets should have unique names. The looku ...

---- Stack Trace ----
# of exceptions: 1
------There were errors: ERROR(S) #:23746
GeneSets should have unique names. The lookup is case INsensitive. Found duplicate name: GLI1_UP.V1_DN
GeneSets should have unique names. The lookup is case INsensitive. Found duplicate name: GLI1_UP.V1_UP
GeneSets should have unique names. The lookup is case INsensitive. Found duplicate name: E2F1_UP.V1_DN
GeneSets should have unique names. The lookup is case INsensitive. Found duplicate name: E2F1_UP.V1_UP
GeneSets should have unique names. The lookup is case INsensitive. Found duplicate name: EGFR_UP.V1_DN
GeneSets should have unique names. The lookup is case INsensitive. Found duplicate name: EGFR_UP.V1_UP
GeneSets should have unique names. The lookup is case INsensitive. Found duplicate name: ERB2_UP.V1_DN
GeneSets should have unique names. The lookup is case INsensitive. Found duplicate name: ERB2_UP.V1_UP
GeneSets should have unique names. The lookup is case INsensitive. Found duplicate name: GCNP_SHH_UP_EARLY.V1_DN
GeneSets should have unique names. The lookup is case INsensitive. Found duplicate name: GCNP_SHH_UP_EARLY.V1_UP
GeneSets should have unique names. The lookup is case INsensitive. Found duplicate name: GCNP_SHH_UP_LATE.V1_DN

Anthony Castanza

Aug 28, 2019, 10:35:41 AM
to gsea-help
Hello,

This error message implies that there are duplicate gene sets in the Gene Sets Database (.gmt) file. I was not able to replicate your issue with the P53 data set using the msigdb.v7.0.symbols.gmt. Which file are you using to run this analysis, so that I can take a closer look at it?
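
One way to check for this before launching GSEA is to scan the .gmt files for names that collide when compared case-insensitively, which is how the error message says the lookup works. The following is a minimal sketch, not part of GSEA: the function name `find_duplicate_set_names`, the labels, and the sample lines are made up for illustration, and it assumes the usual tab-separated GMT layout with the gene set name in the first column.

```python
from collections import defaultdict


def find_duplicate_set_names(sources):
    """Return {lowercased name: [(source label, original name), ...]}
    for every gene set name that occurs more than once.

    `sources` maps a label (for example a file name) to an iterable
    of GMT lines. Names are compared case-insensitively.
    """
    seen = defaultdict(list)
    for label, lines in sources.items():
        for line in lines:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            # The gene set name is the first tab-separated field.
            name = line.split("\t", 1)[0].strip()
            seen[name.lower()].append((label, name))
    return {key: hits for key, hits in seen.items() if len(hits) > 1}


# Hypothetical miniature inputs standing in for two .gmt files.
sources = {
    "all.gmt": ["E2F1_UP.V1_DN\tna\tGENE1\tGENE2", "EGFR_UP.V1_UP\tna\tGENE3"],
    "subset.gmt": ["e2f1_up.v1_dn\tna\tGENE1\tGENE2"],
}
for key, hits in find_duplicate_set_names(sources).items():
    print(key, hits)
```

With real files, the same function can be given open file handles, for example `{path: open(path) for path in paths}`, since a file handle is an iterable of lines.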

Thanks,

-Anthony

Anthony S. Castanza
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego

hoor.a...@googlemail.com

Aug 28, 2019, 10:40:16 AM
to gsea-help
thanks,
I'm using my own data now (175 genes), but with all available gene sets (h, c1, 7*c2, 3*c3, 3*c4, 4*c5, c6 & c7). I even reduced the sets to only h, c1, 7*c2, c6 and c7, but I got the same error.

acas...@gmail.com

Aug 28, 2019, 10:46:35 AM
to gsea-help
What do you mean by "7*c2", "3*c3", etc?

If your intent is to run GSEA against the entire contents of the Molecular Signatures Database (which we don't really recommend, due to the presence of significant redundancy between some of the collections, but which can be fine if you're running in phenotype permutation mode), you should be using the msigdb.v7.0.symbols.gmt, which contains all of the MSigDB gene sets.
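
If several collection files do have to be combined by hand, the duplicate names need to be dropped before the merged file is loaded. Below is a rough sketch under stated assumptions: `merge_gmt_lines` is a hypothetical helper, not a GSEA function, and it assumes that two lines with the same case-insensitive name describe the same gene set, so keeping the first one is safe. If the contents could differ, inspect the duplicates rather than dropping them silently.

```python
def merge_gmt_lines(*line_iterables):
    """Yield GMT lines from all inputs, keeping only the first line
    seen for each gene set name (compared case-insensitively)."""
    seen = set()
    for lines in line_iterables:
        for line in lines:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            # The gene set name is the first tab-separated field.
            key = line.split("\t", 1)[0].strip().lower()
            if key in seen:
                continue  # duplicate name: keep the earlier entry
            seen.add(key)
            yield line


# Hypothetical inputs: the second list repeats one name from the first.
collection_a = ["GLI1_UP.V1_DN\tna\tGENE1", "GLI1_UP.V1_UP\tna\tGENE2"]
collection_b = ["gli1_up.v1_dn\tna\tGENE1", "ERB2_UP.V1_UP\tna\tGENE3"]
merged = list(merge_gmt_lines(collection_a, collection_b))
print(len(merged))
```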

Also, GSEA is designed to be run on entire gene expression data sets, not highly filtered lists of selected genes.

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego

hoor.a...@googlemail.com

Aug 30, 2019, 7:57:40 AM
to gsea-help
thank you very much,

With the gmt file you suggested the error disappeared, and it also disappeared when I chose only the sets named with "all".
I totally agree with you: giving GSEA a pre-filtered list is not the right approach given its underlying algorithm. My confusion is between giving GSEA the full GXP data (~20000 genes), despite knowing in advance which subset is related to my phenotype, and reducing the noise by using the subset directly (e.g. ~2000 genes). Is there a way to use GSEA to continue where my analysis stopped instead of starting a parallel line? Please note that I'm not interested in generating hypotheses, but in interpreting the results.

thanks again