GSEA run keeps ERROR-ing


Felicia Surjono

Feb 24, 2021, 7:43:55 PM
to gsea-help
Hello,

I'm not quite sure why my GSEA run keeps reporting an error. I'm uploading my data in .txt format and I've followed the GSEA data upload instructions. Below is what my data looks like. I'm trying to use the gene set database c7.all.v7.2.symbols.gmt, and the error message keeps saying "no templates specified --- 0 length str array".

NAME DESCRIPTION AVG_LOGFC
Prelid3b na -0.39769032
Rab8b na -0.340940677
Gm43305 na -0.292387152
Psap na 0.353789115
Ripor2 na -0.267658835
Casd1 na 0.334084268
Ifi27l2a na -0.433358218
Vps41 na -0.274104305
Asap1 na -0.251603682
Trip12 na -0.322703866
Plac8 na -0.730417224
Eif4b na 0.434525223
Vps35 na 0.300696729
Bcl2a1b na -0.268721686
Plgrkt na -0.329519052
Med11 na -0.280647361
Bend4 na -0.459577834
Snrpd1 na -0.313963649
Gzmk na -0.506623915
Srgn na -0.256165104
Mmadhc na -0.25400907
Pim1 na -0.398152153
Wac na -0.363738795
Casp8 na -0.34649923
Trbc1 na -0.794899076
Ccni na -0.322136548
Il10rb na 0.267286216
Dgat1 na -0.33334217
Bet1l na -0.261191436

Anthony Castanza

Feb 24, 2021, 7:46:18 PM
to gsea...@googlegroups.com

Hi Felicia,

 

If you have data like this that’s already been ranked by a metric like LogFC, it needs to be formatted as a .RNK file (https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#RNK:_Ranked_list_file_format_.28.2A.rnk.29) and run through the GSEAPreranked tool.

 

-Anthony

 

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego

http://gsea-msigdb.org/


Felicia Surjono

Feb 24, 2021, 7:56:33 PM
to gsea-help
Hi Anthony,

Thank you so much for your response! Do you by any chance know how to convert CSV to RNK? Sorry, I'm very new to this.

Anthony Castanza

Feb 24, 2021, 8:04:06 PM
to gsea...@googlegroups.com

Hi,

I would recommend converting it programmatically so that the gene symbols are preserved. If that’s not possible, you can use Excel if you’re careful: import the data with the File > Import function and be sure to specify the gene symbol column as TEXT and not “General”. Then save the reformatted file as tab-delimited text and change the file extension to .RNK.

Since it also looks like your genes are from mouse, you’ll need to use the Mouse_Gene_Symbol orthology CHIP file for the version of MSigDB you’re selecting gene sets from, with “Collapse” set, when running GSEA Preranked.
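
For the programmatic route recommended above, a minimal Python sketch could look like the following. It assumes a comma-separated input with a header row; the column names `gene` and `avg_logFC`, the function names, and any file paths are illustrative assumptions rather than anything GSEA prescribes. Gene symbols are treated as plain text from start to finish, rows without a usable numeric score are skipped, and the rows are sorted only for readability.

```python
import csv
import io
import math


def csv_text_to_rnk(csv_text, gene_col="gene", score_col="avg_logFC"):
    """Return RNK text (symbol<TAB>score per line) built from CSV text."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        # Symbols stay plain strings, so nothing is re-typed as a date or number.
        symbol = (record.get(gene_col) or "").strip()
        raw_score = (record.get(score_col) or "").strip()
        try:
            score = float(raw_score)
        except ValueError:
            continue  # "NA", empty cells and other non-numeric values
        if symbol and math.isfinite(score):
            rows.append((score, symbol, raw_score))
    # Highest score first; the original score text is written back unchanged.
    rows.sort(key=lambda row: row[0], reverse=True)
    return "".join(f"{symbol}\t{raw}\n" for _, symbol, raw in rows)


def convert_file(csv_path, rnk_path, **columns):
    """File-based wrapper around csv_text_to_rnk (paths are up to you)."""
    with open(csv_path, newline="", encoding="utf-8") as src:
        rnk_text = csv_text_to_rnk(src.read(), **columns)
    with open(rnk_path, "w", newline="\n", encoding="utf-8") as dst:
        dst.write(rnk_text)
```

A call such as `convert_file("markers.csv", "markers.rnk")` (hypothetical file names) would then write a two-column, tab-delimited file with the .rnk extension.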

Felicia Surjono

Feb 24, 2021, 8:19:07 PM
to gsea-help
Hi,

Do you mean "Mouse_Gene_Symbol_Remapping_MSigDB.v7.0.chip"?

I don't see one that just says Mouse_Gene_Symbol.

Anthony Castanza

Feb 24, 2021, 8:21:21 PM
to gsea...@googlegroups.com

That is the CHIP file for mouse symbols for the 7.0 release of MSigDB. The current release of MSigDB is 7.2, and the corresponding mouse CHIP file is: Mouse_Gene_Symbol_Remapping_Human_Orthologs_MSigDB.v7.2.chip
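
A quick way to catch a version mismatch between the gene set file and the CHIP file is to compare the version tags in their names. The sketch below assumes the `.vX.Y.` naming pattern seen in the file names quoted in this thread; that pattern is an inference from those names, and `msigdb_version` and `versions_match` are hypothetical helpers.

```python
import re

# Matches a tag such as ".v7.2." inside a file name (pattern inferred from
# the names quoted in this thread, not from any official naming rule).
_VERSION_TAG = re.compile(r"\.v(\d+(?:\.\d+)*)\.")


def msigdb_version(filename):
    """Return the version string embedded in a file name, or None."""
    match = _VERSION_TAG.search(filename)
    return match.group(1) if match else None


def versions_match(gmt_name, chip_name):
    """True only when both names carry the same version tag."""
    gmt_version = msigdb_version(gmt_name)
    return gmt_version is not None and gmt_version == msigdb_version(chip_name)
```

With the names from this thread, `c7.all.v7.2.symbols.gmt` pairs with the v7.2 CHIP file, while a v7.0 CHIP file is reported as a mismatch.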

Felicia Surjono

Feb 25, 2021, 1:20:00 PM
to gsea-help
Hi,

I got another error message. I don't see where to load my data for the pre-ranked gene list, and I'm not sure what to use for the gene sets database.

Felicia Surjono

Feb 25, 2021, 1:22:26 PM
to gsea-help
Never mind, it says it's running!

Required Fields
Gene sets database /Users/surjonof/Desktop/GSEA/msigdb.v7.2.symbols.gmt
Number of permutations 1000
Ranked list 

Felicia Surjono

Feb 25, 2021, 1:30:37 PM
to gsea-help
What does it mean when it says "phenotype: na_pos vs na_neg"?

Are the genes upregulated in sample X vs. sample Y na_pos, and the downregulated genes na_neg?

Anthony Castanza

Feb 25, 2021, 1:34:07 PM
to gsea...@googlegroups.com

Hi Felicia,

 

Glad you were able to get the analysis to run! Yes, na_pos is the upregulated side of your list and na_neg is the downregulated side. In GSEA’s standard mode these labels would be populated from the phenotype data and directions in the CLS file, but in Preranked mode we don’t have that information, so GSEA fills in placeholder values and bases them solely on the directions supplied in the ranked list.

Feel free to reach out with other questions.
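
To make the two placeholder labels concrete, here is a purely conceptual Python sketch that splits a ranked list by the sign of its score. It is not how GSEA computes anything internally; `split_by_direction` is a hypothetical helper, and leaving zero-score genes out of both groups is an assumption made only for this illustration.

```python
def split_by_direction(ranked):
    """Split (symbol, score) pairs into the positive and negative sides."""
    na_pos = [(gene, score) for gene, score in ranked if score > 0]
    na_neg = [(gene, score) for gene, score in ranked if score < 0]
    return na_pos, na_neg
```

For the data at the top of this thread, Psap (0.353789115) would fall on the na_pos side and Plac8 (-0.730417224) on the na_neg side.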

Felicia Surjono

Feb 25, 2021, 2:34:30 PM
to gsea-help
Hi Tony,

Is there a way to export the snapshot of enrichment results to EPS or SVG format?

Anthony Castanza

Feb 25, 2021, 2:40:26 PM
to gsea...@googlegroups.com

Hi Felicia,

 

Yes, you can output GSEA’s images in SVG format. The option is called “Create SVG plot images” and is found under “Advanced fields”.

One thing to note: you have to rerun GSEA in order to generate new plots, and the rerun will use a new random seed for permutation testing, which will cause some variance in the results. If you want to reproduce identical results, copy the random seed value from the index.html page of your results. It should be at the very bottom under “Comments”, as “Timestamp used as the random seed: [value]”. Paste that value into the GSEA “Seed for permutation” field, replacing where it says “timestamp”.
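
If copying the seed by hand is error-prone, a small Python sketch can pull it out of the report text. It assumes the report contains the wording quoted above ("Timestamp used as the random seed:") followed by an integer, possibly with HTML tags in between; `find_seed` is a hypothetical helper, and the report layout may differ between GSEA versions.

```python
import re

_TAG = re.compile(r"<[^>]+>")
_SEED = re.compile(r"Timestamp used as the random seed:\s*(\d+)")


def find_seed(report_html):
    """Return the seed recorded in the report text as an int, or None."""
    # Replace tags with spaces so markup around the number does not matter.
    plain_text = _TAG.sub(" ", report_html)
    match = _SEED.search(plain_text)
    return int(match.group(1)) if match else None
```

Reading the file with `open(path, encoding="utf-8").read()` and passing the text to `find_seed` gives the value to enter in place of "timestamp".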

Felicia Surjono

Feb 25, 2021, 4:06:05 PM
to gsea-help
Hi Tony,

The SVG plot images were only for the global statistics. Is there one for the enrichment results, so that when I click "snapshot of enrichment results" there is an option to download as SVG?

Anthony Castanza

Feb 25, 2021, 4:09:04 PM
to gsea...@googlegroups.com

Hi Felicia,

 

If you use your file browser to look in the GSEA output directory for that analysis run on your hard drive, the SVG images should have been written there.
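
As an alternative to browsing by hand, the output directory can be searched with a few lines of Python. The sketch below looks for names ending in `.svg` and also accepts `.svg.gz` in case the files were written compressed; that second suffix is an assumption, and `is_svg_plot` and `find_svg_plots` are hypothetical helpers.

```python
from pathlib import Path


def is_svg_plot(filename):
    """True for names ending in .svg or .svg.gz (case-insensitive)."""
    return filename.lower().endswith((".svg", ".svg.gz"))


def find_svg_plots(report_dir):
    """Return a sorted list of SVG files anywhere below report_dir."""
    return sorted(
        path
        for path in Path(report_dir).rglob("*")
        if path.is_file() and is_svg_plot(path.name)
    )
```

Pointing `find_svg_plots` at the folder of a single analysis run lists every plot file it contains, including those in subfolders.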

Felicia Surjono

Feb 25, 2021, 4:37:43 PM
to gsea-help
thank you so much!!

Felicia Surjono

Feb 25, 2021, 7:49:49 PM
to gsea-help
Hi Tony,

Can the data upload include NA? Or should I delete the NAs and leave those cells blank? My data won't upload and I suspect it's because of the NAs.

Anthony Castanza

Feb 25, 2021, 7:52:36 PM
to gsea...@googlegroups.com

For a ranked list, all the genes present in the file need to have a numeric value.

 

For GCT files being run through standard GSEA (not GSEA Preranked), a sample with a missing value for a given gene should have that value left blank.

Felicia Surjono

Feb 25, 2021, 7:53:17 PM
to gsea-help
I just tried deleting the NAs and leaving those cells blank, and it didn't work.

Anthony Castanza

Feb 25, 2021, 7:54:52 PM
to gsea...@googlegroups.com

For an RNK file you need to delete the entire row.

If you’ve done this and are still getting error messages, please include the full text of the error.
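
Deleting entire rows can also be done with a short script. The Python sketch below takes tab-delimited RNK text, keeps only rows whose second column parses as a finite number, and reports which symbols were removed. `drop_unscored_rows` is a hypothetical helper; blank lines and lines starting with `#` are silently discarded, which is a choice made for this sketch.

```python
import math


def drop_unscored_rows(rnk_text):
    """Return (cleaned RNK text, list of symbols whose rows were removed)."""
    kept, dropped = [], []
    for line in rnk_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank and comment lines are not carried over
        fields = line.split("\t")
        symbol = fields[0].strip()
        raw_score = fields[1].strip() if len(fields) > 1 else ""
        try:
            usable = math.isfinite(float(raw_score))
        except ValueError:
            usable = False  # "NA", empty cells, stray text
        if usable:
            kept.append(f"{symbol}\t{raw_score}")
        else:
            dropped.append(symbol)
    cleaned = "\n".join(kept) + ("\n" if kept else "")
    return cleaned, dropped
```

Note that a header row that does not start with `#` has no numeric score either, so it will show up in the list of removed rows.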

Felicia Surjono

Feb 26, 2021, 2:23:00 PM
to gsea-help
hi tony,

I'm trying to interpret the GSEA enrichment plot and for the ranked list metric the website says "The ranking metric measures a gene’s correlation with a phenotype. The value of the ranking metric goes from positive to negative as you move down the ranked list. A positive value indicates correlation with the first phenotype and a negative value indicates correlation with the second phenotype. For continuous phenotypes (time series or gene of interest), a positive value indicates correlation with the phenotype profile and a negative value indicates no correlation or inverse correlation with the profile." 

When the description says "phenotype", does that refer to my control vs. experiment phenotype? And do "hits" represent each gene?

Felicia Surjono

Feb 26, 2021, 2:43:58 PM
to gsea-help
hi tony,

How do I delete error runs from GSEA reports? It's the window where I can "click 'status' fields for results".

Anthony Castanza

Feb 26, 2021, 3:41:31 PM
to gsea...@googlegroups.com

The only way to delete error runs is to relaunch GSEA, which clears all runs from the session log.

 

For interpreting the enrichment plot: “phenotype” in the documentation refers to the phenotype parameters used when running in the standard GSEA mode. In GSEA Preranked this is just the order of the ranked list as you created it (so whichever side was the positive side of the distribution and was assigned to na_pos is the positive phenotype, and vice versa).

 

A “hit” refers to a gene in the gene set that is also in your ranked list.
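
The definition of a hit can be written down in a few lines of Python. The sketch below returns the position and symbol of every gene in the ranked list that is also a member of the gene set; `hit_positions` is a hypothetical helper, and it compares symbols as exact strings, so it presumes both sides already use the same identifiers.

```python
def hit_positions(ranked_genes, gene_set):
    """Return (index, symbol) for each ranked gene that is in the gene set."""
    members = set(gene_set)
    return [
        (index, gene)
        for index, gene in enumerate(ranked_genes)
        if gene in members
    ]
```

Gene set members that are absent from the ranked list produce no hit at all.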

Felicia Surjono

Feb 26, 2021, 5:54:32 PM
to gsea-help
Thanks, Tony.

I tried to run another preranked list, but I got this error message:

---- Full Error Message ----
After pruning, none of the gene sets passed size thresholds.

---- Stack Trace ----
# of exceptions: 1
------After pruning, none of the gene sets passed size thresholds.------
xtools.api.param.BadParamException: After pruning, none of the gene sets passed size thresholds.
at org.gsea_msigdb.gsea/xtools.gsea.AbstractGseaTool.checkAndBarfIfZeroSets(AbstractGseaTool.java:45)
at org.gsea_msigdb.gsea/xtools.gsea.GseaPreranked.execute(GseaPreranked.java:104)
at org.gsea_msigdb.gsea/edu.mit.broad.xbench.tui.TaskManager$ToolRunnable.run(TaskManager.java:435)
at java.base/java.lang.Thread.run(Unknown Source)

what does this mean?

Anthony Castanza

Feb 26, 2021, 6:09:00 PM
to gsea...@googlegroups.com

This is caused by one of a few different problems:

  1. You’ve selected gene sets that are smaller than the min size threshold in GSEA after filtering them to just the genes in the input dataset.
  2. There weren’t enough genes in your input list to run GSEA (GSEA expects full expression lists of all genes that were analyzed).
  3. The gene identifiers in your input file didn’t match the gene symbols in the gene sets you’ve selected (you need to use a CHIP file that matches the identifiers in your input data and collapses them to the gene symbols in MSigDB).

Technically, 2 and 3 are possible root causes that in turn lead to #1.
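
The three causes listed above can be checked outside GSEA with a short Python sketch. It parses GMT text (set name, description, then gene symbols, tab-separated), restricts each set to the genes in the ranked list, and keeps the sets whose remaining size is within the thresholds. The defaults of 15 and 500 are the usual GSEA min and max set sizes; the function names are hypothetical, and the comparison is a plain string match with no CHIP collapsing.

```python
def parse_gmt(gmt_text):
    """Parse GMT text into {set name: set of gene symbols}."""
    gene_sets = {}
    for line in gmt_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # need a name, a description and at least one gene
        gene_sets[fields[0]] = {gene for gene in fields[2:] if gene}
    return gene_sets


def sets_passing_size_filter(gene_sets, ranked_genes, min_size=15, max_size=500):
    """Return {set name: overlap size} for sets that survive the filter."""
    universe = set(ranked_genes)
    passing = {}
    for name, genes in gene_sets.items():
        overlap = len(genes & universe)  # exact, case-sensitive match
        if min_size <= overlap <= max_size:
            passing[name] = overlap
    return passing
```

An empty result reproduces the situation behind "none of the gene sets passed size thresholds": either the ranked list is too short, or its identifiers (for example mouse-style `Fabp5` against human-style `FABP5`) do not match the symbols in the gene sets.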

Felicia Surjono

Feb 26, 2021, 6:53:30 PM
to gsea-help
hi tony

My short preranked list of 25 genes looks like the following:

Gene avg_logFC
Fabp5 0.61970159
Tnf -0.4208032
Apoe 0.33881033
Acp5 0.30102532

I'm using this CHIP file: ftp.broadinstitute.org://pub/gsea/annotations_versioned/Mouse_Gene_Symbol_Remapping_Human_Orthologs_MSigDB.v7.2.chip

and I'm using this as the gene sets database:
msigdb.v7.2.symbols.gmt

So maybe the gene list is too short?

Anthony Castanza

Feb 26, 2021, 6:56:13 PM
to gsea...@googlegroups.com

GSEA expects a dataset of 10,000 to 20,000 (or more) genes. This corresponds to all of the expressed genes of an entire microarray or RNA-seq experiment. 25 genes is not sufficient to run GSEA.

UNDI RAMBABU

Nov 19, 2021, 12:29:52 AM
to gsea-help
Hi Surjono,

I am trying to analyze some ranked data similar to yours. In your case, how did you generate the source file? I am getting an error saying "no templates specified-0 length str array".
Could you help me with this?

Thank you
Ram

Anthony Castanza

Nov 19, 2021, 2:05:49 PM
to gsea...@googlegroups.com

Hi Ram,

 

Please double check for hidden file extensions. This error occurs when a RNK file has inadvertently been loaded in as a .TXT file. Once loaded in with the correct parser it should show up under the left side bar's GSEA Preranked option.

 

-Anthony
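
Hidden extensions are easy to check for in code, because the full file name is visible there even when the file browser hides part of it. The Python sketch below inspects every suffix of a file name; `extension_problem` and the example names are hypothetical, and the only rule it applies is the one described above, that the file must really end in .rnk.

```python
from pathlib import PurePath


def extension_problem(filename):
    """Return None if the name really ends in .rnk, else a short description."""
    suffixes = [suffix.lower() for suffix in PurePath(filename).suffixes]
    if not suffixes:
        return "no extension"
    if suffixes[-1] == ".rnk":
        return None
    if ".rnk" in suffixes:
        # e.g. genes.rnk.txt, shown as "genes.rnk" when extensions are hidden
        return f"hidden extension: real type is {suffixes[-1]}"
    return f"extension is {suffixes[-1]}, expected .rnk"
```

A name such as `genes.rnk.txt` is flagged as having a hidden extension, while `genes.rnk` passes.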

 


 
