Error at loading gene dataset

Helena Izquierdo

Sep 14, 2021, 1:15:15 PM
to gsea-help
Good afternoon,

When I try to load my gene dataset in GSEA, the following error appears (please see below). I uploaded a .gct file (I attach the Excel file "Book5" from which I made the conversion). I have already removed the genes with expression values of 0 and checked the spelling of the gene names. Could you please help me identify what I am doing wrong?

Thanks so much in advance,
Best,

Helena


<Error Details>

---- Full Error Message ----
There were errors: ERROR(S) #:1
Parsing trouble
java.lang.NumberFormatException: ...

---- Stack Trace ----
# of exceptions: 1
------For input string: "PK !b�h^ � [Content_Types].xml"------
java.lang.NumberFormatException: For input string: "PK !b�h^ � [Content_Types].xml"
    at java.base/java.lang.NumberFormatException.forInputString(Unknown Source)
    at java.base/java.lang.Integer.parseInt(Unknown Source)
    at java.base/java.lang.Integer.parseInt(Unknown Source)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParseUtils._doIntParse(ParseUtils.java:114)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParseUtils.string2ints(ParseUtils.java:79)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.GctParser._parse(GctParser.java:128)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.GctParser.parse(GctParser.java:117)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.readDatasetGct(ParserFactory.java:159)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.readDatasetGct(ParserFactory.java:129)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.read(ParserFactory.java:746)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.read(ParserFactory.java:725)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserWorker.doInBackground(ParserWorker.java:51)
    at java.desktop/javax.swing.SwingWorker$1.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.desktop/javax.swing.SwingWorker.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)



Book5.xlsx

Anthony Castanza

Sep 14, 2021, 1:19:24 PM
to gsea-help
Hi Helena,

From the error message, the file being parsed is an Excel workbook, not plain text. Excel .xlsx documents are internally zipped XML files, and the input string in the error (it begins with "PK" and contains "[Content_Types].xml") is what the start of such a file looks like. What this tells me is that the workbook was saved from Excel as an .xls/.xlsx file and then renamed to .gct. Instead, it needs to be saved as tab-delimited text (this is an option in Excel's Save As dialog) and then renamed from .txt to .gct.
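
One way to confirm this diagnosis without opening the file in Excel is to look at its first bytes. The following is a minimal Python sketch, not part of GSEA. It assumes only that .xlsx/.xlsm workbooks are zip containers and that legacy .xls workbooks are OLE2 containers; the function name is hypothetical.

```python
ZIP_SIGNATURE = b"PK\x03\x04"  # start of a zip container (.xlsx / .xlsm)
OLE_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # start of an OLE2 container (legacy .xls)


def looks_like_excel_workbook(path):
    """Return True if the file begins with a signature used by Excel workbooks."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    return head.startswith(ZIP_SIGNATURE) or head.startswith(OLE_SIGNATURE)
```

If this returns True for a file named .gct, the file is most likely still a workbook with a changed extension and needs to be exported again as tab-delimited text.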

Additionally, looking at your attached file, it appears that all the samples have the same name. The samples should have unique names in the GCT and then be annotated as A or B in the CLS file.
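
Repeated sample names can be spotted from the header line of the tab-delimited file. This is a small sketch under the assumption that the file follows the GCT v1.2 layout, where the first two columns of the header line are NAME and Description and the remaining columns are sample names; the function name is hypothetical.

```python
from collections import Counter


def duplicated_sample_names(header_line):
    """Return the sample names that occur more than once in a GCT header line."""
    columns = header_line.rstrip("\r\n").split("\t")
    sample_names = columns[2:]  # skip the NAME and Description columns
    counts = Counter(sample_names)
    return sorted(name for name, count in counts.items() if count > 1)
```

An empty list means every sample column has its own name.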

If you still have issues after trying the above changes, let me know.

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego

--
You received this message because you are subscribed to the Google Groups "gsea-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gsea-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gsea-help/34a28356-229e-4101-834f-56ea9da07aabn%40googlegroups.com.

Helena Izquierdo

Sep 14, 2021, 1:50:30 PM
to gsea-help
Hi Anthony,

Thanks for your quick reply.
Indeed, I did not save it first as tab delimited text, thanks.

I have just done so: I saved the file as tab-delimited text, then opened Get Info and changed the extension from .txt to .gct. Is this correct?
I also named the samples WT1-5 and KO1-5.

I must have done something wrong again, because I still get the same error. I am attaching a PDF with screenshots of both the .cls and the .txt file. Could you please help me with this?

Thanks a lot in advance,
Best,
Presentation1.pdf

Anthony Castanza

Sep 14, 2021, 2:08:21 PM
to gsea...@googlegroups.com

It's difficult to tell from your screenshot what might be wrong, as the simple text editor included with macOS doesn't show any of the special formatting characters. I converted the original Excel file you sent and was able to load it into GSEA without errors. Assuming the file you sent before was complete, you should be able to use my converted copy as-is, but I would also recommend taking a close look at it to see if you can determine how my file differs from yours.
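
Since a plain text editor hides tabs and line endings, one way to compare two files is to print the raw bytes of their first few lines. This is a minimal sketch with a hypothetical function name; it only displays the bytes and does not judge whether they are valid.

```python
def show_raw_lines(path, count=3):
    """Return repr() of the first `count` raw lines so tabs and line endings are visible."""
    with open(path, "rb") as handle:
        return [repr(handle.readline()) for _ in range(count)]
```

In the output, tabs appear as `\t` and line endings as `\n` or `\r\n`, so running it on both files and comparing the results line by line shows differences that a screenshot cannot.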


Likewise I've prepared a CLS file that should work with your dataset.
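
For reference, a categorical CLS file has three lines: the number of samples, the number of classes and the digit 1; then a `#` followed by the class names; then one class label per sample, in the same order as the sample columns of the GCT. The sketch below builds that text in Python. It is not the content of the attached CLS file, the function name is hypothetical, and the sample order (five WT followed by five KO) is an assumption based on the sample names mentioned earlier in the thread.

```python
def build_cls(labels):
    """Build the text of a categorical CLS file from one class label per sample."""
    class_names = list(dict.fromkeys(labels))  # unique labels in first-seen order
    lines = [
        f"{len(labels)} {len(class_names)} 1",  # <samples> <classes> 1
        "# " + " ".join(class_names),           # class names
        " ".join(labels),                       # one label per sample, in GCT column order
    ]
    return "\n".join(lines) + "\n"


# Assumed layout: samples WT1-5 followed by KO1-5.
cls_text = build_cls(["WT"] * 5 + ["KO"] * 5)
```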


I do have a technical question. I noticed a large number of negative values, which isn't typical of RNA expression data. Is this expected for your dataset? By default GSEA computes a signal-to-noise ratio, which can report a positive differential change between two groups of samples that both have negative means. If that isn't a desirable outcome for your data, I would suggest computing differential expression between your phenotypes outside of GSEA, using a method appropriate to the data type, and then using GSEA Preranked instead.
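
To illustrate the point about negative means, here is a small worked example of a plain signal-to-noise ratio, (mean_A - mean_B) / (sd_A + sd_B). It is a sketch only: the values are made up, the use of the sample standard deviation is an assumption, and GSEA's own implementation applies further adjustments to the standard deviation that are omitted here.

```python
from statistics import mean, stdev


def signal_to_noise(group_a, group_b):
    """Plain signal-to-noise ratio: (mean_a - mean_b) / (sd_a + sd_b)."""
    return (mean(group_a) - mean(group_b)) / (stdev(group_a) + stdev(group_b))


# Hypothetical values for one gene; both groups have negative means.
wild_type = [-1.2, -0.8, -1.0, -1.1, -0.9]
knockout = [-3.1, -2.9, -3.0, -3.2, -2.8]

# The score is positive because the WT mean is less negative than the KO mean.
score = signal_to_noise(wild_type, knockout)
```

The gene would rank as higher in WT even though every value is negative, which may or may not be meaningful depending on how the data were normalized.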


Otherwise, let me know if you have any other issues with running GSEA.

 

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego

CLS_for_Book5.cls
Copy_of_Book5.gct

Helena Izquierdo

Sep 14, 2021, 2:55:27 PM
to gsea-help
Thanks a lot for your feedback. Indeed, it worked using your files, but I cannot find what is different between book5 and my original Excel file. I have checked several times. I think it must be something related to the conversion into .gct. I am attaching my original Excel file. Could you please save it as .txt, then convert it to .gct, and send me both files? If the issue is the way I am doing the conversion, your .gct should work and mine should not. Thanks in advance, and sorry for bothering you with this.

Regarding your technical question, I was also very surprised to see this. I will ask the bioinformatician who normalized the data, because I am not sure how to interpret it. Thanks for the suggestion.

Thanks again,
Best,

Helena
DESeq2_normalised_minus_0_value.xlsm

Anthony Castanza

Sep 14, 2021, 3:10:17 PM
to gsea...@googlegroups.com

Hi Helena,

 

I downloaded the xlsm file attached to your message, opened it in Excel, chose File > Save As, and changed the File Format option to "Tab delimited Text (.txt)". I then saved the file to my working directory. This intermediate file is attached as "Copy_of_DESeq2_normalised_minus_0_value.txt". I then duplicated the text file, opened the "Get Info" menu, and changed the extension from .txt to .gct. That file is attached as "Copy_of_DESeq2_normalised_minus_0_value copy.gct". I made no other adjustments to the files.
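
After a conversion like the one described above, the first three lines of the renamed file can be checked against the GCT v1.2 layout: `#1.2` on line 1, the row and sample counts on line 2, and a header with NAME, Description and one column per sample on line 3. The following Python sketch does that check. It is not GSEA's parser, the function name is hypothetical, and it assumes trailing empty cells (which Excel may add when exporting) are harmless.

```python
def check_gct_header(path):
    """Return a list of problems found in the first three lines of a GCT v1.2 file."""
    problems = []
    # errors="replace" lets a binary file (e.g. a renamed workbook) be read without raising.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        lines = [handle.readline().rstrip("\n") for _ in range(3)]

    if lines[0].strip() != "#1.2":
        problems.append("line 1 should be '#1.2'")

    # Drop trailing tabs/spaces before splitting into fields.
    dims = [field.strip() for field in lines[1].rstrip("\t ").split("\t")]
    if len(dims) != 2 or not all(field.isdigit() for field in dims):
        problems.append("line 2 should hold two integers: <rows> TAB <samples>")
        return problems

    n_samples = int(dims[1])
    header = lines[2].rstrip("\t ").split("\t")
    if len(header) != n_samples + 2:
        problems.append(
            f"line 3 should have {n_samples + 2} columns, found {len(header)}"
        )
    return problems
```

An empty list means the header looks structurally sound; it does not check the data rows.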


As to the negative values, I wonder if they perhaps mean-centered or z-score normalized the data. That wouldn't generally be desired for GSEA. If you are able to get the raw counts (they should typically be integer values) from your bioinformatician in a tab-delimited text matrix format, and are able to create a GCT from that, we have a module on GenePattern.org (a free cloud-based bioinformatics platform that another arm of our lab runs) that will run DESeq2 (again, on the raw counts) and then output a ".normalized.counts.gct" file that should work with GSEA and should meet the expectation of generally non-negative normalized counts.
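
A quick way to see whether a matrix looks like raw counts is to count its negative and non-integer entries. This is a rough sketch with a hypothetical function name; it takes a flat list of numeric values that have already been read from the matrix.

```python
def summarize_values(values):
    """Count negative and non-integer entries in a flat list of expression values."""
    negative = sum(1 for value in values if value < 0)
    non_integer = sum(1 for value in values if not float(value).is_integer())
    return {"total": len(values), "negative": negative, "non_integer": non_integer}
```

Raw counts would be expected to give zero for both counts, while mean-centered or z-scored data would typically show many negative and non-integer values.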

Copy_of_DESeq2_normalised_minus_0_value.txt
Copy_of_DESeq2_normalised_minus_0_value copy.gct

Reza MMR

Sep 28, 2022, 7:43:21 PM
to gsea-help
Hello, I'm having the same issue uploading my dataset. Could you kindly have a look at this file and let me know what the issue is? 

Thanks

Reza MMR

Sep 28, 2022, 7:44:25 PM
to gsea...@googlegroups.com
GSEA.txt