There were errors: ERROR(S) #:1 Parsing trouble java.lang.NumberFormatException: ...

Mahtab Dastpak

Jul 1, 2022, 4:46:02 PM
to gsea-help
Greetings,

I updated to the new version of GSEA on Mac and tried to load my data. The first column contains Ensembl IDs and the second contains log2 fold changes (Log2FC):

Gene Ensembl. ID    log2FoldChange
ENSG00000122548    5.79602174
ENSG00000135443    5.622378182
ENSG00000171346    5.469398459
ENSG00000236816    5.336128105
...

I did not get any errors with the older version. However, I now get a new error when I load different data into the new version of GSEA:

<Error Details>

---- Full Error Message ----
There were errors: ERROR(S) #:1
Parsing trouble
java.lang.NumberFormatException: ...

---- Stack Trace ----
# of exceptions: 1
------For input string: "Log2FC"------
java.lang.NumberFormatException: For input string: "Log2FC"
    at java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(Unknown Source)
    at java.base/jdk.internal.math.FloatingDecimal.parseFloat(Unknown Source)
    at java.base/java.lang.Float.parseFloat(Unknown Source)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.AbstractParser.parseStringToFloat(AbstractParser.java:250)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.RankedListParser.parse(RankedListParser.java:72)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.readRankedList(ParserFactory.java:556)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.read(ParserFactory.java:771)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.read(ParserFactory.java:737)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserWorker.doInBackground(ParserWorker.java:53)
    at java.desktop/javax.swing.SwingWorker$1.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.desktop/javax.swing.SwingWorker.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)


Would you please kindly let me know what the problem is?

Thanks.
Mahtab

Anthony Castanza

Jul 1, 2022, 4:50:33 PM
to gsea-help
Hi Mahtab,

The issue appears to be that there are unexpected text strings in the second (log2FC) column.

We generally expect the header row of an .rnk file to start with the # character, which tells GSEA not to interpret whatever you've named that column as a gene name. Additionally, judging from the specific text of the error message, you might have a second instance of the string Log2FC somewhere else in that column.

Let me know if you still have issues after checking for these two things.
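
For reference, here is a minimal Python sketch of those two checks, assuming a tab-separated .rnk file with the gene ID in the first column and the ranking metric in the second. `find_rnk_problems` is a hypothetical helper, not part of GSEA: it skips lines that begin with `#` and reports any row whose second field does not parse as a number. Python's `float()` only approximates the Java `Float.parseFloat` call in the stack trace (Python accepts "Inf", for example, while Java does not), so a clean result is a hint rather than a guarantee.

```python
def find_rnk_problems(lines):
    """Return (line_number, message) pairs for rows that may fail to parse.

    Hypothetical helper. Assumes tab-separated rows of <gene ID><TAB><metric>;
    lines starting with '#' (such as the header) and blank lines are skipped.
    """
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            problems.append((lineno, "fewer than two tab-separated columns"))
            continue
        try:
            float(fields[1])
        except ValueError:
            # e.g. a header row without '#', or a stray "Log2FC" further down
            problems.append((lineno, f"non-numeric metric {fields[1]!r}"))
    return problems


if __name__ == "__main__":
    sample = [
        "Gene\tLog2FC\n",                 # header missing the leading '#'
        "ENSG00000122548\t5.79602174\n",
        "ENSG00000135443\tLog2FC\n",      # stray header text inside the data
    ]
    for lineno, message in find_rnk_problems(sample):
        print(f"line {lineno}: {message}")
```

To run it on a real file, `with open("my_list.rnk") as fh: print(find_rnk_problems(fh))` should work; the file name is a placeholder.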

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego

Mahtab Dastpak

Jul 5, 2022, 1:52:18 PM
to gsea...@googlegroups.com

Hi Anthony,

Thanks so much for your help. After adding # to the header, it worked.
I have a new issue with the results. Some important pathways show "---" for NES and NOM p-val (attached). Does this mean the value is too high or too low? Can I still use these pathways in my interpretation?


Thanks.
Mahtab

GSEA-results.xlsx

Anthony Castanza

Jul 5, 2022, 1:59:14 PM
to gsea...@googlegroups.com

Hi Mahtab,

This result indicates that there was something wrong with the null distribution that was generated for this data.

This can occur when a large skew in the dataset means that no valid null scores are generated for the given set on the same side (positive or negative) as the true score.
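
To illustrate why a missing same-side null leaves nothing to report, here is a simplified Python sketch of sign-matched normalization, roughly following the description in the GSEA user guide: the observed ES is divided by the mean magnitude of the null ESs that share its sign, and the nominal p-value is the fraction of those same-sign null ESs at least as extreme as the observed one. `normalize_es` is a hypothetical illustration, not GSEA's actual code, and the example scores are invented.

```python
from statistics import mean


def normalize_es(observed_es, null_es):
    """Return (nes, nominal_p), or (None, None) when no usable null exists.

    Hypothetical sketch: only null scores with the same sign as the observed ES
    are used, both for the normalizing mean and for the p-value.
    """
    if observed_es >= 0:
        same_side = [x for x in null_es if x >= 0]
    else:
        same_side = [x for x in null_es if x < 0]
    if not same_side:
        # Every permutation landed on the opposite side: nothing to compare to.
        return None, None
    scale = mean([abs(x) for x in same_side])
    if scale == 0:
        # All same-side null scores are exactly zero; the ratio is undefined.
        return None, None
    nes = observed_es / scale
    as_extreme = sum(1 for x in same_side if abs(x) >= abs(observed_es))
    return nes, as_extreme / len(same_side)


if __name__ == "__main__":
    # A skewed null: every permuted score is negative, but the observed ES is positive.
    print(normalize_es(0.45, [-0.30, -0.52, -0.41, -0.38]))
```

In this toy case both values come back undefined, which is the kind of situation a "---" in the report would correspond to.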

How many genes are in the ranked list, and where is the “Zero cross” annotated on the enrichment plot?

You can also look at the null ES distribution plot at the bottom of the set report page.

-Anthony

Mahtab Dastpak

Jul 5, 2022, 2:09:17 PM
to gsea...@googlegroups.com
How many genes are in the ranked list? 33896 genes
Where is the “Zero cross” annotated on the enrichment plot? I actually don't know how to find it.

I found a warning at the bottom of the set report page:
  • Scoring produced infinite or NaNs values which may have prevented plotting for certain gene sets. See the log for more details.

Thanks.
Mahtab

Mahtab Dastpak

Jul 6, 2022, 12:10:14 PM
to gsea...@googlegroups.com

Hi Anthony,

Here are the answers to your questions:

Anthony Castanza

Jul 6, 2022, 3:24:24 PM
to gsea...@googlegroups.com

The zero cross should be annotated on all of the enrichment plots. It also reflects the split between the number of genes with positive ranking metrics and the number with negative ranking metrics. A large skew in either direction can lead to this sort of issue.
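
A quick way to check for that kind of skew outside GSEA is to count the positive and negative metrics yourself. The Python sketch below assumes the ranking metrics have already been read into a list of floats; `metric_balance` is a hypothetical helper. In a list sorted from highest to lowest, the zero cross should fall roughly after the first `positive` genes.

```python
import math


def metric_balance(metrics):
    """Summarize how a ranking metric splits around zero.

    Hypothetical helper. Non-finite values (Inf, -Inf, NaN) are counted
    separately so they do not distort the positive/negative split.
    """
    finite = [m for m in metrics if math.isfinite(m)]
    positive = sum(1 for m in finite if m > 0)
    negative = sum(1 for m in finite if m < 0)
    signed = positive + negative
    return {
        "positive": positive,
        "negative": negative,
        "zero": len(finite) - signed,
        "non_finite": len(metrics) - len(finite),
        # share of signed genes that are positive; near 0.5 means little skew
        "positive_share": positive / signed if signed else math.nan,
    }


if __name__ == "__main__":
    print(metric_balance([5.8, 5.6, 2.1, 0.3, 0.0, -0.2]))
```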

Since you’re using preranked data, you might also want to check the list for any genes with “Inf”, “-Inf”, “NaN”, or similar values; these may be what the “Scoring produced infinite or NaNs values” warning is referring to.
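
One way to find such rows is the minimal Python sketch below, assuming the same tab-separated gene/metric layout as the .rnk file. `non_finite_rows` is a hypothetical helper that reports every row whose metric parses to infinity or NaN. Python's `float()` accepts more spellings (such as "Inf") than Java's `Float.parseFloat`, so this is a deliberately broad check; text that is not numeric at all is skipped here because that is a parsing problem rather than a scoring one.

```python
import math


def non_finite_rows(lines):
    """Return (line_number, gene, raw_value) for metrics that are Inf, -Inf or NaN.

    Hypothetical helper. Assumes tab-separated <gene><TAB><metric> rows and skips
    '#' header lines, blank lines and rows whose metric is not numeric.
    """
    flagged = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        try:
            value = float(fields[1])
        except ValueError:
            continue  # non-numeric text: a parsing issue, not an Inf/NaN issue
        if not math.isfinite(value):
            flagged.append((lineno, fields[0], fields[1]))
    return flagged


if __name__ == "__main__":
    sample = ["#gene\tmetric\n", "GENE_A\t1.5\n", "GENE_B\tInf\n", "GENE_C\tNaN\n"]
    for lineno, gene, value in non_finite_rows(sample):
        print(f"line {lineno}: {gene} has metric {value}")
```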

If you like, you can send your ranked list and the gene sets you’re analyzing to gsea...@broadinstitute.org, and I will try to figure out precisely what is causing this and whether there is an easy fix.

Esther palomino lago

Dec 27, 2022, 3:20:22 PM
to gsea-help
Hi,

I am not sure if this group is still active, but this error message is driving me crazy. I followed the manual and all the instructions, and I still do not know where my error is. The input is shown below. I have the same issue with the label.cls file. I am obviously doing something wrong, but I cannot figure out what it is. Could you kindly help me with it?

<Error Details>

---- Full Error Message ----
There were errors: ERROR(S) #:1
Parsing trouble
java.lang.NumberFormatException: ...

---- Stack Trace ----
# of exceptions: 1
------For input string: "Description"------
java.lang.NumberFormatException: For input string: "Description"
    at java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(Unknown Source)
    at java.base/jdk.internal.math.FloatingDecimal.parseFloat(Unknown Source)
    at java.base/java.lang.Float.parseFloat(Unknown Source)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.AbstractParser.parseStringToFloat(AbstractParser.java:250)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.AbstractParser.parseFieldsIntoFloatArray(AbstractParser.java:361)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.AbstractParser.parseTextMatrixToDataset(AbstractParser.java:277)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.TxtDatasetParser.parse(TxtDatasetParser.java:108)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.readDatasetTXT(ParserFactory.java:200)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.read(ParserFactory.java:758)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.read(ParserFactory.java:735)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserWorker.doInBackground(ParserWorker.java:53)
    at java.desktop/javax.swing.SwingWorker$1.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.desktop/javax.swing.SwingWorker.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)

Castanza, Anthony

Dec 27, 2022, 4:06:04 PM
to gsea...@googlegroups.com

Hi Esther,

We generally ask that people create a new topic for their specific issue, to avoid sending unwanted notifications to whoever originally started a given topic.
That said, it appears that GSEA is having difficulty parsing your TXT-formatted expression dataset. Please double-check that your file matches the specification for .txt files here: https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#TXT:_Text_file_format_for_expression_dataset_.28.2A.txt.29
If your file does match that format and isn’t accidentally a .GCT file with the wrong file extension, could you send a screenshot of the file open in a plain-text editor? That would help us diagnose what is going wrong.
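
A rough self-check can also help before sending a screenshot. The Python sketch below assumes the layout described on the linked page: a tab-separated header of NAME, DESCRIPTION and one column per sample, followed by one row per gene with numeric values from the third column on. `find_txt_dataset_problems` is a hypothetical helper, not part of GSEA; it compares the two header names case-insensitively and flags a first line starting with `#1.2`, which is how GCT files begin.

```python
def find_txt_dataset_problems(lines):
    """Return (line_number, message) pairs for common TXT-format mistakes.

    Hypothetical helper. Assumes a tab-separated header NAME, DESCRIPTION,
    sample names..., then one row per gene whose third and later fields are
    numeric expression values. Blank lines are ignored.
    """
    problems = []
    header = None
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        fields = line.split("\t")
        if header is None:
            header = fields
            if line.startswith("#1.2"):
                problems.append((lineno, "starts with #1.2: this looks like a GCT file"))
                break
            if len(fields) < 3 or [f.strip().upper() for f in fields[:2]] != ["NAME", "DESCRIPTION"]:
                problems.append((lineno, "header should be NAME, DESCRIPTION, then sample names"))
            continue
        if len(fields) != len(header):
            problems.append((lineno, f"{len(fields)} fields, header has {len(header)}"))
        for value in fields[2:]:
            try:
                float(value)
            except ValueError:
                problems.append((lineno, f"non-numeric value {value!r}"))
                break  # one report per row is enough
    return problems


if __name__ == "__main__":
    sample = ["ID\tNAME\tDescription\tS1\n", "1\tTP53\tna\t1.5\n"]  # extra leading column
    for lineno, message in find_txt_dataset_problems(sample):
        print(f"line {lineno}: {message}")
```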

If everything else looks good, one quick thing you might also try is replacing the text “Description” with “DESCRIPTION” (all uppercase, no quotes). This shouldn’t actually be causing an error, since the parser should handle the mixed-case version, but we’ve been working on the parser code recently, so it’s possible we’ve introduced some unintended behavior.
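
If you would rather script that edit than make it by hand, here is a minimal Python sketch, assuming the header is the first line of the file and DESCRIPTION is its second tab-separated field. `uppercase_description_header` is a hypothetical helper; it rewrites only that one field and leaves every other line untouched.

```python
import sys


def uppercase_description_header(text):
    """Return `text` with the header's second field spelled DESCRIPTION.

    Hypothetical helper: only the first line is inspected, and only a field
    that already reads "description" (in any case) is rewritten. Surrounding
    whitespace such as a trailing carriage return is preserved.
    """
    first, sep, rest = text.partition("\n")
    fields = first.split("\t")
    if len(fields) > 1 and fields[1].strip().lower() == "description":
        fields[1] = fields[1].replace(fields[1].strip(), "DESCRIPTION")
    return "\t".join(fields) + sep + rest


if __name__ == "__main__":
    # Usage (file names are placeholders): python fix_header.py in.txt out.txt
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as src:
            fixed = uppercase_description_header(src.read())
        with open(sys.argv[2], "w") as dst:
            dst.write(fixed)
```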

Thanks!

-Anthony