help with gct file!


Melise Edwards

Dec 3, 2021, 9:50:50 AM
to gsea-help
I have run several GSEA analyses in the downloaded GSEA 4.1.0 software, and file formatting has not been an issue. This morning, of course while I am in a hurry, my .gct file is not being accepted by the program. The error is "Parsing trouble java.lang.NumberFormatException".

I have made sure that the first line is #1.2, the second line is my number of rows, then samples, followed by my normalized RNAseq counts. I cannot figure out what is happening here because it is exactly like the other .gct files I have used. 

Can anyone take a second look and see what might be going on?


GSEA_LET_CON_DEC2.gct

Melise Edwards

Dec 3, 2021, 10:57:57 AM
to gsea-help

If we have GSEA version 4.1.0, is the first line of our .gct file still #1.2? 

When I tried to use the online GenePattern site to create the .gct file instead of doing it through RStudio, it gave me an error saying that the version was outdated.

I have double and triple checked the file format guidelines and cannot figure this out. This is the error when I upload my .gct file into the GSEA program downloaded onto my computer:

Error Details>

---- Full Error Message ----
There were errors: ERROR(S) #:1
Parsing trouble
java.lang.NumberFormatException: ...

---- Stack Trace ----
# of exceptions: 1
------For input string: "PK
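
A note on the stack trace: `PK` is the two-byte signature at the start of every ZIP archive, and .xlsx workbooks are ZIP archives, so this error is consistent with an Excel workbook that was saved or renamed with a .gct extension instead of being exported as tab-delimited text. The thread does not confirm the file's actual contents, so treat this as a likely reading rather than a diagnosis. A minimal R sketch for checking a file follows; the helper name `looks_like_zip` is made up for illustration.

```r
# Return TRUE if the file starts with the ZIP signature "PK" (bytes 0x50 0x4B).
# A plain-text GCT file should start with "#1.2" instead.
looks_like_zip <- function(path) {
  sig <- readBin(path, what = "raw", n = 2)
  identical(sig, charToRaw("PK"))
}

# Hypothetical usage with the file attached to this thread:
# looks_like_zip("GSEA_LET_CON_DEC2.gct")
```

If the check returns TRUE, re-exporting the sheet as tab-delimited text should give GSEA a file it can parse.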

Melise Edwards

Dec 3, 2021, 11:20:44 AM
to gsea-help
I figured it out. That was painful. In this YouTube video, the presenter explains that you have to save the file as a tab-delimited .txt file and then just change the extension to .gct.

https://www.youtube.com/watch?v=_zpH-DgE33U

This is confusing when you are likely editing the file in Excel.

This was my process, for future folks working with RNAseq data (and in case anyone has helpful critiques). It is not clear from the myriad forums, the GenePattern website, or the GSEA manual:

- upload raw counts
- convert to a DGEList
- normalize the data using edgeR (calcNormFactors)
- save counts per million into a new data frame; for example, using a DGEList called "e":
> e <- as.data.frame(cpm(e))
- export as an Excel file or table
- manually insert the two rows needed at the top of the data (first row: #1.2; second row: number of genes followed by number of samples)
- save as a tab-delimited .txt file, but make sure the file extension is ".gct"
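
The last three steps can also be done without leaving R. The sketch below is one possible way, not an official GSEA or edgeR utility: `write_gct` is a hypothetical helper, the toy matrix stands in for the real `cpm(e)` output, and "na" is only a placeholder description. It assumes the GCT 1.2 layout as commonly documented, where the third line is a header starting with NAME and Description and every data row carries those two columns before the sample values.

```r
# Write a numeric matrix (genes in rows, samples in columns) as a GCT 1.2 file.
# `mat` stands in for the CPM matrix, e.g. cpm(e) from edgeR.
write_gct <- function(mat, path) {
  stopifnot(is.matrix(mat), !is.null(rownames(mat)), !is.null(colnames(mat)))
  con <- file(path, open = "wt")
  on.exit(close(con))
  # Line 1: version tag. Line 2: number of genes, then number of samples.
  writeLines("#1.2", con)
  writeLines(paste(nrow(mat), ncol(mat), sep = "\t"), con)
  # Line 3 onward: NAME and Description columns, then one column per sample.
  out <- data.frame(NAME = rownames(mat), Description = "na", mat,
                    check.names = FALSE, stringsAsFactors = FALSE)
  write.table(out, con, sep = "\t", quote = FALSE, row.names = FALSE)
}

# Toy data in place of real CPM values (sample names are made up).
toy <- matrix(c(1.5, 2, 0, 3.25), nrow = 2,
              dimnames = list(c("MAR1", "TP53"), c("ctrl_1", "let_1")))
gct_path <- file.path(tempdir(), "toy.gct")
write_gct(toy, gct_path)
```

Writing the file from R keeps it plain text from start to finish, so the extension and the contents cannot disagree.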

David Eby

Dec 3, 2021, 12:04:07 PM
to gsea...@googlegroups.com
Hi Melise,

GSEA will actually accept a similar TXT format without the two extra header lines (basically just the data matrix); in this case, the extension would remain ".txt".  Since you are working in R anyway, you could skip Excel entirely and just go directly to this format using something like the write.table function.  Just be sure to change the field separator to tab and make sure to turn off quoting since GSEA won't recognize the quoted values.

To be clear, this is the only format GSEA will accept with the TXT extension; it won't take arbitrary TXT files.  But in your case it should produce the results you need.
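
A minimal sketch of the `write.table` route described above, with tab as the field separator and quoting turned off. The toy matrix and file name are placeholders for the real CPM data, and the NAME and DESCRIPTION columns reflect an assumption about what GSEA's TXT format expects in its header row; check the GSEA data formats page before relying on it.

```r
# Toy matrix standing in for cpm(e); gene and sample names are made up.
toy <- matrix(c(1.5, 2, 0, 3.25), nrow = 2,
              dimnames = list(c("MAR1", "TP53"), c("ctrl_1", "let_1")))

# Put gene identifiers in a real column so the header row lines up with the data.
out <- data.frame(NAME = rownames(toy), DESCRIPTION = "na", toy,
                  check.names = FALSE, stringsAsFactors = FALSE)

txt_path <- file.path(tempdir(), "toy_expression.txt")
# sep = "\t" makes the file tab-delimited; quote = FALSE leaves values unquoted.
write.table(out, txt_path, sep = "\t", quote = FALSE, row.names = FALSE)
```

Setting `row.names = FALSE` and carrying the identifiers in the NAME column avoids the header being one field shorter than the data rows, which is what `write.table` produces by default when row names are written.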

We actually discourage the use of Excel for this kind of file manipulation, since unless you are very careful it's quite easy for your data to be accidentally changed by the auto-formatting features with e.g. something like MAR1 being turned into a date of March 1 (the same is true of LibreOffice / OpenOffice, FWIW).  Obviously Excel is tremendously convenient for this type of data transformation and it's probably the easiest way for non-programmers.  Just be sure to use the File->Import tool and set all columns to Text rather than General.
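
For a file that has already passed through a spreadsheet, a quick scan of the identifier column can catch the most common damage. This is a rough sketch only: `looks_like_excel_date` is a hypothetical helper, and the pattern covers just the English "1-Mar" style that Excel typically displays; other locales or date formats would need different patterns.

```r
# Flag identifiers that look like spreadsheet-converted dates, e.g. "1-Mar".
looks_like_excel_date <- function(ids) {
  months <- "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
  grepl(paste0("^[0-9]{1,2}-(", months, ")$"), ids)
}

# Made-up identifiers: two intact gene symbols and two mangled ones.
ids <- c("MAR1", "1-Mar", "TP53", "2-Sep")
suspect <- ids[looks_like_excel_date(ids)]
```

Any hits mean the original symbols need to be restored from the source data, since the conversion cannot be reliably undone from the mangled values alone.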

I hope that's helpful.
