heatmap and empty values


Barbora Kvokačková

Apr 23, 2021, 3:19:17 AM
to gsea-help
Hi,

I am running GSEA on RNA-seq data. I replaced all 0 values in my GCT files with empty cells. When I check the heatmaps from the analysis, I see black boxes where the empty cells are (attached). My colleague ran GSEA on the same dataset and her heatmaps look fine (red/blue scale only). Do you know what the issue might be?

Thank you

Have a nice day

Barbora 


AKT_UP.V1_UP_79.png





Anthony Castanza

Apr 23, 2021, 3:28:26 AM
to gsea...@googlegroups.com

Hi Barbora,

 

You should not replace “0” values with empty cells. An empty cell means “absent data”: when a gene has absent data for a sample, that sample is excluded from the gene’s differential expression calculation (these are the black cells in the heatmap). A zero in RNA-seq data is a real value, and it will be included in the differential expression calculation.
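To see which cells GSEA would treat as absent data, a file can be scanned before it is loaded. The following is a minimal sketch, not part of GSEA: the function name `find_empty_cells` is hypothetical, and it assumes a tab-delimited GCT v1.2 layout (version line, dimensions line, header line, then one row per gene).

```python
def find_empty_cells(gct_text):
    """Return (gene_name, sample_name) pairs whose expression cell is empty."""
    lines = gct_text.splitlines()
    # Assumed GCT layout: line 1 is the version ("#1.2"), line 2 holds the
    # row/column counts, line 3 is the header (NAME, Description, samples...).
    samples = lines[2].split("\t")[2:]
    empty = []
    for row in lines[3:]:
        if not row.strip():
            continue  # skip completely blank lines
        fields = row.split("\t")
        name, values = fields[0], fields[2:]
        # If a row is shorter than the header, treat the missing cells as empty.
        values += [""] * (len(samples) - len(values))
        for sample, value in zip(samples, values):
            if value.strip() == "":
                empty.append((name, sample))
    return empty


example = (
    "#1.2\n2\t2\n"
    "NAME\tDescription\tS1\tS2\n"
    "GENE1\tna\t309.49\t\n"
    "GENE2\tna\t0\t12.5\n"
)
print(find_empty_cells(example))
```

In the example, only the GENE1/S2 cell is reported. The `0` in GENE2/S1 is a real value and is not flagged.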

Additionally, just to confirm: you did perform some sort of between-sample normalization on this data, correct? If not, the raw counts (including any quantified zeros) should be normalized using something like DESeq2, and the resulting matrix of normalized counts used for GSEA. If you haven’t done this already and don’t know how to do the normalization manually, the DESeq2 module on GenePattern (https://cloud.genepattern.org/) and the DESeq2 function on Galaxy (https://usegalaxy.org/) can both generate this counts matrix. I’d personally recommend GenePattern, since its output is in GCT format and can be used as-is with GSEA.
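For orientation, the idea behind between-sample normalization can be sketched in a few lines. This is an illustrative approximation of the median-of-ratios size factors that DESeq2 is known for, not DESeq2 itself. The function names are hypothetical, and for a real analysis the DESeq2 tools mentioned above should be used.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one scaling factor per sample from a genes x samples count table."""
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c <= 0 for c in row):
            # Genes with a zero in any sample cannot contribute a geometric
            # mean, so they are skipped here (they are still normalized below).
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Raises statistics.StatisticsError if no gene is non-zero in every sample.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts):
    """Divide every count by its sample's size factor; zeros stay zeros."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


raw = [[10, 20], [100, 200], [0, 5], [30, 60]]
print(size_factors(raw))
print(normalize(raw))
```

In this toy table the second sample was sequenced twice as deeply as the first, so its size factor comes out twice as large, and the zero count is kept as a zero rather than dropped.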

 

-Anthony

 

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego

http://gsea-msigdb.org/


Barbora Kvokačková

Apr 23, 2021, 6:50:24 AM
to gsea...@googlegroups.com
Hi Anthony, 

thanks a lot for your response. Yes, the data were normalized.
But it seems I am back at the start, because I am having trouble loading the new file with the 0 values restored.

I have it in GCT format and I am getting this error:

Error Details>

---- Full Error Message ----
There were errors: ERROR(S) #:1
Parsing trouble
java.lang.NumberFormatException: ...

---- Stack Trace ----
# of exceptions: 1
------For input string: "309,4913788"------
java.lang.NumberFormatException: For input string: "309,4913788"
at java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(Unknown Source)
at java.base/jdk.internal.math.FloatingDecimal.parseFloat(Unknown Source)
at java.base/java.lang.Float.parseFloat(Unknown Source)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.GctParser._parseHasDesc(GctParser.java:215)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.GctParser._parse(GctParser.java:167)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.GctParser.parse(GctParser.java:117)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.readDatasetGct(ParserFactory.java:159)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.readDatasetGct(ParserFactory.java:129)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.read(ParserFactory.java:746)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.read(ParserFactory.java:725)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserWorker.doInBackground(ParserWorker.java:51)
at java.desktop/javax.swing.SwingWorker$1.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at java.desktop/javax.swing.SwingWorker.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)


I also tried .txt, but there is an issue with that as well. "309,4913788" is the first value in my table, so I guess it is in the wrong format?

thanks

Barbora




Anthony Castanza

Apr 23, 2021, 12:17:45 PM
to gsea...@googlegroups.com

Hi Barbora,

 

Commas are not an expected character in numeric strings in the data format. Is it possible your data is comma separated instead of tab separated? Both our .TXT and our .GCT specifications (https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GCT:_Gene_Cluster_Text_file_format_.28.2A.gct.29) expect tab delimited text.

If that isn’t the problem, please open your file in a plain text editor (Notepad on a PC or TextEdit on a Mac) and send me a screenshot of what it looks like. That should help us narrow down the problem.
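As an alternative to a screenshot, the first few lines can be inspected programmatically so that tabs and commas become visible. This is a minimal sketch; the function name `describe_lines` and the file name in the comment are hypothetical.

```python
def describe_lines(text, n=5):
    """For the first n lines, return (line number, tab count, comma count, repr)."""
    report = []
    for number, line in enumerate(text.splitlines()[:n], start=1):
        # repr() shows tabs as \t, so the delimiter is unambiguous.
        report.append((number, line.count("\t"), line.count(","), repr(line)))
    return report


sample = "#1.2\n1\t1\nNAME\tDescription\tS1\nG1\tna\t309,4913788\n"
for entry in describe_lines(sample):
    print(entry)

# For a real file (hypothetical name), read it first:
# with open("my_data.gct", encoding="utf-8") as handle:
#     for entry in describe_lines(handle.read()):
#         print(entry)
```

A data line that shows tabs between fields and a comma only inside a number points to a decimal-separator problem rather than a delimiter problem.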

David Eby

Apr 23, 2021, 1:59:16 PM
to gsea...@googlegroups.com
Hi Barbora,

It's also possible that these numbers were somehow converted to a language-specific or locale-specific format, for example if the file was opened in a program like Excel that does automatic conversions. Some countries use a comma as the decimal separator (like "309,4913788" instead of "309.4913788"), but unfortunately the GSEA file parsers do not take this into account.
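If a file has already been saved with decimal commas, the values can be converted back outside of Excel. The sketch below is one possible approach, not a GSEA feature: the function name `fix_decimal_commas` is hypothetical, and it assumes a tab-delimited GCT file whose numbers contain no thousands separators (a value such as "1,234" would be read as 1.234).

```python
import re

# A number written with a decimal comma, e.g. "309,4913788", "-0,5" or "1,5E-3".
DECIMAL_COMMA = re.compile(r"^-?\d+,\d+(?:[eE][-+]?\d+)?$")


def fix_decimal_commas(gct_text):
    """Replace decimal commas with dots in the expression columns only."""
    lines = gct_text.split("\n")
    fixed = lines[:3]  # version, dimensions and header lines are left alone
    for row in lines[3:]:
        fields = row.split("\t")
        # Columns 1 and 2 (NAME, Description) may legitimately contain commas.
        fields[2:] = [
            value.replace(",", ".") if DECIMAL_COMMA.match(value.strip()) else value
            for value in fields[2:]
        ]
        fixed.append("\t".join(fields))
    return "\n".join(fixed)


broken = "#1.2\n1\t2\nNAME\tDescription\tS1\tS2\nGENE1\tkinase, putative\t309,4913788\t0\n"
print(fix_decimal_commas(broken))
```

Restricting the replacement to the expression columns keeps commas in gene descriptions intact.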

Regards,

Barbora Kvokačková

Apr 25, 2021, 10:38:54 AM
to gsea...@googlegroups.com
Dear Anthony and David,

Thank you for your responses. It looks like the problem occurs only when I work on a Mac and prepare the file there (I also prepared the files on Windows, where everything works fine and I can run the analysis).
On the Mac, I also changed the decimal separator from comma to full stop in the system settings, replaced all dots with commas in the Excel files, and converted them to GCT (photo attached). But now I have a problem with the GCT format itself (picture attached). I have 58051 genes in my data...

Thank you 

Barbora 

GSEA_gctfile.png
gctfile.png

Anthony Castanza

Apr 25, 2021, 7:03:41 PM
to gsea...@googlegroups.com
Hi Barbora,

Based on the error message, it looks like there are probably a couple extra blank lines at the bottom of your GCT file.
If you open it in a plain text editor and scroll all the way down to the bottom, you should see some extra whitespace after the last gene. Delete that whitespace and the file should load.
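The same cleanup can be done without a text editor. This is a minimal sketch under the assumption that the file is a tab-delimited GCT v1.2 file; the function names `strip_trailing_blank_lines` and `row_count_matches` are hypothetical, and the second one only checks that the number of data rows agrees with the count declared on line 2.

```python
def strip_trailing_blank_lines(gct_text):
    """Drop empty or whitespace-only lines at the end of the file."""
    lines = gct_text.splitlines()
    # Lines holding only tabs or spaces (a common spreadsheet export
    # leftover) count as blank here.
    while lines and lines[-1].strip() == "":
        lines.pop()
    return "\n".join(lines) + "\n"


def row_count_matches(gct_text):
    """Compare the declared gene count (line 2) with the actual data rows."""
    lines = gct_text.splitlines()
    declared_rows = int(lines[1].split("\t")[0])
    return declared_rows == len(lines) - 3  # minus version, dimensions, header


messy = "#1.2\n2\t1\nNAME\tDescription\tS1\nG1\tna\t1.0\nG2\tna\t0\n\n\t\t\n\n"
cleaned = strip_trailing_blank_lines(messy)
print(row_count_matches(messy), row_count_matches(cleaned))
```

In the example, the file with trailing whitespace fails the row-count check and the cleaned version passes it.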


-Anthony

 

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego

http://gsea-msigdb.org/

Barbora Kvokačková

Apr 26, 2021, 5:18:00 AM
to gsea...@googlegroups.com
Hi Anthony, 

thanks a lot. That was the issue.

Have a nice day

Barbora 
