Renaming rpt dir on error

WANG Zifeng

Aug 26, 2021, 12:30:53 AM
to gsea-help
Hi authors,

Thanks for your great software. I ran the analysis for 3 groups using the command:
awk '{print "gsea-cli.sh  GSEA -res 002_GSEA/"$7"_all.TMP.GeneName.gct   -cls 2v2.cls#Treat_versus_Ctrl   -gmx h.all.v7.4.symbols.gmt   -out  002_GSEA -rpt_label  "$7".hallmark    -collapse No_Collapse -permute gene_set -mode Max_probe -norm meandiv -nperm 1000  -rnd_type no_balance -scoring_scheme weighted  -metric Signal2Noise -sort real -order descending -create_gcts false -create_svgs false -include_only_symbols true -make_sets true -median false -num 100 -plot_top_x 50 -rnd_seed timestamp -save_rnd_lists false -set_max 500 -set_min 15 -zip_report false"}' Sample.info.2Rep | uniq | xargs -iCMD -P0 bash -c CMD
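One detail of this pipeline worth knowing: uniq only removes adjacent duplicate lines. If the same comparison name (column 7) appears on non-adjacent lines of Sample.info.2Rep, the same gsea-cli.sh command is generated, and with xargs -P0 launched in parallel, more than once. The Python sketch below builds the same command list but drops duplicates regardless of position. It assumes Sample.info.2Rep is whitespace-delimited (awk's default field splitting) with the comparison name in column 7; the function and constant names are hypothetical, and nothing in this thread shows that duplicates played a role in the failure.

```python
from typing import Iterable, List

# Options copied from the awk command above; -res and -rpt_label are filled in per comparison.
# (BEFORE_LABEL / AFTER_LABEL are hypothetical names used only for this sketch.)
BEFORE_LABEL = "-cls 2v2.cls#Treat_versus_Ctrl -gmx h.all.v7.4.symbols.gmt -out 002_GSEA"
AFTER_LABEL = (
    "-collapse No_Collapse -permute gene_set -mode Max_probe -norm meandiv -nperm 1000 "
    "-rnd_type no_balance -scoring_scheme weighted -metric Signal2Noise -sort real "
    "-order descending -create_gcts false -create_svgs false -include_only_symbols true "
    "-make_sets true -median false -num 100 -plot_top_x 50 -rnd_seed timestamp "
    "-save_rnd_lists false -set_max 500 -set_min 15 -zip_report false"
)


def build_commands(lines: Iterable[str], field: int = 7) -> List[str]:
    """Return one gsea-cli.sh command per distinct value in column `field` (1-based, like awk's $7)."""
    seen = set()
    commands = []
    for line in lines:
        parts = line.split()  # like awk's default, split on runs of whitespace
        if len(parts) < field:
            continue  # skip blank or short lines (awk would emit an empty name for them)
        name = parts[field - 1]
        if name in seen:
            continue  # drop repeats even when they are not on adjacent lines
        seen.add(name)
        commands.append(
            f"gsea-cli.sh GSEA -res 002_GSEA/{name}_all.TMP.GeneName.gct "
            f"{BEFORE_LABEL} -rpt_label {name}.hallmark {AFTER_LABEL}"
        )
    return commands
```

The returned strings can be written to a file, one per line, and fed to xargs exactly as before, or run one by one with Python's subprocess module.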

Two groups finished successfully, but one failed. I have confirmed there is no mistake in the input data format.

/data/software/biosoftware/003_gsea/GSEA_Linux_4.1.0/gsea-cli.sh  GSEA -res 002_GSEA/Eca109_oep38_PFD400mM24h_vs_Eca109_Vec_Vehicle_all.TMP.GeneName.gct   -cls /data/home/000_index/024_GSEA_GeneSet/2v2.cls#Treat_versus_Ctrl   -gmx /data/home/000_index/024_GSEA_GeneSet/010_Homo_GeneSets_Broad_v20210420/h.all.v7.4.symbols.gmt   -out  002_GSEA -rpt_label  Eca109_oep38_PFD400mM24h_vs_Eca109_Vec_Vehicle.hallmark    -collapse No_Collapse -permute gene_set -mode Max_probe -norm meandiv -nperm 1000  -rnd_type no_balance -scoring_scheme weighted  -metric Signal2Noise -sort real -order descending -create_gcts false -create_svgs false -include_only_symbols true -make_sets true -median false -num 100 -plot_top_x 50 -rnd_seed timestamp -save_rnd_lists false -set_max 500 -set_min 5 -zip_report false
echo Using bundled JDK.
WARNING: package com.apple.laf not in java.desktop
WARNING: package com.sun.java.swing.plaf.windows not in java.desktop
WARNING: package sun.awt.windows not in java.desktop
720      [INFO  ] - Parameters passed to GSEA tool:
722      [INFO  ] - gmx /data/home/000_index/024_GSEA_GeneSet/010_Homo_GeneSets_Broad_v20210420/h.all.v7.4.symbols.gmt
722      [INFO  ] - res 002_GSEA/Eca109_oep38_PFD400mM24h_vs_Eca109_Vec_Vehicle_all.TMP.GeneName.gct
722      [INFO  ] - cls /data/home/000_index/024_GSEA_GeneSet/2v2.cls#Treat_versus_Ctrl
722      [INFO  ] - rpt_label Eca109_oep38_PFD400mM24h_vs_Eca109_Vec_Vehicle.hallmark
722      [INFO  ] - collapse No_Collapse
722      [INFO  ] - zip_report false
722      [INFO  ] - gui false
722      [INFO  ] - out 002_GSEA
722      [INFO  ] - mode Max_probe
722      [INFO  ] - norm meandiv
722      [INFO  ] - nperm 1000
723      [INFO  ] - permute gene_set
723      [INFO  ] - rnd_type no_balance
723      [INFO  ] - scoring_scheme weighted
723      [INFO  ] - metric Signal2Noise
723      [INFO  ] - sort real
723      [INFO  ] - order descending
723      [INFO  ] - include_only_symbols true
723      [INFO  ] - make_sets true
723      [INFO  ] - median false
723      [INFO  ] - num 100
723      [INFO  ] - plot_top_x 50
723      [INFO  ] - rnd_seed timestamp
723      [INFO  ] - save_rnd_lists false
724      [INFO  ] - create_svgs false
724      [INFO  ] - create_gcts false
724      [INFO  ] - set_max 500
724      [INFO  ] - set_min 5
792      [INFO  ] - Begun importing: Dataset from: Eca109_oep38_PFD400mM24h_vs_Eca109_Vec_Vehicle_all.TMP.GeneName.gct
>Eca109_Vec_Vehicle_Rep1<
java.lang.NumberFormatException: For input string: "Eca109_Vec_Vehicle_Rep1"
at java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(Unknown Source)
at java.base/jdk.internal.math.FloatingDecimal.parseFloat(Unknown Source)
at java.base/java.lang.Float.parseFloat(Unknown Source)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.GctParser._parseHasDesc(GctParser.java:215)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.GctParser._parse(GctParser.java:167)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.GctParser.parse(GctParser.java:117)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.readDatasetGct(ParserFactory.java:159)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.readDatasetGct(ParserFactory.java:129)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.read(ParserFactory.java:746)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.read(ParserFactory.java:799)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.read(ParserFactory.java:787)
at org.gsea_msigdb.gsea/xtools.api.param.PobParam.getPob(PobParam.java:79)
at org.gsea_msigdb.gsea/xtools.api.param.DatasetReqdParam.getDataset(DatasetReqdParam.java:29)
at org.gsea_msigdb.gsea/xtools.gsea.Gsea.createHeader(Gsea.java:195)
at org.gsea_msigdb.gsea/xtools.gsea.Gsea.execute(Gsea.java:115)
at org.gsea_msigdb.gsea/xtools.api.AbstractTool.module_main(AbstractTool.java:417)
at org.gsea_msigdb.gsea/org.genepattern.modules.GseaWrapper.main(GseaWrapper.java:287)
at org.gsea_msigdb.gsea/xapps.gsea.CLI.main(CLI.java:29)
# of elements = 1
/data/home/000_index/024_GSEA_GeneSet/2v2.cls#Treat_versus_Ctrl 
2014     [INFO  ] - Begun importing: Dataset from: Eca109_oep38_PFD400mM24h_vs_Eca109_Vec_Vehicle_all.TMP.GeneName.gct
>Eca109_Vec_Vehicle_Rep1<
java.lang.NumberFormatException: For input string: "Eca109_Vec_Vehicle_Rep1"
at java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(Unknown Source)
at java.base/jdk.internal.math.FloatingDecimal.parseFloat(Unknown Source)
at java.base/java.lang.Float.parseFloat(Unknown Source)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.GctParser._parseHasDesc(GctParser.java:215)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.GctParser._parse(GctParser.java:167)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.GctParser.parse(GctParser.java:117)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.readDatasetGct(ParserFactory.java:159)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.readDatasetGct(ParserFactory.java:129)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.read(ParserFactory.java:746)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.read(ParserFactory.java:799)
at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.read(ParserFactory.java:787)
at org.gsea_msigdb.gsea/xtools.api.param.PobParam.getPob(PobParam.java:79)
at org.gsea_msigdb.gsea/xtools.api.param.DatasetReqdParam.getDataset(DatasetReqdParam.java:29)
at org.gsea_msigdb.gsea/xtools.api.param.DatasetReqdParam.getDataset(DatasetReqdParam.java:33)
at org.gsea_msigdb.gsea/xtools.gsea.Gsea.execute(Gsea.java:143)
at org.gsea_msigdb.gsea/xtools.api.AbstractTool.module_main(AbstractTool.java:417)
at org.gsea_msigdb.gsea/org.genepattern.modules.GseaWrapper.main(GseaWrapper.java:287)
at org.gsea_msigdb.gsea/xapps.gsea.CLI.main(CLI.java:29)


============================

I tried changing the gene set size limits to each of the following:
 -set_max 500 -set_min 5
 -set_max 500 -set_min 100
 -set_max 5000 -set_min 5
 -set_max 5000 -set_min 1000

but it still showed the same error.


Can anybody help me solve it? Thanks.

WANG Zifeng

Aug 26, 2021, 12:32:29 AM
to gsea-help
The error report is:

2129     [INFO  ] - Renaming rpt dir on error to: 002_GSEA/error_Eca109_oep38_PFD400mM24h_vs_Eca109_Vec_Vehicle.hallmark.Gsea.1629951360812

Anthony Castanza

Aug 26, 2021, 2:28:45 PM
to gsea...@googlegroups.com

Hello,

Based on the error message, it appears to be an issue with the GCT parser when reading data from the Eca109_Vec_Vehicle_Rep1 sample.

Could you perhaps send us a screenshot of the GCT file opened in something like Excel, or a truncated version of the plain-text GCT file? We wouldn't need the information for all genes, just enough to debug why it can't parse the file for this specific condition.

-Anthony

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego

--
You received this message because you are subscribed to the Google Groups "gsea-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gsea-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gsea-help/dff89eac-86ec-4773-a55c-d775b18e8b22n%40googlegroups.com.

David Eby

Aug 26, 2021, 3:01:56 PM
to gsea...@googlegroups.com
Hi Wang,

From the error message, the problem is that the string "Eca109_Vec_Vehicle_Rep1" appears in a field that should have a numeric value.  Check the file contents against the Data Formats page in our Wiki.

It's just a guess, but Eca109_Vec_Vehicle_Rep1 appears to be some sort of sample name.  It may be that the data matrix header line (the third line of the GCT format) has been somehow duplicated or misplaced within the file.  Try searching for all occurrences of that string to see if any appear where they should not.
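The search David suggests can be automated. The Python sketch below (not part of GSEA; find_bad_gct_cells is a hypothetical name) scans a GCT file and reports every data cell, from the third column on, that does not parse as a number, together with its line and column. It assumes the GCT 1.2 layout described on the Data Formats page: a version line, a dimensions line, a header line, then one row per gene. Python's float() is close to, but not identical to, the Java Float.parseFloat call in the stack trace, and empty cells are reported too, so treat the output as a starting point for inspection.

```python
import sys
from typing import Iterable, List, Tuple


def find_bad_gct_cells(lines: Iterable[str]) -> List[Tuple[int, int, str]]:
    """Return (line_no, column_no, value) for every data cell that is not a number.

    Assumes GCT 1.2: line 1 is the version tag, line 2 the dimensions,
    line 3 the NAME/Description/sample header, and each later line holds a
    gene name, a description, then one value per sample. Numbers are 1-based.
    """
    bad = []
    for line_no, line in enumerate(lines, start=1):
        if line_no <= 3:
            continue  # skip the version, dimensions, and header lines
        fields = line.rstrip("\r\n").split("\t")
        for col_no, value in enumerate(fields[2:], start=3):
            try:
                float(value)
            except ValueError:
                bad.append((line_no, col_no, value))
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python <this file> path/to/file.gct
    with open(sys.argv[1]) as fh:
        for line_no, col_no, value in find_bad_gct_cells(fh):
            print(f"line {line_no}, column {col_no}: {value!r}")
```

Saved as a file (for example check_gct.py, a hypothetical name) and pointed at 002_GSEA/Eca109_oep38_PFD400mM24h_vs_Eca109_Vec_Vehicle_all.TMP.GeneName.gct, a header line duplicated into the data would show up as a run of sample names on a single line number.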

Regards,

WANG Zifeng

Aug 26, 2021, 11:53:46 PM
to gsea...@googlegroups.com
Dear Anthony and David, 

Thanks for your replies.

I ran the R code to generate the GCT files again (unchanged, exactly the same code as before) and then ran GSEA again, and it succeeded! Quite unexpected!


========
The R code used to generate the GCT files is:
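###  GSEA gct files

# Read the DEG annotation table (tab-separated, no header); its column V10 is used below as the list of genes to keep
RNAcocktail.DEG <- read.table(paste0("001_DEG_GO-KEGG/", pathname, "/", pathname, "_annotation.46.txt"), sep = '\t', header=F)
# DESCRIPTION column: the label "DESCRIPTION" for the header row, then "na" for every gene row
# (assumes the first row of all.GeneName, defined earlier, holds the sample names)
Description.col <- data.frame("DESCRIPTION"=c("DESCRIPTION", (rep("na",nrow(all.GeneName)-1))))
# Combine gene names, descriptions, and the sample columns
all.GeneName.gct <- cbind(all.GeneName[,1],Description.col[,1],all.GeneName[,2:ncol(all.GeneName)])
all.GeneName.gct <- as.data.frame(all.GeneName.gct)
# Transposed copy of the full table; its V1 column is the header row (NAME, DESCRIPTION, sample names)
t_all.GeneName.gct <- t(all.GeneName.gct)
t_all.GeneName.gct <- as.data.frame(t_all.GeneName.gct)

# Keep only the rows whose gene name appears in the DEG table
all.GeneName.gsea.gct <- subset(all.GeneName.gct, all.GeneName.gct$V1 %in% RNAcocktail.DEG$V10)

# Transpose so that each gene becomes a column
t_gsea_file <- t(all.GeneName.gsea.gct)
t_gsea_file <- as.data.frame(t_gsea_file)

# GCT line 1: version tag
t_gsea_file$'#1.2' <- c("#1.2", rep(NA,(nrow(t_gsea_file)-1)))
# GCT line 2: number of genes, then number of samples (hard-coded to 4; the NA padding assumes
# 6 rows, i.e. NAME, DESCRIPTION and 4 samples)
t_gsea_file$'geneNumber' <- c((ncol(t_gsea_file)-1), "4", rep(NA,4))
# GCT line 3: header row taken from the full table
t_gsea_file$'geneName' <- t_all.GeneName.gct$V1

# Put the version, dimensions, and header columns first, followed by the gene columns
t_gsea_file <- t_gsea_file[,c(which(colnames(t_gsea_file)=="#1.2"),which(colnames(t_gsea_file)== "geneNumber"), which(colnames(t_gsea_file)== "geneName"),1:(ncol(t_gsea_file)-3))]
# Transpose back so that each GCT line is a row
test_pre <- t(t_gsea_file)

test_pre <- as.data.frame(test_pre)
# Write tab-separated without quotes or row/column names; NA cells become empty fields
write.table(test_pre, file=paste0("002_GSEA/", pathname, "_all.TMP.GeneName.gct"),quote = F, col.names = F,row.names = F, na = "", sep = "\t")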

=========
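For comparison, the same GCT 1.2 structure can be written directly, with the dimensions line computed from the data instead of hard-coded (the R code above fixes the sample count at "4"). The Python sketch below is an alternative, not the method used in this thread; write_gct is a hypothetical name, and it assumes the gene names and expression values are already in memory. Every value is converted with float() before anything is written, so a stray text value such as a sample name raises an error here rather than later inside GSEA.

```python
import io
from typing import Sequence, TextIO, Tuple


def write_gct(out: TextIO, samples: Sequence[str],
              rows: Sequence[Tuple[str, Sequence[float]]],
              description: str = "na") -> None:
    """Write a GCT 1.2 file: version, dimensions, header, then one row per gene."""
    body = []
    for name, values in rows:
        if len(values) != len(samples):
            raise ValueError(f"{name}: expected {len(samples)} values, got {len(values)}")
        # float() raises ValueError on text (e.g. a sample name from a leaked header row)
        cells = [str(float(v)) for v in values]
        body.append("\t".join([name, description, *cells]))
    out.write("#1.2\n")
    out.write(f"{len(body)}\t{len(samples)}\n")  # gene count and sample count, from the data
    out.write("\t".join(["NAME", "Description", *samples]) + "\n")
    for line in body:
        out.write(line + "\n")


if __name__ == "__main__":
    buf = io.StringIO()
    write_gct(buf, ["S1", "S2"], [("TP53", [1.5, 2]), ("MYC", [0.25, 3.0])])
    print(buf.getvalue(), end="")
```

Passing an open file instead of io.StringIO writes the result to disk; because the whole body is validated first, a bad value leaves the output untouched.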

The GCT file looks like this:

[two screenshots of the GCT file, not preserved]

It has 22438 rows, the same as the previously generated file!


I don't know what went wrong. Nevertheless, it works now. Thanks again for your replies.





On Fri, Aug 27, 2021 at 2:28 AM, Anthony Castanza <acas...@cloud.ucsd.edu> wrote:

Anthony Castanza

Aug 27, 2021, 1:00:56 PM
to gsea...@googlegroups.com

Hello,

That is very strange. Could you perhaps send us the old and new files, as well as the CLS file, so we can dig into what may have gone wrong in the first run? You can send them confidentially to gsea...@broadinstitute.org if you're willing.

If not, I'll consider this resolved.

Glad you got it working!

WANG Zifeng

Aug 28, 2021, 10:22:11 PM
to gsea...@googlegroups.com
Dear Anthony, I overwrote the old files, so I can't send them to you. I re-ran the program and it still works well 😂. It's solved; thanks for your time.

On Sat, Aug 28, 2021 at 1:00 AM, Anthony Castanza <acas...@cloud.ucsd.edu> wrote: