Error: col:129 > matrix's fColCnt:129


Mohan Liu

Mar 17, 2022, 3:09:27 PM
to gsea-help
Hi,

After loading the .txt and .cls files, I ran into this error. 

---- Full Error Message ----
col:129 > matrix's fColCnt:129

---- Stack Trace ----
# of exceptions: 1
------col:129 > matrix's fColCnt:129------
java.lang.ArrayIndexOutOfBoundsException: col:129 > matrix's fColCnt:129
        at org.gsea_msigdb.gsea/edu.mit.broad.genome.math.Matrix.getColumnV(Matrix.java:261)
        at org.gsea_msigdb.gsea/edu.mit.broad.genome.objects.DefaultDataset.getColumn(DefaultDataset.java:291)
        at org.gsea_msigdb.gsea/edu.mit.broad.genome.objects.TemplateFactory.extract(TemplateFactory.java:95)
        at org.gsea_msigdb.gsea/edu.mit.broad.genome.alg.DatasetGenerators.extract(DatasetGenerators.java:279)
        at org.gsea_msigdb.gsea/edu.mit.broad.genome.alg.DatasetGenerators.extract(DatasetGenerators.java:275)
        at org.gsea_msigdb.gsea/xtools.gsea.AbstractGsea2Tool.execute_one(AbstractGsea2Tool.java:79)
        at org.gsea_msigdb.gsea/xtools.gsea.AbstractGsea2Tool.execute_one_with_reporting(AbstractGsea2Tool.java:103)
        at org.gsea_msigdb.gsea/xtools.gsea.Gsea.execute(Gsea.java:165)
        at org.gsea_msigdb.gsea/edu.mit.broad.xbench.tui.TaskManager$ToolRunnable.run(TaskManager.java:389)
        at java.base/java.lang.Thread.run(Unknown Source)


I have 130 samples, and the first few lines of the .cls file look correct to me:

130 2 1
#untreated treated
T T T T U U U U U T T U T T U U T T T T T T U T T T T T U T U U U U U U U T U U U T T U T U T T U T U U U T T T U T U T T T U U T T U U T T T T U T T T T T U T U U U U U T U U U U U T U T T T T T U U U U T U U T T U T U T T T U U T U U T T U U T U U T T T T U

and the dataset does have 130 samples. I'm not sure how to fix this issue...
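The consistency of a .cls file like the one above can also be checked programmatically. The following is a minimal sketch, assuming the three-line categorical CLS layout (a counts line, a `#` line with class names, and one label per sample); `check_cls` is a hypothetical helper written for illustration and is not part of GSEA.

```python
def check_cls(text):
    """Return a list of problems found in categorical CLS text (empty if none)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) != 3:
        return [f"expected 3 non-empty lines, found {len(lines)}"]

    header = lines[0].split()
    if len(header) != 3 or not all(tok.isdigit() for tok in header):
        return ["first line should hold three integers: samples, classes, 1"]
    n_samples, n_classes = int(header[0]), int(header[1])

    problems = []
    if header[2] != "1":
        problems.append("third value on the first line should be 1")
    if not lines[1].startswith("#"):
        problems.append("second line should start with '#'")

    # Class names follow the '#', separated by whitespace.
    names = lines[1].lstrip("#").split()
    if len(names) != n_classes:
        problems.append(f"found {len(names)} class names but the header declares {n_classes}")

    # One label per sample, separated by spaces or tabs.
    labels = lines[2].split()
    if len(labels) != n_samples:
        problems.append(f"found {len(labels)} labels but the header declares {n_samples} samples")
    if len(set(labels)) != n_classes:
        problems.append(f"found {len(set(labels))} distinct labels but the header declares {n_classes} classes")
    return problems
```

An empty result only shows that the .cls file agrees with itself; the sample count still has to match the number of data columns in the expression file.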

Thank you for your help!


Best, 

Mohan


Anthony Castanza

Mar 17, 2022, 3:15:47 PM
to gsea...@googlegroups.com

Hello,

 

The error message you received, "col:X > matrix's fColCnt:X", is usually associated with there being more columns somewhere in the dataset than are defined by the dataset header. This can be caused by, for example, Excel adding hidden empty columns. I'd recommend opening the dataset in a plain text editor and double-checking for any extra tabs at the end of each row. I'd also suggest double-checking the TXT specification here: https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#TXT:_Text_file_format_for_expression_dataset_.28.2A.txt.29
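The extra-tab check can also be scripted rather than done by eye. This is a minimal sketch, assuming a tab-delimited file whose first line is the header; `find_ragged_rows` is a hypothetical helper name, not a GSEA function.

```python
def find_ragged_rows(text, delimiter="\t"):
    """Return (line_number, field_count) for each row whose number of
    fields differs from the header's. Line numbers are 1-based."""
    lines = text.splitlines()
    if not lines:
        return []
    expected = len(lines[0].split(delimiter))
    ragged = []
    for number, line in enumerate(lines[1:], start=2):
        if not line:
            continue  # ignore completely empty lines
        count = len(line.split(delimiter))
        if count != expected:
            ragged.append((number, count))
    return ragged
```

It could be run as `find_ragged_rows(open("dataset.txt").read())`, where `dataset.txt` stands in for the real file name. A trailing tab shows up as one extra, empty field. The header itself is taken as the reference, so trailing tabs on the header line would need to be checked separately.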

 

If you're still having problems and are willing to share the dataset, you can send it confidentially to gsea...@broadinstitute.org and we can take a look at what might be going on here.

 

-Anthony

 

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego

--
You received this message because you are subscribed to the Google Groups "gsea-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gsea-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gsea-help/a2c1a74c-800b-46ac-a337-1eef5950b52an%40googlegroups.com.


Mohan Liu

Mar 18, 2022, 12:08:00 PM
to gsea-help
Hi,

Thank you for your suggestion. I was able to fix this error.
Then I ran into another error: "None of the gene sets that you specified passed the size threshold".
The data was downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE39671, which used the Affymetrix Human Genome U133 Plus 2.0 Array, so I downloaded the geneset table https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?mode=raw&is_datatable=true&acc=GPL570&id=55999&db=GeoDb_blob143, which should contain all the genes...
The IDs (i.e. probes) seem to match. I have also changed the Max size to 54675, but the error persists.

Could you please provide a suggestion on this?

Anthony Castanza

Mar 18, 2022, 2:09:48 PM
to gsea...@googlegroups.com

Hi Mohan,

 

If your dataset is using the identifiers in the first column of that probe-gene map file, then you'll need to set the "Collapse/Remap dataset" parameter to "Collapse" and select one of the MSigDB-provided CHIP files from the "Chip platform" field. Based on the identifiers in the list, you should select the "Human_AFFY_HG_U133" chip.
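To illustrate what collapsing does conceptually: probe-level rows are mapped to gene symbols through the chip file, and probes that share a symbol are merged into one row. The sketch below is only an illustration, not GSEA's implementation. It assumes a per-sample maximum as the merge rule, and `collapse_to_symbols` and the identifiers in the usage example are hypothetical.

```python
def collapse_to_symbols(expression, probe_to_symbol):
    """Merge probe-level rows into gene-level rows.

    expression:      {probe_id: [value per sample]}
    probe_to_symbol: {probe_id: gene_symbol}
    Probes without a symbol are dropped. Probes sharing a symbol are merged
    by taking the per-sample maximum (an assumed rule for this sketch).
    """
    collapsed = {}
    for probe, values in expression.items():
        symbol = probe_to_symbol.get(probe)
        if not symbol:
            continue  # unmapped probe
        if symbol in collapsed:
            collapsed[symbol] = [max(a, b) for a, b in zip(collapsed[symbol], values)]
        else:
            collapsed[symbol] = list(values)
    return collapsed


example = collapse_to_symbols(
    {"probe_1": [1.0, 5.0], "probe_2": [3.0, 2.0], "probe_3": [7.0, 7.0]},
    {"probe_1": "GENE_A", "probe_2": "GENE_A"},
)
```

The point of the step is that, after collapsing, the dataset rows are keyed by gene symbols, which is what symbol-based gene sets are matched against.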

 

Let me know if you have any further issues,

 

-Anthony

 

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego

 

From: 'Mohan Liu' via gsea-help <gsea...@googlegroups.com>
Date: Friday, March 18, 2022 at 7:23 AM
To: gsea-help <gsea...@googlegroups.com>
Subject: Re: [gsea-help] Error: col:129 > matrix's fColCnt:129

Hi,

 

Thank you for your suggestion! I was able to fix this error.

Now I ran into another error: "None of the gene sets that you specified passed the size threshold."

Apparently the data contains all the genes listed in http://www.sciwhylab.org/gigeasa/human/probe-gene-map.txt.

And I couldn't find a geneset that contains all of them at http://www.gsea-msigdb.org/gsea/msigdb/search.jsp

Could you please provide any suggestion on this?

 

Thank you so much!

 

 

Best regards,

 

Mohan


Mohan Liu

Mar 18, 2022, 2:28:54 PM
to gsea-help
Hi Anthony,

I have changed the "Collapse/Remap to gene symbols" to "Collapse", and set the "Chip platform" to "Human_AFFY_HG_U133_MSigDB.v7.5.1.chip".
The error persists.

Thank you for your advice!


Best regards,

Mohan

Anthony Castanza

Mar 18, 2022, 2:32:49 PM
to gsea...@googlegroups.com

Hi Mohan,

 

That is quite odd. How many genes are in your input data file? GSEA generally expects all expressed genes to be provided. If the dataset contains >10000 genes, then perhaps there is some encoding issue with the file, like quoted columns or similar. Perhaps you could share a screenshot of the file opened in a text editor? Alternatively, you could send the data file confidentially to the gsea...@broadinstitute.org address.
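One way to investigate a "size threshold" error is to count how many members of each gene set actually appear among the dataset identifiers, and to look for quoted identifiers that would prevent matches. This is a rough sketch under assumptions: the helper names are hypothetical, and the `min_size` and `max_size` defaults of 15 and 500 are assumed values chosen to mirror a typical configuration.

```python
def sets_passing_threshold(gene_sets, dataset_ids, min_size=15, max_size=500):
    """Restrict each gene set to identifiers present in the dataset.

    Returns (sizes, passing): the restricted size of every set, and the
    subset of sets whose restricted size lies within [min_size, max_size].
    """
    ids = set(dataset_ids)
    sizes = {name: len(set(members) & ids) for name, members in gene_sets.items()}
    passing = {name: n for name, n in sizes.items() if min_size <= n <= max_size}
    return sizes, passing


def quoted_ids(dataset_ids):
    """Return identifiers that begin with a quote character."""
    return [i for i in dataset_ids if i.startswith(('"', "'"))]
```

If every restricted size is 0, the gene sets and the dataset are using different identifier types (for example probe IDs versus gene symbols), and raising the maximum size cannot help.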

Mohan Liu

Mar 18, 2022, 2:54:39 PM
to gsea-help
Hi,

There are 54675 genes in the input data file. 
Attached are the screenshots of the input data file and geneset.gmt file in .txt

Thank you!


Mohan

Screen Shot 2022-03-18 at 2.53.38 PM.png
Screen Shot 2022-03-18 at 2.52.28 PM.png

Anthony Castanza

Mar 18, 2022, 3:09:53 PM
to gsea-help

Hi Mohan,

Oh, I didn't realize you were using a custom gene set database file. If that database was retrieved from the same source as the probe-gene-map.txt file that you previously linked, you can use the file I've attached as the chip file instead. I've just taken the probe-gene-map.txt file and done some minor processing to make it comply with the CHIP format specification: https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#CHIP:_Chip_file_format_.28.2A.chip.29
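For readers who need to do a similar conversion themselves, here is a minimal sketch of turning a two-column probe-to-symbol table into CHIP-style text. It is not necessarily the processing that was applied to the attached file. It assumes the input has tab-separated probe and symbol columns with no header, writes the three CHIP columns (Probe Set ID, Gene Symbol, Gene Title) with "na" as a placeholder title, and skips probes whose symbol is empty or a "---" placeholder; `to_chip` is a hypothetical name.

```python
CHIP_HEADER = "Probe Set ID\tGene Symbol\tGene Title"


def to_chip(mapping_text):
    """Convert 'probe<TAB>symbol' lines into tab-delimited CHIP-style text."""
    rows = [CHIP_HEADER]
    for line in mapping_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a probe/symbol pair
        probe, symbol = fields[0].strip(), fields[1].strip()
        if not probe or not symbol or symbol == "---":
            continue  # no usable gene symbol for this probe
        rows.append(f"{probe}\t{symbol}\tna")
    return "\n".join(rows) + "\n"
```

The result would be saved with a `.chip` extension and supplied in the "Chip platform" field.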

If the gene sets GMT you're using is not from the same source as the CHIP file, then you'd need to manually reprocess the gene sets in the GMT so that they match the gene symbols used in one of two places: either this chip file, which you'd then use for collapsing the dataset, or the MSigDB chip files, in which case you'd use the MSigDB chip file for collapsing the dataset.

I can't really judge from the screenshot what you'd need to do for the GMT because the formatting appears odd to me. Are you sure that is a GMT formatted gene sets file?
https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29
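A quick structural check can help tell whether a file is GMT-shaped. The sketch below assumes the layout described in the specification linked above: one gene set per line, tab-separated, with the set name first, a description second, and the member genes after that. `parse_gmt` is a hypothetical helper, not part of GSEA.

```python
def parse_gmt(text):
    """Parse GMT-style text.

    Returns (gene_sets, problems): a dict of {set_name: [genes]} and a list
    of human-readable problems for lines that do not fit the layout.
    """
    gene_sets = {}
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        genes = [g.strip() for g in fields[2:] if g.strip()]
        if len(fields) < 3 or not genes:
            problems.append(f"line {number}: expected name, description, and at least one gene")
            continue
        name = fields[0].strip()
        if name in gene_sets:
            problems.append(f"line {number}: duplicate gene set name {name!r}")
        gene_sets[name] = genes
    return gene_sets, problems
```

A probe annotation table renamed to `.gmt` would still parse line by line, but it would yield one "gene set" per probe, which is a sign that the file is not a collection of gene sets.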

probe-gene-map.chip

Mohan Liu

Mar 21, 2022, 9:11:39 AM
to gsea-help
Hi Anthony,

Thank you for your advice and the attached chip file.

The geneset gmt is not from the same source as the probe-gene-map.txt file, but following the format of the .chip you sent me, I was able to create a chip file using the same source as the geneset gmt. Now both the geneset gmt and the chip file come from the table at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL570 (shown in the attached screenshot), which is linked to the dataset I downloaded (and normalized using RMA).

For the gmt file, I didn't make any modifications except changing the extension to .gmt (see the screenshot of geneset.gmt, shown in .xls view).
For the .chip file, I realized that the "Gene symbol" is "---" for some probes (see the attached chip file).
I'm not sure whether either of these has caused the same error, and your opinion would be very much appreciated!

Thank you so much for your efforts and time.


Best,

Mohan 
Screen Shot 2022-03-21 at 9.00.25 AM.png
Geneset.gmt.png
GSE39671.chip

Anthony Castanza

Mar 21, 2022, 10:50:25 AM
to gsea...@googlegroups.com

Hi Mohan,

 

This file isn't a gene set GMT; it is the Affymetrix probe mapping information table (similar to a chip file, but without current gene annotations). A gene set GMT contains the collections of biological pathways or signatures that you want to evaluate using the GSEA methodology. MSigDB provides a number of these resources bundled with the GSEA application, but if you use one of them, we'd recommend using one of our provided CHIP files to do the gene mapping as well (I believe I mentioned in my initial reply that your data matches the "Human_AFFY_HG_U133" chip).

 

That file does have gene ontology annotations associated with the probe-to-gene mappings, though. If that is what you're interested in obtaining enrichment for, we do have the gene ontology database in the provided MSigDB gene sets database files. These start with "c5.go." in the file list: c5.go.v7.5.1.symbols.gmt contains the ontology annotations for GO Biological Process, GO Molecular Function, and GO Cellular Component all together, while c5.go.bp, c5.go.mf, and c5.go.cc contain each of these subcomponents of GO separately, in case you are only interested in a specific part of GO.

 

Let me know if you have additional questions,

 

-Anthony

 

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego

 

Mohan Liu

Mar 21, 2022, 11:19:02 AM
to gsea-help
Hi Anthony!

Problem solved! THANK YOU GENIUS!


Best regards,

Mohan
