Error 1005 - The collapsed dataset was empty when used with chip


RK

Apr 6, 2021, 2:44:36 PM
to gsea-help
Hi there - 
I am using a normalized count table (generated with DESeq) from human NGS samples. The first column is the ENSEMBL ID, without the version suffix (e.g., ".15"). Therefore, I think I need to collapse by gene, using Human_ENSEMBL_Gene_ID_MSigDB.v7.4.chip as my chip platform. However, I keep getting the above error.

What am I missing?

Thank you

Anthony Castanza

Apr 6, 2021, 2:48:42 PM
to gsea...@googlegroups.com

Hello,

Yes, that should be the correct chip file. Are you setting the “Collapse/Remap to gene symbols” parameter to “collapse”?

For count data, I’d also recommend expanding the “Advanced fields” section and setting the “Collapsing mode for probe sets => 1 gene” parameter to “sum_of_probes” instead of the default “max_probe”.

If you’ve set the collapse/remap field to “Collapse” and you’re still seeing issues, could you please send a screenshot of your Run GSEA window, as well as the first few lines of your dataset (including the header if using the GCT format)?
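
To illustrate the difference between the two collapsing modes, here is a minimal Python sketch. It is not GSEA's implementation: the `collapse` function, the ID-to-symbol dictionary (a stand-in for a chip file), and the placeholder IDs are all assumptions made for illustration, and "max_probe" is modeled here as a simple per-sample maximum.

```python
from collections import defaultdict


def collapse(expr, id_to_symbol, mode="sum_of_probes"):
    """Collapse per-ID rows into per-symbol rows (illustrative only).

    expr: dict mapping a feature ID to a list of per-sample values.
    id_to_symbol: dict mapping a feature ID to a gene symbol
                  (hypothetical stand-in for a chip file).
    """
    grouped = defaultdict(list)
    for feature_id, values in expr.items():
        symbol = id_to_symbol.get(feature_id)
        if symbol is not None:  # IDs absent from the mapping are dropped
            grouped[symbol].append(values)

    if not grouped:
        # No ID matched the mapping, so nothing is left after collapsing.
        raise ValueError("collapsed dataset is empty: no IDs matched the mapping")

    collapsed = {}
    for symbol, rows in grouped.items():
        columns = zip(*rows)  # iterate sample by sample
        if mode == "sum_of_probes":
            collapsed[symbol] = [sum(col) for col in columns]
        elif mode == "max_probe":
            collapsed[symbol] = [max(col) for col in columns]
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return collapsed


# Placeholder IDs and symbols, not real genes.
mapping = {"ID1": "GENE_A", "ID2": "GENE_A", "ID3": "GENE_B"}
expr = {"ID1": [1, 5], "ID2": [3, 2], "ID3": [7, 7]}

summed = collapse(expr, mapping, mode="sum_of_probes")
maxed = collapse(expr, mapping, mode="max_probe")
```

With counts, summing keeps all reads assigned to a gene, whereas taking the maximum discards the reads from the other IDs. The sketch also shows one way a collapse step can end up with nothing: if no ID in the dataset matches the mapping (for example because the IDs are quoted or carry stray whitespace), every row is dropped.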

 

Thanks,

-Anthony

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego

http://gsea-msigdb.org/


RK

Apr 6, 2021, 2:54:27 PM
to gsea-help
Hi Anthony - 
Yes, I have it set to Collapse. I changed the collapsing mode parameter as recommended. Unfortunately, I still get error 1005. Please see attached.

Screen Shot 2021-04-06 at 1.53.34 PM.png
Screen Shot 2021-04-06 at 1.53.13 PM.png

Anthony Castanza

Apr 6, 2021, 3:02:30 PM
to gsea...@googlegroups.com

Hi,

The screenshot of the responder.count file you sent doesn’t match our file format guidelines. Please check your data file to confirm that it complies with either our GCT or TXT format specifications:

https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GCT:_Gene_Cluster_Text_file_format_.28.2A.gct.29

https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#TXT:_Text_file_format_for_expression_dataset_.28.2A.txt.29

If you’re having difficulty formatting your files, or if you’re still having errors after validating your file formatting, you can send the data confidentially to gsea...@broadinstitute.org and I can take a closer look at what might be going wrong here.

RK

Apr 6, 2021, 3:05:08 PM
to gsea-help
The table is written out in txt format. Am I permitted to use ENSEMBL ID as the first column? The first row is indeed the sample ID, and the optional Description column is not present. What am I missing?

Anthony Castanza

Apr 6, 2021, 3:08:29 PM
to gsea...@googlegroups.com

Ensembl ID is the recommended ID type for RNA-seq data. Is the ID column given the required “NAME” header? In the screenshot you sent there was no entry in the cell where NAME is expected.

In addition to confirming that the header complies with the specification, you might also try adding the DESCRIPTION column (and filling the descriptions with “na”).
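
As a rough sketch of the layout described above (a tab-delimited table whose header starts with NAME and DESCRIPTION, followed by the sample IDs), the following Python function builds such a table from in-memory rows. The function name `to_gsea_txt`, the sample names, and the placeholder Ensembl-style ID are assumptions for illustration; the linked format specification remains the authority on what GSEA accepts.

```python
import re

# Matches a trailing Ensembl version suffix such as ".15".
VERSION_SUFFIX = re.compile(r"\.\d+$")


def to_gsea_txt(sample_ids, counts):
    """Return a tab-delimited table with NAME and DESCRIPTION columns.

    sample_ids: list of sample names for the header row.
    counts: list of (gene_id, values) pairs, one per gene.
    """
    lines = ["\t".join(["NAME", "DESCRIPTION", *sample_ids])]
    for gene_id, values in counts:
        if len(values) != len(sample_ids):
            raise ValueError(f"{gene_id}: expected {len(sample_ids)} values")
        # Drop stray whitespace, surrounding quote marks, and any version suffix.
        clean_id = VERSION_SUFFIX.sub("", gene_id.strip().strip('"'))
        row = [clean_id, "na", *(str(v) for v in values)]
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"


# Placeholder ID and sample names, not taken from the poster's data.
table = to_gsea_txt(["s1", "s2"], [('"ENSG00000000001.15" ', [1.5, 2])])
```

The returned string can be written to a `.txt` file with a plain `open(path, "w")` call and then loaded into GSEA in place of a table exported from a spreadsheet.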

RK

Apr 6, 2021, 3:10:07 PM
to gsea-help
After reviewing the link you provided, I did add "NAME" to the first column (ENSEMBL ID), as well as DESCRIPTION with "na" values for the second column. I still get the error, unfortunately...

Anthony Castanza

Apr 6, 2021, 3:20:10 PM
to gsea...@googlegroups.com

Hello,

I’m not sure what might be going on here. Something is probably malformatted somewhere. It could be something as simple as a trailing space, or the file being written out from Excel with quote marks around the gene symbols. Instead of opening the file in Excel, could you open the exact file you’re importing into GSEA in a plain text editor (like Notepad) and then send a screenshot?

If we still can’t identify the problem from there, I don’t think I’ll be able to figure out precisely what the problem is without being able to take a closer look at the files. You can provide your data file to gsea...@broadinstitute.org so we can take a closer look and try to figure it out. We’ll keep the file completely confidential and only use it for debugging.
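
The kinds of problems mentioned above (a missing NAME header, quote marks, stray whitespace) are hard to spot by eye, so a small script can help. The Python sketch below is only an illustration: `find_format_problems` is a hypothetical helper, the checks are limited to the issues named in this thread, and it does not reproduce GSEA's own validation.

```python
def find_format_problems(text):
    """Return a list of human-readable problems found in a tab-delimited table."""
    lines = text.splitlines()
    if not lines:
        return ["file is empty"]

    problems = []
    header = lines[0].split("\t")
    if header[0] != "NAME":
        problems.append(
            f"line 1: first header cell is {header[0]!r}, expected 'NAME'"
        )

    for number, line in enumerate(lines, start=1):
        cells = line.split("\t")
        if len(cells) != len(header):
            problems.append(
                f"line {number}: {len(cells)} columns, header has {len(header)}"
            )
        for cell in cells:
            if '"' in cell:
                problems.append(f"line {number}: quote mark in {cell!r}")
            if cell != cell.strip():
                problems.append(
                    f"line {number}: leading or trailing whitespace in {cell!r}"
                )
    return problems


# Two tiny made-up tables: one clean, one with the issues discussed above.
good = "NAME\tDESCRIPTION\ts1\nID1\tna\t5\n"
bad = '\tDESCRIPTION\ts1\n"ID1"\tna\t5 \n'

good_problems = find_format_problems(good)
bad_problems = find_format_problems(bad)
```

To check a real file, read it as text (for example `open(path, encoding="utf-8").read()`, where `path` is whatever file is being imported into GSEA) and print each entry of the returned list.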

Russell Keathley

Apr 6, 2021, 3:25:08 PM
to gsea...@googlegroups.com
I was able to get it to work by restarting the program...thanks!



--
Russell Keathley
PhD candidate
Daniela Matei Lab
Feinberg School of Medicine
Northwestern University
303 E Superior St, Lurie 4-220
