The collapsed dataset was empty when used with chip:ftp.broadinstitute.org://pub ...

Komal Kharat

Jul 9, 2021, 5:36:53 AM
to gsea-help
Hello,
I am trying to analyze the expression of human gene sets in tumor samples. I am getting an error that says "The collapsed dataset was empty when used with chip:ftp.broadinstitute.org://pub ..."
Image of the .cls file

cls file.PNG

Image of the .gct file

gct file.PNG

Image of the error

error file.PNG

Could someone please help me understand what is causing this error and how I can successfully analyze my data?

Anthony Castanza

Jul 9, 2021, 10:36:24 AM
to gsea...@googlegroups.com
Hi Komal,

That GCT file appears to contain only one precomputed value per gene rather than separate values for each sample. In that case you'll want to use GSEA Preranked, not standard GSEA. You also won't need a CLS file, since that's only used when GSEA computes the ranking metric itself.

You also don't mention which CHIP file you're using, but the correct one would be the Human Gene Symbol Remapping chip for whichever version of MSigDB you're using (probably 7.4, since it's the current one).

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego
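
For readers following along, a minimal sketch of preparing input for GSEA Preranked might look like the following. It assumes the ranked list is a two-column, tab-separated text file (gene identifier, then score) and that one score per gene is already available. The function name `format_rnk` and the example genes and values are hypothetical and are not taken from the poster's data.

```python
def format_rnk(scores):
    """Return ranked-list text: one 'gene<TAB>score' line per gene.

    Assumption: the ranked list is a plain two-column, tab-separated file.
    Genes are sorted from highest to lowest score.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return "".join(f"{gene}\t{score}\n" for gene, score in ranked)


# Hypothetical example values for illustration only.
example_scores = {"TP53": 2.5, "MYC": -1.2, "EGFR": 0.7}
rnk_text = format_rnk(example_scores)
print(rnk_text, end="")
```

The resulting text can be saved with a `.rnk` extension and loaded in place of the GCT and CLS pair.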

Komal Kharat

Jul 12, 2021, 5:47:56 AM
to gsea...@googlegroups.com
Hello,
Thank you for your help.
I tried running my data with GSEA Preranked but ran into another error.

image.png

.rnk file image
image.png

The error is as follows:

image.png
What shall I do now?

Anthony Castanza

Jul 12, 2021, 11:44:01 AM
to gsea...@googlegroups.com
Hi Komal,

Per the error message, multiple symbols in this dataset map to the same gene. You've set GSEA to Remap_only mode, which just performs a simple symbol conversion and doesn't handle multiple mappings. To handle multiple mappings you'd need to switch it to "collapse" mode. By default, collapse mode picks, among the entries that map to the same gene, the one with the largest fold change.
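
As a rough illustration of what collapsing does conceptually, the sketch below keeps one score per gene symbol when several rows map to the same symbol. It assumes "largest" means largest in absolute value, which is an interpretation and not something stated in the thread. The function name `collapse_by_largest` and the sample rows are hypothetical, and this is not GSEA's own implementation.

```python
def collapse_by_largest(rows):
    """Keep one score per symbol: the one with the largest absolute value.

    Assumption: 'largest' is judged by absolute value; ties keep the first row seen.
    """
    collapsed = {}
    for symbol, score in rows:
        current = collapsed.get(symbol)
        if current is None or abs(score) > abs(current):
            collapsed[symbol] = score
    return collapsed


# Hypothetical rows after symbol remapping: two entries map to GENE_A.
rows = [("GENE_A", 1.0), ("GENE_A", -3.0), ("GENE_B", 0.5)]
print(collapse_by_largest(rows))
```

In a remap-only workflow, by contrast, the duplicate `GENE_A` rows would be left in place, which matches the kind of conflict the error message reports.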



Komal Kharat

Jul 15, 2021, 5:12:52 AM
to gsea...@googlegroups.com
Hello,
I tried running the dataset in collapse mode as you suggested, but now there is another error: "After pruning, none of the gene sets passed size thresholds."

Image of the parameters used:
image.png

Image of the error:
image.png

Please let me know how I can make this work. :(

Anthony Castanza

Jul 15, 2021, 1:11:21 PM
to gsea...@googlegroups.com

Hi Komal,

It's possible this is an issue with the gene set file itself. This isn't one of our sets, so I don't know anything about the format of the genes in it or how many of them there are. If you want to send the B520C.gmt file I can take a look at it. Even just a sample of the IDs and some indication of how big the sets are would be helpful in diagnosing this further, since everything else appears to be fine for a "typical" experiment from the information you've sent so far.

If I had to take a blind guess, it might be that the custom signatures you're using have more than 500 members. Our default threshold here is pretty conservative, and many experimentally derived sets can be larger than this depending on how they were generated. You could try increasing "Max size: exclude larger sets" to 2000; that's about the largest size we'd possibly recommend.

-Anthony

 

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego
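
The size check described in this reply can be approximated outside GSEA to see why no set survives. The sketch below assumes the GMT layout is one set per line with tab-separated fields (set name, description, then gene identifiers), that each set is first restricted to genes present in the dataset, and that the size bounds are inclusive. The defaults of 15 and 500 are treated here as assumed parameter values, and the helper names `parse_gmt` and `sets_passing_size_filter` are hypothetical.

```python
def parse_gmt(text):
    """Parse GMT-style text into {set_name: set_of_genes}.

    Assumption: each line is 'name<TAB>description<TAB>gene1<TAB>gene2...'.
    """
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = {gene for gene in genes if gene}
    return gene_sets


def sets_passing_size_filter(gene_sets, dataset_genes, min_size=15, max_size=500):
    """Return {set_name: size} for sets whose overlap with the dataset is within bounds."""
    dataset = set(dataset_genes)
    passing = {}
    for name, genes in gene_sets.items():
        size = len(genes & dataset)  # count only genes found in the dataset
        if min_size <= size <= max_size:
            passing[name] = size
    return passing


# Hypothetical toy data: one 20-gene set and one 2-gene set.
toy_gmt = "BIG\tna\t" + "\t".join(f"G{i}" for i in range(20)) + "\nSMALL\tna\tG1\tG2\n"
toy_dataset = [f"G{i}" for i in range(20)]
toy_sets = parse_gmt(toy_gmt)
print(sets_passing_size_filter(toy_sets, toy_dataset, min_size=5, max_size=10))
print(sets_passing_size_filter(toy_sets, toy_dataset, min_size=5, max_size=50))
```

With the tighter upper bound neither toy set qualifies, while raising it lets the larger set through. An empty result from a check like this would also occur if the identifiers in the gene set file do not match those in the dataset at all.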

 

Komal Kharat

Jul 16, 2021, 1:33:28 AM
to gsea...@googlegroups.com
Hello, 
I solved the issue yesterday and was able to run GSEA successfully on my dataset.
I adjusted the maximum size limit to match the number of genes in my .gmt file (8000), and it worked!
Thank you for helping me through this process.
Regards.
