GSEA 1006 Error

NAKIM
Apr 9, 2024, 1:27:55 AM
to gsea-help
Dear GSEA help team, 

GSEA version 4.1.0 was able to run and generate results even with only one sample per group. However, after upgrading to GSEA version 4.3.2, I get error 1006.

I understand that the ranking metric I used, signal to noise, can lead to a denominator of 0 when calculating statistics. However, I am curious why GSEA version 4.1.0 was still able to produce values despite this.

When there was one sample in group A and one sample in group B, I replicated each sample equally, resulting in three samples in group A and three samples in group B, and then ran the signal-to-noise analysis. It is my understanding that when the variance is zero, the results of the analysis should be undefined. However, the analysis still produced results even after I unchecked the "fix metrics for low variance" option in the algorithm settings. This is unexpected, and I would like to understand the underlying mechanism that allowed the analysis to proceed and generate results.

I have reviewed the following link (https://groups.google.com/g/gsea-help/c/dEuYDMK9okQ) and understand that signal to noise may not be the most appropriate metric for my analysis. However, I am still puzzled by the results I obtained using this metric.

I am considering using GSEA version 4.3.2 to analyze a dataset with only one sample per group. To increase the sample size and potentially improve the statistical power of the analysis, I am wondering if it is acceptable to duplicate the single sample in each group three times, resulting in three samples per group.

I would appreciate it if you could explain it in more detail.  

Anthony Castanza
Apr 9, 2024, 1:49:15 PM
to gsea...@googlegroups.com
Hi Nakim,

The latest version of GSEA includes numerous bug fixes since version 4.1.0, including a fix for an issue where the minimum sample number requirements were not being enforced, which might have caused the behavior you saw here. You should always use the latest version of GSEA whenever possible.

As to why the software is generating results when you have duplicated your samples such that they would be expected to have zero variance, but have unchecked the "fix metrics for low variance" option, I am not sure. It is possible that in this case of zero variance there is a logic bug that is bypassing this setting. I'll note this for investigation. You should be able to manually calculate the expected ranking metric using the signal to noise ratio formula with the expected correction factors from our documentation and see if this aligns with the values that the software is returning.

However, in regard to your other question, I would say that no, it is not appropriate to duplicate samples to bypass our dataset expectation checks, and the results from a run where you have done so are not valid.
The correct way to run this data would be to use an alternative ranking metric that does not have these restrictions (such as ratio of classes, or diff of classes), or to externally compute a ranking metric of your choice and use GSEA Preranked. Also note that phenotype permutation is invalid for this data.
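
For the Preranked route, here is a minimal sketch of the two alternative metrics mentioned above, computed outside GSEA and written out as a ranked list. It assumes "diff of classes" is the difference of class means and "ratio of classes" is their quotient, and that a .rnk file is two tab-separated columns (gene identifier, score) with no header; please check these against the GSEA documentation. The function names, gene names, and expression values are hypothetical.

```python
import io
from statistics import mean


def diff_of_classes(class_a, class_b):
    # Difference of class means (ASSUMED definition).
    return mean(class_a) - mean(class_b)


def ratio_of_classes(class_a, class_b):
    # Ratio of class means (ASSUMED definition);
    # raises ZeroDivisionError if mean(class_b) == 0.
    return mean(class_a) / mean(class_b)


def write_rnk(scores, handle):
    """Write gene<TAB>score lines, highest score first (assumed .rnk layout)."""
    for gene, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        handle.write(f"{gene}\t{score}\n")


# Hypothetical expression values: one sample in class A, one in class B.
expression = {"GENE1": (8.0, 2.0), "GENE2": (1.0, 4.0), "GENE3": (3.0, 3.0)}
scores = {g: diff_of_classes([a], [b]) for g, (a, b) in expression.items()}

buf = io.StringIO()  # use open("my_ranks.rnk", "w") to write a real file
write_rnk(scores, buf)
print(buf.getvalue())
```

Sorting is only for readability here; we assume GSEA Preranked does its own ranking of the supplied scores.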

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego
