Hi Dong,
The calculations for the GSEA metrics are described in the "GSEA Statistics" section of the GSEA User Guide: https://software.broadinstitute.org/gsea/doc/GSEAUserGuideFrame.html
And in the GSEA PNAS publication: https://www.gsea-msigdb.org/gsea/doc/subramanian_tamayo_gsea_pnas.pdf
But essentially, yes. For each gene set, GSEA computes an Enrichment Score using a weighted running-sum statistic, both in the real gene list and in each of the permuted gene lists, and then compares the real score against the distribution of permuted scores.
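As a rough illustration of that idea (a minimal sketch, not the GSEA implementation), the Python below computes a weighted running-sum score for one gene set and compares it against scores obtained after shuffling the phenotype labels. The function names, the difference-of-means ranking metric, and the choice to compare only against permuted scores of the same sign are assumptions made for this sketch; the User Guide and the PNAS paper define the actual statistics.

```python
import numpy as np


def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Signed maximum deviation from zero of a weighted running sum."""
    ranked_genes = np.asarray(ranked_genes)
    in_set = np.isin(ranked_genes, list(gene_set))
    weights = np.abs(np.asarray(scores, dtype=float)) ** p
    hit = np.where(in_set, weights, 0.0)
    n_miss = len(ranked_genes) - in_set.sum()
    if hit.sum() == 0 or n_miss == 0:
        return 0.0
    # Step up at genes in the set (weighted), step down at all other genes.
    running = np.cumsum(hit / hit.sum() - (~in_set) / n_miss)
    return float(running[np.argmax(np.abs(running))])


def rank_genes(expr, labels):
    """Rank genes by difference of class means (assumed metric for this sketch)."""
    metric = expr[:, labels].mean(axis=1) - expr[:, ~labels].mean(axis=1)
    order = np.argsort(-metric)
    return order, metric[order]


def permutation_test(expr, labels, gene_set, n_perm=200, seed=0):
    """Observed score plus an empirical p-value from label permutations."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    order, metric = rank_genes(expr, labels)
    observed = enrichment_score(order, metric, gene_set)
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(labels)
        order_p, metric_p = rank_genes(expr, shuffled)
        null[i] = enrichment_score(order_p, metric_p, gene_set)
    # Compare only against permuted scores with the same sign as the observed one.
    same_sign = null[null >= 0] if observed >= 0 else null[null < 0]
    if same_sign.size == 0:
        return observed, 0.0
    p_value = float((np.abs(same_sign) >= abs(observed)).mean())
    return observed, p_value
```

Here `expr` is a genes-by-samples array, `labels` is a boolean vector marking one phenotype class, and `gene_set` holds row indices of the genes in the set.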
-Anthony
Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego
--
You received this message because you are subscribed to the Google Groups "gsea-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
gsea-help+...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/gsea-help/e29b3de5-cb50-4fc7-a3d6-7977b6ac3c68n%40googlegroups.com.
3.2.1. Gene Sampling
In gene sampling, the significance of a gene set score S(Gi) for a given gene set Gi is assessed by comparing it to the scores of randomly assembled sets of ||Gi|| genes from the reference set U, i.e., all genes under study. In the gene sampling method, a large number of random gene sets are assembled, and their scores are calculated. The significance value of the gene set score of Gi is then calculated as the fraction of assembled gene sets that yield stronger scores than the score of Gi, where one score is considered stronger than another if it is more in favor of rejecting the null hypothesis of interest.
Since gene sampling does not depend on the number of samples, it has been widely used for gene set analysis of datasets with small sample sizes (Subramanian et al., 2005; Tian et al., 2005; Ackermann and Strimmer, 2009). The main shortcoming of gene sampling is that it relies on the unrealistic assumption of independence between genes within a gene set. Genes within a gene set usually show highly correlated behavior; therefore, a gene sampling method may incorrectly predict a gene set as differentially enriched only because of high correlation between its genes. In this regard, it may cause false positive predictions. Another shortcoming of gene sampling is that it is computationally demanding. For each gene set Gi, the whole process of gene set score calculation must be repeated for a large number of randomly assembled gene sets. In implementations of the gene-sampling approach, the number of assembled gene sets is usually on the order of 1,000. This number of repetitions makes the significance evaluation computationally demanding. Moreover, gene sampling may lead to a lack of statistical reliability of the significance values for large gene sets (Keller et al., 2007). Even on the order of 1,000 assembled gene sets may not be enough to represent the background distribution; therefore, the significance value for large gene sets may not be statistically reliable.
From <https://www.frontiersin.org/articles/10.3389/fgene.2020.00654/full>
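The gene sampling procedure described in the excerpt above can be sketched roughly as follows. This is only an illustration under stated assumptions: the gene set score S(Gi) is taken to be the mean absolute gene-level score, "stronger" is taken to mean strictly larger, and the function name and parameters are hypothetical rather than taken from the paper or from any package.

```python
import numpy as np


def gene_sampling_pvalue(gene_scores, gene_set_idx, n_random=1000, seed=0):
    """Fraction of random same-size gene sets whose score beats the observed one."""
    rng = np.random.default_rng(seed)
    gene_scores = np.asarray(gene_scores, dtype=float)
    size = len(gene_set_idx)

    # Assumed set score: mean absolute gene-level score of the set's members.
    observed = np.abs(gene_scores[gene_set_idx]).mean()

    # Assemble random sets of the same size from the reference set U (all genes).
    null = np.array([
        np.abs(rng.choice(gene_scores, size=size, replace=False)).mean()
        for _ in range(n_random)
    ])

    # "Stronger" is read here as a strictly larger score.
    return float(observed), float((null > observed).mean())
```

Because the random sets are drawn gene by gene without regard to correlation structure, this sketch carries the same independence assumption the excerpt criticizes, and the cost grows with `n_random` for every gene set tested.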