GSEA analysis for one gene against datasets of another one

Hicham Hboub

Apr 26, 2022, 7:57:45 AM

to gsea-help

Hello,

I want to compare the expression of a single gene (gene A) against multiple gene sets (pathways) of another gene (gene B).

I downloaded the gene sets for gene B from the GSEA website and created a gene expression file and a phenotype file for gene A with 67 samples. I then loaded the data into the GSEA software, but it won't run and I get this error:

numMarkers: 100 cannot be larger than dataset size: 1

Anthony Castanza

Apr 26, 2022, 12:25:01 PM

to gsea...@googlegroups.com

Hello,

GSEA is designed to assess perturbation of gene sets (pathways) using the expression data from all expressed genes, so you can't provide expression data for just a single gene to GSEA. With a single gene, all you could find out is whether those sets contain the single gene of interest.

Perhaps if you tell me more about what exactly you're hoping to learn from this analysis I could suggest an appropriate way forward here.

-Anthony

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego

To view this discussion on the web visit https://groups.google.com/d/msgid/gsea-help/3e8374b9-ab79-49c2-86ef-2160a7d0036cn%40googlegroups.com.

Hicham Hboub

Apr 26, 2022, 6:34:24 PM

to gsea-help

What I'm looking for exactly is what they call "Single-Gene Gene Set Enrichment Analysis", which can be performed using the GSEA software. In this case, GSEA is performed for a single gene across different pathways of another gene. You can check this article (see its Methods and Results on Single-Gene Gene Set Enrichment Analysis) to understand my point:

Frontiers | Identification of Early Diagnostic and Prognostic Biomarkers via WGCNA in Stomach Adenocarcinoma | Oncology (frontiersin.org)

(*Also, I solved the error, but now I get a figure without any results.)
I'm still sure there's a way to get this analysis done.

Anthony Castanza

Apr 26, 2022, 6:44:08 PM

to gsea...@googlegroups.com

The "single-gene GSEA" they describe in this paper seems to be a misnomer for what they've actually done. A true "single gene" GSEA is a nonsensical procedure.

From the methods:

We used GSEA v_4.1.0 software [sic] to perform single-gene pathway enrichment analysis on the expression matrix containing 344 STAD tumor samples. The median of gene expression [of the gene of interest] is used as the standard for dividing high and low expression groups.

So what they actually did was take their gene of interest and stratify the samples into two groups, those with high expression of that gene and those with low expression of it, and then perform a standard GSEA with the full expression data.
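As a rough sketch of that stratification step (this is not the paper's code; the function name, sample names, and values are made up for illustration), the median split could be written out as a GSEA phenotype (CLS) file like this. One assumption to flag: samples sitting exactly at the median are labelled LOW here, which the paper does not specify.

```python
from statistics import median


def median_split_cls(expression_by_sample, high="HIGH", low="LOW"):
    """Return CLS-format text labelling each sample HIGH or LOW.

    expression_by_sample: dict mapping sample name -> expression of the
    gene of interest, in the same column order as the expression matrix.
    Assumption: samples exactly at the median are labelled LOW.
    """
    samples = list(expression_by_sample)
    cutoff = median(expression_by_sample.values())
    labels = [high if expression_by_sample[s] > cutoff else low for s in samples]

    # List class names in order of first appearance so the names line
    # matches the order of the labels on the third line.
    class_names = list(dict.fromkeys(labels))

    header = f"{len(samples)} {len(class_names)} 1"
    names_line = "# " + " ".join(class_names)
    return "\n".join([header, names_line, " ".join(labels)]) + "\n"


# Hypothetical expression values of the gene of interest in four tumor samples.
example = {"S1": 2.0, "S2": 9.5, "S3": 4.1, "S4": 7.3}
cls_text = median_split_cls(example)
print(cls_text)
```

The resulting phenotype file is then paired with the full expression matrix (all genes, all samples), not with a one-gene matrix.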


Hicham Hboub

Apr 28, 2022, 7:28:11 AM

to gsea-help

Could you explain in more detail why it is a nonsensical procedure?

It seems more interesting to me, because it lets us narrow our targets down to fewer genes (one gene per analysis) for hub genes, so we can better verify their enrichment in certain pathways.

Also, did they divide the tumor samples without using normal samples? Does that make sense?

And thanks for clarifying the idea.


Anthony Castanza

Apr 28, 2022, 1:16:48 PM

to gsea...@googlegroups.com

GSEA is gene set enrichment analysis. It looks at the enrichment of a set of genes in a ranked list of genes to determine whether that set is overrepresented at the top or the bottom of the list. Running one gene does not make sense, as there isn't any information on the global context of genes to calculate this overrepresentation against. Furthermore, there isn't any information on the other genes that make up the "set" that you're trying to calculate overrepresentation for. You couldn't analyze "hub" genes in this way because, at least as far as GSEA would be concerned, there isn't anything to be a hub of. I would encourage you to review the principles of the GSEA procedure as described in the original publication: https://www.pnas.org/doi/10.1073/pnas.0506580102
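To make the point concrete, here is a minimal sketch of the running-sum enrichment score described in the publication linked above. It is a simplified illustration, not the GSEA software's implementation: it leaves out the permutation testing and normalization, and the function name, gene names, and scores are invented for the example.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted running-sum enrichment score for one gene set.

    ranked_genes: genes sorted by ranking metric, highest first.
    scores: the ranking metric for each gene, in the same order.
    gene_set: the set of genes (pathway) being tested.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_weight = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_weight == 0 or n_miss == 0:
        raise ValueError(
            "the ranked list needs scored genes both inside and outside the set"
        )

    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Step up at genes in the set, step down at genes outside it.
        running += abs(s) ** p / hit_weight if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best


# Toy ranked list of six genes (hypothetical names and scores).
ranked = ["A", "B", "C", "D", "E", "F"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
print(enrichment_score(ranked, scores, {"A", "B"}))  # set at the top of the list
print(enrichment_score(ranked, scores, {"E", "F"}))  # set at the bottom of the list

# A "dataset" of one gene leaves no background to step down against.
try:
    enrichment_score(["A"], [1.0], {"A"})
except ValueError as err:
    print("single gene:", err)
```

The score only means something relative to the genes outside the set, which is why a ranked list containing a single gene cannot be scored.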

What might give you something closer to what you're looking for is to run GSEA in accordance with its intent: the complete ranking of all expressed genes against a standard pathway database. You can then perform leading edge analysis and identify any pathways that contain your genes of interest in the leading edge component (the component of the gene set most strongly contributing to the enrichment score).
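For illustration only, the leading edge of a set can be sketched as the set members that appear in the ranked list up to the peak of the running sum (or from the trough onward when the score is negative). GSEA reports this subset itself; the function below just shows the definition on made-up data, with hypothetical gene names and scores.

```python
def leading_edge(ranked_genes, scores, gene_set, p=1.0):
    """Return the set members that contribute to the peak running-sum score."""
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_weight = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_weight == 0 or n_miss == 0:
        raise ValueError(
            "the ranked list needs scored genes both inside and outside the set"
        )

    running, peak, peak_idx = 0.0, 0.0, -1
    for i, (s, h) in enumerate(zip(scores, hits)):
        running += abs(s) ** p / hit_weight if h else -1.0 / n_miss
        if abs(running) > abs(peak):
            peak, peak_idx = running, i

    # Positive score: members at or before the peak.
    # Negative score: members after the trough.
    window = ranked_genes[: peak_idx + 1] if peak >= 0 else ranked_genes[peak_idx:]
    return [g for g in window if g in gene_set]


# Hypothetical ranked list and pathway; G6 plays the gene of interest.
ranked = ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8"]
scores = [4.0, 3.0, 2.0, 1.0, -1.0, -2.0, -3.0, -4.0]
pathway = {"G1", "G2", "G6"}

edge = leading_edge(ranked, scores, pathway)
print("leading edge:", edge)
print("G6 in leading edge:", "G6" in edge)
```

In this toy case the gene of interest belongs to the pathway but falls outside its leading edge, which is the distinction the leading edge analysis is meant to surface.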

As to the other study: what they were interested in is the impact that their gene of interest has on pathway enrichment. They assessed this by stratifying otherwise equivalent tumor samples by the expression of their gene of interest, to determine the impact that stratification has on the calculated enrichment of the pathways. That is, they can determine whether, say, High PTEN tumors have more or less MTOR signaling than Low PTEN tumors. You don't inherently need a "normal tissue" background for this.
