Information about parameter usage


Adithi GR

Mar 5, 2026, 4:56:43 AM
to gsea-help
Dear GSEA team,

Thank you for helping with my questions. Is there any documentation on the basic parameter fields, and where can I find it? I want to optimise my results, but I can't figure out when to use which option.

Specifically: which enrichment statistic to use, which metric to use for ranking genes, and how to set the gene list sorting mode and gene list ordering mode.

If there is a link to this information, it would be helpful.

Thank you

Anthony Castanza

Mar 6, 2026, 11:45:45 AM
to gsea-help
We offer the GSEA User Guide here: https://docs.gsea-msigdb.org/#GSEA/GSEA_User_Guide/
Generally, the parameter people most commonly adjust is the permutation type, which needs to be set to "gene_set" for datasets with small numbers of samples (<7 per phenotype). Beyond that, the defaults are appropriate for most purposes.
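
A minimal R sketch of that rule of thumb, for illustration only. The helper name `suggest_permutation_type` and its `min_per_group` argument are hypothetical and not part of GSEA; the cutoff of 7 samples per phenotype is simply the figure quoted above.

```r
# Hypothetical helper (not part of GSEA): suggest a permutation type
# from the number of samples in each phenotype group.
suggest_permutation_type <- function(phenotypes, min_per_group = 7) {
  group_sizes <- table(phenotypes)
  # Any phenotype with fewer than `min_per_group` samples -> gene_set permutation
  if (any(group_sizes < min_per_group)) "gene_set" else "phenotype"
}

small_design <- c(rep("A", 3), rep("B", 3))    # 3 samples per phenotype
large_design <- c(rep("A", 12), rep("B", 10))  # 12 and 10 samples

print(suggest_permutation_type(small_design))
print(suggest_permutation_type(large_design))
```

The returned string is only a suggestion for the permutation type field; the choice still has to be set in the GSEA run itself.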

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego

Adithi GR

Mar 9, 2026, 1:33:53 AM
to gsea-help
I'm currently using the TPM data directly. I think I should perhaps normalise it. Is there a specific normalisation that you would recommend?

Please let me know.

Thanks,
Adithi

Anthony Castanza

Mar 11, 2026, 7:25:13 PM
to gsea-help
Hi Adithi,

For standard GSEA (i.e. not single-sample GSEA) we generally recommend normalized counts (such as those output by DESeq2's "median-of-ratios" method). This is generally more appropriate for between-sample comparisons than TPM, which is best for comparing the relative expression of genes within a sample.
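
To make the median-of-ratios idea concrete, here is a base-R sketch on a toy count matrix. It is an illustration of the general technique, not DESeq2's own code; the function name `median_of_ratios` and the toy counts are assumptions made for this example. In practice DESeq2 computes the size factors for you.

```r
# Toy count matrix (made-up numbers): 4 genes x 3 samples.
# Sample s2 is s1 sequenced twice as deep; s3 matches s1.
counts <- matrix(
  c( 10,  20,  10,
    100, 200, 100,
     50, 100,  50,
     30,  60,  30),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3"))
)

# Illustrative median-of-ratios size factors (hypothetical helper).
median_of_ratios <- function(counts) {
  log_counts <- log(counts)
  # Per-gene log geometric mean across samples
  log_geo_means <- rowMeans(log_counts)
  # Genes with a zero count in any sample give -Inf and are skipped
  usable <- is.finite(log_geo_means)
  # Per-sample median of the gene-wise ratios to the geometric mean
  apply(log_counts[usable, , drop = FALSE], 2, function(sample_log) {
    exp(median(sample_log - log_geo_means[usable]))
  })
}

size_factors <- median_of_ratios(counts)
# Divide each sample (column) by its size factor
normalized <- sweep(counts, 2, size_factors, "/")

print(size_factors)
print(normalized)
```

After dividing by the size factors, the depth difference between s1 and s2 is removed, which is what makes the values comparable between samples.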


-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego

Adithi GR

Mar 25, 2026, 6:07:31 PM
to gsea-help
Thank you, Anthony. As you suggested, and following the information on the website, I tried using the DESeq2 normalised counts for GSEA. However, no matter what I do, I am unable to get any results in the FDR 25% section using the phenotype permutation type.
I have removed the gene rows with low expression. This is also the case for one particular condition, called stable (12 samples, with replicates of clones), and not for the other condition, named unstable (10 samples, with replicates of clone data).
For more context, I have labelled my clone data as stable or unstable based on a parameter, and I am trying to run GSEA on that. I have RNA-seq Salmon files for these clones, which are in replicates.
Please help, or suggest something that could improve my results.

This is the code I'm currently using to normalise the data.

# Helper: install any packages that are not yet installed
install_if_missing <- function(packages) {
  missing <- setdiff(packages, rownames(installed.packages()))
  if (length(missing) > 0) {
    install.packages(missing)
  }
}

# libraries

library(tximport)
library(dplyr)
library(ggplot2)
library(DESeq2)
library(readxl)
library(readr)

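# Match only files ending in ".sf" (the dot is escaped so it is not a regex wildcard)
files <- list.files(path = "path", pattern = "\\.sf$", full.names = TRUE, recursive = TRUE)

# Sample names are the file names with the ".sf" extension removed
sample_names <- basename(files) %>% sub("\\.sf$", "", .)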


input_path <- "path"
# tximport expects a two-column data frame: transcript ID, then gene ID
tx2gene <- as.data.frame(read_excel(input_path))
head(tx2gene)
txi <- tximport(
  files,
  type = "salmon",
  tx2gene = tx2gene
)


# creating metadata and condition data
# (one label per sample, in the same order as `files`)
meta <- data.frame(condition = c("unstable", "unstable", "unstable", "unstable", "stable", "stable", "stable", "stable",
                                 "unstable", "unstable", "unstable", "unstable", "unstable", "unstable", "unstable", "unstable",
                                 "unstable", "unstable", "unstable", "unstable", "stable", "stable", "unstable", "unstable", "unstable", "unstable"
                                 ))
colnames(txi$counts) <- sample_names
rownames(meta) <- colnames(txi$counts)

meta

#creating normalised counts using deseq2

dds <- DESeqDataSetFromTximport(txi, colData = meta, design = ~ condition)

# run the DESeq2 pipeline (its size-factor estimation step is what normalises the counts)
dds <- DESeq(dds)


#Get the normalised counts
normalized_counts <- counts(dds, normalized = TRUE)
colnames(normalized_counts) <- sample_names

# Now view it
head(normalized_counts)
print(normalized_counts)


write.csv(normalized_counts, file = "normalized_counts.csv", row.names = TRUE)
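
GSEA's documented expression formats include GCT, with phenotype labels supplied in a CLS file. A minimal sketch of writing both from a normalised matrix follows. The toy matrix, the labels, the helper names `write_gct` and `write_cls`, and the output file names are all assumptions made for illustration; the real matrix and labels would come from the code above.

```r
# Toy stand-ins (assumptions): a normalised genes x samples matrix
# and one phenotype label per sample, in column order.
normalized_counts <- matrix(
  c( 12.5,  30.1,   8.0,  15.2,
    100.4,  95.7, 210.3, 198.8),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB"), c("s1", "s2", "s3", "s4"))
)
condition <- c("stable", "stable", "unstable", "unstable")

# Write a GCT file: version line, dimensions line, then the table
write_gct <- function(mat, path) {
  con <- file(path, "w")
  on.exit(close(con))
  writeLines("#1.2", con)
  writeLines(paste(nrow(mat), ncol(mat), sep = "\t"), con)
  gct <- data.frame(NAME = rownames(mat), Description = "na", mat,
                    check.names = FALSE)
  write.table(gct, con, sep = "\t", quote = FALSE, row.names = FALSE)
}

# Write a categorical CLS file: counts line, class names, per-sample labels
write_cls <- function(labels, path) {
  classes <- unique(labels)  # class names in order of first appearance
  writeLines(c(
    paste(length(labels), length(classes), 1),
    paste("#", paste(classes, collapse = " ")),
    paste(labels, collapse = " ")
  ), path)
}

gct_path <- file.path(tempdir(), "normalized_counts.gct")
cls_path <- file.path(tempdir(), "phenotypes.cls")
write_gct(normalized_counts, gct_path)
write_cls(condition, cls_path)
```

The labels in the CLS file must be in the same order as the sample columns in the GCT file.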



Anthony Castanza

Mar 25, 2026, 6:17:16 PM
to gsea-help
Have you done any other kind of analysis, like a PCA, to get an idea of how strong the signal is in your dataset? Is there clear separation of your two phenotype groups? What about the actual DESeq2 results themselves: were there many significant genes?
What MSigDB collections are you using?

If you have a dataset with low power and are running a lot of gene sets, particularly combining multiple collections, you can run into situations like this.
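
A minimal sketch of such a PCA check in base R. The count matrix here is simulated (it is not real data), and the group sizes, the 4-fold shift, and the choice of the 100 most variable genes are all assumptions made for illustration; with real data, the normalised count matrix and the sample labels would take the place of `norm_counts` and `condition`.

```r
set.seed(1)

# Simulated stand-in for a normalised count matrix (genes x samples)
condition <- rep(c("stable", "unstable"), each = 6)
n_genes <- 200
norm_counts <- matrix(
  rnbinom(n_genes * length(condition), mu = 100, size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_along(condition)))
)
# Give the first 20 genes a 4-fold shift in one group so there is a signal to find
shifted <- condition == "unstable"
norm_counts[1:20, shifted] <- norm_counts[1:20, shifted] * 4

# Log-transform, keep the most variable genes, and run PCA with samples as rows
log_counts <- log2(norm_counts + 1)
gene_var <- apply(log_counts, 1, var)
top_genes <- order(gene_var, decreasing = TRUE)[1:100]
pca <- prcomp(t(log_counts[top_genes, ]), center = TRUE, scale. = FALSE)

# Fraction of variance per component, and mean PC1 score per group
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
pc1_means <- tapply(pca$x[, "PC1"], condition, mean)
print(round(var_explained[1:2], 2))
print(pc1_means)

# Scatter plot of the first two components, coloured by group
if (interactive()) {
  plot(pca$x[, 1], pca$x[, 2], pch = 19,
       col = ifelse(condition == "stable", "blue", "red"),
       xlab = "PC1", ylab = "PC2")
}
```

If the two groups overlap heavily on the leading components, that points to a weak phenotype signal relative to other sources of variation.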

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego


Adithi GR

Mar 31, 2026, 11:03:48 AM
to gsea-help
I have not done a PCA or any other analysis. Should I do one to see how the two conditions segregate?
I was able to see significant results from DESeq2: around 50 to 60 genes with log2FC > 1 and padj < 0.05. I didn't use the log2 fold change as the input for GSEA; I used the normalised counts from the differential analysis.
I was using the GMT file that was given to me, from KEGG for Chinese hamster. I have also used all the other mouse-related datasets from MSigDB and have seen the same issue.
I see that some gene sets are upregulated and some have a nominal p-value < 1, but none of them pass the FDR < 25% threshold.