help regarding input

mehwish wahid

May 22, 2023, 1:33:24 PM

to gsea-help

Hello,

I am doing an RNA-seq analysis. While running GSEA, I have seen online that an input file containing values such as the following, for a defined comparison (treated A sample vs untreated A sample, computed in R), can also be used:

log2FoldChange
lfcSE
stat
pvalue
padj

My question is: in that case, what information will the phenotype file contain?

I have run GSEA using the normalized expression values across the different samples, but my results do not match the results previously generated by my lab (I have to reproduce that work).

Can you please guide me?

Thank you,

Mehwish

mehwish wahid

May 22, 2023, 2:24:42 PM

to gsea-help

My question is: if we use these input values

log2FoldChange
lfcSE
stat
pvalue
padj

what does the phenotype file contain? The input file does not contain the samples; instead, it contains the values listed above for each gene in the specified comparison of samples.

Castanza, Anthony

May 22, 2023, 3:17:42 PM

to gsea...@googlegroups.com

Hi Mehwish,

For GSEA, we generally recommend an expression file containing the full normalized read counts for all expressed genes in all samples. The phenotype file then contains the mapping of samples to the phenotypes being studied. See the file specs in our Data formats wiki: https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats
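To illustrate what the phenotype file holds, here is a minimal sketch that builds the text of a categorical .cls file as described in the Data formats wiki: a count line, a line of class names, and one label per sample. The function name `make_cls` and the treated/untreated design are hypothetical, chosen only for illustration; the labels must follow the same order as the sample columns in the expression file.

```python
def make_cls(sample_labels):
    """Return the text of a categorical .cls file for per-sample labels."""
    # Class names in order of first appearance.
    classes = list(dict.fromkeys(sample_labels))
    # Line 1: <number of samples> <number of classes> 1
    header = f"{len(sample_labels)} {len(classes)} 1"
    # Line 2: '#' followed by the class names.
    names = "# " + " ".join(classes)
    # Line 3: one label per sample, in expression-file column order.
    labels = " ".join(sample_labels)
    return "\n".join([header, names, labels]) + "\n"


# Hypothetical design: three treated and three untreated samples.
labels = ["treated"] * 3 + ["untreated"] * 3
cls_text = make_cls(labels)
print(cls_text)
```

The returned text can be saved with a .cls extension and loaded alongside the expression file.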

The file you’ve described seems to be the results of a differential expression analysis. You can take one of the metrics from that file and use it for GSEA Preranked, but that is generally considered an “advanced” analysis that we don’t really provide much direct support for.
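As a rough sketch of that Preranked route, the snippet below turns a differential expression table into a two-column ranked list (gene, metric), the layout of an .rnk file. Several things here are assumptions rather than part of the thread: the table is CSV, the gene identifiers sit in a column named `gene`, and `stat` is chosen as the ranking metric. Which metric is appropriate is an analysis decision this example does not settle. Note that no phenotype file is involved in this case, since the comparison is already encoded in the metric.

```python
import csv
import io
import math


def de_results_to_ranked(rows, metric="stat"):
    """Return (gene, score) pairs sorted by score, highest first."""
    ranked = []
    for row in rows:
        try:
            score = float(row.get(metric, ""))
        except (TypeError, ValueError):
            continue  # skip "NA" or empty values
        if math.isnan(score):
            continue
        ranked.append((row["gene"], score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Made-up example table with the columns mentioned in the question.
example = """gene,log2FoldChange,lfcSE,stat,pvalue,padj
GENE_A,2.0,0.5,4.0,0.0001,0.001
GENE_B,-1.5,0.5,-3.0,0.003,0.01
GENE_C,0.1,0.4,0.25,0.8,0.9
GENE_D,NA,NA,NA,NA,NA
"""
rows = list(csv.DictReader(io.StringIO(example)))
ranked = de_results_to_ranked(rows)

# Tab-delimited text: one gene and its ranking metric per line.
rnk_text = "".join(f"{gene}\t{score}\n" for gene, score in ranked)
print(rnk_text)
```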

Another factor that could be influencing your results is the use of a different version of MSigDB than was used in the original analysis. My advice is to contact the person who did the original analysis and get as much detail about what they did as possible (ideally the full results files, as there should be some indications of the specific parameters used). Do be aware, though, that due to the random nature of the null distribution generation, GSEA results are expected to vary slightly from run to run. This can only be “fixed” by supplying the exact random seed that was used in the previous run.
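The run-to-run variation can be seen in a toy permutation test. This is not GSEA's algorithm, only a small illustration of how a permutation-based null distribution depends on the random seed; the function `permutation_pvalue` and the numbers are made up for the example.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_pvalue(group_a, group_b, n_perm=1000, seed=None):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)  # seed=None gives a different stream each run
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the samples
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Made-up expression values for one gene in two conditions.
treated = [5.1, 6.3, 5.8, 6.0, 5.5]
untreated = [4.9, 5.2, 5.6, 4.7, 5.4]

p_first = permutation_pvalue(treated, untreated, seed=149)
p_repeat = permutation_pvalue(treated, untreated, seed=149)
p_other = permutation_pvalue(treated, untreated, seed=7)
print(p_first, p_repeat, p_other)
```

Reusing the same seed reproduces the estimate exactly, while a different seed will generally give a slightly different value.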

If you get the specific details about the previous run that you’re trying to replicate, I might be able to provide more detailed advice.

-Anthony

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego
