Hi Sarah,
Data doesn't need to be normalized with DESeq to be appropriate for GSEA. The key is that it needs to be normalized in such a way that between-sample comparisons can be performed. This is the case for metrics like median-of-ratios normalized counts for RNA-seq, but not for something like TPM, which is normalized for within-sample comparisons.
Based on the information in this paper about the SOMAscan assay (https://www.nature.com/articles/s41598-017-14755-5), assuming similar methods were followed for normalization, the intensities should be comparable. That said, it would be worthwhile to reach out to whoever did the data normalization and confirm that it is appropriate for differential expression analysis as-is.
The larger issue is the number of proteins assayed. GSEA is designed to run on expression data for all genes; datasets with only 1,000-2,000 genes will frequently not have enough information available to accurately assess gene sets. As such, we don't really recommend running GSEA on this sort of data.
GSEA doesn't use separate sets as a background. While there are some gene sets, such as housekeeping genes, that can be used as a reference for a set that should not be differentially expressed, GSEA generally works through an empirical null distribution model: either samples or genes are randomly permuted, and random enrichment scores for sets that shouldn't be enriched are calculated to determine how likely an observed enrichment is. The background used for this calculation is the input gene list. If a gene is not in the dataset you provide, it is excluded from all gene sets and calculations. This causes problems when so many genes are unavailable for a set that the set is no longer a meaningful representation of the annotation.
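To make the permutation idea above concrete, here is a minimal Python sketch of a gene-permutation null for a single set. It is a simplified illustration, not GSEA's actual implementation: the function names, the toy ranked list, the fixed weighting by absolute score, and the add-one p-value estimate are all assumptions made for the example.

```python
import random

def enrichment_score(ranked, gene_set):
    """Weighted running-sum statistic over a list of (gene, score) pairs
    that is already sorted from highest to lowest score."""
    n_hit = sum(1 for gene, _ in ranked if gene in gene_set)
    n_miss = len(ranked) - n_hit
    total_weight = sum(abs(score) for gene, score in ranked if gene in gene_set)
    if total_weight == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for gene, score in ranked:
        if gene in gene_set:
            running += abs(score) / total_weight  # step up on a set member
        else:
            running -= 1.0 / n_miss               # step down otherwise
        if abs(running) > abs(best):
            best = running                        # keep the largest deviation
    return best

def permutation_p_value(ranked, gene_set, n_perm=1000, seed=0):
    """Compare the observed score with scores of random same-size sets
    drawn from the input gene list (the background)."""
    rng = random.Random(seed)
    genes = [gene for gene, _ in ranked]
    present = set(gene_set) & set(genes)  # genes missing from the data are dropped
    observed = enrichment_score(ranked, present)
    null = [enrichment_score(ranked, set(rng.sample(genes, len(present))))
            for _ in range(n_perm)]
    # Use only the part of the null with the same sign as the observed score.
    if observed >= 0:
        same_sign = [s for s in null if s >= 0]
        extreme = sum(1 for s in same_sign if s >= observed)
    else:
        same_sign = [s for s in null if s < 0]
        extreme = sum(1 for s in same_sign if s <= observed)
    # Add-one estimate (an assumption of this sketch) avoids a p-value of 0.
    return observed, (extreme + 1) / (len(same_sign) + 1)

# Toy data: 100 hypothetical genes with scores from 5.0 down to -4.9.
ranked = [(f"G{i}", (50 - i) / 10) for i in range(100)]
toy_set = {f"G{i}" for i in range(10)} | {"ABSENT1"}  # ABSENT1 is not assayed
observed_es, p_value = permutation_p_value(ranked, toy_set)
```

Note that the random sets are drawn only from the genes in `ranked`, which is why a small or biased input list changes what "enriched" means.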
Unfortunately, I can't really say if this assay is appropriate because I don't know enough about how genes were selected for inclusion. If it was designed in such a way as to try to get an unbiased sampling of the genome, it may be possible to get some meaningful enrichment results, but you would need to pay close attention to how many genes were in each set originally and how many remain after background filtering. This information is made available on the GSEA enrichment report page if you want to give it a try anyway.
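If you want to check set coverage before running anything, a rough Python sketch along these lines may help. The function name, the toy gene and set names, and the size thresholds of 15 and 500 are assumptions for illustration; adjust the thresholds to whatever you actually configure in GSEA.

```python
def coverage_report(gene_sets, assayed_genes, min_size=15, max_size=500):
    """For each set, compare its original size with its size after
    restricting it to the genes present in the dataset."""
    assayed = set(assayed_genes)
    report = {}
    for name, members in gene_sets.items():
        original = set(members)
        kept = original & assayed  # genes not in the dataset are dropped
        report[name] = {
            "original_size": len(original),
            "filtered_size": len(kept),
            "fraction_kept": len(kept) / len(original) if original else 0.0,
            "passes_size_filter": min_size <= len(kept) <= max_size,
        }
    return report

# Toy example: a panel that covers only every second hypothetical gene.
gene_sets = {
    "SET_A": [f"GENE{i}" for i in range(20)],
    "SET_B": [f"GENE{i}" for i in range(40)],
}
assayed_genes = [f"GENE{i}" for i in range(0, 40, 2)]
report = coverage_report(gene_sets, assayed_genes)
```

Sets with a low `fraction_kept` are the ones where the filtered set may no longer represent the original annotation.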
-Anthony
Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego
--
You received this message because you are subscribed to the Google Groups "gsea-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
gsea-help+...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/gsea-help/1e6569e2-d930-480f-8b0a-5107be0ef659n%40googlegroups.com.
For RNA-seq, the data should generally be normalized for between-sample comparisons, e.g. with the median-of-ratios method from DESeq2 (the normalized table can be retrieved by exporting counts(dds, normalized=TRUE)), the TMM method, or some other appropriate normalization. FPKM/RPKM/TPM are not appropriate methods of normalization for standard GSEA.
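For intuition about what median-of-ratios normalization does, here is a small Python sketch of the idea. It is only an illustration under simplifying assumptions (a plain dict of raw counts, genes with any zero count skipped when estimating size factors) and is not DESeq2's code; in practice you would use DESeq2 itself.

```python
import math
from statistics import median

def size_factors(counts):
    """counts: dict mapping gene -> list of raw counts, one per sample.
    Returns one size factor per sample."""
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c <= 0 for c in row):
            continue  # a zero count gives no finite log geometric mean
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Each sample's factor is the median ratio to the per-gene geometric mean.
    return [math.exp(median(r)) for r in log_ratios]

def normalize(counts):
    """Divide every raw count by its sample's size factor."""
    factors = size_factors(counts)
    return {gene: [c / f for c, f in zip(row, factors)]
            for gene, row in counts.items()}

# Toy data: sample 2 was sequenced twice as deeply as sample 1.
raw = {"A": [10, 20], "B": [100, 200], "C": [50, 100], "D": [0, 3]}
factors = size_factors(raw)
normalized = normalize(raw)
```

After normalization, genes A, B, and C have the same value in both samples, which is the between-sample comparability that standard GSEA relies on.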
GSEA Preranked is provided as an option for cases where someone has computed statistics externally, for example on a complicated experiment that needed correction for a particular confounding variable, and wants to run GSEA on the results. It is also useful if someone just has a Log2FC list from a small experiment that we can't support: GSEA's default statistics require a minimum of 3 samples per group, so if you wanted to do a 1v1 or 2v2 comparison you'd need to rank it yourself and then provide that ranking to GSEA Preranked. It's mainly a mode we make available to provide greater flexibility to our users.
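As a rough illustration of ranking a 1v1 comparison yourself, the Python sketch below computes a log2 fold change per gene and formats it as two tab-separated columns (gene, score). The function names, the toy gene names and values, and the pseudocount of 1 are assumptions for the example; the inputs are assumed to be already normalized for between-sample comparison.

```python
import math

def log2fc_ranking(treated, control, pseudocount=1.0):
    """Return (gene, log2 fold change) pairs sorted from the most
    up-regulated to the most down-regulated gene."""
    shared = set(treated) & set(control)  # rank only genes measured in both
    scores = {
        gene: math.log2((treated[gene] + pseudocount) /
                        (control[gene] + pseudocount))
        for gene in shared
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def to_rnk(ranking):
    """Format the ranking as tab-separated 'gene<TAB>score' lines."""
    return "".join(f"{gene}\t{score:.4f}\n" for gene, score in ranking)

# Toy 1v1 comparison with hypothetical normalized values.
treated = {"GENE_A": 31, "GENE_B": 7, "GENE_C": 3}
control = {"GENE_A": 7, "GENE_B": 7, "GENE_C": 15}
ranking = log2fc_ranking(treated, control)
rnk_text = to_rnk(ranking)
```

The pseudocount keeps the ratio defined when a value is zero, at the cost of shrinking fold changes for low-abundance genes, so it is worth choosing deliberately.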