Hello,
RPKM is not generally considered to be appropriate for between-sample comparisons. There is a pretty good review article on the topic available here: https://rnajournal.cshlp.org/content/26/8/903.full.pdf
What I would recommend is to go back to whoever provided the RNA-seq dataset and request either the raw counts or the counts normalized by some appropriate method such as DESeq2's median-of-ratios, or TMM or similar. If they provide the raw counts, most implementations of DESeq2 through publicly accessible tools like those on GenePattern.org or usegalaxy.org provide the option to produce the "normalized counts" output. That normalized counts output would be appropriate for GSEA. Additionally, since you only have three samples per group, you would need to ensure that you're running GSEA in "gene set" permutation mode, rather than the default "phenotype" permutation mode.
As to the relative lack of difference in the heatmaps associated with the enrichment, it's difficult to say what is causing this until the proper normalization has been used. It could be that GSEA is detecting a relatively weak signal, which is something it is fairly well optimized for. That large red block in the middle is a group of genes that were not expressed in any of your samples. Once you've renormalized the data, you might try filtering out such non-expressed genes (some tools, like GenePattern's DESeq2, will do this by default). Under some circumstances that can also improve GSEA results.
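As a rough illustration of the filtering step mentioned above, here is a minimal Python sketch. The gene names, the values, and the `drop_unexpressed` helper are all made up for the example, and "not expressed" is assumed to mean a total of zero across every sample; a real pipeline may use a different threshold.

```python
# Hypothetical toy table: gene -> normalized expression, one value per sample.
expression = {
    "GENE_A": [12.0, 15.0, 9.0, 30.0, 28.0, 35.0],
    "GENE_B": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],   # never expressed
    "GENE_C": [0.0, 3.0, 0.0, 1.0, 0.0, 2.0],   # low, but expressed somewhere
}


def drop_unexpressed(table, min_total=0.0):
    """Keep only genes whose summed expression across samples exceeds min_total."""
    return {gene: values for gene, values in table.items() if sum(values) > min_total}


filtered = drop_unexpressed(expression)
print(sorted(filtered))
```

Raising `min_total` would make the filter stricter; whether that is desirable depends on the dataset.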
Hope this helps, and let us know if you have any more questions.
-Anthony
Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego
--
You received this message because you are subscribed to the Google Groups "gsea-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
gsea-help+...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/gsea-help/74509c0e-f86b-4fb5-98b4-e5ddbb7e6c72n%40googlegroups.com.
Hi,
If they used the "quant all" information for DESeq2, it was probably raw counts, in which case it would need to be normalized prior to GSEA. If you use the implementation of DESeq2 on GenePattern (cloud.genepattern.org, which is free and only requires a simple registration), one of the outputs will be a normalized counts .gct file that can be used for GSEA. Unfortunately, I don't know exactly what "quant all" is. It isn't a standard name, so I'm only guessing here. I would recommend reaching out to the individual or group that provided this dataset and asking specifically what procedures were performed to generate that file, but the guess that it contains raw counts is probably correct.
-Anthony
If the quant all file contains raw counts, which is what I suspect since the provider of the file used it to run DESeq2 and DESeq2 requires raw counts as input, those counts will need to be normalized to be usable for GSEA. You can either go back to the provider of the data and ask them to export DESeq2's internal normalization table, or you can use the raw counts you've (probably) already been provided to rerun this normalization yourself.
I can provide you with instructions for producing this normalization manually, or you can use one of the free, publicly accessible online bioinformatics platforms to do it. The two major options are Galaxy (usegalaxy.org) and GenePattern (cloud.genepattern.org).
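For a sense of what the manual route involves, here is a minimal Python sketch of a median-of-ratios style normalization. It is a simplified illustration of the idea rather than DESeq2's actual code, and the toy counts and the function names `size_factors` and `normalize` are invented for the example.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one scaling factor per sample from raw counts.

    counts: dict mapping gene -> list of raw counts (one entry per sample).
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for values in counts.values():
        if any(v <= 0 for v in values):
            continue  # a zero in any sample makes the geometric mean unusable
        log_geo_mean = sum(math.log(v) for v in values) / n_samples
        for j, v in enumerate(values):
            log_ratios[j].append(math.log(v) - log_geo_mean)
    # The median is taken on the log scale, then transformed back.
    return [math.exp(median(ratios)) for ratios in log_ratios]


def normalize(counts):
    """Divide every raw count by the size factor of its sample."""
    factors = size_factors(counts)
    return {gene: [v / f for v, f in zip(values, factors)]
            for gene, values in counts.items()}


# Toy data: the second sample was sequenced about twice as deeply as the first.
raw_counts = {"G1": [10, 20], "G2": [100, 200], "G3": [5, 10], "G4": [0, 4]}
print(size_factors(raw_counts))
print(normalize(raw_counts)["G1"])
```

In this toy case the second size factor comes out twice the first, so after normalization G1 has the same value in both samples.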
With the GenePattern option, which requires creating a free user account, you'd format the raw counts as a GCT file just as you would for GSEA, but instead of running GSEA you'd search for and run the "DESeq2" tool. That tool will give you a series of outputs. One of them will be a file with "normalized.counts.gct" in its name; that file can then be used as the input for GSEA.
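If it helps with the formatting step, a GCT file is tab-delimited text with a version line, a dimensions line, a header row, and then one row per gene. A minimal Python sketch of building one is below; the sample names, gene names, and the `to_gct` helper are hypothetical, and "na" is used as a placeholder description.

```python
def to_gct(sample_names, rows):
    """Build GCT (version 1.2) text.

    rows: list of (gene_name, description, values), one value per sample.
    """
    lines = ["#1.2", f"{len(rows)}\t{len(sample_names)}"]
    lines.append("\t".join(["Name", "Description", *sample_names]))
    for name, description, values in rows:
        lines.append("\t".join([name, description, *(str(v) for v in values)]))
    return "\n".join(lines) + "\n"


# Hypothetical raw counts for two genes across three samples.
gct_text = to_gct(
    ["ctrl_1", "ctrl_2", "treated_1"],
    [("GENE_A", "na", [10, 12, 30]), ("GENE_B", "na", [0, 1, 4])],
)
print(gct_text)
```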
Hopefully that is clearer, let me know if you have additional questions.
They might be describing the TMM method of count normalization. TMM is a suitable normalization method for GSEA.
You might ask specifically whether that is what they are describing there. If they mean a simple normalization using a global mean without consideration of sample-specific factors, I'm not sure that is appropriate. It is not a method I've tried or seen any literature on.
Some more information on standard normalization methods here: https://academic.oup.com/bib/article/19/5/776/3056951
Hello,
We don't really have anything in the way of support for this feature. I might suggest taking a look at the EnrichmentMap Cytoscape package. If you use the load datasets functionality from within Cytoscape itself rather than from the tool link-out in GSEA, it has the option to load multiple datasets and display the overlap between them. https://enrichmentmap.readthedocs.io/en/latest/

This is the heatmap of just the top and bottom 50 genes as ranked by the metric that was used to run GSEA (i.e., the signal_to_noise metric). It's mostly just included to provide a sanity check showing that there is good class discrimination in the top and bottom ranked genes.
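To make the ranking behind that heatmap concrete, here is a minimal Python sketch of a signal-to-noise style score: the difference of the class means divided by the sum of the class standard deviations. This is a simplified version. As far as I understand it, GSEA's own implementation also puts a floor on the standard deviations, which is left out here, and the use of the sample standard deviation, the toy data, and the function names are assumptions for illustration.

```python
from statistics import mean, stdev


def signal_to_noise(group_a, group_b):
    """(mean_A - mean_B) / (sd_A + sd_B), using sample standard deviations."""
    return (mean(group_a) - mean(group_b)) / (stdev(group_a) + stdev(group_b))


def top_and_bottom(expression, a_idx, b_idx, n=50):
    """Rank genes by signal-to-noise and return the n highest and n lowest."""
    scores = {}
    for gene, values in expression.items():
        a = [values[i] for i in a_idx]
        b = [values[i] for i in b_idx]
        scores[gene] = signal_to_noise(a, b)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n], ranked[-n:]


# Hypothetical data: samples 0-2 are class A, samples 3-5 are class B.
toy = {
    "UP_IN_A": [10, 11, 12, 1, 2, 3],
    "UP_IN_B": [1, 2, 3, 10, 11, 12],
    "FLAT": [5, 6, 7, 5, 6, 7],
}
top, bottom = top_and_bottom(toy, [0, 1, 2], [3, 4, 5], n=1)
print(top, bottom)
```

A gene with zero variance in both classes would cause a division by zero in this sketch, which is one reason a real implementation needs some adjustment to the standard deviations.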
What heatmap are you talking about? Could you send a screenshot of what you're referring to? You should only be getting results for gene sets that you've selected.
If you're talking about the same heatmap as before, that heatmap has nothing to do with the sets that were run. It only shows the top and bottom ranked genes in the data.
Hello,
I'm glad you've found the results from GSEA informative!
Running GSEA using custom sets is pretty easy. Assuming you have a number of genes of interest (e.g., a set of differentially expressed genes from an independent study, a clique from WGCNA, or really any group of genes), you can put them into any of our Gene Set Database formats, described here:
https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#Gene_Set_Database_Formats
The GMX and GMT formats are mainly intended for multiple sets, where each set is a column (GMX) or a row (GMT). In either format, the first cell of the column or row is used as the set's name and the second as its description, with the remaining cells used for the set's members.
There is also the "grp" format, which is just a simple text format: the first line starts with a # character followed by the set name, and the remaining rows are the set members.
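As an illustration of the row-based GMT layout and the grp layout described above, here is a minimal Python sketch. The set name, description, gene symbols, and the helpers `to_gmt` and `to_grp` are made up for the example.

```python
def to_gmt(gene_sets):
    """GMT: one set per row, as name, description, then members (tab-separated).

    gene_sets: dict mapping set name -> (description, list of genes).
    """
    return "".join(
        "\t".join([name, description, *genes]) + "\n"
        for name, (description, genes) in gene_sets.items()
    )


def to_grp(name, genes):
    """grp: a '#' line carrying the set name, then one gene per line."""
    return "\n".join([f"# {name}", *genes]) + "\n"


# Hypothetical custom set.
my_sets = {"MY_CUSTOM_SET": ("DE genes from an independent study", ["TP53", "MYC", "EGFR"])}
gmt_text = to_gmt(my_sets)
grp_text = to_grp("MY_CUSTOM_SET", ["TP53", "MYC", "EGFR"])
print(gmt_text)
print(grp_text)
```

The text would be saved with a .gmt or .grp extension, respectively, before loading it into GSEA.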
Once a file in any of these formats is prepared, it can be loaded into GSEA the same way you load your expression data or ranked list. Then, in the Run GSEA/Run GSEAPreranked window, click the […] button next to the "Gene sets database" field. In the window that pops up, click the "Gene matrix (local gmx/gmt)" tab if you've used one of those formats, or click the ">" until you see the "Gene sets (grp)" tab if you used the grp format for your set.
One thing to note is that if you run only a single set with GSEA, the "FDR q-val" statistic isn't generally meaningful. The "NOM p-val" will still be valid and will tell you the significance of the enrichment.
Let me know if you have additional questions or encounter any errors during this process.
Hello,
In the documentation cytogenetic location is used as an example of a (series of) gene sets. The columns in that example are being used to provide multiple gene sets.
The gene set name you use is entirely up to you. Ideally it should be something meaningful as to the origin of the set you're creating and have minimal special characters (i.e., no slashes, colons, dashes, etc.).
If you only have one gene set, you only need one column. The name in the first row of that column should be something you choose for that set. Likewise, the second row in that column should be a description, which can be longer than the set name and include more details. The rest of the rows in that column are the genes that you want to comprise that set (one per row).
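The column layout described above can be sketched in Python as follows. The `to_gmx` and `is_simple_name` helpers and the example set are hypothetical, and the rule "letters, digits, and underscores only" is just one conservative reading of "minimal special characters".

```python
import re
from itertools import zip_longest


def is_simple_name(name):
    """True if the name uses only letters, digits, and underscores (assumed rule)."""
    return re.fullmatch(r"[A-Za-z0-9_]+", name) is not None


def to_gmx(gene_sets):
    """GMX: one set per column; row 1 is the name, row 2 the description, then genes.

    gene_sets: dict mapping set name -> (description, list of genes).
    Shorter columns are padded with empty cells.
    """
    columns = [[name, description, *genes]
               for name, (description, genes) in gene_sets.items()]
    rows = zip_longest(*columns, fillvalue="")
    return "".join("\t".join(row) + "\n" for row in rows)


# A single hypothetical set needs only a single column.
single = {"MY_SET": ("Genes of interest from my own experiment", ["TP53", "MYC"])}
gmx_text = to_gmx(single)
print(gmx_text)
print(is_simple_name("MY_SET"), is_simple_name("my/set:1"))
```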
It's a little difficult to tell without seeing the Details view of the error, but based on the window in the background it looks like the run might not have been configured correctly. The "Ranked list" box is blank; you should click that box and select the ranked list you're trying to run from the dropdown. Additionally, in the "Collapse/Remap to gene symbols" box you've selected "Remap_only", but in the "Chip platform" box there is no chip file selected. Either "Collapse/Remap" should be set to "None" (not recommended) or the correct chip for your datatype should be selected for the "Chip platform". If you need assistance in selecting the correct chip file, please send a sample of the gene identifiers from the ranked list file.
Let me know if you're still having problems after addressing these issues.
As far as I can tell, there isn't anything wrong with the GMX file. This error has to do with the ranked list file. Did you try the steps I suggested in my previous email for correcting the issues with parameter values?
For running GSEA Preranked, your dataset should be prepared in the .RNK format described here: https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#RNK:_Ranked_list_file_format_.28.2A.rnk.29
Once prepared in that format, it needs to have the file extension .rnk not .rnk.txt. If the file is not showing up, your operating system may have hidden the .txt extension. You should be able to remove this extra extension from the properties/get info menu. Once that's done, try reloading the file into GSEA and then see if it shows up in that dropdown.
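If renaming through the operating system is awkward, the same fix can be scripted. Below is a minimal Python sketch that writes a two-column, tab-separated ranked list and then strips a stray .txt extension. The gene names, scores, file name, and the helpers `write_rnk` and `strip_txt_suffix` are invented for illustration, and the demo runs inside a temporary directory so it touches no real files.

```python
import tempfile
from pathlib import Path


def write_rnk(path, scores):
    """Write gene<TAB>score lines, highest score first."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    Path(path).write_text("".join(f"{gene}\t{score}\n" for gene, score in ordered))


def strip_txt_suffix(path):
    """Rename 'name.rnk.txt' to 'name.rnk'; leave any other file untouched."""
    path = Path(path)
    if path.name.lower().endswith(".rnk.txt"):
        target = path.with_suffix("")  # drops only the trailing ".txt"
        path.rename(target)
        return target
    return path


with tempfile.TemporaryDirectory() as tmp:
    misnamed = Path(tmp) / "my_ranking.rnk.txt"
    write_rnk(misnamed, {"GENE_A": 2.5, "GENE_B": -1.2, "GENE_C": 0.3})
    fixed = strip_txt_suffix(misnamed)
    fixed_name = fixed.name
    contents = fixed.read_text()

print(fixed_name)
print(contents)
```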
Ok, I didn't realize this was the same data you'd run before. My answer assumed it was a preranked dataset because the screenshot you sent was of the "Run GSEAPreranked" window.
If this is the same data you'd used before, then the issue is that you're trying to run the wrong GSEA function. You've clicked on the "Run GSEAPreranked" tool, which is designed for a single column of ranked data in the .rnk format. You'll need to click the standard "Run GSEA" function instead, and your data should show up under the standard "expression dataset" dropdown.