Hi Mohamed,
The input gct file you’ve provided is comma separated, but ssGSEA requires tab-separated files. If you’re writing files out from R, make sure to pass sep="\t" to the write.table function. Also, the recommended quantification input for ssGSEA is TPM/FPKM/RPKM, not normalized counts. Normalized counts are used for standard GSEA, which calculates differential expression across samples, whereas ssGSEA only ranks genes within a sample and therefore needs an internally normalized representation of relative expression (like TPM).
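For anyone producing the file outside R, here is a minimal Python sketch of writing a tab-separated GCT (version 1.2) file. The function name write_gct and the two-gene dataset are hypothetical and only illustrate the layout: a "#1.2" line, a line with the row and column counts, a header line, then one row per gene.

```python
import csv
import io


def write_gct(handle, sample_names, rows):
    """Write a GCT v1.2 table to an open text handle, tab separated.

    rows is a list of (gene_name, description, [values...]) tuples.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["#1.2"])                        # format version line
    writer.writerow([len(rows), len(sample_names)])  # number of genes, number of samples
    writer.writerow(["NAME", "Description", *sample_names])
    for name, description, values in rows:
        writer.writerow([name, description, *values])


# Hypothetical example data: two genes, two samples, "NA" as the description.
buffer = io.StringIO()
write_gct(
    buffer,
    ["S1", "S2"],
    [("TP53", "NA", [12.5, 8.0]), ("MYC", "NA", [3.2, 40.1])],
)
gct_text = buffer.getvalue()
print(gct_text)
```

To write to disk instead, pass a handle from open(path, "w", newline="") in place of the StringIO buffer.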
-Anthony
Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego
--
You received this message because you are subscribed to the Google Groups "GenePattern Help Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
genepattern-he...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/genepattern-help/d00fa280-1466-4ae8-b314-c4964be5affen%40googlegroups.com.
Hi Mohamed,
I would refer you to this biostars thread: https://www.biostars.org/p/340547/
If you still have the original featureCounts output, which includes a column giving gene length, there’s a formula in that thread that you can apply to your data to calculate FPKM. If not, there’s another suggestion that might help, but it would require more work to apply.
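As a rough illustration of that kind of calculation, the sketch below applies the commonly used FPKM definition (count × 10^9 / (gene length in bp × total counts)). It is not copied from the linked thread. The gene names, counts, and lengths are made up, and using the sum of the counts in the table as the library size is an assumption; some pipelines use the total number of mapped fragments instead.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of gene per million counted fragments.

    counts and lengths_bp are dicts keyed by gene identifier.
    Assumption: library size is the sum of all counts in the table.
    """
    total = sum(counts.values())
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total)
        for gene, count in counts.items()
    }


# Hypothetical counts and gene lengths (bp) for three genes.
counts = {"A": 100, "B": 300, "C": 600}
lengths_bp = {"A": 1000, "B": 2000, "C": 3000}

fpkm_values = fpkm(counts, lengths_bp)
for gene, value in fpkm_values.items():
    print(gene, round(value, 1))
```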
Hi Mohamed,
It’s unlikely. FPKM is specifically “fragments per kilobase million”, which requires paired-end reads to constitute the “fragment”. The metric for single-end reads is RPKM (reads per kilobase million), which would probably require a slightly different calculation.
GSVA may perform some internal normalization, but I don’t see that from the command. I would strongly advise against using counts, as they will substantially bias your enrichment towards long genes.
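To make the length bias concrete, here is a small Python sketch with made-up numbers. It ranks three hypothetical genes within one sample, first by raw counts and then by TPM (counts divided by gene length, rescaled so the sample sums to one million). The long gene tops the count ranking only because it is long.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize, then rescale to 1e6."""
    rate = {gene: count / lengths_bp[gene] for gene, count in counts.items()}
    scale = sum(rate.values())
    return {gene: r / scale * 1e6 for gene, r in rate.items()}


def rank_desc(values):
    """Gene names ordered from highest to lowest value."""
    return sorted(values, key=values.get, reverse=True)


# Hypothetical single-sample data: LONG has the most reads but the
# lowest number of reads per base.
counts = {"LONG": 1000, "SHORT": 150, "MID": 400}
lengths_bp = {"LONG": 10000, "SHORT": 1000, "MID": 2000}

rank_by_counts = rank_desc(counts)
rank_by_tpm = rank_desc(tpm(counts, lengths_bp))
print("by counts:", rank_by_counts)
print("by TPM:   ", rank_by_tpm)
```

Because single-sample methods work from the within-sample ranking, the two inputs can give quite different enrichment scores for the same data.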
Hi Mohamed,
This error means that the gene sets used don’t have enough overlap with your dataset to run the ssGSEA calculations. This is typically for one of two reasons: 1) the dataset was filtered down to a subset of genes before being submitted, or 2) the dataset uses gene identifiers other than the gene symbols used in the gene sets.
If 1), please supply the full dataset to ssGSEA; if 2), you’ll need to collapse your dataset to gene symbols. This can be done with the GenePattern CollapseDataset module: https://cloud.genepattern.org/gp/pages/index.jsf?lsid=urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00134:2.0.0
If your dataset is human, I recommend using “sum of probes”; if mouse or rat, “max_probe”. The chip file selected should match the gene identifier space used in your dataset.
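A quick way to see whether identifier mismatch is the problem is to count how many gene set members appear among the dataset's row names. The Python sketch below is only illustrative: the function name overlap, the tiny gene set, and the two-entry Ensembl-to-symbol mapping are assumptions standing in for a real dataset and chip file, and the renaming step ignores the handling of duplicate symbols that CollapseDataset performs.

```python
def overlap(dataset_ids, gene_set):
    """Return the gene set members found in the dataset and their fraction."""
    present = set(dataset_ids)
    shared = [gene for gene in gene_set if gene in present]
    fraction = len(shared) / len(gene_set) if gene_set else 0.0
    return shared, fraction


# Hypothetical gene set given as gene symbols.
gene_set = ["TP53", "BRCA1"]

# Dataset rows keyed by Ensembl gene IDs: nothing matches the symbols.
ensembl_rows = ["ENSG00000141510", "ENSG00000012048"]
shared_before, fraction_before = overlap(ensembl_rows, gene_set)

# Illustrative mapping standing in for a chip file.
id_to_symbol = {"ENSG00000141510": "TP53", "ENSG00000012048": "BRCA1"}
symbol_rows = [id_to_symbol[row] for row in ensembl_rows if row in id_to_symbol]
shared_after, fraction_after = overlap(symbol_rows, gene_set)

print("before collapsing:", shared_before, fraction_before)
print("after collapsing: ", shared_after, fraction_after)
```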
Let me know if you have any additional difficulty with this process.
Hi Mohamed,
The output looks as expected assuming you intended the analysis to be performed on a single gene set.
The gct format is tab-delimited text, so it should be easy to import into Excel. To read it into R, you might want to open it with a text editor and remove the first two (header) rows, or read it in with read.table(), setting sep="\t", fill=TRUE.
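The same idea in Python, as a minimal sketch: consume the two header lines, then treat the remainder as an ordinary tab-delimited table. The function name read_gct, the gene set name, and the scores below are hypothetical and stand in for a real ssGSEA result with a single gene set.

```python
import csv
import io


def read_gct(handle):
    """Parse a GCT v1.2 file into (sample_names, {row_name: [values]})."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)                                   # "#1.2" version line
    n_rows, n_cols = (int(x) for x in next(reader)[:2])
    header = next(reader)                          # Name, Description, samples...
    samples = header[2:]
    data = {}
    for row in reader:
        if not row:                                # skip blank trailing lines
            continue
        data[row[0]] = [float(value) for value in row[2:]]
    if len(data) != n_rows or len(samples) != n_cols:
        raise ValueError("row/column counts do not match the GCT header")
    return samples, data


# Hypothetical single-gene-set ssGSEA output with three samples.
example = (
    "#1.2\n"
    "1\t3\n"
    "Name\tDescription\tS1\tS2\tS3\n"
    "MY_GENE_SET\tNA\t1520.5\t-300.25\t87.0\n"
)
samples, scores = read_gct(io.StringIO(example))
print(samples)
print(scores)
```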
-Anthony
From: genepatt...@googlegroups.com <genepatt...@googlegroups.com> on behalf of Mohamed Kamal <mohamedka...@gmail.com>
Date: Thursday, February 25, 2021 at 3:41 PM
To: GenePattern Help Forum <genepatt...@googlegroups.com>
Hello,
I wasn't able to view your job ID directly, because it appears that the job was deleted. In the future, please leave the errored job in place when requesting debugging so we can see what went wrong. I suspect the failure was caused by file formatting errors.
Looking at your gene set file, you've formatted it as a GMX file rather than the GMT format implied by the extension you've given it, and you are also missing the Gene Set Name and Description rows. Please see https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#Gene_Set_Database_Formats for the proper structure for the GMT and GMX formats.
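For reference, GMX stores one gene set per column (row 1 the set names, row 2 the descriptions, genes below), while GMT stores one gene set per row (name, description, then genes). The Python sketch below transposes a small GMX table into GMT. The function name gmx_to_gmt and the example gene sets are hypothetical, and it assumes the name and description rows are already present.

```python
def gmx_to_gmt(gmx_text):
    """Convert tab-delimited GMX text (sets in columns) to GMT text (sets in rows)."""
    rows = [line.split("\t") for line in gmx_text.splitlines() if line.strip()]
    if len(rows) < 2:
        raise ValueError("GMX needs a name row and a description row")
    # Pad short rows so every column has the same number of cells.
    width = max(len(row) for row in rows)
    padded = [row + [""] * (width - len(row)) for row in rows]
    gmt_lines = []
    for column in zip(*padded):
        name, description, *genes = column
        genes = [gene.strip() for gene in genes if gene.strip()]  # drop blank cells
        gmt_lines.append("\t".join([name, description or "NA", *genes]))
    return "\n".join(gmt_lines) + "\n"


# Hypothetical GMX input: two gene sets of different sizes.
gmx_example = (
    "SET_A\tSET_B\n"
    "NA\tNA\n"
    "TP53\tMYC\n"
    "BRCA1\t\n"
)
gmt_text = gmx_to_gmt(gmx_example)
print(gmt_text)
```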
Additionally, please double check that the correct formatting for the GCT format was applied here: https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GCT:_Gene_Cluster_Text_file_format_.28.2A.gct.29 including the header and description column (the description can be filled with NA's).
If you still encounter errors after correcting the formatting, please let us know.
-Anthony