--
You received this message because you are subscribed to the Google Groups "gsea-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gsea-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gsea-help/36023d04-e9a6-4d4e-aba4-827640a88ff7n%40googlegroups.com.
Hi Joe,
This is correct.
-Anthony
Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego
Hello,
Yes, you can do this. However, we generally find that while gene_set permutation gives more “significant” results, the results from phenotype permutation are more robust when the data fits its expectations.
-Anthony
Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego
Hi Feriel,
Phenotype permutation is a very stringent procedure that frequently yields few significant results, but those results are generally the most robust. It isn’t uncommon to get no significant results if the phenotype being assessed didn’t produce particularly large molecular changes in the samples. Did you perform any kind of hierarchical clustering or PCA to determine whether you have good separation between your phenotypes? Perhaps outliers are causing issues with the phenotype permutation and should be excluded. How was this data normalized? GSEA should generally be run on normalized counts, not raw counts or FPKM/RPKM/TPM data. We also recommend eliminating genes that aren’t expressed above a reasonable threshold in the data, which prevents unreasonable inflation of the global distribution with irrelevant genes.
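As a rough illustration of the expression filter mentioned above, here is a minimal Python sketch. The helper name `filter_low_expression` and both thresholds (`min_count`, `min_samples`) are assumptions made for this example, not GSEA settings or recommended values; pick cutoffs that make sense for your own normalized data.

```python
# Hypothetical helper: keep genes whose normalized count reaches `min_count`
# in at least `min_samples` samples. Both thresholds are illustrative
# assumptions, not values recommended by GSEA.
def filter_low_expression(counts, min_count=10.0, min_samples=2):
    """counts maps gene symbol -> list of normalized counts, one per sample."""
    kept = {}
    for gene, values in counts.items():
        n_expressed = sum(1 for v in values if v >= min_count)
        if n_expressed >= min_samples:
            kept[gene] = values
    return kept


# Toy data: four samples per gene.
counts = {
    "GENE_A": [120.0, 98.5, 143.2, 110.0],
    "GENE_B": [0.0, 1.2, 0.0, 0.4],    # essentially unexpressed, dropped
    "GENE_C": [15.0, 3.0, 22.1, 9.9],  # reaches the cutoff in two samples, kept
}
filtered = filter_low_expression(counts)
print(sorted(filtered))
```

The filtered table, not the full one, would then be the input to the analysis.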
Gene set permutation is much more permissive than phenotype permutation: it scrambles the genes in the sets, rather than the samples in the phenotypes, to construct the null distribution. This mode only really assesses how likely it is that a set of a given size would be enriched to that degree in the data, rather than how likely it is that a set was enriched in that phenotype compared to a random phenotype.
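To make the difference between the two null distributions concrete, here is a toy Python sketch. It is not the GSEA implementation: the set score is a simplified stand-in (mean difference of class means) rather than the real enrichment score, the p-value is a simple two-sided empirical fraction, and all function names and the simulated data are assumptions for illustration only.

```python
import random
from statistics import mean


def gene_metric(expr, labels):
    """Difference of class means per gene; expr maps gene -> per-sample values."""
    out = {}
    for gene, values in expr.items():
        a = [v for v, lab in zip(values, labels) if lab == "A"]
        b = [v for v, lab in zip(values, labels) if lab == "B"]
        out[gene] = mean(a) - mean(b)
    return out


def set_score(metric, gene_set):
    """Simplified stand-in for the enrichment score: mean metric over the set."""
    return mean(metric[g] for g in gene_set)


def phenotype_null(expr, labels, gene_set, n_perm, rng):
    """Shuffle sample labels, recompute every gene's metric, rescore the same set."""
    null = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        null.append(set_score(gene_metric(expr, shuffled), gene_set))
    return null


def gene_set_null(expr, labels, gene_set, n_perm, rng):
    """Keep the observed metric fixed; score random gene sets of the same size."""
    metric = gene_metric(expr, labels)
    genes = sorted(metric)
    return [set_score(metric, rng.sample(genes, len(gene_set))) for _ in range(n_perm)]


def empirical_p(observed, null):
    """Fraction of null scores at least as extreme as the observed score."""
    return sum(1 for s in null if abs(s) >= abs(observed)) / len(null)


rng = random.Random(0)
labels = ["A"] * 4 + ["B"] * 4
# Toy expression data: 50 genes of noise; the first 5 are shifted up in class A.
expr = {}
for i in range(50):
    shift = 2.0 if i < 5 else 0.0
    expr[f"g{i}"] = [rng.gauss(shift if lab == "A" else 0.0, 1.0) for lab in labels]
gene_set = [f"g{i}" for i in range(5)]

observed = set_score(gene_metric(expr, labels), gene_set)
p_pheno = empirical_p(observed, phenotype_null(expr, labels, gene_set, 200, rng))
p_set = empirical_p(observed, gene_set_null(expr, labels, gene_set, 200, rng))
print(f"observed={observed:.2f} phenotype p={p_pheno:.3f} gene_set p={p_set:.3f}")
```

Note that the phenotype null recomputes the per-gene metric under every label shuffle, so it depends on the samples themselves, while the gene set null only asks how unusual the observed score is among same-sized random sets in the one observed ranking.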
A reduction from 24658 to 23300 genes through collapsing the dataset is both reasonable and expected. Our Collapsing tool eliminates genes that don’t have an annotated symbol match and combines any gene symbols that are annotated by multiple IDs (e.g. an Ensembl gene on a patch contig and an Ensembl gene on the primary assembly will be combined). You can partially disable this behavior, but we do not recommend it. In the advanced fields section, you can set the “Omit features with no symbol match” parameter to “false” instead of its default “true”. This allows GSEA to keep any genes that did not match anything in the chip file, but any multiple ID-to-symbol mappings will still be combined (which is necessary).
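The collapsing behavior described above can be sketched in a few lines of Python. This is only an illustration, not the Collapsing tool's code: the function name, the `omit_unmatched` flag (mirroring the “Omit features with no symbol match” parameter), the placeholder IDs, and the per-sample maximum used to combine duplicate mappings are all assumptions for this example.

```python
def collapse_to_symbols(expr, chip, omit_unmatched=True, combine=max):
    """Collapse feature IDs to gene symbols.

    expr maps feature ID -> per-sample values; chip maps feature ID -> symbol.
    Features whose symbols collide are merged per sample with `combine`
    (max here is an assumption chosen for illustration).
    """
    collapsed = {}
    for feature_id, values in expr.items():
        symbol = chip.get(feature_id)
        if symbol is None:
            if omit_unmatched:
                continue  # drop features with no symbol match
            symbol = feature_id  # keep the feature under its original ID
        if symbol in collapsed:
            collapsed[symbol] = [combine(a, b) for a, b in zip(collapsed[symbol], values)]
        else:
            collapsed[symbol] = list(values)
    return collapsed


# Placeholder IDs and symbols, not real annotations.
expr = {"ID_1": [1, 5], "ID_2": [3, 2], "ID_3": [7, 7]}
chip = {"ID_1": "SYM_X", "ID_2": "SYM_X"}  # ID_3 has no symbol match

default_result = collapse_to_symbols(expr, chip)
keep_unmatched = collapse_to_symbols(expr, chip, omit_unmatched=False)
print(default_result)
print(keep_unmatched)
```

In both calls the two IDs that map to the same symbol end up as a single row; the flag only changes whether the unmatched feature survives.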
Finally, in the future, we’d ask that you create a new topic for issues specific to your dataset, so that the original posters of a topic aren’t notified of answers that aren’t relevant to their issue. Thanks!
-Anthony
Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego