Hi Josh,
It appears that this data was quantified at the individual transcript level. GSEA needs datasets that were quantified at the gene level. For that reason, we don't generally maintain CHIP files that map transcripts to genes, since GSEA doesn't really have the math to do that properly. My recommendation would be to go back to the original data and requantify it using the gene-level probe mappings rather than the transcript-level mappings. If you don't have that available, you might be able to get away with tricking GSEA into accepting the data. You would need to get the gene symbols from the mapping instead of the probe IDs. If you don't have the mapping, then I think this is it: https://gemma.msl.ubc.ca/arrays/showArrayDesign.html?id=871
What you might want to do is replace the probe IDs with their respective symbols, then use the Gene_Symbols chip file to collapse with the "mean_of_probes" or "median_of_probes" options. That would estimate each gene's expression by averaging (or taking the median of) its transcripts. I'm not 100% sure that would be an acceptable way of handling this. Alternatively, you could just leave it as "max_probe", and then the most highly expressed transcript will be taken as representative of the gene.
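To make the effect of those collapse options concrete, here is a minimal sketch in plain Python. It is not what GSEA runs internally, only an illustration of the idea. The function name `collapse_to_genes`, the `probe_to_symbol` mapping, and the toy IDs and values are all made up for the example, and it assumes you have already built a transcript-to-symbol mapping from the array annotation. The "max" option here takes the largest value in each sample, which is how I understand "max_probe" to behave, so treat that as an assumption.

```python
from collections import defaultdict
from statistics import mean, median


def collapse_to_genes(expression, probe_to_symbol, method="mean"):
    """Collapse transcript-level rows into one row per gene symbol.

    expression: dict of probe/transcript ID -> list of per-sample values
    probe_to_symbol: dict of probe/transcript ID -> gene symbol
                     (hypothetical; built from your own annotation)
    method: "mean", "median", or "max", applied sample by sample
    """
    reducers = {"mean": mean, "median": median, "max": max}
    reduce_values = reducers[method]

    # Group the rows that map to the same gene symbol.
    grouped = defaultdict(list)
    for probe_id, values in expression.items():
        symbol = probe_to_symbol.get(probe_id)
        if symbol:  # skip transcripts with no gene symbol
            grouped[symbol].append(values)

    # zip(*rows) walks the samples column by column, so each sample
    # is summarised independently across the gene's transcripts.
    return {
        symbol: [reduce_values(column) for column in zip(*rows)]
        for symbol, rows in grouped.items()
    }


# Toy data: two transcripts of GENE_A and one of GENE_B, two samples.
expression = {
    "t1": [2.0, 4.0],
    "t2": [6.0, 8.0],
    "t3": [1.0, 3.0],
}
probe_to_symbol = {"t1": "GENE_A", "t2": "GENE_A", "t3": "GENE_B"}

collapsed = collapse_to_genes(expression, probe_to_symbol, method="mean")
print(collapsed)
```

With the toy data, the two GENE_A transcripts are averaged within each sample, while GENE_B passes through unchanged because it has only one transcript.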
The proper way to do this would be to go back to the raw data and follow the established workflows for gene level microarray quantification.
Sorry I couldn't be of more help here.
-Anthony
Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego