AFFY Chip annotation mismatch

Josh Spin

May 9, 2022, 5:34:59 PM

to gsea-help

Hi GSEA folks,

This is my first time running GSEA with mouse Affy data (I've always used Agilent before).

There are a number of CHIP platform files available for download that map array probe IDs to human gene symbols. I looked at several of them, but none seem to match the Probe Set IDs that show up when I export the data from the chip I'm using.

This is the gene chip (it's a mouse whole-genome array):

#%array_type=MTA-1_0

#%annotation=MTA-1_0.r3.na36.mm10.a1.transcript.csv

The IDs run from TC0100000001.mm.1 to TSUnmapped00000186.mm.1.

There are 65,956 IDs in the dataset. (I note that several of the Probe Set IDs map to more than one gene symbol.)

Which CHIP file should I be using for the analysis?

Thanks much,

Josh Spin

Anthony Castanza

May 9, 2022, 5:47:47 PM

to gsea...@googlegroups.com

Hi Josh,

It appears that these data were quantified at the individual transcript level. GSEA needs datasets that were quantified at the gene level, so we don't generally maintain CHIP files that map transcripts to genes; GSEA doesn't really have the math to handle that properly. My recommendation would be to go back to the original data and requantify it using the gene-level probe mappings rather than the transcript-level mappings. If you don't have that available, you might be able to get away with tricking GSEA into accepting the data. You would need to get the gene symbols from the annotation mapping instead of using the probe IDs. If you don't have them, then I think this is it: https://gemma.msl.ubc.ca/arrays/showArrayDesign.html?id=871
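As an illustration of the "get the gene symbols from the mapping" step, here is a minimal sketch of turning an annotation table into a one-symbol-per-probe lookup. It assumes the annotation has already been loaded as a dict of probe ID to raw symbol string, that multiple symbols are joined by "///" and that "---" marks a missing value (both assumptions about the annotation file's layout). The function name `map_probes_to_symbols` is hypothetical. Dropping probe sets that map to more than one distinct gene is just one conservative choice for the multi-symbol IDs mentioned in the question, not an official GSEA rule.

```python
def map_probes_to_symbols(annotation, sep="///", missing="---"):
    """Build a probe ID -> single gene symbol mapping (hypothetical helper).

    annotation: dict mapping probe ID -> raw symbol string (or None).
    sep: assumed separator between multiple symbols in one field.
    missing: assumed placeholder for "no symbol".
    Probe sets with no symbol, or with more than one distinct symbol,
    are dropped so that every kept row stands for exactly one gene.
    """
    mapping = {}
    for probe, raw in annotation.items():
        # Split the raw field, trim whitespace, and ignore empty/missing entries.
        symbols = {
            s.strip()
            for s in (raw or "").split(sep)
            if s.strip() and s.strip() != missing
        }
        if len(symbols) == 1:
            mapping[probe] = symbols.pop()
    return mapping
```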

What you might want to do is replace the probe IDs with their respective symbols, then use the Gene_Symbols CHIP file to collapse with the "mean_of_probes" or "median_of_probes" option. That would estimate each gene's expression by averaging (or taking the median of) its transcripts. I'm not 100% sure that would be an acceptable way of handling this. Alternatively, you could just leave it as "max_probe", and then the highest expressed transcript will be taken as representative for the gene.
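To make the three collapse options concrete, the sketch below groups probe-level rows by gene symbol and aggregates them sample by sample with a mean, median, or max. It only approximates the idea behind GSEA's "mean_of_probes", "median_of_probes" and "max_probe" modes; it is not GSEA's own code, and the per-sample max is an assumption about how "max_probe" behaves. The function name `collapse_by_symbol` and the dict-based data layout are hypothetical.

```python
from statistics import mean, median


def collapse_by_symbol(expr, probe_to_symbol, mode="max"):
    """Collapse probe-level rows into one row per gene symbol (sketch).

    expr: dict mapping probe ID -> list of per-sample expression values.
    probe_to_symbol: dict mapping probe ID -> gene symbol.
    mode: "mean", "median", or "max", applied separately to each sample.
    """
    aggregators = {"mean": mean, "median": median, "max": max}
    if mode not in aggregators:
        raise ValueError(f"unknown mode: {mode}")
    aggregate = aggregators[mode]

    # Group probe rows under their gene symbol; unmapped probes are skipped.
    groups = {}
    for probe, values in expr.items():
        symbol = probe_to_symbol.get(probe)
        if symbol is None:
            continue
        groups.setdefault(symbol, []).append(values)

    # For each symbol, aggregate column-wise so there is one value per sample.
    return {
        symbol: [aggregate(column) for column in zip(*rows)]
        for symbol, rows in groups.items()
    }
```

With two probes for one gene showing values [2, 5] and [4, 1] across two samples, "mean" and "median" both give [3, 3], while "max" gives [4, 5], which mixes the two probes. That is why the choice of mode matters when transcripts of the same gene disagree.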

The proper way to do this would be to go back to the raw data and follow the established workflows for gene-level microarray quantification.

Sorry I couldn't be of more help here.

-Anthony

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego
