Help with formatting Agilent microarray data for GSEA analysis


Darren Mok

Mar 1, 2021, 1:01:39 AM
to gsea-help

Hello,

I would like to ask for assistance with formatting data files for GSEA analysis. I have microarray data obtained using the Agilent mRNA Microarray v3.0 8x60k chip and I would like to format it for use with GSEA. However, I am unsure as to how to go about this.

Has anyone ever done this before? Any help would be greatly appreciated!

Thank you and best regards,
Darren

Darren Mok

Mar 1, 2021, 1:05:26 AM
to gsea-help
Just to update: the microarray platform and type is the Agilent SurePrint G3 Gene Expression Microarrays for Human (v3) 8x60k, 1 color.

Anthony Castanza

Mar 1, 2021, 1:20:25 AM
to gsea...@googlegroups.com
Hi Darren,

Unfortunately, of the Agilent SurePrint G3 GE 8x60k series, we're only able to support direct analysis of the v1 and v2 probe arrays, as these are the only two with annotation information deposited in Ensembl's BioMart.

However, if you can collapse the probe IDs in your dataset to their corresponding gene symbols using Agilent's array mapping files (most arrays have these symbols added by the standard analysis pipelines, or you may be able to find the array's annotations somewhere in GEO), you should be able to use our Human Gene Symbol with Remapping chip to ensure that the symbols are harmonized with the MSigDB version you want to use for your analysis.

Let me know if you have any questions about this process and I'll see what I can do to assist you.
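As an illustration of the collapsing step described above, here is a minimal Python sketch. It assumes the expression data and the probe-to-symbol mapping are already loaded into dictionaries. The function name, the probe IDs, and the gene symbols are hypothetical, and taking the per-sample maximum across probes is only one possible summary (a median or mean would be equally plausible).

```python
from collections import defaultdict


def collapse_to_symbols(expression, probe_to_symbol):
    """Collapse probe-level rows to one row per gene symbol.

    expression: dict mapping probe ID -> list of per-sample values.
    probe_to_symbol: dict mapping probe ID -> gene symbol ("" if unannotated).
    For each symbol, the per-sample maximum across its probes is kept
    (an assumption made for this sketch).
    """
    grouped = defaultdict(list)
    for probe, values in expression.items():
        symbol = probe_to_symbol.get(probe, "")
        if symbol:  # probes without a gene symbol are dropped
            grouped[symbol].append(values)
    return {
        symbol: [max(column) for column in zip(*rows)]
        for symbol, rows in grouped.items()
    }


# Toy data: probe IDs, symbols, and values are made up for illustration.
expression = {
    "A_1": [5.0, 7.0],
    "A_2": [6.0, 4.0],
    "A_3": [1.0, 2.0],
    "A_4": [9.0, 9.0],  # no annotation, so it is dropped
}
probe_to_symbol = {"A_1": "GENE1", "A_2": "GENE1", "A_3": "GENE2"}

collapsed = collapse_to_symbols(expression, probe_to_symbol)
```

In this toy example the two GENE1 probes are merged into a single row, and the unannotated probe disappears from the result.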

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego


Darren Mok

Mar 1, 2021, 2:48:20 AM
to gsea-help
Hi Anthony,

I do have the annotation and gene list ID files for the chip set, but I am not sure how to collapse the probe IDs to their corresponding gene symbols in Agilent's array. I have attached the files here for your review, should you require them.

Darren Mok

Mar 1, 2021, 2:48:51 AM
to gsea-help
annotation.txt

Darren Mok

Mar 1, 2021, 2:50:07 AM
to gsea-help
genelist.txt.zip

Anthony Castanza

Mar 1, 2021, 1:15:49 PM
to gsea...@googlegroups.com

Hi Darren,

 

You should be able to annotate gene symbols in your array using the information in the gene list file. Has this array been processed already (background correction, normalization, etc.)? If so, and assuming you already have the data for all your samples in one file, you should be able to add the gene symbols with a simple intersection between the two files. One way to do this would be with something like the R merge() command. Another option might be the “join two files” tool on a Galaxy server (https://usegalaxy.org/), which should allow you to add the information you need non-programmatically. If your data is still in separate files, you’ll want to combine them first on the basis of the probe IDs, but from my understanding this should happen as part of the standard microarray analysis pipeline so that they are properly normalized. Unfortunately, we can’t offer more specific help on the pre-processing pipeline for the Agilent array platform itself.
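For readers who prefer a script, the intersection described above could be sketched in Python roughly as follows. This is only an illustration: the column names (ProbeID, GeneSymbol, Sample1, Sample2) and the toy rows are assumptions, since the layout of the attached files is not shown in this thread. Keeping only probes present in both files mirrors the default behavior of R's merge().

```python
import csv
import io


def read_symbol_map(handle, id_col, symbol_col):
    """Read a tab-delimited gene list into {probe ID: gene symbol}."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Rows with an empty symbol are skipped.
    return {row[id_col]: row[symbol_col] for row in reader if row[symbol_col]}


def annotate(expr_handle, symbol_map, id_col):
    """Keep expression rows whose probe ID has a symbol and attach it."""
    reader = csv.DictReader(expr_handle, delimiter="\t")
    annotated = []
    for row in reader:
        symbol = symbol_map.get(row[id_col])
        if symbol is not None:
            annotated.append({"GeneSymbol": symbol, **row})
    return annotated


# Toy stand-ins for the gene list and the normalized expression table.
# All column names and values here are hypothetical.
gene_list = io.StringIO(
    "ProbeID\tGeneSymbol\nA_1\tGENE1\nA_2\tGENE2\nA_3\t\n"
)
expression = io.StringIO(
    "ProbeID\tSample1\tSample2\nA_1\t5.0\t7.0\nA_2\t1.0\t2.0\nA_4\t9.0\t9.0\n"
)

symbol_map = read_symbol_map(gene_list, "ProbeID", "GeneSymbol")
annotated = annotate(expression, symbol_map, "ProbeID")
```

With real files, the io.StringIO objects would be replaced by open(path, newline="") handles and the column names adjusted to match the actual headers.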

 

After that, your array would need to be reformatted into GCT format (you could do this in Excel; the format specification is here: https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GCT:_Gene_Cluster_Text_file_format_.28.2A.gct.29) and run in GSEA using the Human_Gene_Symbol with Remapping chip file.
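As an alternative to Excel, the GCT layout can also be written with a short script. The sketch below follows the layout as I understand it from the linked specification: a "#1.2" version line, a line with the row and sample counts, a header starting with NAME and Description, then one tab-delimited row per feature. The function name and the toy data are hypothetical, and "na" is used as a placeholder description.

```python
import io


def write_gct(handle, sample_names, rows):
    """Write rows of (name, description, values) in GCT v1.2 layout."""
    handle.write("#1.2\n")
    # Second line: number of data rows, then number of samples.
    handle.write(f"{len(rows)}\t{len(sample_names)}\n")
    handle.write("\t".join(["NAME", "Description", *sample_names]) + "\n")
    for name, description, values in rows:
        if len(values) != len(sample_names):
            raise ValueError(f"{name}: expected {len(sample_names)} values")
        handle.write("\t".join([name, description, *map(str, values)]) + "\n")


# Toy data: identifiers and values are made up for illustration.
samples = ["Sample1", "Sample2"]
rows = [
    ("GENE1", "na", [6.0, 7.0]),
    ("GENE2", "na", [1.0, 2.0]),
]

buffer = io.StringIO()
write_gct(buffer, samples, rows)
gct_text = buffer.getvalue()
```

To produce a file on disk, pass an open(path, "w") handle instead of the io.StringIO buffer.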

 

-Anthony

 


Anthony Castanza

Mar 1, 2021, 1:33:33 PM
to gsea...@googlegroups.com

Hi Darren,

 

Go ahead and disregard my last message. I realized it would probably be easier if I just used the file you sent to construct a chip file for this array, using the same process we use to remap the Clariom array annotations. That said, I’m offering this file without any guarantees or formal support (meaning that if genes you expect to see aren’t present, there isn’t really anything I can do about it). But it should allow you to analyze the SurePrint G3 GE 8x60k v3 array probe IDs directly in GSEA without any outside remapping steps or any other CHIP files.
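For context, a CHIP file is a tab-delimited table that maps probe IDs to gene symbols. The Python sketch below shows how a basic one could be assembled from an annotation table, assuming the three-column header (Probe Set ID, Gene Symbol, Gene Title) described on the GSEA data formats page. It is not the process used to build the attached file: in particular, it does no remapping of symbols to an MSigDB version, and the function name and toy annotations are hypothetical.

```python
import csv
import io


def write_chip(handle, annotations):
    """Write (probe_id, symbol, title) tuples as a tab-delimited CHIP table.

    Probes without a gene symbol are skipped. Returns the number of rows
    written (excluding the header).
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Probe Set ID", "Gene Symbol", "Gene Title"])
    written = 0
    for probe_id, symbol, title in annotations:
        if not symbol:
            continue
        writer.writerow([probe_id, symbol, title or "na"])
        written += 1
    return written


# Toy annotations: probe IDs, symbols, and titles are made up.
annotations = [
    ("A_1", "GENE1", "hypothetical gene 1"),
    ("A_2", "GENE2", ""),
    ("A_3", "", "unannotated probe"),
]

buffer = io.StringIO()
count = write_chip(buffer, annotations)
chip_text = buffer.getvalue()
```

A file built this way would still carry whatever symbols the vendor annotation uses, so they may not match the symbols in a given MSigDB release.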

 

Hope this helps! Let me know if you have other questions,

 

-Anthony

 


http://gsea-msigdb.org/

 

Agilent_SurePrint_G3_GE 8x60k_v3_to_Gene_Symbol_with_Remapping_MSigDB.v7.2.chip

Darren Mok

Mar 7, 2021, 7:01:16 PM
to gsea-help
Oh my goodness, thank you so much Anthony! I will try this out!