Clariom S chip


j serrano

Jul 26, 2024, 1:37:42 PM
to gsea-help

Good afternoon,

We wanted to use the GSEA desktop application to analyze output from a mouse Clariom S array.

I noticed that your legacy chip (Clariom_S_Mouse.r1_MSigDB.v7.1_REMAPPED_PATCH) contains all-uppercase symbols, which do not match any mouse gene set, so I corrected the case in Excel (not a big deal). Otherwise the program gives errors.

However, I started to compare the original mapping from Thermo with yours and I found some discrepancies.

  • 5068 missing assignments (NA), where there are only 3 in Thermo’s
  • 2101 mismatched assignments (different gene names) between the chip and Thermo’s

Could you help me understand how the chip was created?

For the mismatches, can you tell me whether you were using human orthologs, and whether the gene sets use those or the mouse IDs?

Still, 5068 missing assignments looks concerning.

 

Joan

Anthony Castanza

Jul 26, 2024, 2:33:26 PM
to gsea-help
Hi Joan,

The Clariom chips for mouse are indeed ortholog mapped to Human Gene Symbols.
That particular version of the chip is quite old. The current version is: https://data.broadinstitute.org/gsea-msigdb/msigdb/annotations/human/Mouse_Clariom_S.r1_Human_Orthologs_MSigDB.v2023.2.Hs_REMAPPED_PATCH.chip
However, this is indeed still mapped to human orthologs and intended to be used against the human collections. Our ortholog mapping process is quite strict, and based on the Alliance of Genome Resources orthologs dataset.
I do not recommend converting this by case-correcting the gene symbols to the mouse standard format. I would not consider this to be a valid backwards-mapping to use with the mouse collections.

Unfortunately, because the Thermo data is proprietary, we needed a one-time exception to access the raw data and construct an initial set of mappings, which we've been carrying forward by remapping to each new database version. However, we don't necessarily recommend using these Clariom chips if you have your own access to the underlying data.
In that case, my recommendation is to construct your own chip file from the Clariom data, use the separate Collapse Dataset tool in GSEA to collapse your dataset to an identifier type we do support (such as Ensembl IDs or gene symbols), and then run GSEA with our chip file for that identifier type to ensure the symbols match our database symbols.
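
As a rough illustration of the first step (building your own chip file), here is a minimal Python sketch. It assumes the annotation has already been loaded as rows with columns named probeset_id, gene_symbol and gene_title. Those column names, the placeholder values treated as missing, and the probe IDs in the usage lines are hypothetical, so adjust them to whatever your Thermo file actually contains. The output uses the tab-delimited CHIP layout with the header Probe Set ID, Gene Symbol, Gene Title.

```python
import io

CHIP_HEADER = ("Probe Set ID", "Gene Symbol", "Gene Title")
# Assumption: values that mean "no gene assigned" in the vendor annotation.
MISSING = {"", "NA", "---"}


def write_chip(annotation_rows, out,
               probe_col="probeset_id",    # hypothetical column name
               symbol_col="gene_symbol",   # hypothetical column name
               title_col="gene_title"):    # hypothetical column name
    """Write a tab-delimited CHIP table to `out`; return the number of probes written.

    Probes without a gene symbol are skipped, since they cannot map to any gene set.
    """
    out.write("\t".join(CHIP_HEADER) + "\n")
    written = 0
    for row in annotation_rows:
        symbol = (row.get(symbol_col) or "").strip()
        if symbol in MISSING:
            continue
        title = (row.get(title_col) or "").strip() or "NA"
        # Tabs inside a title would break the column layout, so replace them.
        fields = (row[probe_col].strip(), symbol, title.replace("\t", " "))
        out.write("\t".join(fields) + "\n")
        written += 1
    return written


# Usage with made-up rows; in practice, read them from the annotation file
# and pass an open file handle instead of a StringIO buffer.
example_rows = [
    {"probeset_id": "PROBE_0001", "gene_symbol": "Actb", "gene_title": "actin, beta"},
    {"probeset_id": "PROBE_0002", "gene_symbol": "---", "gene_title": "---"},
]
buffer = io.StringIO()
print(write_chip(example_rows, buffer), "probes written")
```

The symbols are written exactly as they appear in the annotation, with no case conversion.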

Let me know if you have any additional questions here,

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego

j serr

Jul 26, 2024, 3:18:38 PM
to gsea-help
Hi Anthony,

Thanks for your quick response.
We intended to use it against the mouse collections, so we'll try to build our own chip.

A small question in that regard: I have some probes with two gene names separated by a semicolon (Mir206; Mir133b), while the chip file you uploaded uses a triple bar (Slc12a6 /// katnbl1). Could you tell me what GSEA would do in each case? Are the triple-bar cases treated as 1 gene, 2 genes, or 1 gene with additional information? I couldn't find this in the guide.

Joan

Anthony Castanza

Jul 27, 2024, 12:12:50 AM
to gsea-help
Hi Joan,

GSEA treats each row of the first column as a single entity for mapping; it doesn't split on either the semicolon or the triple bar. There really isn't a good way to handle these ambiguous IDs in GSEA. None of the gene sets in MSigDB contain these multi-genes anyway, so they can contribute to the gene universe and the null distribution but never to a gene set. In most cases it is safe to omit them, since the genes they list are generally better represented by other, non-ambiguous probes, and omitting them avoids inflating the gene universe.
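
If you do decide to omit them, a small Python sketch along these lines could drop the multi-gene probes before you write your chip. This is only an illustration, not anything GSEA does internally: the function names are hypothetical, and it assumes the only separators in your annotation are the semicolon and the triple bar from your examples.

```python
import re

# Assumption: ";" and "///" are the only multi-gene separators in the annotation.
_SEPARATOR = re.compile(r"\s*(?:///|;)\s*")


def split_symbols(field):
    """Split a gene-symbol field into its individual, non-empty symbols."""
    return [s for s in _SEPARATOR.split(field.strip()) if s]


def drop_ambiguous(probe_to_symbol):
    """Keep only probes that map to exactly one gene symbol.

    Probes with several symbols, and probes with an empty symbol, are removed.
    """
    return {
        probe: symbol.strip()
        for probe, symbol in probe_to_symbol.items()
        if len(split_symbols(symbol)) == 1
    }


# Usage with made-up probe IDs and the symbols mentioned in this thread.
mapping = {
    "PROBE_A": "Mir206; Mir133b",
    "PROBE_B": "Slc12a6 /// katnbl1",
    "PROBE_C": "Actb",
}
print(drop_ambiguous(mapping))
```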

Let me know if you have more questions.

-Anthony

