Inquiry About Metadata Harmonization in cBioPortal


Xiaodong Pang

Aug 8, 2025, 5:54:05 PM
to cbiop...@googlegroups.com
Dear cBioPortal Team,

I hope this email finds you well. I’ve been exploring the metadata in various studies on cBioPortal and noticed some variability in how certain fields are labeled—for example, some studies use "gender" while others use "sex" to describe similar patient attributes. I’m curious to learn more about how metadata harmonization is currently handled in cBioPortal.

Specifically:
1. Could you share any documentation or guidelines on how metadata fields are standardized across studies?
2. Are there plans to further harmonize inconsistent metadata terms (e.g., "gender" vs. "sex") in the future?
3. If this is an ongoing challenge, would the team be open to collaboration or suggestions for improving metadata harmonization? Our team has developed a powerful method for metadata standardization that could potentially enhance consistency across studies, and we’d be happy to discuss how it might integrate with cBioPortal’s workflows.

Thank you for your time and for maintaining such a valuable resource for the research community. I’d be grateful for any insights you can share.

Best regards,
Xiaodong Pang, PhD
Insilicom LLC
xp...@insilicom.com

Benjamin Gross

Aug 8, 2025, 5:57:00 PM
to Xiaodong Pang, cBioPortal for Cancer Genomics Discussion Group
Hi Xiaodong,

We have a curation team that works on harmonizing metadata across cBioPortal studies. I’ve included Ritika Kundra in this email. She leads our curation team and can provide more information as well as discuss potential collaborations.

Best,
-Benjamin

Xiaodong Pang

Aug 12, 2025, 11:44:45 AM
to Benjamin Gross, cBioPortal for Cancer Genomics Discussion Group, Jinfeng Zhang
Dear Benjamin and Ritika,

Thank you for your prompt response—I’m delighted to connect and learn more about cBioPortal’s curation process.

Our team has developed the IDEAL platform (https://ideal.insilicom.com), which includes tools for metadata harmonization, data submission, and exploration. Specifically, our metadata harmonization pipeline is an end-to-end solution for standardizing unstructured/structured text into any target data model. For example:

We’ve harmonized GEO metadata to a subset of clinical variables derived from the GDC framework.

We’ve also applied this pipeline to all cBioPortal studies, enabling natural-language-based querying with consistent filters (e.g., resolving "gender" to "sex" for uniformity). You can explore this here: IDEAL Data Query.

The harmonization tools are available via:



We’d be happy to learn more about cBioPortal’s curation process, demo these tools, and discuss how they might complement cBioPortal’s curation efforts. Looking forward to your thoughts!

Best regards,
Xiaodong