fields populated by source data vs populated by cbp efforts


Heather Costello

Feb 25, 2026, 3:56:19 AM
to cBioPortal for Cancer Genomics Discussion Group
Hello,

My name is Heather Costello, and I am working as part of a team at the Nationwide Children's Hospital to build a metaknowledgebase of cancer variants found in pediatric cancers. I am using all of the pediatric studies from your database. 

My current question is: of the data in the mutations, clinical sample, and clinical patient files, which fields were taken from the original source data, and which were populated while the resource was being built (through normalizers and the like)? At the moment, I'm particularly interested in the "ONCOTREE_CODE" column from the clinical sample file, although I would ideally like to know the original source of all "descriptive" columns in the data.

Is there someplace I can go to get this information? 

Thank you,

Heather Costello, PhD
Abigail Wexner Research Institute
Nationwide Children's Hospital
Columbus, OH 43205


Jessica Singh

Feb 25, 2026, 7:42:04 AM
to Heather Costello, cBioPortal for Cancer Genomics Discussion Group

Hi Heather,

Thanks for reaching out. This is a great question, especially given that you are building a meta-knowledge base for pediatric cancer.

In general, cBioPortal does not generate primary data. The mutation, clinical sample, and clinical patient files are derived from the original study data, which typically come from publications, supplementary materials, or public repositories. That said, during ingestion, certain fields may be harmonized or standardized to enable consistent comparison across studies within the portal.

Regarding the ONCOTREE_CODE column specifically, tumor type annotations are often mapped to OncoTree during curation. In many cases, the original study provides free-text histology or diagnosis labels, and these are then mapped to standardized OncoTree terms as part of the data processing workflow. So the underlying diagnosis comes from the source study, but the OncoTree code itself may be assigned during ingestion.
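
One way to see how a study's free-text diagnoses were mapped is to join the source labels to the portal's ONCOTREE_CODE values by sample ID. The following is a minimal sketch, not the portal's own curation code. It assumes you have already extracted a dictionary of sample ID to free-text label from the publication's supplementary table (the `source_labels` argument and the function name are hypothetical), and that the portal file has `SAMPLE_ID` and `ONCOTREE_CODE` columns. The sample IDs and labels in the example are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def codes_by_source_label(portal_handle, source_labels):
    """Group ONCOTREE_CODE values by the source study's free-text label.

    portal_handle: open text file in cBioPortal clinical-sample format.
    source_labels: hypothetical dict of SAMPLE_ID -> free-text diagnosis
                   taken from the original study's own tables.
    """
    # Skip the '#'-prefixed metadata rows at the top of the file.
    data_lines = (line for line in portal_handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")

    mapping = defaultdict(set)
    for row in reader:
        label = source_labels.get(row["SAMPLE_ID"])
        if label is not None:
            # Normalize case and whitespace so spelling variants group together.
            mapping[label.strip().lower()].add(row["ONCOTREE_CODE"])
    return dict(mapping)


# Illustrative data only; these rows do not come from a real study.
portal_text = (
    "#Sample Identifier\tOncotree Code\n"
    "SAMPLE_ID\tONCOTREE_CODE\n"
    "S1\tNBL\n"
    "S2\tNBL\n"
    "S3\tWT\n"
)
source_labels = {"S1": "Neuroblastoma", "S2": "neuroblastoma ", "S3": "Wilms tumor"}

result = codes_by_source_label(io.StringIO(portal_text), source_labels)
for label, codes in sorted(result.items()):
    print(label, sorted(codes))
```

A label that maps to more than one code, or a code that appears for samples with no source label, is a hint that the value was assigned or adjusted during curation rather than copied from the source.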

For provenance of other descriptive clinical fields, the best places to look are:

  • The original study pages and linked publications/supplementary materials (these usually describe what clinical annotations were provided)

  • The cBioPortal DataHub repository, which contains the import-ready files and metadata for many public studies, including clinical attribute definitions and values as used in the portal
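
The clinical attribute definitions mentioned above live in the clinical data files themselves: each file conventionally starts with '#'-prefixed metadata rows (display name, description, datatype, priority) followed by the row of column names. The sketch below reads those rows into a per-column dictionary. The function name is hypothetical, the four-row layout is an assumption (some older files carry a different number of metadata rows), and the example content is invented.

```python
import io


def read_clinical_attribute_metadata(handle):
    """Return {column_name: {display_name, description, datatype, priority}}.

    Assumes the metadata rows appear in that order; files with a
    different layout will need the key list adjusted.
    """
    metadata_rows = []
    columns = None
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith("#"):
            # Drop the leading '#' and split the tab-separated values.
            metadata_rows.append(line[1:].split("\t"))
        else:
            # First non-comment line holds the attribute (column) names.
            columns = line.split("\t")
            break
    if columns is None:
        raise ValueError("no column header row found")

    keys = ["display_name", "description", "datatype", "priority"]
    metadata = {}
    for i, column in enumerate(columns):
        metadata[column] = {
            key: row[i] for key, row in zip(keys, metadata_rows) if i < len(row)
        }
    return metadata


# Invented example content in the clinical-sample layout.
sample_text = (
    "#Patient Identifier\tSample Identifier\tOncotree Code\n"
    "#Patient identifier\tSample identifier\tOncotree code\n"
    "#STRING\tSTRING\tSTRING\n"
    "#1\t1\t1\n"
    "PATIENT_ID\tSAMPLE_ID\tONCOTREE_CODE\n"
    "P1\tS1\tNBL\n"
)

meta = read_clinical_attribute_metadata(io.StringIO(sample_text))
print(meta["ONCOTREE_CODE"]["description"])
```

The descriptions only state what a column means in the portal. They do not record where the values came from, so the linked publication remains the reference for provenance.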

If it helps, feel free to share a short list of the specific clinical attributes you are most interested in tracing (beyond ONCOTREE_CODE). If you can also share one or two representative study IDs from the portal as examples, I can point you to the most likely source references and where those fields are typically introduced or standardized.

If you have any follow-up questions, please Reply All so that our continued conversation is captured on the cBioPortal Google Group. 

Best,
Jessica

Jessica Singh

Solutions Consultant - Genomics and Target Discovery Team

E jes...@thehyve.nl

T +31 30 700 9713





