inconsistencies between cohorts

Laura Parrilla Monge

Jun 8, 2026, 1:37:19 PM

to cbiop...@googlegroups.com

To whom it may concern,

My name is Laura and I am using cBioPortal data as part of my current project on liver cancer and TP53.

I have noticed that two of the liver cancer cohorts have the exact same patients, TCGA Firehose Legacy and TCGA GDC 2025.

When I select one cohort and query my favorite gene, TP53, 9 patients (list below) are called "homozygous deletion" in the Firehose Legacy cohort (meaning they have lost both copies of the gene). Surprisingly, the same patients get different calls in the GDC 2025 cohort: they show only one mutation with a shallow deletion (only one copy lost), and 2 of them no longer show any alteration at all.

Can you please explain to me the reason for this discrepancy? I read the algorithm was changed but I really have a hard time understanding how these patients would recover the gene.

Thank you in advance,

Laura

p.s. These are the 9 patients that I refer to

--

Laura Parrilla-Monge, PhD

Department of Pathology

MART Building 8M-0301

SUNY Stony Brook

Stony Brook NY 11794-8691, USA

Phone: 4-3510

Mail: laura.par...@stonybrook.edu

Tali Mazor

Jun 9, 2026, 4:30:01 PM

to Laura Parrilla Monge, cbiop...@googlegroups.com

Hi Laura - The data processing differences between the Firehose Legacy vs GDC can indeed have a significant impact on the copy number calls. You can see a more detailed answer here: https://groups.google.com/g/cbioportal/c/Ph_ch2-bwD0/m/dgIBqQtTAQAJ
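For anyone who wants to check this kind of discrepancy on their own patient list, here is a minimal sketch in Python. It assumes you have already exported the discrete copy number calls for one gene from each study as a patient-to-value mapping, using the -2/-1/0/1/2 coding of cBioPortal discrete copy number data. The function name `discordant_calls` and the patient IDs are hypothetical placeholders, not real data and not part of any cBioPortal API.

```python
# Discrete copy-number codes as used in cBioPortal discrete CNA data.
CNA_LABELS = {
    -2: "deep (homozygous) deletion",
    -1: "shallow deletion",
    0: "diploid",
    1: "gain",
    2: "amplification",
}


def discordant_calls(calls_a, calls_b):
    """Return {patient: (call_a, call_b)} for patients present in both
    mappings whose discrete CNA call differs between the two studies."""
    shared = calls_a.keys() & calls_b.keys()
    return {
        patient: (calls_a[patient], calls_b[patient])
        for patient in sorted(shared)
        if calls_a[patient] != calls_b[patient]
    }


# Hypothetical placeholder values, NOT real patient data.
firehose_tp53 = {"PATIENT-A": -2, "PATIENT-B": -2, "PATIENT-C": 0}
gdc_tp53 = {"PATIENT-A": -1, "PATIENT-B": 0, "PATIENT-C": 0}

diffs = discordant_calls(firehose_tp53, gdc_tp53)
for patient, (old, new) in diffs.items():
    print(f"{patient}: {CNA_LABELS[old]} -> {CNA_LABELS[new]}")
```

A table like this only shows where the two pipelines disagree; it does not say which call is correct.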

-Tali


Tali Mazor

Jun 10, 2026, 10:32:49 AM

to Laura Parrilla Monge, cBioPortal for Cancer Genomics Discussion Group

Hi Laura,

All pipelines have their pros and cons. If you are more comfortable with ASCAT, you can certainly use the GDC data. We generally recommend the PanCancer Atlas studies for several reasons, including that they have the most data types available and that the data was manually reviewed during the PanCancer Atlas analysis process. There is also a difference in reference genome build to consider: GDC data is hg38, while PanCancer Atlas (and most other studies in cBioPortal) is hg19.

Please include the google group on future replies so others can benefit from this conversation.

-Tali

On Wed, Jun 10, 2026 at 12:45 AM Laura Parrilla Monge <laura.par...@stonybrook.edu> wrote:

Hi Tali,
Thank you for your fast response. However, your answer prompts more questions. The newer GDC study is based on ASCAT processing, which is better for copy number variation analysis.
Why do the experts advise continuing to use the older processing?
Given that ASCAT was used, what happens with the RNAseq data analysis? Is it not trustworthy?

Thank you for your answers,
Laura
