"Log2 copy-number values" are not

Steven Schumacher

Dec 7, 2015, 10:41:30 PM
to cbiop...@googlegroups.com
Hi CBioPortal Staff,

First let me say that the CBioPortal is an absolutely stunning piece of work. Nice job!

I am writing to point out, however, that the gene-level SNP 6.0 copy number values one obtains for downloaded TCGA data are inaccurately labeled as "Log2 copy-number values" when they are, in fact, relative linear copy number values. To transform them to median-centered log2 ratio data, you would apply the formula

    L2R = log2(RCN+2) - 1  (Note that 0 <=> 0 in either unit)

This is an unfortunate misunderstanding that I believe has been propagated by outdated GISTIC 2.0 documentation. I think CBio should either transform the data as above or change the portal description so that users will draw accurate conclusions from these data.
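
For anyone converting downloaded values by hand, the formula above can be sketched in Python as follows. This is a minimal illustration, not code from GISTIC or cBioPortal: the function names are hypothetical, and the inverse function is simply the same formula solved for RCN.

```python
import math


def rcn_to_log2_ratio(rcn: float) -> float:
    """Convert a relative linear copy number (RCN) to a log2 ratio (L2R).

    Implements L2R = log2(RCN + 2) - 1, so RCN = 0 maps to L2R = 0.
    """
    if rcn <= -2.0:
        # log2 is undefined for non-positive arguments
        raise ValueError("RCN must be greater than -2")
    return math.log2(rcn + 2.0) - 1.0


def log2_ratio_to_rcn(l2r: float) -> float:
    """Inverse of the formula above: RCN = 2^(L2R + 1) - 2."""
    return 2.0 ** (l2r + 1.0) - 2.0


if __name__ == "__main__":
    for rcn in (-1.0, 0.0, 2.0):
        print(rcn, rcn_to_log2_ratio(rcn))
```

Values at or below -2 have no log2 ratio under this formula, so the sketch raises an error for them rather than returning a number.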

Thanks,
Steve Schumacher
GISTIC Developer

Nikolaus Schultz

Dec 8, 2015, 11:11:09 PM
to cbiop...@googlegroups.com, Steven Schumacher
Hi Steve,

Thank you for the kind words, and for making us aware of this. I don’t think we want to change the data and transform it, so we should just rename it. We will try to do this soon.

Niki.




Amanda

Nov 23, 2016, 1:29:17 PM
to cBioPortal for Cancer Genomics Discussion Group, sc...@broadinstitute.org
Hi

I see that cBioPortal has renamed the copy number data to "linear CNA". Am I right in understanding that the values in the file represent the average of the segmented copy-number scores (GISTIC 2.0) over the genomic interval between the transcription start and end sites, and that they can be used as a gene-based somatic copy-number measure? Thank you!

Best wishes,
Amanda

Steven Schumacher

Nov 23, 2016, 4:14:07 PM
to Amanda, cBioPortal for Cancer Genomics Discussion Group
Hi Amanda,

You are approximately right. The units are not the focal copy number score used as a statistic by GISTIC, but just the median-centered relative copy ratio. A few years back we would average the copy number from the transcription start to end sites, but currently we take the most extreme copy number value found in the transcript: the lowest value for a deletion or the highest value for an amplification.
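
To make the difference between the two gene-level summaries concrete, here is a rough Python sketch. It is an illustration only, not GISTIC's implementation: the function names and the (start, end, value) segment layout are hypothetical, the mean is weighted by overlap length as one plausible reading of "average", and "most extreme" is assumed to mean the value farthest from zero.

```python
from typing import Iterable, List, Tuple

Segment = Tuple[int, int, float]  # (start, end, copy number value), end exclusive


def overlapping(segments: Iterable[Segment], gene_start: int,
                gene_end: int) -> List[Tuple[int, float]]:
    """Return (overlap_length, value) for each segment that overlaps the gene."""
    hits = []
    for start, end, value in segments:
        length = min(end, gene_end) - max(start, gene_start)
        if length > 0:
            hits.append((length, value))
    if not hits:
        raise ValueError("no segment overlaps the gene")
    return hits


def gene_mean(segments: Iterable[Segment], gene_start: int, gene_end: int) -> float:
    """Older behaviour described above: average over the transcript.

    Assumption: the average is weighted by overlap length.
    """
    hits = overlapping(segments, gene_start, gene_end)
    total = sum(length for length, _ in hits)
    return sum(length * value for length, value in hits) / total


def gene_extreme(segments: Iterable[Segment], gene_start: int, gene_end: int) -> float:
    """Current behaviour described above: the most extreme value in the transcript.

    Assumption: "most extreme" means farthest from zero.
    """
    hits = overlapping(segments, gene_start, gene_end)
    return max((value for _, value in hits), key=abs)


if __name__ == "__main__":
    segs = [(0, 100, 0.0), (100, 200, -1.2), (200, 300, 0.5)]
    print(gene_mean(segs, 50, 250), gene_extreme(segs, 50, 250))
```

With a gene spanning a deep deletion and a small gain, the mean dilutes the deletion while the extreme summary reports it directly, which is why the two conventions can give noticeably different gene-level values.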

Thanks,
Steve

Amanda

Nov 23, 2016, 4:14:08 PM
to Steven Schumacher, cBioPortal for Cancer Genomics Discussion Group
Hi Steve,

Thank you so much for your prompt response. Does that mean the values in the file did not go through GISTIC? How well would you expect the focal copy number score from GISTIC to correlate with the relative copy ratio (which has not gone through GISTIC)?

Thank you!
Amanda

Steven Schumacher

Nov 29, 2016, 11:01:11 AM
to Amanda, cBioPortal for Cancer Genomics Discussion Group
The GISTIC gene-table outputs (all_data_by_genes, focal_data_by_genes, broad_data_by_genes) are derived from data that has been pre-processed by GISTIC:
  1. Gaps and germline CNVs are removed
  2. Events smaller than the join segment size parameter are merged with the adjacent segment nearest in value
  3. Data are median centered within each sample and capped
  4. log2 ratio data are transformed to linear copy number
After these steps, the all_data_by_genes output could be created from the prepared marker-level data (in practice we run the rest of the GISTIC analyses first). I believe this is the output that the cBioPortal consumes.
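
Steps 3 and 4 can be sketched for a single sample roughly as follows. This is a simplified illustration, not GISTIC code: the function name is hypothetical, the cap of 1.5 is an illustrative value chosen for the example, the median is taken over the listed values without weighting, and the linear transform is the inverse of the L2R = log2(RCN+2) - 1 formula given earlier in this thread. Steps 1 and 2 are not shown.

```python
from statistics import median
from typing import List


def prepare_sample(log2_ratios: List[float], cap: float = 1.5) -> List[float]:
    """Median-center, cap, and convert one sample's log2 ratios to linear units.

    `cap` is an illustrative assumption, applied symmetrically as +/- cap.
    """
    center = median(log2_ratios)
    # Step 3a: median-center within the sample
    centered = [value - center for value in log2_ratios]
    # Step 3b: cap extreme values at +/- cap
    capped = [max(-cap, min(cap, value)) for value in centered]
    # Step 4: log2 ratio -> relative linear copy number, RCN = 2^(L2R + 1) - 2
    return [2.0 ** (value + 1.0) - 2.0 for value in capped]


if __name__ == "__main__":
    print(prepare_sample([0.5, 0.5, 1.5, -0.5]))
```

After this transform a value of 0 still means "no change relative to the sample median", which matches the note in the first message that 0 corresponds to 0 in either unit.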

The focal_data_by_genes and broad_data_by_genes outputs get a little more processing: the ziggurat deconstruction procedure decomposes the linear copy profile into a sum of profiles made up of focal events and broad events, and the respective tables are made from these.

The correlation between the focal value and the overall value really depends on the characteristics of the sample/disease - if there are a lot of arm-level changes, I would not expect much correlation. For non-aneuploid samples where most of the copy changes are focal, I would expect fairly good correlation.

Thanks,
Steve