What is the data source and calling procedure for copy number data in ccle_broad_2019?


Neha Talluri

Mar 9, 2026, 12:09:54 AM
to cBioPortal for Cancer Genomics Discussion Group
I'm trying to understand the source and processing of the discrete copy number alteration data in the ccle_broad_2019 study and have hit several points of confusion.

What I've found so far:

In PR https://github.com/cBioPortal/datahub/pull/856 (which added this study), data_CNA.txt is committed to Git with no documented source. The meta_CNA.txt file says only "Copy number alterations from CCLE", with no methodological detail and no indication of where the data actually came from.

The cBioPortal study summary page (https://www.cbioportal.org/study/summary?id=ccle_broad_2019) mentions that the data were downloaded from https://depmap.org/portal/data_page/?tab=allData. However, I have no idea which file was used for data_CNA.txt; perhaps CCLE_ABSOLUTE_combined_20181227.xlsx.

In issue https://github.com/cBioPortal/datahub/issues/287, the proposed mapping was CCLE_MUT_CNA_AMP_DEL_binary_Revealer.gct -> data_CNA.txt, with a "?" indicating uncertainty. That file appears to be downloadable from https://data.broadinstitute.org/ccle_legacy_data/binary_calls_for_copy_number_and_mutation_data/. However, its contents seem to differ from what is actually in data_CNA.txt.
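One way to test the mapping proposed in issue 287 is to collapse the binary amplification/deletion rows of the GCT into -2/0/2 calls and count how many gene/sample pairs disagree with data_CNA.txt. This is a hypothetical sketch, not how the study was built. It assumes the GCT 1.2 layout (version line, dimensions line, a Name/Description header, then data rows), assumes the CNA rows are named with an `_AMP` or `_DEL` suffix (not verified against the actual file), and assumes both files use the same CCLE sample names. The function names are illustrative only.

```python
from typing import Dict, Iterable, Tuple

Calls = Dict[Tuple[str, str], int]

# Identifier columns allowed in a cBioPortal discrete CNA file; all others are samples.
NON_SAMPLE_COLUMNS = {"Hugo_Symbol", "Entrez_Gene_Id"}


def read_cna_matrix(lines: Iterable[str]) -> Calls:
    """Parse data_CNA.txt into {(gene, sample): call}, assuming Hugo_Symbol comes first."""
    it = iter(lines)
    header = next(it).rstrip("\n").split("\t")
    sample_idx = [i for i, col in enumerate(header) if col not in NON_SAMPLE_COLUMNS]
    calls: Calls = {}
    for line in it:
        fields = line.rstrip("\n").split("\t")
        for i in sample_idx:
            value = fields[i].strip() if i < len(fields) else ""
            if value and value.upper() != "NA":
                calls[(fields[0], header[i])] = int(float(value))
    return calls


def binary_gct_to_discrete(lines: Iterable[str]) -> Calls:
    """Collapse hypothetical GENE_AMP / GENE_DEL binary rows into 2 / -2 / 0 calls."""
    it = iter(lines)
    next(it)  # '#1.2' version line
    next(it)  # '<rows>\t<columns>' dimensions line
    samples = next(it).rstrip("\n").split("\t")[2:]  # skip Name and Description
    calls: Calls = {}
    for line in it:
        fields = line.rstrip("\n").split("\t")
        name, values = fields[0], fields[2:]
        if name.endswith("_AMP"):
            gene, code = name[: -len("_AMP")], 2
        elif name.endswith("_DEL"):
            gene, code = name[: -len("_DEL")], -2
        else:
            continue  # e.g. mutation rows, irrelevant to copy number
        for sample, raw in zip(samples, values):
            key = (gene, sample)
            calls.setdefault(key, 0)  # neutral unless a binary flag is set
            v = raw.strip()
            if v and v.upper() != "NA" and float(v) == 1:
                calls[key] = code  # if both AMP and DEL were set, the later row wins
    return calls


def count_disagreements(gct_calls: Calls, cna_calls: Calls) -> Tuple[int, int]:
    """Return (number of shared gene/sample pairs, number whose calls differ)."""
    shared = gct_calls.keys() & cna_calls.keys()
    differing = sum(1 for key in shared if gct_calls[key] != cna_calls[key])
    return len(shared), differing
```

With both files downloaded locally, `count_disagreements(binary_gct_to_discrete(open(gct_path)), read_cna_matrix(open(cna_path)))` reports the overlap and the mismatch count. Note that a conversion like this can only ever produce -2, 0 and 2, never -1 or 1.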

The cBioPortal study summary page (https://www.cbioportal.org/study/summary?id=ccle_broad_2019) also offers a separate CNA_genes.txt file. I have no idea how it was made, but it mentions GISTIC and contains the copy number calls (amplification and homozygous deletion).

In https://github.com/cBioPortal/datahub/issues/1607, the same question was raised and also went unanswered.

My specific questions:

1) What is the exact source file used for data_CNA.txt?

2) What procedure was used to generate the copy number calls? The cBioPortal FAQ states that copy number data are often generated by GISTIC or RAE, but I cannot determine which was used here. The values in the data are only -2, 0, and 2; there are no -1 or 1 values. Was a different calling method used, or were the data thresholded to retain only the extreme calls?

3) How exactly was CNA_genes.txt generated?
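Regarding question 2, the set of distinct values in data_CNA.txt can be checked with a short tally like the following. This is a minimal sketch assuming the standard cBioPortal discrete CNA layout (tab-separated, a Hugo_Symbol and optional Entrez_Gene_Id column, then one column per sample); the function name is illustrative, and values are counted as raw strings so that blanks or NA entries also show up.

```python
from collections import Counter
from typing import Iterable

# Identifier columns allowed in a cBioPortal discrete CNA file; all others are samples.
NON_SAMPLE_COLUMNS = {"Hugo_Symbol", "Entrez_Gene_Id"}


def tally_cna_values(lines: Iterable[str]) -> Counter:
    """Count each distinct raw value across all sample columns of a discrete CNA file."""
    it = iter(lines)
    header = next(it).rstrip("\n").split("\t")
    sample_idx = [i for i, col in enumerate(header) if col not in NON_SAMPLE_COLUMNS]
    counts: Counter = Counter()
    for line in it:
        fields = line.rstrip("\n").split("\t")
        for i in sample_idx:
            if i < len(fields):  # tolerate truncated rows
                counts[fields[i].strip()] += 1
    return counts
```

Called as `tally_cna_values(open("data_CNA.txt"))`, the keys of the result show whether -1 and 1 ever occur and how many cells are amplified, deleted, neutral or missing.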