What is the data source and calling procedure for copy number data in ccle_broad_2019?
Neha Talluri
Mar 9, 2026, 12:09:54 AM
to cBioPortal for Cancer Genomics Discussion Group
I'm trying to understand the source and processing of the discrete copy number alteration data in the ccle_broad_2019 study and have hit several points of confusion.
What I've found so far:
In PR https://github.com/cBioPortal/datahub/pull/856 (which added this study), data_CNA.txt is stored in Git with no documented source. The meta_CNA.txt file only says "Copy number alterations from CCLE", with no methodological detail and no indication of where this data actually came from.
On cBioPortal (https://www.cbioportal.org/study/summary?id=ccle_broad_2019), there is also a specific CNA_genes.txt file. I have no idea how it was made, but it does mention the word Gistic and contains the copy number calls (amplification and homozygous deletion).
1) What is the exact source file used for data_CNA.txt?
2) What procedure was used to generate the copy number calls? The cBioPortal FAQ states that copy number data is often generated by GISTIC or RAE, but I cannot determine which was used here. The values in the data are only -2, 0, and 2; there are no -1 or 1 values. Was a different calling method used or was the data thresholded to only retain the extreme calls?
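For anyone who wants to reproduce the value check, here is a minimal sketch of one way to tally the distinct values in a file laid out like data_CNA.txt. It assumes the usual cBioPortal discrete CNA layout (tab-separated, with Hugo_Symbol and Entrez_Gene_Id identifier columns followed by one column per sample). The function name count_cna_values and the toy sample IDs are made up for illustration, and the toy rows are not taken from the study.

```python
import csv
import io
from collections import Counter


def count_cna_values(handle, id_columns=("Hugo_Symbol", "Entrez_Gene_Id")):
    """Count how often each call value appears across all sample columns.

    Hypothetical helper: assumes a tab-separated file whose header names
    the identifier columns listed in id_columns, with every other column
    holding per-sample calls.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Every column that is not an identifier column is treated as a sample.
    sample_idx = [i for i, name in enumerate(header) if name not in id_columns]
    counts = Counter()
    for row in reader:
        for i in sample_idx:
            if i < len(row):
                counts[row[i].strip()] += 1
    return counts


# Toy input with made-up sample IDs, only to show the expected layout.
toy = (
    "Hugo_Symbol\tEntrez_Gene_Id\tSAMPLE_A\tSAMPLE_B\n"
    "TP53\t7157\t-2\t0\n"
    "MYC\t4609\t2\t0\n"
)
counts = count_cna_values(io.StringIO(toy))
print(sorted(counts.items()))
```

To run it on the real file, pass an open handle instead of the toy string, for example `with open("data_CNA.txt", newline="") as fh: count_cna_values(fh)`. If the result contains only -2, 0, and 2, that matches what I described above, but it does not by itself show whether the calls were thresholded or produced by a different method.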