query regarding the gene-level CNV breast cancer dataset from the TCGA data hub


Lisa George

Oct 27, 2019, 9:40:05 AM
to gd...@broadinstitute.org
Hello,
My name is Lisa and I am a PhD student at the Hebrew University in Jerusalem. I am interested in using the TCGA breast invasive carcinoma (BRCA) gene-level copy number variation (CNV) dataset, estimated using the GISTIC2 method, which is available for download from the UCSC Xena TCGA data hub at the following link:
The description mentions that the TCGA FIREHOSE pipeline applied the GISTIC2 method to produce segmented CNV data, which was then mapped to genes to produce gene-level estimates (processed at UCSC into the Xena repository).

I have a few questions regarding this dataset that I would highly appreciate some clarification on:
1) The units of this data are given as "Gistic2 copy number". Are these values on a log2 scale? If not, could you please tell me exactly what scale the values are on?

2) I would also like some clarification on how to interpret these values for every gene. I found some documentation at this link https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/CNV_Pipeline/ which mentions the following:

Numeric focal-level Copy Number Variation (CNV) values were generated with "Masked Copy Number Segment" files from tumor aliquots using GISTIC2 on a project level. Only protein-coding genes were kept, and their numeric CNV values were further thresholded by a noise cutoff of 0.3:
Genes with focal CNV values smaller than -0.3 are categorized as a "loss" (-1)
Genes with focal CNV values larger than 0.3 are categorized as a "gain" (+1)
Genes with focal CNV values between and including -0.3 and 0.3 are categorized as "neutral" (0).
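
As a minimal sketch, the thresholding rule quoted above can be written as a small function. The function name and the `noise_cutoff` parameter are hypothetical, chosen only for illustration; the 0.3 cutoff and the -1/0/+1 categories come from the quoted GDC documentation. The sample values are the ones given as an example later in this message. Whether the Xena dataset should be read with these same thresholds is exactly the open question, so this shows only what the GDC rule would produce if it applied.

```python
def categorize_focal_cnv(value, noise_cutoff=0.3):
    """Map a numeric focal CNV value to -1 (loss), 0 (neutral), or +1 (gain).

    Follows the rule quoted from the GDC documentation: values strictly
    below -noise_cutoff are a loss, values strictly above +noise_cutoff
    are a gain, and everything in between (inclusive) is neutral.
    """
    if value < -noise_cutoff:
        return -1
    if value > noise_cutoff:
        return 1
    return 0


# Example values taken from the question in this message.
example_values = [-0.468, -0.008, 0.005]
categories = [categorize_focal_cnv(v) for v in example_values]
print(categories)  # [-1, 0, 0]
```

Under this rule, -0.468 would be a loss, while -0.008 and 0.005 would both be neutral.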


What I would like to understand is whether the BRCA CNV dataset that I linked above uses this same interpretation for its values. In case my question was not clear, here is an example: if a gene from the above dataset, say AICF, has GISTIC2 copy number values of -0.468, -0.008 and 0.005 in different samples, can we interpret these values based on the above thresholds, or do they have a different interpretation altogether? I would like to understand exactly how to read these GISTIC2 copy number values and whether they are on a log2 scale, as this is not fully clear to me even after reading the paper by Mermel, Beroukhim et al. (2011).
This information would be very valuable for my project, and I would be very thankful if you could provide me with it or forward my email to someone who could kindly clarify my doubts.
Awaiting your response, Thanks very much.

Kind regards,
Lisa George

Lisa George

Oct 28, 2019, 2:05:34 PM
to gd...@broadinstitute.org
Awaiting your response, Thank you very much.

Kind regards,
Lisa George

Gdac-users

Oct 28, 2019, 2:10:49 PM
to Gdac-users, gd...@broadinstitute.org, lisa.r...@gmail.com
Hi Lisa,

We provide the TCGA data we generated via our data portal, firebrowse.org. Please take the time to look through it. You'll find that the GISTIC Copy Number report goes into great detail about all the output files and what settings were used.

Regards,
David

--
David Heiman
Senior Software Engineer
GDAN Processing Genome Data Analysis Center
CPTAC Proteogenomic Data Analysis Center
The Broad Institute of MIT and Harvard
415 Main Street
Cambridge, MA 02142