Download all mutations


Dave DeStephano

Jul 11, 2022, 9:15:20 PM
to cBioPortal for Cancer Genomics Discussion Group
Hello,

I would like to download mutation data for all available genes from the TCGA PanCancer lung adenocarcinoma study.

When I navigate to the study page, I can "Download all clinical and genomic data for this study". However, after I clean this data (data_mutations.txt) and use the mutation field (HGVSp_Short), I see some discrepancies compared with downloading specific genes from the OQL bar ("click gene symbols below or enter here").

I am also not able to "Exclude alterations (mutations, structural variants and copy number) of unknown significance"

I was wondering if anyone has advice on how to download mutation data for all genes. Unfortunately, there is no simple "select *" option in the OQL query bar.

David Higgins

Jul 12, 2022, 9:52:35 AM
to Dave DeStephano, cBioPortal for Cancer Genomics Discussion Group

Hi Dave,

 

Only non-synonymous variants are displayed in the cBioPortal user interface. If you are interested in non-synonymous mutations in a handful of genes, you can query those genes and then download the results from the Download tab of the query results page.

 

cBioPortal's study files also contain synonymous variants. You can download the full list of variants for all genes and all participants, as you have done, by clicking “Download all clinical and genomic data for this study.” The discrepancies between that file and what the Portal displays should reflect this difference between the two sources: data_mutations.txt includes synonymous mutations, whereas the user interface does not.

 

There is no “one-click” exclude option, but you can filter data_mutations.txt by the “Consequence” column. cBioPortal considers the following to be non-synonymous: missense mutations, nonsense mutations, start codon lost, stop codon lost, frameshift mutations, in-frame deletions, and splice region variants.
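A minimal standard-library Python sketch of that filter follows. It assumes data_mutations.txt is tab-separated with optional leading `#` metadata lines, and that the “Consequence” column holds VEP / Sequence Ontology terms such as `missense_variant`. The mapping in `NON_SYNONYMOUS_TERMS` from the categories above to those terms is an assumption, as is splitting multi-term values on `,` or `&`; check both against the distinct values actually present in your file.

```python
from __future__ import annotations

import csv
from typing import Iterable, Iterator

# Assumed mapping of the categories listed above to VEP / Sequence Ontology
# terms. Verify against the distinct values in your own "Consequence" column.
NON_SYNONYMOUS_TERMS = {
    "missense_variant",       # missense mutation
    "stop_gained",            # nonsense mutation
    "start_lost",             # start codon lost
    "stop_lost",              # stop codon lost
    "frameshift_variant",     # frameshift mutation
    "inframe_deletion",       # in-frame deletion
    "splice_region_variant",  # splice region variant
}


def read_mutations(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield one dict per mutation row, skipping '#' metadata lines."""
    data_lines = (line for line in lines if not line.startswith("#"))
    # QUOTE_NONE: treat quote characters as ordinary text in this TSV.
    yield from csv.DictReader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)


def is_non_synonymous(row: dict[str, str]) -> bool:
    """True if any consequence term on the row is in NON_SYNONYMOUS_TERMS."""
    # Assumption: multiple terms on one row are joined with ',' or '&'.
    raw = (row.get("Consequence") or "").replace("&", ",")
    return any(term.strip() in NON_SYNONYMOUS_TERMS for term in raw.split(","))


def load_non_synonymous(path: str) -> list[dict[str, str]]:
    """Read a mutation file and keep only rows judged non-synonymous."""
    with open(path, newline="", encoding="utf-8") as handle:
        return [row for row in read_mutations(handle) if is_non_synonymous(row)]
```

For example, `load_non_synonymous("data_mutations.txt")` returns the kept rows, from which `(row["Hugo_Symbol"], row["HGVSp_Short"])` pairs can be compared against a per-gene download from the Portal.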

 

If you have any follow-up questions, please Reply All so that our continued conversation is captured on the cBioPortal Google Group.

 

Best,

 

David M. Higgins, Ph.D. | (he/him)
Informatics Program Manager
Center for Data-Driven Discovery in Biomedicine (D3b)
Children’s Hospital of Philadelphia, USA


Dave DeStephano

Jul 13, 2022, 2:38:50 PM
to David Higgins, cBioPortal for Cancer Genomics Discussion Group
Thank you so much for your assistance. I ran your response by the physicians I work with, and they don't think the "Consequence" column can be used to identify variants of known significance, because significance is based on annotations from the literature, not purely on the type of mutation. The Cancer Hotspots and OncoKB databases provide such annotations, but I know of no way to apply the Cancer Hotspots and OncoKB annotations to the complete list of genes rather than just a handful.

The physicians I work with think that restricting to mutations annotated by Cancer Hotspots and OncoKB will filter out the noise and yield gene mutations of higher importance, whereas filtering on the "Consequence" column of data_mutations.txt will not.
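If a list of annotated variants has already been exported from Cancer Hotspots or OncoKB by some other route, restricting the full data_mutations.txt to those variants is a simple join. The Python sketch below does not call either database; it assumes a hypothetical tab-separated export with `Hugo_Symbol` and `HGVSp_Short` columns, and the function names are illustrative only.

```python
from __future__ import annotations

import csv
from typing import Iterable, Iterator


def read_tsv(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield rows of a tab-separated table, skipping '#' metadata lines."""
    data_lines = (line for line in lines if not line.startswith("#"))
    yield from csv.DictReader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)


def known_variant_keys(lines: Iterable[str]) -> set[tuple[str, str]]:
    """Build (gene, protein change) keys from a hypothetical annotation export."""
    return {(row["Hugo_Symbol"], row["HGVSp_Short"]) for row in read_tsv(lines)}


def keep_annotated(
    mutation_lines: Iterable[str], known: set[tuple[str, str]]
) -> list[dict[str, str]]:
    """Keep mutation rows whose (gene, HGVSp_Short) pair appears in `known`."""
    return [
        row
        for row in read_tsv(mutation_lines)
        if (row.get("Hugo_Symbol"), row.get("HGVSp_Short")) in known
    ]
```

Exact string matching is the simplest choice but may be too strict: hotspot annotations are often defined at the residue level (for example any change at one codon), which an exact `HGVSp_Short` match would miss.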

Best wishes,
David DeStephano

Dave DeStephano

Jul 21, 2022, 1:18:38 PM
to cBioPortal for Cancer Genomics Discussion Group

Hello,

We are continuing to investigate options for defining genes of interest. Do you have any documentation describing the "Impact" column in the data_mutations.txt file and how its values are defined?
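Independently of the documentation, the values actually present can be inspected directly. In MAF files annotated with VEP, an `IMPACT` column usually holds VEP's impact categories (HIGH, MODERATE, LOW, MODIFIER); whether this study's column follows that convention, and whether its header is spelled `IMPACT`, are assumptions to verify. This minimal Python sketch simply tallies the values.

```python
from __future__ import annotations

import csv
from collections import Counter
from typing import Iterable


def impact_counts(lines: Iterable[str], column: str = "IMPACT") -> Counter[str]:
    """Count the distinct values of an impact column, skipping '#' lines."""
    data_lines = (line for line in lines if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Rows lacking the column (or with an empty value) are grouped as "<missing>".
    return Counter((row.get(column) or "<missing>") for row in reader)
```

For example, opening data_mutations.txt with `newline=""` and printing `impact_counts(handle).most_common()` shows which categories exist before deciding whether any of them is a useful filter.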

Best,
Dave