Mutations in PDAC datasets


Katelyn Mullen

Jun 29, 2021, 11:42:33 AM
to cBioPortal for Cancer Genomics Discussion Group
Hi,

I'm trying to compare driver mutations present in independent cohorts. In particular, I am interested in the Pancreatic Adenocarcinoma cases from the MSK Clinical Sequencing Cohort and the Pancreatic Adenocarcinoma (TCGA, PanCancer Atlas) cohort. I know that all of the genes in the Mutated Genes chart are cancer genes according to OncoKB, but does the mutation count/frequency shown for each gene in the cohort represent only driver mutations in the cancer gene, or any mutation observed in the cancer gene? I am only interested in driver mutations in cancer genes, not passenger/silent mutations. Furthermore, does a deeper level of annotation for these genes exist in cBioPortal for these cohorts, e.g. a table with reference and alternate alleles for each patient?

Thanks!
Katelyn 

Benjamin Gross

Jun 29, 2021, 9:42:00 PM
to Katelyn Mullen, cBioPortal for Cancer Genomics Discussion Group
Hi Katelyn,

The mutation count/frequency in the Mutated Genes table is not restricted to driver mutations. However, certain mutation types are filtered out during the import process, including silent, intron, 3’UTR/flank, and 5’UTR (some 5’flank mutations in promoter regions are allowed in).
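
As a rough illustration of that filtering, the sketch below drops those classes from MAF-style rows. The exclusion set is an assumption based on the list above, written with standard MAF `Variant_Classification` labels; it is not the importer's actual configuration, and the sample rows are made up. What remains is still not limited to driver mutations.

```python
import csv
import io

# Assumption: classes dropped at import, taken from the list above and
# spelled with standard MAF Variant_Classification labels.
EXCLUDED_CLASSES = {"Silent", "Intron", "3'UTR", "3'Flank", "5'UTR"}


def keep_imported_like(rows):
    """Yield MAF rows whose Variant_Classification is not excluded."""
    for row in rows:
        if row["Variant_Classification"] not in EXCLUDED_CLASSES:
            yield row


# Tiny made-up MAF excerpt, for illustration only.
example_maf = (
    "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n"
    "KRAS\tMissense_Mutation\tSAMPLE_A\n"
    "KRAS\tSilent\tSAMPLE_B\n"
    "TP53\tNonsense_Mutation\tSAMPLE_A\n"
    "SMAD4\tIntron\tSAMPLE_C\n"
)

reader = csv.DictReader(io.StringIO(example_maf), delimiter="\t")
kept = list(keep_imported_like(reader))
kept_genes = [row["Hugo_Symbol"] for row in kept]
print(kept_genes)
```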

Regarding deeper annotation, if you visit the patient view page, accessible via the “View selected cases” button on the study view page:


you can choose to display additional properties of each gene/mutation belonging to the patient:


Best,
Benjamin


Katelyn Mullen

Jun 30, 2021, 9:11:54 AM
to cBioPortal for Cancer Genomics Discussion Group
Hi Ben,

Sorry, I should have specified further: is there a way to download the annotated information for all patients in one file from cBioPortal, instead of as separate TSVs for each patient in the selected cases summary? When I went to fetch the original data instead (https://gdc.cancer.gov/about-data/publications/pancanatlas), the MAF contained far more patients than the PDAC subset I am interested in (184 patients wanted, 10295 unique tumor sample barcodes found in the MAF).

Benjamin Gross

Jun 30, 2021, 9:45:40 AM
to Katelyn Mullen, cBioPortal for Cancer Genomics Discussion Group
I think the easiest way may be to download the mutation file from the data hub repository.  You can search by study identifier here:


This is the folder for the IMPACT study:


The mutation file is called data_mutations_extended.txt.  It follows the TCGA-MAF format.
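
For anyone who wants a per-sample table of reference and alternate alleles from that file, here is a minimal sketch. It assumes the file is tab-delimited, may begin with `#` comment lines, and carries the standard MAF columns `Hugo_Symbol`, `Tumor_Sample_Barcode`, `Reference_Allele`, and `Tumor_Seq_Allele2`; the helper names and the sample rows are made up for illustration.

```python
import csv
import io


def read_maf(handle):
    """Parse a tab-delimited MAF, skipping '#' comment lines."""
    data_lines = (line for line in handle if not line.startswith("#"))
    return list(csv.DictReader(data_lines, delimiter="\t"))


def allele_table(rows):
    """Return (sample, gene, ref allele, alt allele) tuples, one per mutation."""
    return [
        (
            row["Tumor_Sample_Barcode"],
            row["Hugo_Symbol"],
            row["Reference_Allele"],
            row["Tumor_Seq_Allele2"],
        )
        for row in rows
    ]


# Made-up excerpt standing in for data_mutations_extended.txt.
example = io.StringIO(
    "#illustrative comment line\n"
    "Hugo_Symbol\tTumor_Sample_Barcode\tReference_Allele\tTumor_Seq_Allele2\n"
    "KRAS\tSAMPLE_A\tC\tT\n"
    "TP53\tSAMPLE_A\tG\tA\n"
)
table = allele_table(read_maf(example))
for record in table:
    print(*record, sep="\t")
```

With the real file, the same functions could be called as `read_maf(open("data_mutations_extended.txt", newline=""))`, provided those columns are present.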

Best,
Benjamin



Katelyn Mullen

Jul 6, 2021, 7:04:01 AM
to cBioPortal for Cancer Genomics Discussion Group
Thanks, Ben! Looking at the MSK IMPACT dataset from 2017, I notice that there are fewer than 400 PAAD samples. Would it be possible to create another directory, similar to hcc_mskimpact_2018, with more data, including more recent PAAD samples collected through 2020? The current sample count for PAAD in cBioPortal is over 2800, so I would really like to take advantage of that data, if possible.

Benjamin Gross

Jul 9, 2021, 9:47:13 AM
to Katelyn Mullen, Kundra, Ritika/Sloan Kettering Institute, cBioPortal for Cancer Genomics Discussion Group
Hi Katelyn,

I’m sorry this fell through the cracks. As you probably know, within the cBioPortal website you can perform queries across datasets based on your cancer type of interest, but to see a deeper level of mutation annotation, you would be limited to a small set of genes.

The directories in datahub correspond to published datasets found on cbioportal.org. If you wanted to work at this level, you would have to identify the datasets of interest, then download the corresponding MAFs and combine them.
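
A minimal sketch of that combining step, under stated assumptions: each MAF is tab-delimited with a `Tumor_Sample_Barcode` column, the study labels and sample IDs below are made up, and the added `Study` column is a hypothetical convenience rather than part of the MAF format. It does not de-duplicate samples that appear in more than one study, which would need separate handling.

```python
import csv
import io


def combine_mafs(mafs, wanted_samples=None):
    """Merge rows from several MAFs, tagging each row with its study label.

    mafs: mapping of study label -> open text handle of a tab-delimited MAF.
    wanted_samples: optional set of Tumor_Sample_Barcode values to keep.
    """
    combined = []
    for study, handle in mafs.items():
        lines = (line for line in handle if not line.startswith("#"))
        for row in csv.DictReader(lines, delimiter="\t"):
            barcode = row["Tumor_Sample_Barcode"]
            if wanted_samples is not None and barcode not in wanted_samples:
                continue
            row["Study"] = study  # hypothetical extra column, not part of MAF
            combined.append(row)
    return combined


# Two made-up MAF excerpts standing in for files from different studies.
study_a = io.StringIO(
    "Hugo_Symbol\tTumor_Sample_Barcode\n"
    "KRAS\tSAMPLE_A\n"
    "TP53\tSAMPLE_B\n"
)
study_b = io.StringIO(
    "Hugo_Symbol\tTumor_Sample_Barcode\n"
    "CDKN2A\tSAMPLE_C\n"
)

pdac_samples = {"SAMPLE_A", "SAMPLE_C"}  # illustrative subset of interest
merged = combine_mafs({"study_a": study_a, "study_b": study_b}, pdac_samples)
summary = [(r["Study"], r["Tumor_Sample_Barcode"], r["Hugo_Symbol"]) for r in merged]
print(summary)
```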

Perhaps others in the community have a better suggestion.

I hope this helps.

Benjamin
