Which TCGA files are used to create cBioPortal's "Mutations" tab?


ir8...@gmail.com

Aug 8, 2017, 11:47:58 AM
to cBioPortal for Cancer Genomics Discussion Group
I have downloaded SomaticSniper, MuTect, MuSE, and VarScan exome files from the GDC Archive for a number of the TCGA datasets. From what I can tell, only the single nucleotide variants called by all four algorithms get reported on the "Mutations" tab in cBioPortal. Also, it looks like indels only need to be detected by MuTect to be reported.

Can anyone confirm that this is correct?  
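
One way to test this hypothesis against downloaded files is to intersect the calls from each caller. The following is only a rough sketch: it assumes each file is a tab-delimited MAF with the standard columns named in `KEY_COLUMNS`, and the helper names (`variant_keys`, `consensus`) and the in-memory sample data are made up for illustration.

```python
import csv
import io

# Standard MAF columns used here to identify one variant in one sample.
KEY_COLUMNS = (
    "Tumor_Sample_Barcode",
    "Chromosome",
    "Start_Position",
    "Reference_Allele",
    "Tumor_Seq_Allele2",
)


def variant_keys(handle):
    """Return the set of variant keys found in one MAF file handle."""
    # MAF files may start with '#' comment lines before the header row.
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t")
    return {tuple(row[col] for col in KEY_COLUMNS) for row in reader}


def consensus(handles):
    """Return the variants that appear in every supplied MAF."""
    key_sets = [variant_keys(handle) for handle in handles]
    return set.intersection(*key_sets)


def make_maf(rows):
    """Build a tiny in-memory MAF (hypothetical data, illustration only)."""
    header = "\t".join(KEY_COLUMNS)
    body = "\n".join("\t".join(row) for row in rows)
    return io.StringIO(f"#version 2.4\n{header}\n{body}\n")


shared = ("S1", "17", "7577120", "C", "T")
caller_a = make_maf([shared, ("S1", "12", "25398284", "C", "A")])
caller_b = make_maf([shared])
caller_c = make_maf([shared, ("S1", "7", "55259515", "T", "G")])

common = consensus([caller_a, caller_b, caller_c])
print(common)
```

Comparing `common` with the variants shown in the portal for the same sample would then show whether the "all callers" rule actually holds.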

Pieter Lukasse

Aug 9, 2017, 4:01:25 AM
to cBioPortal for Cancer Genomics Discussion Group, ir8...@gmail.com
Thanks for your question. Indeed, not all mutations are loaded into cBioPortal. A number of mutations are filtered out by the data loading pipeline, as documented here: https://github.com/cBioPortal/cbioportal/blob/master/docs/File-Formats.md#mutation-data . There you will find the specific filtering details for the Variant_Classification field, pasted below for your convenience:

Variant_Classification: (MAF column) Translational effect of variant allele. Allowed values (from Mutation Annotation Format page): Frame_Shift_Del, Frame_Shift_Ins, In_Frame_Del, In_Frame_Ins, Missense_Mutation, Nonsense_Mutation, Silent, Splice_Site, Translation_Start_Site, Nonstop_Mutation, 3'UTR, 3'Flank, 5'UTR, 5'Flank, IGR, Intron, RNA, Targeted_Region, De_novo_Start_InFrame, De_novo_Start_OutOfFrame. cBioPortal skips the following types during the import: Silent, Intron, 3'UTR, 3'Flank, 5'UTR, 5'Flank, IGR and RNA. Two extra values are allowed by cBioPortal here as well: Splice_Region, Unknown.
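
To get a feel for what this rule does to a file, the skip list can be applied to a MAF directly. This is a minimal sketch, not the actual cBioPortal importer code: the function name `filter_maf_rows` and the sample rows are hypothetical, and only the list of skipped types comes from the documentation quoted above.

```python
import csv
import io

# Types the quoted documentation says are skipped during import.
SKIPPED = {"Silent", "Intron", "3'UTR", "3'Flank", "5'UTR", "5'Flank", "IGR", "RNA"}


def filter_maf_rows(handle):
    """Yield MAF rows whose Variant_Classification is not in SKIPPED."""
    # MAF files are tab-delimited and may start with '#' comment lines.
    lines = (line for line in handle if not line.startswith("#"))
    for row in csv.DictReader(lines, delimiter="\t"):
        if row["Variant_Classification"] not in SKIPPED:
            yield row


# Hypothetical in-memory MAF, for illustration only.
sample = io.StringIO(
    "#version 2.4\n"
    "Hugo_Symbol\tVariant_Classification\n"
    "TP53\tMissense_Mutation\n"
    "KRAS\tSilent\n"
    "EGFR\tIn_Frame_Del\n"
    "BRCA1\tIntron\n"
)

kept = [row["Hugo_Symbol"] for row in filter_maf_rows(sample)]
print(kept)
```

With the sample above, the Silent and Intron rows are dropped and the missense and in-frame deletion rows are kept.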

Regarding your question about only SNVs being loaded: yes, I think most mutations will be SNVs. But you will find that other types of mutations are present as well, as you can see in the attached screenshot (it shows In_Frame_Del and Frame_Shift_Del events, each a deletion of 2 or more nucleotides). Furthermore, cBioPortal supports fusion mutations as well (see https://github.com/cBioPortal/cbioportal/blob/master/docs/File-Formats.md#fusion-data and this example: http://www.cbioportal.org/index.do?session_id=598abdac498e5df2e293b0c7&show_samples=false& ).

I hope this extra information helps in answering your questions. Please let me know if you have any more questions.

Best regards,

Pieter

PS: if you have a local installation and are interested in changing the filtering rules (e.g. to load the 5'Flank mutations as well), then you may be interested in a recent development we added to make this configurable: https://github.com/cBioPortal/cbioportal/pull/2670. I expect this will be released soon.
if_del_fs_del.png

Nikolaus Schultz

Aug 9, 2017, 6:35:01 AM
to Pieter Lukasse, cBioPortal for Cancer Genomics Discussion Group, ir8...@gmail.com
Note that we do not currently load the mutation data from the GDC. What we have is the original mutation data generated by the individual TCGA sequencing centers. The source of the data is the Broad Firehose (or the publication pages for data that matches a specific manuscript). These data are usually a combination of two mutation callers, but the callers differ by center (typically a variant caller like MuTect plus an indel caller).

Niki 