Mutation Count discrepancy


Bullen, Catherine (NIH/NCI) [C]

Jun 17, 2025, 10:42:07 PM
to cbiop...@googlegroups.com, Steck, Rebecca (NIH/NCI) [C], Dunn, Patrick (NIH/NCI) [C]

Hello cBio team!

 

We are reaching out regarding an observed discrepancy in mutation loading.

 

When we loaded the mutations MAF file into a local instance (docker-compose deployment) using the metaImport.py script, QA of the data load showed that the total number of mutations was lower than expected after filtering of Silent/Intronic mutations, and that certain samples had actually gained mutations compared to the expected counts.

 

We narrowed this down to mutations whose Hugo Symbol was annotated by VEP as an ENST* transcript ID, e.g. ENST00000507747. After data load, a subset of these ENST* annotations from the MAF file were mapped to the TERT gene (perhaps analogous to symbol synonym mappings), even though the positions of the mutations do not overlap the TERT position in the genome AND they were assigned to samples that had no TERT or ENST* mutations in the original MAF file. The rest of the ENST*-annotated mutations appear not to have been loaded at all.

 

We wanted to ask whether this is a known issue, or how it might come about. There do not appear to be any entries in the gene synonyms file that would map unknown Hugo Symbol designations to TERT: https://github.com/cBioPortal/datahub-study-curation-tools/blob/master/gene-table-update/build-input-for-importer/gene_info.txt

 

To resolve the issue, it seems we should filter out the ENST mutations we observed prior to data load, since they do not have a recognized Hugo Symbol annotation from VEP. We are also asking whether there are any additional parameters or steps you recommend that would handle this automatically during data load, so that those mutations are still preserved when users download the MAF file via the cBioPortal data download button.
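A minimal sketch of the pre-load filtering idea described above, assuming the unresolved symbols are bare Ensembl transcript IDs in the Hugo_Symbol column. The function name `split_maf` and the TERT example row are hypothetical and only for illustration; this is not part of metaImport.py.

```python
import csv
import re

# Assumption: unresolved symbols look like bare Ensembl transcript IDs.
ENST_PATTERN = re.compile(r"^ENST\d+$")

def split_maf(maf_text):
    """Split tab-delimited MAF text into (header_fields, kept_rows, set_aside_rows)."""
    # MAF files may begin with '#' comment lines; skip those and blank lines.
    lines = [ln for ln in maf_text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    kept, set_aside = [], []
    for row in reader:
        if ENST_PATTERN.match(row["Hugo_Symbol"]):
            set_aside.append(row)   # would be written to a separate file, not loaded
        else:
            kept.append(row)        # would go into the MAF that is loaded
    return reader.fieldnames, kept, set_aside

# First data row is taken from this thread; the TERT row is made up for the example.
example = "\n".join([
    "#version 2.4",
    "Hugo_Symbol\tChromosome\tStart_Position\tEnd_Position\tVariant_Classification",
    "ENST00000507747\tchr6\t166858144\t166858144\tMissense_Mutation",
    "TERT\tchr5\t1295000\t1295000\tMissense_Mutation",
])

fields, kept, set_aside = split_maf(example)
print(len(kept), "rows kept;", len(set_aside), "rows set aside")
```

Writing the set-aside rows to their own file keeps a record of what was removed, so the counts can still be reconciled against the original MAF during QA.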

 

Thanks, and appreciate any insight!

 

Catherine Bullen, Ph.D. [c]

Manager I, Bioinformatics, CTOS

Bioinformatics and Computational Sciences Directorate (BACS)

Frederick National Laboratory for Cancer Research

National Institutes of Health

office: 240-620-0843

catherin...@nih.gov [c]

 

 

Benjamin Gross

Jun 17, 2025, 11:17:44 PM
to Bullen, Catherine (NIH/NCI) [C], cbiop...@googlegroups.com, Steck, Rebecca (NIH/NCI) [C], Dunn, Patrick (NIH/NCI) [C]
Hi Catherine,

Thank you for your detailed email.

It is true that during import, a certain amount of filtering can occur based on the mutation type of the variant, such as silent, intronic, UTR/flank, or IGR. It is also true that during data loading, if a primary gene ID or symbol cannot be found, but the ID or symbol is an alias for another gene, the data may be mapped to that gene. However, I'm not aware of any mapping that occurs after data loading based on VEP annotation. Perhaps I am misunderstanding.
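The alias fallback described above can be pictured with a toy lookup. This is only an illustration of the general idea (primary symbol first, then alias), not the importer's actual code; the function name `resolve_symbol` and both tables are hypothetical stand-ins for the portal's gene and alias tables.

```python
def resolve_symbol(symbol, primary_symbols, alias_to_primary):
    """Return the gene a symbol resolves to, or None if it cannot be resolved."""
    if symbol in primary_symbols:
        return symbol                      # exact match on a primary symbol
    return alias_to_primary.get(symbol)    # otherwise fall back to the alias table

# Toy tables for illustration only.
primary_symbols = {"TERT", "TP53"}
alias_to_primary = {"TRT": "TERT"}

print(resolve_symbol("TERT", primary_symbols, alias_to_primary))             # primary match
print(resolve_symbol("TRT", primary_symbols, alias_to_primary))              # alias match
print(resolve_symbol("ENST00000507747", primary_symbols, alias_to_primary))  # unresolved
```

Under this kind of scheme, a record only lands on a different gene if its symbol appears in the alias table, which is why checking the alias entries for the affected symbols is a natural first step.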

Are you seeing this on the Mutation Table of the Study View page? Does your MAF contain Entrez IDs or Hugo Symbols? If you are using Hugo Symbols in your MAF, you might try specifying Entrez IDs instead, as this can ensure unambiguous mapping. If you can provide some example variant records, we can try to reproduce the issue on our end. Can you also confirm which version of cBioPortal you are using (https://www.cbioportal.org/api/info)?

Thanks,
Benjamin



-- 
You received this message because you are subscribed to the Google Groups "cBioPortal for Cancer Genomics Discussion Group" group.

Bullen, Catherine (NIH/NCI) [C]

Jun 18, 2025, 10:23:51 AM
to Benjamin Gross, cbiop...@googlegroups.com, Steck, Rebecca (NIH/NCI) [C], Dunn, Patrick (NIH/NCI) [C]

Hi Benjamin,

 

Thanks for your response. Re: your questions:

 

Are you seeing this on the Mutation Table of the Study View page?  

 

Yes, we see gains in the TERT gene (expected 38 mutations, observed 126) in the Mutation Table of the Study View. We also compared per-gene mutation counts downloaded from the Mutation Table to those in our MAF and found discrepancies. In addition, for a subset of samples whose mutation counts differed from expected, we compared expected (MAF) vs. actual per-gene mutation counts (downloaded from the sample-level mutation summary table) and again observed increases in TERT mutations or the removal of ENST-annotated mutations.
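A rough sketch of the expected-vs-observed comparison described above, assuming both the input MAF and the table downloaded from the portal have been read into lists of row dicts with a Hugo_Symbol key. The helper names are hypothetical; the TERT counts are the ones reported in this thread, and the single ENST row is only illustrative.

```python
from collections import Counter

def count_by_gene(rows):
    """Count mutation records per Hugo_Symbol."""
    return Counter(row["Hugo_Symbol"] for row in rows)

def count_discrepancies(expected, observed):
    """Return {gene: (expected_count, observed_count)} for genes whose counts differ."""
    genes = sorted(set(expected) | set(observed))
    return {
        gene: (expected.get(gene, 0), observed.get(gene, 0))
        for gene in genes
        if expected.get(gene, 0) != observed.get(gene, 0)
    }

# Expected: 38 TERT records plus one ENST-annotated record in the input MAF.
expected_rows = [{"Hugo_Symbol": "TERT"}] * 38 + [{"Hugo_Symbol": "ENST00000507747"}]
# Observed: 126 TERT records in the portal download, ENST record absent.
observed_rows = [{"Hugo_Symbol": "TERT"}] * 126

diff = count_discrepancies(count_by_gene(expected_rows), count_by_gene(observed_rows))
for gene, (exp, obs) in diff.items():
    print(f"{gene}: expected {exp}, observed {obs}, delta {obs - exp}")
```

Running the same comparison per sample (counting on a sample ID and gene pair instead of gene alone) would show which samples gained TERT records.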

 

Also interesting to note: we left variant filtering during loading at the default (i.e. mutations annotated as protein changes RNA, IGR, Intron, Silent, 3' and 5' Flank, 3' and 5' UTR, etc. will not be loaded), but the additional 88 mutations assigned to TERT all have the 5'Flank mutation type, which should not have been loaded under the default settings:

 

[Inline image: image001.png]

 

 

Does your MAF contain Entrez IDs or Hugo Symbols? If you are using Hugo Symbols in your MAF, you might try specifying Entrez IDs instead, as this can ensure unambiguous mapping.

 

Correct, we are using Hugo Symbols at this time. We may explore adding Entrez IDs to our annotation pipeline as well, although it appears that many of the records giving us issues may not have associated Entrez IDs. For example, ENST00000507747 does not appear to have one.

 

If you can provide some example variant records, we can try and reproduce the issue on our end.

 

Sure, here are some examples:

 

Hugo_Symbol      Chromosome  Start_Position  End_Position  Variant_Classification
ENST00000507747  chr6        166858144       166858144     Missense_Mutation
ENST00000636096  chrX        71667993        71667993      Missense_Mutation
ENST00000519555  chr8        6059094         6059094       Targeted_Region
ENST00000665361  chr2        170770690       170770690     Targeted_Region

 

 

Can you also confirm which version of the cBioPortal you are using (https://www.cbioportal.org/api/info)?

 

We're running cBioPortal in Docker, with these versions:

portalVersion: 6.0.24

dbVersion: 2.13.1

 

Thanks!

Catie

 


 

Benjamin Gross

Jun 18, 2025, 11:46:23 AM
to Bullen, Catherine (NIH/NCI) [C], cbiop...@googlegroups.com, Steck, Rebecca (NIH/NCI) [C], Dunn, Patrick (NIH/NCI) [C], Madupuri, Ramyasree
Hi Catie,

I’ve added Ramya Madupuri to this email.  She is on our data curation team and will be assisting me with this issue.

Can you provide the entire record for the examples below?  We are curious what mutation type has been assigned to these records.  Even better if you can provide the entire MAF, we can attempt to account for all the TERT mappings.  If you prefer you can send this to us directly.

Thanks,
Benjamin
