TMB calculation in TCGA data


Víctor

Mar 26, 2024, 11:55:38 PM
to cbiop...@googlegroups.com
Dear BioPortal Team,

I hope this email finds you well. I am writing with a question about the data provided on the portal, specifically the "tmb_non_synnonymous" variable for TCGA patients.

I have been using the data available on your platform for my research, and have come across the variable mentioned above. However, I would like to get more information on how exactly this variable is calculated, as I need to understand in detail the process to correctly interpret my results.

I would appreciate detailed information on the methodology used to calculate "tmb_non_synnonymous" in the TCGA data. Until now, my approach has been to divide the total number of mutations in each patient's exome by 30 Mb (the length of the exome) to derive the TMB, using WXS data from TCGA. However, when I tried to replicate your values with this method, discrepancies emerged. I would therefore like to understand the methodology you used, including, if possible, the data sources and any adjustments or processing applied.

I look forward to your response and thank you in advance for your attention and assistance in this matter.

Best regards,

Víctor Montosa
BDSLab - UPV

Nikolaus Schultz

Mar 27, 2024, 12:04:08 AM
to Víctor, cbiop...@googlegroups.com
Hi Victor,

The approach you described is exactly what we do in cBioPortal. In this example of melanoma, there are 141 non-synonymous mutations. Divide that by 30, and you get the reported 4.7 mutations / MB. 

Please note that this TMB only includes the non-synonymous variants, not the silent / synonymous ones. 
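The arithmetic described above can be written as a minimal sketch. The function name `tmb` and the constant `EXOME_SIZE_MB` are illustrative names, not part of cBioPortal; the 30 Mb exome size and the count of 141 come from this thread.

```python
# Assumed exome length in megabases, as stated in this thread.
EXOME_SIZE_MB = 30.0


def tmb(nonsynonymous_count, exome_size_mb=EXOME_SIZE_MB):
    """Return the tumor mutational burden in mutations per megabase.

    Only non-synonymous mutations should be counted; silent/synonymous
    variants are excluded before calling this function.
    """
    if exome_size_mb <= 0:
        raise ValueError("exome_size_mb must be positive")
    return nonsynonymous_count / exome_size_mb


# The melanoma example from this thread: 141 non-synonymous mutations.
print(round(tmb(141), 2))
```

Keeping the exome size as a parameter makes it easy to check how sensitive the result is to that assumption.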

Please let us know where you see discrepancies.

Thanks.
Niki.



Nikolaus Schultz

Mar 27, 2024, 10:53:27 PM
to Víctor, cBioPortal for Cancer Genomics Discussion Group
Hi Victor,

I am adding the Google Group back to cc so that others can follow along.

I am not sure where you are getting the data from. Are these MAF files from the GDC? I don’t recognize the file names, and we do not have GDC data in cBioPortal. Do the inconsistencies remain when you use the MAF files from our datahub?

Niki.


On Mar 27, 2024, at 5:20 AM, Víctor <victormo...@gmail.com> wrote:

Hello again, 

I wanted to bring to your attention a slight discrepancy I noticed in our calculations regarding the Tumor Mutational Burden (TMB) for the sample TCGA-3N-A9WB that you provided.

In the example provided, your methodology yielded a non-synonymous mutation count of 141, resulting in a TMB of 4.7 when divided by 30. However, upon reviewing the data using the file 24b3f2a9-c617-4a2b-8ab4-89c454e6609a.wxs.aliquot_ensemble_masked.maf.gz from the TCGA WXS section, I observed a mutation count of 198. After filtering out synonymous mutations, the count drops from 198 to 137, leading to a TMB of approximately 4.57 when divided by 30.

I double-checked the application of the filter and found consistent results regardless of whether it was applied in column 51 (One_Consequence) or 52 (Consequence), confirming the mutation count as 137. I have attached the WXS file I used, and the Sample Sheet that verifies that indeed the WXS file belongs to this TCGA-3N-A9WB patient.
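A filtering step like the one described above might look like the following sketch. It assumes a tab-separated MAF with optional `#` comment lines and a `Variant_Classification` column in which synonymous variants are labelled `Silent`. The function name `count_nonsynonymous`, the default excluded set, and the sample rows are illustrative assumptions; the thread does not state which classes cBioPortal excludes, and a different excluded set is one possible source of differing counts.

```python
import csv


def count_nonsynonymous(maf_text, column="Variant_Classification",
                        excluded=("Silent",)):
    """Count MAF rows whose value in `column` is not in `excluded`.

    `column` and `excluded` are assumptions for illustration; pass
    column="One_Consequence" and excluded=("synonymous_variant",)
    to filter on the consequence column instead.
    """
    # Drop blank lines and '#' comment lines that may precede the header.
    lines = [ln for ln in maf_text.splitlines()
             if ln and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return sum(1 for row in reader if row[column] not in excluded)


# Made-up rows for illustration only, not data from the thread.
sample = (
    "#version gdc-1.0.0\n"
    "Hugo_Symbol\tVariant_Classification\n"
    "BRAF\tMissense_Mutation\n"
    "TP53\tNonsense_Mutation\n"
    "NRAS\tSilent\n"
)

n = count_nonsynonymous(sample)
print(n, n / 30.0)  # count and mutations per Mb for a 30 Mb exome
```

For a gzipped file, the text can be read with `gzip.open(path, "rt").read()` before passing it to the function.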

Additionally, I noticed a minor discrepancy in the TMB value representation. While the calculation yields 4.7, you've represented it as 4.7333, which seems inconsistent.
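One way to check a reported value is to work backwards to the mutation count it implies. The sketch below uses a hypothetical helper, `implied_count`, and the 30 Mb exome size from this thread. It only shows the arithmetic and does not establish which count the portal actually used.

```python
EXOME_SIZE_MB = 30.0  # exome length assumed in this thread


def implied_count(tmb_value, exome_size_mb=EXOME_SIZE_MB):
    """Return the nearest whole mutation count that yields `tmb_value`."""
    return round(tmb_value * exome_size_mb)


for reported in (4.7, 4.7333):
    count = implied_count(reported)
    # Recompute the TMB from the implied count to compare with the reported value.
    print(reported, count, round(count / EXOME_SIZE_MB, 4))
```

Under these assumptions, 4.7 corresponds to 141 mutations and 4.7333 to 142, so the two displayed values differ by a single mutation and not by rounding.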

Thank you for your attention to this matter, and I look forward to resolving these discrepancies.

Víctor Montosa


24b3f2a9-c617-4a2b-8ab4-89c454e6609a.wxs.aliquot_ensemble_masked.maf
gdc_sample_sheet.2024-03-27.tsv

Víctor

Mar 28, 2024, 8:23:38 AM
to Nikolaus Schultz, cBioPortal for Cancer Genomics Discussion Group
Hello Nikolaus, 

Yes, my data is from GDC. The example we have discussed is available at: https://portal.gdc.cancer.gov/files/0e8e8b90-280a-41c7-baa2-864739e67327 .
I don't know how to obtain a comparable file from cBioPortal, but looking at the summary you previously provided (https://www.cbioportal.org/patient?studyId=skcm_tcga_pan_can_atlas_2018&caseId=TCGA-3N-A9WB), it seems that GDC and cBioPortal have different data for the same patient.

Could you provide the MAF file from your datahub so that I can compare the two?

Thank you very much

Víctor Montosa
