Discrepancies in mutated genes frequency within the same set of samples


Seidel, Lisa

Jul 18, 2019, 8:45:24 AM
to cbiop...@googlegroups.com

Hi,

 

I am using your website for research purposes and find it very useful.

I would be grateful if you could help me with the following problem.

 

I queried Esophagogastric Adenocarcinomas, which gives me a total of 12 studies and 2862 samples. I am interested in TP53, FAT4, NBEA and COL14A1. If I hit the “query by gene” button and input these genes there, I get the percentage of samples in which the individual gene is mutated or altered in copy number (Fig1).

If I use the “explore selected studies” button, I assume that the same set of samples is queried. Looking only at the frequency of mutated genes, I would expect smaller percentages than in the OncoPrint output, because copy number alterations are not included there and are listed separately. However, I get 18% for FAT4 (vs. 16% for mutations and CNA together in OncoPrint) (Fig2). Also, I cannot find NBEA and COL14A1 in the list at all, even though they were altered in 12% and 11% of samples in OncoPrint, respectively. How is this possible? What am I getting wrong?

 

 

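Fig1 (screenshot: OncoPrint output of the “query by gene” result)

Fig2 (screenshot: mutated genes list from “explore selected studies”)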

 

 

Thanks a lot.

 

Lisa

 

 

Dr. Lisa Seidel
AG Prof. M. Binder - Tumorimmunologie
Universitätsklinikum Halle
Poliklinik für Innere Medizin IV
FG 06, U 01, 17.5
Ernst-Grube-Str. 40
06120 Halle (Saale)
lisa....@uk-halle.de
+49 345 557 3027

 

JJ Gao

Jul 19, 2019, 10:59:25 AM
to Seidel, Lisa, cbiop...@googlegroups.com
Hi Lisa,

There are two main reasons for this discrepancy.

First, as you can see from your OncoPrint screenshot, not all samples were profiled for both mutations and copy number alterations. In fact, among the studies you selected, only the TCGA studies and the MSK study have both mutation and CNA data. In OncoPrint, when we calculate a frequency, we use the samples profiled for either mutations or CNAs as the denominator, so the frequencies may be underestimated. To avoid this, you can choose the cases with both mutation and CNA data.

image.png
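
To make the denominator effect concrete, here is a minimal Python sketch with a made-up toy cohort. The sample IDs, the counts, and the helper name `alteration_frequency` are all hypothetical; this is not the portal's actual code, only an illustration of the rule described above (altered samples divided by samples profiled for either data type).

```python
def alteration_frequency(altered, profiled):
    """Fraction of the profiled samples that carry an alteration."""
    profiled = set(profiled)
    if not profiled:
        raise ValueError("no profiled samples")
    return len(set(altered) & profiled) / len(profiled)


# Hypothetical toy cohort of 10 samples (made-up IDs, not real portal data).
mutation_profiled = {f"S{i}" for i in range(1, 7)}   # S1..S6 were sequenced
cna_profiled = {f"S{i}" for i in range(5, 11)}       # S5..S10 have CNA calls
mutated = {"S1", "S2"}                               # gene is mutated here
cna_altered = {"S5"}                                 # gene is amplified/deleted here

either = mutation_profiled | cna_profiled            # all 10 samples

# OncoPrint-style: mutations and CNAs together, "either" as denominator.
combined = alteration_frequency(mutated | cna_altered, either)

# Mutations only, restricted to samples that actually have mutation data.
mutation_only = alteration_frequency(mutated, mutation_profiled)

print(f"combined: {combined:.1%}, mutation only: {mutation_only:.1%}")
```

In this toy example the combined frequency is 3 of 10 samples, while the mutation-only frequency is 2 of 6, so the mutation-only number is the larger one even though it counts fewer alterations. This is the same pattern as the 18% vs. 16% seen for FAT4.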

Second, FAT4 is not profiled in the MSK study (the MSK study is not an exome study, and FAT4 is not part of its gene panel). We usually take that into account when calculating frequencies, but in this case we have a data bug (https://github.com/cBioPortal/datahub/issues/780) that causes the frequency to be underestimated even further. We will fix the issue.
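
As a rough illustration of why gene panel coverage matters for the denominator, the following Python sketch compares a naive frequency with a panel-aware one. Everything in it is an assumption made for the example: the panel names `WES` and `PANEL_A`, the sample IDs, and the function `panel_aware_frequency` are hypothetical and do not reflect the real MSK panel contents or the portal's implementation.

```python
def panel_aware_frequency(gene, mutated_samples, sample_panel, panel_genes):
    """Mutation frequency of `gene` among samples whose panel covers it.

    A panel mapped to None stands for whole-exome sequencing (all genes covered).
    Returns None when no sample covers the gene.
    """
    covered = {
        sample
        for sample, panel in sample_panel.items()
        if panel_genes[panel] is None or gene in panel_genes[panel]
    }
    if not covered:
        return None
    return len(set(mutated_samples) & covered) / len(covered)


# Hypothetical panels: one exome-wide, one targeted panel without FAT4.
panel_genes = {"WES": None, "PANEL_A": {"TP53"}}

# Hypothetical cohort: S1..S4 are exome samples, S5..S8 use the targeted panel.
sample_panel = {f"S{i}": ("WES" if i <= 4 else "PANEL_A") for i in range(1, 9)}

fat4_mutated = {"S1"}

# Naive: every sequenced sample counts, even if FAT4 was never assayed in it.
naive = len(fat4_mutated) / len(sample_panel)

# Panel-aware: only samples whose panel includes FAT4 count.
aware = panel_aware_frequency("FAT4", fat4_mutated, sample_panel, panel_genes)

print(f"naive: {naive:.1%}, panel-aware: {aware:.1%}")
```

Here the naive calculation divides by all 8 samples, while the panel-aware one divides by the 4 samples in which FAT4 could have been detected, so the naive frequency is half the panel-aware one.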

That being said, it is important to pick the right sample set when analyzing frequencies. If you are only interested in mutations, it would be a good idea to select only the mutations profile and the cases with mutation data.

I also noticed that you selected all TCGA studies. Many samples are duplicated across those studies. We recommend using the latest Pancancer studies when possible.

Best,
-JJ



