Dear cBioPortal support team,
This is Leopoldo Garcia and Sara Piccirillo from the Health Science Center of the University of New Mexico.
We are asking for your help to reproduce the mRNA expression z-score values relative to diploid samples from the Glioblastoma Multiforme (TCGA, Firehose Legacy) study. We used the information provided at the following link to calculate the z-score values relative to diploid samples for the TP53 gene:
https://github.com/cBioPortal/cbioportal/blob/master/docs/Z-Score-normalization-script.md
We were able to reproduce the z-scores for the Affymetrix and Agilent platforms but not for the RNA-Seq platform. Could you please tell us if there are any additional steps we should consider?
We would appreciate your support on this matter.
Best regards,
Leopoldo and Sara.
Dear JJ,
As a follow-up to our last message regarding the generation of z-scores relative to diploid samples, we thought it might be helpful to share with you our approach to replicate the values and the discrepancies we observe. Below you will find the steps we took to calculate the TP53 z-scores relative to diploid samples using the expression data of the RNA-seq platform, followed by an image depicting some of our results:
1. Download the CNA information "Copy-number Alterations (OQL is not in effect)" and the expression values "mRNA expression (RNA Seq V2 RSEM)" of the RNA-seq platform directly from the downloads tab on cBioPortal. There are 166 samples with expression data.
2. Calculate the mean of the diploid samples using only the expression values of the samples categorized as "0" in the CNAs.
3. Calculate the standard deviation of the diploid samples using only the expression values of the samples categorized as "0" in the CNAs.
Apply the formula provided by cBioPortal, zScore <- (value - mean)/sd, that is:
z-score = (expression value - mean of only diploid ("0") samples) / (standard deviation of only diploid ("0") samples)
4. Subtract the mean (1878.739997) from each of the expression values.
5. Divide the resulting value for each sample by the standard deviation (830.7615069) to obtain the manually calculated z-scores (Green).
6. Compare the obtained values with the z-scores downloaded directly from cBioPortal (Yellow).
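
For reference, steps 2 to 5 above can be sketched in Python as follows. This is only a minimal illustration of our manual calculation, not cBioPortal's own script. The function name diploid_zscores, the sample IDs, and the toy values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption on our side.

```python
from statistics import mean, stdev


def diploid_zscores(expression, cna):
    """Return {sample: z-score}, using diploid samples (CNA == 0) as reference.

    expression: dict mapping sample ID -> expression value for one gene
    cna:        dict mapping sample ID -> discrete copy-number call for that gene
    """
    # Reference population: samples with expression data that are called diploid ("0").
    diploid_values = [
        value for sample, value in expression.items() if cna.get(sample) == 0
    ]
    if len(diploid_values) < 2:
        raise ValueError("need at least two diploid samples to estimate the SD")

    mu = mean(diploid_values)
    # Assumption: sample standard deviation (n - 1 denominator).
    sd = stdev(diploid_values)
    if sd == 0:
        raise ValueError("diploid samples have zero variance")

    # Apply zScore = (value - mean) / sd to every sample, diploid or not.
    return {sample: (value - mu) / sd for sample, value in expression.items()}


# Toy data (made up for illustration, not taken from the study).
expression = {"S1": 1000.0, "S2": 2000.0, "S3": 3000.0, "S4": 4000.0}
cna = {"S1": 0, "S2": 0, "S3": 0, "S4": -1}

z = diploid_zscores(expression, cna)
for sample, score in z.items():
    print(sample, round(score, 4))
```

In this toy example only S1, S2 and S3 contribute to the mean and standard deviation, while S4 is scored against that diploid reference without influencing it.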

If you compare the z-scores that we generated (Green column) with the z-scores found in cBioPortal (Yellow column), you will see small differences in the values. I am also sending the image as an attachment in case the image inserted in the text is not visible. We would like to reiterate that we only observe these discrepancies when we calculate the z-scores for the RNA-seq platform. All values calculated for the Agilent and Affymetrix platforms match perfectly (not shown).
We would appreciate any insights you can provide about why we might be obtaining different results compared to cBioPortal.
We appreciate your time and support.
Best regards,
Leopoldo and Sara.
Dear JJ,
Dr. Piccirillo and I contacted you earlier this week to ask whether the expression values from the Affymetrix U133 microarray platform that are available in cBioPortal are log-transformed or have been processed using normalization methods, such as RMA. We are using data from the Glioblastoma Multiforme (TCGA, Firehose Legacy) study for our analyses.
We found the following information in the reference listed in the Glioblastoma (TCGA, Cell 2013) study:
“mRNA Expression and Expression Subtypes
Transcriptomic Subtype Classification
CEL files of 543 samples from HT-HG-U133A gene expression array platform were downloaded from the Data Portal and preprocessed using quantile normalization and RMA through the aroma package, in combination with a gene-centric CDF (Bengtsson et al., 2008). Gene centric expression values were transformed to 2 based logarithm scale and median centered.” (see supplemental methods; page S7)
I have included this reference (Brennan et al., 2013) attached to this message.
Is this the method that was used to generate the expression values for the U133 Affymetrix platform of the Glioblastoma Multiforme (TCGA, Firehose Legacy) study?
We look forward to your reply and appreciate your time and support.
Best regards,
Leopoldo Garcia and Sara Piccirillo.