Question regarding mRNA expression z-scores relative to diploid samples


Leopoldo A. Garcia-Montaño

Nov 9, 2020, 10:55:21 AM
to cBioPortal for Cancer Genomics Discussion Group, SPicc...@salud.unm.edu

Dear cBioPortal support team,

This is Leopoldo Garcia and Sara Piccirillo from the Health Science Center of the University of New Mexico.

We are asking for your help in reproducing the mRNA expression z-score values relative to diploid samples from the Glioblastoma Multiforme (TCGA, Firehose Legacy) study. We used the information provided at the following link to calculate the z-score values relative to diploid samples for the TP53 gene:

https://github.com/cBioPortal/cbioportal/blob/master/docs/Z-Score-normalization-script.md

We were able to reproduce the z-scores for the Affymetrix and Agilent platforms but not for the RNA-Seq platform. Could you please tell us if there are any additional steps we should consider?

We would appreciate your support on this matter.

 

Best regards,

Leopoldo and Sara. 

Leopoldo A. Garcia-Montaño

Nov 13, 2020, 7:15:25 PM
to cBioPortal for Cancer Genomics Discussion Group, Aaron Lisman, JJ Gao
Dear cBioPortal support team, 

Dr. Piccirillo and I contacted you earlier this week to ask about certain aspects of the z-score transformation process relative to diploid samples. We are looking forward to receiving your reply. We appreciate your time and support. 

Best regards,

Leopoldo Garcia and Sara Piccirillo. 

JJ Gao

Nov 13, 2020, 7:37:48 PM
to Leopoldo A. Garcia-Montaño, cBioPortal for Cancer Genomics Discussion Group, SPicc...@salud.unm.edu
Hi Leopoldo,

Sorry for the late reply. 

Do you mean that you were not able to generate the same z-score values? Would you please give us some more details about the discrepancies? 

Thanks,
-JJ

Leopoldo A. Garcia-Montaño

Nov 13, 2020, 8:12:25 PM
to JJ Gao, cBioPortal for Cancer Genomics Discussion Group, SPicc...@salud.unm.edu
Hi JJ,

We were able to generate the same z-score values relative to diploid samples only for the Affymetrix and Agilent platforms. However, when we tried generating the z-score values for the RNA-seq data, the values were slightly different. We followed the same procedure for all 3 platforms, according to the link provided on cBioPortal:

 
We are wondering if there are any additional steps we should consider to do the z-score transformation using the RNA-seq data.

Thanks,

Leo.

 

Leopoldo A. Garcia-Montaño

Nov 16, 2020, 2:12:39 PM
to JJ Gao, cBioPortal for Cancer Genomics Discussion Group, SPicc...@salud.unm.edu

Dear JJ,

As a follow-up to our last message regarding the generation of z-scores relative to diploid samples, we thought it might be helpful to share with you our approach to replicate the values and the discrepancies we observe. Below you will find the steps we took to calculate the TP53 z-scores relative to diploid samples using the expression data of the RNA-seq platform, followed by an image depicting some of our results:

1. Download the CNA data ("Copy-number Alterations (OQL is not in effect)") and the expression values ("mRNA expression (RNA Seq V2 RSEM)") for the RNA-seq platform directly from the downloads tab on cBioPortal. There are 166 samples with expression data.
2. Calculate the mean of the diploid samples, using only the expression values of the samples categorized as "0" in the CNA data.
3. Calculate the standard deviation of the diploid samples, using only the expression values of the samples categorized as "0" in the CNA data.

Apply the formula provided by cBioPortal: zScore <- (value - mean)/sd

That is: z-score = (expression value - mean of only the diploid ("0") samples) / (standard deviation of only the diploid ("0") samples)

4. Subtract the mean (1878.739997) from each of the expression values.
5. Divide the resulting value for each sample by the standard deviation (830.7615069) to obtain the manually calculated z-scores (Green).
6. Compare the obtained values with the z-scores downloaded directly from cBioPortal (Yellow).
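
The steps above can be sketched in Python as follows. This is a minimal sketch and not cBioPortal's own script: the function name, the dictionary inputs, and the toy sample IDs and values are all assumptions made for illustration. It also assumes the sample standard deviation (n - 1 denominator); if the population standard deviation is intended, `statistics.pstdev` would be the substitute.

```python
import statistics


def zscores_relative_to_diploid(expression, cna):
    """Return z-scores for every sample, using only diploid samples as reference.

    expression: dict mapping sample ID -> expression value for one gene
    cna:        dict mapping sample ID -> discrete copy-number call (0 = diploid)
    """
    # Steps 2-3: mean and standard deviation over diploid ("0") samples only.
    diploid_values = [value for sample, value in expression.items()
                      if cna.get(sample) == 0]
    if len(diploid_values) < 2:
        raise ValueError("need at least two diploid samples")
    mean = statistics.mean(diploid_values)
    sd = statistics.stdev(diploid_values)  # assumption: sample SD (n - 1)

    # Steps 4-5: subtract the diploid mean, divide by the diploid SD.
    return {sample: (value - mean) / sd
            for sample, value in expression.items()}


# Toy data (hypothetical, not taken from the study).
toy_expression = {"S1": 10.0, "S2": 20.0, "S3": 30.0, "S4": 50.0}
toy_cna = {"S1": 0, "S2": 0, "S3": 0, "S4": 1}
toy_z = zscores_relative_to_diploid(toy_expression, toy_cna)
```

Note that the reference mean and standard deviation come from the diploid samples only, while every sample with an expression value, diploid or not, receives a z-score.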

 

 image.png


If you compare the z-scores that we generated (Green column) with the z-scores found in cBioPortal (Yellow column) you can observe that there is a small difference in the values. I am sending the image as an attachment in case the image inserted in the text is not visible. We would like to reiterate that we only observe these discrepancies when we calculate the z-scores for the RNA-seq platform. All values calculated for the Agilent and Affymetrix platforms match perfectly (not shown).

We would appreciate any insights you can provide about why we might be obtaining different results compared to cBioPortal.

We appreciate your time and support.

Best regards,

Leopoldo and Sara.

TP53 Z-score values relative to diploid samples RNA-seq.png

JJ Gao

Nov 16, 2020, 5:13:10 PM
to Leopoldo A. Garcia-Montaño, cBioPortal for Cancer Genomics Discussion Group, SPicc...@salud.unm.edu
Hi Leo,

Thanks for the detailed explanation. Your process is consistent with how we calculate z-scores. We will look into this. Could you also send us your spreadsheet?

BTW, we recommend using the profile "scores relative to all samples (log RNA seq)", since log-transformed RNA-seq data are closer to normally distributed.

Best,
-JJ

Leopoldo A. Garcia-Montaño

Nov 17, 2020, 3:10:03 AM
to JJ Gao, cBioPortal for Cancer Genomics Discussion Group, SPicc...@salud.unm.edu
Hi JJ,

I am sending you the Excel file with the steps we took to perform the z-score transformation using the RNA-Seq data.

We noticed the profile with the log-transformed RNA-seq data and saw that those z-scores are calculated relative to all samples. We did not have any problems generating those z-scores. We only observe the discrepancies when we try to generate the TP53 z-scores relative to diploid samples. 

Please, let me know if you have any questions or require further information.   

Best,
Leo.
TP53 Z-score values relative to diploid samples RNA-seq.xlsx

JJ Gao

Nov 17, 2020, 3:47:52 PM
to Leopoldo A. Garcia-Montaño, cBioPortal for Cancer Genomics Discussion Group, SPicc...@salud.unm.edu
Hi Leo,

Thanks for the additional information.

It looks like we have a bug in our code for TCGA (your process is correct). We use only one sample per TCGA patient, but some TCGA patients have multiple samples. You can find more details here: https://github.com/cBioPortal/datahub/issues/1327
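
To see which patients contribute more than one sample to a downloaded table, sample IDs can be grouped by patient. The sketch below assumes the sample IDs are TCGA barcodes whose first 12 characters identify the patient (for example `TCGA-XX-XXXX`); the function name and the example IDs are hypothetical, and this is not the fix discussed in the linked issue.

```python
from collections import defaultdict


def patients_with_multiple_samples(sample_ids):
    """Group TCGA sample barcodes by patient and keep patients with >1 sample."""
    by_patient = defaultdict(list)
    for sample_id in sample_ids:
        # Assumption: the first 12 characters are the patient barcode.
        by_patient[sample_id[:12]].append(sample_id)
    return {patient: samples
            for patient, samples in by_patient.items()
            if len(samples) > 1}


# Made-up barcodes for illustration only.
example_ids = ["TCGA-AA-0001-01", "TCGA-AA-0001-02", "TCGA-AA-0002-01"]
duplicated = patients_with_multiple_samples(example_ids)
```

Dropping or keeping the extra samples changes which values enter the diploid mean and standard deviation, which is one way small z-score differences can arise.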

Thanks for reporting this to us and we will fix the issue in future releases.

Best,
-JJ

Leopoldo A. Garcia-Montaño

Nov 17, 2020, 8:07:31 PM
to JJ Gao, cBioPortal for Cancer Genomics Discussion Group, SPicc...@salud.unm.edu
Hi JJ,

Thank you so much for clarifying this matter to us. We really appreciate it.

Best,
Leo.

Leopoldo A. Garcia-Montaño

Nov 30, 2020, 6:48:50 PM
to JJ Gao, cBioPortal for Cancer Genomics Discussion Group, SPicc...@salud.unm.edu
Dear JJ, 

Dr. Piccirillo and I greatly appreciate that you clarified our doubts regarding the z-scores relative to diploid samples. 

On a closely related topic, we would like to ask the following questions:

1. Are the "raw" expression values from the Affymetrix U133 microarray platform log-transformed?
2. Have the expression values from this platform been normalized using RMA or any other normalization method?

We tried finding this information using different cBioPortal resources, but we could not find an answer. 

We would appreciate any information you can provide. 

Best regards,

Leo.

JJ Gao

Dec 7, 2020, 11:13:15 AM
to Leopoldo A. Garcia-Montaño, cBioPortal for Cancer Genomics Discussion Group, SPicc...@salud.unm.edu
Dear Leopoldo and Dr. Piccirillo,

Apologies for the delayed response. The U133 data in Glioblastoma Multiforme (TCGA, Firehose Legacy) do not appear to be median-centered and log-transformed (all values are positive). You might want to consider using the "mRNA expression" data in the study "Glioblastoma (TCGA, Nature 2008)", as those were log-transformed.
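
The "all values are positive" observation can be turned into a rough check. The sketch below is a heuristic only, with hypothetical function names and an arbitrary tolerance: it treats one gene's values across samples as median-centered if some are negative and the median is near zero, and it shows a base-2 log transform followed by median centering for strictly positive values. It does not reproduce any specific preprocessing pipeline such as RMA.

```python
import math
import statistics


def looks_median_centered(values, tolerance=0.1):
    """Heuristic: median-centered log data contain negatives and have a median near 0."""
    has_negative = any(v < 0 for v in values)
    return has_negative and abs(statistics.median(values)) <= tolerance


def log2_median_center(values):
    """Log2-transform strictly positive values, then subtract their median."""
    logged = [math.log2(v) for v in values]  # raises ValueError for values <= 0
    median = statistics.median(logged)
    return [x - median for x in logged]


# Toy values (hypothetical): all positive, so not median-centered.
raw_values = [2.0, 4.0, 8.0]
centered_values = log2_median_center(raw_values)
```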

-JJ

On Thu, Dec 3, 2020 at 9:00 PM Leopoldo A. Garcia-Montaño <leogarci...@gmail.com> wrote:

Dear JJ, 

 

Dr. Piccirillo and I contacted you earlier this week to ask whether the expression values from the Affymetrix U133 microarray platform that are available in cBioPortal are log-transformed or have been processed using normalization methods such as RMA. We are using data from the Glioblastoma Multiforme (TCGA, Firehose Legacy) study for our analyses.

 

We found the following information in the reference listed in the Glioblastoma (TCGA, Cell 2013) study:

 

“mRNA Expression and Expression Subtypes

Transcriptomic Subtype Classification

CEL files of 543 samples from HT-HG-U133A gene expression array platform were downloaded from the Data Portal and preprocessed using quantile normalization and RMA through the aroma package, in combination with a gene-centric CDF (Bengtsson et al., 2008). Gene centric expression values were transformed to 2 based logarithm scale and median centered.” (see supplemental methods; page S7)

 

I have included this reference (Brennan et al., 2013) attached to this message.

 

Is this the method that was used to generate the expression values for the U133 Affymetrix platform of the Glioblastoma Multiforme (TCGA, Firehose Legacy) study?

 

We are really looking forward to receiving your reply. We appreciate your time and support. 

 

Best regards,

 

Leopoldo Garcia and Sara Piccirillo. 

Leopoldo A. Garcia-Montaño

Dec 14, 2020, 7:27:59 PM
to JJ Gao, cBioPortal for Cancer Genomics Discussion Group, SPicc...@salud.unm.edu
Dear JJ, 

I apologize for the late response. Thank you so much for providing this information. Dr. Piccirillo and I greatly appreciate your support. We will be in contact again if we have more questions regarding the data from cBioPortal. 

I hope you are having a great day.

Best regards,

Leo. 