Hello cBioPortal and/or Nikolaus,
I performed an mRNA expression analysis of several genes (c-MET, HGF, ERBB2) in a TCGA data set for stomach adenocarcinoma, comparing it with the distribution of expression in other TCGA tumor sets. I did this by downloading the RNA-seq V2 RSEM data and comparing the values in Prism. Given your email explanation below, I thought it would be meaningful to compare RNA-seq (RSEM) TCGA data across different tumor types, as the experiments should have been run similarly, with similar controls, normalization and analysis. The results are presented in slides 3-5 of the attached file.
My first question concerns the data type. The only RNA-seq data available to download for all tumor sets is labeled “RNA-Seq V2 RSEM” in the cBioPortal browser; for query options, however, the RNA-seq option is labeled “z-score RNA-Seq V2 RSEM”. The downloaded data do not look like z-scores (which are relatively small numbers, being the number of standard deviations above or below the mean of the normalized data); the downloaded values generally vary over 3-3.5 logs. Is the query mRNA expression data presented in the Oncoprint taken from the “z-score RNA-Seq V2 RSEM” data, or from non-z-score RNA-seq data?
My second question is whether the RNA-seq V2 RSEM mRNA data for the TCGA Nature 2014 stomach adenocarcinoma set is the same type of data as for the other TCGA tumor sets that are also labeled “RNA-Seq V2 RSEM”. The reason I ask is that the relative mRNA levels from the RNA-seq (RSEM) data for the 3 genes are much lower in the stomach adenocarcinoma data set than in all the other tumor types (see attached slides #3-5). I found a paper (PLoS One, attached) indicating that housekeeping genes encoding proteins involved in the proteasome and the RNA spliceosome have the least variable mRNA expression across different tumor types when TCGA tumor data sets, plus the authors' own tumor set, are analyzed by the RNA-seq RPKM method. In the attached slides (slide #6), each of the 4 housekeeping genes they identified had much lower expression levels in the stomach adenocarcinoma tumors than in the other tumor types.
Taken together, the data suggest that there is something different about the RNA-seq (RSEM) data from the TCGA Nature 2014 stomach adenocarcinoma tumor set, and I was wondering if you know why. This might affect the analyses of others.
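For anyone who wants to repeat the housekeeping-gene check described above, here is a minimal sketch of the comparison, assuming the downloaded RSEM values for one gene have already been grouped by cohort. The function names, the cohort labels and the numbers are all hypothetical and made up for illustration; they are not taken from the slides or from cBioPortal.

```python
import numpy as np

def median_log2_by_cohort(cohorts):
    """Return the median log2(value + 1) of one gene for each cohort.

    cohorts: dict mapping a cohort label to a list of expression values
    (hypothetical input layout, assumed for this sketch).
    """
    return {
        name: float(np.median(np.log2(np.asarray(values, dtype=float) + 1.0)))
        for name, values in cohorts.items()
    }

def offsets_from_reference(medians, reference):
    """Express each cohort's median as a log2 difference from a reference cohort."""
    return {name: m - medians[reference] for name, m in medians.items()}

# Made-up values for a single housekeeping gene in two made-up cohorts.
example = {
    "cohort_a": [1023.0, 2047.0, 4095.0],
    "cohort_b": [15.0, 31.0, 63.0],
}
medians = median_log2_by_cohort(example)
offsets = offsets_from_reference(medians, "cohort_a")
```

If several housekeeping genes show a similar offset in the same cohort, that would be consistent with that cohort being reported on a different scale, but the sketch by itself does not say why.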
Many thanks
John
////////////////////////
John Winslow, Ph.D
Director, Oncology R&D
Monogram Bioscience, Inc
345 Oyster Point Blvd
South San Francisco, CA 94080
From: Nikolaus Schultz [mailto:nsch...@gmail.com]
Sent: Tuesday, June 30, 2015 8:59 PM
To: cbiop...@googlegroups.com
Cc: Winslow, John
Subject: Re: questions about mRNA expression data
Dear John,
Thank you for contacting us. I hope the following responses will be useful:
1) The expression of each gene in each cohort is normalized separately. Each gene's expression is compared to its mean expression across all samples in the cohort in which the gene is diploid. The z-score indicates the number of standard deviations from the mean of the expression distribution in the diploid samples.
2) You can compare expression levels of a single gene across studies, as long as all studies were normalized the same way. You can assume this to be the case for all TCGA studies that have RNA-Seq V2 RSEM data. We are actually planning to offer this as a feature in the future, hopefully later this year.
Niki.
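The z-score described in point 1 can be sketched as follows for a single gene in a single cohort. This is only an illustration of the description in the reply, not the portal's actual code: the function name, the input layout, the use of the sample standard deviation (ddof=1) and the example numbers are all assumptions.

```python
import numpy as np

def diploid_zscores(expression, is_diploid):
    """z-score every sample against the diploid samples of the same cohort.

    expression: expression values of one gene across the cohort's samples.
    is_diploid: booleans, True where the gene is diploid in that sample.
    """
    expression = np.asarray(expression, dtype=float)
    is_diploid = np.asarray(is_diploid, dtype=bool)
    reference = expression[is_diploid]
    if reference.size < 2:
        raise ValueError("need at least two diploid samples")
    mean = reference.mean()
    std = reference.std(ddof=1)  # sample standard deviation (an assumption)
    if std == 0:
        raise ValueError("diploid samples have zero variance")
    return (expression - mean) / std

# Made-up values: three diploid samples and one non-diploid sample.
values = [100.0, 120.0, 80.0, 400.0]
diploid = [True, True, True, False]
z = diploid_zscores(values, diploid)
```

Because the mean and standard deviation are taken from each cohort's own diploid samples, a z-score from one cohort is not on the same scale as a z-score from another cohort.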
On Jun 30, 2015, at 4:42 PM, Winslow, John <Win...@labcorp.com> wrote:
Hi,
First I would like to say that I have been using the cBioPortal website for analysis of cancer genes and it's a great tool. Recently I have been looking at relationships between the expression of different genes within a tumor set, and between tumor sets. I have a few questions; your answers would give me an idea of whether I am interpreting the data correctly.
1) I understand from your explanation of the mRNA z-score that you express the relative expression of a given gene in a tumor set with respect to the gene's distribution (its mean) in a reference population. Do you know what that reference population is for the TCGA 2012 Nature breast cancer set? I am assuming that this normalized data for a given gene cannot be compared with mRNA z-score data from a different tumor set (e.g. TCGA breast vs TCGA lung), because the gene's distribution could differ between tumor types, resulting in a different mean and standard deviation. Is that correct?
2) If one cannot compare mRNA expression of a gene across different tumor types when it is expressed relative to its distribution in a reference population, can one compare gene expression across different tumor types using RNA-Seq V2 RSEM data? For example, I would like to compare mRNA expression of different genes in different tumor sets by using TCGA provisional data sets analyzed by RNA-Seq V2 RSEM. Would it be valid to do so? Is the RNA-Seq V2 RSEM data normalized in some way within a tumor type, say to account for total RNA level? If so, is that method consistent across the different tumor types within the same group (e.g. TCGA vs Broad), so as to allow cross-tumor comparisons? If there is a selection of tumor sets or a type of mRNA expression analysis that allows comparison of gene expression across different tumor types, I'd appreciate it if you would let me know.
Thanks,
John Winslow
///////////////////////
John Winslow, Ph.D
Director, Oncology R&D
Monogram Bioscience, Inc
345 Oyster Point Blvd
South San Francisco, CA 94080
-This e-mail and any attachments may contain CONFIDENTIAL information, including PROTECTED HEALTH INFORMATION. If you are not the intended recipient, any use or disclosure of this information is STRICTLY PROHIBITED; you are requested to delete this e-mail and any attachments, notify the sender immediately, and notify the LabCorp Privacy Officer at privacy...@labcorp.com or call (877) 23-HIPAA / (877) 234-4722.