What is the unit of RNASeq v2 mRNA expression profile in a cross-cancer query?


Yong Huang

Jul 16, 2017, 8:09:55 PM
to cBioPortal for Cancer Genomics Discussion Group

Hi there,


I am trying to get a sense of the mRNA expression levels of certain genes in different cancer types. I queried the cBioPortal TCGA studies and was able to get gene expression plots based on RNASeq v2 data (here is an example of a cross-cancer gene expression plot), but it is not clear to me what the unit of the expression value is. As this is a cross-cancer query, I assume it shouldn't be a z-score; am I right?


Thanks!

Yong

RNAseq v2 cancer sample.jpg

JJ Gao

Jul 17, 2017, 11:23:26 AM
to Yong Huang, cBioPortal for Cancer Genomics Discussion Group
Hi Yong,

It should be the log value of RNA-Seq V2, as indicated in the y-axis label. The details about RNA-Seq V2 are here (https://wiki.nci.nih.gov/display/tcga/rnaseq+version+2). We use the ".rsem.genes.normalized_results" files in the portal.
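
For readers who want to reproduce a log-scale axis from downloaded values, here is a minimal sketch. It assumes a base-2 log with a pseudocount of 1 so that zero values stay defined; the base and the pseudocount are assumptions for illustration, not something confirmed in this thread. The function name `log2_transform` is hypothetical.

```python
import math
from typing import Iterable, List


def log2_transform(values: Iterable[float], pseudocount: float = 1.0) -> List[float]:
    """Return log2(value + pseudocount) for each value.

    The pseudocount (assumed here to be 1) keeps zero expression finite.
    """
    return [math.log2(v + pseudocount) for v in values]


# Made-up normalized values, chosen so the results are round numbers.
logged = log2_transform([0.0, 1.0, 3.0, 1023.0])
print(logged)
```

Check the y-axis label of the plot you are comparing against before relying on a particular base or pseudocount.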

Best,
-JJ

--
You received this message because you are subscribed to the Google Groups "cBioPortal for Cancer Genomics Discussion Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cbioportal+unsubscribe@googlegroups.com.
To post to this group, send email to cbiop...@googlegroups.com.
Visit this group at https://groups.google.com/group/cbioportal.
To view this discussion on the web visit https://groups.google.com/d/msgid/cbioportal/b8522467-dc88-4c87-bb83-f5456176131e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

JJ Gao

Jul 17, 2017, 12:24:08 PM
to Yong Huang, cbiop...@googlegroups.com
For the RNASeq v2 in the cBioPortal, it is reasonable to compare them across studies. 

To download clinical data, you can go to the study view page by clicking on the summary icon.

Inline image 1


-JJ

P.S. Please cc the mailing list (cbiop...@googlegroups.com) when replying. Thanks.

On Mon, Jul 17, 2017 at 12:17 PM, Yong Huang <yhu...@optiviabio.com> wrote:
Thanks, JJ, for your prompt response. I really appreciate it!

I am new to bioinformatics. My question boils down to: can RNASeq v2 be used to compare the expression of the same genes in different cancer types?

Also, is there a way to download patient sample info, such as cancer stage, from cBioPortal?

Thank you so much!
Yong


archana....@gmail.com

Jul 19, 2019, 9:04:00 AM
to cBioPortal for Cancer Genomics Discussion Group
Dear Yong Huang,

Thanks for your response. Yes, you are right, it is not a z-score. I am unable to download point-mutation expression data from the RNASeq v2 (log2) plots. As shown in the attached image, each gene (e.g., EGFR) has several mutations; the image shows a point mutation and its expression data. Could you please suggest how to download the whole mRNA expression data (not z-scores) for point mutations (Missense (driver) and Missense (VUS))? Please find the attachment below.

Thanks!
Archana 

cbio exp data.png

JJ Gao

Jul 19, 2019, 11:54:00 AM
to archana....@gmail.com, cBioPortal for Cancer Genomics Discussion Group
Dear Archana,

You can download the non-z-score mRNA data in the Download query interface (see screenshot below) or download the whole dataset from datahub (https://github.com/cBioPortal/datahub/).

image.png

However, the gene expression data are not specific to the mutant genes. In your screenshot, the mutation coloring indicates that the sample is mutated, but the expression value is for EGFR as a gene, covering both wildtype and mutant.

Best,
-JJ


JJ Gao

Jul 20, 2019, 1:23:50 PM
to archana katta, cBioPortal for Cancer Genomics Discussion Group
Hi Archana,

Would you please send us a screenshot of the error?

You can also go to the datasets page (https://www.cbioportal.org/datasets) to download all files in a study.

Best,
-JJ

ps. please keep cbiop...@googlegroups.com in the email. Thanks.

To download from datahub, you can go to this folder (https://github.com/cBioPortal/datahub/tree/master/public) and download the data of any studies. 
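
Once a study folder is downloaded, the expression file can be read with a few lines of Python. This is only a sketch: it assumes a tab-delimited layout with `Hugo_Symbol` and `Entrez_Gene_Id` as the first two columns followed by one column per sample, and that missing values appear as `NA` or an empty cell. The sample IDs and values below are made up, and `read_expression_matrix` is a hypothetical helper name, so check the actual file in the study folder before relying on it.

```python
import csv
import io
from typing import Dict, TextIO


def read_expression_matrix(handle: TextIO) -> Dict[str, Dict[str, float]]:
    """Return {gene_symbol: {sample_id: value}} from a tab-delimited file.

    Assumed layout: gene symbol, Entrez gene ID, then one column per sample.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_ids = header[2:]
    matrix: Dict[str, Dict[str, float]] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        values: Dict[str, float] = {}
        for sample_id, cell in zip(sample_ids, row[2:]):
            if cell not in ("", "NA"):  # assumed missing-value markers
                values[sample_id] = float(cell)
        matrix[row[0]] = values
    return matrix


# Tiny in-memory stand-in for a downloaded file (all values are made up).
example = (
    "Hugo_Symbol\tEntrez_Gene_Id\tSAMPLE-01\tSAMPLE-02\n"
    "EGFR\t1956\t1520.5\tNA\n"
    "TP53\t7157\t830.0\t412.25\n"
)
matrix = read_expression_matrix(io.StringIO(example))
print(matrix["EGFR"])
```

For a real file, pass an open file handle instead of the `io.StringIO` object, for example `open(path, newline="")` where `path` points at the expression file in the study folder.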

On Sat, Jul 20, 2019 at 12:56 AM archana katta <archana....@gmail.com> wrote:
Dear JJ Gao,

Thank you for your response and advice. I tried to download the non-z-score data from the Download interface shown in the screenshot, but it returned an internal server error. Could you please suggest how to download the data from datahub (https://github.com/cBioPortal/datahub/)? Thank you in advance.

Regards
Archana Katta

JJ Gao

Jul 23, 2019, 8:10:30 AM
to archana katta, cBioPortal for Cancer Genomics Discussion Group
Dear Archana,

Thanks for the screenshot. We will look into that.

Unfortunately, we only have the gene expression data, not mutation expression data.

Best,
-JJ

On Mon, Jul 22, 2019 at 11:20 PM archana katta <archana....@gmail.com> wrote:
Dear Gao,

Sorry for the delayed response. I could download the expression data of a gene for each sample in a study, but I need the expression data for each mutation of a gene. Could you please confirm whether I can download the mutation expression data shown in the earlier screenshot (box plot)? Please find attached a screenshot of the internal server error. Thank you for your consideration.


Best Regards,
Archana Katta

JJ Gao

Jul 23, 2019, 11:05:07 PM
to archana katta, cBioPortal for Cancer Genomics Discussion Group
Hi Archana,

I am curious where you are copying the mutation expression data from. Just to be clear, we don't have any mutation expression data on cbioportal.org. What you see in the Expression tab (your screenshot) is gene expression. If a mutation occurred in a sample, we show the mutated sample with a different glyph, but that doesn't mean the expression value is for the mutation.
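
To make the distinction concrete, here is a small sketch of what the plot effectively does: every sample has one gene-level expression value, and mutation status is only a label attached to the sample. The sample IDs, values, and the helper name `split_by_mutation` are made up for illustration.

```python
from typing import Dict, List, Set


def split_by_mutation(
    expression: Dict[str, float], mutated_samples: Set[str]
) -> Dict[str, List[float]]:
    """Group per-sample gene expression values by the sample's mutation status.

    The values themselves are gene-level; mutation status only decides
    which group a sample's value is placed in.
    """
    groups: Dict[str, List[float]] = {"mutated": [], "wildtype": []}
    for sample_id, value in sorted(expression.items()):
        key = "mutated" if sample_id in mutated_samples else "wildtype"
        groups[key].append(value)
    return groups


# Hypothetical EGFR expression per sample, and the samples carrying a mutation.
egfr_expression = {"S1": 11.2, "S2": 9.8, "S3": 12.5, "S4": 10.1}
egfr_mutated = {"S1", "S3"}

groups = split_by_mutation(egfr_expression, egfr_mutated)
print(groups)
```

The same grouping could be done per mutation type (for example driver versus VUS) by replacing the set with a mapping from sample ID to mutation label, but the expression values would still be for the gene as a whole.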

Best,
-JJ

On Tue, Jul 23, 2019, 10:57 PM archana katta <archana....@gmail.com> wrote:
Dear Gao,

Thanks for your concern. My problem is that copying mutation expression data for several genes is a time-consuming process, so I asked for your suggestions, but unfortunately you don't have mutation expression data. Many thanks for your advice and time.


Regards
Archana Katta

JJ Gao

Jul 26, 2019, 7:40:53 AM
to archana katta, cBioPortal for Cancer Genomics Discussion Group
Hi Archana,

We are using ".rsem.genes.normalized_results" data from RSEM output for TCGA. You can find some detailed explanation here: https://www.biostars.org/p/106127/

Please feel free to contact us (cbiop...@googlegroups.com) if you have more questions.

Best,
-JJ


On Fri, Jul 26, 2019 at 5:06 AM archana katta <archana....@gmail.com> wrote:
Dear Gao,

Could you please tell me what the unit of RNASeq V2 mRNA expression is? Thank you for considering my requests.


Regards
Archana Katta






On Wed, Jul 24, 2019 at 12:25 PM archana katta <archana....@gmail.com> wrote:
Dear Gao,

Thank you for the information; we are aware of that. We are collecting mutated gene expression data for the specific point mutation shown in the screenshot. Could you please suggest how to get the information for a specific point mutation? Thank you.


Regards
Archana Katta

JJ Gao

Jul 26, 2019, 8:44:41 AM
to William Wright, cBioPortal for Cancer Genomics Discussion Group
Hi William,

I am not sure if I have an answer for the unit. We are using the normalized_results:

"The normalized results (normalized_count) is a simple transformation of the "raw_count" that you can do yourself to check. For gene level estimates you divide all "raw_count" values by the 75th percentile of the column (after removing zeros) and multiply that by 1000. The normalized file therefore does not take any external factors into account, but simply transforms each sample so the values are relative the 75th percentile with a x1000 adjustment factor."

-JJ
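
The transformation quoted above (divide by the 75th percentile of the non-zero raw counts, then multiply by 1000) can be sketched in a few lines. This is an illustration under stated assumptions, not the TCGA pipeline itself: the percentile is computed with simple linear interpolation, which may differ from the method the pipeline uses, and the function names and sample counts are made up.

```python
from typing import List, Sequence


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (q in [0, 1]) of an already sorted sequence."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


def upper_quartile_normalize(
    raw_counts: Sequence[float], scale: float = 1000.0
) -> List[float]:
    """Divide each raw count by the 75th percentile of the non-zero counts, times scale."""
    nonzero = sorted(v for v in raw_counts if v > 0)  # zeros are removed first
    q75 = percentile(nonzero, 0.75)
    return [v / q75 * scale for v in raw_counts]


# One hypothetical sample: raw counts for six genes.
raw = [0, 10, 20, 30, 40, 50]
normalized = upper_quartile_normalize(raw)
print(normalized)
```

In this example the 75th percentile of the non-zero counts is 40, so a gene with a raw count of 40 ends up at 1000. The result is a relative, unitless quantity scaled per sample, which is why "normalized count" is probably the closest description.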

On Fri, Jul 26, 2019, 8:07 AM William Wright <wcharle...@gmail.com> wrote:
As mentioned in the linked article, doesn't RSEM output isoform.results and gene.results? But each of those files still has TPM, FPKM, and counts.
If I look at cBioPortal's co-expression tab, I'll call the x-axis the expression of geneX. If I were to write the expression as a sentence, I would say "The expression of geneX is 10 ____________"

Do I say 10 transcripts per million? 10 normalized counts? 10 fragments per kilobase per million mapped reads?



