Get gene expression data from cBioPortal TCGA dataset


11033...@qq.com

Jul 26, 2017, 10:53:37 AM
to cBioPortal for Cancer Genomics Discussion Group
Hi,
I am a beginner in gene expression analysis, and this is my first time downloading data from cBioPortal. I want to get the expression level of PTEN (a tumor suppressor gene) in colorectal cancer, specifically from the TCGA colorectal project. After downloading data from http://www.cbioportal.org/study?id=coadread_tcga#summary, I got a folder containing multiple files (see the attached figure). I do not know the exact meaning of each file and cannot figure out which file I should use to get the expression level of PTEN. Most likely the information I need is in these 4 files:
data_expression_median.txt 
data_mRNA_median_Zscores.txt
data_RNA_Seq_v2_expression_median.txt
data_RNA_Seq_v2_mRNA_median_Zscores.txt
Could anyone help me? Thanks a lot!

Yang
tcga_data_list.JPG

Tali Mazor

Jul 26, 2017, 12:05:12 PM
to 11033...@qq.com, cBioPortal for Cancer Genomics Discussion Group
There’s also additional information available in the files you downloaded that begin with “meta_”.
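A minimal sketch of reading one of those "meta_" files, assuming each file is plain text with one "key: value" pair per line (the usual cBioPortal meta layout). The example text below is hypothetical and not copied from the actual download; fields such as `datatype` and `data_filename` tell you what kind of values the matching data file holds.

```python
def parse_meta(text):
    """Parse 'key: value' lines from a cBioPortal-style meta_* file into a dict.

    Assumes one pair per line; blank lines and lines without ':' are skipped.
    Only the first ':' splits key from value, so values may contain colons.
    """
    meta = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        meta[key.strip()] = value.strip()
    return meta


# Hypothetical meta file content, for illustration only.
example = """cancer_study_identifier: coadread_tcga
genetic_alteration_type: MRNA_EXPRESSION
datatype: Z-SCORE
stable_id: rna_seq_v2_mrna_median_Zscores
data_filename: data_RNA_Seq_v2_mRNA_median_Zscores.txt
"""

meta = parse_meta(example)
print(meta["datatype"], meta["data_filename"])
```

For a real file, pass its contents, e.g. `parse_meta(open(path).read())`, and compare the fields across the four meta files.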


On July 26, 2017 at 11:58:43 AM, Tali Mazor (tma...@jimmy.harvard.edu) wrote:

Hi Yang,

The easiest way to get PTEN expression is to run a query for PTEN in the TCGA colorectal study. You can select whether you want to include microarray or RNA-seq based expression (see screenshot). When you run the query, you can then download just the PTEN Z-score for all samples from the Downloads tab.
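As a rough illustration of what a Z-score means, here is the standard formula z = (x - mean) / standard deviation applied to hypothetical values (not real TCGA data). This sketch uses the population standard deviation over all given samples; cBioPortal's own pipeline may use a different reference population or transform the values first, so treat this as a generic example only.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values as (x - mean) / population standard deviation.

    Generic illustration only, not cBioPortal's exact Z-score procedure.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all values identical; Z-scores undefined")
    return [(x - mu) / sigma for x in values]


# Hypothetical PTEN expression values for five samples.
pten = [10.0, 12.0, 14.0, 16.0, 18.0]
print([round(z, 3) for z in z_scores(pten)])
# prints [-1.414, -0.707, 0.0, 0.707, 1.414]
```

A Z-score near 0 means the sample is close to the average, and a strongly negative value means PTEN expression is low relative to the other samples.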

In terms of the files you downloaded, the first two are microarray-based expression and the second two are RNA-seq-based expression. For each type, the first file contains the expression values, processed according to standard TCGA protocols, and the second contains the corresponding Z-scores.
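If you prefer to work from the downloaded files, a minimal sketch of pulling out the PTEN row is below. It assumes a tab-delimited matrix whose header starts with gene-identifier columns (`Hugo_Symbol` and usually `Entrez_Gene_Id`) followed by one column per sample; check the header of your file first. The sample IDs in the example are hypothetical placeholders, and "", "NA" and "NaN" are treated as missing values.

```python
import csv
import io

ID_COLUMNS = {"Hugo_Symbol", "Entrez_Gene_Id"}
MISSING = {"", "NA", "NaN"}


def gene_row(handle, gene="PTEN"):
    """Return {sample_id: value} for one gene from a tab-delimited matrix.

    Assumes the header begins with identifier columns (Hugo_Symbol is
    required) and every remaining column is a sample.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Count the leading identifier columns; the rest are sample IDs.
    n_id = 0
    while n_id < len(header) and header[n_id] in ID_COLUMNS:
        n_id += 1
    samples = header[n_id:]
    symbol_idx = header.index("Hugo_Symbol")
    for row in reader:
        if row and row[symbol_idx] == gene:
            return {s: (None if v in MISSING else float(v))
                    for s, v in zip(samples, row[n_id:])}
    raise KeyError(f"{gene} not found")


# Hypothetical two-sample matrix for illustration.
text = ("Hugo_Symbol\tEntrez_Gene_Id\tSAMPLE_A\tSAMPLE_B\n"
        "TP53\t7157\t1.5\t-0.2\n"
        "PTEN\t5728\t-1.1\tNA\n")
result = gene_row(io.StringIO(text))
print(result)
```

For a real file, open it with `open("data_RNA_Seq_v2_expression_median.txt", newline="")` and pass the handle to `gene_row`.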

-Tali


11033...@qq.com

Jul 27, 2017, 9:23:43 AM
to cBioPortal for Cancer Genomics Discussion Group, 11033...@qq.com
Thanks! Your reply is of great help to me!
May I ask another question? I want to get the MSI (microsatellite instability) information for each sample but cannot find where it is. I remember that TCGA data should include MSI information. Do you know where I can find it?
On Thursday, July 27, 2017 at 12:05:12 AM UTC+8, Tali Mazor wrote:

Tali Mazor

Jul 27, 2017, 9:28:28 AM
to cBioPortal for Cancer Genomics Discussion Group, 11033...@qq.com, 11033...@qq.com
MSI information is available in the published TCGA colorectal cohort (Colorectal Adenocarcinoma TCGA Nature 2012) but not in the provisional dataset that you downloaded. If you load the published study (http://www.cbioportal.org/study?id=coadread_tcga_pub#clinical), MSI status is available for each sample.
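To combine MSI status with PTEN expression, one approach is to match samples by ID. A minimal sketch, assuming the clinical data is a tab-delimited table where lines starting with "#" are comments and both tables use the same sample IDs. The column names `SAMPLE_ID` and `MSI_STATUS` are hypothetical defaults; replace them with the names in the header of your download. Samples present in only one table are dropped.

```python
import csv
import io


def msi_by_sample(handle, id_column="SAMPLE_ID", msi_column="MSI_STATUS"):
    """Map sample ID -> MSI status from a tab-delimited clinical table.

    Lines starting with '#' are skipped; column names are hypothetical.
    """
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return {r[id_column]: r[msi_column] for r in reader}


def pair_with_msi(expression, msi):
    """Return {sample: (expression_value, msi_status)} for shared samples."""
    return {s: (expression[s], msi[s]) for s in expression if s in msi}


# Hypothetical inputs for illustration only.
clinical = ("#Sample Identifier\tMSI Status\n"
            "SAMPLE_ID\tMSI_STATUS\n"
            "SAMPLE_A\tMSI-H\n"
            "SAMPLE_B\tMSS\n"
            "SAMPLE_C\tMSI-L\n")
expression = {"SAMPLE_A": -1.1, "SAMPLE_B": None}

paired = pair_with_msi(expression, msi_by_sample(io.StringIO(clinical)))
print(paired)
```

Keeping the join as a plain dictionary lookup makes it easy to see which samples were dropped because they appear in only one of the two tables.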

11033...@qq.com

Jul 28, 2017, 8:52:38 AM
to cBioPortal for Cancer Genomics Discussion Group, 11033...@qq.com

Thank you very much!
On Thursday, July 27, 2017 at 9:28:28 PM UTC+8, Tali Mazor wrote: