Questions about survival plot and co-expression plot


김세미

Sep 1, 2014, 11:02:42 AM
to cbiop...@googlegroups.com

Hello,

I have found the cBioPortal site very helpful for my cancer research, and I would like to thank the people who developed this tool. I have read the two papers (Sci. Signal. (2013); Cancer Discov. (2012)), but I still have a few questions about using cBioPortal.

 

Q1. How can I generate a survival plot based on whether the expression of a gene of interest is upregulated or downregulated?

For example, in colorectal adenocarcinoma (TCGA, Nature 2012), ERBB2 mRNA is upregulated in 20 of 244 samples and downregulated in 1 of 244 samples. The Survival tab shows survival plots for ‘Cases with Alteration(s) in Query Gene(s)’ vs. ‘Cases without Alteration(s) in Query Gene(s)’; I think ‘Alteration(s)’ includes both upregulation and downregulation.

How can I get a survival plot of cases with upregulated ERBB2 vs. the remaining patients?
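For a two-group comparison like this (upregulated cases vs. all remaining patients), the survival curves are usually accompanied by a log-rank test. Below is a minimal Python sketch of the standard two-group log-rank test, assuming each patient is described by a follow-up time, an event flag (1 = death observed, 0 = censored) and a group flag; the function name logrank_test is hypothetical. In R, the corresponding function is survdiff from the survival package.

```python
import math

def logrank_test(times, events, groups):
    """Two-group log-rank test (hypothetical helper for illustration).

    times  : follow-up times
    events : 1 if the event (e.g. death) was observed, 0 if censored
    groups : True for the group of interest (e.g. upregulated), False otherwise
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    # Distinct times at which at least one event occurred
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Group flags of patients still at risk just before time t
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(at_risk)  # at risk in the group of interest
        # Events at time t, overall and in the group of interest
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

The same event-time loop is what survdiff summarizes as observed vs. expected counts per group; only the bookkeeping is spelled out here.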

 

Q2. In the Co-expression tab, I found that Pearson and Spearman coefficients are calculated automatically.

How can I get p-values to check whether these coefficients (correlations) are statistically significant?

 

Thank you very much.

 

Best regards,

 

Semi Kim

 

-------------------------------------------------------------

Semi Kim, Ph.D

Principal Research Scientist, Immunotherapy Research Center

Korea Research Institute of Bioscience and Biotechnology (KRIBB)

 

125 Gwahak-ro, Yuseong-gu, Daejon 305-806, Rep. of Korea

T:+82-42-860-4228  F:+82-42-860-4149

E-mail: sem...@kribb.re.kr

-------------------------------------------------------------

 

Hyun-hwan Jeong

Sep 1, 2014, 9:25:51 PM
to cbiop...@googlegroups.com, sem...@kribb.re.kr

For both tasks, I recommend using R and writing your own scripts, because the site does not provide further customization for these needs.

cBioPortal provides cgdsr (http://cran.r-project.org/web/packages/cgdsr/index.html), an R package that connects your machine to the site's database. With this package, you can retrieve the data you need and process it however you want.

For the p-value calculation, the R function cor.test (https://stat.ethz.ch/R-manual/R-patched/library/stats/html/cor.test.html) will be useful.
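As a rough illustration of what cor.test reports, here is a stdlib-only Python sketch; all function names are hypothetical. For Pearson's r it uses the same test as the cor.test default: t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, two-sided. Spearman's rho is computed as Pearson's r on average ranks, and the same t approximation is applied to it; note that cor.test(method = "spearman") can compute an exact p-value for small samples without ties, so its numbers may differ.

```python
import math

def _betacf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction (modified Lentz) for the regularized incomplete beta."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_p_value(r, n):
    """Two-sided p-value for H0: correlation = 0, via Student's t with n - 2 df."""
    if abs(r) >= 1.0:
        return 0.0
    df = n - 2
    t2 = r * r * df / (1.0 - r * r)
    # P(|T| > |t|) = I_{df / (df + t^2)}(df / 2, 1 / 2)
    return _betai(df / 2.0, 0.5, df / (df + t2))

def average_ranks(values):
    """1-based ranks; tied values get the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))
```

For x = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5], pearson_r gives r ≈ 0.775 and correlation_p_value(r, 5) gives p ≈ 0.124.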

You can also get the survival plot by fitting the curves with the survfit function and plotting them (see http://stat.ethz.ch/R-manual/R-devel/library/survival/html/plot.survfit.html).
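survfit computes the Kaplan-Meier estimate that plot.survfit draws. To show what that estimate is, here is a minimal Python sketch (the function name kaplan_meier is hypothetical), assuming follow-up times and event flags (1 = event observed, 0 = censored) for one group; running it once per group, for example upregulated vs. remaining cases, gives the two curves.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival probability) steps.

    times  : follow-up times
    events : 1 if the event was observed at that time, 0 if censored
    """
    survival = 1.0
    curve = []
    # The curve only drops at times where at least one event occurred
    for t in sorted({t for t, e in zip(times, events) if e}):
        n_at_risk = sum(1 for ti in times if ti >= t)
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e)
        survival *= 1.0 - n_events / n_at_risk
        curve.append((t, survival))
    return curve
```

Censored patients never cause a drop, but they leave the risk set after their follow-up time, which is why the later steps are larger.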

Hyun-hwan Jeong.



On Tuesday, September 2, 2014 at 12:02:42 AM UTC+9, 김세미 wrote:

JianJiong Gao

Sep 2, 2014, 10:52:01 PM
to sem...@kribb.re.kr, cbiop...@googlegroups.com, yichao@cbio.mskcc.org Sun
Hi Semi,

Thanks for your nice feedback.

For your first question, you may want to use the Onco Query Language
(http://www.cbioportal.org/public-portal/onco_query_lang_desc.jsp).
For example, "ERBB2: EXP>2" compares cases with upregulated ERBB2 against all other cases.
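The grouping this query produces can be sketched in Python as follows, assuming (per the OQL documentation linked above) that EXP>2 selects cases whose mRNA expression z-score is above 2. The direction="both" option mimics an alteration definition that counts both up- and downregulation, as the original question suspects the Survival tab does. The function name and the choice to drop samples with missing data are assumptions of this sketch, not documented cBioPortal behavior.

```python
import math

def split_by_expression(z_scores, threshold=2.0, direction="up"):
    """Split samples into (selected, remaining) by mRNA expression z-score.

    direction="up"   : selected if z > threshold (like "ERBB2: EXP>2")
    direction="both" : selected if z > threshold or z < -threshold
    Samples with missing data (None or NaN) are left out of both groups
    (a choice made for this sketch).
    """
    selected, remaining = [], []
    for sample, z in z_scores.items():
        if z is None or math.isnan(z):
            continue
        if direction == "up":
            hit = z > threshold
        elif direction == "both":
            hit = abs(z) > threshold
        else:
            raise ValueError("direction must be 'up' or 'both'")
        (selected if hit else remaining).append(sample)
    return selected, remaining
```

With direction="up", a downregulated sample lands in the "remaining" group instead of the "altered" group, which is exactly the difference between the two comparisons asked about in Q1.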

Currently, we do not calculate p-values for the co-expression
analysis. We will discuss whether to add them in a future release.

Best,
-JJ

JianJiong Gao

Sep 5, 2014, 3:41:48 PM
to 김세미, cbiop...@googlegroups.com, Anders Jacobsen Skanderup
Hi Semi,

Please see my comments below.

> 1) For choosing mycaselist, I found there are options such as '3way_complete', 'all', etc.
> 2) For choosing mygeneticprofile, I found there are options such as 'rna_seq_mrna', 'mrna_median_Zscores', 'rna_seq_mrna_median_Zscores', etc. (probably depending on the corresponding CancerStudy)
> I need to analyze the correlation between the expression of two genes. Which options should I select, or which are preferred?

You can select cases_all.txt and any one of the mRNA profiles. After
retrieving the data, you can remove all samples that have no data.
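Removing samples without data amounts to keeping only the samples that have a value for both genes. A minimal Python sketch, assuming each gene's profile has been turned into a mapping from sample ID to value, with missing entries as NaN (as in the listings later in this thread); the function name complete_cases is hypothetical. In R the same step is often done with na.omit() or complete.cases().

```python
import math

def complete_cases(gene_a, gene_b):
    """Align two expression profiles and drop samples lacking data for either gene.

    gene_a, gene_b: dict mapping sample ID -> expression value (None or NaN if missing)
    Returns (sample_ids, values_a, values_b) for samples with data for both genes,
    in the order the samples appear in gene_a.
    """
    def present(v):
        return v is not None and not math.isnan(v)

    samples = [s for s in gene_a
               if s in gene_b and present(gene_a[s]) and present(gene_b[s])]
    return samples, [gene_a[s] for s in samples], [gene_b[s] for s in samples]
```

The two returned value lists are aligned sample by sample, so they can be passed straight to a correlation function.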


> 3) I tried to use the plot function, but I got the following message:
> [1] "Error: empty data frame returned :\n Error..Problem.when.identifying.a.cancer.study.for.the.request."
>
> How can I solve this problem?

I'm not sure. I'm cc'ing Anders, the author of CGDS-R, on this.

> 4) There are multiple published/provisional cancer studies for the same cancer type.
> Is it usual to combine data from all studies for analysis, or is it more appropriate to analyze each study individually?

For expression analysis, it's better to analyze study by study. Merging
expression data across multiple studies may not be trivial, especially
when different platforms were used.

-JJ

김세미

Sep 10, 2014, 10:31:38 AM
to JianJiong Gao, cbiop...@googlegroups.com, Anders Jacobsen Skanderup
Hello, JianJiong
Thank you very much for your kind reply.
It is a big help to me.

Best regards,

Semi Kim

김세미

Sep 11, 2014, 11:19:11 AM
to JianJiong Gao, cbiop...@googlegroups.com, Anders Jacobsen Skanderup
Hello, JianJiong,
First, the cgdsr plot function is now working. Thanks.
Second, I have a question regarding the two colorectal adenocarcinoma studies (TCGA, Nature 2012, and TCGA, provisional). I'm wondering whether the datasets from these two studies overlap. When I retrieved the microarray dataset ('mrna_median_Zscores' selected for mygeneticprofile) to get a co-expression plot between two genes, I found that the two plots were almost identical. The first five values from each study are shown below.


From TCGA, Nature 2012
TCGA.A6.2670 0.2818 1.6166
TCGA.A6.2671 NaN NaN
TCGA.A6.2672 0.6070 1.1957
TCGA.A6.2674 3.5336 1.3179
TCGA.A6.2676 0.8523 -1.5976
.....

From TCGA, provisional
TCGA.A6.2671 NaN NaN
TCGA.A6.2672 0.6522 1.2310
TCGA.A6.2674 3.6603 1.3563
TCGA.A6.2675 NaN NaN
TCGA.A6.2676 0.9043 -1.6337
....
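One way to check the overlap is to intersect the sample IDs of the two listings and keep those with values in both. A minimal Python sketch, assuming the whitespace-separated format shown above (sample ID followed by two values); the function names are hypothetical.

```python
import math

def parse_listing(text):
    """Parse lines of 'SAMPLE_ID value1 value2' into {sample_id: (value1, value2)}.

    Lines without exactly three fields (e.g. '.....') are skipped;
    'NaN' entries become float('nan').
    """
    data = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) == 3:
            data[fields[0]] = (float(fields[1]), float(fields[2]))
    return data

def shared_samples(study_a, study_b):
    """Sample IDs present in both studies with no missing values in either."""
    shared = []
    for sample in sorted(study_a.keys() & study_b.keys()):
        values = study_a[sample] + study_b[sample]
        if not any(math.isnan(v) for v in values):
            shared.append(sample)
    return shared
```

With the five rows from each study shown above, the samples shared by both studies with data in both are TCGA.A6.2672, TCGA.A6.2674 and TCGA.A6.2676; their values differ only slightly between the two studies.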


Best regards,
Semi Kim



JianJiong Gao

Sep 11, 2014, 12:46:12 PM
to 김세미, cbiop...@googlegroups.com
Hi Semi,

The microarray-based mRNA expression data for the published and provisional
colorectal studies should be very close. The differences you saw are
mainly due to the different reference populations used to calculate the z-scores.
The scatter plots should look very similar. I would recommend using the
non-z-score expression profiles (e.g. coadread_tcga_mrna or, maybe
better, coadread_tcga_rna_seq_v2_mrna) for co-expression analysis
because they are more complete.
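To see why a different reference population shifts the values but not the shape of the scatter plot: a z-score is the linear transform (value - mean) / sd, so changing the reference changes the mean and sd but keeps the order of the samples. A minimal Python sketch, assuming plain mean/standard-deviation standardization and hypothetical values; the exact procedure cBioPortal uses for its z-score profiles is not described in this thread.

```python
import statistics

def z_scores(values, reference):
    """Standardize values against a reference population: (x - mean) / sd."""
    mean = statistics.mean(reference)
    sd = statistics.stdev(reference)
    return [(x - mean) / sd for x in values]

# Hypothetical expression values for the same four samples
values = [5.1, 7.4, 6.2, 9.0]
# Two hypothetical reference populations (e.g. cohorts of different size)
ref_a = [5.0, 6.0, 7.0, 8.0]
ref_b = [5.0, 6.0, 7.0, 8.0, 9.0]

za = z_scores(values, ref_a)
zb = z_scores(values, ref_b)
# za and zb differ sample by sample, but both are linear functions of the
# same raw values, so the samples keep the same order under either reference.
```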

-jj