Question about RPPA UCEC data

Oscar Migueles

Dec 14, 2020, 3:45:14 PM
to cbiop...@googlegroups.com
Dear cbioportal team,

I am looking at the UCEC PanCancer data available at cBioPortal and have a couple of questions about the RPPA data. I hope you can help me.

First, what exactly is "data_rppa.txt"? Is it a log ratio, or data normalized to some level?

Second, I tried to get z-scores out of data_rppa.txt by computing (x - mean) / std, but the values I obtained do not match those in "data_rppa_Zscores.txt". Perhaps I am missing something.

I truly appreciate your time.

Best,

Oscar

JJ Gao

Dec 15, 2020, 11:52:41 AM
to Oscar Migueles, cBioPortal for Cancer Genomics Discussion Group
Hi Oscar,

data_rppa.txt contains "reverse-phase protein array" or RPPA-based proteomics data. Here is more information about the data: https://bioinformatics.mdanderson.org/public-software/tcpa/. We use level 3 normalized data as described on the same page.

For the z-score discrepancies, could you please give us some more details, e.g. how you calculated them, any code, and an example of a discrepancy?

Thanks,
-JJ


JJ Gao

Dec 16, 2020, 10:12:02 AM
to Oscar Migueles, cBioPortal for Cancer Genomics Discussion Group
Hi Oscar,

The RPPA data we used was part of the TCGA PanCancer Atlas publication, which might be a different version from the one on TCPA. Both datasets came from the same source (MD Anderson), so I wouldn't anticipate much difference.

Thanks for explaining the z-score discrepancies. We will look into it: https://github.com/cBioPortal/datahub/issues/1345

Best,
-JJ

On Wed, Dec 16, 2020 at 10:08 AM Oscar Migueles <oscarmig...@gmail.com> wrote:
Sorry, the other level 3 data that I looked at was from: https://tcpaportal.org/tcpa/download.html

Best,

Oscar

On Wed, Dec 16, 2020 at 12:54, Oscar Migueles (<oscarmig...@gmail.com>) wrote:
Hi JJ,

Thank you very much for your reply. I think I get the basic idea behind the RPPA assays now. When I check the level 3 normalized data from https://bioinformatics.mdanderson.org/public-software/tcpa/ for UCEC, the number of samples differs from the cBioPortal file, so I assume you used another version?

Regarding z scores, for example in R:

> library(data.table)  # provides fread() and setDF()
> data_rppa <- fread("~/ucec_tcga_pan_can_atlas_2018/data_rppa.txt", stringsAsFactors = FALSE)
> data_rppa_z <- fread("~/ucec_tcga_pan_can_atlas_2018/data_rppa_Zscores.txt", stringsAsFactors = FALSE)
> data_rppa_df <- setDF(data_rppa)
> data_rppa_z_df <- setDF(data_rppa_z)
> # Columns 2:424 are the samples; the per-protein mean and sd are appended as columns 425 and 426
> data_rppa_df$means <- rowMeans(data_rppa_df[, 2:424], na.rm = TRUE)
> data_rppa_df$sd <- apply(data_rppa_df[, 2:424], 1, sd, na.rm = TRUE)

> # First protein, first sample z-score calculation
> (data_rppa_df[1, 2] - data_rppa_df[1, 425]) / data_rppa_df[1, 426]
[1] 0.2542827
> # Second protein, first sample z-score calculation
> (data_rppa_df[2, 2] - data_rppa_df[2, 425]) / data_rppa_df[2, 426]
[1] -1.849435

> # And these are the values in the z-score RPPA file:
> data_rppa_z_df[1, 2]
[1] 0.1918
> data_rppa_z_df[2, 2]
[1] -1.8583
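
The same check can be run over every protein and sample at once instead of one cell at a time. The following is only a minimal sketch in base R on a toy table: the values, the sample names S1 to S3, and the ID column name "Composite.Element.REF" are assumptions for illustration, and z_published is a stand-in for the contents of data_rppa_Zscores.txt rather than real data. It does not explain the discrepancy; it only shows how the comparison could be done by name instead of by position.

```r
# Toy stand-in for data_rppa.txt (hypothetical values and sample names):
# first column holds the protein ID, the remaining columns are samples.
rppa <- data.frame(
  Composite.Element.REF = c("P1", "P2"),
  S1 = c(0.10, -1.20),
  S2 = c(0.40, 0.30),
  S3 = c(-0.20, 0.90),
  stringsAsFactors = FALSE
)

# Select sample columns by name rather than by a fixed range such as 2:424.
sample_cols <- setdiff(names(rppa), "Composite.Element.REF")
values <- as.matrix(rppa[, sample_cols])
rownames(values) <- rppa$Composite.Element.REF

# Row-wise z-score: (x - row mean) / row sd, ignoring NAs.
# Subtracting a vector from a matrix recycles down the columns, so row i
# is centred on row_mean[i] and scaled by row_sd[i].
row_mean <- rowMeans(values, na.rm = TRUE)
row_sd <- apply(values, 1, sd, na.rm = TRUE)
z <- (values - row_mean) / row_sd

# Stand-in for data_rppa_Zscores.txt: here simply z rounded to 4 decimals.
z_published <- round(z, 4)

# Align by row and column names, then summarise the disagreement.
diff <- abs(z - z_published[rownames(z), colnames(z)])
max_diff <- max(diff, na.rm = TRUE)
print(max_diff)
```

With the real files, a max_diff well above the rounding error of the published values would indicate that the z-score file was computed with a different reference mean and standard deviation than the rows of data_rppa.txt.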

Thank you for your time and your attention.

Best,

Oscar


