cBioPortal RNASeq questions

NICHOLAS R RYDZEWSKI

Jan 9, 2022, 6:17:31 AM

to cbiop...@googlegroups.com

Hello, my name is Nick Rydzewski, and I appreciate all the work you have put into the cBioPortal database. I have been using the data for my own research and came across some datasets that I have questions about. For most RNAseq data there appears to be an unnormalized dataset and a dataset that was run through the cBioPortal normalization workflow. For some studies, though, the data listed as unnormalized appears to have already been normalized. Could this be looked into, just to confirm whether the following datasets have already been normalized based on other samples in the dataset? Thanks!

Both studies below are listed as data_RNA_Seq_v2_expression_median, but their values are actually quite similar to data_RNA_Seq_v2_mRNA_median_all_samples_Zscores:

luad_oncosg_2020

stad_oncosg_2018

The three studies below are all from CPTAC and all listed as data_mrna_seq_fpkm, but each appears to have a different format:

brca_cptac_2020

lusc_cptac_2021

gbm_cptac_2021 (I think this one hasn’t been adjusted, so the question is about the two above)

Same here, all below are listed as data_mrna_seq_rpkm:

mel_tsam_liang_2017 (different samples have similar values for certain genes, which makes me think a cross-cohort normalization scheme was applied)

luad_cptac_2020 (has negative values)

difg_glass_2019 (I don’t think this one is adjusted; listed for reference)

nepc_wcm_2016 – RNA_seq_expression_median (this one just has negative values, and I wanted to check whether that is expected)

My final question is about the TCGA Pan Can Atlas data. This may just be due to a batch correction effect, but I notice that only the studies listed below have an expression value (RNA_Seq_v2_expression_median) below 0, while the minimum value for all the others is 0:

laml_tcga_pan_can_atlas_2018

coadread_tcga_pan_can_atlas_2018

esca_tcga_pan_can_atlas_2018

ov_tcga_pan_can_atlas_2018

prad_tcga_pan_can_atlas_2018

stad_tcga_pan_can_atlas_2018

ucec_tcga_pan_can_atlas_2018
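
For anyone who wants to reproduce the minimum-value check described above, here is a minimal sketch. It assumes the expression file is tab-delimited with Hugo_Symbol and Entrez_Gene_Id identifier columns followed by one column per sample. The function name and the small in-memory example are hypothetical and are not taken from any of the studies listed.

```python
import csv
import io
import math


def expression_min_and_negatives(handle, id_columns=("Hugo_Symbol", "Entrez_Gene_Id")):
    """Return (minimum value, number of negative values) for an expression matrix.

    Assumes a tab-delimited layout: identifier columns, then one column per sample.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Every column that is not a gene identifier is treated as a sample column.
    sample_idx = [i for i, name in enumerate(header) if name not in id_columns]

    minimum = None
    negatives = 0
    for row in reader:
        for i in sample_idx:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue  # skip "NA", blanks, and short rows
            if math.isnan(value):
                continue
            if minimum is None or value < minimum:
                minimum = value
            if value < 0:
                negatives += 1
    return minimum, negatives


# Hypothetical example data, not real study values.
example = (
    "Hugo_Symbol\tEntrez_Gene_Id\tS1\tS2\n"
    "TP53\t7157\t10.5\t0\n"
    "EGFR\t1956\t-0.25\tNA\n"
)
print(expression_min_and_negatives(io.StringIO(example)))
```

To run it on a downloaded file, pass an open file handle instead of the StringIO object. A negative minimum in a file labeled as FPKM, RPKM, or RSEM values suggests some transformation was applied, although it does not say which one.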

I understand if these are just the direct files you got from the original studies, but I just wanted to have this looked into in case some were being processed unintentionally even when not listed as normalized. Thanks! I really appreciate all the work!

Best,

Nick Rydzewski

___________________________________

Nicholas Rydzewski, MD, MPH

Radiation Oncology Chief Resident

Department of Human Oncology

University of Wisconsin Hospital and Clinics

Yichao S

Jan 20, 2022, 2:32:52 PM

to NICHOLAS R RYDZEWSKI, cbiop...@googlegroups.com

Hi Nick,

We've logged this issue in our GitHub data repo and are actively working on it.

You can track it here: https://github.com/cBioPortal/datahub/issues/1586.

In the comment section there is a link to a spreadsheet, where we are adding the details of the normalization process from the publications, for the studies you've mentioned.

Once we finish logging there, we'll work on adding these details to the meta files for mRNA expression as well.

Best,

Yichao

NICHOLAS R RYDZEWSKI

Jan 20, 2022, 3:09:45 PM

to yichao...@gmail.com, cbiop...@googlegroups.com

Thanks! I appreciate it. Part of the problem I noticed is that studies such as luad_oncosg_2020, stad_oncosg_2018, lusc_cptac_2021, and I think mel_tsam_liang_2017 were normalized by gene even under the data_RNA_Seq_v2_expression_median/data_mrna_seq_fpkm/data_mrna_seq_rpkm headings. If possible, I was hoping to access the versions of these datasets that weren’t normalized by gene (for example, I found the non-gene-normalized data for luad_oncosg_2020 and lusc_cptac_2021 through their papers and associated websites). If those aren’t available, it would be helpful to have confirmation of whether these and other datasets without the Zscore headings have been normalized by gene. Thanks!
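
A rough way to test for the by-gene normalization described above is to check whether each gene's values across samples have a mean near 0 and a standard deviation near 1. The sketch below assumes the same tab-delimited layout (Hugo_Symbol and Entrez_Gene_Id columns, then one column per sample). The function name, the tolerance, and the example data are hypothetical choices for illustration only.

```python
import csv
import io
import math
import statistics


def fraction_gene_standardized(handle, id_columns=("Hugo_Symbol", "Entrez_Gene_Id"), tol=0.05):
    """Return the fraction of genes whose values look z-scored across samples.

    A gene counts as standardized when its mean is within `tol` of 0 and its
    sample or population standard deviation is within `tol` of 1.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_idx = [i for i, name in enumerate(header) if name not in id_columns]

    checked = 0
    standardized = 0
    for row in reader:
        values = []
        for i in sample_idx:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue  # skip "NA", blanks, and short rows
            if not math.isnan(value):
                values.append(value)
        if len(values) < 3:
            continue  # too few values to judge this gene
        checked += 1
        mean = statistics.fmean(values)
        sd_ok = (
            abs(statistics.stdev(values) - 1.0) <= tol
            or abs(statistics.pstdev(values) - 1.0) <= tol
        )
        if abs(mean) <= tol and sd_ok:
            standardized += 1
    return standardized / checked if checked else 0.0


# Hypothetical example: GENE_A looks z-scored, GENE_B does not.
example = (
    "Hugo_Symbol\tEntrez_Gene_Id\tS1\tS2\tS3\n"
    "GENE_A\t1\t-1\t0\t1\n"
    "GENE_B\t2\t5\t10\t15\n"
)
print(fraction_gene_standardized(io.StringIO(example)))
```

A fraction close to 1 would be consistent with per-gene z-scoring over the samples in the file. A low fraction does not rule out other per-gene adjustments: data that was only mean- or median-centered, or z-scored against a different reference set of samples, would not pass this check.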

Nick

ramyama...@gmail.com

Feb 9, 2022, 9:53:52 AM

to cBioPortal for Cancer Genomics Discussion Group

Hi Nick,

Unfortunately, we do not have access to the raw datasets for these studies. We suggest you reach out to the authors.

Also, we have updated the meta files with the normalization method. For certain studies we have reached out to the authors for confirmation (this can be tracked in the sheet) and are still awaiting a reply.

Best,
Ramya
