RNA-seq raw data transformation

Marina Oteiza

Jul 26, 2024, 1:11:29 PM
to cBioPortal for Cancer Genomics Discussion Group
Dear cBioPortal team,

First of all, I would like to thank you for the outstanding work you have accomplished with cBioPortal. Your platform has been an invaluable resource for the research community, providing comprehensive and accessible cancer genomics data.

I'm a college student doing a summer internship at the Vall d'Hebron Instituto de Oncología (VHIO), and I'm currently working with raw RNA-seq data from the Prostate Adenocarcinoma (TCGA, PanCancer Atlas) dataset.

I have downloaded the raw count data from GDC, but I'm struggling to reproduce the input that cBioPortal uses to calculate z-scores (data_mrna_seq_v2_rsem.txt). The transformation method isn't described in the documentation, and I have also read the RSEM article but haven't found an explanation there.

I have read in forums that it consists of dividing all "raw_count" values by the 75th percentile of the column (after removing zeros) and multiplying the result by 1000. I have transformed my data using this method, but I get much lower values than the ones in cBioPortal. In addition, cBioPortal's file contains negative values, which cannot arise from this percentile transformation.
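
The forum-described transformation above can be sketched in Python as follows. This is only a minimal sketch of the method as stated in this paragraph (75th percentile of the non-zero counts, scaled by 1000); it is not confirmed to be what produced data_mrna_seq_v2_rsem.txt. The function name `upper_quartile_normalize` and the toy counts are hypothetical and chosen for illustration.

```python
import numpy as np

def upper_quartile_normalize(raw_counts, scale=1000.0):
    """Scale one sample's raw counts by the 75th percentile of its non-zero counts.

    Assumption: `raw_counts` is a single sample column of "raw_count" values.
    """
    counts = np.asarray(raw_counts, dtype=float)
    nonzero = counts[counts > 0]  # zeros are removed before taking the percentile
    if nonzero.size == 0:
        raise ValueError("sample has no non-zero counts")
    q75 = np.percentile(nonzero, 75)
    return counts / q75 * scale

# Toy sample column (made-up numbers, not TCGA data)
sample = [0, 10, 20, 30, 40, 0, 50]
normalized = upper_quartile_normalize(sample)
print(normalized)
```

Because both the counts and the percentile are non-negative, the output of this step is always non-negative, so any negative values in the cBioPortal file have to come from a different processing step.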

Regarding the TCGA-PRAD study: is this the transformation applied to the raw data? Are any other transformations applied? Does cBioPortal use the unstranded (raw count) or the tpm_unstranded RNA-seq data? Thank you so much!

Kind regards,
Marina

de Bruijn, Ino

Jul 30, 2024, 7:04:05 PM
to Marina Oteiza, cBioPortal for Cancer Genomics Discussion Group, Madupuri, Ramyasree

Hi Marina,

Thanks for reaching out!

Unfortunately I’m not very familiar with the RSEM processing. On the cBioPortal side we don’t do any RSEM processing ourselves, as far as I know. There are only z-score calculations.
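
As a rough illustration of what a per-gene z-score calculation looks like, here is a minimal Python sketch. It is not cBioPortal's actual implementation: the log2(x + 1) transform, the use of all samples as the reference population, the sample standard deviation (`ddof=1`), and the function name `gene_zscores` are all assumptions made for this example.

```python
import numpy as np

def gene_zscores(expression, log_transform=True):
    """Per-gene z-scores for a genes x samples expression matrix.

    Assumptions: all samples form the reference population, and values are
    log2(x + 1)-transformed first when `log_transform` is True.
    """
    values = np.asarray(expression, dtype=float)
    if log_transform:
        values = np.log2(values + 1.0)
    mean = values.mean(axis=1, keepdims=True)
    std = values.std(axis=1, ddof=1, keepdims=True)
    std[std == 0] = np.nan  # genes with no variation get NaN instead of dividing by zero
    return (values - mean) / std

# Toy matrix: 2 genes x 3 samples (made-up numbers)
expr = np.array([[1.0, 3.0, 7.0],
                 [5.0, 5.0, 5.0]])
z = gene_zscores(expr)
print(z)
```

Z-scores computed this way are negative for samples below the gene's mean, so negative values are expected in a z-score file even when the input expression values are all non-negative.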

 

Looping in @Madupuri, Ramyasree, who might be more familiar with where the RNA-seq RSEM data for TCGA PanCan comes from and the transformations we apply.

Best wishes,
Ino

 

Madupuri, Ramyasree

Jul 31, 2024, 5:42:11 PM
to de Bruijn, Ino, Marina Oteiza, cBioPortal for Cancer Genomics Discussion Group
Hi Marina,

The RSEM data for the TCGA PanCancer datasets in the portal comes directly from the PanCancer Atlas publication: https://gdc.cancer.gov/about-data/publications/pancanatlas. No additional processing was done by us.

The raw expression data was normalized and batch corrected to address platform differences, with additional adjustments for sequencing centers, etc. For detailed data processing information, you can refer to the 'RNA Data Batch Correction' section of the PanCan paper: https://pubmed.ncbi.nlm.nih.gov/29625048/.

Hope this helps!

Thanks,
Ramya

Marina Oteiza

Aug 12, 2024, 7:37:03 AM
to cBioPortal for Cancer Genomics Discussion Group

Hello,

Thank you for your responses. The information and resources provided have been really helpful.

Thanks again for your support!

Kind regards,
Marina
