Fwd: TCGA RNAseq RSEMv2 Bioinformatics Pipeline


Sinkala Musalula

Apr 19, 2018, 10:53:00 AM
to cbiop...@googlegroups.com
Hi,

With reference to the forwarded message below, we would like to find out whether all TCGA cancer studies available at cBioPortal were obtained from the TCGA Legacy Archive and processed using the pipeline described here: https://webshare.bioinf.unc.edu/public/mRNAseq_TCGA/UNC_mRNAseq_summary.pdf

I look forward to your response.
Kind regards
Musalula


Begin forwarded message:

From: "Felau, Ina (NIH/NCI) [E]" <ina....@nih.gov>
Subject: TCGA RNAseq RSEMv2 Bioinformatics Pipeline
Date: 13 April 2018 at 6:10:23 PM SAST

Dear Dr. Musalula, 
 
Thank you for your email and interest in the TCGA data. Our apologies for the delayed response.
 
The two pipelines that you specify in your email are 1) the current pipeline at the Genomic Data Commons (GDC), and 2) the old pipeline that existed back in 2013. The pipeline that you should reference depends on where you are getting the data and exactly which data you are using. 
 
Since you are using TCGA data from cBioPortal, I contacted someone from that team to ask which data set they have made available on their site: the newly harmonized data from the GDC or the original TCGA data from the TCGA Legacy Archive. In their response, they indicated that the harmonized data from the GDC have not yet been pulled into cBioPortal. They further specified that, for the published TCGA studies, they used the data that were included in the publications. Overall, it looks like they are using the data from the TCGA Legacy Archive, as retrieved back in 2016. 
 
Thus, if you retrieved the data for your study from cBioPortal after January 2016, then the pipeline used for that data is the following (I also confirmed this with the UNC team): https://webshare.bioinf.unc.edu/public/mRNAseq_TCGA/UNC_mRNAseq_summary.pdf
 
Please note: the one caveat is that we do not know exactly which data from cBioPortal you are working with. You may need to contact cBioPortal, tell them which dataset you are using, and ask them to confirm that it was TCGA Legacy data pulled back in 2016. I was advised to give you the following email address at cBioPortal: cbiop...@googlegroups.com. Please feel free to contact them accordingly.
 
For information about the GDC, please see the following website: https://gdc.cancer.gov/about-data
If you have any further questions about the GDC data, the GDC helpdesk (sup...@nci-gdc.datacommons.io) would be best placed to help.
 
Let us know if you have any further questions.
 
Best,
 
The Cancer Genome Atlas Team
 
 
From: Sinkala Musalula [mailto:sms...@icloud.com] 
Sent: Saturday, March 31, 2018 5:15 AM
To: TCGA (NIH/NCI) <tc...@mail.nih.gov>
Subject: TCGA RNAseq RSEMv2 Bioinformatics Pipeline
 
Hi, 
 
We would like to analyse TCGA RNAseq data together with RNAseq data from the Expression Atlas. We have noticed that the FPKM values reported by the TCGA are, on average, much higher than those reported by the Expression Atlas for the same type of cancer. We have traced these differences (we hope) to the different bioinformatics or mRNA quantification pipelines that were employed by the TCGA and the Expression Atlas. 
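
[Editor's sketch] One quick way to see whether a gap like this is only a difference in overall scale is to rescale each sample so that its values sum to one million (the usual FPKM-to-TPM conversion). The Python below is a minimal illustration, not the TCGA or Expression Atlas method; the function name fpkm_to_tpm and the sample values are hypothetical. It cannot correct for differences in alignment, annotation, or quantification between pipelines.

```python
import math


def fpkm_to_tpm(fpkm_values):
    """Rescale one sample's FPKM values so that they sum to one million.

    This removes a sample-wide scaling factor only; it does not undo
    differences caused by the aligner, annotation, or quantifier.
    """
    total = sum(fpkm_values)
    if total <= 0:
        raise ValueError("FPKM values must have a positive sum")
    return [value / total * 1e6 for value in fpkm_values]


# Hypothetical toy samples: sample_b is sample_a multiplied by 2,
# mimicking a pipeline that reports uniformly higher FPKM values.
sample_a = [10.0, 30.0, 60.0]
sample_b = [20.0, 60.0, 120.0]

tpm_a = fpkm_to_tpm(sample_a)
tpm_b = fpkm_to_tpm(sample_b)

# After rescaling, the two samples agree gene by gene.
same_after_rescaling = all(
    math.isclose(a, b) for a, b in zip(tpm_a, tpm_b)
)
print(same_after_rescaling)
```

If two sources still disagree after this kind of rescaling, the difference is more than a global factor, and reprocessing the raw reads with a single pipeline (as proposed in this message) is the safer route.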
 
Therefore, we would like to reanalyse the Expression Atlas FASTQ files or aligned BAM files using the TCGA bioinformatics pipeline to make the FPKM values comparable. However, we have come across two different TCGA pipelines, as provided in the links below:
 
 
 
Importantly, we would like to find out which of these two pipelines was used to process the TCGA RNAseq RSEMv2 FPKM data, such as the data available at cBioPortal.
 
We look forward to your response.
 
Kind regards,
Sinkala

Nikolaus Schultz

Apr 27, 2018, 10:11:00 AM
to Sinkala Musalula, cbiop...@googlegroups.com
Hi Sinkala,

Apologies for the late response. 

The response you received from TCGA is correct. We have the legacy TCGA data in cBioPortal, not the newly processed data from the GDC.

Niki.


