Dear Dr. Musalula,
Thank you for your email and interest in the TCGA data. Our apologies for the delayed response.
The two pipelines you mention in your email are 1) the current pipeline at the Genomic Data Commons (GDC) and 2) the old pipeline that existed in 2013. Which pipeline you should reference depends on where you obtained the data and exactly which data you are using.
Since you are using TCGA data from cBioPortal, I contacted someone on that team to ask which data set they have made available on their site: the newly harmonized data from the GDC or the original TCGA data from the TCGA Legacy Archive. They responded that the harmonized GDC data have not yet been pulled into cBioPortal. They further specified that, for the published TCGA studies, they used the data that were included in the publications. Overall, it appears that they are using data from the TCGA Legacy Archive, as retrieved in 2016.
Please note: the only caveat is that we do not know exactly which data you are working with from cBioPortal. You may need to contact cBioPortal, tell them which data set you are using, and ask them to confirm that it is TCGA Legacy data pulled in 2016. I was advised to give you the following cBioPortal email address: cbiop...@googlegroups.com. Please feel free to contact them.
Let us know if you have any further questions.
Best,
The Cancer Genome Atlas Team
We would like to analyse TCGA RNA-seq data together with RNA-seq data from the Expression Atlas. We have noticed that the FPKM values reported by TCGA are, on average, much higher than those reported by the Expression Atlas for the same cancer type. We have traced (we hope correctly) these differences to the different bioinformatics, or mRNA quantification, pipelines employed by TCGA and the Expression Atlas.
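To illustrate why the same raw read count can yield different FPKM values under different pipelines, here is a minimal Python sketch using the standard definition FPKM = fragments × 10^9 / (gene length in bp × total mapped fragments). The counts, gene lengths, and library sizes are hypothetical numbers chosen for illustration, not TCGA or Expression Atlas data, and the sketch does not describe what either pipeline actually does.

```python
# Minimal sketch: FPKM depends on pipeline-specific choices such as the
# gene-length definition (annotation) and the library-size denominator.
# All numbers below are hypothetical and purely illustrative.

def fpkm(fragments: int, gene_length_bp: int, total_mapped: int) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped <= 0:
        raise ValueError("gene length and library size must be positive")
    return fragments * 1e9 / (gene_length_bp * total_mapped)


# Same raw count, but two hypothetical pipelines that define the
# effective gene length and the total mapped fragments differently.
count = 500
pipeline_a = fpkm(count, gene_length_bp=2_000, total_mapped=20_000_000)
pipeline_b = fpkm(count, gene_length_bp=2_500, total_mapped=25_000_000)

print(f"pipeline A: {pipeline_a:.2f}")  # pipeline A: 12.50
print(f"pipeline B: {pipeline_b:.2f}")  # pipeline B: 8.00
```

Even with identical reads, a longer gene model and a larger denominator lower the value by roughly a third here, which is why reprocessing both data sets through one pipeline is the safer route to comparable FPKM values.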
We would therefore like to reanalyse the Expression Atlas FASTQ files or aligned BAM files using the TCGA bioinformatics pipeline to make the FPKM values comparable. However, we have come across two different TCGA pipelines, provided in the links below:
Most importantly, we would like to find out which of these two pipelines was used to process TCGA RNA-seq RSEMv2 FPKM data, such as the data available on cBioPortal.
We look forward to your response.