Hi everyone
I just compared expression values for one TCGA LUAD sample from three sources: my own recalculation with RSEM, my own recalculation with TopHat-Cufflinks, and the TCGA raw data.
In the table below, the expression trends are similar between RSEM and TopHat-Cufflinks, but not in the last column, TCGA-62-8402-01.
I am afraid that my commands are wrong. Can anyone give me some suggestions or point out the problem?
# My system environment
iGenomes Ensembl human GRCh37: genes.gtf and genome.fa (Bowtie2 index)
Bowtie2-2.1.0
RSEM-1.2.18:
rsem-prepare-reference --gtf ./genes.gtf --bowtie2 --bowtie2-path /home/"myID"/bowtie2-2.1.0/ ./Bowtie2Index/genome.fa ./rsem_ref/hg19
rsem-calculate-expression --bowtie2-path /home/peter/Downloads/bowtie2-2.1.0 --paired-end -p 6 ./*_1.fastq ./*_2.fastq /home/"myID"/rsem_ref/hg19 TCGA0002_rsem
TopHat-2.0.11 + Cufflinks-2.1.1:
tophat -p 8 -G /work/"myID"/genes.gtf --no-coverage-search /work/"myID"/Bowtie2_Index/genome ./*_1.fastq ./*_2.fastq
rm ./*.fastq
cufflinks -p 8 -G ../genes.gtf -b ../Bowtie2_Index/genome.fa ./tophat_out/accepted_hits.bam
# Results for the same sample
| gene_id | gene_short_name | locus | FPKM-cufflinks | TPM-rsem | FPKM-rsem | TCGA-62-8402-01 |
|---|---|---|---|---|---|---|
| ENSG00000000005 | TNMD | X:99839798-99854882 | 0 | 0 | 0 | 0 |
| ENSG00000001561 | ENPP4 | 6:46097700-46114435 | 2.76868 | 4.61 | 2.96 | 8.2385 |
| ENSG00000005073 | HOXA11 | 7:27221128-27224842 | 2.21098 | 2.7 | 1.74 | 6.4946 |
| ENSG00000005075 | POLR2J | 7:102113564-102119354 | 55.197 | 105.06 | 67.52 | 9.9242 |
| ENSG00000005513 | SOX8 | 16:1031807-1036979 | 0.0603328 | 0.1 | 0.07 | 2.442 |
| ENSG00000006059 | KRT33A | 17:39502343-39507064 | 0 | 0 | 0 | 0 |
| ENSG00000006075 | CCL3 | 17:34415601-34417515 | 9.08015 | 36.39 | 23.39 | 8.4706 |
| ENSG00000006116 | CACNG3 | 16:24266873-24374122 | 0 | 0 | 0 | 0 |
| ENSG00000006128 | TAC1 | 7:97361219-97369784 | 0 | 0 | 0 | 0 |
| ENSG00000006606 | CCL26 | 7:75398850-75419214 | 0.352621 | 0.46 | 0.3 | 1.984 |
| ENSG00000006788 | MYH13 | 17:10201400-10276447 | 0 | 0 | 0 | 0 |
| ENSG00000008438 | PGLYRP1 | 19:46522410-46526323 | 0.0367952 | 0.05 | 0.03 | 0.5779 |
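One way to put a number on "similar trends" is a rank correlation, which ignores differences in scale between the columns. The sketch below is only an illustration on the 12 genes in the table above: the values are copied from the table, while the helper names (`average_ranks`, `pearson`, `spearman`) are made up for this example and are not part of RSEM, Cufflinks, or any TCGA tool.

```python
# Values copied from the table above, in the same gene order (TNMD ... PGLYRP1).
fpkm_cufflinks = [0, 2.76868, 2.21098, 55.197, 0.0603328, 0,
                  9.08015, 0, 0, 0.352621, 0, 0.0367952]
fpkm_rsem = [0, 2.96, 1.74, 67.52, 0.07, 0, 23.39, 0, 0, 0.3, 0, 0.03]
tcga = [0, 8.2385, 6.4946, 9.9242, 2.442, 0, 8.4706, 0, 0, 1.984, 0, 0.5779]


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


print("Cufflinks vs RSEM:", spearman(fpkm_cufflinks, fpkm_rsem))
print("Cufflinks vs TCGA:", spearman(fpkm_cufflinks, tcga))
```

On these 12 genes the Cufflinks and RSEM columns have identical rank order, and the TCGA column comes out at about 0.99 against Cufflinks (only SOX8 and CCL26 swap places). That would be consistent with the TCGA column being on a different scale rather than disagreeing in trend, but this is only a guess from 12 genes and the unit of the TCGA column is not confirmed here.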
By the way, is TPM more suitable than FPKM for comparing TCGA samples with each other? I read Dr. Pachter's blog (https://liorpachter.wordpress.com/tag/fpkm/). According to the formula for FPKM, the values would not be consistent between samples because they depend on the size of the sequenced read pool. However, I read in the TCGA forum that most TCGA RNA-seq values are FPKM or RPKM. Which data is more reliable?
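For reference, the usual relationship between the two units within one sample is TPM_i = FPKM_i / (sum of all FPKM in that sample) * 10^6. The sketch below illustrates it with made-up numbers: the function name `fpkm_to_tpm` and the two three-gene "samples" are hypothetical, and the conversion only gives real TPM values when it is applied to the complete gene list of a sample, so it cannot be checked on the 12-gene excerpt in the table.

```python
def fpkm_to_tpm(fpkm_values):
    """Rescale the FPKM values of ONE sample so that they sum to one million."""
    total = sum(fpkm_values)
    if total == 0:
        # No expression at all: avoid dividing by zero.
        return [0.0] * len(fpkm_values)
    return [value / total * 1e6 for value in fpkm_values]


# Hypothetical FPKM values for the same three genes in two samples.
# Sample B has exactly twice the FPKM of sample A for every gene, so the
# relative composition of the two samples is the same.
sample_a = [10.0, 30.0, 60.0]
sample_b = [20.0, 60.0, 120.0]

tpm_a = fpkm_to_tpm(sample_a)
tpm_b = fpkm_to_tpm(sample_b)

print("sum of FPKM:", sum(sample_a), sum(sample_b))  # differs between samples
print("sum of TPM: ", sum(tpm_a), sum(tpm_b))        # one million in both
print("TPM A:", tpm_a)
print("TPM B:", tpm_b)
```

In this toy case the FPKM totals differ between the two samples while the TPM values agree, which is the property the blog post describes: TPM always sums to the same total per sample, so a given TPM value means the same proportion of the transcript pool in every sample.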
Thank you very much.
Best Regards,
Chuching