TPM or FPKM questions in TCGA data

Tsung-Ching Lai

Nov 4, 2014, 5:04:30 AM
to rsem-...@googlegroups.com
Hi everyone

  I compared three sets of expression values for one TCGA LUAD sample: my own re-calculation with RSEM, my own re-calculation with TopHat-Cufflinks, and the values in the TCGA data.

  In the table below, the expression trends are similar between RSEM and TopHat-Cufflinks, but not in the last column, TCGA-62-8402-01.

  I am afraid that my commands are wrong. Can anyone offer suggestions or point out the problem?

#  My system environment
    iGenomes Ensembl human GRCh37: genes.gtf and genome.fa (Bowtie2 index)
    Bowtie2-2.1.0

    RSEM-1.2.18 :

    rsem-prepare-reference --gtf ./genes.gtf --bowtie2 --bowtie2-path /home/"myID"/bowtie2-2.1.0/ ./Bowtie2Index/genome.fa ./rsem_ref/hg19

    rsem-calculate-expression --bowtie2-path /home/peter/Downloads/bowtie2-2.1.0 --paired-end -p 6 ./*_1.fastq ./*_2.fastq /home/"myID"/rsem_ref/hg19 TCGA0002_rsem

    Tophat-2.0.11 + cufflinks-2.1.1

    tophat -p 8 -G /work/"myID"/genes.gtf --no-coverage-search /work/"myID"/Bowtie2_Index/genome ./*_1.fastq ./*_2.fastq
    rm ./*.fastq
    cufflinks -p 8 -G ../genes.gtf -b ../Bowtie2_Index/genome.fa ./tophat_out/accepted_hits.bam


Results for the same sample:

gene_id          gene_short_name  locus                  FPKM-cufflinks  TPM-rsem  FPKM-rsem  TCGA-62-8402-01
ENSG00000000005  TNMD             X:99839798-99854882    0               0         0          0
ENSG00000001561  ENPP4            6:46097700-46114435    2.76868         4.61      2.96       8.2385
ENSG00000005073  HOXA11           7:27221128-27224842    2.21098         2.7       1.74       6.4946
ENSG00000005075  POLR2J           7:102113564-102119354  55.197          105.06    67.52      9.9242
ENSG00000005513  SOX8             16:1031807-1036979     0.0603328       0.1       0.07       2.442
ENSG00000006059  KRT33A           17:39502343-39507064   0               0         0          0
ENSG00000006075  CCL3             17:34415601-34417515   9.08015         36.39     23.39      8.4706
ENSG00000006116  CACNG3           16:24266873-24374122   0               0         0          0
ENSG00000006128  TAC1             7:97361219-97369784    0               0         0          0
ENSG00000006606  CCL26            7:75398850-75419214    0.352621        0.46      0.3        1.984
ENSG00000006788  MYH13            17:10201400-10276447   0               0         0          0
ENSG00000008438  PGLYRP1          19:46522410-46526323   0.0367952       0.05      0.03       0.5779
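
One way a reader might check whether the columns disagree in gene ordering or only in scale is a rank correlation. The sketch below is not part of the original post: it uses only the twelve rows of the table, and the helper names (`average_ranks`, `spearman`) are made up for illustration. It computes Spearman's correlation as the Pearson correlation of tie-averaged ranks.

```python
from math import sqrt

# Values copied from the table, in row order (TNMD ... PGLYRP1).
fpkm_cufflinks = [0, 2.76868, 2.21098, 55.197, 0.0603328, 0,
                  9.08015, 0, 0, 0.352621, 0, 0.0367952]
tpm_rsem = [0, 4.61, 2.7, 105.06, 0.1, 0, 36.39, 0, 0, 0.46, 0, 0.05]
tcga = [0, 8.2385, 6.4946, 9.9242, 2.442, 0, 8.4706, 0, 0, 1.984, 0, 0.5779]


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


print("cufflinks vs RSEM TPM:", round(spearman(fpkm_cufflinks, tpm_rsem), 3))
print("cufflinks vs TCGA:    ", round(spearman(fpkm_cufflinks, tcga), 3))
```

On these twelve genes both correlations come out close to 1, which would suggest that the TCGA column differs from the other two mainly in scale rather than in gene ordering. Twelve genes, five of them all-zero, are too few for a firm conclusion, so the same check on the full gene list would be more informative.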

By the way, is TPM more suitable for comparing TCGA samples with each other? I read Dr. Pachter's blog (https://liorpachter.wordpress.com/tag/fpkm/); according to the FPKM formula, the values would not be consistent across samples because they depend on the size of the sequenced read pool. However, I read in the TCGA forum that most TCGA RNA-seq values are FPKM or RPKM. Which kind of value is more reliable?

Thank you very much.

Best Regards,
Chuching
0002.xlsx

Bo Li

Nov 6, 2014, 3:39:29 AM
to rsem-...@googlegroups.com
Hi Chuching,

If you want to compare the *** relative *** expression values of the
same gene across different samples, you should use TPM. It is very easy
to convert FPKMs/RPKMs to TPMs. First normalize the FPKMs of all genes
so that they sum to 1. Then multiply each normalized value by 1e6 to
obtain TPMs. If you want to compare absolute expression across samples,
you need between-sample normalization, such as TMM (edgeR) or median
normalization (DESeq).
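
The FPKM-to-TPM conversion described above can be sketched in a few lines of Python. This is only an illustration: the function name `fpkm_to_tpm` and the three-gene example are hypothetical, and the input is assumed to hold the FPKMs of all genes in one sample.

```python
def fpkm_to_tpm(fpkm_by_gene):
    """Rescale one sample's FPKMs so that they sum to 1e6 (TPM).

    fpkm_by_gene: dict mapping gene id -> FPKM, covering ALL genes
    of a single sample (an assumption of this sketch).
    """
    total = sum(fpkm_by_gene.values())
    if total <= 0:
        raise ValueError("sum of FPKMs must be positive")
    # Dividing by the total normalizes to a sum of 1; the factor 1e6
    # then expresses each value per million.
    return {gene: value * 1e6 / total for gene, value in fpkm_by_gene.items()}


# Hypothetical three-gene sample, not taken from the thread.
example = {"geneA": 10.0, "geneB": 30.0, "geneC": 60.0}
tpm = fpkm_to_tpm(example)
print(tpm)
print(sum(tpm.values()))
```

Because every gene in a sample is multiplied by the same factor, 1e6 / sum(FPKM), the TPM/FPKM ratio should be constant within a sample. In the table from the first post, the TPM-rsem / FPKM-rsem ratio is about 1.56 for the well-expressed genes (for example 105.06 / 67.52 and 36.39 / 23.39), which is consistent with that.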

For your first question, since I am not involved in analyzing TCGA data,
I'm not sure.

Colin, do you have any suggestions?

Hope it helps,
Bo