Clarification regarding TPM and FPKM

rsem_grad_student

May 13, 2014, 4:54:48 PM
to rsem-...@googlegroups.com
Hello, 

I have an extremely naive question. I've read the other posts and the original manuscript regarding TPM vs. FPKM, but I'm still a bit confused about the following very simple point and would like clarification. Is TPM already normalized for transcript length? I think it is, from my reading, but I'd like confirmation.

I understand that when comparing across samples or calling differentially expressed genes, TPM is the better measure to use. However, what would be the better measure when ranking gene expression within a single sample: FPKM or TPM? In a single sample, if I want to say that gene A is expressed at higher levels than gene B, can I just compare the TPM numbers?

Thanks! 

Colin Dewey

May 13, 2014, 6:03:07 PM
to rsem-...@googlegroups.com
Hi,

Here's how I like to explain it: TPM (transcripts per million) is a technology-independent abundance measure (it's just a fraction, like PPM (parts per million)).  RNA-Seq is one technology (of many) that can be used to estimate the abundance (in TPM) for each gene or transcript.  One detail of estimating relative abundances from RNA-Seq data is that you need to take into account (i.e., via normalization) the lengths of the transcripts because longer transcripts tend to produce more reads even if they have the same relative abundance as other shorter transcripts.
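To make the length normalization concrete, here is a minimal Python sketch. It assumes simplified count-based definitions: `counts` and `lengths` are hypothetical per-transcript read counts and (effective) lengths, standing in for the quantities RSEM actually estimates with EM. It is an illustration, not RSEM's own procedure.

```python
def tpm(counts, lengths):
    """Transcripts per million from read counts and (effective) lengths.

    Simplified sketch: RSEM estimates these quantities with EM rather than
    using raw counts directly.
    """
    # Divide out length: a longer transcript yields more reads per molecule.
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    # Rescale so the values are parts per million and sum to 1e6.
    return [r / total * 1e6 for r in rates]


# Hypothetical example: equal read counts, but transcript 2 is twice as long.
counts = [1000, 1000]
lengths = [1000, 2000]
result = tpm(counts, lengths)
print(result)  # transcript 1 gets twice the TPM of transcript 2
```

Even though both transcripts received the same number of reads, the shorter one must be present at roughly twice the relative abundance to produce them.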

Within a single sample, the TPM and FPKM value for a gene are simply off by a constant factor.  So you can use either for ranking genes according to abundance in a single sample.
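A small Python check of the constant-factor point, again using simplified count-based definitions of both units (the counts and lengths are hypothetical, and RSEM computes both units from its own estimates):

```python
def fpkm(counts, lengths):
    """FPKM: reads per kilobase of transcript per million mapped reads."""
    million_reads = sum(counts) / 1e6
    return [c / (l / 1e3) / million_reads for c, l in zip(counts, lengths)]


def tpm(counts, lengths):
    """TPM: length-normalized rates rescaled to sum to one million."""
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


# Hypothetical read counts and transcript lengths (bases) for one sample.
counts = [500, 1200, 300, 80]
lengths = [1500, 4000, 800, 2000]

f = fpkm(counts, lengths)
t = tpm(counts, lengths)

# Every TPM/FPKM ratio is the same constant within this sample,
ratios = [ti / fi for ti, fi in zip(t, f)]
print(ratios)

# so sorting genes by either unit gives the same order.
rank_f = sorted(range(len(f)), key=lambda i: f[i], reverse=True)
rank_t = sorted(range(len(t)), key=lambda i: t[i], reverse=True)
print(rank_f == rank_t)  # True
```

The constant depends on the sample (its total reads and total length-normalized rate), which is why the two units agree on rankings within a sample but not on values across samples.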

Colin

--
RSEM website: http://deweylab.biostat.wisc.edu/rsem/
---
You received this message because you are subscribed to the Google Groups "RSEM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rsem-users+...@googlegroups.com.
To post to this group, send email to rsem-...@googlegroups.com.
Visit this group at http://groups.google.com/group/rsem-users.

Bo Li

May 13, 2014, 6:04:37 PM
to rsem-...@googlegroups.com
Hi,

Please see my comments below.

On 2014-05-13 13:54, rsem_grad_student wrote:
> Hello,
>
> I have an extremely naive question. I've read the other posts and
> original manuscript regarding TPM vs FPKM, but I'm still a bit
> confused on the following very simple point and would like a
> clarification. Is TPM already normalized for transcript length? I
> think it is normalized from my reading, but I'd like a confirmation.

Yes, TPM is already normalized for transcript length. In fact, if you
normalize FPKM such that the total sum is 1M, you get TPM.
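Written out with simplified count-based definitions (assumed for illustration: c_i reads assigned to transcript i, l_i its effective length in bases, N total mapped reads; RSEM derives both units from its estimated parameters rather than raw counts), the rescaling works out as follows, and it also gives the within-sample constant factor between the two units:

```latex
\begin{aligned}
\mathrm{FPKM}_i &= \frac{10^9\, c_i}{l_i N},
\qquad
\mathrm{TPM}_i = 10^6\, \frac{c_i / l_i}{\sum_j c_j / l_j} \\[4pt]
10^6\, \frac{\mathrm{FPKM}_i}{\sum_j \mathrm{FPKM}_j}
  &= 10^6\, \frac{10^9 c_i / (l_i N)}{\sum_j 10^9 c_j / (l_j N)}
   = 10^6\, \frac{c_i / l_i}{\sum_j c_j / l_j}
   = \mathrm{TPM}_i \\[4pt]
\frac{\mathrm{TPM}_i}{\mathrm{FPKM}_i}
  &= \frac{N}{10^3 \sum_j c_j / l_j}
  \quad \text{(the same for every } i \text{ in a sample)}
\end{aligned}
```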

>
> I understand that when comparing across samples or calling
> differentially expressed genes, TPM is the better measure to use.
> However, what would be the better measure when ranking gene expression
> within a single sample--fpkm or tpm? In a single sample, if I want to
> say that gene A is expressed at higher levels than gene B, can I just
> compare the TPM numbers?

For within-sample comparison, either FPKM or TPM is fine. If you want to
call differentially expressed genes, TPM alone is not enough; you need to
do further normalization, such as that done in DESeq or edgeR. Please
remember that TPM is a relative measure. Using TPM for DE will only tell
you that the relative fractions of that gene differ across samples. It is
still possible that the number of molecules in the samples is the same.
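A toy Python illustration of the "relative measure" caveat. The numbers are hypothetical, and it assumes reads are sampled in proportion to molecules, with three equal-length genes:

```python
def tpm(counts, lengths):
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


# Hypothetical counts for three equal-length genes in two samples.
# Assumption: reads are proportional to molecules in both samples.
lengths = [1000, 1000, 1000]
sample_a = [100, 100, 100]
sample_b = [100, 100, 400]  # only gene 3 goes up; genes 1 and 2 are unchanged

tpm_a = tpm(sample_a, lengths)
tpm_b = tpm(sample_b, lengths)

# Gene 1 has the same number of molecules in both samples, yet its TPM halves
# because gene 3 now takes a larger share of the total.
print(round(tpm_a[0]), round(tpm_b[0]))  # 333333 166667
```

A naive TPM comparison would call gene 1 "down" in sample B, which is exactly the kind of artifact that cross-sample normalization methods try to correct.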

For more details, you may want to read my friend Harold's blog post:

http://haroldpimentel.wordpress.com/2014/05/08/what-the-fpkm-a-review-rna-seq-expression-units/

Best,
Bo

>
> Thanks!
>
> --
> RSEM website: http://deweylab.biostat.wisc.edu/rsem/ [1]
> ---
> You received this message because you are subscribed to the Google
> Groups "RSEM Users" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to rsem-users+...@googlegroups.com.
> To post to this group, send email to rsem-...@googlegroups.com.
> Visit this group at http://groups.google.com/group/rsem-users [2].
>
>
> Links:
> ------
> [1] http://deweylab.biostat.wisc.edu/rsem/
> [2] http://groups.google.com/group/rsem-users

rsem_grad_student

May 13, 2014, 6:47:41 PM
to rsem-...@googlegroups.com
Thanks Colin and Bo for your explanations!

pbczyd

May 13, 2014, 7:03:36 PM
to rsem-...@googlegroups.com
Hi Bo and Colin

I have another question related to this topic.

Harold's blog post says "NONE of these units are comparable across experiments".

However, in the following single-cell RNA-seq paper from the Broad, they compared gene expression patterns in 18 single cells using log2(TPM) from RSEM.



Did I miss something? 


Regards,
Sheng

Colin Dewey

May 14, 2014, 12:41:50 PM
to rsem-...@googlegroups.com
Hi Sheng,

It is important to be clear about what one wishes to compare between experiments. If one wishes to compare *relative* abundances between samples, then TPM is certainly comparable between experiments, since a TPM value *is* a relative abundance. However, in many cases, researchers are more interested in differences in *absolute* abundances across experiments. RNA-Seq data itself does not directly give you absolute abundances (it gives you relative abundances). Thus, one typically needs to apply cross-sample normalization (e.g., TMM or upper quartile normalization) to arrive at values for each sample that are on more of an "absolute" scale, so that differences in absolute abundance may be properly measured.
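As a rough illustration of one of the cross-sample methods mentioned, here is a simplified Python sketch of upper-quartile normalization: each sample is scaled by the 75th percentile of its nonzero counts. The counts are hypothetical, and real implementations (for example in edgeR) differ in gene filtering, quantile definition, and rescaling details.

```python
import statistics


def upper_quartile_factors(samples):
    """Scaling factor per sample from the 75th percentile of its nonzero counts.

    Simplified sketch of upper-quartile normalization; real implementations
    (for example in edgeR) differ in gene filtering and rescaling details.
    """
    uqs = []
    for counts in samples:
        nonzero = [c for c in counts if c > 0]
        # quantiles(n=4) returns the three quartile cut points; [2] is the 75th percentile.
        uqs.append(statistics.quantiles(nonzero, n=4)[2])
    # Center the factors on their geometric mean so their product is 1.
    center = statistics.geometric_mean(uqs)
    return [q / center for q in uqs]


def normalize(samples):
    factors = upper_quartile_factors(samples)
    return [[c / f for c in counts] for counts, f in zip(samples, factors)]


# Hypothetical counts for nine genes; only the most highly expressed gene changes.
sample_a = [5, 10, 20, 40, 60, 80, 100, 150, 1000]
sample_b = [5, 10, 20, 40, 60, 80, 100, 150, 9000]

# The relative fraction of gene 1 (what TPM tracks, ignoring length) drops sharply,
print(sample_a[0] / sum(sample_a), sample_b[0] / sum(sample_b))

# but after upper-quartile scaling the eight unchanged genes agree.
norm_a, norm_b = normalize([sample_a, sample_b])
print(norm_a[:8] == norm_b[:8])  # True
```

The upper quartile is used instead of the total because it is not dominated by a handful of very highly expressed genes, so a change in one such gene does not shift every other gene's normalized value.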

In the paper you reference I did not see mention of cross-sample normalization, so any conclusions they reach are primarily with respect to relative abundances.

Best,
Colin

pbczyd

May 14, 2014, 1:48:41 PM
to rsem-...@googlegroups.com
Hi Colin,

I got it.  Thank you so much for your reply. 

Regards,
Sheng