There are several papers showing different kinds of bias when using RPKM
normalisation, mainly an over-correction for short exons and a tendency
to favour transcripts with long exons as DE genes, because long
transcripts get smaller p-values. Differences between samples,
sequencing runs and data sets have also been reported when using RPKMs:

D Hebenstreit, M Fang, M Gu… - Molecular systems …, 2011 - Wiley Online Library
... The fragments used to estimate intron and intergenic RPKM were based
on randomizations using the ... The accuracy of RNA-seq is biased toward
longer and more highly expressed genes, eg ... To explore how this
accuracy bias affects the shape of the LE distribution, we studied ...

... expression levels from RNA-Seq data, such as reads per kilobase of
gene length per million reads (RPKM), are biased in terms ... Compared
to previously proposed base level correction methods, our method reduces
bias in gene-level expression estimates more effectively. ...
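
To make the length effect concrete, here is a minimal sketch in plain Python (not code from any of the papers quoted here; the helper names are hypothetical). RPKM is computed as reads per kilobase of feature length per million mapped reads, and a simple exact Poisson test, assumed here purely for illustration, compares two genes that show the same two-fold RPKM change but differ tenfold in length. Equal library sizes in both samples are also an assumption.

```python
import math

# Hypothetical helpers; an illustration only, not code from the cited papers.

def rpkm(count, length_bp, total_mapped_reads):
    """Reads per kilobase of feature length per million mapped reads."""
    return count / (length_bp / 1e3) / (total_mapped_reads / 1e6)


def exact_poisson_test(x1, x2):
    """Two-sided exact test for equal Poisson rates in two libraries of equal size.

    Conditional on n = x1 + x2, x1 ~ Binomial(n, 0.5) under the null, so the
    p-value sums the probabilities of all outcomes no more likely than x1.
    """
    n = x1 + x2
    probs = [math.comb(n, k) * 0.5 ** n for k in range(n + 1)]
    observed = probs[x1]
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))


library_size = 10_000_000  # same depth in both samples (assumption)

# Same expression change (2 -> 4 RPKM) for a 500 bp and a 5,000 bp gene:
# reads = RPKM * kilobases * millions of mapped reads.
short_a, short_b = 10, 20     # 500 bp gene
long_a, long_b = 100, 200     # 5,000 bp gene

p_short = exact_poisson_test(short_a, short_b)
p_long = exact_poisson_test(long_a, long_b)

print(rpkm(short_a, 500, library_size), rpkm(short_b, 500, library_size))   # 2.0 4.0
print(rpkm(long_a, 5000, library_size), rpkm(long_b, 5000, library_size))   # 2.0 4.0
print(f"short gene p = {p_short:.3g}, long gene p = {p_long:.3g}")
```

Both genes have identical RPKM values and the same fold change, yet only the long gene comes out clearly significant, because the test works on raw counts and a longer gene collects more reads. Real DE tools such as edgeR or DESeq also model over-dispersion, which changes the exact numbers; the sketch only shows that test power grows with read count.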

... Exon length bias and GC-content effect (Jiang, "cell"). ... Counts
or RPKMs are computed using totcounts, maxcounts, RPKM-corrected
totcounts (RPKM) and totcounts corrected with within-lane full-quantile
normalization over exon length (FullQ), and averaged across libraries. ...

Best,
Markus