Comparing expected counts between different samples


ken

Sep 3, 2012, 12:45:27 AM
to rsem-...@googlegroups.com
Hi,

Does it make sense to normalize the expected counts from RSEM by the library size (total read count) if samples have very different library sizes?
Or, put another way: when one wants to do differential expression with the expected counts, what does one use to 'normalize' if the library sizes are very different between samples?

Thanks
ken

b...@cs.wisc.edu

Sep 3, 2012, 12:53:31 AM
to rsem-...@googlegroups.com
Hi Ken,

You can use either DESeq's method or edgeR's TMM.

DESeq:
http://genomebiology.com/2010/11/10/R106

edgeR's TMM:
http://genomebiology.com/2010/11/3/R25

Best,
Bo
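For reference, the DESeq paper linked above defines each sample's size factor as the median, over genes, of the ratio of that sample's count to the gene's geometric mean across all samples. The following is a minimal pure-Python sketch of that idea, not the DESeq package itself. It assumes a hypothetical layout where `counts` is a list of rows, one per gene, each holding that gene's expected counts in the same sample order; genes with a zero in any sample are skipped because their geometric mean is zero. The function name `deseq_size_factors` is made up for illustration.

```python
import math
import statistics


def deseq_size_factors(counts):
    """Median-of-ratios size factors (a DESeq-style sketch).

    counts: list of rows, one per gene; row[j] is the (possibly
    non-integer) expected count of that gene in sample j.
    Returns one size factor per sample.
    """
    n_samples = len(counts[0])
    # Only genes with a positive count in every sample have a usable
    # (non-zero) geometric mean across samples.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has a non-zero count in every sample")
    # Geometric mean per gene, computed in log space for stability.
    geo_means = [
        math.exp(sum(math.log(c) for c in row) / n_samples) for row in usable
    ]
    factors = []
    for j in range(n_samples):
        # Ratio of this sample's count to the gene's geometric mean.
        ratios = [row[j] / g for row, g in zip(usable, geo_means)]
        factors.append(statistics.median(ratios))
    return factors


# Toy data: sample 2 looks like sample 1 sequenced twice as deeply.
# The last gene is zero in sample 1, so it is ignored.
counts = [[10, 20], [100, 200], [50, 100], [0, 5]]
print(deseq_size_factors(counts))  # roughly [0.7071, 1.4142]
```

Note that the counts themselves are left untouched; in DESeq and edgeR such factors enter the model alongside the raw counts rather than being divided out beforehand.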

ken

Sep 3, 2012, 1:01:38 AM
to rsem-...@googlegroups.com, b...@cs.wisc.edu
Hi Bo,

Thanks (sorry, I know those recommendations were already in your paper). Going back to the first question, would it make much sense to normalize the expected counts by the library size?
The more I think about it, since the expected counts are the number of fragments derived from a given gene, I've managed to confuse myself as to whether it would be valid to normalize this metric by the library size.

Thanks,
Ken

Erik Aronesty

Sep 3, 2012, 7:58:20 AM
to rsem-...@googlegroups.com
If you are using software that employs the negative binomial distribution, like DESeq and edgeR, then no: you need to pass raw counts. These programs compute significance from the actual fragment counts (and account for library size internally, through DESeq's size factors or edgeR's TMM factors), so normalizing the counts beforehand is incorrect.

If you are using a t-test or similar tests that compare groups using measures of variability, then yes: you should normalize, and you should probably operate in log space.
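A minimal sketch of the second route described above: scale each sample to counts per million (CPM) by its total count, take log2 with a pseudocount, and compare groups with a Welch t statistic. The CPM scaling, the pseudocount of 1, and the helper names `log_cpm` and `welch_t` are assumptions for illustration, using the same hypothetical `counts[gene][sample]` layout; other normalizations (such as the upper-quartile approach mentioned later in the thread) can be swapped in for the total-count step.

```python
import math
import statistics


def log_cpm(counts, pseudocount=1.0):
    """counts[gene][sample] -> log2(counts per million + pseudocount).

    Each sample is scaled by its own total count, so samples with
    very different library sizes end up on a comparable scale.
    """
    n_samples = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [math.log2(row[j] / totals[j] * 1e6 + pseudocount) for j in range(n_samples)]
        for row in counts
    ]


def welch_t(a, b):
    """Welch's t statistic for two groups (variances not assumed equal)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        va / len(a) + vb / len(b)
    )


# Example: one gene's log-CPM values in two groups of three samples.
values = log_cpm([[10, 25, 12, 40, 90, 45], [990, 1975, 988, 960, 1910, 955]])[0]
print(welch_t(values[:3], values[3:]))
```

The t statistic alone is not a p-value; turning it into one needs the t distribution with Welch-Satterthwaite degrees of freedom (for example via a statistics library).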

Ning Leng

Sep 3, 2012, 5:21:02 PM
to rsem-...@googlegroups.com
Hi ken,

I agree with Erik. If you are using methods with NB or Poisson
assumptions, you shouldn't modify the data (e.g. divide the gene
expression by the sample's library size factor), since the NB and
Poisson models in current papers assume the variance is a function of
the mean, and adjusting the data will violate these assumptions.

But if you want to use a t-test (which assumes normality, with the
mean independent of the variance) or you want to visualize the data
(e.g. look at box plots), you can simply adjust the data with the
library size factors.

Hope these are helpful.

Thanks,
Ning
--
Ning Leng
University of Wisconsin Madison
Department of Statistics
4720 Medical Sciences Center
1300 University Avenue
Madison, Wisconsin 53706
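The mean-variance point above can be made concrete with a short derivation. For a Poisson count $X$ with mean $\mu$, dividing by a size factor $s$ scales the mean by $1/s$ but the variance by $1/s^2$, so the scaled values no longer satisfy variance = mean, and the mismatch differs between samples with different $s$. Writing $m = \mu/s$ for the scaled mean, and using the common NB parameterization $\operatorname{Var}(X) = \mu + \phi\mu^2$ (an assumption about which NB form is meant):

```latex
X \sim \mathrm{Poisson}(\mu):\quad
\mathbb{E}\!\left[\tfrac{X}{s}\right] = \tfrac{\mu}{s} = m,\qquad
\operatorname{Var}\!\left(\tfrac{X}{s}\right) = \tfrac{\mu}{s^{2}} = \tfrac{m}{s} \neq m \quad (s \neq 1)

X \sim \mathrm{NB}(\mu, \phi):\quad
\operatorname{Var}\!\left(\tfrac{X}{s}\right) = \tfrac{\mu + \phi\mu^{2}}{s^{2}} = \tfrac{m}{s} + \phi m^{2}
```

In both cases the Poisson part of the variance picks up an extra factor of $1/s$, which is why NB- and Poisson-based tools take raw counts plus size factors rather than pre-divided values.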

ken

Sep 3, 2012, 7:41:43 PM
to rsem-...@googlegroups.com
Thanks Erik and Ning, that was very helpful.

Erik Aronesty

Sep 4, 2012, 7:35:03 AM
to rsem-...@googlegroups.com
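Adjustment by total library size is biased by a few high-count transcripts: they can make up a large share of a sample's total, which makes every other gene in that sample look artificially low.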

We usually use upper-quartile normalization on the data.


- Erik
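The effect of high-count transcripts can be illustrated with a toy example. The sketch below assumes the same hypothetical `counts[gene][sample]` layout and computes each sample's 75th percentile over genes that are non-zero in at least one sample. This is one common variant of upper-quartile normalization; implementations differ in which genes they keep and how they interpolate the percentile. The data are made up: five genes with identical counts in both samples, plus one gene that is 100 times higher in the second sample.

```python
import statistics


def total_count_factors(counts):
    """Per-sample total counts (plain library-size scaling)."""
    return [sum(row[j] for row in counts) for j in range(len(counts[0]))]


def upper_quartile_factors(counts):
    """Per-sample 75th percentile of counts (upper-quartile scaling).

    Genes that are zero in every sample are dropped first, one common
    convention. method="inclusive" interpolates like R's default
    quantile() (type 7).
    """
    kept = [row for row in counts if any(c > 0 for c in row)]
    return [
        statistics.quantiles([row[j] for row in kept], n=4, method="inclusive")[2]
        for j in range(len(counts[0]))
    ]


counts = [[10, 10], [20, 20], [30, 30], [40, 40], [50, 50], [1000, 100000]]
print(total_count_factors(counts))     # [1150, 100150]
print(upper_quartile_factors(counts))  # [47.5, 47.5]
```

Here the totals differ by a factor of about 87, driven almost entirely by one gene, so dividing by total counts would make the five unchanged genes look about 87 times lower in the second sample. The upper quartiles are identical, so scaling by them leaves those genes at equal normalized levels.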