Comparing expected counts between different samples

ken

Sep 3, 2012, 12:45:27 AM

to rsem-...@googlegroups.com

Hi,

Does it make sense to normalize the expected counts from RSEM by library size (total read count) when samples have very different library sizes?
Put another way: when doing differential expression with the expected counts, what should one use to 'normalize' if library sizes differ greatly between samples?

Thanks
ken

b...@cs.wisc.edu

Sep 3, 2012, 12:53:31 AM

to rsem-...@googlegroups.com

Hi Ken,

You can use either DESeq's method or edgeR's TMM.

DESeq:
http://genomebiology.com/2010/11/10/R106

edgeR's TMM:
http://genomebiology.com/2010/11/3/R25
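
For illustration only, here is a minimal Python sketch of the median-of-ratios idea behind DESeq's size factors, as described in the paper linked above. It is not the package's code: the function name `size_factors` and the toy matrix are hypothetical, and the sketch assumes counts are given as one row per gene and one column per sample. Genes with a zero count in any sample are skipped so that the geometric mean is defined.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample by the median-of-ratios idea.

    counts: list of genes, each a list of per-sample counts (hypothetical layout).
    """
    n_samples = len(counts[0])
    # Keep only genes with a positive count in every sample, so the
    # geometric mean across samples is defined and nonzero.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has positive counts in every sample")
    # Log of each gene's geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of this sample's count to the gene's geometric mean.
        log_ratios = [
            math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)
        ]
        # The median makes the estimate robust to a few extreme genes.
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy data: sample 2 is sample 1 sequenced twice as deeply.
counts = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 3],  # skipped: zero in one sample
]
print(size_factors(counts))
```

On this toy matrix the second factor comes out twice the first, which is the depth difference built into the data. In practice the factors are estimated by the package itself from the raw counts.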

Best,
Bo

ken

Sep 3, 2012, 1:01:38 AM

to rsem-...@googlegroups.com, b...@cs.wisc.edu

Hi Bo,

Thanks (sorry, I know those recommendations were already in your paper). Going back to the first question, would it make sense to normalize the expected counts by library size?
The more I think about it, since the expected counts are fragments derived from a given gene, the more I've managed to confuse myself as to whether it is valid to normalize this metric by library size.

Thanks,
Ken

Erik Aronesty

Sep 3, 2012, 7:58:20 AM

to rsem-...@googlegroups.com

If you are using software that employs the negative binomial distribution, like DESeq and edgeR, then no, you need to pass raw counts. These programs assess significance from the actual fragment counts and handle differences in library size internally, so normalizing the counts beforehand is incorrect.

If you are using a t-test or a similar test that compares groups using variability metrics, then yes, you should normalize, and you should probably operate in log space.
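
As a rough sketch of the second case, the snippet below scales each sample by its library size (to counts per million) and then moves to log space. The function name `log_cpm`, the per-million scaling, and the pseudocount of 1 are assumptions made for illustration, not something prescribed in this thread. Counts are assumed to be one row per gene and one column per sample.

```python
import math


def log_cpm(counts, pseudocount=1.0):
    """Return log2(counts-per-million + pseudocount) for each gene and sample.

    counts: list of genes, each a list of per-sample counts (hypothetical layout).
    pseudocount: assumed offset that keeps log2 defined for zero counts.
    """
    n_samples = len(counts[0])
    # Library size = total count in each sample (column sums).
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [
            math.log2(row[j] / lib_sizes[j] * 1e6 + pseudocount)
            for j in range(n_samples)
        ]
        for row in counts
    ]


# Toy data: sample 2 has twice the depth of sample 1, same composition.
counts = [
    [10.0, 20.0],
    [90.0, 180.0],
]
values = log_cpm(counts)
print(values)
```

After this transform the two toy samples have the same value for each gene, so a t-test on such values compares expression rather than sequencing depth. These transformed values are for t-test-style comparisons or plots only, not for input to DESeq or edgeR.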

Ning Leng

Sep 3, 2012, 5:21:02 PM

to rsem-...@googlegroups.com

Hi ken,

I agree with Erik. If you are using methods with an NB or Poisson
assumption, you shouldn't modify the data (e.g. divide the gene
expression by the sample's library size factor), since the NB and Poisson
models in current papers assume the variance is a function of the mean,
and adjusting the data will disturb these assumptions.

But if you want to use a t-test (which assumes normality and that the mean is
independent of the variance) or you want to visualize the data
(e.g. look at box plots), you could simply adjust the data with the
library size factors.

Hope this helps.

Thanks,
Ning

--
Ning Leng
University of Wisconsin Madison
Department of Statistics
4720 Medical Sciences Center
1300 University Avenue
Madison, Wisconsin 53706

ken

Sep 3, 2012, 7:41:43 PM

to rsem-...@googlegroups.com

Thanks Erik and Ning, that was very helpful.

Erik Aronesty

Sep 4, 2012, 7:35:03 AM

to rsem-...@googlegroups.com

Adjusting by total library size introduces some bias, because a few high-count transcripts can dominate the total.

We usually use upper-quartile normalization on the data.

http://www.biomedcentral.com/1471-2105/11/94/
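
A minimal sketch of upper-quartile scaling, for illustration. Assumptions: the function name `upper_quartile_normalize` is hypothetical, genes that are zero in every sample are excluded before taking the 75th percentile, and the factors are rescaled by their mean so the output stays on a count-like scale. This is not the exact procedure from the linked paper or from any package.

```python
from statistics import quantiles


def upper_quartile_normalize(counts):
    """Scale each sample by the 75th percentile of its expressed-gene counts.

    counts: list of genes, each a list of per-sample counts (hypothetical layout).
    Raises ZeroDivisionError if a sample's upper quartile is zero.
    """
    n_samples = len(counts[0])
    # Ignore genes with zero counts in every sample when taking quantiles.
    expressed = [row for row in counts if any(c > 0 for c in row)]
    # Per-sample 75th percentile (third of the three quartile cut points).
    uq = [
        quantiles([row[j] for row in expressed], n=4, method="inclusive")[2]
        for j in range(n_samples)
    ]
    # Assumed convention: rescale by the mean upper quartile.
    mean_uq = sum(uq) / n_samples
    return [
        [row[j] / uq[j] * mean_uq for j in range(n_samples)]
        for row in counts
    ]


# Toy data: the first four genes differ only by a 2x depth factor,
# while one very high-count gene inflates sample 1's total.
counts = [
    [1, 2],
    [2, 4],
    [3, 6],
    [4, 8],
    [1000, 50],
    [0, 0],
]
normalized = upper_quartile_normalize(counts)
print(normalized)
```

In this toy matrix the totals are 1010 and 70, so dividing by library size would make the first four genes look very different between samples. The upper quartiles are 4 and 8, which recovers the 2x depth difference and puts those four genes on equal footing.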

- Erik
