Hello Yun,
I was just rereading our conversation, and I realized that I should explain why you should normalize by number of reads in a sample, not DNA concentration.
Example:
Sample1 11,000 reads
Sample2 11,500 reads
Sample3 22,000 reads
Sample: 1, 2, 3
OTU1 45, 43, 102
OTU2 42, 46, 44
OTU3 0, 2, 92
If we look at the OTUs (and ignore the number of reads) we would make these conclusions:
OTU2 is equally common in all samples.
OTU1 is most common in sample 3.
But we know these conclusions could be misleading because the samples do not have the same number of reads! Let's randomly sample 10,000 reads from each sample (this is called subsampling or rarefying):
Sample1 10,000 reads
Sample2 10,000 reads
Sample3 10,000 reads
Sample: 1, 2, 3
OTU1 43, 41, 47
OTU2 39, 45, 22
OTU3 0, 0, 45
Ah ha! Now we reach these conclusions:
OTU2 is less common in sample 3.
OTU1 is equally common in all samples.
This is why normalizing by number of reads is important.
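If you want to see the subsampling step in code, here is a minimal sketch in Python. It uses only the three OTUs from the toy table above, so the depth is the smallest sample's total read count rather than 10,000, and the `rarefy` function name is just for illustration (real pipelines do this with dedicated tools):

```python
import random

# Toy per-sample OTU counts from the tables above.
counts = {
    "Sample1": {"OTU1": 45, "OTU2": 42, "OTU3": 0},
    "Sample2": {"OTU1": 43, "OTU2": 46, "OTU3": 2},
    "Sample3": {"OTU1": 102, "OTU2": 44, "OTU3": 92},
}

def rarefy(otu_counts, depth, seed=None):
    """Draw `depth` reads without replacement from one sample."""
    # Expand the counts into one list entry per read, then subsample.
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    picked = random.Random(seed).sample(pool, depth)
    out = dict.fromkeys(otu_counts, 0)
    for otu in picked:
        out[otu] += 1
    return out

# Rarefy every sample down to the depth of the smallest one,
# so all samples end up with the same number of reads.
depth = min(sum(c.values()) for c in counts.values())
rarefied = {name: rarefy(c, depth, seed=1) for name, c in counts.items()}
```

After this, every sample has exactly `depth` reads, so the OTU counts are directly comparable across samples.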
...
Now let's add DNA concentration:
Sample1 11,000 reads 22 ng/ml
Sample2 11,500 reads 8 ng/ml (what!?!)
Sample3 22,000 reads 50 ng/ml
OK, so the sample with the most DNA has the most reads. That makes sense. But Sample2, with very little DNA, has a normal number of reads. This may sound strange, but I see it all the time on the MiSeq. Some samples get lots of reads even with very little input DNA. Because there is not a consistent link between number of reads and DNA concentration, you have to choose which measure to normalize by. The consensus in the field is to use number of reads, not DNA concentration, because we think number of reads makes more difference in downstream analysis.
I hope that helps!
Happy holidays!
Colin