Hi Ali,
Answers below.
Ali Basuony wrote on 5/24/21 9:02 AM:
> Does this mean that any ddRAD data that have been generated without using
> degenerate barcodes are not valid? Or, put another way, that Stacks is not
> a good choice for processing these data?
What does it mean for data to be "not valid," and why would that be
another way of saying that Stacks is not a good choice?
As I mentioned in my previous message, this implies that there is some
secret information about PCR duplicates in ddRAD data that Stacks is not
using but some other software has access to. Again, what would that
information be? I ask you to think about this: how can one identify a PCR
duplicate in a sequenced library?
> There are many published ddRAD datasets that have been generated without
> using degenerate barcodes. They mention that PCR duplicates are just one of
> the limitations of ddRAD.
Yes, that is correct. Your ddRAD data may contain a few or a lot of PCR
duplicates; there is no way around that, regardless of the processing
software, full stop. That does not mean your data are not usable. As you
note, many ddRAD datasets have been successfully published, many of them
using Stacks. But PCR duplicates, which were not well understood when
ddRAD was first published, dilute the information your sequencing
library provides.
> I'm planning to send some samples for ddRAD to a sequencing facility
> that doesn't use degenerate barcodes at all. They said that to mitigate
> this problem, I should target high coverage. Is that right?
If you plan to make ddRAD libraries, there are two primary things you
should focus on: 1) Start with high-quality DNA in large quantities. If
your DNA is in very small amounts or degraded, the result will be a
low-quality library, that is, a library with very few original
(unamplified) DNA molecules, and those original molecules are the
information you are trying to get by sequencing. 2) Reduce the number of
PCR cycles you perform on your libraries. The more PCR amplification you
do, the more PCR duplicates you will generate.
This is how #2 is related to #1: in a low-quality library you have very
little DNA, so most people crank up the PCR, which gives the illusion of
lots of DNA. Well, you do get lots of DNA, but it is almost exclusively
clones/copies of the very few original molecules you started out with.
Having good sequencing depth is very important for RAD data. However, if
your ddRAD library is full of PCR duplicates, increasing the sequencing
depth will mostly cause you to sequence more copies of the same original
molecules, without providing new information. To be specific, increasing
sequencing depth will give you more non-duplicate reads; however, you will
gain those at a slower rate than you generate PCR duplicate reads.
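A back-of-the-envelope sketch of that last point, under idealized assumptions that are mine, not anything Stacks does: every original molecule is amplified equally, and sequencing draws reads uniformly at random, with replacement, from the amplified pool. Under that model, n reads from a library of M original molecules reveal on average M(1 - (1 - 1/M)^n) distinct molecules, and each additional read finds a new molecule with probability (1 - 1/M)^n, which shrinks as depth grows. The function names, library sizes, and depths below are hypothetical, chosen only for illustration.

```python
def expected_unique_reads(molecules: int, reads: int) -> float:
    """Expected number of distinct original molecules seen when `reads`
    reads are drawn uniformly, with replacement, from a library built from
    `molecules` equally amplified originals (idealized assumption)."""
    # Probability that one particular molecule is never sequenced.
    p_missed = (1.0 - 1.0 / molecules) ** reads
    return molecules * (1.0 - p_missed)


def duplicate_fraction(molecules: int, reads: int) -> float:
    """Fraction of reads that repeat a molecule already sequenced,
    i.e. PCR duplicates under this model."""
    if reads == 0:
        return 0.0
    return 1.0 - expected_unique_reads(molecules, reads) / reads


if __name__ == "__main__":
    depths = [100_000, 500_000, 1_000_000, 2_000_000, 4_000_000]
    # Two hypothetical libraries: one from plenty of good DNA, one low-input.
    for label, molecules in [("high-quality", 5_000_000), ("low-input", 500_000)]:
        print(label)
        prev_reads, prev_unique = 0, 0.0
        for reads in depths:
            unique = expected_unique_reads(molecules, reads)
            # New distinct molecules gained per extra read since the last depth.
            gain = (unique - prev_unique) / (reads - prev_reads)
            print(f"  {reads:>9,} reads  {unique:>11,.0f} unique  "
                  f"{duplicate_fraction(molecules, reads):6.1%} duplicates  "
                  f"{gain:.2f} new unique per extra read")
            prev_reads, prev_unique = reads, unique
```

In the low-input library, doubling depth from 2 to 4 million reads adds comparatively few new molecules, while the high-quality library keeps paying off. Real PCR amplifies some molecules more than others, and uneven amplification lowers the expected number of distinct molecules at a given depth, so real duplicate rates tend to be higher than this model suggests.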
Best,
julian