Hi everyone,
I was wondering whether DArT, when it pre-filters the SNP data before providing it to the end user, also removes potential PCR duplicates. By that I mean "read duplicates": sequence reads that result from sequencing two or more copies of the exact same DNA fragment, which arise when more than one PCR copy of the original DNA fragment hybridizes to the flow cell.
As far as I understand, the "Clone ID" (which also matches the "Allele ID") identifies unique target sequences (sequence tags), so multiple sequence tags with the same Clone ID would represent "PCR duplicates". Is that correct?
Does DArT already remove these PCR duplicates before providing the user with the final SNP dataset, or should we filter them out ourselves?
And if we have to filter out "PCR duplicates" ourselves, is the "gl.filter.cloneid" function the correct one to use?
When I run "gl.filter.cloneid", no loci are filtered out. I assume that is because I had already filtered out secondaries with "gl.filter.secondaries".
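To make my reasoning concrete, here is a toy sketch in plain Python. It is not dartR code: the locus table, the CloneID values, and the helper names are all made up for illustration, and I am only assuming that a secondaries filter keeps one SNP per CloneID. Under that assumption, a second filter on CloneID would have nothing left to remove, which would match what I see.

```python
from collections import Counter

# Hypothetical toy locus table: (CloneID, SNP position within the sequence tag).
# These values are invented; they do not come from a real DArT report.
loci = [
    ("1001", 12),
    ("1001", 37),  # second SNP on the same fragment (a "secondary")
    ("1002", 5),
    ("1003", 20),
    ("1003", 41),  # another secondary
]


def keep_one_per_clone(rows):
    """Keep the first locus seen for each CloneID (assumed filter behaviour)."""
    seen = set()
    kept = []
    for clone_id, pos in rows:
        if clone_id not in seen:
            seen.add(clone_id)
            kept.append((clone_id, pos))
    return kept


def duplicated_clone_ids(rows):
    """Return the CloneIDs that occur on more than one locus."""
    counts = Counter(clone_id for clone_id, _ in rows)
    return sorted(cid for cid, n in counts.items() if n > 1)


before = duplicated_clone_ids(loci)
after_first = keep_one_per_clone(loci)          # stands in for a secondaries filter
after_second = keep_one_per_clone(after_first)  # stands in for a later CloneID filter

print("CloneIDs shared by several loci before filtering:", before)
print("Loci kept after the first pass:", len(after_first))
print("Loci removed by the second pass:", len(after_first) - len(after_second))
```

Note that this only shows why the second filter finds nothing to remove. It says nothing about whether loci sharing a CloneID are PCR duplicates or simply several SNPs on one fragment, which is exactly what I am unsure about.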
In the dartR manual, it says:
"gl.filter.secondaries: SNP datasets generated by DArT include fragments with more than one SNP and record them separately with the same CloneID (=AlleleID). These multiple SNP loci within a fragment (secondaries) are likely to be linked, and so you may wish to remove secondaries"
Would the "gl.filter.secondaries" function also have removed potential PCR duplicates?
I would really appreciate your help on this matter.
Thank you.
Best,
Gabriella