Dear CLARK users,
We would like to bring to your attention our recent preprint on bioRxiv about "CLARK-S". CLARK-S has already been implemented and made available to you; this preprint provides new results.
In this paper, you will find:
- a species-level evaluation of CLARK-S against CLARK (the standard variant) on large synthetic datasets (~23 million reads in total) and real datasets (~101 million short reads in total, after trimming).
- a unique dataset of unambiguously mapped reads that allows an unbiased evaluation of a tool's classification accuracy.
We created these sets of unambiguously mapped reads because a read as short as 100 bp can map to several species, which can skew any evaluation of a tool's performance.
For instance, suppose you want to evaluate tool A and tool B. When A and B disagree on the classification of a read r, it does not necessarily mean that both are wrong, or that only one of them is correct: both can be correct if r maps to multiple species at a given error rate.
We therefore believe that reads mapping to one and only one species allow a straightforward yet fair evaluation of a tool's classification accuracy.
On this basis, we show that CLARK-S is at once precise, fast, and more sensitive than CLARK.
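As an illustration only (this is not code from CLARK or from the preprint), the idea of restricting an evaluation to unambiguously mapped reads could be sketched in Python as follows. The input format, the function names, and the precision/sensitivity definitions used here are assumptions made for the example.

```python
from collections import defaultdict


def unambiguous_reads(alignments):
    """Return {read_id: species} for reads that map to exactly one species.

    `alignments` is an iterable of (read_id, species) pairs, one pair per
    accepted alignment. This input format is hypothetical.
    """
    species_by_read = defaultdict(set)
    for read_id, species in alignments:
        species_by_read[read_id].add(species)
    # Keep only reads whose alignments all point to a single species.
    return {
        read_id: next(iter(species))
        for read_id, species in species_by_read.items()
        if len(species) == 1
    }


def evaluate(truth, predictions):
    """Return (precision, sensitivity) over the reads listed in `truth`.

    Assumed definitions for this sketch:
      precision   = correct assignments / reads the tool assigned
      sensitivity = correct assignments / all reads in `truth`
    `predictions` maps read_id -> species; a missing key or None means
    the tool left the read unassigned.
    """
    assigned = [r for r in truth if predictions.get(r) is not None]
    correct = sum(1 for r in assigned if predictions[r] == truth[r])
    precision = correct / len(assigned) if assigned else 0.0
    sensitivity = correct / len(truth) if truth else 0.0
    return precision, sensitivity
```

With such a filter, a read that maps to both species X and species Y is excluded from the evaluation, so two tools that report X and Y respectively are not penalized for a disagreement in which both may be correct.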
Please feel free to send us your comments and feedback to help improve the tool!
Best,
Rachid