Preprint on the evaluation of CLARK-S available today!

Rachid

May 14, 2016, 6:11:46 PM

to CLARK Users

Dear CLARK users,

We would like to bring to your attention our recent preprint on bioRxiv about "CLARK-S". While CLARK-S was already implemented and made available to you, this preprint provides new results.

In this paper, you will find:

- the evaluation at the species level of CLARK-S against CLARK (the standard variant) on large synthetic datasets (a total of ~23 million reads) and real datasets (a total of ~101 million short reads after trimming).

- a unique dataset of unambiguously mapped reads that allows an unbiased evaluation of a tool's classification accuracy.

We created these sets of unambiguously mapped reads because a read as short as 100 bp can map to several species, which can flaw any evaluation of a tool's performance.

For instance, say you want to evaluate tool A and tool B. When A and B disagree on the classification of a read r, it does not necessarily mean that A and B are both wrong, or that only A (or B) is correct: they can both be correct if r can be mapped to multiple species at a given error rate.

Thus, we believe that reads mapping to one and only one species allow a straightforward yet fair evaluation of a tool's classification accuracy.
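
To make the idea above concrete, here is a minimal Python sketch. It is not taken from CLARK or from the preprint. The function names, the (read, species) input format, and the definitions used (precision = correct / classified, sensitivity = correct / all evaluated reads) are assumptions chosen for illustration only.

```python
from collections import defaultdict


def unambiguous_reads(mappings):
    """Keep only the reads that map to exactly one species.

    `mappings` is a hypothetical input: an iterable of (read_id, species)
    pairs, with one pair per alignment reported by some mapper.
    Returns a dict read_id -> species for the unambiguous reads.
    """
    species_by_read = defaultdict(set)
    for read_id, species in mappings:
        species_by_read[read_id].add(species)
    return {
        read_id: next(iter(species))
        for read_id, species in species_by_read.items()
        if len(species) == 1
    }


def evaluate(truth, predictions):
    """Return (precision, sensitivity) of a tool on the unambiguous reads.

    `predictions` maps read_id -> predicted species, or None when the tool
    left the read unclassified. Definitions assumed here:
      precision   = correct / classified
      sensitivity = correct / all evaluated reads
    """
    correct = sum(1 for r, sp in truth.items() if predictions.get(r) == sp)
    classified = sum(1 for r in truth if predictions.get(r) is not None)
    precision = correct / classified if classified else 0.0
    sensitivity = correct / len(truth) if truth else 0.0
    return precision, sensitivity


# Toy example: r1 maps to two species, so it is excluded from the evaluation.
mappings = [
    ("r1", "E. coli"), ("r1", "S. enterica"),
    ("r2", "E. coli"),
    ("r3", "B. subtilis"),
]
truth = unambiguous_reads(mappings)

tool_a = {"r1": "E. coli", "r2": "E. coli", "r3": None}
tool_b = {"r1": "S. enterica", "r2": "E. coli", "r3": "B. subtilis"}

print(evaluate(truth, tool_a))
print(evaluate(truth, tool_b))
```

In this toy example, A and B disagree on r1, yet either answer could be right, so r1 is dropped and neither tool is penalized or rewarded for it.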

Using these reads, we showed that CLARK-S is at once precise, fast, and more sensitive than CLARK.

The manuscript is under review but CLARK-S is freely available now (http://clark.cs.ucr.edu/).