Meaning of 2nd assignment when the classification is filtered out in Clark-S


Antonio Diaz Tula

Mar 30, 2017, 10:33:31 AM
to CLARK Users
Hello,

We are using CLARK-S to classify metagenomic samples. The CLARK-S documentation says that classifications are automatically discarded when either gamma < 0.06 or confidence < 0.75.
For such low-confidence classifications the 1st assignment is set to NA, but the 2nd assignment still shows a taxon ID.

We are unsure how to handle such cases. Should the 2nd assignment also be NA, or is the ID in the 2nd assignment column actually the highest-scoring assignment, reported there because the confidence is below the filtering thresholds?

Just to illustrate: 

SEQUENCE_NAME,309,0.0107527,NA,1,795750,1,0.5

In this classification the gamma (0.0107527) is below 0.06, so the classification is low confidence and the 1st assignment is NA. What is the meaning of the 2nd assignment (taxon ID 795750)?

Thanks in advance!
Antonio

Rachid

Mar 30, 2017, 11:17:35 PM
to CLARK Users
Hello Antonio,

Thank you for your interest! When CLARK-S rejects a read assignment because of low confidence, the 1st assignment is overridden to "NA" and the 2nd assignment is left unchanged.
Overriding the 1st assignment is enough because it is the most important information in the result. Note that "estimate_abundance.sh" uses only the 1st assignment, not the 2nd, and you should do the same.
So in your example, "795750" is the taxonomy ID of the target with the second-highest k-mer count for that sequence.
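
For anyone post-processing the results file themselves, here is a minimal Python sketch of that rule: read one result line and keep only the 1st assignment, treating "NA" as unclassified. The column order (name, length, gamma, 1st assignment, 1st hit count, 2nd assignment, 2nd hit count, confidence) is an assumption inferred from the example line in this thread, and the names `parse_line` and `taxon_for_abundance` are hypothetical, not part of CLARK.

```python
import csv
from typing import NamedTuple, Optional


class Result(NamedTuple):
    name: str
    gamma: float
    first: Optional[str]    # None when the file says "NA"
    second: Optional[str]   # None when the file says "NA"
    confidence: float


def parse_line(line: str) -> Result:
    """Parse one CSV result line.

    Assumed column order (inferred from the example in this thread):
    name, length, gamma, 1st assignment, 1st hit count,
    2nd assignment, 2nd hit count, confidence.
    """
    name, _length, gamma, first, _h1, second, _h2, conf = next(csv.reader([line]))
    return Result(
        name=name,
        gamma=float(gamma),
        first=None if first == "NA" else first,
        second=None if second == "NA" else second,
        confidence=float(conf),
    )


def taxon_for_abundance(result: Result) -> Optional[str]:
    """Return the taxon ID to count, or None if the read is unclassified.

    Only the 1st assignment is consulted; the 2nd assignment is ignored
    even when it holds a taxon ID.
    """
    return result.first


example = parse_line("SEQUENCE_NAME,309,0.0107527,NA,1,795750,1,0.5")
print(taxon_for_abundance(example))  # prints None: the read was rejected
```

The 2nd assignment is still parsed so it can be inspected, but it never feeds into the count.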

Cheers,
Rachid

Antonio Diaz Tula

Mar 31, 2017, 10:27:26 AM
to CLARK Users
Dear Rachid,

Thanks a lot for your answer! 
Best regards

Antonio