1st and 2nd assignment in output


Sanjeev Sariya

May 17, 2017, 10:34:13 AM
to CLARK Users
Hi Clark Developers,

Thanks for this tool.
I'm trying to get up to speed with the output of the CLARK tool and would need some guidance in order to put it into a usable form for any downstream analyses.
I used CLARK 1.2.3 on Illumina 300 bp paired-end 16S rRNA reads for the V3-V4 region.

bash classify_metagenome.sh -O otus_checked.fna -R resultconfidence -m 0

The output contains columns with name: Object_ID Length Gamma 1st_assignment score1 2nd_assignment score2 confidence

An output looks like:
OTU_7676 425 0.00759494 818 3 NA 0 1

Please help me with the following queries:
1) Which one of 1st_assignment and 2nd_assignment is taken into consideration during estimate_abundance?
2) Not all of my input sequences are shown in the output, because of the applied minimum confidence threshold of 0.5. Is there a way I could find/retrieve their taxonomy?

Thanks,
Sanjeev
 

Sanjeev Sariya

May 22, 2017, 12:57:54 PM
to CLARK Users
Hi There,

Could anyone please advise?

--Sanjeev
----

Rachid

May 22, 2017, 1:37:08 PM
to CLARK Users
Hi Sanjeev,

Thank you for your interest! Below are the answers to your questions.


On Wednesday, May 17, 2017 at 10:34:13 AM UTC-4, Sanjeev Sariya wrote:
Please help me with the following queries:
1) Which one of 1st assignment and 2nd assignment is taken into consideration during estimate_abundance?

Please read our peer-reviewed publication (here, specifically the Methods section). A read can be assigned to any of the targets but, to put it simply under the relevant hypothesis, only one target is the correct one. So CLARK ranks all targets by the number of k-mers shared between the sequence/object and each target. The 1st_assignment is the target with the highest count, the 2nd_assignment is the target with the second-highest count, and so on. So the assignment taken into consideration is the "1st_assignment" (ties are broken arbitrarily).
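
The ranking described here can be sketched roughly as follows. This is only an illustration, not CLARK's actual implementation: `rank_targets`, `assignments`, and the toy `target_kmers` dictionary are hypothetical names, the k-mer sets are assumed to be target-specific (disjoint), and the confidence formula score1 / (score1 + score2) is an assumption that happens to agree with the example row in the question (scores 3 and 0 give confidence 1).

```python
from collections import Counter


def rank_targets(read, target_kmers, k):
    """Rank targets by how many k-mers of the read they share (highest first).

    target_kmers is a hypothetical dict: target ID -> set of k-mers
    (assumed disjoint between targets).
    """
    counts = Counter()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        for target, kmers in target_kmers.items():
            if kmer in kmers:
                counts[target] += 1
    # most_common() sorts by count; equal counts keep insertion order,
    # which stands in for "ties broken arbitrarily".
    return counts.most_common()


def assignments(ranked):
    """Return (1st_assignment, score1, 2nd_assignment, score2, confidence)."""
    first, score1 = ranked[0] if ranked else ("NA", 0)
    second, score2 = ranked[1] if len(ranked) > 1 else ("NA", 0)
    total = score1 + score2
    # Assumed formula, not taken from this thread.
    confidence = score1 / total if total else 0.0
    return first, score1, second, score2, confidence


# Toy example with made-up k-mer sets for two taxonomy IDs.
toy_targets = {"818": {"ACGT", "CGTA"}, "562": {"GTAA"}}
ranked = rank_targets("ACGTACGTAA", toy_targets, k=4)
print(assignments(ranked))
```

In this toy case the target listed first is the one reported as the assignment, which mirrors the role of the "1st_assignment" column.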
 
2) Not all of my input sequences are shown in the output, because of the applied minimum confidence threshold of 0.5. Is there a way I could find/retrieve their taxonomy?

Yes, there is a way: you can parse the CLARK results, look for the reads that do not pass the filtering, and return their taxonomy IDs.
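
A minimal sketch of such a parser is given below. It assumes the results file has the eight columns listed in the question (Object_ID, Length, Gamma, 1st_assignment, score1, 2nd_assignment, score2, confidence), separated by commas or whitespace, and that object IDs contain neither. The function name `low_confidence_assignments` and the sample rows other than OTU_7676 are made up for illustration.

```python
import re


def low_confidence_assignments(lines, threshold=0.5):
    """Yield (object_id, taxonomy_id, confidence) for rows below the threshold."""
    for line in lines:
        fields = re.split(r"[,\s]+", line.strip())
        # Skip blank or malformed lines and the header row.
        if len(fields) < 8 or fields[0] == "Object_ID":
            continue
        object_id, taxonomy_id = fields[0], fields[3]
        if taxonomy_id == "NA":
            continue  # no 1st_assignment, so no taxonomy ID to report
        try:
            confidence = float(fields[7])
        except ValueError:
            continue  # non-numeric confidence value
        if confidence < threshold:
            yield object_id, taxonomy_id, confidence


# Hypothetical rows; only OTU_7676 comes from the original post.
sample = [
    "Object_ID,Length,Gamma,1st_assignment,score1,2nd_assignment,score2,confidence",
    "OTU_7676 425 0.00759494 818 3 NA 0 1",
    "OTU_1,300,0.0185,562,3,1280,2,0.6",
    "OTU_2,300,0,NA,0,NA,0,0",
]
for row in low_confidence_assignments(sample, threshold=0.75):
    print(row)
```

To run it on real output, the `sample` list could be replaced by an open file handle on the results file written by `classify_metagenome.sh`, with the threshold set to whatever cutoff was used during filtering.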
I hope this helps,
 
Best,
Rachid

