1st and 2nd assignment in output

Sanjeev Sariya

May 17, 2017, 10:34:13 AM

to CLARK Users

Hi Clark Developers,

Thanks for this tool.

I'm trying to get up to speed with the output of the CLARK tool and would appreciate some guidance on turning it into a usable form for downstream analyses.

I used CLARK 1.2.3 on Illumina 300 bp paired-end 16S rRNA reads for the V3-V4 region.

bash classify_metagenome.sh -O otus_checked.fna -R resultconfidence -m 0

The output contains the following columns: Object_ID Length Gamma 1st_assignment score1 2nd_assignment score2 confidence

An output line looks like:

OTU_7676 425 0.00759494 818 3 NA 0 1

Please help me with the following questions:

1) Which of 1st_assignment and 2nd_assignment is taken into consideration by estimate_abundance?

2) Not all of my input sequences are shown in the output, because of the applied minimum confidence threshold of 0.5. Is there a way I could retrieve their taxonomy?

Thanks,

Sanjeev

Sanjeev Sariya

May 22, 2017, 12:57:54 PM

to CLARK Users

Hi There,

Please guide.

--Sanjeev

----

Rachid

May 22, 2017, 1:37:08 PM

to CLARK Users

Hi Sanjeev,

Thank you for your interest! Below are the answers to your questions.

On Wednesday, May 17, 2017 at 10:34:13 AM UTC-4, Sanjeev Sariya wrote:

Hi Clark Developers,

Thanks for this tool.
I'm trying to get up to speed with the output of the CLARK tool and would appreciate some guidance on turning it into a usable form for downstream analyses.
I used CLARK 1.2.3 on Illumina 300 bp paired-end 16S rRNA reads for the V3-V4 region.

bash classify_metagenome.sh -O otus_checked.fna -R resultconfidence -m 0

The output contains columns with name: Object_ID Length Gamma 1st_assignment score1 2nd_assignment score2 confidence

An output looks like:
OTU_7676 425 0.00759494 818 3 NA 0 1

Please help me with the following questions:
1) Which of 1st_assignment and 2nd_assignment is taken into consideration by estimate_abundance?

Please read our peer-reviewed publication (specifically the Methods section). A read can be assigned to any of the targets but, to put it simply and under the relevant hypothesis, only one target is the correct one. So CLARK ranks all targets. How? For each target, CLARK counts the k-mers shared between the sequence/object and that target. The 1st_assignment is the target with the highest count, the 2nd_assignment is the target with the second-highest count, and so on. So the assignment that is used is the 1st_assignment (ties are broken arbitrarily).
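To make the ranking idea concrete, here is a rough Python sketch. It is not CLARK's actual implementation: it compares all k-mers of the read against all k-mers of each target, whereas the publication describes the real method in detail. The function names, the toy sequences, and k=4 are made up for illustration. The confidence formula score1 / (score1 + score2) is an assumption; it is at least consistent with the example row above (score1 = 3, score2 = 0, confidence = 1).

```python
def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_targets(read, targets, k):
    """Rank targets by the number of k-mers they share with the read.

    targets: dict mapping a target ID to a reference sequence (toy data).
    Returns a list of (target_id, shared_count), highest count first.
    """
    read_kmers = kmers(read, k)
    counts = {tid: len(read_kmers & kmers(ref, k)) for tid, ref in targets.items()}
    # sorted() is stable, so tied targets keep their dictionary order,
    # which stands in for an arbitrary tie-break.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


def confidence(ranking):
    """Assumed formula: score1 / (score1 + score2); 0.0 if there are no hits."""
    score1 = ranking[0][1]
    score2 = ranking[1][1] if len(ranking) > 1 else 0
    total = score1 + score2
    return score1 / total if total else 0.0


# Hypothetical toy targets keyed by made-up taxonomy IDs.
toy_targets = {"818": "ACGTACGTGG", "562": "ACGTTTTTTT"}
ranking = rank_targets("ACGTACGA", toy_targets, k=4)
print(ranking)              # [('818', 4), ('562', 1)]
print(confidence(ranking))  # 0.8
```

In this toy case the first entry of the ranking plays the role of 1st_assignment/score1 and the second entry plays the role of 2nd_assignment/score2.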

2) Not all of my input sequences are shown in the output, because of the applied minimum confidence threshold of 0.5. Is there a way I could retrieve their taxonomy?

Yes, there is a way: you can parse the CLARK results, look for the reads that do not pass the filtering, and return their taxonomy IDs.
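As a starting point, a small Python sketch of that parsing could look like the following. It assumes the column order quoted above and that fields are separated by commas or whitespace; please check this against your own result file. The function names, the threshold of 0.75, and the rows OTU_0001 and OTU_0002 are hypothetical and only for illustration. Treating "below the threshold" as a strict less-than comparison is also an assumption.

```python
import re

# Column order as quoted in this thread.
COLUMNS = ["Object_ID", "Length", "Gamma", "1st_assignment",
           "score1", "2nd_assignment", "score2", "confidence"]


def parse_line(line):
    """Split one result line on commas and/or whitespace into a dict."""
    fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
    return dict(zip(COLUMNS, fields))


def low_confidence_assignments(lines, threshold):
    """Yield (Object_ID, 1st_assignment) for rows with confidence below threshold."""
    for line in lines:
        if not line.strip() or line.startswith("Object_ID"):
            continue  # skip blank lines and the header
        row = parse_line(line)
        if len(row) != len(COLUMNS) or row["1st_assignment"] == "NA":
            continue  # malformed row, or no assignment to report
        try:
            conf = float(row["confidence"])
        except ValueError:
            continue  # non-numeric confidence such as "NA"
        if conf < threshold:
            yield row["Object_ID"], row["1st_assignment"]


rows = [
    "Object_ID Length Gamma 1st_assignment score1 2nd_assignment score2 confidence",
    "OTU_7676 425 0.00759494 818 3 NA 0 1",    # row quoted in this thread
    "OTU_0001 425 0.0126582 562 3 818 2 0.6",  # hypothetical row
    "OTU_0002,425,0.0126582,562,3,818,2,0.6",  # hypothetical comma-separated row
]
print(list(low_confidence_assignments(rows, threshold=0.75)))
# [('OTU_0001', '562'), ('OTU_0002', '562')]
```

For a real file, pass the open file handle instead of the list, e.g. `with open(path) as fh: ...`, where `path` is the result file produced by your classification run.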

I hope this helps,

Best,

Rachid

