estimate_abundance.sh on multiple samples


Theo Allnutt

May 29, 2017, 8:37:02 PM
to CLARK Users
Hi,

How do I obtain an abundance table for multiple samples using CLARK? I have tried feeding it a list of classify_metagenome.sh result files, e.g.:

./estimate_abundance.sh -D ~/db/RefSeq -F 002.csv 005.csv 009.csv 011.csv 012.csv 016.csv 019.csv 020.csv 032.csv 033.csv 035.csv 037.csv 038.csv 039.csv 040.csv 041.csv 042.csv 043.csv 044.csv 046.csv 049.csv 052.csv 053.csv 058.csv 059.csv --highconfidence >clark_result.txt

but the result covers only one of the samples, with a single column of results. I can run all the samples separately and then write a script to tabulate the results into one 'OTU table', but surely CLARK can do this somehow? Sorry if I missed something in the README.

Thanks,

T.
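
The per-sample fallback described above could be sketched in Python roughly as follows. This is only a sketch under stated assumptions: it assumes estimate_abundance.sh sits in the current directory and prints its report to standard output, it reuses the -D, -F and --highconfidence options from the command above, and the function names and the "<sample>.abundance.csv" naming scheme are hypothetical.

```python
import os
import subprocess
from pathlib import Path


def build_command(db_dir, result_file, high_confidence=True):
    """Build the estimate_abundance.sh call for one classification result file."""
    # subprocess does not expand "~", so do it explicitly for paths like ~/db/RefSeq.
    cmd = ["./estimate_abundance.sh", "-D", os.path.expanduser(str(db_dir)),
           "-F", str(result_file)]
    if high_confidence:
        cmd.append("--highconfidence")
    return cmd


def run_per_sample(db_dir, result_files, out_dir="."):
    """Run the script once per sample and save each report to its own file."""
    reports = []
    for result_file in result_files:
        # Hypothetical naming scheme: 002.csv -> 002.abundance.csv
        report = Path(out_dir) / (Path(result_file).stem + ".abundance.csv")
        with open(report, "w") as handle:
            subprocess.run(build_command(db_dir, result_file),
                           stdout=handle, check=True)
        reports.append(report)
    return reports
```

Calling run_per_sample("~/db/RefSeq", ["002.csv", "005.csv"]) would then leave one report per sample, which still have to be merged into a single table by a separate step.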

Theo Allnutt

May 29, 2017, 10:05:43 PM
to CLARK Users
It seems that the rows of the abundance output are not consistent in their format, so parsing them to make a single table is very difficult.

e.g. the columns should be:
Name,TaxID,Lineage,Count,Proportion_All(%),Proportion_Classified(%)

but I find rows like this with only 5 columns:

UNKNOWN,UNKNOWN,3667279,91.682,-

which should have "-" in the TaxID column but does not.

Think I'll have to give up on this for now.

T.



Rachid

May 30, 2017, 6:18:54 PM
to CLARK Users
Hi Theo,

You can indeed process multiple CLARK result files, but they must all have the same format (i.e., you cannot mix files that have confidence scores with files that do not).
Could you please make sure this was the case in your example?
If they all have the same format, could you send me these files so I can reproduce this on my side?
Thank you!

Cheers,
Rachid

Rachid

May 30, 2017, 6:23:46 PM
to CLARK Users
Hi,
I am not sure what you meant by "very difficult"; what is the difficulty? It is explained in the README file that the last line of the report file contains the count of reads that are unknown or filtered out by the options you passed as arguments.
That is one line; it is always at the end of the report file and it always starts with the keyword "UNKNOWN". Using this pattern should help you parse any report file.
Thank you,

Cheers,
Rachid
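
As an illustration of the parsing rule described above, here is a minimal Python sketch that reads per-sample reports and merges them into one table. It is not part of CLARK; the function names are hypothetical. It assumes the header row starts with "Name", that taxon names contain no unquoted commas, and that the read count is the third field from the end of every row, which holds both for the six-column header quoted earlier in the thread and for the shorter UNKNOWN line.

```python
import csv
import io


def parse_report(text):
    """Return {name: read count} for one abundance report given as text."""
    counts = {}
    for fields in csv.reader(io.StringIO(text)):
        if not fields or fields[0] == "Name":
            continue  # skip blank lines and the header row
        # Count is the third field from the end in both the regular rows
        # and the shorter UNKNOWN row, so index from the right.
        counts[fields[0]] = int(fields[-3])
    return counts


def merge_reports(reports):
    """Combine {sample: {name: count}} into a header row plus one row per name."""
    samples = sorted(reports)
    names = sorted({name for counts in reports.values() for name in counts})
    table = [["Name"] + samples]
    for name in names:
        # A taxon missing from a sample is reported as 0 reads.
        table.append([name] + [reports[s].get(name, 0) for s in samples])
    return table
```

Each report would be read from its file, passed through parse_report, and stored under its sample name before calling merge_reports; the resulting rows can then be written out with csv.writer.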