Calculate abundances for Custom databases

Jens

Jan 27, 2017, 10:06:41 AM
to CLARK Users
Hi,

I made a custom database with a taxonomy file that looks like:
PATH   species_1
PATH   species_2
PATH   species_2
...


The CLARK step works fine. Then I want to calculate the abundances.

1st run: It complained that names.dmp doesn't exist. For this particular database I didn't download the full NCBI taxonomy into the taxonomy folder (I can do that if needed). Instead, I created an empty names.dmp file and ran it again.

2nd run: I get these warning messages:
Loading nodes of taxonomy tree... done
Start retrieving lineage for each target identified (2375)...
Failed to identify species_62236: Unknown taxonomy id given the provided taxonomy database.
...
The program will estimates abundance per taxonomy id.

And I thought, that's fine! However, when I look at the output I get:
Name,TaxID,Lineage,Count,Proportion_All(%),Proportion_Classified(%)
UNKNOWN,UNKNOWN,6173800,100,-



Now, how do I trick CLARK into summarizing by these home-made species categories, which are not related to the NCBI taxonomy? I assume I may have to make my own version of the names.dmp file. What should it look like?

Thanks!

Rachid

Jan 27, 2017, 8:36:20 PM
to CLARK Users
Hello Jens,

You can do this by omitting the "-D" option when you call "estimate_abundance.sh". Please see the definitions of the parameters in the README file.

Cheers,
Rachid
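
For readers who want to cross-check the per-category totals independently of CLARK's own scripts, here is a minimal Python sketch that tallies reads per custom label and reports the same two proportions as the abundance table. It is not CLARK's implementation. It assumes the results file is a CSV with a column named "Assignment" that holds the label, and that unassigned reads are marked "NA"; both names, and the example rows, are assumptions for illustration, so adjust them to match your own output.

```python
import csv
import io
from collections import Counter


def summarize_assignments(csv_text, label_column="Assignment", unassigned="NA"):
    """Count reads per label; return (label, count, pct_all, pct_classified) rows."""
    # skipinitialspace tolerates headers written as "A, B, C".
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    counts = Counter(row[label_column].strip() for row in reader)

    total = sum(counts.values())
    classified = total - counts.get(unassigned, 0)

    rows = []
    for label, n in counts.most_common():
        pct_all = 100.0 * n / total
        # Unassigned reads have no share among the classified reads.
        pct_classified = None if label == unassigned else 100.0 * n / classified
        rows.append((label, n, pct_all, pct_classified))
    return rows


# Hypothetical results: column names and read IDs are made up for illustration.
example = """Object_ID,Length,Assignment
read1,100,species_1
read2,100,species_2
read3,100,species_2
read4,100,NA
"""

print("Name,Count,Proportion_All(%),Proportion_Classified(%)")
for label, n, pct_all, pct_classified in summarize_assignments(example):
    shown = "-" if pct_classified is None else f"{pct_classified:.2f}"
    print(f"{label},{n},{pct_all:.2f},{shown}")
```

To use it on a real file, read the file contents into a string and pass it to summarize_assignments, changing label_column and unassigned if your results use different names.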