Hi,
I made a custom database with a taxonomy file that looks like:
PATH species_1
PATH species_2
PATH species_2
...
The CLARK classification step works fine. Next, I want to calculate the abundances.
1st run: it complained that names.dmp doesn't exist. For this particular database I didn't download the full NCBI taxonomy into the taxonomy folder (I can do that if needed). Instead, I created an empty names.dmp file and ran it again.
2nd run: I get these warning messages:
Loading nodes of taxonomy tree... done
Start retrieving lineage for each target identified (2375)...
Failed to identify species_62236: Unknown taxonomy id given the provided taxonomy database.
...
The program will estimates abundance per taxonomy id.
I thought that was fine. However, when I look at the output I get:
Name,TaxID,Lineage,Count,Proportion_All(%),Proportion_Classified(%)
UNKNOWN,UNKNOWN,6173800,100,-
Now, how do I trick CLARK into summarizing by these home-made species categories, which are not related to the NCBI taxonomy? I assume I may have to make my own version of the names.dmp file. What should it look like?
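To make the question concrete, here is a rough sketch of how I imagine generating such files. It follows the standard NCBI dump layout (fields separated by tab-pipe-tab, each line ending in tab-pipe). Everything else is my assumption: that CLARK reads that layout, that it needs integer taxonomy ids rather than labels like species_1, that a single root node with id 1 is enough, and that only the first three columns of nodes.dmp matter. The function names are made up for illustration.

```python
# Sketch only: build minimal NCBI-style names.dmp / nodes.dmp content
# for custom species labels. Whether CLARK accepts these files is unverified.

FIELD_SEP = "\t|\t"   # NCBI dump field separator
LINE_END = "\t|\n"    # NCBI dump line terminator


def build_taxonomy(species, first_id=2):
    """Return (names_lines, nodes_lines, id_map) for the given labels.

    Each distinct label gets an integer id starting at first_id and is
    attached directly to a root node with id 1 (an assumption).
    """
    # names.dmp columns: tax_id, name, unique name, name class
    names = [FIELD_SEP.join(["1", "root", "", "scientific name"]) + LINE_END]
    # nodes.dmp columns (first three only): tax_id, parent tax_id, rank
    nodes = [FIELD_SEP.join(["1", "1", "no rank"]) + LINE_END]
    id_map = {}
    for offset, label in enumerate(sorted(set(species))):
        tax_id = str(first_id + offset)
        id_map[label] = tax_id
        names.append(
            FIELD_SEP.join([tax_id, label, "", "scientific name"]) + LINE_END
        )
        nodes.append(FIELD_SEP.join([tax_id, "1", "species"]) + LINE_END)
    return names, nodes, id_map


def write_taxonomy(species, names_path="names.dmp", nodes_path="nodes.dmp"):
    """Write both files and return the label -> id mapping."""
    names, nodes, id_map = build_taxonomy(species)
    with open(names_path, "w") as fh:
        fh.writelines(names)
    with open(nodes_path, "w") as fh:
        fh.writelines(nodes)
    return id_map
```

If this is roughly right, I guess the second column of my taxonomy file would then have to use the integer ids from the returned mapping instead of the species_N labels.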
Thanks!