taxonomy table from blast search with metabarcoder


zifca...@gmail.com

Aug 28, 2017, 3:28:51 AM
to MetacodeR
Hello,
I would like to ask if it is possible to use metabarcoder to parse/assign the taxonomy of GenBank accession numbers (in the form of a BLAST hit table) using a downloaded GenBank taxonomy database. Basically, I just need to pair GenBank accession numbers with their hierarchical taxonomic information. Is that possible, and if so, how?
Thank you,
Lucia

Zachary Foster

Aug 28, 2017, 12:19:02 PM
to MetacodeR, zifca...@gmail.com
Hi Lucia,

I have not done this before using a download of the NCBI taxonomy database, but it sounds like a good idea. Where did you get the database? From here?

ftp://ftp.ncbi.nih.gov/pub/taxonomy/

If so, which file?
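If it is the standard `taxdump` archive from that directory, the pairing could be sketched roughly as below. This is a plain Python illustration, not metacoder or taxize code, and it rests on assumptions: that `nodes.dmp` and `names.dmp` use the usual `<tab>|<tab>` field separator, and that the accession table has the `accession`, `accession.version`, `taxid`, `gi` columns of the `accession2taxid` files. The function names and the toy rows are made up for the example.

```python
import io


def read_dmp(handle):
    """Yield the fields of each row of a taxdump-style .dmp file."""
    for line in handle:
        line = line.rstrip("\n")
        if line.endswith("\t|"):  # rows end with a trailing "<tab>|"
            line = line[:-2]
        yield line.split("\t|\t")


def load_taxonomy(nodes_handle, names_handle):
    """Build parent, rank and scientific-name lookups keyed by taxon ID."""
    parent, rank, name = {}, {}, {}
    for fields in read_dmp(nodes_handle):
        tax_id = int(fields[0])
        parent[tax_id] = int(fields[1])
        rank[tax_id] = fields[2]
    for fields in read_dmp(names_handle):
        if fields[3] == "scientific name":  # skip synonyms, common names
            name[int(fields[0])] = fields[1]
    return parent, rank, name


def lineage(tax_id, parent, rank, name):
    """Return [(rank, name), ...] from the root down to tax_id."""
    path, seen = [], set()
    while tax_id in parent and tax_id not in seen:
        seen.add(tax_id)
        path.append((rank[tax_id], name.get(tax_id, str(tax_id))))
        if parent[tax_id] == tax_id:  # the root is its own parent
            break
        tax_id = parent[tax_id]
    return path[::-1]


def load_accession_map(handle, wanted):
    """Map the accessions in `wanted` to taxon IDs, streaming the table."""
    mapping = {}
    next(handle, None)  # skip the header row
    for line in handle:
        accession, version, tax_id = line.rstrip("\n").split("\t")[:3]
        for key in (accession, version):
            if key in wanted:
                mapping[key] = int(tax_id)
    return mapping


# Toy rows standing in for the real files (hypothetical data).
toy_nodes = io.StringIO(
    "1\t|\t1\t|\tno rank\t|\n"
    "2\t|\t1\t|\tkingdom\t|\n"
    "3\t|\t2\t|\tspecies\t|\n"
)
toy_names = io.StringIO(
    "1\t|\troot\t|\t\t|\tscientific name\t|\n"
    "2\t|\tToy kingdom\t|\t\t|\tscientific name\t|\n"
    "3\t|\tToy species\t|\t\t|\tscientific name\t|\n"
    "3\t|\ttoy sp.\t|\t\t|\tsynonym\t|\n"
)
toy_acc = io.StringIO(
    "accession\taccession.version\ttaxid\tgi\n"
    "ABC123\tABC123.1\t3\t0\n"
)

parent, rank, name = load_taxonomy(toy_nodes, toy_names)
acc_to_taxid = load_accession_map(toy_acc, {"ABC123.1"})
table = {acc: lineage(t, parent, rank, name) for acc, t in acc_to_taxid.items()}
```

The accession table is filtered while it streams because the real files are very large; only the accessions from the hit table are kept in memory.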

It looks like `taxizedb`, a new companion package to `taxize` for using local databases, is going to support this eventually:

https://github.com/ropensci/taxizedb/issues/2

That might be an option in the future. You could comment on that issue and Sckott might prioritize working on it.

Metacoder can look up the classifications from GenBank accession numbers, but it is slow. How many do you have? Thanks,

Zach
Message has been deleted
Message has been deleted

Zachary Foster

Aug 29, 2017, 1:53:05 PM
to MetacodeR, zifca...@gmail.com
Hmm, I got your messages, but Google Groups thought they were spam. I am looking at them now.

Zachary Foster

Sep 7, 2017, 2:00:36 PM
to MetacodeR, zifca...@gmail.com
Hi Lucia,

I used your script to download the same files you downloaded and looked at them. Unfortunately, I don't understand the format well enough to parse it, and as far as I can tell metacoder cannot currently handle it. Google deleted your messages for some reason, so I am responding from memory; sorry if I don't address some of your questions.

I think you asked about parallel processing to speed up looking up taxonomy from GenBank. This will not help, because the bottleneck is not the processing speed of your computer but the response speed of NCBI's servers, which are usually overloaded and quite slow. There is nothing I can do to speed that up as far as I know. You can probably look up a few thousand in an hour or so, but it would be hard to do more at once because something will probably go wrong and mess up the query (again, this is an NCBI issue). There are tutorials in the metacoder documentation on how to do this if you want to try.
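One way to cope with queries that fail partway is to send small batches, retry the ones that fail, and keep whatever has already come back. Here is a minimal Python sketch of that idea. It is not how metacoder does it: `fetch_batch` is a hypothetical stand-in for whatever function actually queries NCBI, and the batch size, retry count, and pause are arbitrary assumptions.

```python
import time


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def lookup_in_batches(accessions, fetch_batch, batch_size=200,
                      max_retries=3, pause=1.0, done=None):
    """Look up accessions batch by batch, retrying batches that fail.

    fetch_batch: hypothetical callable taking a list of accessions and
        returning a dict {accession: classification}; it may raise.
    done: results saved from an earlier run, so they are not re-queried.
    Returns (results, failed), where `failed` lists accessions whose
    batch still failed after max_retries attempts.
    """
    results = dict(done or {})
    todo = [acc for acc in accessions if acc not in results]
    failed = []
    for batch in chunked(todo, batch_size):
        for attempt in range(1, max_retries + 1):
            try:
                results.update(fetch_batch(batch))
                break  # this batch succeeded, move to the next one
            except Exception:
                if attempt == max_retries:
                    failed.extend(batch)  # give up on this batch only
                else:
                    time.sleep(pause * attempt)  # wait longer each time
    return results, failed
```

Saving `results` to disk between runs and passing it back as `done` means a crash after an hour of queries does not throw that hour away. The batches are sent one after another on purpose, since the slow part is the server, not the local machine.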

Sorry I can't be of more help at the moment! If other people are having this same issue, let me know and I will try to figure it out again. Currently, it seems easiest to wait for the taxizedb package to handle this rather than creating redundant functionality.

-Zach