Making taxonomy mapping file with Pycogent

93 views
Skip to first unread message

Aaron

unread,
Sep 2, 2016, 4:04:16 PM9/2/16
to qiime...@googlegroups.com
Hi all - I have been doing some searching on how to set up the correct taxonomy mapping file for a "custom" 18S V4 sequence database. I am following themethod outlined in this older thread using PycoGent and the module NcbiTaxonomyFromFiles:


I would like to ultmately use Qiime for assigning taxonomy to metabarcoding otus, but my question here may be considered more Python related. If someone can point me in the right direction I would appreciate it. 

Prior to trying this I have had no Python experience at all. I have worked through some official Python tutorial material over the past few days, but there is a lot to digest, and I am continuing to work on it. I have the following module I adapted to get the code prom Pycogent presented in the above thread to work on a list of taxids instead of inputting one at a time in the interpreter using a computer with Pycogent installed:

def get_lineage(node,my_ranks):
"""returns lineage information in my_ranks order"""
from cogent.parse.ncbi_taxonomy import NcbiTaxonomyFromFiles
tree = NcbiTaxonomyFromFiles(open('nodes.dmp'),open('names.dmp'))
root = tree.Root
ranks_lookup = dict([(r,idx) for idx, r in enumerate(my_ranks)])
lineage = [None]*len(my_ranks)
with open(node) as f:   #My attempt here to read from a file of taxon ids
for line in f:
curr = tree.ById[int(line.strip('\n'))]
while curr.Parent is not None:
if curr.Rank in ranks_lookup:
lineage[ranks_lookup[curr.Rank]] = curr.Name
curr = curr.Parent
print lineage
I saved the module as taxo.py. In addition I have a text file of ~1500 taxids called "test.txt". In Python, I typed:
>>>import taxo
ranks = ['superkingdom','phylum','class','order','family','genus','species']
>>>taxo.get_lineage("test.txt",ranks)

It outputs the expected taxonomy in the interpreter as follows:
['Eukaryota','Bacillariophyta','Coscinodiscophyceae','Thalassiosirales','Thalassiosiraceae','Thalassiosira','Thalassiosira nordenskioeldii']
...

I do realize that this output format will require some more formatting for a correct taxonomy mapping file, but this is what I am interested in obtaining this point. There are two things I want to do and can't figure out, but should be trivial to someone versed in Python. First, if I have taxid that is not in the ncbi taxonomy database, the script stops at that id with a KeyError, since it is not in the created dictionary. How do I alter the script to write a line that says "Entry not in database" when it encounters a key with no match, and then keeps going through the list? Second, I want to write the output to a file, say "taxonomy_out.txt" rather than to the interpreter. I realize both of these are probably obviuous and
simple but I have been working with this the last three days and remain stuck. Could someone please provide some suggestions? I appreciate your time and help.

- Aaron 

taxo.py
test.txt

Jose Antonio Navas Molina

unread,
Sep 2, 2016, 4:31:33 PM9/2/16
to Qiime 1 Forum
Hi Aaron,

For programming issues, usually stackoverflow is a really good forum, and usually there is a lot of questions already answered in there that helps.

For the KeyError, you need to handle your exceptions, rather than letting them go. Basically this boils down to something along the lines of:
try:
    # Offending line of code that is raising the exception
except KeyError:
   # Do what you want to do in case of error - looks like print Entry not in database

For your second problem, the python documentation is a great source, with a few examples. You need to do something like:

with open("filepath", "w") as f:
    f.write("Whatever you want to write")

Hope this helps!
Reply all
Reply to author
Forward
0 new messages