common ancestor using ncbi taxonomy?


cbog...@gmail.com

Sep 7, 2015, 9:54:36 AM
to The ETE toolkit
Hi,

I'm new to ETE and am particularly keen on applying the NCBI Taxonomy functions (get_lineage, get_rank, etc.) to metagenomic data.

One thing I have been trying is to determine the lowest common ancestor (LCA) for the txids associated with each read pair (1 for fw read, 1 for rev read), for a large file (over 1M read pairs).

My understanding is I could do the following:

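from ete2 import NCBITaxa
ncbi = NCBITaxa()  # get_topology is called on an NCBITaxa instance
tree = ncbi.get_topology([txid1, txid2])
# look the nodes up by name, using the taxids as strings
common_ancestor = tree.get_common_ancestor(str(txid1), str(txid2))
print common_ancestor.name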

However, this doesn't work for, say, "root" (txid1=1) and "Viruses" (txid2=10239): the resulting tree contains only the node "10239", with no node "1".
=> Is there a way of getting the LCA for two taxa even if they're on the same branch? Here, the result I want is "1", since "10239" is a descendant of "1".

Also, is there a better/quicker way of going about this?

Many thanks,

Carlijn

Jaime Huerta Cepas

Sep 8, 2015, 6:40:18 AM
to eteto...@googlegroups.com
Hi Carlijn,
You could enable the intermediate_nodes option. However, there seems to be a problem when using the root node (taxid 1).

If you try any other query, for instance eukaryotes and cellular organisms, it seems to work:

print ncbi.get_topology([2759, 131567], intermediate_nodes=True).get_ascii()
-131567-2759

It even works between viruses and eukaryotes:
print ncbi.get_topology([2759, 10239], intermediate_nodes=True).get_ascii()

   /-10239
-1|
   \131567-2759


get_common_ancestor should work as you expect from there on:

t = ncbi.get_topology([2759, 10239], intermediate_nodes=True)
print t.get_common_ancestor([1, 10239]).name
1

print t.get_common_ancestor(["2759", "131567"]).name
131567

print t.get_common_ancestor([2759, 10239]).name
1

I need to check why the root taxid 1 does not work as expected when passed to get_topology().

thanks!
-jaime 




cbog...@gmail.com

Sep 10, 2015, 11:13:10 AM
to The ETE toolkit
Thank you Jaime for looking into this and letting me know what the problem is.

In the end I approached my issue differently, by getting the lineages and just looping through the lists of txids in reverse.
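
For anyone landing on this thread, a minimal sketch of that lineage-based approach might look like the following. It assumes each lineage is a list of taxids ordered from the root down to the taxon itself, which is how get_lineage is expected to return them. The function name lca_from_lineages is made up for illustration, and the lineages are hard-coded from the trees shown earlier in the thread so that the example runs without the taxonomy database.

```python
def lca_from_lineages(lineage_a, lineage_b):
    """Return the lowest taxid shared by two root-first lineages, or None."""
    members_b = set(lineage_b)
    # Walk lineage_a from the tip back towards the root; the first taxid
    # that also occurs in lineage_b is the lowest common ancestor.
    for taxid in reversed(lineage_a):
        if taxid in members_b:
            return taxid
    return None


# Hard-coded lineages taken from the trees shown earlier in the thread.
# With ETE these would come from ncbi.get_lineage(taxid) instead
# (assumed here to return the taxids root-first).
ROOT = [1]
VIRUSES = [1, 10239]
EUKARYOTES = [1, 131567, 2759]

lca_root_viruses = lca_from_lineages(ROOT, VIRUSES)          # same branch
lca_euk_viruses = lca_from_lineages(EUKARYOTES, VIRUSES)     # different branches
lca_euk_cellular = lca_from_lineages(EUKARYOTES, [1, 131567])
```

Because it never builds a tree, this handles the case where one taxon is an ancestor of the other, including the root. For a file with over 1M read pairs, it would probably also help to cache the lineage of each taxid so that it is only looked up once.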