Hey Grace,
ah yes, I see the issue there. The reason gappa behaves that way
is that phylogenetically, the orange/brown edge is (in a sense)
the same edge as the gray one on the other side of the root. That
is, the tree is "rooted" only in the sense that there exists a
bifurcating top node in the tree - but the two branches away from
the root are in fact one branch. The likelihood model that is used
in ML tree inference and in phylogenetic placement does not
consider the separation that is induced by the root node. Hence,
I'm hesitant to just add an option as you suggest.
However, as an idea to solve this use case: I could add an option to select whether the branch connecting a clade to the rest of the tree is considered part of that clade or not. Then, the orange/brown branch could be excluded - I think. I'd have to play around with that for a bit to see what a straightforward solution looks like. Could you maybe send me one of the jplace files where this case occurs?
Thanks and so long
Lucas
PS: Kudos! Your question is really well phrased and clear, and
you outlined potential solutions already! There should be more
people that thorough when asking for user support ;-)
--
You received this message because you are subscribed to the Google Groups "Phylogenetic Placement" group.
To unsubscribe from this group and stop receiving emails from it, send an email to phylogenetic-plac...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/phylogenetic-placement/e14bedf4-2b7a-4c7f-9a94-d629b126d38dn%40googlegroups.com.
Hi Grace,
as Lucas points out, this is a bit of an edge case from the point of view of the methods. EPA-ng handles this by unrooting the tree, performing the placement, and then mapping the placements back onto the rooted tree. For the branch that used to be split by the root node, it decides which side of the root to map to based on the distal length (the attachment point). Especially queries that don't have a strong "pull" toward the tree and don't place well can attach high up near the root like that. In these cases the signal is also often low, which makes it more likely that they get "misassigned" to the outgroup.
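To illustrate the mapping step described above, here is a minimal sketch. It is not EPA-ng's actual code: the function name, the side labels "A" and "B", and the convention that the distal length is measured from the far end of side A are all assumptions made for illustration.

```python
def map_to_rooted(distal, len_a, len_b):
    """Map a position on the merged root branch back to one side of the root.

    Assumption (not taken from EPA-ng): `distal` is measured from the far
    end of side A along the merged branch of length len_a + len_b.
    Returns (side, distance from that side's far end).
    """
    total = len_a + len_b
    if not 0.0 <= distal <= total:
        raise ValueError("distal length lies outside the merged branch")
    if distal <= len_a:
        # Attachment point lies before the root position: stays on side A.
        return ("A", distal)
    # Attachment point lies past the root position: report it on side B,
    # measured from the far end of side B.
    return ("B", total - distal)


print(map_to_rooted(0.5, 1.0, 3.0))
print(map_to_rooted(3.5, 1.0, 3.0))
```

A query that attaches close to the root position can switch sides with a small change in its distal length, which is one way to picture why weakly placed queries can end up on the outgroup side.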
So here is what I would do:
1) for those queries, check how well they actually placed. One strong indication is how high the highest LWR (likelihood weight ratio, the third number in the output) is; if it's low, there is probably not enough signal to place the query confidently. Another indication is the pendant length, which is the distance of the placed query to the branch where it was placed. If it's comparatively high (as in, as large as the diameter of the tree or larger), then the placement is similarly unreliable.
2) your tree looks very small; it could be that this issue improves/goes away with a more comprehensive reference tree. The idea there is that more references -> clearer signal -> stronger placement, and stronger toward the leaves rather than the basal branches.
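The check in 1) can be sketched roughly as follows. This assumes a jplace file whose "fields" list contains "like_weight_ratio" and "pendant_length"; the function names and the file name are hypothetical. It only reports the values, since what counts as "too low" or "too long" depends on your tree.

```python
import json


def load_jplace(path):
    """Read a jplace file (JSON) into a dict."""
    with open(path) as handle:
        return json.load(handle)


def best_placements(data):
    """Yield (query name, highest LWR, pendant length of that placement)."""
    fields = data["fields"]
    lwr_idx = fields.index("like_weight_ratio")
    pendant_idx = fields.index("pendant_length")
    for pquery in data["placements"]:
        # Pick the placement row with the highest LWR for this query.
        best = max(pquery["p"], key=lambda row: row[lwr_idx])
        # Names are stored under "n" (plain list) or "nm" (name, multiplicity pairs).
        names = pquery.get("n") or [name for name, _ in pquery.get("nm", [])]
        for name in names:
            yield name, best[lwr_idx], best[pendant_idx]


# Usage, with a hypothetical file name:
# for name, lwr, pendant in best_placements(load_jplace("epa_result.jplace")):
#     print(name, lwr, pendant)
```

Comparing the pendant lengths against the diameter of the reference tree then has to be done separately, for example by eye in a tree viewer.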
Thank you for the great question, as Lucas said! Please keep asking as this is helping to improve the tools and the overall workflow of doing placement.
Happy Placement,
Pierre
Hi Grace,
as for the pendant length, it's a bit tricky. I can't give definitive answers there since I'm not aware of any previous attempts at classifying them into "close enough" and "too long". I do think it would be reasonable to exclude queries that have very long pendant lengths, but I would be conservative there.
As for filtering based on LWR, center of mass sounds like a good approach. However, that can be deceiving: 1) the LWR distribution may be perfectly flat, i.e. uniform across the tree, and 2) in that case you might not even see that in the output, since by default (for now) epa-ng only outputs a maximum of 7 placements per query. The latter can be fixed by running with "--filter-acc-lwr 0.95 --filter-max 10000" or similar settings. One of the next releases will overhaul that behaviour to give more reasonable feedback/output in these situations.
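One way to spot the flat, truncated case described above is to look at how much LWR the reported placements of a query add up to and how evenly it is spread. The following is a rough sketch, not something epa-ng or gappa provides: the function name is hypothetical, and using normalized entropy as a flatness measure is my own choice for illustration.

```python
import math


def lwr_diagnostics(lwrs):
    """Return (accumulated LWR, normalized entropy) for one query's LWR values.

    Normalized entropy is 1.0 when all reported LWRs are equal (flat)
    and approaches 0.0 when a single placement dominates.
    """
    total = sum(lwrs)
    n = len(lwrs)
    if n < 2 or total <= 0:
        return total, 0.0
    # Rescale the reported LWRs so they sum to one before taking the entropy.
    probs = [value / total for value in lwrs if value > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return total, entropy / math.log(n)


# Seven equally weak placements: low accumulated LWR, entropy close to 1.
print(lwr_diagnostics([0.1] * 7))
# One dominant placement: accumulated LWR close to 1, low entropy.
print(lwr_diagnostics([0.97, 0.02, 0.01]))
```

An accumulated LWR well below 1 together with an entropy near 1 suggests that the output was cut off at the maximum number of placements and that the query has little signal, so a center of mass computed from it should be treated with care.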
Filtering out the specific branches could also be a temporary fix to your issue as you mentioned. First I would try a general filtering, and then apply gappa extract again.
Let us know how it goes!
Pierre
--
MSc Pierre Barbera
Phone: +49 6221 533 258
Fax: +49 6221 533 298
E-Mail: pierre....@h-its.org
HITS gGmbH
Schloss-Wolfsbrunnenweg 35
D-69118 Heidelberg
Amtsgericht Mannheim / HRB 337446
Managing Director: Dr. Gesa Schönberger
Scientific Director: PD Dr. Wolfgang Müller