Hi Zach,
can you clarify which gappa command you're trying to use?
Looking at the LWRs (likelihood weight ratios) is indeed a good place to start if you have just a few queries. If the highest LWR is indeed 0.05, then it looks like there wasn't any strong signal. However, this could also mean that there are multiple closely related taxa in your reference tree, causing the placements to be "smeared" across a subtree. You might also want to check whether you were looking at the right field, as the LWRs of a query should sum to one.
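To make the "right field" check concrete, here is a minimal sketch in Python. It assumes the standard jplace layout (a "fields" list naming the columns, and a "placements" list whose entries hold the rows under "p" and the query names under "n" or "nm"). The function name lwr_summary and the tiny example document are made up for illustration.

```python
import json


def lwr_summary(jplace):
    """Return {query name: (sum of LWRs, highest LWR)} for a parsed jplace document."""
    # Look the column up by name instead of assuming its position in each row.
    lwr_col = jplace["fields"].index("like_weight_ratio")
    summary = {}
    for entry in jplace["placements"]:
        lwrs = [row[lwr_col] for row in entry["p"]]
        # Names are stored under "n" (plain names) or "nm" (name, multiplicity pairs).
        names = entry.get("n") or [name for name, _ in entry.get("nm", [])]
        for name in names:
            summary[name] = (sum(lwrs), max(lwrs))
    return summary


# Tiny made-up document for illustration; real files come from the placement tool.
example = json.loads("""
{
  "version": 3,
  "fields": ["edge_num", "likelihood", "like_weight_ratio", "distal_length", "pendant_length"],
  "tree": "(A:1{0},B:1{1},C:1{2});",
  "placements": [
    {"p": [[0, -100.0, 0.7, 0.1, 0.2],
           [1, -101.0, 0.2, 0.1, 0.2],
           [2, -102.0, 0.1, 0.1, 0.2]],
     "n": ["query_1"]}
  ]
}
""")

for name, (total, best) in lwr_summary(example).items():
    print(f"{name}: sum of LWRs = {total:.3f}, highest LWR = {best:.3f}")
```

For a real file, replace the example with json.load() on your jplace file. If the sum comes out well below one, the remaining placements were probably cut off by the output filters rather than missing from the analysis.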
If the LWR distribution is in fact very flat, I recommend rerunning epa-ng with the additional options --filter-acc-lwr 1.0 --filter-max 50 (or some other appropriately high number), which will retain more placements and give a better sense of whether the signal really is smeared across the whole tree or just a subset.
Let us know how it goes!
Pierre
--
You received this message because you are subscribed to the Google Groups "Phylogenetic Placement" group.
To unsubscribe from this group and stop receiving emails from it, send an email to phylogenetic-plac...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/phylogenetic-placement/f513012c-fc91-48c8-ba51-a0cc4b6ff042n%40googlegroups.com.
Hi Zach,
last night, I sent a message to this list explaining placement a
bit more, but apparently it didn't get through... I'll re-send in
a minute, maybe that will explain a bit more.
As for your immediate question here:
"but it still does not seem to place the query sequences in the tree"
I don't understand what you mean. The two files that you shared
both contain information about AVON-1_ and AVON-2_, which I assume
are the two sequences that you placed. As those are present in the
files that you shared, they must also have been present in the
jplace file that you used as input for the gappa assign command,
and hence they were placed by epa-ng on the tree. They are both
placed somewhere in the Poeciliidae clade, from what those files
show. So, I don't know what you mean by "not placed in the tree" -
you have a jplace file, so you have placements! Just open that
file in a text editor and have a look at the information that the
file contains, maybe that clears things up :-)
Also, could you share that jplace file with us here as well (or
send to our emails directly, if you don't want to share the tree
that is contained in the file publicly)?
Cheers
Lucas
Hi Zach,
thank you for the feedback! The gappa documentation is indeed
aimed more at running gappa than at explaining the background of how
phylogenetic placement works and what it can be used for. We
wanted to add more detail on that at some point, but in the
meantime, I hope to be able to help you out here a bit.
Phylogenetic placement is a method that takes a fixed phylogenetic reference tree of known sequences and places new sequences (what we call query sequences, such as meta-barcoding sequences or, in your case, outgroup sequences) on the branches of that tree. It does not turn these query sequences into actual new tips of the tree, but rather yields a mapping/assignment of query sequences to branches of the reference tree. The result is stored in a jplace file, which contains the tree and the mapping of each query sequence to the branches. This file is computed, for example, by RAxML-EPA, epa-ng, or pplacer. Then, given this mapping (in the form of the jplace file), the data can be further analyzed and visualized, for example with gappa. Gappa itself does not perform placement, but is a downstream/post-processing tool.
If I understand your use case correctly, you have a couple of outgroup sequences that you want to place on the tree, and then see where they end up, is that right? For this, you could for example use the gappa graft command (https://github.com/lczech/gappa/wiki/Subcommand:-graft), which takes the most likely position of each mapped query sequence (the branch where it is most likely placed) and turns that into an actual new branch of the tree. This is a simple visualization technique that works for small numbers of query sequences (as in your case), and results in a newick tree that can be visualized with any typical tree viewer.
However, seeing that your in- and outgroup are closely related and
seem to have low LWRs, it is likely that there is not enough
phylogenetic signal in the sequence data to find a confident
branch to place these sequences. That can either mean that the
region of the genome that your sequences cover is not well suited
for the task at hand, and cannot reliably distinguish ingroup from
outgroup, or that those are in fact just really closely related to
each other. It still can however be the case that all "good"
placement positions of the outgroup sequences are in the same
clade of the tree, meaning that the outgroup is likely "somewhere
in that region", but with not enough signal to determine where
exactly.
To figure this out, you could place each of your outgroup
sequences individually, and visualize the distribution of all
their "good" placement locations individually per outgroup
sequence. For example, placing each of them will yield one jplace
file per outgroup sequence, which you can then visualize with
gappa heat-tree
(https://github.com/lczech/gappa/wiki/Subcommand:-heat-tree),
which will show you the distribution of "good" (likely) branches
on the tree where this sequence was placed.
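To place each outgroup sequence individually, the query file first has to be split into one file per sequence. Here is a minimal sketch of that step in Python, assuming the queries are in a plain FASTA file. The function name split_fasta and the example sequences are made up for illustration; gaps from the alignment are kept as they are.

```python
import tempfile
from pathlib import Path


def split_fasta(fasta_text, out_dir):
    """Write one FASTA file per record in fasta_text and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Collect the sequence lines of each record under its header line.
    records = {}
    current = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            current = line[1:].strip()
            records[current] = []
        elif current is not None and line.strip():
            records[current].append(line.strip())

    paths = []
    for header, seq_lines in records.items():
        name = header.split()[0]
        # Keep file names safe: anything other than letters, digits, '-', '_', '.' becomes '_'.
        safe = "".join(c if c.isalnum() or c in "-_." else "_" for c in name)
        path = out_dir / f"{safe}.fasta"
        path.write_text(f">{header}\n" + "\n".join(seq_lines) + "\n")
        paths.append(path)
    return paths


# Made-up aligned sequences, using the two query names from this thread.
demo = ">AVON-1_\nACGT--ACGT\n>AVON-2_\nACGTTTAC-T\n"
with tempfile.TemporaryDirectory() as tmp:
    for path in split_fasta(demo, tmp):
        print(path.name)
```

Each resulting file can then be placed in its own run, giving one jplace file per outgroup sequence.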
I'd further suggest to read a couple of papers and other resources on the topic:
Hope that helps, and keep us in the loop with further questions :-)
Lucas
Hi Zach,
using your jplace file, I have made two heat trees showing the
placement positions of both your outgroup sequences, see
attachment. The distribution is pretty much all over the tree...
Furthermore, looking at the actual placement data, as you did
before with the LWR, the jplace file only accounts for ~28% of the
placement mass distribution, with almost all LWR values being
equal or close to equal around a value of 0.0055. There are 50
locations reported for each of the two sequences in the jplace
file, and I guess that if you increased that limit to a higher
number, you'd just see more branches with the same probability.
Judging from that, it seems to me that there is close to no phylogenetic signal in the outgroup sequences that would give them any reasonable placement location on the tree. In the absence of that signal, the placement algorithm will then just distribute the likelihood of placement equally over all branches (and then cut off at 50 locations, due to your settings), which is exactly what we are seeing in the attached heat trees.
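The numbers above can be checked with a quick back-of-the-envelope calculation, sketched here in Python. The 50 locations and the LWR of 0.0055 are taken from the jplace file as described above. The last step assumes a perfectly even spread over all branches, which is only an approximation.

```python
reported_locations = 50   # placement locations stored per query (from the jplace file)
typical_lwr = 0.0055      # approximate LWR of almost every reported location

# Placement mass that the reported locations account for.
covered_mass = reported_locations * typical_lwr
print(f"mass covered by the reported locations: {covered_mass:.3f}")

# Assumption: with a perfectly even spread, each of n branches would get an LWR of 1/n.
implied_branches = 1 / typical_lwr
print(f"branches implied by an even spread: {implied_branches:.0f}")
```

This gives a covered mass of about 0.275, which matches the ~28% above, and roughly 182 branches sharing the mass equally. That is what one would expect to see if the placement had no signal to work with.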
So, in order to figure out why there is no phylogenetic signal that would help to properly place your outgroup sequences (and thus avoid the unhelpful equal distribution across all branches that we have now), I'd suggest having a look at the alignment that you used for running the placement (e.g., with https://ormbunkar.se/aliview/). I suspect that the outgroup sequences are not aligned properly to the MSA of the reference tree, and either form a separate block in the alignment, or are so divergent that there is no signal at all. Could you maybe share a screenshot of that alignment, or share the file directly?
Thanks and so long
Lucas
Attempting to send the jplace file again. Trying to zip it this time because it doesn't seem to like the jplace extension.
Oh, one more thing. You wrote:
"but the only tree or trees I've been able to get from gappa using a few different subcommands (e.g., assign and heat-tree) do not contain my outgroups"
That sounds like either I am not understanding what you want to say with that, or that there is still some misunderstanding of how placement works. Your outgroups are placed on the reference tree, as shown in their placement distribution in the two pictures of my previous post. But placement does not make those sequences new branches of the tree, in case that is what you mean by "the tree does not contain the outgroup". What placement gives you is a distribution of potential locations where your sequences would be attached if they were actually made into branches, but it does not (by itself) turn those sequences into actual branches.
If you want that, have a look at the gappa graft command. However, in your case, it does not make much sense, because what that command does is take the most likely placement location for each sequence and turn that into an actual branch of the tree. In your data, however, that location is by chance just slightly more likely than all the other 0.0055 LWR locations, and so it is by no means a good or probable position for the sequences. Hence, forcing this to become the actual placement branch of the outgroup would give a false sense of certainty, when in fact your data (currently, before knowing what's going on in the alignment) does not support a proper/stable/reliable outgroup placement location at all.
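One simple way to see whether the best location stands out at all is to compare it with the runner-up. Here is a minimal sketch in Python. The function name top_placement_margin and both LWR lists are made up for illustration, and this is not something that gappa graft itself reports.

```python
def top_placement_margin(lwrs):
    """Return (best LWR, runner-up LWR, best / runner-up) for one query."""
    ranked = sorted(lwrs, reverse=True)
    best = ranked[0]
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    # A ratio close to 1 means the best branch is barely better than the next one.
    ratio = best / runner_up if runner_up > 0 else float("inf")
    return best, runner_up, ratio


# Made-up values: a near-flat profile like the one discussed here, and a peaked one.
flat = [0.0056] + [0.0055] * 49
peaked = [0.90, 0.05, 0.03, 0.02]

for label, lwrs in (("flat", flat), ("peaked", peaked)):
    best, runner_up, ratio = top_placement_margin(lwrs)
    print(f"{label}: best = {best}, runner-up = {runner_up}, ratio = {ratio:.2f}")
```

With a flat profile, the ratio stays close to 1, so the grafted branch would be little more than an arbitrary pick. There is no fixed cutoff for what counts as "good enough"; that remains a judgment call.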
Hope that helps, and so long :-)
Lucas