gappa not placing queries in tree


ZWC

Apr 29, 2021, 2:43:36 PM
to Phylogenetic Placement
Hi, 

Now that I am able to get gappa to see my files, I'm trying to interpret what is going on.  It seems it is not placing the outgroup samples (the queries in this case) on the tree.  I guess the simplest place to start is the LWRs: LWR1 is 0.005 for both query samples and all other LWRs are 0.  Based on those numbers, should I even expect gappa to place them on the tree?

If it's relevant, the reason I'm using this method is that our outgroup samples have a very high rate of missing data and with our RAxML runs they were on very long branches nested within the ingroup.  It was suggested to look into tools like EPA-NG and downstream tools to see if we can place the outgroup somewhere on the tree.

Thanks!

Zach

Pierre Barbera

Apr 29, 2021, 4:37:33 PM
to phylogeneti...@googlegroups.com

Hi Zach,

can you clarify which gappa command you're trying to use?

Looking at the LWRs is indeed a good place to start if you have just a few queries; if the highest LWR is indeed 0.005, then it looks like there wasn't any strong signal. However, this could also mean that there are multiple closely related taxa in your reference tree, causing the placements to be "smeared" across a subtree. You might also want to check whether you were looking at the right field, as the LWRs of a query should sum to one.
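One way to check this directly is to read the jplace file and add up the LWRs per query. The following is a minimal Python sketch, not part of gappa or epa-ng. It assumes the usual jplace JSON layout (a "fields" list that includes "like_weight_ratio", and a "placements" list whose entries carry "p" rows plus "n" or "nm" names). The function names and the tiny example data are made up for illustration.

```python
import json

def load_jplace(path):
    """Read a jplace file (plain JSON) into a dict."""
    with open(path) as handle:
        return json.load(handle)

def lwr_summary(jplace):
    """Return {query_name: (accumulated_lwr, highest_lwr, n_placements)}."""
    # Column positions are given by the "fields" list, so look the index up.
    lwr_idx = jplace["fields"].index("like_weight_ratio")
    summary = {}
    for pquery in jplace["placements"]:
        lwrs = [row[lwr_idx] for row in pquery["p"]]
        # Names are stored under "n" (plain names) or "nm" (name, multiplicity).
        names = list(pquery.get("n", [])) + [nm[0] for nm in pquery.get("nm", [])]
        for name in names:
            summary[name] = (sum(lwrs), max(lwrs), len(lwrs))
    return summary

# Made-up example data, only to show the expected shape of the input.
example = {
    "fields": ["edge_num", "likelihood", "like_weight_ratio",
               "distal_length", "pendant_length"],
    "placements": [
        {"p": [[0, -100.0, 0.7, 0.1, 0.2],
               [1, -101.0, 0.3, 0.1, 0.2]],
         "n": ["query_A"]},
    ],
}

for name, (total, best, count) in lwr_summary(example).items():
    print(f"{name}: accumulated LWR {total:.3f}, highest {best:.3f}, {count} placements")
```

For a real file, lwr_summary(load_jplace("your_result.jplace")) would give the same summary (the file name here is a placeholder). An accumulated LWR well below one means the file only retains part of the placement distribution.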

If the LWR distribution is in fact very flat, I recommend you rerun epa-ng with the added options --filter-acc-lwr 1.0 --filter-max 50 (or some other appropriately high number), which will retain more placements, giving a better sense of whether it really is smeared across the whole tree or just a subset.

Let us know how it goes!

Pierre

--
You received this message because you are subscribed to the Google Groups "Phylogenetic Placement" group.
To unsubscribe from this group and stop receiving emails from it, send an email to phylogenetic-plac...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/phylogenetic-placement/f513012c-fc91-48c8-ba51-a0cc4b6ff042n%40googlegroups.com.

ZWC

Apr 29, 2021, 5:35:44 PM
to Phylogenetic Placement
Hi Pierre,

I've run various commands to try to explore the data, but to be honest this is my first foray into this type of analysis and I'm not finding that the documentation helps me understand exactly what I need to be doing!  That's not a knock on anyone, but rather maybe my lack of familiarity.  I've done plenty of phylogenetics stuff before, but I'm just now learning about these placement programs.  If I recall correctly, I used examine lwr to look at the LWRs.  I also tried assign and one other subcommand that is slipping my mind right now.

With regards to the relationship of the ingroup and outgroup, I don't think they are too closely related.  This is a phylogeographic study so we have about 250 samples with an N=5 from around 50 populations.  The outgroup that I'm using as the query are two samples from a species in another genus within the same family.  I'll try to give epa-ng another shot and see what happens.

Thanks!

Zach

ZWC

Apr 30, 2021, 9:42:16 AM
to Phylogenetic Placement
Rerunning epa-ng with --filter-acc-lwr 1.0 --filter-max 50 does change things a bit, but it still does not seem to place the query sequences in the tree unless I'm misunderstanding the assign subcommand.  I've attached the lwr_list and per_query files in case they are helpful.  If there are other files or suggested commands to try to troubleshoot, I'd appreciate any suggestions.  Thanks for your time and help!
per_query.tsv
lwr_list.csv

Lucas Czech

Apr 30, 2021, 1:16:58 PM
to Phylogenetic Placement

Hi Zach,

last night, I sent a message to this list explaining placement a bit more, but apparently it didn't get through... I'll re-send in a minute, maybe that will explain a bit more.

As for your immediate question here:

 but it still does not seem to place the query sequences in the tree

I don't understand what you mean. The two files that you shared both contain information about AVON-1_ and AVON-2_, which I assume are the two sequences that you placed. As those are present in the files that you shared, they must also have been present in the jplace file that you used as input for the gappa assign command, and hence they were placed by epa-ng on the tree. They are both placed somewhere in the Poeciliidae clade, from what those files show. So, I don't know what you mean by "not placed in the tree" - you have a jplace file, so you have placements! Just open that file in a text editor and have a look at the information that the file contains, maybe that clears things up :-)

Also, could you share that jplace file with us here as well (or send to our emails directly, if you don't want to share the tree that is contained in the file publicly)?

Cheers
Lucas

Lucas Czech

Apr 30, 2021, 1:17:06 PM
to Phylogenetic Placement

Hi Zach,

thank you for the feedback! The gappa documentation is indeed aimed more at running gappa than at explaining the background of how phylogenetic placement works and what it can be used for. We wanted to add more detail on that at some point, but in the meantime, I hope to be able to help you out here a bit.

Phylogenetic placement is a method that takes a fixed phylogenetic reference tree of known sequences, and places new sequences (what we call query sequences, such as meta-barcoding sequences, or, in your case, outgroup sequences) on the branches of that tree - without turning these query sequences into actual new tips of the tree, but rather yielding a mapping/assignment of query sequences to branches of the reference tree. The result is stored in a jplace file, which contains the tree, and the mapping for each query sequence to the branches. This file is computed for example by RAxML-EPA, epa-ng, or pplacer. Then, given this mapping (in form of the jplace file), the data can be further analyzed and visualized, for example with gappa. Gappa itself does not perform placement, but is a downstream/post-processing tool.
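To make the "tree plus mapping" idea concrete, here is a small Python sketch that builds a made-up jplace document and lists, for each query, the branches it was mapped to. It assumes the usual jplace conventions (edge numbers in curly braces in the tree string, column order given by "fields"). The tip labels, the query name, all numbers, and the helper functions tip_edges and placements_by_query are invented for illustration and are not part of gappa.

```python
import re

# A made-up jplace document with three reference tips and one query.
# Edge numbers appear in curly braces after each branch length.
jplace = {
    "tree": "((A:0.1{0},B:0.2{1}):0.05{2},C:0.3{3}){4};",
    "fields": ["edge_num", "likelihood", "like_weight_ratio",
               "distal_length", "pendant_length"],
    "placements": [
        {"p": [[3, -1200.5, 0.8, 0.01, 0.05],
               [2, -1202.0, 0.2, 0.02, 0.06]],
         "n": ["outgroup_1"]},
    ],
    "version": 3,
    "metadata": {},
}

def tip_edges(tree):
    """Map edge number -> tip label for the terminal branches of the tree."""
    pairs = re.findall(r"([^(),:{}]+):[^(),{}]*\{(\d+)\}", tree)
    return {int(edge): name for name, edge in pairs}

def placements_by_query(doc):
    """Map each query name to its (edge_num, LWR) pairs, most likely first."""
    edge_idx = doc["fields"].index("edge_num")
    lwr_idx = doc["fields"].index("like_weight_ratio")
    result = {}
    for pquery in doc["placements"]:
        rows = sorted(((row[edge_idx], row[lwr_idx]) for row in pquery["p"]),
                      key=lambda pair: pair[1], reverse=True)
        names = list(pquery.get("n", [])) + [nm[0] for nm in pquery.get("nm", [])]
        for name in names:
            result[name] = rows
    return result

tips = tip_edges(jplace["tree"])
for query, rows in placements_by_query(jplace).items():
    for edge, lwr in rows:
        label = tips.get(edge, "internal branch")
        print(f"{query}: edge {edge} ({label}), LWR {lwr}")
```

Note that the query never becomes a tip in the "tree" string; it only appears in "placements", attached to edge numbers of the unchanged reference tree.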

If I understand your use case correctly, you have a couple of outgroup sequences that you want to place on the tree, and then see where they end up, is that right? For this, you could for example use the gappa graft command (https://github.com/lczech/gappa/wiki/Subcommand:-graft), which takes the most likely position of each mapped query sequence (the branch where it is most likely placed) and turns that into an actual new branch of the tree. This is a simple visualization technique that works for small numbers of query sequences (as in your case), and results in a newick tree that can be visualized with any typical tree viewer.

However, seeing that your ingroup and outgroup are closely related and seem to have low LWRs, it is likely that there is not enough phylogenetic signal in the sequence data to find a confident branch on which to place these sequences. That can mean either that the region of the genome that your sequences cover is not well suited for the task at hand and cannot reliably distinguish ingroup from outgroup, or that they are in fact just really closely related to each other. It can still be the case, however, that all "good" placement positions of the outgroup sequences are in the same clade of the tree, meaning that the outgroup is likely "somewhere in that region", but with not enough signal to determine where exactly.

To figure this out, you could place each of your outgroup sequences individually, and visualize the distribution of all their "good" placement locations individually per outgroup sequence. For example, placing each of them will yield one jplace file per outgroup sequence, which you can then visualize with gappa heat-tree (https://github.com/lczech/gappa/wiki/Subcommand:-heat-tree), which will show you the distribution of "good" (likely) branches on the tree where this sequence was placed.

I'd further suggest to read a couple of papers and other resources on the topic:

Hope that helps, and keep us in the loop with further questions :-)

Lucas

ZWC

Apr 30, 2021, 2:23:20 PM
to Phylogenetic Placement
Hi Lucas,

Thanks for all the additional info.  I'm drowning in final exam grading right now so I will reply more fully later.  I realize my outgroups are in the jplace file and showing in the files I sent.  I guess what I'm struggling with is visualizing them on the tree.  My understanding was that gappa was for downstream processing/visualization of jplace files, but the only tree or trees I've been able to get from gappa using a few different subcommands (e.g., assign and heat-tree) do not contain my outgroups.

I tried to attach my jplace file but was getting an error.  If this message gets posted without it, I'll have to try to send it separately.  

More later and, as always, thank you!

Zach

ZWC

Apr 30, 2021, 2:24:56 PM
to Phylogenetic Placement
Attempting to send the jplace file again.  Trying to zip it this time because it doesn't seem to like the jplace extension.

epa_result.jplace.zip

Lucas Czech

May 1, 2021, 3:26:59 PM
to Phylogenetic Placement

Hi Zach,

using your jplace file, I have made two heat trees showing the placement positions of both your outgroup sequences, see attachment. The distribution is pretty much all over the tree... Furthermore, looking at the actual placement data, as you did before with the LWR, the jplace file only accounts for ~28% of the placement mass distribution, with almost all LWR values being equal or close to equal around a value of 0.0055. There are 50 locations reported for each of the two sequences in the jplace file, and I guess that if you increased that limit to a higher number, you'd just see more branches with the same probability.
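The ~28% figure follows from simple arithmetic on the numbers above, sketched here in Python. The value 0.0055 is the approximate LWR quoted above, not an exact value read from the file, and the "implied branches" number is only an illustration of what a perfectly flat distribution at that value would look like.

```python
n_reported = 50        # placement locations kept per query (--filter-max 50)
typical_lwr = 0.0055   # approximate LWR of each reported location

# Placement mass covered by the reported locations.
reported_mass = n_reported * typical_lwr
print(f"mass covered by reported locations: {reported_mass:.1%}")

# If every branch carrying mass had this same LWR, this many branches
# would be needed for the LWRs to sum to one.
implied_branches = 1 / typical_lwr
print(f"branches implied by a perfectly flat distribution: {implied_branches:.0f}")
```

So raising the limit beyond 50 would mostly add further branches with the same small LWR rather than reveal a preferred location.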

Judging from that, it seems to me that there is close to no phylogenetic signal in the outgroup sequences that would give it any reasonable placement location on the tree. In the absence of that signal, the placement algorithm will then just equally distribute the likelihood of placement over all branches (and then cut off at 50 locations, due to your settings), which is exactly what we are seeing in the attached heat trees.

So, in order to figure out why there is no phylogenetic signal that would help to properly place your outgroup sequences (and thus avoid the unhelpful equal distribution across all branches that we have now), I'd suggest having a look at the alignment that you used for running the placement (e.g., with https://ormbunkar.se/aliview/). I suspect that the outgroup sequences are not aligned properly to the MSA of the reference tree, and either form a separate block in the alignment, or are so divergent that there is no signal at all. Could you maybe share a screenshot of that alignment, or share the file directly?

Thanks and so long
Lucas


AVON-1_tree.png
AVON-2_tree.png

Lucas Czech

May 1, 2021, 3:36:02 PM
to Phylogenetic Placement

Oh, one more thing. You wrote:

but the only tree or trees I've been able to get from gappa using a few different subcommands (e.g., assign and heat-tree) do not contain my outgroups

That sounds like either I am not understanding what you want to say with that, or that there is still some misunderstanding of how placement works. Your outgroups are placed on the reference tree, as shown by their placement distribution in the two pictures of my previous post. But placement does not make those sequences new branches of the tree - in case that is what you mean by "the tree does not contain the outgroup". What placement gives you is a distribution of potential locations of where your sequences would be attached if they were actually made into branches - but it does not (by itself) turn those sequences into actual branches.

If you want that, have a look at the gappa graft command. However, in your case, it does not make much sense, because what that command does is to take the most likely placement location for each sequence, and turn that into an actual branch of the tree. In your data however, that location is by chance just slightly more likely than all the other 0.0055 LWR locations, and so it is by no means a good or probable position for the sequences. Hence, forcing this to become the actual placement branch of the outgroup would give a false sense of certainty, when in fact your data (currently - before knowing what's going on in the alignment) does not support a proper/stable/reliable outgroup placement location at all.
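As a rough illustration of that point, the Python sketch below checks whether the best placement of a query stands out from the rest before one would consider grafting it. This is a hypothetical heuristic, not something gappa does: the function name and both thresholds (min_best, min_ratio) are arbitrary assumptions chosen only for the example, and the flat LWR list mimics the numbers discussed above rather than the actual file contents.

```python
def best_placement_is_decisive(lwrs, min_best=0.5, min_ratio=2.0):
    """Return True if the top LWR is high and clearly ahead of the runner-up.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    ranked = sorted(lwrs, reverse=True)
    if not ranked:
        return False
    best = ranked[0]
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    clearly_ahead = runner_up == 0.0 or best / runner_up >= min_ratio
    return best >= min_best and clearly_ahead

# Flat distribution similar to the one discussed above (made-up values).
flat = [0.0056] + [0.0055] * 49
# A made-up query with one clearly preferred branch.
peaked = [0.9, 0.05, 0.05]

print(best_placement_is_decisive(flat))    # best branch wins only by chance
print(best_placement_is_decisive(peaked))  # best branch clearly stands out
```

With a flat distribution like the first one, the "best" branch is essentially arbitrary, which is why a grafted tree would suggest more certainty than the data support.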

Hope that helps, and so long :-)
Lucas
