Hi Eduardo,
you are referring to the paper "Accuracy of Morphology-based Phylogenetic Fossil Placement under Maximum Likelihood".
If I understand you correctly, your input is two trees (in
Newick format?) that are exactly identical except for the position
of one taxon, right? Are they also identically rooted and rotated
(i.e., both ladderized)? Or are you by any chance working with
phylogenetic placement of sequences, e.g., pplacer or RAxML-EPA,
and have Jplace files as input? That makes it easier to find the
one differing taxon (which would be a placement then, instead of
an actual branch on the tree). This also makes it possible to run
the whole thing for many different single taxa (each of them
placed on the same constant reference tree).
I don't know whether RAxML has an option for this - Alexis can answer this question better. But could you maybe post two example files? I have an idea how to easily implement that. I don't know when I'll get to it, but it sounds like a fun thing to implement.
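To illustrate why Jplace input is easier to work with, here is a minimal sketch that reads a Jplace document and picks the most likely placement for each query. The embedded file content, the query name fossil_X, and the helper best_placements are made up for illustration; only the top-level keys (tree, fields, placements, p, n/nm) follow the Jplace format.

```python
import json

# Tiny hand-written jplace document (hypothetical data, for illustration only).
JPLACE_TEXT = """
{
  "version": 3,
  "tree": "((A:0.1{0},B:0.2{1}):0.05{2},C:0.3{3});",
  "fields": ["edge_num", "likelihood", "like_weight_ratio",
             "distal_length", "pendant_length"],
  "placements": [
    {"p": [[1, -1234.5, 0.7, 0.05, 0.01],
           [2, -1236.0, 0.3, 0.02, 0.02]],
     "n": ["fossil_X"]}
  ],
  "metadata": {"invocation": "hand-written example"}
}
"""

def best_placements(jplace):
    """Map each query name to its placement with the highest like_weight_ratio."""
    fields = jplace["fields"]
    best = {}
    for entry in jplace["placements"]:
        # Each row in "p" lists its values in the order given by "fields".
        rows = [dict(zip(fields, row)) for row in entry["p"]]
        top = max(rows, key=lambda r: r["like_weight_ratio"])
        # Names are stored under "n" (plain list) or "nm" (name, multiplicity pairs).
        names = entry.get("n") or [name for name, _ in entry.get("nm", [])]
        for name in names:
            best[name] = top
    return best

result = best_placements(json.loads(JPLACE_TEXT))
print(result["fossil_X"]["edge_num"], result["fossil_X"]["like_weight_ratio"])
```

The edge_num of the best placement identifies the branch of the reference tree, via the {N} labels in the tree string, so no tree comparison is needed to find where the query ended up.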
So long
Lucas
--
You received this message because you are subscribed to the Google Groups "raxml" group.
To unsubscribe from this group and stop receiving emails from it, send an email to raxml+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Okay, that makes it way easier ;-)
Next question: I had a look at your example trees. They seem to have exactly the same branch lengths, and even the parts that get divided by the moving taxon in one tree add up to the corresponding branch length on the other tree. So, in this example, calculating the branch distance is doable, because there are no inconsistencies (except for rounding of the numbers). But can we generally assume that this is the case? What if not?
Also, this leads me to some meta-questions: How did you obtain
those trees? If they result from different bootstrap runs or the
like, I'd expect different branch lengths. Also, why is it that
only one taxon moves around? Maybe understanding this helps me in
understanding how to solve those implementation questions.
Thanks, so long
Lucas
Hi Eduardo,
thanks for the detailed answer!
The reason why a single species is moving around is that I want to apply a leave-one-out test like the ones from Berger & Stamatakis (2010) and Berger et al. (2011): every species in a fixed reference tree is pruned out and then re-inserted, and the distance between the original position of the query species and its reinsertion point is a measure of the topological error of the phylogenetic placement method being used.
My ultimate goal is to insert a fossil species in a molecular tree. Because the morphological data seems noisy, I want to try out different character weighting methods and use a leave-one-out test to compare their performance. The weighting methods would include the one from Berger & Stamatakis (2010), and a few parsimony-based ones.
I am still working on the character coding in the morphological matrix, so I haven't produced empirical trees at this point. The trees that I sent you were just artificial examples with dummy branch lengths. I think I would use EPA or simple RAxML searches with a skeletal constraint, so whatever changes in branch lengths those methods induce I would expect to be accounted for in the branch length placement distance. However, because the branch length placement distance doesn't make much sense for comparing results between ML and parsimony placements, this distance is not that important for my specific purposes.
Okay, that sounds reasonable. So this would give you two trees to compare, the original one (always the same) and one after re-inserting each taxon, so that, for each taxon, you can calculate the distance from where it ended up in the re-insertion-tree to where it originally was. Correct?
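As a rough sketch of the distance described above, the node distance can be taken as the number of nodes on the path between the original branch and the re-insertion branch. The tree, the node names (u, v) and the helper functions below are hypothetical, and this definition is my reading of the thread, not necessarily how EPA computes it.

```python
from collections import deque

# Hypothetical pruned reference tree as an undirected adjacency list.
# Leaves: A, B, C, D; inner nodes: u, v.
TREE = {
    "A": ["u"], "B": ["u"], "u": ["A", "B", "v"],
    "v": ["u", "C", "D"], "C": ["v"], "D": ["v"],
}

def hops_from(tree, start):
    """Breadth-first search: number of edges from start to every node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in tree[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def node_distance(tree, edge_a, edge_b):
    """Number of nodes on the path between two edges (0 if they are the same edge)."""
    if set(edge_a) == set(edge_b):
        return 0
    # Closest pair of endpoints: the hops between them plus one gives the node count.
    return min(hops_from(tree, a)[b] + 1 for a in edge_a for b in edge_b)

original_edge = ("A", "u")      # where the pruned taxon used to attach
reinserted_edge = ("C", "v")    # where the placement put it back
print(node_distance(TREE, original_edge, reinserted_edge))
```

In a leave-one-out run, this would be evaluated once per pruned taxon, always against the same reference topology.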
So you want to use a similar alignment to the one shown on the first page of Berger & Stamatakis "Accuracy of Morphology-based Phylogenetic Fossil Placement under Maximum Likelihood", where the first columns correspond to morphological characters, and the remaining columns correspond to the molecular data? Thus, the molecular columns would be empty for fossils?
If I understand you correctly, the node and edge distance that you want to calculate here are calculated between the placement positions of your fossil when placing it with ML and with parsimony, respectively. Right?
So, if my previous assumption about the shape of the alignment is right, you are only using the morphological characters for this, right?
This might really mess up the branch lengths... Maybe Alexis can tell you a bit more about this.
So, in your case, the best you can get with a parsimony weighting (from a theoretical point of view) is something similar to the EPA result. So what do you hope to learn from the placement distances?
Finding the best parsimony weighting scheme, the one with the lowest distance to the EPA result? But if this is the logic, why use parsimony weighting at all?!
Are you planning to make simulations with morphological partitions increasingly incompatible with the true or molecular tree to test which method performs best?
Frankly, I would not worry about applying any parsimony weighting scheme. If you have a good molecular guide tree and want to place your fossils individually in that guide tree, you just use the EPA algorithm. If the reviewers complain, you just do a quick parsimony optimisation to please them (can also be done with RAxML) and then discuss why in some cases the most parsimonious placement is not the most likely one (because it didn't cope with data patterns incompatible with the molecular tree).
And if you want to study the signal in your morphological partition with respect to the fossils: since the data are non-treelike, you should not rely on tree inference anyway but do network-based exploratory data analysis, so there is no point in parsimony weighting here either.
For total evidence, you just use the EPA-generated weights for weighting the morphological partition.
Hi Eduardo,
all right, I'm starting to understand ;-) Also, Guido's answer shed some light on this issue, thanks for that!
Okay, so you want to measure the distance of each re-inserted taxon to its original position, and do that once with ML and once with parsimony, and then compare. Did I get it right this time?
As for the branch lengths: We briefly discussed this in our group. From an algorithmic point of view, it makes sense to re-optimize the branch length with morphological data, while keeping the molecular topology. However, depending on the number of columns/traits that you have, this might give unstable/unreliable numbers. How long is your morphological alignment?
Lastly, using this tree (molecular topology, morphological branch
length) with EPA would be easiest, I think (as compared to using
RAxML with the tree as constraint). Also, calculating the needed
distances from the resulting placement files is way easier for me
to implement than the two-trees version.
So long
Lucas
Hi,
Fairly short, I'm afraid. I estimate that the final matrix will have between 72 and 85 characters, and some of those may end up with weights of 0.
That would only work for EPA with RAxML, not with parsimony, right?
Hi
RAxML also has an option to run EPA with parsimony (-f y). And for the normal ML EPA (-f v), it should be possible to specify a binary model for the morphological characters (-m). I guess that EPA is able to use this model. Maybe Alexis can answer this. Also, see the RAxML Manual for details.
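For concreteness, the two calls could be assembled roughly like this. This is only a sketch: the binary name raxmlHPC, the file names, the run names and the choice of BINGAMMA as the binary model are assumptions, so please check them against the manual for your RAxML build before running anything.

```python
import shlex

def epa_command(alignment, reference_tree, run_name, parsimony=False,
                model="BINGAMMA", binary="raxmlHPC"):
    """Build a RAxML EPA command line: -f v for ML, -f y for parsimony."""
    algorithm = "y" if parsimony else "v"
    return [
        binary,
        "-f", algorithm,          # placement algorithm
        "-s", alignment,          # alignment with reference and query taxa
        "-t", reference_tree,     # fixed reference tree
        "-m", model,              # substitution model (assumed binary model)
        "-n", run_name,           # suffix for the output files
    ]

# Hypothetical file and run names.
ml_cmd = epa_command("morph.phy", "reference.tre", "epa_ml")
mp_cmd = epa_command("morph.phy", "reference.tre", "epa_mp", parsimony=True)
print(shlex.join(ml_cmd))
print(shlex.join(mp_cmd))
```

The lists can be passed to subprocess.run as they are; printing them first makes it easy to check the call by eye.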
The main reason for trying parsimony as well is that RAxML has some limitations for the analysis of morphological data. It does not accept cells with polymorphisms (of which I have plenty in multistate characters), and it cannot work with combinations of ordered and unordered multistate characters. How much useful signal I would be really losing with those restrictions, I don't know, which is why I want to try the different methods.
I am not used to working with phylogenetic networks, but I am aware of your papers on the topic, and I will try to learn more about it.
Hi Guido,
It didn't occur to me to recode the matrix into binary characters! That would indeed be the solution to the missing and mixed character types problems, and would allow me to do everything with RAxML. However, wouldn't the 'subcharacters' created with the recoding violate the assumption of site independence?
Do you know of papers discussing this in the context of statistical phylogenetics?
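To make the recoding idea concrete, here is a minimal sketch of one possible scheme: one presence/absence column per state of an unordered multistate character. The cell notation ("0/2" for a polymorphism, "?" or "-" for missing data) and the function recode_binary are assumptions for illustration, not necessarily the scheme Guido had in mind, and ordered characters would need a different (additive) coding.

```python
def recode_binary(cell, states):
    """Recode one multistate cell into one presence/absence column per state."""
    if cell in ("?", "-"):
        # Missing data stays missing in every subcharacter.
        return "?" * len(states)
    observed = set(cell.split("/"))   # "0/2" means polymorphic for states 0 and 2
    unknown = observed - set(states)
    if unknown:
        raise ValueError(f"unexpected state(s): {sorted(unknown)}")
    return "".join("1" if state in observed else "0" for state in states)

# Hypothetical three-state character scored for four taxa.
STATES = ["0", "1", "2"]
column = {"taxon_A": "1", "taxon_B": "0/2", "taxon_C": "?", "taxon_D": "2"}
recoded = {taxon: recode_binary(cell, STATES) for taxon, cell in column.items()}
for taxon, bits in recoded.items():
    print(taxon, bits)
```

Note that the three binary columns produced from one character are clearly not independent of each other, which is exactly the concern raised above.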
Thanks for the tips on phylogenetic networks. Where will you be blogging?
I think Guido's suggestion is a really good one, I can't propose any
good alternative.
"... (Alexandros Stamatakis, pers. comm., 2017) ..."