On Thu, 1 July 2021 at 14:36, Grimm <grim...@gmail.com> wrote:
>
> Hi Erik,
>
> yes and yes.
>
> For most identification applications we don't need a comprehensive tip set, just a meaningful one. Identical sequences are hardly a bother because RAxML (at least the classic version; I haven't checked RAxML-ng) eliminates them before inference, as they are useless for tree optimisation.
>
> More annoying are the near-identical, highly similar sequences that lead to flat terminal subtrees. Probabilistic methods need something to work with, and one or two (often random) mutations only increase computation time (including bootstrapping). Inference-wise, there's little to gain from a densely sampled flat subtree regarding the deeper relationships (there are good arguments for large samples in other circumstances, e.g. where we want to diminish support for a branching artefact and better reflect topological ambiguity). Also, from a theoretical point of view, the model assumptions of probabilistic tree inference hardly apply the closer we come to the very leaves of a tree: we are optimising mutation probabilities using a dichotomous evolution model, but near the tips we are increasingly looking at fixation probabilities affected by population dynamics.
>
> So the much more efficient way is to infer a meaningfully sampled background topology, with tips that are representative for your question, and then place the rest using EPA.
>
> PS: EPA will also be negatively affected, in a sense, by too many, too similar tips: you will get placements with low LWR because the weight is split between the many near-identical tips. It's not really a big issue, because one can easily see the difference between a poorly placed or hard-to-place query (low LWR, but scattered across different subtrees or along the root-proximal parts of the tree) and a well-placed one (equally low LWR, but within a clearly distinct subtree).
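A minimal sketch of that distinction, using toy numbers rather than real EPA output. The edge-to-subtree mapping, the subtree names and the LWR values are all made up for illustration; the point is only that summing LWRs per subtree separates the two cases even though the best single LWR is the same.

```python
from collections import defaultdict

def summarise(placements, edge_to_subtree):
    """Return (best single LWR, best subtree, cumulative LWR in that subtree).

    placements: list of (edge_id, lwr) pairs for one query.
    edge_to_subtree: hypothetical mapping from edge id to a subtree label.
    """
    totals = defaultdict(float)
    for edge, lwr in placements:
        totals[edge_to_subtree[edge]] += lwr
    best_subtree = max(totals, key=totals.get)
    best_single = max(lwr for _, lwr in placements)
    return best_single, best_subtree, totals[best_subtree]

# Hypothetical labelling: edges 0-3 are near-identical tips in subtree "A".
edge_to_subtree = {0: "A", 1: "A", 2: "A", 3: "A", 4: "B", 5: "C", 6: "backbone"}

# Well placed: weight is split, but entirely within subtree "A".
well_placed = [(0, 0.25), (1, 0.25), (2, 0.25), (3, 0.25)]
# Hard to place: same per-edge LWRs, but spread over different parts of the tree.
scattered = [(0, 0.25), (4, 0.25), (5, 0.25), (6, 0.25)]

well_summary = summarise(well_placed, edge_to_subtree)
scattered_summary = summarise(scattered, edge_to_subtree)
```

Both queries have the same best single LWR, so that number alone cannot tell them apart; the cumulative LWR within the best subtree can.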
>
> Here's a screenshot of a table (from a 3 MB file) showing what a fully investigated jPlace file can look like (using HTS reads from a plant genus).
> My idea was simple: label the main aspects of the backbone tree (Clade x, y subtree, x root, deep branches etc.), then use a PivotTable to calculate cumulative LWRs (for aspects that are a subtree rather than a single tip), and sort the result visually to quickly interpret the jPlace result phylogenetically.
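The same labelling-and-pivoting idea could be scripted instead of done in a spreadsheet. The sketch below assumes the standard jplace JSON layout (a `fields` list naming the columns, and `placements` entries with `p` rows and `n` or `nm` query names). The function name `cumulative_lwr`, the `edge_labels` mapping and the tiny example document are hypothetical; in practice the labels would have to be assigned by hand from the backbone tree, as described above.

```python
import json
from collections import defaultdict

def cumulative_lwr(jplace, edge_labels):
    """Sum LWRs per user-defined label for every query in a parsed jplace dict.

    edge_labels: hypothetical mapping from edge_num to a label such as "Clade x".
    Returns {query_name: [(label, cumulative_lwr), ...]} sorted by descending LWR.
    """
    fields = jplace["fields"]
    i_edge = fields.index("edge_num")
    i_lwr = fields.index("like_weight_ratio")
    table = {}
    for pquery in jplace["placements"]:
        sums = defaultdict(float)
        for row in pquery["p"]:
            label = edge_labels.get(row[i_edge], "unlabelled")
            sums[label] += row[i_lwr]
        ranked = sorted(sums.items(), key=lambda kv: kv[1], reverse=True)
        # Query names are stored under "n" (names) or "nm" (name, multiplicity).
        names = pquery.get("n") or [nm[0] for nm in pquery.get("nm", [])]
        for name in names:
            table[name] = ranked
    return table

# Made-up, minimal jplace content for illustration only (tree omitted).
example_text = """
{
  "fields": ["edge_num", "likelihood", "like_weight_ratio",
             "distal_length", "pendant_length"],
  "placements": [
    {"p": [[1, -1000.0, 0.5, 0.01, 0.02],
           [2, -1000.5, 0.25, 0.01, 0.02],
           [7, -1001.0, 0.25, 0.01, 0.02]],
     "n": ["read_1"]}
  ]
}
"""

# Hypothetical labels for the backbone edges.
edge_labels = {1: "Clade x", 2: "Clade x", 7: "deep branches"}

table = cumulative_lwr(json.loads(example_text), edge_labels)
for query, ranked in table.items():
    print(query, ranked)
```

Sorting each query's labels by cumulative LWR plays the role of the sorted PivotTable: the first entry is the aspect of the backbone tree that collects most of the placement weight.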
>
> Good placing, Guido
>
>