Hi Jacek,
> I was wondering recently what exactly is the influence of sites that
> bear a limited amount of phylogenetic signal on the inference in RAxML?
> By 'sites with limited signal' I mean the following:
>
> 1. Sites with gaps
> 2. Sites with identical residues in all seqs
> 3. Sites with identical residues in all seqs but one (singletons)
>
> I remember that case 1 was discussed earlier at some point, but what
> about cases 2 and 3? Are they simply omitted from the analysis because
> they are not informative?
Informativeness is a concept from parsimony! Under likelihood, these
sites do contribute signal, though in general very little, and they are
hence not removed from the alignment.
> To test it out on my side I used a single source alignment, but with
> the above-mentioned sites removed. Interestingly, I always got
> identical amino acid rate exchangeabilities and frequencies and nearly
> identical topologies and very close average bootstraps, with a roughly
> 30-50% speedup when I removed all three kinds of uninformative sites.
Makes sense and is kind of expected, but the question is whether we can
develop objective criteria for removing sites rather than ad hoc
criteria.
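
To make the three ad hoc criteria from the list above concrete, here is a
minimal Python sketch of such a column filter. It is not part of RAxML and
not necessarily how the test was done; the function names, the gap symbols
("-" and "?"), and the dict-of-strings alignment format are all assumptions
made for illustration.

```python
from collections import Counter

# Assumption: these characters mark gaps or missing data in the alignment.
GAP_CHARS = set("-?")


def classify_column(column):
    """Label one alignment column as 'gap', 'invariant', 'singleton' or 'other'."""
    if any(c in GAP_CHARS for c in column):
        return "gap"            # case 1: column contains at least one gap
    counts = Counter(column)
    if len(counts) == 1:
        return "invariant"      # case 2: identical residue in all sequences
    if len(counts) == 2 and min(counts.values()) == 1:
        return "singleton"      # case 3: identical in all sequences but one
    return "other"


def filter_alignment(seqs, drop=("gap", "invariant", "singleton")):
    """Return a copy of `seqs` (name -> aligned string) without dropped columns."""
    columns = list(zip(*seqs.values()))
    keep = [i for i, col in enumerate(columns)
            if classify_column(col) not in drop]
    return {name: "".join(s[i] for i in keep) for name, s in seqs.items()}


# Hypothetical toy alignment, four taxa and six columns.
toy = {"t1": "AC-GTA", "t2": "ACAGAA", "t3": "ACAGTG", "t4": "ATAGTG"}
reduced = filter_alignment(toy)
```

In the toy alignment only the last column survives, because it is the only
one that splits the taxa two against two.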
> So, it is possible to gain quite a good speedup by removing them, at
> virtually no cost in support or topology accuracy.
> Naturally, this requires some more extensive testing to determine the
> exact amount of potential speedup, the effect on topology (using e.g.
> RF distances) and branch supports, but I just wanted to discuss this
> early on, to know if going this way in my work makes any sense at all?
> Is the information stored in these sites useful for RAxML in some way?
It's worth exploring, but my feeling is that it will be difficult to
come up with good criteria to do this.
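
For the topology comparison via RF distances mentioned above, a minimal
Python sketch of the unnormalized Robinson-Foulds distance could look as
follows. It is only an illustration: trees are assumed to be given as
nested tuples of taxon names (a made-up input format, not Newick), and the
function names are hypothetical. Trivial splits (one taxon versus all the
rest) are skipped, since every tree on the same taxa contains them.

```python
def leaves(tree):
    """Return the set of taxon names below `tree` (a str or nested tuple)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions of `tree`, viewed as unrooted."""
    all_taxa = leaves(tree)
    ref = min(all_taxa)          # reference taxon used to pick a canonical side
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        # Store each split by the side that does NOT contain the reference taxon.
        side = all_taxa - clade if ref in clade else clade
        # Skip trivial splits: empty side, single taxon, or all taxa but one.
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        return clade

    visit(tree)
    return found


def rf_distance(t1, t2):
    """Number of non-trivial splits present in exactly one of the two trees."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must share the same taxon set")
    return len(splits(t1) ^ splits(t2))


# Hypothetical five-taxon trees that differ in one internal branch.
tree_a = (("A", "B"), "C", ("D", "E"))
tree_b = (("A", "C"), "B", ("D", "E"))
distance = rf_distance(tree_a, tree_b)
```

Here the two trees share the split separating D and E from the rest and
disagree on the other one, so each contributes one unmatched split.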
Alexis
> I am particularly worried about the effect of singletons, because they
> are not informative sites, but could probably globally inflate
> bootstrap supports (is 1 OTU vs all-the-rest a valid split or not?)
> and thus make poor branches look better than they actually are.
>
> -Jacek
--
Alexandros (Alexis) Stamatakis
Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson
www.exelixis-lab.org