There is no "correct" and "wrong" here. It depends on the purpose for which you want to filter the rogues. Whether an OTU goes rogue, and to which extent, can have many reasons (lack of discriminating signal; being primitive, i.e. acting like an actual ancestor; incomplete lineage sorting issues; being the product of reticulation).
So, if your data are messy, producing a tree with poor branch support, and you don't have the luxury to bother about the density of the tip sample (number and coverage of OTUs), using a threshold is a consistent way to do it. Since there are so many reasons for going rogue (see above), there is no actual fixed cut-off value, so one would always be better off using more than one and seeing how the topology changes (e.g. in Alexi's example screenshot, 0.4 and 0.7 would be the cut-offs I'd go for, rather than sticking to 0.5).
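To make the "more than one cut-off" idea concrete, here is a minimal Python sketch. It assumes you have already pulled a per-OTU rogue score out of the RogueNaRok output into a dictionary, and that higher means more roguish; the taxon names, the scores and the function name `split_by_cutoffs` are made up for illustration and are not part of RogueNaRok.

```python
def split_by_cutoffs(scores, cutoffs=(0.4, 0.5, 0.7)):
    """Return {cutoff: sorted list of OTUs whose score is >= cutoff}.

    `scores` maps OTU name -> rogue score (assumption: higher = more roguish).
    """
    return {c: sorted(t for t, s in scores.items() if s >= c) for c in cutoffs}


# Made-up scores, only to show the mechanics.
scores = {"taxA": 0.82, "taxB": 0.55, "taxC": 0.45, "taxD": 0.10}

for cutoff, to_drop in split_by_cutoffs(scores).items():
    print(f"cut-off {cutoff}: drop {to_drop}")
```

Each of the resulting OTU lists would then be pruned from the matrix and the tree re-inferred, so that the topologies can be compared side by side.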
But if you want to assess how much individual rogues bug the tree inference, you do an iterative inference, dropping the most roguish OTU in the list each time, and stop when the tree is clear (e.g. all branches supported by BS ~ 100).
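The iterative variant can be sketched as a small loop. This is only a skeleton under stated assumptions: `infer_support` and `rank_rogues` are hypothetical callbacks that you would have to write yourself (wrapping your tree/bootstrap run and a RogueNaRok run, respectively), and the stop criterion `target` is a free parameter that you may want to relax a bit below 100.

```python
def iterative_derogue(taxa, infer_support, rank_rogues, target=100.0, max_rounds=10):
    """Drop the most roguish OTU per round until all branch supports reach `target`.

    infer_support(taxa) -> list of branch support values (hypothetical callback).
    rank_rogues(taxa)   -> list of OTUs, most roguish first (hypothetical callback).
    Returns (kept OTUs, dropped OTUs in the order they were removed).
    """
    kept, dropped = list(taxa), []
    for _ in range(max_rounds):
        supports = infer_support(kept)
        if not supports or min(supports) >= target:
            break  # tree is "clear" (or there is nothing left to assess)
        ranking = rank_rogues(kept)
        if not ranking:
            break  # no rogue candidates left
        worst = ranking[0]
        kept.remove(worst)
        dropped.append(worst)
    return kept, dropped


# Toy stand-ins for the real analyses, only to show the control flow.
ROGUES = ["r1", "r2"]

def fake_support(taxa):
    return [60.0, 100.0] if any(t in ROGUES for t in taxa) else [100.0, 100.0]

def fake_ranking(taxa):
    return [t for t in ROGUES if t in taxa]

kept, dropped = iterative_derogue(["a", "b", "r1", "c", "r2", "d"], fake_support, fake_ranking)
print(kept, dropped)
```

The order in which OTUs end up in `dropped`, together with the support gained per round, shows how much each individual rogue was affecting the inference.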
Which one is the better choice also depends mainly on the basic tree-likeness of the data. RogueNaRok tests the behaviour in direct relation to the data set and the (in)capacity of the data to find a tree. E.g. when there are only a few OTUs messing up an otherwise trivial tree, they will be very easy to spot in RogueNaRok's output (like in Alexi's example screenshot) and there would be no reason to think about applying a threshold. But if your matrix does not produce a very tree-like signal at all, i.e. you stuck your finger into a rogues' nest, RogueNaRok may come up with a more or less continuous list, and one may be better off comparing topologies using different cut-offs and increasingly reduced OTU sets. For instance, if you have a noisy matrix, a high cut-off threshold may filter for deep signal, while a low one will give you less biased terminal subtrees; the basic assumption here being that evolution becomes messier (less tree-like) the closer we get to the tips of the Tree of Life, but in the long run sorts out towards a coalescent.
Good de-roguing, Guido
PS The principal tree-likeness of the matrix and its OTUs can be quickly assessed using Delta Values (Holland et al. 2002. Mol. Biol. Evol. 19:2051-2059); here you can find a simple programme to calculate them: dist_stats. RAxML8 has an ML distance export using the optimised model (it doesn't seem to be implemented yet in RAxML-NG, only topological distances). You can't read in RAxML's output directly because it is a list of pairwise distances, whereas the input for dist_stats has to be a distance matrix in extended PHYLIP format. Then compare how the iDVs, the "individual Delta Values", match with RogueNaRok's scores (on the dist_stats page you find links to papers that made use of Delta Values). In a messy matrix, you'll have iDVs ≥ 0.25 that form a continuum; in a tree-like one, iDVs will typically be ≤ 0.15, with a few outliers identified as obvious rogues in the RogueNaRok output (the values are just very rough rules of thumb; Delta Values are affected by taxon sampling and the number of sequence patterns in the matrix).
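For those who want to see the mechanics, below is a rough Python sketch of both steps: turning a pairwise list into a square matrix, and computing individual Delta Values from it. It is not dist_stats and not meant to replace it. Assumptions: each line of the pairwise file reads `taxon1 taxon2 distance` separated by whitespace; the PHYLIP writer uses a plain layout (taxon count on the first line, then one row per taxon with its name followed by the distances), which you should check against what dist_stats expects; all function names are made up. The quartet delta follows Holland et al. (2002): of the three pairwise sums of a quartet, (largest - middle) / (largest - smallest), and the iDV of an OTU is the mean over all quartets that include it.

```python
from itertools import combinations


def pairs_to_matrix(lines):
    """Parse 'taxon1 taxon2 distance' lines into (names, symmetric matrix)."""
    names, dist = [], {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        a, b, d = parts[0], parts[1], float(parts[2])
        for n in (a, b):
            if n not in names:
                names.append(n)
        dist[(a, b)] = dist[(b, a)] = d
    # Raises KeyError if a pair is missing from the list.
    matrix = [[0.0 if r == c else dist[(r, c)] for c in names] for r in names]
    return names, matrix


def to_phylip(names, matrix):
    """Write a square distance matrix with full-length taxon names."""
    rows = [str(len(names))]
    for name, row in zip(names, matrix):
        rows.append(name + " " + " ".join(f"{d:.6f}" for d in row))
    return "\n".join(rows) + "\n"


def individual_delta_values(names, matrix):
    """Mean quartet delta per OTU (0 = perfectly tree-like)."""
    totals = {n: 0.0 for n in names}
    counts = {n: 0 for n in names}
    for quartet in combinations(range(len(names)), 4):
        x, y, u, v = quartet
        sums = sorted((matrix[x][y] + matrix[u][v],
                       matrix[x][u] + matrix[y][v],
                       matrix[x][v] + matrix[y][u]), reverse=True)
        spread = sums[0] - sums[2]
        delta = (sums[0] - sums[1]) / spread if spread > 0 else 0.0
        for i in quartet:
            totals[names[i]] += delta
            counts[names[i]] += 1
    return {n: totals[n] / counts[n] for n in names if counts[n]}


# Toy input: distances on the tree ((A,B),(C,D)) with all branch lengths 1.
pairs = ["A B 2.0", "A C 3.0", "A D 3.0", "B C 3.0", "B D 3.0", "C D 2.0"]
names, matrix = pairs_to_matrix(pairs)
print(to_phylip(names, matrix))
print(individual_delta_values(names, matrix))
```

Note that the quartet loop grows with the fourth power of the number of OTUs, so for large matrices a dedicated programme is the better choice.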