Hi All,
My 2 cents. Bootstrap replicates are really not something we will deal with in PartitionFinder. There are lots of reasons, but the main one is that if we find a good partitioning scheme that has some small subsets (which we often do), it doesn't seem sensible to reject it just because bootstrapping is difficult for those small subsets. More generally, I am not convinced about bootstrapping on two fronts: (1) with large datasets I'm not sure it's a sensible approach at all; (2) with small subsets (even inside large datasets) it doesn't seem sensible either. For example, at the limit, bootstrapping small partitions is completely pointless: a partition of a single base cannot be meaningfully bootstrapped in the way that RAxML builds bootstraps, because every resample of a single site is identical to the original.
I'd suggest that a more sensible approach here is to analyse branch support with methods that don't require bootstraps. These could be the aLRT (implemented in PhyML), which just tests for zero-length branches, or MCMC methods (e.g. ExaBayes). Both of these would work fine on large datasets, and would also (in my opinion) give results that are easier to interpret than bootstraps in any case (especially bootstraps that are made by resampling each partition).
The stuff with PF I will look into. It will take a few weeks, because I'm pretty much booked up until the end of the month with other things and deadlines. My aim here is that if PF spits out a partitioning scheme, it should run in RAxML. Thus, the most useful test dataset you can send me is one for which PF produces a partitioning scheme that will not run in RAxML. It might be as simple as turning on the -O flag in RAxML. If that's the case, I don't really see that I have anything to fix in PF, since it is already doing what it should.
However, here are some options of things I could implement, that folks here can comment on:
1. I could put in an option to remove the -O flag inside PartitionFinder. That way we could guarantee that what PF spits out will indeed run in RAxML without the -O flag.
2. If you can tell me exactly what the problems are in ExaML, I could look into putting in catches for these inside PF, with the aim that what PF spits out will work in ExaML too (though we could cross our fingers that option 1 would fix this straight away!).
Finally, if you are using kmeans, there is a better option for bootstraps than what you are currently doing, which would get around the problem of RAxML's bootstrap algorithm creating issues when resampling small subsets. Instead of bootstrapping a partitioned dataset, you could do this:
1. Create 1000 bootstraps of the entire dataset (i.e. as one partition).
2. Run each bootstrapped dataset through the kmeans algorithm.
3. Run RAxML / ExaML on each dataset, using the partitioning scheme found for it in step 2.
This may take some time, but as long as we can solve the issue of what PF spits out, it is guaranteed to solve your problem. I also think it's a more rigorous solution in general, because I'm really not sure what bootstraps mean when you resample partitioned datasets, particularly when there are a lot of partitions. I think that 100 bootstraps with this approach might be more meaningful than more bootstraps with some other approach.
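To make step 1 concrete, here is a rough sketch in Python of resampling the whole alignment as a single partition. It assumes the alignment is already in memory as a dict mapping taxon names to equal-length sequence strings. The function names (bootstrap_alignment, make_replicates) are made up for illustration and are not part of PartitionFinder or RAxML. Reading and writing alignment files, and steps 2 and 3, are left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    Columns (sites) are resampled with replacement across the whole
    alignment, i.e. the data are treated as a single partition.
    `alignment` is a hypothetical in-memory format: {taxon: sequence}.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("all sequences must have the same length")
    # Draw n_sites column indices with replacement; the same indices are
    # applied to every taxon so that each resampled column stays intact.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(alignment[name][c] for c in cols) for name in names}


def make_replicates(alignment, n_replicates, seed=0):
    """Return a list of n_replicates bootstrap replicates (seeded for repeatability)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]
```

Each replicate would then be written to its own alignment file and passed through kmeans and RAxML / ExaML separately, so every replicate gets its own partitioning scheme.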
Thoughts?
Cheers,
Rob