Hi Karen,
unfortunately I cannot help you with the main issue - it seems to be yet another problem in the LG4X parameter
optimization routine...
But here are the answers to some of your side questions:
> (I could mix BS from version 3.0.14 and tree searches from version 3.0.15, but this doesn't seem like the way it
> should be done. If I do so, are the binaries compatible, even though they were created with different parser versions?)
There is a safety check in the code which will prevent ExaML from loading binaries generated by a parser from a
different version. However, I'm pretty sure that the binary format itself didn't change between 3.0.14 and 3.0.15, so
technically it should be possible to load older binaries. So if you really want/have to do this, please let me know and
I'll tell you where you can disable the safety check in the code.
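For illustration only, here is a generic Python sketch of the kind of version check I mean - this is *not* ExaML's
actual code, and the header layout is made up:

import struct

EXPECTED_VERSION = b"3.0.15"  # hypothetical on-disk version tag

def load_binary(path, ignore_version=False):
    with open(path, "rb") as f:
        (tag_len,) = struct.unpack("<I", f.read(4))   # length of version tag
        version = f.read(tag_len)                     # version string written by the parser
        # Skipping this check means trusting that the format itself
        # did not change between the two versions.
        if version != EXPECTED_VERSION and not ignore_version:
            raise ValueError("binary written by version %r" % version)
        return f.read()                               # rest of the payload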
> 2) is it normal that tree searches on BS replicates in general
> a) run much faster than on the original dataset and
This is something I've observed as well, although as far as I remember the difference in running time was not that
dramatic (i.e. they ran faster, but not much). One explanation I have for this is that BS replicates will have fewer
unique site patterns compared to the original alignment, because columns are sampled with replacement: some columns
will be drawn twice or more, while others are dropped entirely. You can check the difference in the number of patterns
by looking at the ExaML output.
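If you want to see the effect yourself, here's a quick Python sketch (my own toy code, not ExaML's) that resamples
alignment columns with replacement and counts unique site patterns:

import random

def count_unique_patterns(alignment):
    # A "pattern" is one alignment column; identical columns collapse.
    n_sites = len(alignment[0])
    return len({tuple(seq[i] for seq in alignment) for i in range(n_sites)})

def bootstrap_replicate(alignment):
    # Standard BS replicate: draw n_sites columns with replacement.
    n_sites = len(alignment[0])
    picks = [random.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[i] for i in picks) for seq in alignment]

aln = ["ACGTACGTAC", "ACGAACGTTC", "ACGTACGATC"]
print(count_unique_patterns(aln))                       # original
print(count_unique_patterns(bootstrap_replicate(aln)))  # usually <= original

Fewer unique patterns means less per-iteration work for the likelihood computation, which would explain the faster BS
runs.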
> b) the size of the BS binaries is different from the original (reduced) one (if you make a binary out of it)? All
> BS binaries within one run are the same size, but differ from the original (reduced) binary size - sometimes larger,
> sometimes smaller.
BS binary size could be different due to pattern compression (see above), although I cannot see why it could become
larger than the original...
> 3) running partitioned datasets and BS: sometimes BS replicates on aa data (and nt) are generated that do not
> fulfill RAxML/ExaML criteria (not having 20 aa states).
> Is there another way to circumvent this (e.g. modify it, or write a wrapper that keeps resampling and only produces
> replicates that fulfill the criteria)? Or might this be biased?
@Alexis: I also have a feeling this requirement might be too strict, especially for AA data (e.g. if we have 19 out of
20 states in the alignment). Would it be possible to use some kind of smoothing, e.g. assign some low but non-zero
frequencies to the missing states?
https://en.wikipedia.org/wiki/Additive_smoothing
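Just to make the suggestion concrete, a minimal Python sketch of additive smoothing applied to empirical AA
frequencies - alpha is a hypothetical pseudocount here, not an existing RAxML/ExaML option:

AA = "ARNDCQEGHILKMFPSTWYV"

def smoothed_frequencies(counts, alpha=0.5):
    # counts: amino acid -> observed count in the alignment.
    # Every state ends up with a non-zero frequency, and they sum to 1.
    total = sum(counts.get(a, 0) for a in AA) + alpha * len(AA)
    return {a: (counts.get(a, 0) + alpha) / total for a in AA}

# 19 of 20 states observed; 'W' is absent but still gets a small frequency.
counts = {a: 10 for a in AA if a != "W"}
freqs = smoothed_frequencies(counts)
print(freqs["W"])                 # 0.0025 instead of 0.0

With something like this the missing-state check could be relaxed instead of rejecting the replicate outright.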
Hope this helps at least a bit...
Alexey
On 18.09.2015 14:05, Karen wrote: