Hi,
> On Sun, Dec 2, 2018 at 2:05 PM Michael Duncan <mjsd...@gmail.com> wrote:
> can any moses gurus comment on using the diversity scoring
> options? for some bioinformatics work we are more
> interested in evolving a minimal ensemble of models to
> maximally represent the patterns distinguishing two sample
> sets rather than just maximizing out of sample prediction
> accuracy. in other words, we want to maximize the number of
> unique features in a model ensemble of a given size and
> accuracy. more generally, are there procedures for choosing
> optimal ensemble models beyond combining the top n models
> from different cross-validation runs?
Yes, moses can take into account model diversity when sorting the top n
models of the metapopulation.
I recall that it works very well, but of course that depends on how
moses is being used.
My advice would be to start by setting --diversity-pressure to 0.1, then
double it until you get past 10.
See if you obtain a more diverse population.
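
To make the doubling schedule concrete, here is a minimal sketch in Python
that lists the values to try. The function name and defaults are made up for
illustration; only the --diversity-pressure flag comes from moses, and the
rest of your moses command line stays whatever you already use.

```python
def diversity_pressure_sweep(start=0.1, limit=10.0):
    """Return start, 2*start, 4*start, ... up to the first value past limit."""
    values = []
    pressure = start
    while True:
        values.append(pressure)
        if pressure > limit:
            # Stop once we have gone past the limit (that value is included).
            break
        pressure *= 2
    return values


if __name__ == "__main__":
    for pressure in diversity_pressure_sweep():
        # Append this fragment to your own moses command for each run.
        print(f"--diversity-pressure {pressure:g}")
```

That gives 8 runs (0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8), which should be
enough to see whether the pressure has any effect on your population.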
I think the eval-diversity tool may help you measure the diversity of
each run (if moses isn't enough).
If you're not happy with the result, it might mean that you aren't letting
moses evolve enough demes. Remember that diversity works at the
metapopulation level, thus it affects the choice of the next deme
exemplar, so you really need to let moses explore multiple demes to
build up diversity.
Also, are you looking for feature set diversity? Or are you looking for
candidate diversity (expressing different output behaviors, regardless
of whether they use different features)?
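
To illustrate the difference between the two, here is a small sketch in
Python. It is not how moses computes diversity internally; the distance
choices (Jaccard over feature sets, normalized Hamming over outputs), the
function names, and the toy data are all assumptions for the sake of the
example.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Feature-set distance: 1 - |a & b| / |a | b| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def hamming_distance(u, v):
    """Behavioral distance: fraction of samples where two outputs differ."""
    return sum(x != y for x, y in zip(u, v)) / len(u)


def mean_pairwise(items, dist):
    """Average distance over all unordered pairs of items."""
    pairs = list(combinations(items, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


# Toy ensemble of three models: the features each one uses,
# and its predictions on the same four samples.
feature_sets = [{"g1", "g2"}, {"g3", "g4"}, {"g1", "g3"}]
outputs = [[1, 0, 1, 0], [1, 0, 1, 0], [1, 1, 1, 0]]

feature_diversity = mean_pairwise(feature_sets, jaccard_distance)
behavioral_diversity = mean_pairwise(outputs, hamming_distance)

if __name__ == "__main__":
    print(f"feature-set diversity: {feature_diversity:.3f}")
    print(f"behavioral diversity:  {behavioral_diversity:.3f}")
```

In this toy ensemble the first two models share no features yet predict
exactly the same thing, so feature-set diversity is high while behavioral
diversity is low. Which of the two you care about determines which knob to
turn.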
If you're mostly interested in feature set diversity then I would
recommend enabling diversity at the feature selection stage, see
--fs-diversity-pressure
Of course, it's only going to work if you're using feature selection to
begin with (which I would recommend).
The other option, which doesn't involve any diversity flag, is to
tune feature selection to be highly sensitive to random
fluctuations. I forgot how to do that but I could dig it up if you want
me to. The advantage is that it's going to yield diversity at a really low
computational cost.
Anyway, hope it helps. Feel free to send me your moses commands and
data so I can provide you with more guidance.
Nil