I agree with Aki — what you really want to do for any kind of adaptive
trial is stop based on some posterior expectation, not a posterior
predictive expectation.
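
To make that concrete, here is a minimal sketch of a stopping rule built on a posterior expectation. Everything specific in it is an assumption for illustration: a single-arm trial with binary outcomes, a Beta(1, 1) prior, a reference response rate of 0.5, and a 0.95 stopping threshold. The function names are hypothetical. The quantity computed is the posterior probability that the response rate exceeds the reference, which is the posterior expectation of an indicator function, estimated here by Monte Carlo.

```python
import random


def posterior_prob_exceeds(successes, failures, reference=0.5,
                           prior_a=1.0, prior_b=1.0, draws=20000, seed=1):
    """Monte Carlo estimate of P(theta > reference | data).

    Assumes a Beta(prior_a, prior_b) prior and binomial data, so the
    posterior is Beta(prior_a + successes, prior_b + failures).
    """
    rng = random.Random(seed)
    a = prior_a + successes
    b = prior_b + failures
    # Posterior expectation of the indicator I(theta > reference).
    hits = sum(rng.betavariate(a, b) > reference for _ in range(draws))
    return hits / draws


def should_stop(successes, failures, threshold=0.95):
    """Stop once the posterior probability clears the (assumed) threshold."""
    return posterior_prob_exceeds(successes, failures) >= threshold
```

With these assumed settings, `should_stop(30, 10)` recommends stopping, while `should_stop(10, 10)` does not, because the posterior probability is close to one half.
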
In addition to the variability issues with WAIC and LOO that Aki notes,
there’s a deeper problem. These measures do not have a natural
calibration, so it’s hard to assign any meaning to their numerical values
beyond their ordering. In particular, while a larger difference in
WAIC or LOO between two models means that one is doing better relative to
the other, the size of that difference doesn’t tell you how much better it is.
Bayes factors are actually similar in that any threshold decision is
arbitrary. The advantage of Bayes factors is that they are probabilities
(or at least proportional to probabilities), and that helps in informing that
decision.
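
As a rough illustration of why that scale helps, a Bayes factor can be turned into a posterior model probability once prior odds are chosen, since posterior odds equal the Bayes factor times the prior odds. The sketch below assumes only two candidate models and equal prior odds by default; the function name is hypothetical.

```python
def posterior_model_probability(bayes_factor, prior_odds=1.0):
    """Return P(M1 | data) from the Bayes factor BF_12 and prior odds P(M1)/P(M2).

    Assumes M1 and M2 are the only two models under consideration.
    """
    if bayes_factor <= 0 or prior_odds <= 0:
        raise ValueError("bayes_factor and prior_odds must be positive")
    posterior_odds = bayes_factor * prior_odds
    # Convert odds to a probability.
    return posterior_odds / (1.0 + posterior_odds)
```

For example, a Bayes factor of 3 with equal prior odds corresponds to a posterior probability of 0.75 for the favored model, which is a number one can actually argue about when picking a threshold.
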