On 5/28/21, Álvaro Begué <
alvaro...@gmail.com> wrote:
> This idea is old. For instance, see the 1999 paper "Propagating
> Distributions Up Directed Acyclic Graphs" by Eric Baum and Warren D. Smith
> (who used to hang out in this forum).
>
> However, I think this is fundamentally wrong. Rational decision making is
> achieved by maximizing the expected value of the utility of the outcome
> (see Von Neumann–Morgenstern utility theorem).
--yes.
> This is a single number,
--yes.
> so I think it is correct for engines to have a one-dimensional evaluation
> space.
--that does not follow.
>If a move is risky, it's because it increases the probability of a
> bad outcome, and the evaluation already contains a penalty for it.
--well, no: a move can have high variance: possibly a high reward, but
also possibly a large negative reward.
You might find out which if you explore it deeper; but if you do not, then
you do not know which.
But even without exploring deeper, you can still know "it has high variance"
versus "low" IF you have a prob-distribution-valued eval function.
And then you can (1) use that knowledge to guide your decisions about
which parts of the search tree to explore deeper. You will then
search more effectively.
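To make point (1) concrete, here is a toy sketch (with made-up numbers) of a crude uncertainty-guided expansion rule. Note this uses raw variance only as an illustrative proxy; Baum and Smith's actual selection criterion weighs how much new information could change the move decision, not variance alone.

```python
# Toy leaf evaluations as {utility: probability} maps (hypothetical numbers).
leaves = {
    "risky":  {-1.0: 0.5, +1.0: 0.5},
    "safe":   {+0.1: 1.0},
    "medium": {-0.2: 0.5, +0.4: 0.5},
}

def mean(dist):
    return sum(v * p for v, p in dist.items())

def variance(dist):
    m = mean(dist)
    return sum(p * (v - m) ** 2 for v, p in dist.items())

# Crude exploration heuristic: deepen the leaf whose value is most
# uncertain, since searching it resolves the most uncertainty.
to_expand = max(leaves, key=lambda k: variance(leaves[k]))
print(to_expand)  # "risky" has the largest variance (1.0)
```

A scalar eval gives you only the means here, which cannot distinguish the risky leaf from a safe one with the same expectation.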
Also, even without exploring deeper, (2) the expectation
values of internal nodes of the tree will change
if the leaves have prob-distribution-values, versus if those leaves
just had scalar values
equal to the expected utility of that prob-distribution.
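Point (2) can be illustrated with a toy max node over two leaves (hypothetical numbers; the exact Baum+Smith backup rule is more refined, but the effect is the same). If the player gets to resolve the uncertainty before choosing, the node's expectation is E[max(A,B)], which generally exceeds max(E[A],E[B]) — the value you get by collapsing each leaf to its scalar mean first.

```python
# Two leaves under a max node, each evaluated as a probability
# distribution over utilities: {value: probability} (made-up numbers).
leaf_a = {-1.0: 0.5, +1.0: 0.5}   # risky: high variance, mean 0.0
leaf_b = {+0.1: 1.0}              # safe: zero variance, mean 0.1

def mean(dist):
    return sum(v * p for v, p in dist.items())

# Scalar backup: replace each leaf by its mean, then take the max.
scalar_value = max(mean(leaf_a), mean(leaf_b))  # 0.1

# Distribution backup: expectation of max(A, B), assuming A and B are
# independent and the uncertainty gets resolved before the choice.
dist_value = sum(max(va, vb) * pa * pb
                 for va, pa in leaf_a.items()
                 for vb, pb in leaf_b.items())
# E[max(A,B)] = 0.5*max(-1,0.1) + 0.5*max(1,0.1) = 0.55

print(scalar_value, dist_value)  # 0.1 vs 0.55
```

The two backups disagree, so the internal-node valuations really do change, as claimed.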
Baum and Smith invented all this, and then demonstrated experimentally that
the better internal-node valuations (2) yielded stronger game players,
and that the smarter search (1) also yielded stronger players.
(I'm Smith.)
These strength advantages were very large for some games.
And chess and go looked like they would be games for which it would work well,
in fact probably better than any of the games we tried.
[But we never experimented on chess+go, because we only experimented on a
variety of simpler games, basically because we were not good enough programmers
and/or did not have enough time. Also, my interest in chess came at the wrong
time, just before IBM built the Deep Blue hardware. And my interest in go
also came at exactly the wrong time, just before the AlphaZero team's work.]
So sorry: your criticism here holds no water.
However, a different criticism which actually might pack a punch is this:
when we did those experiments, it was back before Stockfish introduced
powerful selective AB search tricks (late move reduction...) which
greatly increased
the strength of chess programs. So it might be that those tricks decrease the
advantage of BPIP search versus AB, perhaps by enough that we should not
care about BPIP anymore. That is unknown. More recently I realized
that MCTS searches can in fact be regarded as a kind of "poor man's
BPIP search," which may or may not be a better way to get BPIP-like
effects. (They sample from a probability distribution rather than
represent it explicitly, but with near-the-same effect.) AlphaZero's success
with MCTS could therefore be regarded as more evidence that Baum+Smith were
on the right track.
The Baum+Smith work may not have gotten the exploration it deserves.
It may still have the potential to vastly increase the strength of
chess and/or go programs. Or not. Hard to say until/unless worked on.
--
Warren D. Smith
http://RangeVoting.org <-- add your endorsement (by clicking
"endorse" as 1st step)