On January 12, 2020 at 11:49:35 AM UTC-7, Tim Chow wrote:
> On January 12, 2020 at 10:59:22 AM UTC-5, bgbl...@googlemail.com wrote:
>> On January 11, 2020 at 19:20:33 UTC+1, Tim Chow wrote:
>>> Does BGBlitz include "human biases" or does it learn
>>> purely from self-play?
>> It depends on what you regard as "human bias".
>> The inputs are more or less in line with what Berliner
>> already did, so are expert inputs "Human bias"?
Of course. What else?
>> BGBlitz learns purely due to self play.
> Well, I assume that what Murat is complaining about is
> the distinction between versions 0.0 and 1.0 of TD-Gammon,
> as described for example here:
Why do you characterize my arguments as "complaining"?
I'm pointing out defects in so-called "NN" or "AI" bots
to show that they are not "NN" or "AI" by definition.
See this article:
https://www.cs.cornell.edu/boom/2001sp/Tsinteris/gammon.htm
that talks about creating a bg bot "free from the biases
of existing human knowledge".
If I follow the versions correctly, the first TD-Gammon was
trained unsupervised, through self-play without any human
bias, and could only play cubeless single games. Tesauro's
earlier bot Neurogammon had relied on supervised learning,
as did TD-Gammon starting with the second version. Human
bias was added back to make it play cubeful matches.
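To illustrate what "learning purely from self-play" means mechanically, here is a toy TD(0) value update (illustrative only: TD-Gammon used a neural network trained with TD(lambda), not a lookup table, and the function and state names here are mine):

```python
def td0_update(value, state, next_state, reward, alpha=0.1):
    """Move value[state] toward reward + value[next_state] (TD(0))."""
    old = value.get(state, 0.0)
    target = reward + value.get(next_state, 0.0)
    value[state] = old + alpha * (target - old)
    return value

v = {}
td0_update(v, "s0", "s1", 0.0)   # both values still 0, nothing to learn yet
td0_update(v, "s1", "end", 1.0)  # a terminal win nudges v["s1"] up to 0.1
td0_update(v, "s0", "s1", 0.0)   # that estimate now propagates back to s0
```

The point is that the only "teacher" is the reward at the end of a self-played game; no expert moves are ever shown to the learner.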
Here is another article by Tesauro:
https://www.bkgm.com/articles/tesauro/tdl.html
where he says:
"Strategy for use of the doubling cube was not
"included in TD-Gammon's training. Instead, a
"doubling algorithm was added after training
"that makes decisions by feeding TD-Gammon's
"expected reward estimates into a theoretical
"doubling formula developed in the 1970s.
How is that for "AI"?
I say it's "FAI", "Fartificial Intelligence"...
> http://papers.nips.cc/paper/1302-on-line-policy-improvement-using-monte-carlo-search.pdf
Apparently this version was also cubeless, since he says:
"In future work, we plan to augment the program
"with a similar Monte-Carlo algorithm for making
"doubling decisions.
Also interesting in that article is what he says about
CPU power:
"Our Monte-Carlo simulations were performed on the
"IBM SP1 and SP2 parallelRISC supercomputers at IBM
"Watson and at Argonne National Laboratories. Each
"SP node is equivalent to a fast RSj6000, with
"floating-point capability on the order of 100 Mflops.
"Typical runs were on configurations of 16-32 SP nodes,
"with parallel speedup efficiencies on the order of 90%.
Let's say 24 nodes x 100 Mflops x 90% = 2.16 Gigaflops.
Today we have $2,000 desktop PC's achieving Teraflops.
IBM's latest supercomputers are pushing 200 Petaflops,
and even the older Blue Gene/Q was capable of 20
petaflops. That's roughly 10,000,000 times the power
of 24 SP2 nodes...!
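The arithmetic above can be checked in a couple of lines (the figures are the post's own estimates, not measurements):

```python
# 24 nodes x 100 Mflops x 90% parallel efficiency
sp2_flops = 24 * 100e6 * 0.90
print(sp2_flops)              # 2160000000.0, i.e. ~2.16 Gigaflops

modern_flops = 20e15          # ~20 petaflops (Blue Gene/Q class)
print(modern_flops / sp2_flops)  # ~9.26 million, i.e. roughly 10,000,000x
```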
With that, you should now all re-read those articles,
striking out everything related to CPU power limitations
and/or costs, which was their excuse for taking shortcuts
and substituting human bias for machine self-learning of
match play and cube play.
BTW: I just searched Tesauro and found this:
https://researcher.watson.ibm.com/researcher/view.php?person=us-gtesauro
Apparently he is still quite young (60-ish?) and is still
at IBM. Considering his 30+ years of seniority since
TD-Gammon, I'm sure IBM could spare a couple of Blue
Genes for him to work on an AlphaZero-style BG bot...
MK