Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

Dismiss

[NON-SOLUTION] Learning Tic-Tac-Toe (#11)

0 views

Skip to first unread message

Hans Fugal

unread,

Dec 13, 2004, 1:06:51 PM12/13/04

Ok, I hope I don't become infamous for non-solutions, but I like to
share what didn't work because that can be almost as instructive as what
did work.

As TTT is a trivial and largely uninteresting game (as has been
discussed), I was more interested in the journey than the result. I
decided to brush up on my Q-Learning and so coded up a Q learner AI. I
then pitted my q learner against itself and let it go for several hours.
(I didn't have the time to observe it closely or the inclination to
implement some reporting to make pretty graphs or whatnot) I expected
that when I came back it would be drawing most of the time with itself,
but it was not. It seemed to be playing more or less decent games, with
each one winning roughly half the time, but very rarely would they draw.
I think perhaps they were taking more risks than they ought to have, or
perhaps I had my constant of randomness on their decisions too high.

Of course it's also entirely possible I have an implementation bug. :)

The other possiblity that I'm afraid of is a theoretical bug. I'm not
entirely sure q-learning is appropriate for this solution the way I
implemented it. My concern is that the way I implemented it it might not
be a true MDP, because the state transition is not deterministic. (The
reward/next state pair is taken from when it's our turn again, and what
the other player does is nondeterministic)

I left the laptop home and turned off, but I'd be happy to send the code
later if anyone's interested in seeing it. :-)

James Edward Gray II

unread,

Dec 13, 2004, 1:48:32 PM12/13/04

On Dec 13, 2004, at 12:07 PM, Hans Fugal wrote:

> Ok, I hope I don't become infamous for non-solutions, but I like to
> share what didn't work because that can be almost as instructive as
> what did work.

I too ran into road blocks. I tried to recreate Donald Michie's famous
"matchbox machine", MENACE, in Ruby. Unfortunately, i tripped myself
up with my own cleverness.

I was optimizing the amount of positions the machine had to consider
(see my earlier message about rotation and mirroring). My
implementation was flawed though and failed to shift the weights for
possible moves accordingly. This lead my AI to try bogus moves.

There's also a more subtle flaw in my matchbox/bead logic, affecting
the machines ability to learn from it's mistakes. I was in the process
of tracking that down when I began running into the above problem.

Both of these problems are fixable, of course, but I ran out of time
this weekend. In the end, I blame myself. I over complicated it and
could have kept things easier on myself. Live and learn, as the saying
goes.

James Edward Gray II

0 new messages