deep rl for chess


Leif Johnson

Sep 15, 2015, 9:43:14 PM9/15/15
to ut-flare
Thought this paper might be of interest to folks in this group (and
probably the RL group too):

http://arxiv.org/abs/1509.01549

Giraffe: Using Deep Reinforcement Learning to Play Chess
Matthew Lai

This report presents Giraffe, a chess engine that uses self-play to
discover all its domain-specific knowledge, with minimal hand-crafted
knowledge given by the programmer. Unlike previous attempts using
machine learning only to perform parameter-tuning on hand-crafted
evaluation functions, Giraffe's learning system also performs
automatic feature extraction and pattern recognition. The trained
evaluation function performs comparably to the evaluation functions of
state-of-the-art chess engines, all of which contain thousands of
lines of carefully hand-crafted pattern recognizers, tuned over many
years by both computer chess experts and human chess masters. Giraffe
is the most successful attempt thus far at using end-to-end machine
learning to play chess.
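Giraffe actually trains a neural-network evaluator with TD-Leaf(λ); as a much-simplified illustration of the idea of learning an evaluation function from self-play outcomes, here is a toy TD(0) sketch with a linear evaluator over made-up features (everything here is an assumption for illustration, not Giraffe's code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: Giraffe learns a neural-network evaluation; a linear
# evaluator over synthetic features keeps this sketch self-contained.
NUM_FEATURES = 8
weights = np.zeros(NUM_FEATURES)

def evaluate(features):
    """Scalar evaluation of a position's feature vector."""
    return float(weights @ features)

def td_update(trajectory, alpha=0.01, gamma=1.0):
    """TD(0)-style update along one self-play game.

    trajectory: list of (features, reward) pairs; reward is nonzero only
    at the terminal position (+1 win, -1 loss, 0 draw).
    """
    global weights
    n = len(trajectory)
    for i in range(n - 1):
        f, _ = trajectory[i]
        f_next, r = trajectory[i + 1]
        # Bootstrap from the next position, except at game end where
        # the target is just the observed outcome.
        target = r if i + 1 == n - 1 else r + gamma * evaluate(f_next)
        weights += alpha * (target - evaluate(f)) * f

# Synthetic "self-play": each game holds one feature vector fixed, and
# the outcome is determined by feature 0, so TD learning should assign
# feature 0 a large positive weight.
for _ in range(2000):
    base = rng.standard_normal(NUM_FEATURES)
    outcome = 1.0 if base[0] > 0 else -1.0
    trajectory = [(base, 0.0)] * 5 + [(base, outcome)]
    td_update(trajectory)
```

The point of the sketch is just the bootstrapped target: no labeled positions are needed, only game outcomes propagated backward through the evaluator's own predictions.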


--
http://www.cs.utexas.edu/~leif
https://github.com/lmjohns3

Matthew Hausknecht

Sep 17, 2015, 4:02:07 PM9/17/15
to Leif Johnson, ut-flare
Thanks for the link. Also on the deep-reinforcement-learning side of things, DeepMind recently released a paper extending their previous results to continuous action spaces:


We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.

--