1-Agent guesses a 3-digit secrete number.
2-Gets a reward of how close the guess is to the secrete number. T
3-The aim is to guess the secret number with fewest trials possible.
4-Computer compares the guess number against the secret number, digit
by digit, and counts the number of HITs and NEARs. A HIT is given when a correct
digit is in the correct position. A NEAR is given when a correct digit is in the wrong
position
5-The reward the agent gets is an integer number based on the HIT (10 points)
and NEAR (1 point)counts.
6-A game terminates if the agent correctly guesses the secret number, or the
number of trials reaches timeout.
The game code is properly working against a human right now. What I will do is that I simulate a human with an agent which will make guesses but will refine its predictions with reinforcement learning.
I planned to use Q-Learning although no restriction about the learning algorithm exists.
I also found some useful tutorials about Q-learning and trying to figure out how to implement this agent.
I will be happy to see any comment or sample code regarding to this implementation.
Thanks.
Unique digits, or are repetitions allowed?
If repetitions are allowed, and the agent guesses the same digit in all
positions, and that digit happens to appear only once in the actual
number, then what score would be obtained? 10 for the right digit in the
right place, clearly, but the other copies of the digit that were
entered would also qualify as "a correct digit in the wrong position",
so would that mean a total score of 12?
Under certain scoring methods, there are strategies with a bounded
number of moves, as the game is very similar to Master Mind
http://en.wikipedia.org/wiki/Mastermind_%28board_game%29#Five-guess_algorithm
Number repetition is not allowed. So each secret number has unique digits.
> Number repetition is not allowed. So each secret number has unique digits.
In that case there is a fixed strategy with a bounded number of moves, and no
learning required (other than the fact that humans have a hard time keeping
the complete solution in memory.) The challenge would then involve setting the
maximum number of trials to below the upper bound of the number of moves for
solving the problem.
I have not worked out the details of the solution, but Knuth's algorithm that
I referenced before for Mastermind should provide a solid basis for it.
Yes the agent does not learn something really but just refines its guesses based on its previous ones.Since the computer gives cues in the form of HIT and NEAR with respect to the location of digit in the secret number.
I will check Knuth's algorithm.
Thanks.