Reversi program using Monte Carlo Tree Search

Implementing a Reversi Playing Program Using Monte Carlo Tree Search

Reversi Rules and Gameplay

One of the intriguing features of AlphaZero is that it learned to play chess (and Go, and Shogi) extremely well given only the rules of chess (or Go, or Shogi), and no special knowledge about how to make good moves. Here, we want you to write a program that does the same thing but for a much simpler game: Reversi Your task is to implement (in C++) a Reversi playing program where a human can play against the computer, and the computer makes all its moves using random playouts as described below. Importantly, your program should not use any information about Reversi beyond the essential rules of how to play it, how to determine legal moves, and how to recognize if the game is a win, loss, or draw. Your program should work as follows. When itâ€™s the computers turn to make a move on a board, it should make a list of all legal moves. Then for each of the moves it does some number of random playouts. A random playout is when the computer simulates playing the game until it is over. During a random playout, the computer makes random moves for each player until a win, loss, or draw is reached. When a playout is done, the result (win, loss, or draw) is recorded, and then some more random playouts are done. After random playouts are done for all legal moves, it choses the move that resulted in the greatest number of wins (or least number of losses, or most number of wins + draws, etc. â€” whatever formula you find is the best way to make a decision based on these win/loss/draw statistics). Do enough random playouts so that your program never loses against a smart player. The exact number of random playouts that the computer should do is an interesting question, and you should determine this number by doing some experiments. Aim for the fewest number of playouts that results in your program never losing. Doing lots of playouts as quickly as possible is the key to success. You should keep an estimate of how many playouts per second your program can do as a measure of its performance. You program should play as well as possible. So that games donâ€™t go on too long, you should put a limit on the maximum amount of time the computer can use (e.g. 5 seconds at most per move). In addition to the pure Monte Carlo Tree Search version of your program, create a second modified version of your program that uses heuristics â€” and any other additions you think would help â€” to make (hopefully) better-than-random moves during the playouts. Make this version of your program play against the other version to see which one is better. A written report of at least one page that explains your implementations, discusses how well they play, the extra techniques used by your second version, a comparison of which program is stronger, etc. This should include graphs and tables of relevant data.

Get instant help from 5000+ experts for