RL for Blackjack | 2 Agent Q-Learning DT

CS50 Introduction to Artificial Intelligence with Python


Question:

Reinforcement Learning

You will program an RL agent that plays Blackjack against a dealer. If you are not familiar with the game, please search the internet to learn the rules. In summary, a player attempts to beat the dealer by getting a count as close to 21 as possible without going over 21. Each player decides whether an ace is worth 1 or 11. Face cards are worth 10, and any other card is worth its pip value. The game will be played between one player and the dealer. The objective is to develop an agent that acts as the player and tries to win against the dealer.
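The counting rule above can be sketched as a small helper. This is only an illustration, not part of the starter code: the function name `hand_value` and the card encoding (ints 2-10 plus the strings 'J', 'Q', 'K', 'A') are assumptions. Each ace is first counted as 11 and then downgraded to 1 as long as the hand would otherwise go over 21.

```python
def hand_value(cards):
    """Return the best Blackjack count for a list of cards.

    Assumed encoding (hypothetical, not from the starter code):
    ints 2-10 for pip cards, 'J'/'Q'/'K' for face cards, 'A' for an ace.
    """
    total = 0
    aces = 0
    for card in cards:
        if card == "A":
            aces += 1
            total += 11          # count the ace high first
        elif card in ("J", "Q", "K"):
            total += 10          # face cards are worth 10
        else:
            total += int(card)   # pip value
    # Downgrade aces from 11 to 1 while the hand is over 21.
    while total > 21 and aces > 0:
        total -= 10
        aces -= 1
    return total
```

For example, `hand_value(["A", "K"])` counts the ace as 11, while `hand_value(["A", 9, 5])` counts it as 1 to avoid going over 21.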

We have provided starter code that contains everything needed to run the game, except that a human user plays against the dealer. You will need to develop an agent that can play the game automatically (replacing the human player's input) and win money in the long run (after many hands). If you run the starter code, you should get an output like the image below. You have to modify the code so that, instead of taking input from the user, the agent plays based on a policy.

Part 1

In this part, you need to modify the provided code to integrate it with an agent so your agent can play with the computer dealer for as many hands as needed. You should implement two agents:

Agent 1 - always selects “hit” or “stay” randomly
Agent 2 - follows the same rule as the dealer (hit if the count is less than 17)
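As a rough sketch, the two baseline agents can be written as functions that map the player's current count to an action. The function names and the "hit"/"stay" string encoding are assumptions made for illustration; the starter code may represent actions differently.

```python
import random

def random_agent(player_total, rng=random):
    """Agent 1: ignore the count and pick 'hit' or 'stay' at random."""
    return rng.choice(["hit", "stay"])

def dealer_rule_agent(player_total):
    """Agent 2: copy the dealer, hitting only while the count is below 17."""
    return "hit" if player_total < 17 else "stay"
```

Either function could then be called wherever the starter code currently asks the human player for input.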

You should simulate 1,000 hands and report the overall win or loss for each agent.

Submission

You should submit the code and a text file containing the results of the 1,000 hands for each agent.
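A minimal simulation loop for the 1,000 hands might look like the sketch below. It does not use the starter code, which is not shown here; instead it assumes a simplified game: cards drawn with replacement, a flat bet of 1 unit per hand, no extra payout for a natural blackjack, and no splitting or doubling. All names (`play_hand`, `simulate`, `CARD_VALUES`) are hypothetical.

```python
import random

# Simplified deck drawn with replacement: 10 appears four times
# (10, J, Q, K) and 11 stands for an ace counted high.
CARD_VALUES = [2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11]

def total(cards):
    """Count a hand, downgrading aces from 11 to 1 while over 21."""
    count = sum(cards)
    aces = cards.count(11)
    while count > 21 and aces > 0:
        count -= 10
        aces -= 1
    return count

def play_hand(policy, rng):
    """Play one hand; return +1 for a win, -1 for a loss, 0 for a push."""
    player = [rng.choice(CARD_VALUES), rng.choice(CARD_VALUES)]
    dealer = [rng.choice(CARD_VALUES), rng.choice(CARD_VALUES)]
    # The agent keeps hitting for as long as its policy says so.
    while total(player) < 21 and policy(total(player)) == "hit":
        player.append(rng.choice(CARD_VALUES))
    if total(player) > 21:
        return -1  # player busts
    # The dealer hits while the count is below 17.
    while total(dealer) < 17:
        dealer.append(rng.choice(CARD_VALUES))
    p, d = total(player), total(dealer)
    if d > 21 or p > d:
        return 1
    if p < d:
        return -1
    return 0

def simulate(policy, hands=1000, seed=0):
    """Play `hands` hands and summarise wins, losses, pushes and net units."""
    rng = random.Random(seed)
    results = [play_hand(policy, rng) for _ in range(hands)]
    return {
        "wins": results.count(1),
        "losses": results.count(-1),
        "pushes": results.count(0),
        "net": sum(results),
    }
```

Calling `simulate(lambda t: "hit" if t < 17 else "stay")` returns a summary dictionary that could be written to the results text file; fixing the seed keeps a run reproducible.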

Part 2

In this part, you should develop a reinforcement learning approach that learns a policy for playing against the dealer to win the maximum amount of money (or lose as little as possible). You should model the Blackjack game as an MDP (Markov Decision Process) problem and develop a Q-Learning (DT) approach to learn a policy.
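One common way to set this up is tabular Q-Learning with epsilon-greedy exploration, sketched below. The state encoding (player count, dealer's visible card, whether the player holds a usable ace), the reward of +1/-1/0 at the end of a hand, and the values of `alpha`, `gamma`, and `epsilon` are all assumptions chosen for illustration; the assignment does not prescribe them.

```python
import random
from collections import defaultdict

ACTIONS = ("hit", "stay")

def choose_action(Q, state, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(Q, state, action, reward, next_state, done, alpha=0.1, gamma=1.0):
    """Apply one Q-Learning step:

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))

    When the hand is over (done=True) there is no next state, so the
    bootstrap term is taken to be 0.
    """
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return Q[(state, action)]

# Q-table keyed by (state, action); unseen entries default to 0.0.
Q = defaultdict(float)
```

During training, each decision in a hand would call `choose_action`, observe the outcome from the game, and then call `q_update`. Once training is done, playing with `epsilon=0` follows the learned greedy policy.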

Submission

- Code for Q-Learning
- Evaluation of your Q-Learning (reward curve during learning, reported every 50 hands) in a PDF file
- Learned Q-table
- Complete code for playing with your learned policy
- Result summary of 1,000 hands
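For the reward curve, one simple option is to average the per-hand rewards in consecutive blocks of 50 hands and plot one point per block. The helper below is a sketch under that reading of "every 50 hands"; the name `block_averages` is hypothetical, and a cumulative or moving average would be an equally reasonable choice.

```python
def block_averages(rewards, block=50):
    """Average rewards over consecutive blocks of `block` hands.

    The final block may be shorter if len(rewards) is not a multiple
    of `block`; it is averaged over the hands it actually contains.
    """
    averages = []
    for start in range(0, len(rewards), block):
        chunk = rewards[start:start + block]
        averages.append(sum(chunk) / len(chunk))
    return averages
```

The resulting list can be passed to any plotting tool, with the block index (or block index times 50) on the horizontal axis.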
