Spring 2022 CS 444

Assignment 5: Deep Reinforcement Learning

Due date: Wednesday, May 4th, 11:59:59PM

In this assignment, you will implement the famous Deep Q-Network (DQN) and its successor Double DQN on the game of Breakout using the OpenAI Gym. You will also implement a DQN agent that uses an LSTM to encode previous observations, rather than using past frames as history. The goals of this assignment are to (1) understand how deep reinforcement learning works when interacting with the pixel-level information of an environment and (2) implement a recurrent state to encode and maintain history.

Download the starting code here.

The top-level notebook (MP5.ipynb) will guide you through all the steps of the DQN. You will mainly implement the training of the Agent in the agent.py file for DQN, and in agent_double.py for Double DQN. We provide you with the neural network. Do NOT change the architecture of the neural network (for consistency of grading). Because of computational constraints, we only expect you to reach a mean score of 10 (which should take around 2000 episodes). config.py contains most of the hyperparameters. You may play around with these parameters if you want, but the provided parameters will be enough to reach the desired score.
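The difference between the DQN and Double DQN update targets can be sketched on a single transition. The following is a framework-free illustration in plain Python, not the code expected in agent.py or agent_double.py: the function names, the gamma default, and the list-based Q-values are assumptions made for the example. In the assignment, the same arithmetic would be applied to batched tensors.

```python
def dqn_target(reward, done, next_q_target, gamma=0.99):
    """DQN target: bootstrap from the target network's largest Q-value."""
    if done:
        # No bootstrapping past a terminal state.
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, done, next_q_policy, next_q_target, gamma=0.99):
    """Double DQN target: the policy net picks the action, the target net scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_policy)), key=lambda a: next_q_policy[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical next-state Q-values over four actions (illustrative numbers only).
q_policy = [1.0, 3.0, 2.0, 0.5]
q_target = [2.0, 1.5, 4.0, 0.0]

print(dqn_target(1.0, False, q_target))
print(double_dqn_target(1.0, False, q_policy, q_target))
```

With these made-up values the two targets differ because the policy network prefers action 1 while the target network's largest value sits at action 2. Decoupling action selection from action evaluation in this way is what Double DQN uses to reduce overestimation.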

Once you have implemented DQN and double DQN, you are required to implement a DQN agent that uses LSTM with minimal architectural changes. Unlike the previous case, this agent only sees current frames as observations and there is no explicit history. Training code will remain nearly the same as that of DQN, but with small modifications. You are expected to reach a mean score of 8.
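To make the contrast concrete, the explicit history used by the first two agents can be pictured as a sliding window over recent frames. The sketch below is only an illustration: the class name FrameHistory and the window length of 4 are assumptions (check config.py for the value the starter code uses), and strings stand in for image arrays. The LSTM agent drops this buffer and receives only the newest frame, so any information about the past has to be carried in the recurrent hidden state instead.

```python
from collections import deque


class FrameHistory:
    """Keeps the most recent `length` frames as an explicit history (hypothetical helper)."""

    def __init__(self, length=4):
        self.length = length
        self.frames = deque(maxlen=length)

    def reset(self, first_frame):
        # At the start of an episode, fill the whole window with the first frame.
        self.frames.clear()
        for _ in range(self.length):
            self.frames.append(first_frame)
        return list(self.frames)

    def step(self, frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(frame)
        return list(self.frames)


history = FrameHistory(length=4)
state = history.reset("f0")
state = history.step("f1")
state = history.step("f2")
print(state)
```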

Note that, in the terminology of the ipython notebook, a single episode is one game played by the agent until it loses all of its lives (in this case, your agent has 5 lives). In the paper, however, an episode refers to almost 30 minutes of training on the GPU, and such training is not feasible for us.

We will provide a more thorough table of expected rewards vs. number of episodes on Campuswire to help with your debugging. Your goal is to have either agent.py or agent_double.py reach an evaluation score of 10. To have a fair comparison in the report, we ask you to run both files for the same number of episodes, but only one model is required to reach the evaluation score of 10. For DQN with LSTM, you are required to reach a mean score of 8.

We recommend that you look at the following links:

Official DQN Pytorch Tutorial
Official DQN paper
Official Double DQN paper
DQN Tutorial on Medium (Double DQN is the target DQN variant)

We highly recommend that you understand the Official DQN Pytorch tutorial before starting this assignment. This will give you a great starting point to implement DQN and Double DQN, as the tutorial implements a version of double DQN for cartpole! However, we expect you to follow our code instructions and implement code in our format. Submissions whose code does not follow our format will receive a zero.

This is a computationally expensive assignment. Expect your code to run for at least 4 hours to complete 2000 episodes. You can stop training early if you reach a mean score of 10 in the game. As mentioned, we will be providing some initial expectations of score values with respect to episodes on Campuswire.
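One way to implement the early stop is to track a running mean over the most recent episodes. The sketch below assumes a window of 100 episodes and a helper named reached_target; both are illustrative choices, not values taken from the starter code, and the scores in the loop are synthetic placeholders rather than real training results.

```python
from collections import deque


def reached_target(recent_scores, target_mean=10.0):
    """True once the window is full and its mean meets the target."""
    window_full = len(recent_scores) == recent_scores.maxlen
    return window_full and sum(recent_scores) / len(recent_scores) >= target_mean


recent_scores = deque(maxlen=100)  # window size is an assumption
stopped_at = None

# Stand-in for the training loop, capped at 2000 episodes.
for episode in range(1, 2001):
    score = episode / 100.0  # synthetic placeholder for one episode's score
    recent_scores.append(score)
    if reached_target(recent_scores):
        stopped_at = episode
        break

print("stopped at episode", stopped_at)
```

Requiring the window to be full before stopping avoids ending training on a few lucky early episodes.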

This assignment requires a GPU, so use your Google Cloud credits (Google Colab could work for this assignment as well).

Extra Credit

Train a DQN agent for one or more additional Atari games from OpenAI gym and report on any implementation/hyperparameter changes you had to make, and your agent's performance.
Implement policy gradient training or another advanced RL method for Breakout or another Atari game and compare performance (including convergence speed) to your DQN method. You need to write your own code from scratch, not train an off-the-shelf method.

Environment Setup

The assignment is given to you in the MP5.ipynb file. If you are using a local machine, ensure that ipython is installed (https://ipython.org/install.html). You may then navigate to the assignment directory in a terminal and start a local ipython server using the jupyter notebook command. Instructions to install dependencies are provided at the top of the notebook. Please use an environment with Python 3.7. These instructions should work for local machines, Google Cloud, and Google Colab. We have tested this assignment on pytorch version 1.7, so please install this version if you run into dependency issues.

If you will be working on the assignment on a local machine then you will need a python environment set up with the appropriate packages. We suggest that you use Conda to manage python package dependencies (https://conda.io/docs/user-guide/getting-started.html).

Unless you have a machine with a GPU, running this assignment on your local machine will be very slow and is not recommended.
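Before starting a long training run, it can help to confirm that the environment matches the versions mentioned above. The following is a minimal sketch, assuming a hypothetical helper named check_environment; it only reports mismatches and does not install anything.

```python
import sys
from importlib import util


def check_environment(expected_python=(3, 7), expected_torch="1.7."):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    found = sys.version_info[:2]
    if found != expected_python:
        problems.append(
            f"Python {found[0]}.{found[1]} found, "
            f"expected {expected_python[0]}.{expected_python[1]}"
        )
    if util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
    else:
        import torch

        if not torch.__version__.startswith(expected_torch):
            problems.append(f"PyTorch {torch.__version__} found, expected 1.7.x")
        if not torch.cuda.is_available():
            problems.append("No CUDA GPU visible; training will be very slow")
    return problems


for problem in check_environment():
    print("warning:", problem)
```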

Submission Instructions

This is your last assignment, so feel free to use up your remaining late days if you so choose!

All of your code (python files and ipynb file) in a single ZIP file. The filename should be netid1_netid2_mp5_code.zip.
Upload your policy net model (in .pth format) as a separate file.
Your ipython notebooks with output cells converted to PDF format. The filename should be netid1_netid2_mp5_output.pdf.
A brief report in PDF format using this template. The filename you submit should be netid1_netid2_mp5_report.pdf.

Please refer to course policies on collaborations, late submission, etc.