Fall 2018 CS498DL

Assignment 5: Deep Reinforcement Learning

Due date: Thursday, December 20th, 11:59:59PM -- No late submissions accepted!


In this assignment, you will implement the famous Deep Q-Network (DQN) on the game of Breakout using OpenAI Gym. The goal of this assignment is to understand how reinforcement learning with deep neural networks works when the agent interacts with an environment through pixel-level information.

Download the starting code here.

The top-level notebook (MP5.ipynb) will guide you through all the steps of the DQN. You will mainly implement the training of the Agent in the agent.py file. We provide you with the neural network. Do NOT change the architecture of the neural network (for consistency of grading). We follow the high-level concepts of the DQN paper, but due to computational constraints, we only expect you to reach a mean score of 10 after training for 5000 episodes.
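As a rough illustration of the update that DQN training revolves around, the following framework-free sketch computes the one-step Q-learning target, y = r + gamma * max over a' of Q_target(s', a'), with the bootstrap term dropped for terminal transitions. The names `td_targets` and `huber_loss`, the default discount factor, and the use of a Huber loss are assumptions made for illustration only; they are not taken from the starter code, where the same computation would be done on PyTorch tensors inside agent.py.

```python
from typing import List, Sequence


def td_targets(
    rewards: Sequence[float],
    next_q_values: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,  # assumed discount factor, not specified by the assignment
) -> List[float]:
    """Return the one-step Q-learning target for each transition in a batch.

    next_q_values[i] holds the target network's Q-values for every action
    in the next state of transition i.
    """
    targets = []
    for reward, q_next, done in zip(rewards, next_q_values, dones):
        # No bootstrapping from the next state when the transition is terminal.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(reward + bootstrap)
    return targets


def huber_loss(prediction: float, target: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic for small errors, linear for large ones."""
    error = abs(prediction - target)
    if error <= delta:
        return 0.5 * error * error
    return delta * (error - 0.5 * delta)


if __name__ == "__main__":
    # Two transitions: the first is non-terminal, the second ends the episode.
    batch_targets = td_targets(
        rewards=[1.0, 0.0],
        next_q_values=[[0.5, 2.0], [1.0, 3.0]],
        dones=[False, True],
        gamma=0.5,
    )
    print(batch_targets)  # [2.0, 0.0]
```

In an actual training step, the loss between these targets and the policy network's Q-values for the actions that were taken is what gets backpropagated; the target network's values are treated as constants.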

Note that, in the terminology of the IPython notebook, a single episode is one game played by the agent until it loses all of its lives (in this case, your agent has 5 lives). In the paper, however, an episode refers to almost 30 minutes of training on the GPU, and such training is not feasible for us. We will soon provide a more thorough table of expected rewards vs. number of episodes on Piazza to help with your debugging.

We recommend that you look at the following links provided.

We highly recommend that you work through the official PyTorch DQN tutorial before starting this assignment.

This is a computationally expensive assignment. Expect your code to run for at least 13-15 hours to complete 5000 episodes. You can stop training early if you reach a mean score of 10 in the game. As mentioned, we will soon post some initial expectations of score values with respect to episodes on Piazza, so stay tuned, and in the meantime please get started.
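One way to decide when to stop early is to track the mean score over the most recent episodes. The sketch below is a minimal example of that bookkeeping. The class name `ScoreTracker` and the window size are hypothetical; this page does not say over how many episodes the mean score of 10 is measured, so check the notebook for the convention actually used.

```python
from collections import deque


class ScoreTracker:
    """Keeps the scores of the most recent episodes and reports their mean."""

    def __init__(self, target_mean: float = 10.0, window: int = 100) -> None:
        # window=100 is an assumed value, not one given by the assignment.
        self.target_mean = target_mean
        self.scores = deque(maxlen=window)

    def add(self, episode_score: float) -> None:
        # An episode here is one full game, played until all lives are lost.
        self.scores.append(episode_score)

    def mean(self) -> float:
        return sum(self.scores) / len(self.scores) if self.scores else 0.0

    def reached_target(self) -> bool:
        # Require a full window so that a few lucky early games do not
        # trigger an early stop.
        window_full = len(self.scores) == self.scores.maxlen
        return window_full and self.mean() >= self.target_mean


if __name__ == "__main__":
    tracker = ScoreTracker(target_mean=10.0, window=3)
    for score in [12, 9, 9]:
        tracker.add(score)
    print(tracker.mean(), tracker.reached_target())  # 10.0 True
```

Calling `add` once at the end of each episode and checking `reached_target` before starting the next one is enough to break out of the training loop.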

This assignment requires a GPU, so use your Google Cloud credits.

Extra Credit

  • Train a DQN agent for one or more additional Atari games from OpenAI gym and report on any implementation/hyperparameter changes you had to make, and your agent's performance.

  • Implement policy gradient training or another advanced RL method for Breakout or another Atari game and compare performance (including convergence speed) to your DQN method. You need to write your own code from scratch, not train an off-the-shelf method.

Environment Setup (Local)

If you will be working on the assignment on a local machine, you will need a Python environment set up with the appropriate packages. We suggest that you use Conda to manage Python package dependencies (https://conda.io/docs/user-guide/getting-started.html).

Unless you have a machine with a GPU, running this assignment on your local machine will be very slow and is not recommended.

IPython

The assignment is given to you in the MP5.ipynb file. If you are using a local machine, ensure that IPython is installed (https://ipython.org/install.html). You may then navigate to the assignment directory in a terminal and start a local notebook server using the jupyter notebook command.

Submission Instructions

Late submissions will not be accepted!

  1. All of your code (Python files and ipynb file) in a single ZIP file. The filename should be netid_mp5_code.zip.
  2. Your policy net model (in .pth format), uploaded as a separate file.
  3. Your IPython notebook with output cells, converted to PDF format. The filename should be netid_mp5_output.pdf.
  4. A brief report in PDF format using this template. The filename should be netid_mp5_report.pdf.

Please refer to course policies on collaborations, late submission, and extension requests.