Fall 2018 CS498DL

Assignment 3 Part 2: YOLO Object Detection on PASCAL VOC

Due date: Tuesday, November 13th, 11:59:59PM

START EARLY!!! If you haven't started Part 1 yet, you might be in trouble. YOU HAVE BEEN WARNED.

In this part of the assignment, you will implement a YOLO-like object detector on the PASCAL VOC 2007 dataset to produce results like those in the above image. The goal is to help you understand the fundamentals of training an object detector, gain experience with PyTorch, and learn how to use pretrained models provided by the deep learning community.

Download the starting code here.

The top-level notebook (MP3_P2.ipynb) will guide you through all the steps. You will mainly focus on implementing the YOLO loss function in the yolo_loss.py file. A pre-trained network is already provided as the model. Its structure is inspired by DetNet, but you are not required to understand it. In principle, it can be replaced by a different network architecture and trained from scratch, but to achieve good accuracy with a minimum of computational expense and tuning, you should stick to the provided one.

The instructions in the yolo_loss.py file should be sufficient to guide you through the assignment, but it will be really helpful to understand the big picture of how YOLO works and how the loss function is defined.
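As part of that big picture, the YOLO paper uses intersection over union (IoU) both to decide which predicted box in a grid cell is responsible for an object and as the target for the confidence score. The following is a minimal sketch of IoU for a single pair of boxes. The function name box_iou and the (x1, y1, x2, y2) corner format are assumptions made for illustration; the helpers in yolo_loss.py may use a different box format and operate on batched tensors.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes.

    Assumed format (hypothetical, not taken from the starter code):
    each box is (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle, clamped at zero
    # so that disjoint boxes get an empty intersection.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Guard against degenerate boxes with zero area.
    return intersection / union if union > 0 else 0.0
```

For example, box_iou((0, 0, 2, 2), (1, 1, 3, 3)) has an intersection of 1 and a union of 7, so the result is 1/7.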

As you start this part, you will realize that this is a more computationally intensive assignment than what you are used to. We will soon be providing some initial expectations of mAP values as a function of epoch so you can get an early idea whether your implementation works without waiting a long time for training to converge.
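For reference, mAP is the mean of the per-class average precision (AP) over the 20 PASCAL VOC classes. The sketch below shows the 11-point interpolated AP used by the VOC 2007 protocol. The function name and the list-based inputs are assumptions for illustration, and the evaluation code shipped with the assignment may compute AP differently.

```python
def voc07_average_precision(recalls, precisions):
    """11-point interpolated AP (VOC 2007 style).

    recalls and precisions are parallel lists giving the cumulative
    recall and precision after each detection, with detections sorted
    by decreasing confidence. Names and input format are hypothetical.
    """
    total = 0.0
    for i in range(11):
        threshold = i / 10.0  # 0.0, 0.1, ..., 1.0
        # Best precision achieved at any recall at or above the threshold.
        candidates = [p for r, p in zip(recalls, precisions) if r >= threshold]
        total += max(candidates) if candidates else 0.0
    return total / 11.0


def mean_average_precision(per_class_ap):
    """mAP is the plain average of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)
```

A class whose only detection is correct but which recovers just half of its ground-truth objects (recall 0.5, precision 1.0) scores 6/11, since only the six thresholds from 0.0 to 0.5 are reached.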

You will need a GPU for this assignment, hence you should use the provided Google Cloud credits.

DISCLAIMER: This is a pilot assignment, so if something goes wrong because of us, Keep Calm and Post on Piazza and we will sort it out.

Useful Resources

The following resources are useful for understanding YOLO in detail:

Lecture 9 slides 37-41 (recommended)
YOLO original paper (recommended)
Great post about YOLO on Medium
Differences between YOLO, YOLOv2 and YOLOv3
Great explanation of the YOLO loss function (recommended)
YOLO on SNL :)

For bonus points, feel free to show results of your detector on selected interesting keyframes.

Environment Setup (Local)

If you will be working on the assignment on a local machine, you will need a Python environment set up with the appropriate packages. We suggest that you use Conda to manage Python package dependencies (https://conda.io/docs/user-guide/getting-started.html).

Unless you have a machine with a GPU, running this assignment on your local machine will be very slow and is not recommended. Please use Google Cloud for this assignment.
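Before starting a long training run, it can help to confirm that PyTorch actually sees a GPU. The following is a small sketch using torch.cuda.is_available(); the helper name describe_device is hypothetical and is not part of the starter code.

```python
def describe_device():
    """Report whether PyTorch can see a CUDA GPU (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"

    if torch.cuda.is_available():
        # Index 0 is the first visible CUDA device.
        return "GPU available: " + torch.cuda.get_device_name(0)
    return "No GPU visible to PyTorch; training would run on the CPU"


print(describe_device())
```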

Be careful using GOOGLE CLOUD!! Do not use all your credits! We will soon post on Piazza how long the training is expected to take on the Cloud, but initial estimates suggest that fully training a model should take around 7-8 hours.

Data Setup (Local)

Once you have downloaded and extracted the zip file, go to the Assignment3_p2 folder and execute the provided download_data.sh script:

    
    cd Assignment3_p2/
    ./download_data.sh

IPython

The assignment is given to you in the MP3_P2.ipynb file. If you are using a local machine, ensure that IPython is installed (https://ipython.org/install.html). You may then navigate to the assignment directory in a terminal and start a local notebook server using the jupyter notebook command.

Submission Instructions

This part of the assignment is due on Compass together with Part 1 on the due date specified above. You must submit the following for this part:

Upload your output file to the Kaggle competition for the YOLO detector. The Part 2 materials have been updated with Kaggle submission code.
All of your code (python files and ipynb file) in a single ZIP file. The filename should be netid_mp3_part2_code.zip.
Your ipython notebook with output cells converted to PDF format. The filename should be netid_mp3_part2_output.pdf.
A brief report for both parts in PDF format using this template. The filename should be netid_mp3_report.pdf.
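
The code archive can be built by hand, but a short script reduces the chance of a misnamed or incomplete ZIP file. The sketch below collects the .py and .ipynb files under a directory into netid_mp3_part2_code.zip. The function name make_code_zip and the choice to search subdirectories recursively are assumptions, so check the archive contents before uploading.

```python
import zipfile
from pathlib import Path


def make_code_zip(netid, source_dir="."):
    """Bundle .py and .ipynb files into <netid>_mp3_part2_code.zip.

    Hypothetical helper: it searches source_dir recursively, so it may
    pick up files from subfolders that you do not intend to submit.
    """
    source = Path(source_dir)
    zip_path = source / f"{netid}_mp3_part2_code.zip"

    # Collect the file list first; skip Jupyter's autosave copies.
    code_files = sorted(
        p for p in source.rglob("*")
        if p.is_file()
        and p.suffix in {".py", ".ipynb"}
        and ".ipynb_checkpoints" not in p.parts
    )

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in code_files:
            # Store paths relative to source_dir inside the archive.
            archive.write(path, arcname=path.relative_to(source).as_posix())
    return zip_path
```

Calling make_code_zip("netid", "Assignment3_p2") with your own NetID would write the archive into that folder and return its path.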

Please refer to course policies on collaborations, late submission, and extension requests.