Spring 2021 CS498DL

Assignment 3 Part 2: YOLO Object Detection on PASCAL VOC

Due date: Thursday, April 8th, 11:59:59PM

Created by Daniel McKee and Maghav Kumar. Updated by Aiyu Cui and Adam Stewart.

Part 2 Task

In this part of the assignment you will implement a YOLO-like object detector on the PASCAL VOC 2007 dataset to produce results like those in the image above. The goal is to help you understand the fundamentals of training an object detector, gain experience with PyTorch, and learn how to use pre-trained models provided by the deep learning community.

How to start

Download the starting code here.

The top-level notebook (MP3_P2.ipynb) will guide you through all of the steps. You will mainly focus on implementing the YOLO loss function in the yolo_loss.py file. You will be provided with a pre-trained network structure for the model. The network structure is inspired by DetNet; however, you are not required to understand it. In principle, it can be replaced by a different network architecture and trained from scratch, but to achieve good accuracy with minimal computational expense and tuning, you should stick to the provided one.
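A YOLO-style loss typically needs the intersection over union (IoU) between a predicted box and a ground-truth box, for example to decide which predicted box is responsible for an object. The following is a minimal pure-Python sketch of that building block, assuming boxes in (x1, y1, x2, y2) corner format. The function name box_iou is hypothetical, and the helper expected in yolo_loss.py may use a different box format and operate on batched PyTorch tensors.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2).

    Assumes x1 <= x2 and y1 <= y2 for both boxes (an assumption of this sketch).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle, clamped to zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate zero-area boxes.
    return inter / union if union > 0 else 0.0
```

For two 2x2 boxes offset by one unit in each direction, the overlap is 1 and the union is 7, so box_iou((0, 0, 2, 2), (1, 1, 3, 3)) is 1/7.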

As you start this part, you will realize that this assignment is more computationally intensive than what you are used to. To get an idea of whether your implementation works without waiting a long time for training to converge, here are some average values to expect:

Epoch	Test loss
1	5.25
2	4.62
5	3.66
10	3.21

Epoch	mAP
5	0.0000
10	0.2412
15	0.3754
20	0.4096
25	0.4823
30	0.4892
35	0.5092
40	0.5049
45	0.5109
50	0.5088
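The mAP column is the mean, over object classes, of the per-class average precision (AP). As a rough illustration of how such a number can be computed, here is a sketch that assumes the PASCAL VOC 2007 convention of 11-point interpolated AP. The evaluation code shipped with the assignment may compute AP differently, and the function names below are hypothetical.

```python
def voc07_average_precision(recalls, precisions):
    """11-point interpolated AP for one class.

    recalls and precisions are parallel lists with one entry per detection,
    taken after sorting detections by decreasing confidence.
    """
    if len(recalls) != len(precisions):
        raise ValueError("recalls and precisions must have the same length")

    total = 0.0
    for i in range(11):
        threshold = i / 10  # recall levels 0.0, 0.1, ..., 1.0
        # Interpolated precision: best precision at any recall >= threshold.
        candidates = [p for r, p in zip(recalls, precisions) if r >= threshold]
        total += max(candidates) if candidates else 0.0
    return total / 11


def mean_average_precision(ap_per_class):
    """Mean of per-class AP values given as a {class_name: ap} dict."""
    return sum(ap_per_class.values()) / len(ap_per_class)
```

With this definition, a class with no detections reaching a given recall level contributes zero at that level, which is one reason mAP can stay near zero in early epochs even while the loss is already decreasing.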

To train this model in a reasonable amount of time, you'll need to use a GPU. You can use your personal GPU, Google Colab, or Google Cloud Platform.

Environment Setup (Local)

If you work on the assignment on a local machine, you will need a Python environment set up with the appropriate packages. We suggest that you use Conda to manage Python package dependencies (https://conda.io/docs/user-guide/getting-started.html).

Unless you have a machine with a GPU, running this assignment on your local machine will be very slow and is not recommended. Please use Google Colab or Google Cloud Platform for this assignment.

Be careful using GOOGLE CLOUD PLATFORM!! Do not use all of your credits! Training a model fully can take 7-8 hours.

Data Setup (Local)

Once you have downloaded the zip file, go to the assignment3_part2 directory and execute the download_data script provided:

sh download_data.sh

IPython

The assignment is given to you in the MP3_P2.ipynb file. If you are using a local machine, ensure that IPython is installed (https://ipython.org/install.html). You may then navigate to the assignment directory in the terminal and start a local IPython server using the jupyter notebook command.

Useful Resources

The instructions in the yolo_loss.py file should be sufficient to guide you through the assignment, but it will be really helpful to understand the big picture of how YOLO works and how the loss function is defined.
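For orientation, the loss defined in the original YOLO paper has the form below. This is a reference sketch only: the version requested in yolo_loss.py may differ in details such as grid size, number of boxes per cell, or weighting, so the instructions in that file take precedence. Here S is the grid size, B is the number of boxes per cell, hatted symbols are the counterparts of the unhatted ones (one side predicted, the other ground truth), the indicator with superscript obj is 1 when box j in cell i is responsible for an object, and the paper sets the coordinate weight to 5 and the no-object weight to 0.5.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
         + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
    \sum_{c \,\in\, \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The five terms are, in order: box center regression, box size regression (on square roots, so that errors on small boxes weigh more), confidence for boxes containing an object, confidence for boxes without an object, and per-cell classification.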

The following resources are useful for understanding YOLO in detail:

Lecture 9 (recommended)
Original YOLO paper (recommended)
Great post about YOLO on Medium
Differences between YOLO, YOLOv2 and YOLOv3
Great explanation of the YOLO loss function (recommended)
- For bonus points, feel free to show results of your detector on selected interesting keyframes.

Submission Instructions

This part of the assignment is due at the same time as Part 1 and all the files need to be uploaded to the same Compass submission by the same partner (the code and notebook files are separate from Part 1, but there is only one report for both).

Upload your output file to the Kaggle competition for the YOLO detector.
All of your code (Python files and ipynb file) in a single ZIP file. The filename should be netid_mp3_part2_code.zip. Do NOT include datasets in your zip file.
Your IPython notebook with output cells converted to PDF format. The filename should be netid_mp3_part2_output.pdf.
A brief report for both parts in PDF format using this template. The filename should be netid_mp3_report.pdf.
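One way to build the code ZIP without accidentally including datasets is to collect only .py and .ipynb files. The sketch below uses Python's standard zipfile and pathlib modules. The function name make_code_zip and its arguments are hypothetical, and you should still inspect the resulting archive before uploading.

```python
import zipfile
from pathlib import Path

# Only these file types go into the archive; datasets are skipped by design.
CODE_SUFFIXES = {".py", ".ipynb"}


def make_code_zip(src_dir, netid, out_dir="."):
    """Zip all .py and .ipynb files under src_dir as <netid>_mp3_part2_code.zip."""
    src = Path(src_dir)
    out_path = Path(out_dir) / f"{netid}_mp3_part2_code.zip"

    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if not path.is_file() or path.suffix not in CODE_SUFFIXES:
                continue
            # Skip Jupyter's autosave copies of the notebook.
            if ".ipynb_checkpoints" in path.parts:
                continue
            # Store paths relative to src_dir so the archive has no absolute paths.
            zf.write(path, path.relative_to(src).as_posix())

    return out_path
```

For example, calling make_code_zip("assignment3_part2", "netid") from the directory that contains assignment3_part2 would write netid_mp3_part2_code.zip to the current directory, with netid replaced by your own NetID.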

Please refer to course policies on collaborations, late submission, and extension requests.