Spring 2022 CS 444

Assignment 3 Part 2: YOLO Object Detection on PASCAL VOC

Due date: Tuesday, April 5th, 11:59:59PM


Part 2 Task

In this part of the assignment you will implement a YOLO-like object detector on the PASCAL VOC 2007 dataset to produce detection results like those in the image above. The goal is to help you understand the fundamentals of training an object detector, gain experience with PyTorch, and learn how to use pre-trained models provided by the deep learning community.

How to start

Download the starting code here.

The top-level notebook (MP3_P2.ipynb) will guide you through all of the steps. You will mainly focus on implementing the YOLO loss function in the yolo_loss.py file. You will be provided with a pre-trained network for the model. The network structure is inspired by DetNet, but you are not required to understand it. In principle, it can be replaced by a different architecture and trained from scratch, but to achieve good accuracy with minimal computational expense and tuning, you should stick to the provided one.
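YOLO-style losses rely on the intersection over union (IoU) between a predicted box and a ground-truth box, for example to decide which predicted box in a grid cell is responsible for an object. The following is a minimal plain-Python sketch of that computation. The function name box_iou and the (x1, y1, x2, y2) corner format are assumptions made for illustration; the helpers and tensor layout in the provided yolo_loss.py may differ, so follow the instructions in that file.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
    This corner format is an assumption for illustration only.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when the boxes do not overlap.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Guard against degenerate (zero-area) boxes.
    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

In a PyTorch loss you would normally compute this for whole batches of boxes with tensor operations rather than one pair at a time; the arithmetic is the same.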

As you start this part, you will realize that this assignment is more computationally intensive than what you are used to. To get an idea of whether your implementation works without waiting a long time for training to converge, here are some average mAP values to expect at different epochs:

Epoch mAP
5 0.2013
10 0.2545
15 0.2749
20 0.2898
25 0.3069
30 0.3355
35 0.3402
40 0.3347
45 0.2588
50 0.3836
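One way to use the table during training is to compare the mAP you observe at a listed epoch with the reference value. The sketch below does this; the function name compare_to_reference and the 0.05 tolerance are illustrative assumptions, not a course-specified threshold. The reference values themselves fluctuate (note the dip at epoch 45), so treat the comparison as a rough sanity check only.

```python
# Reference mAP values copied from the table above
# (the epoch-15 entry is read as 0.2749).
REFERENCE_MAP = {
    5: 0.2013, 10: 0.2545, 15: 0.2749, 20: 0.2898, 25: 0.3069,
    30: 0.3355, 35: 0.3402, 40: 0.3347, 45: 0.2588, 50: 0.3836,
}


def compare_to_reference(epoch, observed_map, tolerance=0.05):
    """Return (difference, on_track) for an epoch listed in the table.

    tolerance is an arbitrary illustrative margin, not an official threshold.
    """
    if epoch not in REFERENCE_MAP:
        raise ValueError(f"no reference value for epoch {epoch}")
    difference = observed_map - REFERENCE_MAP[epoch]
    # "On track" means not more than `tolerance` below the reference value.
    return difference, difference >= -tolerance


diff, on_track = compare_to_reference(10, 0.2400)
print(f"epoch 10: {diff:+.4f} vs. reference, on track: {on_track}")
```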

To train this model in a reasonable amount of time, you'll need to use a GPU. This can either be your personal GPU, Google Colab, or Google Cloud Platform.
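Before starting a long training run, it can help to confirm that PyTorch actually sees a GPU. The snippet below is a small sketch of such a check; the function name describe_device is hypothetical, and the check falls back gracefully if PyTorch is not installed.

```python
def describe_device():
    """Report which device PyTorch would use, without failing if PyTorch is missing."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"

    if torch.cuda.is_available():
        # Name of the first visible CUDA device.
        return "cuda: " + torch.cuda.get_device_name(0)
    return "cpu"


print(describe_device())
```

If this reports "cpu" on Google Colab, the runtime type is probably not set to use a GPU.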

Environment Setup (Local)

If you work on the assignment on a local machine, you will need a Python environment set up with the appropriate packages. We suggest that you use Conda to manage Python package dependencies (https://conda.io/docs/user-guide/getting-started.html).

Unless you have a machine with a GPU, running this assignment locally will be very slow and is not recommended. Please use Google Colab or Google Cloud Platform instead. Instructions on setting up VM instances can be found here.

Be careful when using Google Cloud Platform! Do not use all of your credits. A fully trained model can take 7-8 hours to train.

Data Setup (Local)

Once you have downloaded the zip file, go to the assignment3_part2 directory and execute the download_data script provided:
sh download_data.sh

IPython

The assignment is given to you in the MP3_P2.ipynb file. If you are using a local machine, ensure that IPython is installed (https://ipython.org/install.html). You may then navigate to the assignment directory in the terminal and start a local IPython server using the jupyter notebook command.

Useful Resources

The instructions in the yolo_loss.py file should be sufficient to guide you through the assignment, but it will be really helpful to understand the big picture of how YOLO works and how the loss function is defined.
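For orientation, the loss from the original YOLO paper can be written as below. This is the published formulation, not necessarily the exact variant expected in this assignment, so treat yolo_loss.py as the authority. Here S is the grid size, B is the number of boxes predicted per cell, hatted symbols are the counterparts of the unhatted ones (prediction versus ground truth), C is the box confidence, and p(c) is the class probability. The indicator 1^obj_ij is 1 when box j in cell i is responsible for an object, 1^noobj_ij is its complement, and 1^obj_i is 1 when an object appears in cell i. The paper uses lambda_coord = 5 and lambda_noobj = 0.5.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
 &+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
           + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
 &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
      \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The first two terms penalize box position and size errors, the next two penalize confidence errors for boxes with and without objects, and the last term penalizes classification errors in cells that contain an object.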

The following resources are useful for understanding YOLO in detail:

Submission Instructions

This part of the assignment is due at the same time as Part 1, and all files must be uploaded to the same Compass submission by the same partner. The code and notebook files are separate from Part 1, but there is only one report for both parts.

  1. Upload your output file to the Kaggle competition for the YOLO detector.
  2. All of your code (python files and ipynb file) in a single ZIP file. The filename should be netid_mp3_part2_code.zip. Do NOT include datasets in your zip file.
  3. Your IPython notebook, with output cells, converted to PDF format. The filename should be netid_mp3_part2_output.pdf.
  4. A brief report for both parts in PDF format using this template. The filename should be netid_mp3_report.pdf.

Please refer to course policies on collaborations, late submission, and extension requests.