Fall 2020 CS498DL

Assignment 3 Part 2: YOLO Object Detection on PASCAL VOC

Due date: Thursday, November 5th, 11:59:59PM

Created by Daniel McKee and Maghav Kumar. Updated by Aiyu Cui and Jeffrey Zhang.


Part 2 Task

In this part of the assignment you will implement a YOLO-like object detector on the PASCAL VOC 2007 dataset to produce results like those in the image above. The goal is to help you understand the fundamentals of training an object detector, gain experience with PyTorch, and learn how to use pretrained models provided by the deep learning community.

How to start

Download the starting code here.

The top-level notebook (MP3_P2.ipynb) will guide you through all the steps. Your main task is to implement the YOLO loss function in the yolo_loss.py file. A pre-trained network is already provided for the model. Its structure is inspired by DetNet, but you are not required to understand it. In principle, it can be replaced by a different network architecture and trained from scratch, but to achieve good accuracy with minimal computational expense and tuning, you should stick to the provided one.

We also provide yolo_loss_debug_tool.ipynb to help you debug your loss implementation.

As you start this part, you will realize that this is a more computationally intensive assignment than what you are used to. We will soon be providing some initial expectations of mAP values as a function of epoch so you can get an early idea whether your implementation works without waiting a long time for training to converge.

You will need a GPU for this assignment, hence you should use the provided Google Cloud credits.

Environment Setup (Local)

If you will be working on the assignment on a local machine, you will need a Python environment set up with the appropriate packages. We suggest that you use Conda to manage Python package dependencies (https://conda.io/docs/user-guide/getting-started.html).

Unless you have a machine with a GPU, running this assignment on your local machine will be very slow and is not recommended. Please use Google Cloud for this assignment.
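
Before starting a long training run, it can help to confirm that PyTorch actually sees a GPU. The following is a minimal sketch, not part of the starter code: pick_device is a hypothetical helper name, and the fallback to "cpu" when PyTorch is not installed is an assumption made so the snippet runs anywhere.

```python
import importlib.util


def pick_device():
    """Return "cuda" if PyTorch is installed and sees a GPU, else "cpu".

    Hypothetical helper for illustration; not part of the assignment code.
    """
    # If PyTorch is not installed at all, there is nothing to query.
    if importlib.util.find_spec("torch") is None:
        return "cpu"

    import torch

    # torch.cuda.is_available() reports whether a usable CUDA GPU was found.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Training would run on:", pick_device())
```

If this prints "cpu" on a machine where you expected a GPU, training will likely be very slow, and it is worth fixing the environment before launching the notebook.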

Be careful when using Google Cloud: do not use up all your credits! We will soon post on Piazza how long training is expected to take on the Cloud, but initial estimates suggest that fully training a model should take around 7-8 hours.

Data Setup (Local)

Once you have downloaded the zip file, go to the Assignment3 folder and run the provided download_data script:

    ./download_data.sh

IPython

The assignment is given to you in the MP3_P2.ipynb file. If you are using a local machine, ensure that IPython is installed (https://ipython.org/install.html). You may then navigate to the assignment directory in a terminal and start a local notebook server using the jupyter notebook command.

Useful Resources

The instructions in the yolo_loss.py file should be sufficient to guide you through the assignment, but it will be really helpful to understand the big picture of how YOLO works and how the loss function is defined.
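
One building block that appears throughout YOLO-style losses is the intersection over union (IoU) between a predicted box and a ground-truth box. The sketch below illustrates the idea in plain Python for a single pair of boxes. It is not the interface expected by yolo_loss.py: the function names center_to_corners and box_iou are hypothetical, and the box formats (center format (cx, cy, w, h) and corner format (x1, y1, x2, y2)) are assumptions chosen for illustration.

```python
def center_to_corners(box):
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2). Hypothetical helper."""
    cx, cy, w, h = box
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)


def box_iou(box_a, box_b):
    """IoU of two boxes given in corner format (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero
    # so that disjoint boxes contribute no intersection area.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    pred = center_to_corners((1.0, 1.0, 2.0, 2.0))  # corners (0, 0, 2, 2)
    target = (1.0, 1.0, 3.0, 3.0)
    print(box_iou(pred, target))
```

In an actual loss implementation the same computation would typically be vectorized over tensors of boxes rather than done one pair at a time.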

The following resources are useful for understanding YOLO in detail:

Submission Instructions

This part of the assignment is due on Compass by the due date specified above. One partner must upload the following files for this part (the netid below should be that of the submitting partner).

  1. Upload your output file to the Kaggle competition for the YOLO detector.
  2. All of your code (Python files and ipynb file) in a single ZIP file. The filename should be netid_mp3_part2_code.zip.
  3. Your IPython notebook with output cells converted to PDF format. The filename should be netid_mp3_part2_output.pdf.
  4. A brief report for both parts in PDF format using this template. The filename should be netid_mp3_report.pdf.
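
The code archive from item 2 can be assembled by hand, or with a short script such as the sketch below. This is only one possible approach: package_code is a hypothetical helper, and the choices to collect every .py and .ipynb file under the source directory and to skip .ipynb_checkpoints folders are assumptions, not course requirements.

```python
import zipfile
from pathlib import Path


def package_code(src_dir, netid, out_dir="."):
    """Zip all .py and .ipynb files under src_dir into netid_mp3_part2_code.zip.

    Hypothetical helper for illustration; returns the path of the archive.
    """
    src = Path(src_dir)
    archive = Path(out_dir) / f"{netid}_mp3_part2_code.zip"

    # Collect code files, skipping Jupyter's autosave checkpoint copies.
    files = sorted(
        p for p in src.rglob("*")
        if p.is_file()
        and p.suffix in {".py", ".ipynb"}
        and ".ipynb_checkpoints" not in p.parts
    )

    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # Store paths relative to src_dir so the archive has no absolute paths.
            zf.write(path, path.relative_to(src).as_posix())

    return archive
```

For example, package_code("Assignment3", "netid") would write netid_mp3_part2_code.zip to the current directory, with netid replaced by the submitting partner's NetID. Check the archive contents before uploading.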

Please refer to course policies on collaborations, late submission, and extension requests.