In this part of the assignment you will implement a YOLO-like object detector on the PASCAL VOC 2007 dataset to produce results like in the above image. The goal is to help you understand the fundamentals of training an object detector, gain experience with PyTorch, and learn how to use pre-trained models provided by the deep learning community.
Download the starting code here.
The top-level notebook (MP3_P2.ipynb) will guide you through all of the steps. You will mainly focus on implementing the YOLO loss function in the yolo_loss.py file. You will be provided with a pre-trained network for the model. The network structure is inspired by DetNet, but you are not required to understand it. In principle, it can be replaced by a different architecture and trained from scratch, but to achieve good accuracy with a minimum of computational expense and tuning, you should stick to the provided one.
As you start this part, you will find that it is more computationally intensive than what you are used to. To check whether your implementation works without waiting a long time for training to converge, here are the average mAP values to expect after a given number of epochs:
Epoch | mAP |
---|---|
5 | 0.2013 |
10 | 0.2545 |
15 | 0.2749 |
20 | 0.2898 |
25 | 0.3069 |
30 | 0.3355 |
35 | 0.3402 |
40 | 0.3347 |
45 | 0.2588 |
50 | 0.3836 |
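For reference, the mAP numbers above are means of per-class average precision (AP). The sketch below shows one common way to compute AP, the 11-point interpolation used by the PASCAL VOC 2007 challenge. It assumes the evaluation follows that protocol; the notebook's own evaluation code may differ, and the function names here are hypothetical.

```python
def voc07_average_precision(recalls, precisions):
    """11-point interpolated AP (assumed VOC 2007 protocol).

    recalls and precisions are parallel lists describing the
    precision-recall curve for a single class.
    """
    total = 0.0
    for i in range(11):
        threshold = i / 10  # recall levels 0.0, 0.1, ..., 1.0
        # Best precision achieved at any recall >= this threshold.
        candidates = [p for r, p in zip(recalls, precisions) if r >= threshold]
        total += max(candidates) if candidates else 0.0
    return total / 11


def mean_average_precision(per_class_ap):
    """Average the per-class AP values (dict: class name -> AP)."""
    return sum(per_class_ap.values()) / len(per_class_ap)


# Toy curve: precision 1.0 up to recall 0.5, then 0.5 up to recall 1.0.
ap = voc07_average_precision([0.5, 1.0], [1.0, 0.5])
print(f"AP = {ap:.4f}")
```

An mAP of 0.38 therefore means that, averaged over all object classes, the interpolated precision across recall levels is about 38%.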
To train this model in a reasonable amount of time, you'll need to use a GPU. This can either be your personal GPU, Google Colab, or Google Cloud Platform.
If you work on the assignment on a local machine, you will need a Python environment set up with the appropriate packages. We suggest that you use Conda to manage Python package dependencies (https://conda.io/docs/user-guide/getting-started.html).
Unless you have a machine with a GPU, running this assignment on your local machine will be very slow and is not recommended. Please use Google Colab or Google Cloud Platform for this assignment. Instructions on setting up VM instances can be found here.
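Before starting a long training run, it may help to confirm that PyTorch can actually see a GPU. The following is a minimal sketch; `pick_device_name` is a hypothetical helper and not part of the starter code, and it falls back to the CPU when PyTorch is not installed or no CUDA device is visible.

```python
def pick_device_name():
    """Return "cuda" if PyTorch is installed and sees a CUDA GPU, else "cpu"."""
    try:
        # Imported lazily so the check also runs where PyTorch is missing.
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device_name = pick_device_name()
print(f"Training would run on: {device_name}")
```

If this prints `cpu` on Colab or on a cloud VM, the runtime was probably started without a GPU attached. In a training script, the returned name can be passed to `torch.device(...)` and the model moved there with `model.to(device)`.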
Be careful when using Google Cloud Platform! Do not use up all of your credits! A fully trained model can take up to 7-8 hours to train.
Navigate to the assignment3_part2 directory and execute the provided download_data script:
sh download_data.sh
The assignment is given to you in the MP3_P2.ipynb file. If you are using a local machine, ensure that IPython is installed (https://ipython.org/install.html). You may then navigate to the assignment directory in the terminal and start a local IPython server using the jupyter notebook command.
The instructions in the yolo_loss.py file should be sufficient to guide you through the assignment, but it will be really helpful to understand the big picture of how YOLO works and how the loss function is defined.
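One building block of that big picture is intersection over union (IoU), which YOLO uses to decide which predicted box is responsible for a ground-truth object. The sketch below is written in plain Python for single boxes. It is an illustration only: the function names are hypothetical, and yolo_loss.py may use a different box format and batched tensor operations.

```python
def center_to_corners(cx, cy, w, h):
    """Convert a (center x, center y, width, height) box to (x1, y1, x2, y2)."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def box_iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2."""
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes get zero intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
pred = center_to_corners(1.0, 1.0, 2.0, 2.0)
target = center_to_corners(2.0, 2.0, 2.0, 2.0)
print(f"IoU = {box_iou(pred, target):.4f}")
```

YOLO predicts boxes in center/size form, so a conversion like `center_to_corners` is typically needed before computing overlap.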
The following resources are useful for understanding YOLO in detail:
This part of the assignment is due at the same time as Part 1 and all the files need to be uploaded to the same Compass submission by the same partner (the code and notebook files are separate from Part 1, but there is only one report for both).
Please refer to course policies on collaborations, late submission, and extension requests.