Fall 2018 CS498DL
Assignment 4: GANs and RNNs
Due date: Tuesday, December 4th, 11:59:59PM
Sample images from a GAN trained on the CelebA dataset
This assignment has two parts. In the first, you will train a generative adversarial network (GAN) on the CelebA dataset and learn to generate face images. In the second, you will train an RNN for two tasks on text data: language classification and text generation. In the generation task, your RNN will learn to generate text by predicting the most likely next character based on previous characters. In the language classification task, your RNN will learn to detect which language a chunk of text is written in (similar to a feature you might find in an online translator). While this might be relatively easy if the input text were Unicode and unique characters indicated a particular language, we address the case where all input text is converted to ASCII characters, so our network must instead learn to detect letter patterns.
In addition to familiarizing you with generative models and recurrent neural networks, this assignment will give you experience implementing GANs and RNNs in PyTorch and processing text data for character-based prediction tasks.
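To make the next-character prediction setup concrete, here is a minimal sketch of how a chunk of text can be turned into an (input, target) pair for a character-level model. The helper names `build_vocab` and `make_training_pair` are hypothetical and are not taken from the assignment's starter code; the notebooks may organize this step differently.

```python
def build_vocab(text):
    """Map each distinct character to an integer index (sorted for determinism)."""
    chars = sorted(set(text))
    return {ch: i for i, ch in enumerate(chars)}


def make_training_pair(chunk, char_to_idx):
    """Return (inputs, targets), where targets[t] is the character following inputs[t]."""
    indices = [char_to_idx[ch] for ch in chunk]
    # The target sequence is the input sequence shifted left by one character.
    return indices[:-1], indices[1:]


text = "to be or not to be"
vocab = build_vocab(text)
inputs, targets = make_training_pair(text[:6], vocab)  # chunk is "to be "
print(inputs)   # indices for "to be"
print(targets)  # indices for "o be "
```

At each time step the model sees `inputs[t]` (plus its hidden state) and is trained to predict `targets[t]`, which is the "most likely next character" objective described above.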
Download the starting code here.
Part 1: Face Generation with a GAN
Data set up
Once you have downloaded the zip file, go to the Assignment folder and execute the CelebA download script provided:
The CelebA data provided to you is a preprocessed version that has been filtered with a simple face detector to obtain good images for generation. All images are also cropped and resized to the same width and height.
The top-level notebook (
We also provide you with a notebook to help with debugging called
You will need a GPU to train your GAN. We recommend debugging on Colab and then switching to a Google Cloud machine once your debugging is finished, as the GAN needs to run for a few hours to train fully.
GAN output images during training (each iteration has batch size of 128)
Extra Credit
Extra credit options for this portion of the assignment:
Part 2: Text Generation and Language Classification with an RNN
Data set up
To download the data for the RNN tasks, go to the Assignment folder and run the download_language_data Python script provided:
The data for the generation task is the complete works of Shakespeare, concatenated together. The data for the language classification task is a set of translations of the Bible in 20 different languages that use the Latin alphabet (i.e., languages where converting from Unicode to ASCII may be somewhat permissible).
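As an illustration of what converting Latin-alphabet text from Unicode to ASCII can look like, the sketch below decomposes accented characters and drops the combining marks. This is one common approach using only the Python standard library; the function name `unicode_to_ascii` and the `ALLOWED` character set are assumptions, and the assignment's own preprocessing may differ.

```python
import string
import unicodedata

# Assumed set of characters to keep; the assignment's actual set may differ.
ALLOWED = set(string.ascii_letters + " .,;'-")


def unicode_to_ascii(text):
    """Strip accents by NFD-decomposing and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed
        # "Mn" is the Unicode category for non-spacing (combining) marks.
        if unicodedata.category(ch) != "Mn" and ch in ALLOWED
    )


print(unicode_to_ascii("français"))
print(unicode_to_ascii("København"))
```

Note that letters without a canonical decomposition, such as "ø", are removed entirely by this sketch rather than mapped to a base letter, which is one reason the conversion is only "somewhat permissible".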
Extra Credit
For extra credit in this portion, you may download or scrape your own dataset. Your data could be a book from Project Gutenberg by your favorite author, a large codebase in your favorite programming language, or other text data you have scraped off the internet.
This part of the assignment is split into two notebooks (
Both RNN tasks in this assignment are less computationally demanding than the GAN and can be trained in a short amount of time on either a GPU or a CPU.
Environment Setup (Local)
If you work on the assignment on a local machine, you will need a Python environment set up with the appropriate packages. We suggest using Conda to manage Python package dependencies (https://conda.io/docs/user-guide/getting-started.html).
The assignment is given to you in the
This part of the assignment is due on Compass on the due date specified above. You must upload the following files for this part.
Please refer to course policies on collaborations, late submission, and extension requests.