CS 598 LAZ Reading Lists
January 19: Overview of CNN architectures
- https://culurciello.github.io/tech/2016/06/04/nets.html
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition,
Proc. IEEE 86(11): 2278–2324, 1998
- A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012
- M. Zeiler and R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014
- K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR 2015
- M. Lin, Q. Chen, and S. Yan, Network in network, ICLR 2014
- C. Szegedy et al., Going deeper with convolutions, CVPR 2015
- C. Szegedy et al., Rethinking the inception architecture for computer vision, CVPR 2016
- K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, CVPR 2016
January 24, 26: RNN Tutorial (Arun Mallya)
- R. Pascanu, T. Mikolov, and Y. Bengio,
On the difficulty of training recurrent neural networks, ICML 2013
- S. Hochreiter and J. Schmidhuber,
Long short-term memory,
Neural Computation 9(8), pp. 1735-1780, 1997
- F.A. Gers and J. Schmidhuber,
Recurrent nets that time and count, IJCNN 2000
- K. Greff, R.K. Srivastava, J. Koutník, B.R. Steunebrink, and J. Schmidhuber,
LSTM: A search space odyssey,
IEEE transactions on neural networks and learning systems, 2016
- K. Cho, B. Van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio,
Learning phrase representations using RNN encoder-decoder for statistical machine translation,
EMNLP 2014
- R. Jozefowicz, W. Zaremba, and I. Sutskever,
An empirical exploration of recurrent network architectures,
JMLR 2015
January 31: Advanced CNN Architectures (Akshay Mishra, Hong Cheng)
- K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, CVPR 2016
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Identity Mappings in Deep Residual Networks, ECCV 2016
- Gao Huang, Zhuang Liu, Kilian Q. Weinberger, Laurens van der Maaten, Densely Connected Convolutional Networks
- Andreas Veit, Michael Wilber, Serge Belongie, Residual Networks Behave Like Ensembles of Relatively Shallow Networks, NIPS 2016
- Klaus Greff, Rupesh K. Srivastava & Jürgen Schmidhuber, Highway and Residual Networks Learn Unrolled Iterative Estimation
February 2: Advanced training techniques (Prajit)
- D. Kingma, and J. Ba, Adam: a method for stochastic optimization, ICLR 2015
- J. Dean et al., Large scale distributed deep networks, NIPS 2012
- N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, JMLR 2014
- S. Ioffe and C. Szegedy, Batch normalization: accelerating deep network training by reducing internal covariate shift, ICML 2015
- K. He, X. Zhang, S. Ren, and J. Sun, Delving deep into rectifiers: surpassing human-level performance on ImageNet classification, ICCV 2015
February 7: Network compression and speedup (Shuochao, Yiwen, Daniel)
- Denton, Emily L., et al. "Exploiting linear structure within convolutional networks for efficient evaluation." Advances in Neural Information Processing Systems. 2014.
- Jin, Jonghoon, Aysegul Dundar, and Eugenio Culurciello. "Flattened convolutional neural networks for feedforward acceleration." arXiv preprint arXiv:1412.5474 (2014).
- Gong, Yunchao, et al. "Compressing deep convolutional networks using vector quantization." arXiv preprint arXiv:1412.6115 (2014).
- Han, Song, et al. "Learning both weights and connections for efficient neural network." Advances in Neural Information Processing Systems. 2015.
- Guo, Yiwen, Anbang Yao, and Yurong Chen. "Dynamic Network Surgery for Efficient DNNs." Advances In Neural Information Processing Systems. 2016.
- Gupta, Suyog, et al. "Deep Learning with Limited Numerical Precision." ICML. 2015.
- Courbariaux, Matthieu, Yoshua Bengio, and Jean-Pierre David. "Binaryconnect: Training deep neural networks with binary weights during propagations." Advances in Neural Information Processing Systems. 2015.
- Courbariaux, Matthieu, et al. "Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1." arXiv preprint arXiv:1602.02830 (2016).
- Han, Song, Huizi Mao, and William J. Dally. "Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding." arXiv preprint arXiv:1510.00149 (2015).
- Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size." arXiv preprint arXiv:1602.07360 (2016).
February 9: Object detection (Jiajun, Sihao, Kevin)
- Girshick, Ross and Donahue, Jeff and Darrell, Trevor and Malik, Jitendra,
Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR 2014
- He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian,
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, ECCV 2014
- Jifeng Dai, Yi Li, Kaiming He, Jian Sun, R-FCN: Object Detection via Region-based Fully Convolutional Networks, NIPS 2016
- Girshick, Ross,
Fast R-CNN, ICCV 2015
- Ren, Shaoqing and He, Kaiming and Girshick, Ross and Sun, Jian,
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, NIPS 2015
- Erhan, Dumitru and Szegedy, Christian and Toshev, Alexander and Anguelov, Dragomir,
Scalable Object Detection using Deep Neural Networks, CVPR 2014
- Bell, Sean and Lawrence Zitnick, C and Bala, Kavita and Girshick, Ross,
Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks, CVPR 2016
- Redmon, Joseph and Divvala, Santosh and Girshick, Ross and Farhadi, Ali,
You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
- Liu, Wei and Anguelov, Dragomir and Erhan, Dumitru and Szegedy, Christian and Reed, Scott and Fu, Cheng-Yang and Berg, Alexander C,
SSD: Single Shot MultiBox Detector, ECCV 2016
- Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie,
Feature Pyramid Networks for Object Detection, arXiv 2016
- Huang, Jonathan and Rathod, Vivek and Sun, Chen and Zhu, Menglong and Korattikara, Anoop and Fathi, Alireza and Fischer, Ian and Wojna, Zbigniew and Song, Yang and Guadarrama, Sergio et al.,
Speed/accuracy trade-offs for modern convolutional object detectors, arXiv 2016
February 14: Semantic segmentation, pixel labeling (Liwei)
- E. Shelhamer, J. Long, and T. Darrell, Fully Convolutional Networks for Semantic Segmentation, CVPR 2015
- P. O. Pinheiro, T. Lin, R. Collobert, P. Dollar, Learning to Refine Object Segments, ECCV 2016
- P. Fischer, A. Dosovitskiy, E. Ilg, P. Hausser, C. Hazırbaş, V. Golkov, FlowNet: Learning Optical Flow with Convolutional Networks, ICCV 2015
- F. Yu, V. Koltun, Multi-Scale Context Aggregation by Dilated Convolutions, ICLR 2016
- M. Mostajabi, P. Yadollahpour, and G. Shakhnarovich, Feedforward semantic segmentation with zoom-out features, CVPR 2015
- B. Hariharan, P. Arbelaez, and R. Girshick, Hypercolumns for Object Segmentation and Fine-grained Localization, CVPR 2015
February 16: Similarity learning, Siamese networks (Moitreya, Yunan)
- Essential:
- Bell, Sean, and Kavita Bala, Learning visual similarity for product design with convolutional neural networks, ACM Transactions on Graphics (TOG), 2015
- Chopra, Sumit, Raia Hadsell, and Yann LeCun, Learning a similarity metric discriminatively, with application to face verification, CVPR 2005
- Zagoruyko, Sergey, and Nikos Komodakis, Learning to compare image patches via convolutional neural networks, CVPR 2015
- Hoffer, Elad, and Nir Ailon, Deep metric learning using triplet network, arXiv:1412.6622
- Simo-Serra, Edgar, et al., Discriminative Learning of Deep Convolutional Feature Point Descriptors, ICCV 2015
- Optional:
- Vo, Nam N., and James Hays, Localizing and Orienting Street Views Using Overhead Imagery, ECCV 2016
- Ahmed, Ejaz, Michael Jones, and Tim K. Marks, An Improved Deep Learning Architecture for Person Re-Identification, CVPR 2015
- Hu, Baotian, et al., Convolutional neural network architectures for matching natural language sentences, NIPS 2014
- Kulis, Brian, Metric learning: A survey, Foundations and Trends in Machine Learning, 2013
- Su, Hang, et al., Multi-view convolutional neural networks for 3d shape recognition, ICCV 2015
- Zheng, Yi, et al., Time Series Classification Using Multi-Channels Deep Convolutional Neural Networks, WAIM 2014
- Yi, Kwang Moo, et al., LIFT: Learned Invariant Feature Transform, arXiv:1603.09114
February 21: Visualization, adversarial examples (Ralf, Jyoti, Jiahui)
- Matthew D. Zeiler and Rob Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014
- Karen Simonyan, Andrea Vedaldi, Andrew Zisserman, Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, arXiv:1312.6034v2
- Alexey Dosovitskiy and Thomas Brox, Inverting Visual Representations with Convolutional Networks, CVPR 2016
- Anh Nguyen, Jason Yosinski, and Jeff Clune, Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images, CVPR 2015
- Christian Szegedy, et al., Intriguing properties of neural networks, arXiv preprint arXiv:1312.6199v4
- Seyed-Mohsen Moosavi-Dezfooli, et al., Universal adversarial perturbations, arXiv preprint arXiv:1610.08401v2
- Ian J. Goodfellow, et al., Explaining and Harnessing Adversarial Examples, arXiv preprint arXiv:1412.6572
- A. Kurakin et al., Adversarial examples in the physical world, ICLR 2017
- D. Krotov and J. Hopfield, Dense Associative Memory is Robust to Adversarial Inputs, arXiv preprint arXiv:1701.00939
February 23: Generative Adversarial Networks (Shashank, Bhargav, Binglin)
- Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. and Bengio, Y. Generative adversarial nets, NIPS (2014).
- Goodfellow, Ian. NIPS 2016 Tutorial: Generative Adversarial Networks, NIPS (2016).
- Radford, A., Metz, L. and Chintala, S., Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434. (2015)
- Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. Improved techniques for training GANs. NIPS (2016).
- Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximization Generative Adversarial Nets, NIPS (2016).
- Zhao, Junbo, Michael Mathieu, and Yann LeCun. Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126 (2016).
- Mirza, Mehdi, and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014).
- Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. Image-to-image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004. (2016).
- Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. Generative adversarial text to image synthesis. JMLR (2016).
- Antipov, G., Baccouche, M., & Dugelay, J. L. Face Aging With Conditional Generative Adversarial Networks. arXiv preprint arXiv:1702.01983. (2017).
- Liu, Ming-Yu, and Oncel Tuzel. Coupled generative adversarial networks. NIPS (2016).
- Denton, E.L., Chintala, S. and Fergus, R. Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks. NIPS (2015).
- Dumoulin, V., Belghazi, I., Poole, B., Lamb, A., Arjovsky, M., Mastropietro, O., & Courville, A. Adversarially learned inference. arXiv preprint arXiv:1606.00704 (2016).
February 28: Variational Autoencoders (Raymond, Junting, Teck-Yian)
- D. Kingma, M. Welling,
Auto-Encoding Variational Bayes, ICLR, 2014
- Carl Doersch,
Tutorial on Variational Autoencoders, arXiv, 2016
- Xinchen Yan, Jimei Yang, Kihyuk Sohn, Honglak Lee,
Attribute2Image: Conditional Image Generation from Visual Attributes, ECCV, 2016
- Jacob Walker, Carl Doersch, Abhinav Gupta, Martial Hebert,
An Uncertain Future: Forecasting from Static Images using Variational Autoencoders, ECCV, 2016
- Anh Nguyen, Jason Yosinski, Yoshua Bengio, Alexey Dosovitskiy, Jeff Clune,
Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space, arXiv, 2016
- Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, Brendan Frey,
Adversarial Autoencoders, ICLR, 2016
- Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, Ole Winther,
Autoencoding beyond pixels using a learned similarity metric, ICML, 2016
- Aditya Deshpande, Jiajun Lu, Mao-Chuang Yeh, David Forsyth,
Learning Diverse Image Colorization, arXiv, 2016
- Jiajun Lu, Aditya Deshpande, David Forsyth,
CDVAE: Co-embedding Deep Variational Auto Encoder for Conditional Variational Generation, arXiv, 2016
- Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, Max Welling,
Semi-Supervised Learning with Deep Generative Models, NIPS, 2014
- Lars Maaløe, Casper Kaae Sønderby, Søren Kaae Sønderby, Ole Winther,
Auxiliary Deep Generative Models, arXiv, 2016
- Raymond Yeh, Ziwei Liu, Dan B Goldman, Aseem Agarwala,
Semantic Facial Expression Editing using Autoencoded Flow, arXiv, 2016
March 2: Advanced generation methods (Hsiao-Ching, Ameya, Anand)
- A. van den Oord, N. Kalchbrenner, and K. Kavukcuoglu.
Pixel recurrent neural networks. ICML 2016
- A. van den Oord, N. Kalchbrenner, O. Vinyals, L. Espeholt, A. Graves, and K. Kavukcuoglu.
Conditional image generation with pixelcnn decoders. NIPS 2016
- N. Kalchbrenner, A. van den Oord, K. Simonyan, I. Danihelka, O. Vinyals, A. Graves, and K. Kavukcuoglu. Video pixel networks. arXiv 2016
- K. Gregor, I. Danihelka, A. Graves, D. Rezende, and D. Wierstra. DRAW: a recurrent neural network for image generation. ICML 2015
- K. Gregor, F. Besse, D. Rezende, I. Danihelka, and D. Wierstra. Towards conceptual compression. NIPS 2016
- B. M. Lake, R. Salakhutdinov, and J. B. Tenenbaum. Human-level concept learning through probabilistic program induction. Science 2015
- D. J. Rezende, S. Mohamed, I. Danihelka, K. Gregor, and D. Wierstra. One-shot generalization in deep generative models. ICML 2016
- L. A. Gatys, A. S. Ecker, and M. Bethge. Image style transfer using convolutional neural networks. CVPR 2016
- J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. ECCV 2016
- C. Castillo, S. De, X. Han, B. Singh, A. K. Yadav, and T. Goldstein. Son of Zorn's Lemma: Targeted Style Transfer Using Instance-aware Semantic Segmentation. ICASSP 2017
March 7: 3D + Graphics (Qi, Juho)
- Tejas D. Kulkarni, William F. Whitney, Pushmeet Kohli, Josh Tenenbaum, Deep Convolutional Inverse Graphics Network, NIPS 2015.
- John Flynn, Ivan Neulander, James Philbin, Noah Snavely, DeepStereo: Learning to Predict New Views from the World’s Imagery, CVPR 2016.
- Alex Kendall, Matthew Grimes, Roberto Cipolla, PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization, ICCV 2015.
- Jiajun Wu, Tianfan Xue, Joseph J. Lim, Yuandong Tian, Joshua B. Tenenbaum, Antonio Torralba, and William T. Freeman, Single Image 3D Interpreter Network, ECCV 2016.
- Xinchen Yan, Jimei Yang, Ersin Yumer, Yijie Guo, Honglak Lee, Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision, NIPS 2016.
- Danilo Jimenez Rezende, S. M. Ali Eslami, Shakir Mohamed, Peter Battaglia, Max Jaderberg, Nicolas Heess, Unsupervised Learning of 3D Structure from Images, NIPS 2016.
- Jiajun Wu, Chengkai Zhang, Tianfan Xue, William T. Freeman, Joshua B. Tenenbaum, Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, NIPS 2016.
March 9: Self-supervised Learning (Nate, Christian, Pratik)
- Carl Doersch, Abhinav Gupta, Alexei A. Efros, Context as Supervisory Signal: Discovering Objects with Predictable Context, ECCV 2014.
- Xiaolong Wang, Abhinav Gupta, Unsupervised Learning of Visual Representations using Videos, ICCV 2015.
- Dinesh Jayaraman, Kristen Grauman, Slow and steady feature analysis: higher order temporal coherence in video, CVPR 2016.
- Armand Joulin, Laurens van der Maaten, Allan Jabri, Nicolas Vasilache, Learning Visual Features from Large Weakly Supervised Data, ECCV 2016.
- Richard Zhang, Phillip Isola, Alexei A. Efros, Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction, CVPR 2017.
- Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, Alexei A. Efros, Context Encoders: Feature Learning by Inpainting, CVPR 2016.
- Richard Zhang, Phillip Isola, Alexei A. Efros, Colorful Image Colorization, ECCV 2016.
- Gustav Larsson, Michael Maire, Gregory Shakhnarovich, Learning Representations for Automatic Colorization, ECCV 2016.
- Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba, Ambient Sound Provides Supervision for Visual Learning, ECCV 2016.
- Chelsea Finn, Ian Goodfellow, Sergey Levine, Unsupervised Learning for Physical Interaction through Video Prediction, NIPS 2016.
March 10: Introduction to Reinforcement Learning - Bonus lecture (Garima, Karan and Unnat)
March 14: Deep Reinforcement Learning - Q-Learning (Garima, Karan and Unnat)
- Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra and Martin A. Riedmiller. "Playing Atari with Deep Reinforcement Learning." arXiv preprint arXiv:1312.5602 (2013).
- Hasselt, Hado van, Arthur Guez and David Silver. "Deep Reinforcement Learning with Double Q-Learning." AAAI (2016).
- Wang, Ziyu, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot and Nando de Freitas. "Dueling Network Architectures for Deep Reinforcement Learning." ICML (2016).
- Hausknecht, Matthew J. and Peter Stone. "Deep Recurrent Q-Learning for Partially Observable MDPs." arXiv preprint arXiv:1507.06527 (2015).
- Kulkarni, Tejas D., Karthik Narasimhan, Ardavan Saeedi and Joshua B. Tenenbaum. "Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation." In Advances in Neural Information Processing Systems, pp. 3675-3683. 2016.
- He, Frank S., Yang Liu, Alexander G. Schwing and Jian Peng. "Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening." arXiv preprint arXiv:1611.01606 (2016).
- Caicedo, Juan C. and Svetlana Lazebnik. "Active Object Localization with Deep Reinforcement Learning." In Proceedings of the IEEE International Conference on Computer Vision (pp. 2488-2496). 2015.
- Zhu, Yuke, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei and Ali Farhadi. "Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning." arXiv preprint arXiv:1609.05143 (2016).
- Narasimhan, Karthik, Tejas D. Kulkarni and Regina Barzilay. "Language Understanding for Text-based Games using Deep Reinforcement Learning." EMNLP (2015).
- Lample, Guillaume, and Devendra Singh Chaplot. "Playing FPS games with deep reinforcement learning." arXiv preprint arXiv:1609.05521 (2016).
March 16: Policy Gradient and Planning (Raj, Tanmay, Zhizhong)
- Andrej Karpathy, Deep Reinforcement Learning: Pong from Pixels, Andrej Karpathy's Blog
- Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu, Recurrent Models of Visual Attention,
NIPS 2014
- Serena Yeung, Olga Russakovsky, Greg Mori, Li Fei-Fei, End-to-end Learning of Action Detection from Frame Glimpses in Videos, CVPR 2016
- Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, Asynchronous Methods for Deep Reinforcement Learning, ICML 2016
- Aviv Tamar, Sergey Levine, Pieter Abbeel, Yi Wu, and Garrett Thomas, Value Iteration Networks, NIPS 2016
- David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold et al., The Predictron: End-to-End Learning and Planning, arXiv 2016
- David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser et al., Mastering the game of Go with deep neural networks and tree search, Nature 2016
- Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra, Continuous control with deep reinforcement learning, arXiv 2015
- David Silver, Lecture slides on Reinforcement Learning, UCL Course 2015
March 28: Deep learning for manipulation, navigation (Andrey, Tanmay)
- Sergey Levine, Vladlen Koltun, Guided Policy Search, ICML 2013
- Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel, End-to-end training of deep visuomotor policies, arXiv 2016
- Pulkit Agrawal, Ashvin Nair, Pieter Abbeel, Jitendra Malik, Sergey Levine, Learning to poke by poking, arXiv 2017
- Sergey Levine, Peter Pastor, Alex Krizhevsky, Deirdre Quillen, Learning hand-eye coordination with large-scale data collection, Google 2016
- Lerrel Pinto, Dhiraj Gandhi, Yuanfeng Han, Yong-Lae Park, Abhinav Gupta, The curious robot, arXiv 2016
- Lerrel Pinto, Abhinav Gupta, Supersizing self-supervision, arXiv 2015
- Fereshteh Sadeghi, Sergey Levine, Real single-image flight without a single real image, arXiv 2016
- Mariusz Bojarski, Davide Del Testa, et al., End to End Learning for Self-Driving Cars, Nvidia 2016
- Alessandro Giusti, Jérôme Guzzi, et al., A Machine Learning Approach to Visual Perception of Forest Trails for Mobile Robots, IEEE 2016
- Max Jaderberg, Volodymyr Mnih, et al., Reinforcement Learning with Unsupervised Auxiliary Tasks, DeepMind 2016
- Piotr Mirowski, Razvan Pascanu, et al., Learning to Navigate in Complex Environments, ICLR 2017
March 30: Recurrent Architectures: LSTM, GRU, RNN (Abhishek, Anusri)
- Survey Papers
- Training
- Semeniuta, Stanislau, Aliaksei Severyn, and Erhardt Barth. Recurrent dropout without memory loss. arXiv preprint arXiv:1603.05118 (2016).
- Arjovsky, Martin, Amar Shah, and Yoshua Bengio. Unitary evolution recurrent neural networks. arXiv preprint arXiv:1511.06464 (2015).
- Le, Quoc V., Navdeep Jaitly, and Geoffrey E. Hinton. A simple way to initialize recurrent networks of rectified linear units. arXiv preprint arXiv:1504.00941 (2015).
- Cooijmans, Tim, et al. Recurrent batch normalization. arXiv preprint arXiv:1603.09025 (2016).
- Architectural Complexity Measures
- RNN Variants
- Visualization
April 4: Image captioning (Anjali, Ruihan, Liaonan)
- Karpathy, Andrej, and Li Fei-Fei. Deep visual-semantic alignments for generating image descriptions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
- Vinyals, Oriol, et al. Show and tell: A neural image caption generator. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
- Vinyals, Oriol, et al. Show and tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2016.
- Fang, Hao, et al. From captions to visual concepts and back. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
- Xu, Kelvin, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML. Vol. 14. 2015.
- Liu, Chenxi, et al. Attention correctness in neural image captioning. arXiv preprint arXiv:1605.09553 (2016).
- Venugopalan, Subhashini, et al. "Captioning images with diverse objects." arXiv preprint arXiv:1606.07770 (2016).
- Dai, Bo, et al. "Towards Diverse and Natural Image Descriptions via a Conditional GAN." arXiv preprint arXiv:1703.06029. 2017.
April 6: Image-text embeddings, grounding (Yang, Qing)
- Plummer, Bryan A., et al. "Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models." Proceedings of the IEEE International Conference on Computer Vision. 2015.
- Krishna, Ranjay, et al. "Visual genome: Connecting language and vision using crowdsourced dense image annotations." arXiv preprint arXiv:1602.07332 (2016).
- Kiros, Ryan, Ruslan Salakhutdinov, and Richard S. Zemel. "Unifying visual-semantic embeddings with multimodal neural language models." arXiv preprint arXiv:1411.2539 (2014).
- Wang, Liwei, Yin Li, and Svetlana Lazebnik. "Learning deep structure-preserving image-text embeddings." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
- Vendrov, Ivan, et al. "Order-embeddings of images and language." arXiv preprint arXiv:1511.06361 (2015).
- Rohrbach, Anna, et al. "Grounding of textual phrases in images by reconstruction." European Conference on Computer Vision. Springer International Publishing, 2016.
- Fukui, A., Park, D. H., Yang, D., Rohrbach, A., Darrell, T., & Rohrbach, M. (2016). "Multimodal compact bilinear pooling for visual question answering and visual grounding." arXiv preprint arXiv:1606.01847.
- Johnson, Justin, Andrej Karpathy, and Li Fei-Fei. "Densecap: Fully convolutional localization networks for dense captioning." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
- Wang, M., Azab, M., Kojima, N., Mihalcea, R., & Deng, J. (2016, October). "Structured matching for phrase localization." In European Conference on Computer Vision (pp. 696-711). Springer International Publishing.
April 11: Visual Question Answering (Liang-Wei, Shuai)
- Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, Anton van den Hengel. "Visual Question Answering: A Survey of Methods and Datasets." arXiv preprint arXiv:1607.05910 (2016).
- Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Dhruv Batra, Devi Parikh. "VQA: Visual Question Answering." arXiv preprint arXiv:1505.00468 (2016).
- Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick. "CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning." arXiv preprint arXiv:1612.06890 (2016).
- Kevin J. Shih, Saurabh Singh, Derek Hoiem. "Where To Look: Focus Regions for Visual Question Answering." arXiv preprint arXiv:1511.07394 (2016).
- Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Dan Klein. "Neural Module Networks." arXiv preprint arXiv:1511.02799 (2016).
- Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Dan Klein. "Learning to Compose Neural Networks for Question Answering." arXiv preprint arXiv:1601.01705 (2016).
- Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach. "Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding." arXiv preprint arXiv:1606.01847 (2016).
- Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh. "Hierarchical Question-Image Co-Attention for Visual Question Answering." arXiv preprint arXiv:1606.00061 (2017).
- Abhishek Das, Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra. "Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning." arXiv preprint arXiv:1703.06585 (2017).
- Yuke Zhu, Oliver Groth, Michael Bernstein, Li Fei-Fei. "Visual7W: Grounded Question Answering in Images." arXiv preprint arXiv:1511.03416 (2016).
- Allan Jabri, Armand Joulin, Laurens van der Maaten. "Revisiting Visual Question Answering Baselines." arXiv preprint arXiv:1606.08390 (2016).
- Qi Wu, Peng Wang, Chunhua Shen, Anthony Dick, Anton van den Hengel. "Ask Me Anything: Free-form Visual Question Answering Based on Knowledge from External Sources." arXiv preprint arXiv:1511.06973 (2016).
April 13: Deep Learning for NLP (Zeqiu, Dongming, Quan)
- Language Models:
- Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. Pointer sentinel mixture models. arXiv preprint arXiv:1609.07843, 2016.
- James Bradbury, Stephen Merity, Caiming Xiong, and Richard Socher. Quasi-recurrent neural networks. arXiv preprint arXiv:1611.01576, 2016.
- Yoon Kim, Yacine Jernite, David Sontag, and Alexander Rush. Character-Aware Neural Language Models. AAAI, 2016.
- Constituency Parsing:
- Richard Socher, John Bauer, Christopher D. Manning and Andrew Y. Ng. Parsing with Compositional Vector Grammars. ACL, 2013.
- Richard Socher, Brody Huval, Christopher D Manning, and Andrew Y Ng. Semantic compositionality through recursive matrix-vector spaces. EMNLP-CoNLL, 2012.
- Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng and Christopher Potts. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. EMNLP, 2013.
- Richard Socher, Cliff Chiung-Yu Lin, Andrew Y. Ng and Christopher D. Manning. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML, 2011.
April 18: Deep Learning for Machine Translation (Yiren, Yisi and Shaoshi)
- Essential:
- Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks, Advances in neural information processing systems. 2014.
- Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate, arXiv preprint arXiv:1409.0473 (2014).
- Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. Effective approaches to attention-based neural machine translation, arXiv preprint arXiv:1508.04025 (2015).
- Wiseman, Sam, and Alexander M. Rush. Sequence-to-sequence learning as beam-search optimization, arXiv preprint arXiv:1606.02960 (2016).
- Luong, Minh-Thang, et al. Addressing the rare word problem in neural machine translation, arXiv preprint arXiv:1410.8206 (2014).
- Sennrich, Rico, Barry Haddow, and Alexandra Birch. Neural machine translation of rare words with subword units, arXiv preprint arXiv:1508.07909 (2015).
- Wu, Yonghui, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, arXiv preprint arXiv:1609.08144 (2016).
- Optional:
April 20: Deep Learning for Audio (Matt, Chris, Yuchen)
- Text to Speech (TTS)
- A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu. WaveNet: A Generative Model for Raw Audio. arXiv preprint arXiv:1609.03499 (2016).
- S. Mehri, K. Kumar, I. Gulrajani, R. Kumar, S. Jain, J. Sotelo, A. Courville, and Y. Bengio. SampleRNN: An Unconditional End-to-End Neural Audio Generation Model. arXiv preprint arXiv:1612.07837v2 (2017).
- J. Sotelo, S. Mehri, K. Kumar, J. Santos, K. Kastner, A. Courville, and Y. Bengio. Char2Wav: End-to-End Speech Synthesis. ICLR (2017).
- S. Arik, M. Chrzanowski, A. Coates, G. Diamos, A. Gibiansky, Y. Kang, X. Li, J. Miller, A. Ng, J. Raiman, S. Sengupta, and M. Shoeybi. Deep Voice: Real-time Neural Text-to-Speech. arXiv preprint arXiv:1702.07825v2 (2017).
- Y. Wang, et al. Tacotron: A Fully End-to-End Text-to-Speech Synthesis Model. arXiv preprint arXiv:1703.10135v1 (2017).
- Automatic Speech Recognition (ASR)
- A. Graves, S. Fernandez, F. Gomez, and J. Schmidhuber. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. ICML (2006).
- G. Hinton, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition. Signal Processing Magazine (2012).
- A. Graves, A. Mohamed, and G. Hinton. Speech Recognition with Deep Recurrent Neural Networks. arXiv preprint arXiv:1303.5778v1 (2013).
- H. Sak, A. Senior, and F. Beaufays. Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling. arXiv preprint arXiv:1402.1128v1 (2014).
- O. Abdel-Hamid, et al. Convolutional Neural Networks for Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing (2014).
- A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates, and A. Ng. Deep Speech: Scaling up end-to-end speech recognition. arXiv preprint arXiv:1412.5567v2 (2014).
- D. Amodei, et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. arXiv preprint arXiv:1512.02595v1 (2015).
- Y. Wang. Connectionist Temporal Classification: A Tutorial with Gritty Details. Github (2015).
- D. Bahdanau, et al. End-to-End Attention-based Large Vocabulary Speech Recognition. arXiv preprint arXiv:1508.04395 (2015).
- W. Xiong, et al. Achieving Human Parity in Conversational Speech Recognition. arXiv preprint arXiv:1610.05256 (2016).
- Music Generation
April 25: Architectures with Memory (Nitish, Shreya)
- Jason Weston, Sumit Chopra & Antoine Bordes, Memory networks, ICLR, 2015
- Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston & Rob Fergus, End-To-End Memory Networks, NIPS 2015
- Alexander H. Miller, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, Jason Weston, Key-Value Memory Networks for Directly Reading Documents, EMNLP, 2016
- A. Kumar et al., Ask Me Anything: Dynamic Memory Networks for Natural Language Processing, ICML, 2016
- Alex Graves, Greg Wayne & Ivo Danihelka, Neural Turing Machines, arXiv, 2014
- Marcin Andrychowicz & Karol Kurach, Learning Efficient Algorithms with Hierarchical Attentive Memory, arXiv, 2016
- Gulcehre et al., Memory Augmented Neural Networks with Wormhole Connections, arXiv, 2017
- Armand Joulin & Tomas Mikolov, Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets, arXiv, 2015
- Karol Kurach, Marcin Andrychowicz & Ilya Sutskever, Neural Random-Access Machines, ICLR, 2016
- Graves et al., Hybrid computing using a neural network with dynamic external memory, Nature, 2016
- Emilio Parisotto & Ruslan Salakhutdinov, Neural Map: Structured Memory for Deep Reinforcement Learning, arXiv, 2017
- Pritzel et al., Neural Episodic Control, arXiv, 2017
April 27: Meta-Algorithms (Mariya and Safa)
- Learning to Learn for Architecture Search
- M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau, T. Schaul, B. Shillingford, and N. de Freitas. Learning to Learn by Gradient Descent by Gradient Descent. NIPS (2016).
- B. Baker, O. Gupta, N. Naik, and R. Raskar. Designing Neural Network Architectures using Reinforcement Learning. ICLR (2017).
- B. Zoph, Q. V. Le. Neural Architecture Search with Reinforcement Learning. arXiv preprint arXiv:1611.01578 (2017).
- R. Miikkulainen, J. Liang, E. Meyerson, A. Rawal, D. Fink, O. Francon, B. Raju, H. Shahrzad, A. Navruzyan, N. Duffy, and B. Hodjat. Evolving Deep Neural Networks. arXiv preprint arXiv:1703.00548 (2017).
- E. Real, S. Moore, A. Selle, S. Saxena, Y. L. Suematsu, Q. Le, and A. Kurakin. Large-Scale Evolution of Image Classifiers. arXiv preprint arXiv:1703.01041 (2017).
- L. Xie and A. Yuille. Genetic CNN. arXiv preprint arXiv:1703.01513 (2017).
- D. Ha, A. Dai, and Q. V. Le. HyperNetworks. arXiv preprint arXiv:1609.09106 (2016).
- C. Fernando, D. Banarse, C. Blundell, Y. Zwols, D. Ha, A. A. Rusu, A. Pritzel, D. Wierstra. PathNet: Evolution Channels Gradient Descent in Super Neural Networks. arXiv preprint arXiv:1701.08734 (2017).
- Learning to Explore
- Y. Chen, M. W. Hoffman, S. G. Colmenarejo, M. Denil, T. P. Lillicrap, and N. de Freitas. Learning to Learn for Global Optimization of Black Box Functions. arXiv preprint arXiv:1611.03824 (2016).
- P. Mirowski, R. Pascanu, F. Viola, H. Soyer, A. J. Ballard, A. Banino, M. Denil, R. Goroshin, L. Sifre, K. Kavukcuoglu, D. Kumaran, and R. Hadsell. Learning to Navigate in Complex Environments. arXiv preprint arXiv:1611.03673 (2017).
- P. Agrawal, A. Nair, P. Abbeel, J. Malik, and S. Levine. Learning to Poke by Poking: Experiential Learning of Intuitive Physics. NIPS (2016).
- Learning to Seek Knowledge
- Learning to Communicate