[Paper review] UNSUPERVISED REPRESENTATION LEARNING WITH DEEP CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORKS

2 minute read

Published:

DEEP CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORKS (https://arxiv.org/pdf/1511.06434.pdf)

0. Introduction

Limitation of GAN

  • Unstable to train, no explanation of understanding, no evaluation criteria for new images generated

Propose new framework for unsupervised representation learning i) Bridge the gap between the success of CNNs for supervised learning and unsupervised learning. ii) Introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning.

  • Replace pooling (# of parameters = 0) layers with strided convolutions for Discriminator and fractional-strided convolutions (deconvolutions) for Generator
  • Use batch normalization, remove fully connected hidden layers

0-1. Contribution

Train Generative Adversarial Networks (GANs), and later reuse parts of the generator and discriminator networks as feature extractors for supervised tasks to build good image representations Use the trained discriminators for image classification tasks, showing competitive performance with other unsupervised algorithms Visualize the filters learnt by GANs and empirically show that specific filters have learned to draw specific objects Show that the generators have interesting vector arithmetic properties allowing for easy manipulation of many semantic qualities of generated samples

REPRESENTATION LEARNING FROM UNLABELED DATA

  • Classic approach to unsupervised representation learning (K-means)
  • Hierarchical clustering of image, auto-encoders, ladder structures encode images into a compact code and decode the code to reconstruct the image
  • Deep belief networks works well in learning hierarchical representations

GENERATING NATURAL IMAGES

  • Parametric
    • Used in matching from a database of existing images, matching patches of images, and have been used in texture synthesis, super-resolution and in-painting
  • Non-parametric
    • Variational sampling approach to generating images often suffer from being blurry.
    • Iterative forward diffusion process
    • Generative Adversarial Networks generated images suffering from being noisy and incomprehensible.
    • Laplacian pyramid extension to this approach showed higher quality images, but they still suffered from the objects looking wobbly because of noise introduced in chaining multiple models.
    • A recurrent network approach and a deconvolution network approach have had some success with generating natural images. However, they have not leveraged the generators for supervised tasks.

VISUALIZING THE INTERNALS OF CNNS

  • By using deconvolutions and filtering the maximal activations, one can find the approximate purpose of each convolution filter
  • By using a gradient descent on the inputs lets us inspect the ideal image that activates certain subsets of filters

2. How it works

Adopt modified CNN architectures
1) All convolutional net: Replaces deterministic spatial pooling functions (maxpooling) with strided convolutions -> Network to learn its own spatial downsampling (Use this approach in generator to learn its own spatial upsampling and discriminator)
2) Eliminate fully connected layers on top of convolutional features
3) Batch normalization to stabilize learning
4) ReLU activation in generator for all layers except output (Tanh)
5) LeakyReLU in discriminator for all layers

DCGAN1