[Paper review] Generative Adversarial Nets
Published:
Generative Adversarial Nets (https://arxiv.org/pdf/1406.2661.pdf)
0. Introduction
Proposes a new framework for estimating generative models via an adversarial process
i) Generative model G: Captures data distribution
ii) Discriminative model D: Estimates the probability that a sample came from the training data rather than G
iii) G is trained to maximize the probability of D making a mistake
1. Related work
- Undirected graphical models with latent variables (Restricted Boltzmann Machines, DBMs)
- Likelihood is intractable except in the most trivial instances
- Deep belief networks (DBNs)
- Hybrid models containing a single undirected layer and several directed layers.
Fast approximate layer-wise training criteria exist, but they incur the computational difficulties associated with both undirected and directed models.
- Alternative criteria that do not approximate or bound the log-likelihood (score matching, noise-contrastive estimation)
- A discriminative training criterion is employed to fit a generative model. However, rather than fitting a separate discriminative model, the generative model itself is used to discriminate generated data from samples of a fixed noise distribution.
- Generative stochastic network (GSN) framework, generalized denoising auto-encoders
- Defines a parameterized Markov chain, i.e., one learns the parameters of a machine that performs one step of a generative Markov chain.
- The adversarial nets framework does not require a Markov chain for sampling, so it avoids the difficulties associated with Markov chain sampling
2. How it works
D, G play the following two-player minimax game
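The value function V(D, G) of this game, as given in the paper:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

D is trained to assign high probability to real samples and low probability to generated ones, while G is trained to fool D.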
The game is implemented with an iterative, numerical approach, alternating k steps of optimizing D with one step of optimizing G. In practice, early in learning when G is poor, D can reject samples with high confidence because they are clearly different from the training data; log(1 - D(G(z))) then saturates, so the paper suggests instead training G to maximize log D(G(z)), which provides stronger gradients early in learning.
for i in training iterations:
    for k steps:
        sample minibatch of m noise samples from the noise prior
        sample minibatch of m examples from the data generating distribution
        update D by ascending its stochastic gradient
    sample minibatch of m noise samples from the noise prior
    update G by descending its stochastic gradient
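The loop above can be sketched end to end on a toy problem. This is a minimal illustration, not the paper's MLP setup: the 1-D Gaussian data, linear generator G(z) = a*z + b, logistic discriminator D(x) = sigmoid(w*x + c), and all hyperparameters are assumptions chosen so the gradients can be written by hand. The G update ascends log D(G(z)), the paper's practical alternative to descending log(1 - D(G(z))).

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

# Toy setup (illustrative assumptions): real data ~ N(3, 1), noise prior ~ N(0, 1)
sample_data = lambda m: rng.normal(3.0, 1.0, size=m)
sample_noise = lambda m: rng.normal(0.0, 1.0, size=m)

a, b = 1.0, 0.0    # generator G(z) = a*z + b
w, c = 0.1, 0.0    # discriminator D(x) = sigmoid(w*x + c)
lr, m, k = 0.05, 64, 1

for _ in range(2000):
    for _ in range(k):
        x = sample_data(m)
        g = a * sample_noise(m) + b          # fake samples G(z)
        d_real = sigmoid(w * x + c)
        d_fake = sigmoid(w * g + c)
        # Ascend gradient of  mean log D(x) + mean log(1 - D(G(z)))
        w += lr * (np.mean((1 - d_real) * x) + np.mean(-d_fake * g))
        c += lr * (np.mean(1 - d_real) + np.mean(-d_fake))
    z = sample_noise(m)
    d_fake = sigmoid(w * (a * z + b) + c)
    # Non-saturating G update: ascend gradient of  mean log D(G(z))
    a += lr * np.mean((1 - d_fake) * w * z)
    b += lr * np.mean((1 - d_fake) * w)
```

Under this setup the generator offset b drifts toward the data mean, since D pushes generated samples toward the region it labels as real.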
3. Advantages and disadvantages
Advantage: Markov chains are never needed, only backprop is used to obtain gradients, no inference is needed during learning, and a wide variety of functions can be incorporated into the model
Disadvantage: There is no explicit representation of the generator's distribution, D must be synchronized well with G during training, and G must not be trained too much without updating D, otherwise G may collapse too many values of z to the same value of x