# [Paper review] Generative Adversarial Nets


Generative Adversarial Nets (https://arxiv.org/pdf/1406.2661.pdf)

### 0. Introduction

Proposes a new framework for estimating generative models via an adversarial process, in which two models are trained simultaneously:

i) Generative model G: captures the data distribution

ii) Discriminative model D: estimates the probability that a sample came from the training data rather than from G

iii) G is trained to maximize the probability of D making a mistake

### 1. Related work

- Undirected graphical models with latent variables (Restricted Boltzmann machines, deep Boltzmann machines)
  - Intractable except in the most trivial instances
- Deep belief networks (DBNs)
  - Hybrid models containing a single undirected layer and several directed layers
  - A fast approximate layer-wise training criterion exists, but DBNs incur the computational difficulties associated with both undirected and directed models
- Alternative criteria that do not approximate or bound the log-likelihood (score matching, noise-contrastive estimation (NCE))
  - In NCE, a discriminative training criterion is employed to fit a generative model. However, rather than fitting a separate discriminative model, the generative model itself is used to discriminate generated data from samples of a fixed noise distribution
- Generative stochastic network (GSN) framework, which extends generalized denoising auto-encoders
  - Both define a parameterized Markov chain: one learns the parameters of a machine that performs one step of a generative Markov chain
- In contrast, the adversarial nets framework requires no Markov chain for sampling, so it avoids the mixing difficulties that chain-based methods face

### 2. How it works

D and G play the following two-player minimax game with value function V(D, G).
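As given in the paper (Eq. 1), where D(x) is the probability that x came from the data rather than from G, and G(z) maps noise z ~ p_z(z) into data space:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$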

The game is implemented using an iterative, numerical approach, alternating between k steps of optimizing D and one step of optimizing G. In practice, early in learning, when G is poor, D can reject samples with high confidence because they are clearly different from the training data; log(1 - D(G(z))) then saturates, so the paper suggests training G to maximize log D(G(z)) instead, which provides much stronger gradients early in learning.
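To see the difference, compare the two generator losses in code. A small sketch (PyTorch is my choice of framework, not the paper's; `d_fake` is a stand-in for the discriminator outputs D(G(z)) on generated samples):

```python
import torch

# Stand-in for D(G(z)): discriminator outputs on generated samples, in (0, 1)
d_fake = torch.sigmoid(torch.randn(64, 1))

# Minimax generator loss: minimize log(1 - D(G(z))). Early in training,
# D(G(z)) is near 0, so this saturates and yields weak gradients for G.
loss_saturating = torch.log(1.0 - d_fake).mean()

# Non-saturating alternative from the paper: maximize log D(G(z)),
# i.e., minimize -log D(G(z)). Same fixed point, stronger early gradients.
loss_non_saturating = -torch.log(d_fake).mean()
```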

```
for i in iterations:                      # outer training loop
    for k steps:                          # k discriminator updates per G update
        sample minibatch of m noise samples {z_1, ..., z_m} from noise prior p_z(z)
        sample minibatch of m examples {x_1, ..., x_m} from data distribution p_data(x)
        update D by ascending its stochastic gradient:
            (1/m) * sum_i [log D(x_i) + log(1 - D(G(z_i)))]
    sample minibatch of m noise samples {z_1, ..., z_m} from noise prior p_z(z)
    update G by descending its stochastic gradient:
        (1/m) * sum_i log(1 - D(G(z_i)))
```
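For concreteness, a minimal runnable sketch of this loop in PyTorch (the framework is my assumption; the paper predates it and does not prescribe one). The toy data, network sizes, learning rates, and the helper `sample_data` are all illustrative, not from the paper; the generator update uses the non-saturating loss discussed above:

```python
import torch
import torch.nn as nn

latent_dim, data_dim, m, k = 8, 2, 64, 1   # noise dim, data dim, minibatch size, D steps

# Small MLPs standing in for G and D (architectures are illustrative)
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

opt_d = torch.optim.SGD(D.parameters(), lr=0.05)
opt_g = torch.optim.SGD(G.parameters(), lr=0.05)

def sample_data(m):
    # Stand-in for sampling p_data(x): a Gaussian blob centered at (2, 2)
    return torch.randn(m, data_dim) + 2.0

eps = 1e-8  # guards against log(0)

for it in range(1000):
    for _ in range(k):
        # Ascend D's stochastic gradient: maximize log D(x) + log(1 - D(G(z)))
        x = sample_data(m)
        z = torch.randn(m, latent_dim)     # noise prior p_z(z)
        loss_d = -(torch.log(D(x) + eps)
                   + torch.log(1 - D(G(z).detach()) + eps)).mean()
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Update G with the non-saturating loss: maximize log D(G(z))
    z = torch.randn(m, latent_dim)
    loss_g = -torch.log(D(G(z)) + eps).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```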

### 3. Advantages and disadvantages

Advantages:

- Markov chains are never needed
- Only backpropagation is used to obtain gradients
- No inference is needed during learning
- A wide variety of functions can be incorporated into the model

Disadvantage: D must be kept well synchronized with G during training; in particular, G must not be trained too much without updating D, otherwise G may collapse too many values of z to the same value of x, losing the diversity needed to model the data distribution.