Stanford CS236: Deep Generative Models I 2023 I Lecture 17 - Discrete Latent Variable Models

Score-Based Diffusion Models (SDMs)

  • SDMs are closely connected to denoising diffusion probabilistic models (DDPMs).
  • DDPMs can be interpreted as a VAE where the encoder adds noise to the data and the decoder denoises it.
  • Optimizing the evidence lower bound in DDPMs corresponds to learning a sequence of denoisers, similar to noise conditional score models.
  • The score-based SDE formulation generalizes DDPMs to a continuous spectrum of noise levels, enabling more efficient sampling and likelihood evaluation.
  • The process of adding noise is described by a stochastic differential equation (SDE).
  • The drift term of the SDE changes when the direction of time is reversed.
  • The reverse-time SDE has a drift term that involves the score of the corresponding perturbed data density at each time t (see the equations after this list).
  • Both the forward and reverse SDEs describe the same kind of trajectories, and the only difference is the direction of time.
  • Score-based models can be used to learn generative models by estimating score functions using a neural network.
  • Langevin dynamics, a score-based MCMC method, can generate samples from the perturbed data density at a given time (a minimal sampler sketch follows this list).
  • Discretizing the time axis when simulating the reverse SDE introduces numerical errors, which can be reduced by running additional Langevin dynamics (corrector) steps at each noise level.
  • Score-based models can be converted into flow models by removing the noise injected at every step, resulting in an infinitely deep, continuous-time normalizing flow (the probability flow ODE in the equations below).
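
To make the SDE bullets above concrete, the standard continuous-time formulation can be written as follows (standard notation from the score-based SDE literature, with f the drift, g the diffusion coefficient, and w a Wiener process; this is a general summary rather than the lecture's exact notation):

```latex
% Forward (noising) SDE
dx = f(x, t)\,dt + g(t)\,dw

% Reverse-time SDE: the drift involves the score of the perturbed density p_t
dx = \left[\, f(x, t) - g(t)^2 \,\nabla_x \log p_t(x) \,\right] dt + g(t)\,d\bar{w}

% Probability flow ODE: shares the marginals p_t, but is deterministic
\frac{dx}{dt} = f(x, t) - \tfrac{1}{2}\, g(t)^2 \,\nabla_x \log p_t(x)
```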
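
And a minimal sketch of the Langevin dynamics sampler mentioned above, assuming some score network `score_fn(x, t)` approximating the gradient of log p_t(x); the function name, step size, and PyTorch usage are illustrative assumptions, not code from the lecture:

```python
import torch

def langevin_sample(score_fn, x, t, n_steps=100, step_size=1e-4):
    """Draw approximate samples from the perturbed density p_t with
    (unadjusted) Langevin dynamics, starting from an initial batch x."""
    for _ in range(n_steps):
        noise = torch.randn_like(x)
        # Move along the estimated score, then inject Gaussian noise
        x = x + step_size * score_fn(x, t) + (2 * step_size) ** 0.5 * noise
    return x
```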

Efficient Sampling Techniques

  • The sampling process in SDMs can be reinterpreted as solving an ODE whose dynamics are defined by the score function of the diffusion model (see the solver sketch after this list).
  • This perspective allows for leveraging techniques from numerical analysis and scientific computing to improve sampling efficiency and generate higher-quality samples.
  • Consistency models are neural networks that directly output the solution of the ODE, enabling fast sampling procedures.
  • Parallel-in-time methods can further accelerate the sampling process by leveraging multiple GPUs to compute the solution of the ODE in parallel.
  • Distillation techniques can be used to train student models that can approximate the solution of the ODE in fewer steps, leading to even faster sampling.
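
As a minimal illustration of the ODE view of sampling referenced above, the probability flow ODE can be integrated backward in time with a simple Euler scheme; `score_fn`, `f`, and `g` are placeholder callables for the score network and the SDE coefficients, not the lecture's code:

```python
import torch

def probability_flow_ode_sample(score_fn, f, g, x_T, T=1.0, n_steps=500):
    """Generate samples by integrating dx/dt = f(x,t) - 0.5*g(t)^2*score(x,t)
    from t = T down to t = 0 with the explicit Euler method.  Higher-order
    solvers (Heun, RK45), consistency models, distillation, or parallel-in-time
    methods are ways to reduce or bypass the number of sequential steps."""
    dt = -T / n_steps                 # negative step: integrate backward in time
    x, t = x_T, T
    for _ in range(n_steps):
        drift = f(x, t) - 0.5 * g(t) ** 2 * score_fn(x, t)
        x = x + drift * dt            # Euler update
        t = t + dt
    return x
```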

Stable Diffusion

  • Stable Diffusion uses a latent diffusion model: an extra encoder and decoder (an autoencoder) are wrapped around the diffusion model, so the diffusion process operates in a lower-dimensional latent space rather than in pixel space.
  • Because the latents are much lower-dimensional than the raw images, training and sampling are faster.
  • Stable Diffusion pre-trains the outer autoencoder and then keeps it fixed while training the diffusion model over the latent space (see the pipeline sketch after this list).
  • To incorporate text into the model, a pre-trained language model is used to map the text to a vector representation, which is then fed into the neural network architecture.
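
A rough sketch of the latent diffusion pipeline described above, assuming a pre-trained, frozen autoencoder `vae`, a reverse-diffusion sampler over latents `diffusion_sample`, and a pre-trained `text_encoder`; these interfaces are hypothetical and not Stable Diffusion's actual API:

```python
import torch

@torch.no_grad()
def latent_diffusion_generate(vae, diffusion_sample, text_encoder, prompt, latent_shape):
    """Text-to-image generation with a latent diffusion model:
    encode the prompt, denoise in latent space, decode to pixels."""
    cond = text_encoder(prompt)              # text -> vector representation
    z_T = torch.randn(latent_shape)          # Gaussian noise in the latent space
    z_0 = diffusion_sample(z_T, cond)        # reverse diffusion in latent space
    return vae.decode(z_0)                   # frozen decoder maps latent -> image
```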

Conditional Generation

  • To control the generation process without training a different model, the prior distribution of the generative model is combined with a classifier's likelihood via Bayes' rule to sample from the posterior distribution of images given a specific label.
  • Computing the denominator (normalizing constant) of this posterior is intractable, making it difficult to sample from the posterior directly.
  • Working at the level of scores simplifies the computation of the posterior score, allowing for easy incorporation of pre-trained models and classifiers.
  • By modifying the drift in the SDE or ODE to include the score of the classifier, one can steer the generative process towards images that are consistent with a desired class or caption.
  • Classifier-free guidance avoids training an explicit classifier by combining the scores of a conditional and an unconditional diffusion model (in practice, the same network with and without the conditioning input) and extrapolating along their difference (see the sketches after this list).
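
The reason working at the level of scores sidesteps the intractable denominator is Bayes' rule: taking the gradient of log p(x | y) with respect to x makes the normalizing constant p(y) drop out, leaving

```latex
\nabla_x \log p(x \mid y) = \nabla_x \log p(x) + \nabla_x \log p(y \mid x)
```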
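
A minimal sketch of how the two guidance variants modify the score that is plugged into the reverse SDE/ODE; `score_fn`, `classifier_log_prob`, and the guidance weight `w` are illustrative names and defaults, not the lecture's code:

```python
import torch

def classifier_guided_score(score_fn, classifier_log_prob, x, t, y):
    """Classifier guidance: add the gradient of a classifier's log-likelihood
    log p(y | x_t) to the unconditional score."""
    x = x.detach().requires_grad_(True)
    grad = torch.autograd.grad(classifier_log_prob(x, t, y).sum(), x)[0]
    return score_fn(x, t) + grad

def classifier_free_guided_score(score_fn, x, t, y, w=7.5):
    """Classifier-free guidance: combine conditional and unconditional scores
    from the same model (y=None denotes the unconditional branch) and
    extrapolate along their difference."""
    s_cond = score_fn(x, t, y)
    s_uncond = score_fn(x, t, None)
    return s_uncond + (1.0 + w) * (s_cond - s_uncond)
```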
