Stanford CS236: Deep Generative Models | 2023 | Lecture 16 - Score Based Diffusion Models
06 May 2024
Score-based models
- Score-based models estimate the score of a data distribution, i.e. the gradient of the log-density with respect to the input, using a neural network.
- Denoising score matching is an efficient way to estimate the score of a noise-perturbed data distribution by training a model to denoise noisy samples (a minimal sketch follows this list).
- Diffusion models can be seen as score-based models in the limit of infinitely many noise levels.
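A minimal PyTorch sketch of the denoising score matching objective over several noise levels; the `ScoreNet` architecture and the geometric sigma schedule are illustrative placeholders, not the lecture's exact setup:

```python
import torch
import torch.nn as nn

# Hypothetical noise-conditional score network: takes a noisy sample and its
# noise level, returns an estimate of grad_x log p_sigma(x).
class ScoreNet(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))

    def forward(self, x, sigma):
        return self.net(torch.cat([x, sigma[:, None]], dim=-1))

def denoising_score_matching_loss(score_net, x, sigmas):
    """One stochastic estimate of the weighted DSM objective. For the Gaussian
    perturbation x_tilde = x + sigma * eps, the target score of
    q_sigma(x_tilde | x) is -(x_tilde - x) / sigma^2 = -eps / sigma."""
    idx = torch.randint(len(sigmas), (x.shape[0],))   # random noise level per example
    sigma = sigmas[idx]
    eps = torch.randn_like(x)
    x_tilde = x + sigma[:, None] * eps
    target = -eps / sigma[:, None]
    pred = score_net(x_tilde, sigma)
    # Weight each level by sigma^2 so all noise scales contribute comparably.
    return ((sigma[:, None] ** 2) * (pred - target) ** 2).sum(dim=-1).mean()

# Usage on toy 2-D data with 10 geometrically spaced noise levels (1.0 down to 0.01).
x = torch.randn(256, 2)
sigmas = torch.logspace(0, -2, 10)
model = ScoreNet(dim=2)
loss = denoising_score_matching_loss(model, x, sigmas)
loss.backward()
```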
Diffusion models
- Diffusion models can be interpreted as a type of variational autoencoder, where a fixed noise-adding process acts as the encoder and the learned denoising process acts as the decoder.
- Diffusion models can be converted into ordinary differential equations (ODEs), allowing for exact likelihood computation and efficient sampling methods.
- Controllable generation in diffusion models can be achieved by incorporating side information, such as class labels, into the model.
- Diffusion models are a type of generative model that can generate realistic-looking images.
- They work by gradually adding noise to an image until it becomes completely random, and then gradually removing the noise to generate a new image.
- The training objective of a diffusion model is to maximize the evidence lower bound (ELBO), a lower bound on the log-likelihood that measures how well the model can reconstruct the original image.
- The encoder in a diffusion model is fixed and simply adds noise to the image, while the decoder is a neural network that learns to remove the noise.
- The loss function for a diffusion model is the same as the denoising score matching loss, which means that the model is learning to estimate the scores of the noise-perturbed data distributions.
- The sampling procedure for a diffusion model is similar to the Langevin dynamics used in score-based models, but with different scalings of the noise.
- Traditional diffusion models use a discrete number of steps to add noise, but a continuous-time diffusion process can be described using a stochastic differential equation.
- The reverse process of going from noise to data can also be described using a stochastic differential equation, and the solution to this equation can be used to generate data.
- The score function is a key component of the reverse-time stochastic differential equation, and it can be estimated using score matching.
- In practice, continuous-time diffusion models are implemented by discretizing the stochastic differential equation and solving it with numerical solvers (see the sketch after this list).
- Score-based models correct the numerical errors introduced by this discretization by running a few steps of Langevin dynamics at each time step.
- DDPM ancestral sampling is a predictor-type discretization of the underlying stochastic differential equation, while the Langevin updates from score-based models act as a corrector; combining them yields predictor-corrector sampling.
- The denoising diffusion implicit model (DDIM) converts the stochastic differential equation into an ordinary differential equation with the same marginals at every time step.
- DDIM has two advantages: sampling can be more efficient, and the model can be converted into a flow model with exact likelihood evaluation.
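To make the predictor step concrete, here is a minimal PyTorch sketch of an Euler-Maruyama discretization of the reverse-time SDE. The drift `f`, diffusion `g`, and `score_fn` are illustrative assumptions: the example uses variance-exploding coefficients, and the placeholder `score_fn` (the exact score of a Gaussian) stands in for a trained noise-conditional score model:

```python
import torch

def reverse_sde_sampler(score_fn, f, g, x_T, n_steps=1000, T=1.0):
    """Euler-Maruyama discretization of the reverse-time SDE
        dx = [f(x, t) - g(t)^2 * score(x, t)] dt + g(t) dw_bar,
    integrated from t = T down to t = 0."""
    dt = T / n_steps
    x = x_T
    for i in range(n_steps, 0, -1):
        t = torch.full((x.shape[0],), i * dt)
        drift = f(x, t) - g(t)[:, None] ** 2 * score_fn(x, t)
        z = torch.randn_like(x) if i > 1 else torch.zeros_like(x)  # no noise on the last step
        x = x - drift * dt + g(t)[:, None] * (dt ** 0.5) * z
    return x

# Example coefficients: a variance-exploding SDE with sigma(t) growing geometrically.
sigma_min, sigma_max = 0.01, 10.0
def sigma(t):
    return sigma_min * (sigma_max / sigma_min) ** t
def f(x, t):   # the VE SDE has zero drift
    return torch.zeros_like(x)
def g(t):      # g(t) = sqrt(d[sigma^2(t)]/dt)
    return sigma(t) * (2 * torch.log(torch.tensor(sigma_max / sigma_min))) ** 0.5

x_T = sigma_max * torch.randn(64, 2)                   # start from the wide Gaussian prior
score_fn = lambda x, t: -x / sigma(t)[:, None] ** 2    # placeholder for a trained model
samples = reverse_sde_sampler(score_fn, f, g, x_T)
```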
Noise Conditional Score Network (NCSN)
- The Noise Conditional Score Network (NCSN) estimates the scores of the noise-perturbed data distributions and generates samples by iteratively reducing the amount of noise.
- The forward process, which generates the training samples for the denoising score matching loss, involves adding noise to the data at every step until pure noise is reached.
- The process of going from data to noise can be seen as a Markov process where noise is added incrementally, and the joint distribution over the random variables is defined as the product of conditional densities.
- The encoder in NCSN is a simple procedure that maps the original data point to a vector of latent variables by adding noise to it.
- The marginals of the noisy variables given the original data point are also Gaussian, so the probability of transitioning from one noise level to another can be computed in closed form.
- As a result, a noisy sample at any specific time step can be generated directly without simulating the whole chain, which makes training computationally efficient.
- The diffusion process in NCSN is analogous to heat diffusion, where probability mass is spread out over the entire space.
- The NCSN process can be inverted during inference because the added noise smooths out the structure of the data distribution, making the scores well defined everywhere and easier to estimate, which facilitates sampling.
- The goal is to learn a probabilistic model that can generate data by inverting a process that destroys structure and adds noise to the data.
- The process of adding noise is defined by a transition kernel that spreads out the probability mass in a controllable way, such as Gaussian noise.
- The key idea is to learn an approximation of the reverse kernel that removes noise from a sample, which can be done variationally through a neural network.
- The generative distribution is defined by sampling from a simple prior and then sampling from the conditional distributions of the remaining variables one at a time, going from right to left (from pure noise back to data).
- The noise schedule is chosen so that the final variable has essentially zero signal-to-noise ratio, reaching a steady state of pure noise that matches the simple prior; the parameters of the conditional distributions are then learned to invert this noising process.
- Additionally, Langevin dynamics can be used to correct the mistakes made by this vanilla sampling procedure, at the cost of more computation (see the sketch after this list).
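A minimal PyTorch sketch of this kind of Langevin sampling, in the annealed form used with NCSN (largest to smallest noise level); the step-size rule and the toy `score_fn` are illustrative assumptions, with `score_fn` standing in for a trained noise-conditional score network:

```python
import torch

def annealed_langevin_dynamics(score_fn, x, sigmas, n_steps=100, step_lr=2e-5):
    """Run a few Langevin steps at each noise level, going from the largest
    sigma to the smallest. score_fn(x, sigma) approximates grad_x log p_sigma(x)."""
    for sigma in sigmas:                                   # sigmas sorted large -> small
        step_size = step_lr * (sigma / sigmas[-1]) ** 2    # common annealing heuristic
        for _ in range(n_steps):
            z = torch.randn_like(x)
            grad = score_fn(x, sigma * torch.ones(x.shape[0]))
            x = x + step_size * grad + (2 * step_size) ** 0.5 * z
    return x

# Usage sketch with a toy placeholder score (the exact score of N(0, sigma^2 I)).
sigmas = torch.logspace(1, -2, 10)                # 10 noise levels from 10.0 down to 0.01
score_fn = lambda x, sigma: -x / sigma[:, None] ** 2
x_init = torch.randn(64, 2) * sigmas[0]           # initialize at the widest noise level
samples = annealed_langevin_dynamics(score_fn, x_init, sigmas)
```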
Training diffusion models
- Fixing the encoder to be a simple noise-adding function simplifies the training process.
- The lambda parameters in the loss control the relative importance of the different noise levels.
- The beta parameters of the noise schedule control how quickly noise is added at each step.
- The architecture is similar to a noise-conditional score model, with a single decoder amortized across different noise levels.
- Training is efficient because the objective decomposes over noise levels: each update only needs to sample a single random time step per example instead of simulating the whole chain (see the sketch below).
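A minimal PyTorch sketch of one such training step, assuming the standard DDPM-style Gaussian parameterization with a linear beta schedule; the tiny `Denoiser` network is a hypothetical placeholder for whatever architecture (e.g., a U-Net for images) is actually used:

```python
import torch
import torch.nn as nn

# Placeholder denoiser: predicts the noise that was added to x_t, conditioned on the step t.
class Denoiser(nn.Module):
    def __init__(self, dim, T):
        super().__init__()
        self.emb = nn.Embedding(T, 16)
        self.net = nn.Sequential(nn.Linear(dim + 16, 128), nn.SiLU(), nn.Linear(128, dim))

    def forward(self, xt, t):
        return self.net(torch.cat([xt, self.emb(t)], dim=-1))

T = 1000
betas = torch.linspace(1e-4, 0.02, T)             # beta schedule: how quickly noise is added
alpha_bars = torch.cumprod(1.0 - betas, dim=0)    # closed-form marginal coefficients

def training_step(model, x0):
    """One stochastic estimate of the (uniformly weighted) ELBO-derived loss:
    pick a random step t per example, jump to x_t in closed form via
    q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I), and regress the noise."""
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    a_bar = alpha_bars[t][:, None]                # assumes x0 is a batch of flat vectors
    xt = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
    return ((model(xt, t) - eps) ** 2).mean()

# Usage on toy 2-D data; one update touches a single random noise level per example.
model = Denoiser(dim=2, T=T)
loss = training_step(model, torch.randn(32, 2))
loss.backward()
```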