
Noise Type

In Diffusion Models, a noise type defines a specific parametrization of the stationary, prior, posterior, and approximate posterior distributions, $q(x_T)$, $q(x_t|x_0)$, $q(x_{t-1}|x_t,x_0)$, and $p_\theta(x_{t-1}|x_t)$, respectively. Modular Diffusion includes the standard Gaussian noise parametrization, as well as a few other noise types.

Gaussian noise

Gaussian noise model introduced in Ho et al. (2020), for which the diffusion process is defined as:

  • $q(x_T) = \mathcal{N}(x_T; 0, \text{I})$
  • $q(x_t|x_0) = \mathcal{N}(x_t; \sqrt{\bar\alpha_t}x_0, (1 - \bar\alpha_t)\text{I})$
  • $q(x_{t-1}|x_t, x_0) = \mathcal{N}\left(x_{t-1}; \frac{\sqrt{\alpha_t}(1-\bar\alpha_{t-1})x_t + \sqrt{\bar\alpha_{t-1}}(1-\alpha_t)x_0}{1-\bar\alpha_t}, \frac{(1-\alpha_t)(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\text{I}\right)$
  • $p_\theta(x_{t-1}|x_t) = \mathcal{N}\left(x_{t-1}; \hat{\mu}_\theta, \frac{(1-\alpha_t)(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\text{I}\right)$,

where, depending on the parametrization:

  • $\hat{\mu}_\theta = \frac{\sqrt{\alpha_t}(1-\bar\alpha_{t-1})x_t + \sqrt{\bar\alpha_{t-1}}(1-\alpha_t)\hat{x}_\theta}{1-\bar\alpha_t}$
  • $\hat{\mu}_\theta = \frac{1}{\sqrt{\alpha_t}}x_t - \frac{1-\alpha_t}{\sqrt{1-\bar\alpha_t}\sqrt{\alpha_t}}\hat{\epsilon}_\theta$.
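
Both parametrizations recover the same posterior mean and differ only in which quantity the network predicts. A minimal PyTorch sketch of the two conversions (function and argument names are illustrative, not the library's internals):

import torch

def mu_from_x(x_t, x_hat, alpha_t, alpha_bar_t, alpha_bar_prev):
    # μ̂_θ from a predicted clean sample x̂_θ (first formula above)
    return ((alpha_t ** 0.5) * (1 - alpha_bar_prev) * x_t
            + (alpha_bar_prev ** 0.5) * (1 - alpha_t) * x_hat) / (1 - alpha_bar_t)

def mu_from_epsilon(x_t, epsilon_hat, alpha_t, alpha_bar_t):
    # μ̂_θ from predicted noise ε̂_θ (second formula above)
    return (x_t - (1 - alpha_t) / (1 - alpha_bar_t) ** 0.5 * epsilon_hat) / alpha_t ** 0.5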

Parameters

  • parameter (default "x") -> Parameter to be learned and used to compute $\hat{\mu}_\theta$. If "x" ($\hat{x}_\theta$) or "epsilon" ($\hat{\epsilon}_\theta$) is chosen, $\hat{\mu}_\theta$ is computed with the corresponding formula above. If "mu", $\hat{\mu}_\theta$ is predicted directly. Authors typically find that learning $\hat{\epsilon}_\theta$ leads to better results.
  • variance (default "fixed") -> If "fixed", the variance of $p_\theta(x_{t-1}|x_t)$ is fixed to $\frac{(1-\alpha_t)(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\text{I}$. If "learned", the variance is learned as a parameter of the model.

Parametrization

If you have the option, always select the same parameter in both your model's Noise and Loss objects.

Example

from diffusion.noise import Gaussian

noise = Gaussian(parameter="epsilon", variance="fixed")
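
Here, parameter="epsilon" means the network predicts $\hat{\epsilon}_\theta$ and $\hat{\mu}_\theta$ is recovered with the second formula above, while variance="fixed" keeps the variance of $p_\theta(x_{t-1}|x_t)$ at its closed-form value.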

Visualization

Applying Gaussian noise to an image using the Cosine schedule with $T=1000$, $s=8\times10^{-3}$, and $e=2$, in equally spaced snapshots:

Image of a dog gradually turning noisy.
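
Each snapshot is a sample from $q(x_t|x_0)$ at an increasing $t$, which in closed form reduces to $x_t = \sqrt{\bar\alpha_t}x_0 + \sqrt{1-\bar\alpha_t}\epsilon$ with $\epsilon \sim \mathcal{N}(0, \text{I})$. A minimal PyTorch sketch (the function name and scalar alpha_bar_t argument are assumptions for illustration, not part of the library's API):

import torch

def sample_xt(x0: torch.Tensor, alpha_bar_t: float) -> torch.Tensor:
    # x_t ~ N(sqrt(ᾱ_t) x_0, (1 - ᾱ_t) I)
    epsilon = torch.randn_like(x0)  # ε ~ N(0, I)
    return (alpha_bar_t ** 0.5) * x0 + ((1 - alpha_bar_t) ** 0.5) * epsilon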

Uniform categorical noise

Uniform categorical noise model introduced in Austin et al. (2021). In each time step, each token either stays the same or transitions to a different state. The noise type is defined by:

  • $q(x_T) = \mathrm{Cat}\left(x_T; \frac{\mathbb{1}\mathbb{1}^\top}{k}\right)$
  • $q(x_t|x_0) = \mathrm{Cat}(x_t; x_0\overline{Q}_t)$
  • $q(x_{t-1}|x_t, x_0) = \mathrm{Cat}\left(x_{t-1}; \frac{x_t Q_t^\top \odot x_0\overline{Q}_{t-1}}{x_0\overline{Q}_t x_t^\top}\right)$
  • $p_\theta(x_{t-1}|x_t) = \mathrm{Cat}\left(x_{t-1}; \frac{x_t Q_t^\top \odot \hat{x}_\theta\overline{Q}_{t-1}}{\hat{x}_\theta\overline{Q}_t x_t^\top}\right)$,

where:

  • $\mathbb{1}$ is a column vector of ones of length $k$.
  • $Q_t = \alpha_t\text{I} + (1-\alpha_t)\frac{\mathbb{1}\mathbb{1}^\top}{k}$
  • $\overline{Q}_t = \bar\alpha_t\text{I} + (1-\bar\alpha_t)\frac{\mathbb{1}\mathbb{1}^\top}{k}$
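
Because each $x$ is a one-hot row vector, $x_0\overline{Q}_t$ directly reads out the categorical distribution of $x_t$, and the posterior is the normalized element-wise product above. A small NumPy sketch under these definitions (function names are illustrative, not the library's internals):

import numpy as np

def uniform_transition(alpha, k):
    # Q = α I + (1 - α) 𝟙𝟙ᵀ / k, a row-stochastic matrix
    return alpha * np.eye(k) + (1 - alpha) * np.ones((k, k)) / k

def posterior(x_t, x_0, alphas, t):
    # q(x_{t-1} | x_t, x_0) ∝ x_t Q_tᵀ ⊙ x_0 Q̄_{t-1}; alphas[i] holds α_{i+1}
    k = x_0.shape[-1]
    Q_t = uniform_transition(alphas[t - 1], k)
    Q_bar_prev = uniform_transition(alphas[: t - 1].prod(), k)  # ᾱ_{t-1} = Π α_s
    probs = (x_t @ Q_t.T) * (x_0 @ Q_bar_prev)
    return probs / probs.sum(axis=-1, keepdims=True)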

One-hot representation

The Uniform noise type operates on one-hot vectors. To use it, you must use the OneHot data transform.
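
Independently of the library's transform, this is just standard one-hot encoding; for instance, with PyTorch:

import torch
import torch.nn.functional as F

x = torch.tensor([0, 3, 25])                     # integer category indices
x_onehot = F.one_hot(x, num_classes=26).float()  # shape (3, 26): one one-hot row per token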

Parameters

  • k -> Number of categories $k$.

Example

from diffusion.noise import Uniform

noise = Uniform(k=26)

Visualization

Applying Uniform noise to an image with $k=255$ using the Cosine schedule with $T=1000$, $s=8\times10^{-3}$, and $e=2$, in equally spaced snapshots:

Image of a dog gradually turning noisy.

Absorbing categorical noise

Absorbing categorical noise model introduced in Austin et al. (2021). In each time step, each token either stays the same or transitions to an absorbing state. The noise type is defined by:

  • $q(x_T) = \mathrm{Cat}(x_T; \mathbb{1}e_m^\top)$
  • $q(x_t|x_0) = \mathrm{Cat}(x_t; x_0\overline{Q}_t)$
  • $q(x_{t-1}|x_t, x_0) = \mathrm{Cat}\left(x_{t-1}; \frac{x_t Q_t^\top \odot x_0\overline{Q}_{t-1}}{x_0\overline{Q}_t x_t^\top}\right)$
  • $p_\theta(x_{t-1}|x_t) = \mathrm{Cat}\left(x_{t-1}; \frac{x_t Q_t^\top \odot \hat{x}_\theta\overline{Q}_{t-1}}{\hat{x}_\theta\overline{Q}_t x_t^\top}\right)$,

where:

  • $\mathbb{1}$ is a column vector of ones of length $k$.
  • $e_m$ is a one-hot vector with a 1 at the absorbing state $m$ and 0 elsewhere.
  • $Q_t = \alpha_t\text{I} + (1-\alpha_t)\mathbb{1}e_m^\top$
  • $\overline{Q}_t = \bar\alpha_t\text{I} + (1-\bar\alpha_t)\mathbb{1}e_m^\top$
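
Equivalently, after $t$ steps a token keeps its original value with probability $\bar\alpha_t$ and has otherwise been absorbed into state $m$. A hypothetical NumPy sketch of the transition matrix (not the library's internals):

import numpy as np

def absorbing_transition(alpha, k, m):
    # Q = α I + (1 - α) 𝟙 e_mᵀ: stay with probability α, otherwise jump to state m
    Q = alpha * np.eye(k)
    Q[:, m] += 1 - alpha
    return Q

# For one-hot x_0, x_0 @ absorbing_transition(alpha_bar_t, k, m) gives q(x_t | x_0).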

One-hot representation

The Absorbing noise type operates on one-hot vectors. To use it, you must use the OneHot data transform.

Parameters

  • k -> Number of categories $k$.
  • m -> Index of the absorbing state $m$.

Example

from diffusion.noise import Absorbing

noise = Absorbing(k=255, m=128)

Visualization

Applying Absorbing noise to an image with $k=255$ and $m=128$ using the Cosine schedule with $T=1000$, $s=8\times10^{-3}$, and $e=2$, in equally spaced snapshots:

Image of a dog gradually turning gray.


If you spot a typo or technical imprecision, please submit an issue or pull request to the library's GitHub repository.