Definition

A Variational Autoencoder (VAE) is a generative model that learns to encode data into a structured latent space and decode from it to reconstruct data, with a probabilistic foundation that facilitates sampling and meaningful latent representations.

Intuition

Unlike standard autoencoders, VAEs impose a probabilistic structure on the latent space, encouraging it to follow a known distribution (commonly a Gaussian). This keeps the latent representations smooth and continuous, enabling meaningful interpolation and generation of new data.

Key Concepts and Workflow

  1. Distributions:

    • Data distribution: $p(x)$
    • Latent distribution: $p(z)$ (assumed to be a standard normal distribution $\mathcal{N}(0, I)$)
    • Mappings:
      • $p_\theta(x \mid z)$: Decoder that reconstructs data from latent variables.
      • $p(z \mid x)$: True posterior distribution (unknown; approximated).
  2. Approximation:

    • Approximate $p(z \mid x)$ with a variational distribution $q_\phi(z \mid x)$, parameterized as a diagonal Gaussian $\mathcal{N}(\mu_\phi(x), \sigma_\phi^2(x))$.
  3. Reparameterization Trick:

    • Sampling $z$ directly from $q_\phi(z \mid x)$ breaks backpropagation, because the sampling step is not differentiable with respect to the encoder's parameters.
    • Solution: Represent $z$ as $z = \mu + \sigma \odot \epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$. This separates the sampling noise from the learnable parameters $\mu$ and $\sigma$, enabling backpropagation (see the sketch after this list).
  4. Loss Function:

    • Combines a reconstruction loss and a regularization term (together the negative ELBO; see the sketch after this list):
      $\mathcal{L}(\theta, \phi) = -\mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)] + D_{\mathrm{KL}}(q_\phi(z \mid x) \,\|\, p(z))$
    • Reconstruction Loss: Measures data reconstruction quality.
    • KL Divergence: Encourages $q_\phi(z \mid x)$ to be close to $p(z)$, regularizing the latent space.
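
As a concrete reference for steps 3 and 4, here is a minimal PyTorch sketch (the layer sizes and the Bernoulli reconstruction likelihood are illustrative assumptions, not prescribed by the notes above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=400, z_dim=20):
        super().__init__()
        self.enc = nn.Linear(x_dim, h_dim)
        self.mu = nn.Linear(h_dim, z_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(h_dim, z_dim)   # log-variance of q(z|x)
        self.dec = nn.Sequential(
            nn.Linear(z_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, x_dim)
        )

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps with eps ~ N(0, I): the randomness lives
        # in eps, so gradients flow through mu and sigma.
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + std * eps

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = self.reparameterize(mu, logvar)
        return self.dec(z), mu, logvar          # decoder outputs logits

def vae_loss(x_logits, x, mu, logvar):
    # Reconstruction term: negative log-likelihood of x under p(x|z)
    # (Bernoulli likelihood, i.e. binary cross-entropy on logits).
    recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction="sum")
    # KL(q(z|x) || N(0, I)), closed form for diagonal Gaussians.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```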

Variants

  1. Conditional VAE (CVAE):
    Incorporates additional information $c$ (e.g., class labels) as a condition: $p_\theta(x \mid z, c)$ and $q_\phi(z \mid x, c)$ (see the conditioning sketch after this list).

  2. Beta-VAE:
    Introduces a scaling factor $\beta$ on the KL divergence term, controlling the trade-off between reconstruction quality and disentanglement in the latent space:
    $\mathcal{L}_\beta = -\mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)] + \beta \, D_{\mathrm{KL}}(q_\phi(z \mid x) \,\|\, p(z))$

  3. VQ-VAE:
    Uses discrete latent variables and vector quantization for learning representations, addressing issues like blurry reconstructions.
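
A minimal sketch of the Beta-VAE objective, reusing the terms from vae_loss above ($\beta = 1$ recovers the standard VAE; beta=4.0 is only an illustrative default):

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x_logits, x, mu, logvar, beta=4.0):
    recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    # beta > 1 pushes q(z|x) harder toward the prior, trading
    # reconstruction fidelity for disentanglement.
    return recon + beta * kl
```

For the CVAE, one common way to realize the conditioning (an assumption here, not the only option) is to concatenate a one-hot label to the inputs of both encoder and decoder:

```python
import torch

def condition(x, y, num_classes=10):
    # Append a one-hot encoding of label y to x, so the networks model
    # q(z | x, c) and p(x | z, c) instead of q(z | x) and p(x | z).
    c = torch.nn.functional.one_hot(y, num_classes).float()
    return torch.cat([x, c], dim=-1)
```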


Applications

  • Data generation (e.g., synthetic images, text, audio).
  • Representation learning for tasks like clustering and interpolation.
  • Semi-supervised learning (CVAE).
  • Disentangled representation learning (Beta-VAE).
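
For instance, latent-space interpolation with the VAE sketched earlier (model, x1, and x2 are assumed to be a trained model and two inputs; steps is arbitrary):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def interpolate(model, x1, x2, steps=8):
    # Use the posterior means as latent codes for the two inputs.
    z1 = model.mu(F.relu(model.enc(x1)))
    z2 = model.mu(F.relu(model.enc(x2)))
    # Linearly blend the codes and decode each intermediate point;
    # a smooth latent space yields semantically gradual outputs.
    alphas = torch.linspace(0, 1, steps)
    return torch.stack([model.dec((1 - a) * z1 + a * z2) for a in alphas])
```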

Known Issues and Improvements

  • Blurry Reconstructions: The KL regularization enforces overly smooth latent representations, losing fine detail in generated samples.
  • Solution: Improved variants like Beta-VAE and VQ-VAE focus on balancing quality and latent structure.

ToDo:

  • Draw and annotate a conditional distribution graph to visualize relationships between $x$, $z$, $p_\theta(x \mid z)$, and $q_\phi(z \mid x)$.
  • Analyze implementation nuances (e.g., optimizing KL divergence and reparameterization trick).