- categories: ToDo
- source: VAEs and Reparameterization Trick
Definition
A Variational Autoencoder (VAE) is a generative model that learns to encode data into a structured latent space and decode from it to reconstruct data, with a probabilistic foundation that facilitates sampling and meaningful latent representations.
Intuition
Unlike a standard autoencoder, a VAE imposes a probabilistic structure on the latent space, encouraging it to follow a known distribution (commonly a Gaussian). This makes the latent representations smooth and continuous, enabling meaningful interpolation and generation of data.
Key Concepts and Workflow
- Distributions:
  - Data distribution: $p(x)$
  - Latent distribution: $p(z)$ (assumed to be a standard normal distribution $\mathcal{N}(0, I)$)
- Mappings (a minimal sketch follows this list):
  - $p_\theta(x \mid z)$: Decoder that reconstructs data from latent variables.
  - $p(z \mid x)$: Posterior distribution (unknown; approximated).
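A minimal sketch of these two mappings in PyTorch. The MLP architecture, layer sizes (784/256/20, MNIST-like), and class names are illustrative assumptions, not from the source:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Amortizes q_phi(z|x): maps x to the mean and log-variance of a Gaussian."""
    def __init__(self, x_dim=784, h_dim=256, z_dim=20):  # illustrative sizes
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)       # mean of q_phi(z|x)
        self.logvar = nn.Linear(h_dim, z_dim)   # log sigma^2 of q_phi(z|x)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)

class Decoder(nn.Module):
    """Implements p_theta(x|z): maps a latent sample back to data space."""
    def __init__(self, z_dim=20, h_dim=256, x_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, x_dim), nn.Sigmoid(),  # assumes pixels in [0, 1]
        )

    def forward(self, z):
        return self.net(z)
```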
- Approximation:
  - Approximate $p(z \mid x)$ with a variational distribution $q_\phi(z \mid x)$, parameterized as a Gaussian $\mathcal{N}(\mu_\phi(x), \sigma_\phi^2(x))$.
- Reparameterization Trick (sketched after this list item):
  - Direct sampling from $q_\phi(z \mid x)$ disrupts backpropagation due to non-differentiability.
  - Solution: Represent $z$ as $z = \mu + \sigma \odot \epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$. This separates the random sampling from the learnable parameters $\mu$ and $\sigma$, enabling backpropagation.
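A sketch of the trick itself; `mu` and `logvar` are assumed to come from an encoder like the one above:

```python
import torch

def reparameterize(mu, logvar):
    """Draw z ~ N(mu, sigma^2) as a differentiable function of mu and logvar.

    The randomness lives entirely in eps ~ N(0, I), so gradients flow
    through mu and sigma during backpropagation.
    """
    sigma = torch.exp(0.5 * logvar)   # sigma = exp(log sigma^2 / 2)
    eps = torch.randn_like(sigma)     # eps ~ N(0, I); carries no parameters
    return mu + sigma * eps           # z = mu + sigma * eps
```

Calling `reparameterize(mu, logvar)` instead of sampling from the Gaussian directly keeps the stochastic step outside the learnable parameters, which is exactly what lets the gradient reach $\mu$ and $\sigma$.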
- Loss Function (sketched after this list item):
  - Combines a reconstruction loss and a regularization term (the negative ELBO):
    $\mathcal{L}(\theta, \phi; x) = -\,\mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] + D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)$
  - Reconstruction Loss: Measures data reconstruction quality.
  - KL Divergence: Encourages $q_\phi(z \mid x)$ to be close to $p(z)$, regularizing the latent space.
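A minimal sketch of this loss, assuming a Bernoulli decoder (so binary cross-entropy as the reconstruction term) and using the closed-form KL between a diagonal Gaussian and $\mathcal{N}(0, I)$:

```python
import torch
import torch.nn.functional as F

def vae_loss(x_recon, x, mu, logvar):
    """Negative ELBO: reconstruction loss + KL(q_phi(z|x) || N(0, I))."""
    # Reconstruction term: how well the decoder reproduces x.
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # Analytic KL for a diagonal Gaussian vs. a standard normal:
    # KL = -1/2 * sum(1 + log sigma^2 - mu^2 - sigma^2)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```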
Variants
- Conditional VAE (CVAE): Incorporates additional information (e.g., class labels) as a condition: $q_\phi(z \mid x, c)$ and $p_\theta(x \mid z, c)$.
- Beta-VAE: Introduces a scaling factor $\beta$ on the KL divergence term, controlling the trade-off between reconstruction quality and disentanglement in the latent space: $\mathcal{L} = -\,\mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] + \beta \, D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)$ (sketched after this list).
- VQ-VAE: Uses discrete latent variables and vector quantization to learn representations, addressing issues like blurry reconstructions.
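As a sketch, the Beta-VAE objective is a one-line change to the loss above; the default `beta=4.0` here is an illustrative choice, not a value from the source:

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x_recon, x, mu, logvar, beta=4.0):
    """Beta-VAE objective: beta = 1 recovers the standard VAE loss."""
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl  # beta > 1 weights the KL term more heavily
```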
Applications
- Data generation (e.g., synthetic images, text, audio).
- Representation learning for tasks like clustering and interpolation.
- Semi-supervised learning (CVAE).
- Disentangled representation learning (Beta-VAE).
Known Issues and Improvements
- Blurry Reconstructions: The KL regularization encourages overly smooth latent representations, leading to loss of detail in generated samples.
- Solution: Improved variants like Beta-VAE and VQ-VAE focus on balancing quality and latent structure.
ToDo:
- Draw and annotate a conditional distribution graph to visualize relationships between $x$, $z$, $\theta$, and $\phi$.
- Analyze implementation nuances (e.g., optimizing KL divergence and reparameterization trick).