What are Variational Autoencoders?


By Dr. Sampath Lonka. Reviewed by Susha Cheriyedath, M.Sc.

Autoencoders (AEs) comprise two connected feed-forward neural networks: the encoder and the decoder. The encoder compresses an input into a latent encoding from which the decoder reconstructs the original input. The encoder learns a non-linear transformation that maps each input vector from the high-dimensional input space to a lower-dimensional latent space, and the decoder maps latent vectors back into the original high-dimensional input space.
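
To make this data flow concrete, here is a minimal NumPy sketch of an autoencoder's forward pass. The layer sizes, the tanh non-linearity, and the random, untrained weights are assumptions chosen for illustration; a real AE learns both networks jointly by minimizing the reconstruction loss.

```python
import numpy as np

rng = np.random.default_rng(0)

input_dim, latent_dim = 8, 2  # high-dimensional input, low-dimensional latent space

# Hypothetical, untrained weights; in practice both networks are trained jointly.
W_enc = rng.normal(scale=0.1, size=(input_dim, latent_dim))
b_enc = np.zeros(latent_dim)
W_dec = rng.normal(scale=0.1, size=(latent_dim, input_dim))
b_dec = np.zeros(input_dim)

def encode(x):
    """Non-linear map from the input space (input_dim) to the latent space (latent_dim)."""
    return np.tanh(x @ W_enc + b_enc)

def decode(z):
    """Map latent codes back to the original input space."""
    return z @ W_dec + b_dec

def reconstruction_loss(x):
    """Mean squared error between inputs and their reconstructions."""
    x_hat = decode(encode(x))
    return np.mean((x - x_hat) ** 2)

x = rng.normal(size=(4, input_dim))  # a batch of 4 input vectors
z = encode(x)                        # compressed codes, shape (4, 2)
print(z.shape, decode(z).shape)
print(reconstruction_loss(x))
```

Training would adjust the four weight arrays to drive this loss down; the narrow latent_dim bottleneck is what stops the network from simply learning the identity function.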

The AE's main goal is to learn lower-dimensional feature representations of unlabeled data from which the original inputs can be reconstructed. It must be properly constrained, for example by a narrow bottleneck, to avoid learning a trivial identity function that merely copies the input. AEs excel at compression tasks with a low-dimensional latent space that preserves essential features. Their drawbacks include being limited to compressing data similar to what they were trained on, producing degraded (lossy) decompressed outputs, and lacking a clear generative interpretation, because they are trained solely for reconstruction.


Variational autoencoders (VAEs)

Traditional autoencoders map each input to a single, independent point in the latent space, which can leave the latent space discontinuous and limits the decoder's ability to generate new data. VAEs address this by encoding each input as a probability distribution, which makes the decoder robust to small changes in the latent code.

VAEs, introduced by Diederik P. Kingma and Max Welling in 2013, are powerful generative models widely used in unsupervised learning, including image generation. A VAE encodes each input as a Gaussian distribution over the latent space (typically with a diagonal covariance) and is trained to keep these distributions close to a simple prior, usually a standard normal. For image data, the encoder is typically a convolutional neural network (CNN) that outputs the parameters of this distribution, and the decoder, also typically a CNN, maps a latent sample back into the input space. Training maximizes a lower bound on the likelihood of the data (the evidence lower bound, or ELBO), so that decoded samples closely resemble the training inputs.
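
Sampling from the encoder's Gaussian is usually done with the reparameterization trick, which writes the random latent vector as a deterministic function of the encoder outputs plus independent noise so that gradients can flow through the sampling step. The sketch below assumes the encoder outputs a mean and a log-variance per latent dimension; the specific numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, rng):
    """Draw z ~ N(mu, diag(exp(log_var))) as mu + sigma * eps with eps ~ N(0, I),
    so z is differentiable with respect to mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Hypothetical encoder outputs for one input with a 2-D latent space.
mu = np.array([1.0, -2.0])
log_var = np.array([np.log(0.25), np.log(0.04)])  # variances 0.25 and 0.04

samples = np.stack([reparameterize(mu, log_var, rng) for _ in range(20000)])
print(samples.mean(axis=0))  # close to mu
print(samples.var(axis=0))   # close to [0.25, 0.04]

# Generating new data: sample z from the standard-normal prior and decode it
# (the decoder itself is not shown here).
z_new = rng.standard_normal((5, 2))
```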

In machine learning, there are two major divisions: generative modeling and discriminative modeling. Generative models aim to learn the joint distribution of all variables, which can shed light on how the data were generated and can express causal relations, making them intuitive and interpretable. Discriminative models, on the other hand, map inputs directly to predictions and are preferred for tasks focused solely on prediction. VAEs belong to the generative category and blend probabilistic graphical models with deep learning. They have also been compared with, and extended using, Generative Adversarial Networks (GANs) to exploit the two models' complementary properties.

Comparing AEs and VAEs

AEs and VAEs are powerful unsupervised learning models used for data representation, compression, and generation. While they share some similarities, they have distinct properties and applications. AEs, which consist of an encoder, a decoder, and an embedding space, aim to learn a compressed representation of the input data and reconstruct it with minimal loss. However, with a traditional AE it is unclear which latent points to decode in order to create new images, and the latent space can contain discontinuities. In contrast, VAEs map input images to distributions in the latent space, capturing uncertainty and variability. Their loss function adds a Kullback-Leibler (KL) divergence term to the reconstruction loss as a regularizer, which encourages a structured latent space and lets the model generate diverse, realistic samples.
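
When the encoder outputs a diagonal Gaussian and the prior is a standard normal, the KL term has a closed form. The sketch below assumes a Gaussian decoder with fixed variance, so the reconstruction term reduces (up to constants and scaling) to a squared error; the beta weight is an optional assumption borrowed from the beta-VAE variant, with beta=1 giving the standard objective.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)

def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Negative ELBO, averaged over the batch, up to constants and scaling:
    squared-error reconstruction term plus a beta-weighted KL regularizer."""
    recon = np.sum((x - x_hat) ** 2, axis=-1)
    return np.mean(recon + beta * kl_to_standard_normal(mu, log_var))

# A posterior shifted by 1 in one dimension, with unit variance, costs 0.5 nats.
mu = np.array([[1.0, 0.0]])
log_var = np.zeros((1, 2))
print(kl_to_standard_normal(mu, log_var))  # [0.5]
```

Raising beta strengthens the pull toward the prior at the expense of reconstruction quality, which is one concrete form of the reconstruction-versus-regularization balance that shapes image quality and disentanglement.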

Comparing VAEs with GANs

VAEs and GANs are both generative models. GANs excel at producing images that look high-quality to human observers, but they may not model the data density accurately under the likelihood criterion. VAEs, on the other hand, tend to generate blurrier images but model the density well under that criterion. VAEs are also generally more stable to train than GANs.

Variants of VAEs

VAE variants offer diverse improvements, modifying the prior, the posterior, the regularization, or the architecture.

Adversarial autoencoders (AAEs) combine autoencoders with the adversarial training of GANs: an adversarial network pushes the aggregated posterior (the distribution of all encoded training points) to match the prior, which strengthens regularization. The information-theoretic learning autoencoder (ITL-AE) replaces the KL divergence with alternative divergence measures estimated through kernel density estimation.
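
As an illustration of a kernel-estimated divergence of the kind ITL-AE uses, the sketch below computes a Parzen-window estimate of the Cauchy-Schwarz divergence between a batch of latent codes and samples from the prior. Choosing this particular divergence, the Gaussian kernel, and the bandwidth sigma is our assumption; the exact measures and settings in the ITL-AE work may differ.

```python
import numpy as np

def gaussian_gram(a, b, sigma):
    """Unnormalized Gaussian kernel matrix exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def cauchy_schwarz_divergence(x, y, sigma=1.0):
    """Parzen-window estimate of D_CS(p, q) = -log( (int pq)^2 / (int p^2 * int q^2) ).
    Convolving two Gaussian kernels of width sigma gives width sigma * sqrt(2);
    the kernel normalizing constants cancel in the ratio, so they are omitted."""
    s = np.sqrt(2.0) * sigma
    v_xy = gaussian_gram(x, y, s).mean()
    v_xx = gaussian_gram(x, x, s).mean()
    v_yy = gaussian_gram(y, y, s).mean()
    return np.log(v_xx) + np.log(v_yy) - 2.0 * np.log(v_xy)

rng = np.random.default_rng(0)
prior = rng.standard_normal((500, 2))            # samples from the N(0, I) prior
codes_close = rng.standard_normal((500, 2))      # latent codes that match the prior
codes_far = rng.standard_normal((500, 2)) + 3.0  # latent codes shifted away from it
print(cauchy_schwarz_divergence(codes_close, prior))  # small
print(cauchy_schwarz_divergence(codes_far, prior))    # much larger
```

Used as a regularizer, this term would take the place of the KL divergence in the loss, penalizing latent codes whose empirical distribution drifts away from the prior.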

Other variants include conditional variational autoencoders (CVAEs), variational recurrent autoencoders (VRAEs), variational deep embedding (VaDE), the Gaussian mixture variational autoencoder (GMVAE), the vector-quantized variational autoencoder (VQ-VAE), the Wasserstein autoencoder (WAE), and the 2-Stage VAE. These adaptations target applications such as clustering, generation, and improving image fidelity, contributing to the advancement of generative models.

Applications of VAEs

Generative modeling in machine learning allows algorithms to synthesize new data, such as audio, text, and images, by estimating the data density and sampling from it. VAEs, along with other deep generative models such as GANs, have revolutionized generative modeling. VAEs combine Bayesian variational inference with deep learning, which makes them comparatively stable to train and gives them an explicit estimate of the data distribution. Frameworks such as adversarial AEs and VAE-GAN combine the strengths of different models to address specific weaknesses.

VAEs find applications in various domains. For instance, in finance, they are used for synthetic volatility surface generation, and in bio-signal analysis, they aid in disease detection and data augmentation. The versatility of VAEs extends to insurance data analysis, where they can reconstruct complex insurance data and generate synthetic insurance policies, addressing privacy, bias, and data availability challenges in the industry.

In de novo drug design, VAEs convert molecular representations into continuous vectors for chemical space exploration, facilitating the generation of novel molecules with desired properties. Additionally, VAEs have been applied in cancer prediction through multi-omic and clinical data analysis, helping identify cancer biomarkers and predict patient survival.
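
One simple way to explore such a continuous chemical space is to interpolate between the latent vectors of two known molecules and decode the intermediate points. The sketch below shows only the linear interpolation step; the latent codes are hypothetical, and the decoder that would turn each point back into a molecular representation is not shown.

```python
import numpy as np

def interpolate_latents(z_start, z_end, n_steps=5):
    """Evenly spaced points on the straight line between two latent vectors.
    Decoding each point (decoder not shown) yields a path of candidate outputs."""
    t = np.linspace(0.0, 1.0, n_steps)[:, None]  # shape (n_steps, 1)
    return (1.0 - t) * z_start + t * z_end       # shape (n_steps, latent_dim)

z_a = np.array([0.0, 2.0, -1.0])  # hypothetical latent code of molecule A
z_b = np.array([1.0, 0.0, 1.0])   # hypothetical latent code of molecule B
path = interpolate_latents(z_a, z_b, n_steps=3)
print(path)  # rows: z_a, the midpoint [0.5, 1.0, 0.0], z_b
```

Because the VAE's KL regularization keeps the latent space smooth, intermediate points are more likely to decode to valid, related structures than they would be with a plain autoencoder.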

VAEs have been used to detect anomalies in Fourier transform infrared spectroscopy (FTIR) data from iron ore deposits in the Pilbara region of Australia. The VAE effectively separates anomalous spectra from typical ones, providing valuable insights into mineralogical differences. It can also serve as a pre-processing step that checks FTIR data quality before machine learning, supporting FTIR as a cost-effective and rapid mineralogical characterization technique for the geosciences.
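
The article does not say exactly how the spectra are scored. A common approach with VAEs is to flag samples whose reconstruction error is unusually high compared with errors on normal training data; the sketch below assumes that approach, with a percentile threshold and toy arrays that are purely illustrative.

```python
import numpy as np

def flag_anomalies(x, x_hat, train_errors, percentile=99.0):
    """Flag samples whose mean squared reconstruction error exceeds the given
    percentile of errors observed on (assumed normal) training data."""
    errors = np.mean((x - x_hat) ** 2, axis=1)
    threshold = np.percentile(train_errors, percentile)
    return errors > threshold, errors, threshold

train_errors = np.linspace(0.0, 1.0, 101)   # hypothetical errors on normal training spectra
x = np.array([[1.0, 1.0], [0.0, 0.0]])      # two new (toy) spectra
x_hat = np.array([[1.0, 1.0], [2.0, 2.0]])  # their hypothetical VAE reconstructions
flags, errors, threshold = flag_anomalies(x, x_hat, train_errors)
print(flags)  # the first spectrum is typical, the second is flagged
```

The percentile controls the trade-off between missed anomalies and false alarms; the negative ELBO is another score sometimes used in place of the plain reconstruction error.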

Speech enhancement: When denoising very noisy speech, a VAE using Mel-frequency cepstral coefficient (MFCC) features achieves a twofold improvement over one using short-time Fourier transform (STFT) features. An unsupervised approach that combines pre-trained deep speech priors with parametric noise models outperforms existing methods, and a newer approach based on Langevin dynamics and total variation regularization balances computational efficiency against enhancement quality, with promising results.
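
Langevin dynamics draws samples by repeatedly taking a small step along the gradient of the log-density (the score) and adding Gaussian noise. The sketch below is a generic unadjusted Langevin sampler on a toy standard-normal target, not the speech-enhancement method itself; in that setting the score would come from the trained speech prior and noise model, and the total variation regularization is not shown.

```python
import numpy as np

def langevin_sample(grad_log_p, x0, step_size, n_steps, rng):
    """Unadjusted Langevin algorithm:
    x <- x + (step_size / 2) * grad log p(x) + sqrt(step_size) * N(0, I)."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + 0.5 * step_size * grad_log_p(x) + np.sqrt(step_size) * noise
    return x

rng = np.random.default_rng(0)
# Toy target: standard normal, whose score is grad log p(x) = -x.
# 5000 independent chains all start far from the mode, at x = 3.
chains = langevin_sample(lambda x: -x, x0=np.full(5000, 3.0),
                         step_size=0.05, n_steps=500, rng=rng)
print(chains.mean(), chains.var())  # approximately 0 and 1
```

Smaller step sizes reduce the slight bias of the unadjusted update but need more iterations, which is one face of the efficiency-versus-quality balance mentioned above.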

Speech synthesis: A CVAE model can synthesize a selected speaker's speech in any desired target accent. Comprehensive text-to-speech experiments show that the CVAE model is effective and performs remarkably well at manipulating accent in synthesized speech.

Language modeling: Attention-Driven VAE (ADVAE) with Transformers demonstrates the ability to separate core syntactic roles in sentences, while the query key value autoencoder (QKVAE), based on ADVAE, disentangles syntactic and semantic information in neural representations. These models achieve competitive performance without relying on annotated data, making them valuable for language modeling.

Challenges and Future Scope of VAEs

Despite their success, VAEs face certain challenges and limitations. Variance loss and image blurriness arise because the decoder tends to average over plausible outputs, and achieving disentanglement, that is, learning meaningful factors of variation, remains difficult for vanilla VAEs. The balance between the reconstruction loss and the regularization term affects both image quality and disentanglement.

Concerns also arise from variational pruning, posterior collapse, and the origin gravity effect. Furthermore, sampling in high dimensions is inefficient due to the curse of dimensionality. To address these problems, researchers have explored various approaches, such as 2-Stage VAE and Gaussian mixture model (GMM)-based models, as well as Hamiltonian Monte Carlo for sampling in high dimensions.
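
Hamiltonian Monte Carlo (HMC) copes better with high dimensions than random-walk samplers because it uses gradients to make long, directed moves. The sketch below is a textbook HMC sampler with an identity mass matrix and a Metropolis correction, run on a toy 50-dimensional standard-normal target; the step size, trajectory length, and target are assumptions for illustration, and the 2-Stage VAE and GMM-based approaches are not shown.

```python
import numpy as np

def hmc_sample(log_p, grad_log_p, x0, n_samples, step_size, n_leapfrog, rng):
    """Hamiltonian Monte Carlo with identity mass matrix and Metropolis correction."""
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal(x.shape)  # resample the auxiliary momentum
        x_new, p_new = x.copy(), p.copy()
        # Leapfrog integration of Hamiltonian dynamics.
        p_new += 0.5 * step_size * grad_log_p(x_new)
        for i in range(n_leapfrog):
            x_new += step_size * p_new
            if i < n_leapfrog - 1:
                p_new += step_size * grad_log_p(x_new)
        p_new += 0.5 * step_size * grad_log_p(x_new)
        # Accept or reject using the change in total energy H = -log p(x) + |p|^2 / 2.
        h_old = -log_p(x) + 0.5 * np.dot(p, p)
        h_new = -log_p(x_new) + 0.5 * np.dot(p_new, p_new)
        if np.log(rng.uniform()) < h_old - h_new:
            x = x_new
        samples.append(x.copy())
    return np.array(samples)

rng = np.random.default_rng(1)
dim = 50  # a moderately high-dimensional standard-normal target
draws = hmc_sample(lambda x: -0.5 * np.dot(x, x), lambda x: -x,
                   x0=np.zeros(dim), n_samples=2000, step_size=0.2,
                   n_leapfrog=10, rng=rng)
print(draws.mean(), draws.var())  # approximately 0 and 1
```

In a VAE, log_p would be replaced by an unnormalized log-posterior over the latent variables, with gradients supplied by automatic differentiation.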

The low-dimensional manifold and tokenization in VAE models may limit their ability to handle complex graphics and produce clear images. To improve image quality, researchers suggest integrating other deep-learning techniques or transform-based methods such as wavelet transforms or Gabor filters. Combining VAE and GAN models can improve image generation efficiency and help mitigate mode collapse, enhancing output clarity.

In conclusion, VAEs are versatile generative models with widespread applications in diverse domains. They have been successful in generating data, reducing dimensionality, and recognizing patterns. However, addressing challenges and limitations will be crucial to further enhancing their performance on complex tasks and graphics.

Future research can focus on integrating other deep-learning techniques, exploring tokenization for simplification, and developing VAE-GAN hybrid models to maximize VAEs' strengths in various industrial applications and game design. By addressing these challenges, VAEs can continue to play a pivotal role in advancing the field of generative modeling.

References and Further Readings

Kingma, D. P., & Welling, M. (2019). An Introduction to Variational Autoencoders. Foundations and Trends in Machine Learning, 12(4), 307-392. http://dx.doi.org/10.1561/2200000056
Jamotton, C., & Hainaut, D. (2023). Variational autoencoder for synthetic insurance data. Dial.uclouvain.be. https://dial.uclouvain.be/pr/boreal/object/boreal:276128
Singh, A., & Ogunfunmi, T. (2021). An Overview of Variational Autoencoders for Source Separation, Finance, and Bio-Signal Applications. Entropy. https://doi.org/10.3390/e24010055
Mak, H. W. L., et al. (2023). Application of Variational AutoEncoder (VAE) Model and Image Processing Approaches in Game Design. Sensors, 23(7), 3457. https://doi.org/10.3390/s23073457
Koutroumpa, et al. (2023). A Systematic Review of Deep Learning Methodologies Used in the Drug Discovery Process with Emphasis on In Vivo Validation. International Journal of Molecular Sciences, 24(7), 6573. https://doi.org/10.3390/ijms24076573
Zemouri, R., et al. (2022). Recent Research and Applications in Variational Autoencoders for Industrial Prognosis and Health Management: A Survey. Prognostics and Health Management Conference (PHM-2022 London), London, United Kingdom, pp. 193-203. https://doi.org/10.1109/PHM2022-London52454.2022.00042
Simidjievski, N., et al. (2019). Variational Autoencoders for Cancer Data Integration: Design Principles and Computational Practice. Frontiers in Genetics. https://doi.org/10.3389/fgene.2019.01205

Last Updated: Jul 20, 2023

Written by

Dr. Sampath Lonka

Dr. Sampath Lonka is a scientific writer based in Bangalore, India, with a strong academic background in Mathematics and extensive experience in content writing. He has a Ph.D. in Mathematics from the University of Hyderabad and is deeply passionate about teaching, writing, and research. Sampath enjoys teaching Mathematics, Statistics, and AI to both undergraduate and postgraduate students. What sets him apart is his unique approach to teaching Mathematics through programming, making the subject more engaging and practical for students.


Citations

Please use one of the following formats to cite this article in your essay, paper or report:

APA
Lonka, Sampath. (2023, July 20). What are Variational Autoencoders?. AZoAi. Retrieved on July 11, 2025 from https://www.azoai.com/article/What-are-Variational-Autoencoders.aspx.
MLA
Lonka, Sampath. "What are Variational Autoencoders?". AZoAi. 11 July 2025. <https://www.azoai.com/article/What-are-Variational-Autoencoders.aspx>.
Chicago
Lonka, Sampath. "What are Variational Autoencoders?". AZoAi. https://www.azoai.com/article/What-are-Variational-Autoencoders.aspx. (accessed July 11, 2025).
Harvard
Lonka, Sampath. 2023. What are Variational Autoencoders?. AZoAi, viewed 11 July 2025, https://www.azoai.com/article/What-are-Variational-Autoencoders.aspx.