Deep neural networks (DNNs) are a class of machine learning (ML) algorithms related to the artificial neural network (ANN), which mimics the structure and information-processing principles of the brain. DNNs leverage cascades of nonlinear operations to achieve superior representational power. This article discusses different DNN architectures and their applications.
DNN Architectures and Applications
Restricted Boltzmann Machine (RBM): RBMs are a variant of the Boltzmann machine (BM) used to build stochastic ANN models that can learn the probability distribution of their inputs. BMs can be interpreted as neural networks (NNs) with bidirectionally connected stochastic processing units.
Because learning aspects of an unknown probability distribution with a fully connected BM is extremely difficult, RBMs restrict the connections between units, which simplifies the network topology and improves model efficiency. RBMs excel in core ML tasks such as classification, feature learning, collaborative filtering, dimensionality reduction, and topic modeling.
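To make the structure concrete, the following minimal sketch trains a binary RBM with one-step contrastive divergence (CD-1) in NumPy; the layer sizes, learning rate, and toy data are illustrative assumptions rather than values from the works discussed here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sizes: 6 visible units, 3 hidden units.
n_visible, n_hidden = 6, 3
W = rng.normal(0, 0.1, size=(n_visible, n_hidden))  # symmetric weights
b = np.zeros(n_visible)                              # visible biases
c = np.zeros(n_hidden)                               # hidden biases

# Toy binary training data (each row is one visible vector).
data = rng.integers(0, 2, size=(100, n_visible)).astype(float)

lr = 0.1
for epoch in range(50):
    for v0 in data:
        # Positive phase: sample hidden units given the visible data.
        p_h0 = sigmoid(c + v0 @ W)
        h0 = (rng.random(n_hidden) < p_h0).astype(float)
        # Negative phase (CD-1): one Gibbs step to reconstruct.
        p_v1 = sigmoid(b + h0 @ W.T)
        v1 = (rng.random(n_visible) < p_v1).astype(float)
        p_h1 = sigmoid(c + v1 @ W)
        # Gradient approximation: data statistics minus model statistics.
        W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
        b += lr * (v0 - v1)
        c += lr * (p_h0 - p_h1)
```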
Additionally, RBMs have been employed as both generative and discriminative models. The hybrid discriminative restricted Boltzmann machine is effective for online learning with large datasets owing to its combined advantages of discriminative and generative learning. Conditional restricted Boltzmann machines serve as non-linear generative models, specifically for high-dimensional time series.
A self-contained discriminative restricted Boltzmann machine was recently developed based on an innovative discriminative learning algorithm. In this method, the output for any class vector and input is computed from the RBM’s negative free energy. Moreover, the free energy can be scaled by a constant based on the network size to improve the robustness of the function approximation in self-contained discriminative restricted Boltzmann machines.
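As an illustration of the free-energy idea (a generic sketch, not the exact algorithm of the cited study), the snippet below scores each candidate class by clamping a one-hot label together with the input onto the visible layer and computing the RBM's negative free energy; the parameter sizes are hypothetical and the weights are untrained.

```python
import numpy as np

def free_energy(v, W, b, c):
    """Free energy of a binary RBM: F(v) = -b.v - sum_j log(1 + exp(c_j + v.W_j))."""
    return -v @ b - np.sum(np.logaddexp(0.0, c + v @ W))

def predict(x, n_classes, W, b, c):
    """Score each class by the negative free energy of [x, one-hot(y)]."""
    scores = []
    for y in range(n_classes):
        onehot = np.zeros(n_classes)
        onehot[y] = 1.0
        v = np.concatenate([x, onehot])
        scores.append(-free_energy(v, W, b, c))  # higher score is better
    return int(np.argmax(scores))

# Hypothetical sizes and random (untrained) parameters, for illustration only.
rng = np.random.default_rng(1)
n_features, n_classes, n_hidden = 8, 3, 5
W = rng.normal(0, 0.1, size=(n_features + n_classes, n_hidden))
b = np.zeros(n_features + n_classes)
c = np.zeros(n_hidden)

x = rng.integers(0, 2, size=n_features).astype(float)
print(predict(x, n_classes, W, b, c))
```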
Deep Belief Network (DBN): DBNs stack multiple RBMs with latent layers, using stochastic units to learn complex features. In a DBN, every two adjacent layers form an RBM, with each RBM’s visible layer connected to the previous RBM’s hidden layer, while the top two layers are connected by undirected weights.
DBNs are a special form of Bayesian probabilistic generative model and are more effective than conventional ANNs on problems involving unlabeled data. Additionally, the stacked model raises the variational lower bound on the log-likelihood relative to a single RBM, which implies stronger learning ability.
The DBN training process involves two steps: an unsupervised, bottom-up, layer-by-layer pre-training, followed by a supervised up-down fine-tuning. This allows DBNs to process unlabeled data effectively and helps mitigate both underfitting and overfitting.
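A minimal sketch of the first, unsupervised step is shown below, assuming a CD-1 trainer like the one sketched earlier and arbitrary layer widths; the supervised fine-tuning step is only indicated in a comment.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.1, epochs=20):
    """Minimal CD-1 trainer; returns the weights and hidden biases."""
    n_visible = data.shape[1]
    W = rng.normal(0, 0.1, size=(n_visible, n_hidden))
    b, c = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in data:
            p_h0 = sigmoid(c + v0 @ W)
            v1 = sigmoid(b + p_h0 @ W.T)
            p_h1 = sigmoid(c + v1 @ W)
            W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
            b += lr * (v0 - v1)
            c += lr * (p_h0 - p_h1)
    return W, c

# Step 1: unsupervised, bottom-up, layer-by-layer pre-training.
layer_sizes = [16, 8, 4]            # illustrative hidden-layer widths
data = rng.random((200, 32))        # toy inputs in [0, 1]
stack, activations = [], data
for n_hidden in layer_sizes:
    W, c = train_rbm(activations, n_hidden)
    stack.append((W, c))
    activations = sigmoid(c + activations @ W)  # feed hidden probabilities upward

# Step 2 (not shown): supervised up-down fine-tuning of the stacked weights,
# e.g. by attaching a classifier layer and running backpropagation.
```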
A top-level model for DBNs was introduced in a study for a three-dimensional (3D) object recognition task. A third-order BM was used as the top-level model and trained with a hybrid algorithm that combines discriminative and generative gradients. Similarly, another method learns a good covariance kernel for a Gaussian process using a DBN and unlabeled data. Deep convex networks (DCNs), composed of stacked, layered modules, can overcome the limitations of DBNs in learning scalability.
In DCNs, learning is batch-mode based, which enables parallel training. Moreover, DCN performance can be further enhanced by a fine-tuning process that exploits the network structure. A convolutional DBN (CDBN) model has also been proposed to increase the flexibility of DBNs.
Autoencoder (AE): AEs are unsupervised neural networks that excel in dimensionality reduction through efficient data encoding. Recently, AEs have also been used to learn generative data models. An AE first condenses the input data into an abstract "code" and then reconstructs the original input from this code. A major advantage is that AEs can continuously extract crucial features during propagation while filtering out irrelevant information.
Additionally, the efficiency of the learning process can be improved using AEs, as the input vector is transformed into a lower-dimensional representation during encoding. Denoising AEs (DAEs) extend conventional AEs by intentionally adding noise to the training data and training the network on these corrupted inputs. Because the DAE learns to recover the noise-free version of the training data, it gains improved robustness. Several standard optimization methods can be used to train DAEs.
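A minimal sketch of the DAE idea follows, with toy sizes and a Gaussian corruption level chosen purely for illustration: the inputs are corrupted with noise, while the reconstruction target remains the clean data.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Toy data and illustrative sizes (not from the cited studies).
X = rng.random((500, 20))            # clean inputs
n_in, n_code = 20, 8                 # the code is lower-dimensional than the input
W1 = rng.normal(0, 0.1, (n_in, n_code)); b1 = np.zeros(n_code)
W2 = rng.normal(0, 0.1, (n_code, n_in)); b2 = np.zeros(n_in)

lr, noise_std = 0.05, 0.2
for epoch in range(200):
    # Corrupt the inputs, but reconstruct the *clean* targets.
    X_noisy = X + rng.normal(0, noise_std, X.shape)
    H = sigmoid(X_noisy @ W1 + b1)          # encode
    X_hat = H @ W2 + b2                     # decode (linear output)
    err = X_hat - X                         # squared-error gradient
    # Backpropagation through the two layers.
    dW2 = H.T @ err / len(X); db2 = err.mean(axis=0)
    dH = err @ W2.T * H * (1 - H)
    dW1 = X_noisy.T @ dH / len(X); db1 = dH.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```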
Denoising is crucial in computer vision because digital images are corrupted by noise during acquisition and transmission. Effective noise removal can be achieved using stacked sparse denoising autoencoders. Moreover, a k-sparse AE has been proposed that performs better than the RBM and DAE. The k-sparse AE retains the basic standard AE architecture but keeps only the k highest activations within the hidden layers.
The k-sparse AEs are easy to train, and their encoding process ensures satisfactory performance on large-scale problems. In another study, contractive AEs (CAEs) were proposed, in which a well-selected penalty term is added to the standard reconstruction cost function. This penalty term penalizes the sensitivity of the learned features to the inputs. CAEs demonstrate similar or better performance compared to regularized AEs such as DAEs.
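The defining operation of the k-sparse AE is the support selection applied to the hidden activations after the linear encoding step; a small sketch on hypothetical activations is shown below (during training, gradients flow only through the retained units).

```python
import numpy as np

def k_sparse(h, k):
    """Keep only the k largest activations per sample; zero out the rest."""
    out = np.zeros_like(h)
    idx = np.argsort(h, axis=1)[:, -k:]            # indices of the top-k units
    rows = np.arange(h.shape[0])[:, None]
    out[rows, idx] = h[rows, idx]
    return out

# Illustrative example: 2 samples, 6 hidden units, k = 2.
h = np.array([[0.1, 0.9, 0.3, 0.7, 0.2, 0.0],
              [0.5, 0.4, 0.8, 0.1, 0.6, 0.2]])
print(k_sparse(h, k=2))
# Only the two strongest activations per row survive; the decoder then
# reconstructs the input from this sparse code.
```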
Deep Convolutional Neural Networks (CNNs): CNNs are a discriminative deep architecture subtype that performs well on two-dimensional (2D) data with a grid-like topology, such as images and videos. The CNN concept is inspired by time-delay neural networks (TDNNs), in which weights are shared along the temporal dimension to reduce computation.
In CNNs, convolution replaces the general matrix multiplication of standard NNs, which reduces the number of weights and the network complexity. Additionally, images can be fed directly into the network as raw inputs, avoiding the feature extraction procedure required by standard learning algorithms.
This multi-layer NN contains two types of layers: convolution layers (c-layers) and sub-sampling layers (s-layers). C-layers and s-layers alternate and form the middle part of the network. Sparse interaction, parameter sharing, and equivariant representation are crucial to a CNN’s learning process. A model designated recursive convolutional networks (RCNs) was proposed in a study, with an architecture similar to a CNN but with the same number of feature maps in every layer and filter weights tied across layers.
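A minimal sketch of this alternating c-layer/s-layer structure is given below, assuming PyTorch is available and using an illustrative LeNet-style configuration for a 28x28 grayscale input; none of the sizes come from the cited studies.

```python
import torch
import torch.nn as nn

# Alternating convolution (c) and sub-sampling (s) layers, illustrative sizes.
model = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),   # c-layer: 1 -> 6 feature maps (28 -> 24)
    nn.ReLU(),
    nn.AvgPool2d(2),                  # s-layer: 24 -> 12
    nn.Conv2d(6, 16, kernel_size=5),  # c-layer: 6 -> 16 feature maps (12 -> 8)
    nn.ReLU(),
    nn.AvgPool2d(2),                  # s-layer: 8 -> 4
    nn.Flatten(),
    nn.Linear(16 * 4 * 4, 10),        # classifier head
)

x = torch.randn(1, 1, 28, 28)         # dummy image batch
print(model(x).shape)                 # torch.Size([1, 10])

# Parameter sharing at work: the first c-layer has only 6*(1*5*5) + 6 = 156
# weights, versus 28*28*24*24*6 if every output unit had its own dense weights.
```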
In another study, a model was proposed that combines convolution with an AE for feature extraction in fields such as object recognition. Predictive sparse decomposition, an unsupervised feature-learning method based on the AE architecture, can be used with sparsity constraints on the feature vector.
The feature extraction stage primarily involves a filter bank, a non-linear transformation, and a feature pooling layer. An advanced stacked convolutional AE can be employed for unsupervised feature learning. In the training process, every convolutional AE uses the conventional gradient descent algorithm without additional regularization terms. Studies demonstrated that the stacked convolutional AE can produce good CNN initializations by avoiding poor local minima of the highly non-convex objective function.
CNNs also excel in computer vision applications. In vision tasks, convolutional restricted Boltzmann machines (CRBMs) can achieve a higher convergence rate with a smaller negative likelihood function value than standard RBMs. Additionally, CDBNs have been used for scalable unsupervised learning of hierarchical representations and for unsupervised feature learning in audio classification. An advanced CNN algorithm has also been developed for speech recognition by replacing the mel-filter bank with an additional filter bank layer.
Conclusion and Future Outlook
DNNs are becoming increasingly crucial for many applications in both academia and industry. Most DNNs allow the trade-off between computational complexity and accuracy to be adjusted flexibly. However, interpretability is essential when ML algorithms, especially deep learning models, are used in practical applications.
Interpretability and explanation methods can help in better understanding the problem-solving strategies and abilities of nonlinear ML models such as DNNs. Different classes of explainable AI (XAI) methods, such as layer-wise relevance propagation (LRP), gradient-based techniques, occlusion analysis, and interpretable local surrogates, can be utilized in the context of DNNs.
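As one concrete example from the gradient-based class of methods (a simple input-gradient saliency map, not LRP, which requires layer-specific propagation rules), the sketch below uses a placeholder PyTorch model and input:

```python
import torch
import torch.nn as nn

# Placeholder classifier and input; any differentiable model would serve.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
x = torch.randn(1, 10, requires_grad=True)

scores = model(x)                     # class scores for one sample
pred = scores.argmax(dim=1).item()    # predicted class index
scores[0, pred].backward()            # gradient of that score w.r.t. the input
saliency = x.grad.abs().squeeze()     # per-feature sensitivity estimate
print(saliency)
```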
Additionally, more research is required to design deep models that can learn from fewer training data, use optimization algorithms for network parameter adjustments, effectively analyze the DNN stability, and apply semi-supervised, unsupervised, and reinforcement-learning approaches to DNNs for complex systems.
References and Further Reading
Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y., & Alsaadi, F. E. (2017). A survey of deep neural network architectures and their applications. Neurocomputing, 234, 11-26. https://doi.org/10.1016/j.neucom.2016.12.038
Deep Neural Network. [Online] Available at: https://www.sciencedirect.com/topics/computer-science/deep-neural-network (Accessed on 25 December 2023)
Samek, W., Montavon, G., Lapuschkin, S., Anders, C. J., & Müller, K.-R. (2021). Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications. Proceedings of the IEEE, 109(3), 247-278. https://doi.org/10.1109/JPROC.2021.3060483