In an article published in the journal Scientific Reports, researchers from China proposed a novel method for constructing deep neural networks using fragmented images and ensemble learning algorithms. They reported that the technique could offer an accessible and efficient route to practical applications by reducing the technical difficulty and hardware requirements of deep learning. They also tested the method on two image classification datasets, achieving accuracy comparable to, or even better than, full models trained on complete data.
Background
A deep neural network learns and makes decisions by mimicking the way the human brain works. It uses layers of interconnected nodes to process information, identify patterns, and perform tasks such as image recognition or language understanding. Deep neural networks have evolved rapidly in engineering applications, with models becoming deeper and more specialized. However, developing large models is costly and risky for small and medium-sized enterprises, and most businesses do not need high-level artificial intelligence (AI).
Ensemble learning is an effective technique that combines multiple base learners to obtain a stronger learner. It can improve the accuracy and robustness of models, as well as the diversity and generalization of the learners. However, many ensemble methods prioritize test accuracy, designing elaborate ensemble architectures without regard for the enormous computation and design costs. Such performance-oriented ensembles are poorly suited to practical business applications.
About the Research
In the present paper, the authors presented a fragmented neural network inspired by the random forest machine learning model. Random forest is a classical ensemble learning algorithm that builds multiple decision trees in parallel based on feature sampling. Its advantages include high accuracy, low complexity, easy parallelism, and a natural fit for distributed computing.
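For context, the short scikit-learn snippet below illustrates those two ingredients, feature sampling and parallel tree construction. It is an independent illustration, not code from the study.

```python
# Illustration of random forest's feature sampling and parallel training
# with scikit-learn (for context only; not part of the study's code).
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_features="sqrt": each split considers only a random feature subset;
# n_jobs=-1: trees are trained in parallel across all CPU cores.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", n_jobs=-1)
forest.fit(X_train, y_train)
print(f"test accuracy: {forest.score(X_test, y_test):.3f}")
```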
The designed model applies the same sampling strategy, randomly sampling both instances and features from the image data. Images were split into small fragments, and weak neural networks were trained on these fragments. Many weak neural networks were then combined into a strong neural network using a voting technique. In this way, sufficient accuracy is achieved while reducing the complexity and the data volume needed for each base learner.
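A minimal sketch of this fragment-and-vote idea is shown below, using scikit-learn's small digits dataset and hypothetical parameter values (window size, sample size, learner count). It illustrates the general approach rather than the authors' implementation.

```python
# Minimal sketch of the fragment-and-vote approach (not the authors' code).
# Each weak learner sees only one randomly placed window cropped from a
# random subset of training images; predictions are merged by soft voting.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()                          # 1,797 8x8 grayscale digits
X, y = digits.images, digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
WINDOW = 5                                      # sampling window size (hypothetical)
SAMPLE = len(X_train) // 2                      # sample size per learner (hypothetical)
N_LEARNERS = 10                                 # number of weak base learners

learners = []
for _ in range(N_LEARNERS):
    # Feature sampling: a random WINDOW x WINDOW crop position, shared by
    # every image this learner sees. Instance sampling: a random image subset.
    r = rng.integers(0, X.shape[1] - WINDOW + 1)
    c = rng.integers(0, X.shape[2] - WINDOW + 1)
    idx = rng.choice(len(X_train), size=SAMPLE, replace=False)
    frags = X_train[idx][:, r:r + WINDOW, c:c + WINDOW].reshape(SAMPLE, -1)
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    clf.fit(frags, y_train[idx])
    learners.append((clf, r, c))

# Unweighted voting: average class probabilities across the weak learners.
probs = np.mean(
    [clf.predict_proba(X_test[:, r:r + WINDOW, c:c + WINDOW]
                       .reshape(len(X_test), -1))
     for clf, r, c in learners],
    axis=0,
)
print(f"ensemble accuracy: {(probs.argmax(axis=1) == y_test).mean():.3f}")
```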
The authors conducted experiments on two classic image classification datasets: the Modified National Institute of Standards and Technology (MNIST) dataset and the Canadian Institute for Advanced Research 10 (CIFAR-10) dataset. MNIST comprises 70,000 images of handwritten digits from 0 to 9, whereas CIFAR-10 contains 60,000 images spanning 10 classes of objects, such as airplanes, cars, and animals.
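The paper's framework choice is not specified here; as one common way to obtain these benchmarks, the snippet below downloads both with torchvision.

```python
# One common way to fetch the two benchmarks (framework choice is an
# assumption, not specified by the paper). Note the standard splits:
# MNIST has 60,000 train + 10,000 test images; CIFAR-10 has 50,000 + 10,000.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()
mnist = datasets.MNIST(root="data", train=True, download=True, transform=to_tensor)
cifar = datasets.CIFAR10(root="data", train=True, download=True, transform=to_tensor)

print(len(mnist), mnist[0][0].shape)   # 60000, 1x28x28 grayscale
print(len(cifar), cifar[0][0].shape)   # 50000, 3x32x32 color
```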
The study used feedforward neural networks (FNNs), convolutional neural networks (CNNs), and deep residual networks (ResNets) as base learners. For each type of network, 20 to 30 base learners were trained with different sampling window sizes and sample sizes. The sampling window size determines how much of each original image a fragment retains, and the sample size is the number of images used to train each base learner. Four strategies were used to combine the base learners' predictions: unweighted voting, majority voting, winner-takes-all-based unweighted voting, and winner-takes-all-based majority voting.
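The snippet below sketches voting rules over a stack of per-learner class-probability vectors. Unweighted (soft) and majority (hard) voting are standard; the winner-takes-all behavior shown follows one plausible reading and may not match the paper's exact winner-takes-all definitions, which are not reproduced here.

```python
# Sketch of voting rules over per-learner predictions. Assumed shape:
# probs is (n_learners, n_samples, n_classes). The winner-takes-all (WTA)
# function follows one plausible reading -- the single most confident
# learner decides -- and may differ from the paper's exact definitions.
import numpy as np

def unweighted_vote(probs: np.ndarray) -> np.ndarray:
    """Soft voting: average the class probabilities, then take the argmax."""
    return probs.mean(axis=0).argmax(axis=1)

def majority_vote(probs: np.ndarray) -> np.ndarray:
    """Hard voting: each learner casts one vote for its top class."""
    votes = np.eye(probs.shape[2])[probs.argmax(axis=2)]  # one-hot votes
    return votes.sum(axis=0).argmax(axis=1)

def winner_takes_all(probs: np.ndarray) -> np.ndarray:
    """Assumed WTA reading: follow the single most confident learner."""
    confidence = probs.max(axis=2)            # (n_learners, n_samples)
    best = confidence.argmax(axis=0)          # winning learner per sample
    return probs[best, np.arange(probs.shape[1])].argmax(axis=1)

# Example: 3 learners, 2 samples, 4 classes.
probs = np.random.default_rng(0).dirichlet(np.ones(4), size=(3, 2))
print(unweighted_vote(probs), majority_vote(probs), winner_takes_all(probs))
```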
Research Findings
The outcomes showed that the newly designed method can effectively improve the accuracy and stability of the ensemble model relative to its individual base learners. The accuracy of the combined model depends strongly on the base learners, and complex base learners hold no significant advantage over simple ones. The study highlighted that the new method achieved accuracy comparable to, or even better than, full models trained on complete images. The presented model is more sensitive to the sampling window size than to the sample size or the number of base learners, and the optimal window size varies with the type of neural network.
The authors also analyzed how parameters such as the sample size, the sampling window size, and the number of base learners affect the accuracy and time consumption of the ensemble model. They highlighted that increasing the sample size and the number of base learners improves accuracy but also raises time and cost. Increasing the sampling window size improves the accuracy of the CNN and ResNet models but not that of the FNN model. Moreover, they found that unweighted voting delivers the highest average accuracy of the four voting methods but also requires more time than the others.
The proposed method has potential applications in various fields that require image classification, such as face recognition, object detection, medical diagnosis, and remote sensing. It lends itself to parallel and distributed computing, improving the computational speed and scalability of the models. Moreover, it can enhance the robustness and generalizability of the models, enabling them to handle noise and uncertainty better.
Conclusion
In summary, the developed method offers an efficient, effective, robust, and stable way to build deep neural networks from fragmented images and ensemble learning. It works on the principle of random forest and aims to reduce the technical difficulty and hardware needed for deep learning. Tested on the MNIST and CIFAR-10 image classification datasets, it achieved accuracy comparable to, or even better than, full models trained on complete data.
The researchers acknowledged limitations and challenges and suggested directions for further research. They recommended developing a control device for dynamic model pools to facilitate operations within the pool. Moreover, they advised applying the proposed method in federated learning scenarios, where each data node trains a local model and transfers it to a central node for ensemble decision-making.