In an article recently published in the Journal of Imaging, researchers reviewed the recent developments and improvements in image processing using reinforcement learning (RL) and deep learning (DL).
Background
Image processing has gained significant attention from the industry and scientific community due to its extensive applicability. Artificial intelligence (AI), specifically machine learning (ML), has transformed several areas of technology, including image processing.
In image processing, the use of ML techniques can lead to several advantages, including the optimization of processes and improved data analysis accuracy. Moreover, the advances in the application of neural networks have enabled the identification of objects, recognition of patterns, and the execution of complex analyses on images with high accuracy.
In this paper, the authors reviewed the recent successes and advances of ML in digital image processing. They considered a large number of scientific publications that involved image-processing methods using ML, specifically state-of-the-art RL and DL techniques, which were applied to solve real-world problems.
University repositories and reputable journals were considered for this review to ensure the reliability of the sources. The authors focused on studies from the last five years and selected those that provided interesting and/or novel applications of ML in image processing.
Image processing developments using DL
A long short-term memory (LSTM) recurrent network was one of the first DL models used to predict future images from a sequence of images encoded during the processing of video data.
In a recent study, a correction neural network model, designated as boundary regulated network (BR-Net), was developed. In the model, remote high-resolution satellite images were used as the source, and the image features were extracted through pooling, classification, and convolution.
The model was also trained on an experimental dataset from a specific area to increase its accuracy. The results indicated a 15% improvement in model performance and a 20% increase in recognition speed compared to other recently proposed models. However, BR-Net generalized poorly to very large datasets.
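The convolution and pooling steps mentioned for BR-Net can be illustrated with a minimal sketch. This is not the BR-Net architecture itself: the toy 5x5 tile, the hand-written edge kernel, and the function names conv2d_valid and max_pool2d are assumptions chosen only to show how a convolution highlights a boundary and how pooling then condenses the response.

```python
from typing import List

Image = List[List[float]]


def conv2d_valid(image: Image, kernel: Image) -> Image:
    """Valid (no padding) 2D cross-correlation, the operation CNN layers apply."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature: Image, size: int = 2) -> Image:
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(feature) // size
    out_w = len(feature[0]) // size
    return [
        [
            max(feature[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy 5x5 tile (assumption): dark left side, bright right side, i.e. a vertical boundary.
tile = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
# Hypothetical horizontal-gradient kernel that responds to vertical edges.
edge_kernel = [[-1.0, 1.0]]

edges = conv2d_valid(tile, edge_kernel)  # 5 rows x 4 columns
pooled = max_pool2d(edges, size=2)       # 2 rows x 2 columns
```

In a trained network the kernel weights are learned rather than hand-written, and many such kernels are stacked before a classification stage.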
Another study investigated whether a convolutional neural network (CNN) model could learn safe driving maneuvers from data collected by a front-facing camera. The data were collected by an experienced driver on urban routes. Researchers developed a 17-layer behavior cloning CNN model and added four dropout layers to avoid overfitting during training.
The results were satisfactory, as a small amount of training data obtained from a few tracks was adequate for training the car to drive safely on several tracks. However, the approach may require many more tracks to generalize correctly in actual street deployment, which is a major limitation.
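The role of the dropout layers mentioned above can be shown with a short sketch of inverted dropout, the variant commonly used in DL frameworks. The function name dropout, the 0.5 rate, and the toy activations are assumptions for illustration; the source does not state the rates or layer placement used in the 17-layer model.

```python
import random
from typing import List, Optional


def dropout(activations: List[float], rate: float, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: zero each unit with probability `rate` during training
    and rescale the survivors so the expected activation is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At inference time the layer passes values through untouched.
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


features = [1.0] * 8  # toy activations (assumption)
dropped = dropout(features, rate=0.5, rng=random.Random(0))
unchanged = dropout(features, rate=0.5, training=False)
```

Because a different random subset of units is silenced at each training step, the network cannot rely on any single feature, which is why dropout is used against overfitting on small datasets such as a few driving tracks.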
One study proposed a mechatronics platform for real-time and static posture analysis. The platform consisted of three complex components: a software module for semi-automatic image analysis and data collection, a network to provide raw/input data to the DL server, and a mechanical structure equipped with cameras.
The results demonstrated that the easy-to-use and inexpensive device can enable non-invasive postural assessment with great stability. Thus, the device can be a suitable tool for patient rehabilitation.
In optical microscopy, studies have shown that DL-based image processing can improve the resolution of images in smartphone-based microscopy, which is crucial for the development of healthcare solutions in remote regions. DL can also be used to monitor protein localization and gene expression in organisms, which indicates the feasibility of using DL networks for medical image processing.
Image processing developments using RL
In a recent study, researchers combined deep RL (DRL) methods with conceptual embedding techniques to explore suitable healthcare strategies for simulated human bodies. A deep neural network (DNN) architecture was also used to recreate the input-output characteristic transformation function of the human body. The findings showed that the proposed framework can be effectively applied to a dynamic, high-dimensional human body system.
In another study, an RL module that leverages augmented data was proposed to overcome the challenges of data efficiency and generalization to unknown environments in computer vision. The module can be incorporated into common RL systems to enhance their overall performance.
Additionally, data augmentation can increase the data efficiency of RL techniques that operate from pixels without significant changes to the RL algorithm. This approach can therefore make DRL more suitable for solving real-world problems.
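As an illustration of how an augmentation can be applied to pixel observations without touching the RL algorithm, the sketch below implements a random shift: the frame is padded by edge replication and a window of the original size is cropped at a random offset. Random shift is a common choice for pixel-based RL, but the source does not say which augmentations the reviewed study used; the function name random_shift, the pad parameter, and the toy frame are assumptions.

```python
import random
from typing import List, Optional

Frame = List[List[int]]


def random_shift(frame: Frame, pad: int,
                 rng: Optional[random.Random] = None) -> Frame:
    """Shift a frame by up to `pad` pixels in each direction.

    Equivalent to replicate-padding the frame by `pad` pixels on every side
    and cropping a random window with the original height and width.
    """
    rng = rng or random.Random()
    h, w = len(frame), len(frame[0])
    dy = rng.randint(0, 2 * pad)  # vertical offset of the crop window
    dx = rng.randint(0, 2 * pad)  # horizontal offset of the crop window

    def clamp(v: int, hi: int) -> int:
        # Out-of-range indices reuse the nearest edge pixel (edge replication).
        return max(0, min(v, hi))

    return [
        [frame[clamp(r + dy - pad, h - 1)][clamp(c + dx - pad, w - 1)]
         for c in range(w)]
        for r in range(h)
    ]


# Toy 4x4 observation (assumption); a real agent would see camera or game frames.
observation = [[r * 4 + c for c in range(4)] for r in range(4)]
augmented = random_shift(observation, pad=1, rng=random.Random(0))
```

The augmented frame is fed to the agent in place of the raw observation, so the learning update itself needs no modification.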
In another study, researchers developed a stereo image-matching system based on parallax estimation and rule constraints. The edge pixel constraint rules were established first, and the image blocks were adjusted. Researchers then performed the image parallax estimation and used a CNN to execute a DRL analysis iteratively.
The results demonstrated that the proposed algorithm converged quickly with more than 95% accuracy. However, the matching targets were not defined clearly, specifically for small objects with curved surfaces, which can hinder the algorithm's applicability in real-world scenarios.
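Parallax (disparity) estimation, the quantity the stereo-matching system above has to recover, can be illustrated with classic block matching on a single scanline. This is a plainly different and much simpler technique than the CNN- and DRL-based method in the reviewed study; the function name scanline_disparity, the sum-of-absolute-differences cost, and the toy intensity rows are assumptions.

```python
from typing import List


def scanline_disparity(left: List[float], right: List[float],
                       block: int, max_disp: int) -> List[int]:
    """For each pixel in the left scanline, find the shift d (0..max_disp)
    whose block in the right scanline has the lowest sum of absolute differences."""
    if block % 2 == 0:
        raise ValueError("block must be odd so it has a center pixel")
    half = block // 2
    n = len(left)
    disparities = [0] * n  # border pixels without a full block stay at 0
    for x in range(half, n - half):
        ref = left[x - half: x + half + 1]
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            xr = x - d  # a point at x in the left image appears at x - d in the right
            if xr - half < 0:
                break
            cand = right[xr - half: xr + half + 1]
            cost = sum(abs(a - b) for a, b in zip(ref, cand))
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparities[x] = best_d
    return disparities


# Toy scanlines (assumption): the same intensity peak, 2 pixels further right in the left view.
right_row = [0.0, 0.0, 5.0, 9.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0]
left_row = [0.0, 0.0, 0.0, 0.0, 5.0, 9.0, 5.0, 0.0, 0.0, 0.0]
disparity = scanline_disparity(left_row, right_row, block=3, max_disp=3)
```

Around the peak the recovered disparity is 2, while in flat, textureless regions every shift matches equally well. Ambiguity of this kind is one reason matching small or smoothly curved objects remains difficult.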
Conclusion
Although several studies have shown developments in image processing using RL and DL techniques, more research is required to overcome the major limitations of ML in design and operational efficiency. For instance, most ML algorithms developed so far are trained to perform only one task or solve one problem, which makes them difficult to apply to other problems.
Additionally, training and running DL models requires a significant amount of data and computational resources, which can be impractical or infeasible in several applications. Finally, DL models are extremely difficult to interpret due to their opacity and complexity, which makes it harder to understand their outcomes and internal functioning.
Journal reference:
- Valente, J., António, J., Mora, C., Jardim, S. (2023). Developments in Image Processing Using Deep Learning and Reinforcement Learning. Journal of Imaging, 9(10), 207. https://doi.org/10.3390/jimaging9100207, https://www.mdpi.com/2313-433X/9/10/207