In a recent publication in Sensors, researchers proposed a multichannel deep convolutional neural network (MDCNN) combined with a Visual Geometry Group (VGG) model for underwater image enhancement.
Background
Given the growing demand for oceanic knowledge and situational awareness, there is a pressing need for high-quality underwater images across a range of marine applications. These applications rely on underwater Internet of Things (IoT) sensor networks to gather data on underwater life and seabed conditions. Supported by advances in big data, cloud computing, and IoT technologies, these data help build an up-to-date understanding of the ocean's state.
In recent years, many nations have actively developed underwater detection applications for tasks such as marine environmental monitoring, submarine surveying, underwater archaeology, debris collection, and underwater rescue. These applications require real-time image or video interpretation to guide underwater robots effectively.
Image enhancement has received significant attention in computer vision and signal processing. Earlier methods relied on filters to improve contrast and brightness, while deep learning has since transformed the field. Deep convolutional neural network (CNN)-based models excel at color adjustment and de-cluttering. For underwater images, Shallow-Uwnet and a variant of the residual network (UResnet) have been proposed.
Generative adversarial network (GAN)-based style transfer has also proven successful. Physics-based methods estimate light transmission and recover pixel intensities, while nonphysical methods directly modify pixel values for enhancement. Deep learning is data-hungry but feature-rich, physics-based methods depend on their imaging models, and nonphysical methods offer flexibility but may introduce artifacts.
However, these methods often fall short of balancing contrast, brightness, and color in underwater images, warranting further research. The current study underscores the importance of addressing these issues to enhance underwater image effectiveness.
Architecture
The researchers introduce the MDCNN-VGG, designed to fully exploit the information distribution of diverse underwater domains while improving domain adaptability. Extensive qualitative and quantitative experiments validate the MDCNN-VGG's superior performance in underwater image quality improvement compared to the baseline models.
The MDCNN-VGG architecture pairs an MDCNN with a VGG-16 model, leveraging neural network classifiers and a VGG perceptual loss. The MDCNN comprises multiple parallel deep convolutional neural networks (DCNNs), each built from several stacked CNN layers. These DCNNs follow the design of traditional CNNs, employing convolutional layers, pooling, and fully connected output layers to extract distinguishing features from the original images.
The researchers use supervised learning to compute effective subregions based on the receptive-field features of the DCNN, improving the model's domain adaptation. Two DCNN streams share parameters, and a soft mask weights the importance of different parts of the model. This facilitates the extraction of texture, structure, and color information from various underwater image domains, which is then processed further by the subsequent VGG.
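The article does not include code, but the two-stream, parameter-sharing idea can be illustrated with a minimal PyTorch sketch. Everything here is an assumption made for illustration: the names DCNNBranch and MDCNNSketch, the layer widths, and the mask-weighted fusion are placeholders, not the configuration reported in the paper.

import torch
import torch.nn as nn

class DCNNBranch(nn.Module):
    """A small convolutional branch standing in for one MDCNN channel (illustrative sizes)."""
    def __init__(self, in_ch=3, feat=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.features(x)

class MDCNNSketch(nn.Module):
    """Two streams share one branch's parameters; a soft mask weights their fusion."""
    def __init__(self):
        super().__init__()
        self.shared_branch = DCNNBranch()            # parameters shared by both streams
        self.mask_head = nn.Sequential(              # predicts a per-pixel soft mask in [0, 1]
            nn.Conv2d(32, 1, 1), nn.Sigmoid()
        )
        self.reconstruct = nn.Conv2d(32, 3, 3, padding=1)

    def forward(self, x_cl, x_com):
        f_cl = self.shared_branch(x_cl)              # stream focused on the target region
        f_com = self.shared_branch(x_com)            # complementary stream, same weights
        mask = self.mask_head(f_cl)                  # soft mask derived from the first stream
        fused = mask * f_cl + (1.0 - mask) * f_com   # mask-weighted feature fusion
        return self.reconstruct(fused)               # enhanced RGB output

model = MDCNNSketch()
out = model(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 3, 64, 64])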
The VGG-16 identifies and classifies data elements against distinct underwater backgrounds, obtaining feature representations for each domain via the VGG perceptual loss. The study also incorporates single-channel DCNNs, in which two network streams, Scl and Scom, focus on identifying the regions that contribute to target object identification in underwater images. These streams use classical optimization techniques and soft masks to enhance information from different domains.
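A VGG perceptual loss of this kind is commonly implemented by comparing fixed VGG-16 feature maps of the enhanced and reference images. The sketch below assumes a recent torchvision (0.13+) and inputs already normalized to ImageNet statistics; the cut-off layer (relu3_3) and the class name VGGPerceptualLoss are illustrative rather than taken from the study.

import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

class VGGPerceptualLoss(nn.Module):
    def __init__(self):
        super().__init__()
        # Use the first 16 layers of VGG-16 (up to relu3_3) as a frozen feature extractor.
        self.features = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:16].eval()
        for p in self.features.parameters():
            p.requires_grad = False

    def forward(self, enhanced, reference):
        # Inputs are assumed to be ImageNet-normalized tensors of shape (N, 3, H, W).
        # Compare the feature maps of the enhanced and reference images with MSE.
        return F.mse_loss(self.features(enhanced), self.features(reference))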
To optimize the model, a soft mask guides the network to focus on regions of interest, with high-response regions also serving as an indicator of image quality. A pixel-wise mean squared error (MSE) measures the difference between images and is minimized to reduce prediction errors. The MDCNN-VGG integrates external supervision through the VGG-based loss function together with the other loss terms.
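A hedged sketch of how such loss terms are typically combined is shown below; the function name total_loss and the weights lambda_pix and lambda_vgg are placeholders, not values reported by the authors.

import torch.nn.functional as F

def total_loss(enhanced, reference, perceptual_loss, lambda_pix=1.0, lambda_vgg=0.1):
    # perceptual_loss is a callable such as the VGGPerceptualLoss sketched above.
    pixel_term = F.mse_loss(enhanced, reference)          # pixel-level MSE between images
    feature_term = perceptual_loss(enhanced, reference)   # VGG feature-level (perceptual) term
    return lambda_pix * pixel_term + lambda_vgg * feature_term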
Experiments and analysis
The MDCNN-VGG was tested on real image datasets to enhance underwater images from various domains. These datasets are as follows:
UFO-120: Clear images collected from ocean soundings for different water types. A subset of 120 images was used as the test set.
Enhancing Underwater Visual Perception (EUVP) Dark: A large collection of paired and unpaired images from ocean soundings under different visibility conditions. This dataset contains 5500 pairs of images with dark underwater backgrounds.
Underwater Image Enhancement Benchmark Dataset (UIEBD): Comprising 890 pairs of underwater images captured under various lighting conditions with different color ranges and contrasts.
The study used standard metrics, including peak signal-to-noise ratio (PSNR), the structural similarity index measure (SSIM), and the no-reference underwater image quality measure (UIQM), to quantitatively evaluate the model's output images. UIQM combines three attribute measures covering underwater image colorfulness (UICM), sharpness (UISM), and contrast (UIConM).
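The two full-reference metrics can be computed with scikit-image as sketched below (assuming scikit-image 0.19 or later); UIQM and its UICM, UISM, and UIConM terms require a separate implementation and are omitted here.

import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(enhanced: np.ndarray, reference: np.ndarray) -> dict:
    """Both images as uint8 H x W x 3 arrays in the same color space."""
    psnr = peak_signal_noise_ratio(reference, enhanced, data_range=255)
    ssim = structural_similarity(reference, enhanced, channel_axis=-1, data_range=255)
    return {"PSNR": psnr, "SSIM": ssim}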
The model's image enhancement effectiveness was tested in multi-domain underwater image enhancement experiments. The MDCNN-VGG demonstrated the best enhancement for images in different domains, with the targets remaining visible even from different viewpoints.
Several baseline models were used for comparison, including Shallow-Uwnet, UResnet, Cycle-GAN, and the underwater GAN with gradient penalty (UGAN-P). The MDCNN-VGG showed the best UIQM values on UFO-120 but weaker PSNR and SSIM results. It also performed well on EUVP Dark, although UGAN-P and UResnet outperformed it on paired data. Ablation experiments demonstrated that the various loss terms each contributed to image enhancement, with clear improvements when they were combined.
Conclusion
In this research, the MDCNN-VGG, a deep learning model, is introduced for the rapid enhancement of multi-domain underwater images. It combines multichannel DCNNs to extract local information from underwater images in different domains and enhances them using the VGG.
The model considers global color, structure, local texture, style information, and a perceptual loss for image quality evaluation. Extensive evaluations demonstrate an average improvement of 0.11 in UIQM over Shallow-Uwnet across the different datasets. Future improvements may focus on model design and robust optimization to address issues such as blurred details, color bias, and overexposure.