In a paper published in the journal Scientific Reports, researchers explored semantic segmentation in remote sensing images (RSI). They introduced multi-feature fusion and channel attention network (MFCA-Net), a novel approach to improve segmentation accuracy and recognition of small target objects. MFCA-Net utilized an encoding-decoding structure in the encoding phase, incorporating an improved MobileNet V2 (IMV2) and multi-feature dense fusion (MFDF).
The decoding phase fused shallow and deep features, followed by upsampling for pixel-level classification. Comparative experiments demonstrated significant improvements over six state-of-the-art methods, with MFCA-Net achieving notable enhancements in segmentation accuracy and small target object recognition on the Vaihingen dataset.
Related Work
Past approaches in remote sensing technology have included threshold-based, edge-based, and region-based semantic segmentation methods, which often require manual parameter tuning and struggle to capture extensive semantic information. However, recent advancements in deep learning, notably through models such as the U-shaped network (U-Net), improved U-Net variants, and DeepLab, have transformed RSI analysis.
Transformer models, renowned for their robustness owing to self-attention mechanisms, have also gained prominence in tasks such as semantic segmentation of RSI. Furthermore, the introduction of generative adversarial networks (GANs) has addressed challenges associated with large-scale annotated datasets, offering promising avenues for enhancing semantic segmentation.
MFCA-Net Framework Overview
The MFCA-Net framework adopts an encoding-decoding structure, utilizing MobileNet V2 as the backbone and enhancing it with attention mechanisms in shallow and deep feature layers. Additionally, the framework incorporates the MFDF module to address challenges in identifying small target objects through denser sampling points and obtaining a larger receptive field.
The encoding phase involves MobileNet V2, where channel attention mechanisms are introduced after bottleneck1 and bottleneck6 to enhance feature extraction. It includes compression, activation, and scale operations to assign weights to different channels based on their relevance to the task.
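The compression-activation-scale pattern described here follows squeeze-and-excitation-style channel attention. The paper's exact layer shapes are not specified, so the NumPy sketch below uses illustrative dimensions (4 channels, reduction ratio 2):

```python
import numpy as np

def channel_attention(feature_map, w1, w2):
    """SE-style channel attention: compression (global average pool),
    activation (bottleneck FC -> ReLU -> FC -> sigmoid), then scale.
    feature_map: (C, H, W); w1: (C//r, C); w2: (C, C//r)."""
    # Compression: squeeze each channel to a single descriptor
    z = feature_map.mean(axis=(1, 2))            # (C,)
    # Activation: learn a relevance weight per channel
    s = np.maximum(w1 @ z, 0.0)                  # ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))          # sigmoid -> weights in (0, 1)
    # Scale: reweight channels by their learned importance
    return feature_map * s[:, None, None]

# Illustrative weights; in the network w1 and w2 are learned
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w1 = rng.standard_normal((2, 4))
w2 = rng.standard_normal((4, 2))
y = channel_attention(x, w1, w2)
```

Because the sigmoid weights lie in (0, 1), each channel is attenuated in proportion to its learned relevance rather than hard-selected.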
The MFDF module addresses limitations in capturing spatial information for various object sizes by fusing convolution feature maps with different dilation rates. Researchers integrated adaptive average pooling to prevent overfitting. The resulting structure involves densely connected branches, significantly enlarging the receptive field compared to traditional methods like the atrous spatial pyramid pooling (ASPP).
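The receptive-field gain from dense connection can be made concrete with a small calculation. In a parallel ASPP layout each 3x3 branch sees only (k-1)*d + 1 pixels, whereas cascading the same dilated convolutions lets the fields compound. The dilation rates below are illustrative, not the paper's exact values:

```python
def receptive_field_parallel(kernel, rates):
    """Each independent ASPP-style branch covers (k-1)*d + 1 pixels."""
    return [(kernel - 1) * d + 1 for d in rates]

def receptive_field_dense(kernel, rates):
    """Densely cascading the same dilated convolutions compounds the
    field: each layer adds (k-1)*d to the running receptive field."""
    rf = 1
    for d in rates:
        rf += (kernel - 1) * d
    return rf

rates = [3, 6, 12, 18]                       # illustrative dilation rates
print(receptive_field_parallel(3, rates))    # [7, 13, 25, 37]
print(receptive_field_dense(3, rates))       # 79
```

With these rates, the widest parallel branch covers 37 pixels while the dense cascade covers 79, which is the kind of enlargement the MFDF design exploits.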
In the decoding phase, the framework performs fusion operations to enhance the use of low-level feature maps, downsamples the result, and fuses it with the deep feature maps. Upsampling via bilinear interpolation then generates the final segmentation image.
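Bilinear upsampling computes each output pixel as a distance-weighted average of its four nearest input pixels. A minimal NumPy sketch for a single-channel map follows; the align-corners coordinate mapping used here is one common convention, not necessarily the paper's:

```python
import numpy as np

def bilinear_upsample(img, scale):
    """Bilinear interpolation of a 2-D map by an integer scale factor."""
    h, w = img.shape
    out_h, out_w = h * scale, w * scale
    # Map output coordinates back into the input grid (align-corners style)
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    # Blend the four neighbours by their fractional distances
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

x = np.array([[0.0, 1.0], [2.0, 3.0]])
y = bilinear_upsample(x, 2)   # 4x4 map; corner values are preserved
```

In practice this is applied per channel to the fused feature maps before the final pixel-level classification.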
Finally, the loss function is tailored for semantic segmentation, incorporating weight factors to balance the class distribution and raise the importance of specific classes. The focal loss function is employed, adjusting category balance and focusing on hard samples to improve segmentation results.
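The focal loss for one pixel can be sketched in a few lines. The class weights `alpha` and focusing parameter `gamma` below are illustrative values, not the paper's:

```python
import numpy as np

def focal_loss(probs, target, alpha, gamma=2.0):
    """Per-pixel focal loss: -alpha_c * (1 - p_c)^gamma * log(p_c).
    The (1 - p)^gamma factor down-weights easy, well-classified pixels
    so training concentrates on hard samples; alpha rebalances classes."""
    p = probs[target]            # predicted probability of the true class
    return -alpha[target] * (1.0 - p) ** gamma * np.log(p)

# One pixel, 3 classes; alpha weights the rarer classes more heavily
probs = np.array([0.7, 0.2, 0.1])
alpha = np.array([0.25, 0.5, 1.0])
easy = focal_loss(probs, 0, alpha)   # confident correct prediction -> small loss
hard = focal_loss(probs, 2, alpha)   # misclassified rare class -> large loss
```

The gap between `easy` and `hard` is what steers training toward under-represented, small-target classes.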
Experiments and Analysis
Researchers designed two experiments to verify the proposed MFCA-Net's performance. The first compares it with six state-of-the-art methods: segmentation network (SegNet), U-Net, pyramid scene parsing network (PSPNet), dual attention network (DANet), DeepLab V3+, and adaptive attention feature pyramid network (A2-FPN), which together encompass a range of segmentation techniques, such as unpooling structures and pyramid pooling modules. The second, an ablation experiment, assesses the significance of MFCA-Net's individual components.
For evaluation, researchers utilize pixel accuracy (PA), mean PA (MPA), mean intersection over union (MIoU), and frequency-weighted intersection over union (FWIoU). The experiments are conducted on Windows 10, utilizing an NVIDIA GeForce RTX 3060 graphics card with PyTorch 1.7.
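All four metrics can be derived from a single class confusion matrix. A compact sketch (the tiny arrays are illustrative; the definitions assume every class appears in the ground truth):

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes):
    """PA, MPA, MIoU, and FWIoU from a confusion matrix, the four
    measures used in the paper's evaluation."""
    cm = np.zeros((num_classes, num_classes))
    np.add.at(cm, (gt.ravel(), pred.ravel()), 1)   # rows: truth, cols: prediction
    tp = np.diag(cm)
    pa = tp.sum() / cm.sum()                       # overall pixel accuracy
    mpa = np.mean(tp / cm.sum(axis=1))             # per-class accuracy, averaged
    iou = tp / (cm.sum(axis=1) + cm.sum(axis=0) - tp)
    miou = iou.mean()                              # unweighted mean IoU
    freq = cm.sum(axis=1) / cm.sum()
    fwiou = (freq * iou).sum()                     # IoU weighted by class frequency
    return pa, mpa, miou, fwiou

gt   = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
pa, mpa, miou, fwiou = segmentation_metrics(pred, gt, 2)
```

FWIoU differs from MIoU in that frequent classes dominate it, which is why MIoU is the more sensitive indicator for small-target classes.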
Two datasets, the Vaihingen and Gaofen image dataset (GID), are employed. Vaihingen, collected by airborne imaging equipment, comprises imagery of a small village in Germany. GID, derived from China's Gaofen-2 satellite data, includes images from over 60 cities in China.
Quantitative and visual performance analysis reveals MFCA-Net's superior performance. It outperforms other methods in terms of MIoU and FWIoU on both datasets. Visual inspection further confirms its efficacy, particularly in accurately delineating boundaries and recognizing small target objects.
In the ablation study, incorporating the IMV2 and MFDF modules improved segmentation accuracy. Visualizations elucidate the impact of these enhancements, showcasing crisper segmentation boundaries and enhanced recognition of small objects compared to the baseline network.
Moreover, the ablation study highlights the importance of each component in MFCA-Net's architecture, providing insights into the mechanisms driving its superior performance and guiding future improvements in RSI segmentation techniques.
Conclusion
In summary, this paper proposed MFCA-Net, a novel approach to improve semantic segmentation performance on RSI. The method incorporated a channel attention module into the feature extraction network's shallow and deep feature maps and adopted a two-dimensional activation function, the funnel rectified linear unit (FReLU), to capture contextual information.
Additionally, the MFDF module was introduced for deep feature extraction, followed by upsampling processes that fuse its output with the shallow feature-map branches of the backbone network. The proposed MFCA-Net demonstrated superior performance and higher detection accuracies than the state-of-the-art methods. Critical advantages of MFCA-Net included advanced semantic segmentation results and the potential for quick and effective learning, making it suitable for practical engineering applications. Future studies will focus on collecting large-area datasets to further improve change detection techniques.