In a paper published in the journal Scientific Reports, researchers explored semantic segmentation in remote sensing images (RSI). They introduced multi-feature fusion and channel attention network (MFCA-Net), a novel approach to improve segmentation accuracy and recognition of small target objects. MFCA-Net utilized an encoding-decoding structure in the encoding phase, incorporating an improved MobileNet V2 (IMV2) and multi-feature dense fusion (MFDF).
The decoding phase fused shallow and deep features, followed by upsampling for pixel-level classification. Comparative experiments demonstrated significant improvements over six state-of-the-art methods, with MFCA-Net achieving notable enhancements in segmentation accuracy and small target object recognition on the Vaihingen dataset.
Related Work
Past approaches in remote sensing technology have included threshold-based, edge-based, and region-based semantic segmentation methods, which often require manual parameter tuning and struggle to capture extensive semantic information. However, recent advancements in deep learning, notably through models such as the U-shaped network (U-Net), improved U-Net variants, and DeepLab, have transformed RSI analysis.
Transformer models, renowned for their robustness owing to self-attention mechanisms, have also gained prominence in tasks such as semantic segmentation of RSI. Furthermore, the introduction of generative adversarial networks (GANs) has addressed challenges associated with large-scale annotated datasets, offering promising avenues for enhancing semantic segmentation.
MFCA-Net Framework Overview
The MFCA-Net framework adopts an encoding-decoding structure, utilizing MobileNet V2 as the backbone and enhancing it with attention mechanisms in shallow and deep feature layers. Additionally, the framework incorporates the MFDF module to address challenges in identifying small target objects through denser sampling points and obtaining a larger receptive field.
The encoding phase involves MobileNet V2, where channel attention mechanisms are introduced after bottleneck1 and bottleneck6 to enhance feature extraction. It includes compression, activation, and scale operations to assign weights to different channels based on their relevance to the task.
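The compression-activation-scale pattern described here follows squeeze-and-excitation-style channel attention. The paper's exact layer shapes are not specified, so the NumPy sketch below uses illustrative dimensions (4 channels, reduction ratio 2):

```python
import numpy as np

def channel_attention(feature_map, w1, w2):
    """SE-style channel attention: compression (global average pool),
    activation (bottleneck FC -> ReLU -> FC -> sigmoid), then scale.
    feature_map: (C, H, W); w1: (C//r, C); w2: (C, C//r)."""
    # Compression: squeeze each channel to a single descriptor
    z = feature_map.mean(axis=(1, 2))            # (C,)
    # Activation: learn a relevance weight per channel
    s = np.maximum(w1 @ z, 0.0)                  # ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))          # sigmoid -> weights in (0, 1)
    # Scale: reweight channels by their learned importance
    return feature_map * s[:, None, None]

# Illustrative weights; in the network w1 and w2 are learned
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w1 = rng.standard_normal((2, 4))
w2 = rng.standard_normal((4, 2))
y = channel_attention(x, w1, w2)
```

Because the sigmoid weights lie in (0, 1), each channel is attenuated in proportion to its learned relevance rather than hard-selected.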
The MFDF module addresses limitations in capturing spatial information for various object sizes by fusing convolution feature maps with different dilation rates. Researchers integrated adaptive average pooling to prevent overfitting. The resulting structure involves densely connected branches, significantly enlarging the receptive field compared to traditional methods like the atrous spatial pyramid pooling (ASPP).
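The receptive-field gain from dense connection can be made concrete with a small calculation. In a parallel ASPP layout each 3x3 branch sees only (k-1)*d + 1 pixels, whereas cascading the same dilated convolutions lets the fields compound. The dilation rates below are illustrative, not the paper's exact values:

```python
def receptive_field_parallel(kernel, rates):
    """Each independent ASPP-style branch covers (k-1)*d + 1 pixels."""
    return [(kernel - 1) * d + 1 for d in rates]

def receptive_field_dense(kernel, rates):
    """Densely cascading the same dilated convolutions compounds the
    field: each layer adds (k-1)*d to the running receptive field."""
    rf = 1
    for d in rates:
        rf += (kernel - 1) * d
    return rf

rates = [3, 6, 12, 18]                       # illustrative dilation rates
print(receptive_field_parallel(3, rates))    # [7, 13, 25, 37]
print(receptive_field_dense(3, rates))       # 79
```

With these rates, the widest parallel branch covers 37 pixels while the dense cascade covers 79, which is the kind of enlargement the MFDF design exploits.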
In the decoding phase, the framework performs fusion operations to enhance the use of low-level feature maps, downsamples the result, and fuses it with the deep feature maps. Upsampling via bilinear interpolation then generates the final segmentation image.
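Bilinear upsampling computes each output pixel as a distance-weighted average of its four nearest input pixels. A minimal NumPy sketch for a single-channel map follows; the align-corners coordinate mapping used here is one common convention, not necessarily the paper's:

```python
import numpy as np

def bilinear_upsample(img, scale):
    """Bilinear interpolation of a 2-D map by an integer scale factor."""
    h, w = img.shape
    out_h, out_w = h * scale, w * scale
    # Map output coordinates back into the input grid (align-corners style)
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    # Blend the four neighbours by their fractional distances
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

x = np.array([[0.0, 1.0], [2.0, 3.0]])
y = bilinear_upsample(x, 2)   # 4x4 map; corner values are preserved
```

In practice this is applied per channel to the fused feature maps before the final pixel-level classification.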
Finally, the loss function is tailored for semantic segmentation, incorporating weight factors to balance the class distribution and raise the importance of specific classes. The focal loss function is employed, adjusting category balance and focusing on hard samples to improve segmentation results.
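The focal loss for one pixel can be sketched in a few lines. The class weights `alpha` and focusing parameter `gamma` below are illustrative values, not the paper's:

```python
import numpy as np

def focal_loss(probs, target, alpha, gamma=2.0):
    """Per-pixel focal loss: -alpha_c * (1 - p_c)^gamma * log(p_c).
    The (1 - p)^gamma factor down-weights easy, well-classified pixels
    so training concentrates on hard samples; alpha rebalances classes."""
    p = probs[target]            # predicted probability of the true class
    return -alpha[target] * (1.0 - p) ** gamma * np.log(p)

# One pixel, 3 classes; alpha weights the rarer classes more heavily
probs = np.array([0.7, 0.2, 0.1])
alpha = np.array([0.25, 0.5, 1.0])
easy = focal_loss(probs, 0, alpha)   # confident correct prediction -> small loss
hard = focal_loss(probs, 2, alpha)   # misclassified rare class -> large loss
```

The gap between `easy` and `hard` is what steers training toward under-represented, small-target classes.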
Experiments and Analysis
Researchers designed two experiments to verify the proposed MFCA-Net's performance. The first compares it with six state-of-the-art methods: segmentation network (SegNet), U-Net, pyramid scene parsing network (PSPNet), dual attention network (DANet), DeepLab V3+, and adaptive attention feature pyramid network (A2-FPN), which together encompass a range of segmentation techniques, such as unpooling structures and pyramid pooling modules. The second, an ablation experiment, assesses the significance of MFCA-Net's individual components.
For evaluation, researchers utilize pixel accuracy (PA), mean PA (MPA), mean intersection over union (MIoU), and frequency-weighted intersection over union (FWIoU). The experiments are conducted on Windows 10, utilizing an NVIDIA GeForce RTX 3060 graphics card with PyTorch 1.7.
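All four metrics can be derived from a single class confusion matrix. A compact sketch (the tiny arrays are illustrative; the definitions assume every class appears in the ground truth):

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes):
    """PA, MPA, MIoU, and FWIoU from a confusion matrix, the four
    measures used in the paper's evaluation."""
    cm = np.zeros((num_classes, num_classes))
    np.add.at(cm, (gt.ravel(), pred.ravel()), 1)   # rows: truth, cols: prediction
    tp = np.diag(cm)
    pa = tp.sum() / cm.sum()                       # overall pixel accuracy
    mpa = np.mean(tp / cm.sum(axis=1))             # per-class accuracy, averaged
    iou = tp / (cm.sum(axis=1) + cm.sum(axis=0) - tp)
    miou = iou.mean()                              # unweighted mean IoU
    freq = cm.sum(axis=1) / cm.sum()
    fwiou = (freq * iou).sum()                     # IoU weighted by class frequency
    return pa, mpa, miou, fwiou

gt   = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
pa, mpa, miou, fwiou = segmentation_metrics(pred, gt, 2)
```

FWIoU differs from MIoU in that frequent classes dominate it, which is why MIoU is the more sensitive indicator for small-target classes.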
Two datasets, the Vaihingen and Gaofen image dataset (GID), are employed. Vaihingen, collected by airborne imaging equipment, comprises imagery of a small village in Germany. GID, derived from China's Gaofen-2 satellite data, includes images from over 60 cities in China.
Quantitative and visual performance analysis reveals MFCA-Net's superior performance. It outperforms other methods in terms of MIoU and FWIoU on both datasets. Visual inspection further confirms its efficacy, particularly in accurately delineating boundaries and recognizing small target objects.
In the ablation study, incorporating the IMV2 and MFDF modules improved segmentation accuracy. Visualizations elucidate the impact of these enhancements, showcasing crisper segmentation boundaries and enhanced recognition of small objects compared to the baseline network.
Moreover, the ablation study highlights the importance of each component in MFCA-Net's architecture, providing insights into the mechanisms driving its superior performance and guiding future improvements in RSI segmentation techniques.
Conclusion
In summary, this paper proposed MFCA-Net, a novel approach to improve semantic segmentation performance on RSI. The method incorporated a channel attention module into the feature extraction network's shallow and deep feature maps and adopted a two-dimensional activation function, the funnel rectified linear unit (FReLU), to capture contextual information.
Additionally, the MFDF module was introduced for deep feature extraction, followed by upsampling processes that fuse its output with the shallow feature-map branches of the backbone network. The proposed MFCA-Net demonstrated superior performance and higher detection accuracies than the state-of-the-art methods. Critical advantages of MFCA-Net included advanced semantic segmentation results and the potential for quick and effective learning, making it suitable for practical engineering applications. Future studies will focus on collecting large-area datasets to further improve change detection techniques.