In an article published in the journal Remote Sensing, researchers focused on classifying tree species in Austria's forests using Sentinel-2 (S2) satellite imagery. They explored the effectiveness of dense phenology time series for mapping and included both mixed and sparse tree species classes.
The authors highlighted the impact of spatial autocorrelation on validation accuracy, comparing methods such as spatial split validation and National Forest Inventory (NFI)-based validation. They found significant accuracy variations, emphasizing the challenges in capturing forest complexity with polygon-based training data.
Background
Forests are crucial for ecological balance, climate regulation, and economic benefits, including timber and tourism. Earth observation (EO) data, particularly from S2 satellites, have advanced tree species classification through high-resolution imagery and multi-sensor data fusion. Previous studies have successfully used S2 data for mapping pure tree species but faced challenges with heterogeneous forest structures and mixed pixels. Limited large-scale research has addressed these issues effectively.
This paper aimed to improve large-scale tree species classification by integrating mixed and sparse classes into training and validation datasets. It used a dense phenology time series from S2 data, enhanced by a hybrid neural network architecture for better spatial and temporal accuracy. The study addressed gaps in existing methods by employing spatial autocorrelation analysis and innovative validation techniques, providing a more accurate representation of forest complexity and improving validation reliability.
Methodology for Tree Species Classification
The researchers outlined the methodology for classifying tree species in Austria utilizing S2 satellite imagery alongside various datasets. The approach involved processing multi-spectral time series data from S2 to extract phenology features and metrics, such as greening days and vegetation periods. Digital terrain models (DTM) and digital surface models (DSM), which included slope and height metrics, were also integrated to enrich the feature set.
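To make the idea of phenology metrics concrete, the following is a minimal sketch of how a greening day and a vegetation period length could be derived from one pixel's vegetation-index time series. The function name, the use of NDVI-like values, and the 0.5 threshold are assumptions made for illustration; the summary does not describe how the authors computed these metrics.

```python
def phenology_metrics(days, index_values, threshold=0.5):
    """Return (greening_day, season_length_days) for one pixel.

    days: day of year for each observation, in ascending order.
    index_values: vegetation index (for example NDVI) per observation.
    threshold: hypothetical cut-off marking an active canopy.
    """
    active_days = [d for d, v in zip(days, index_values) if v >= threshold]
    if not active_days:
        return None, 0
    greening_day = active_days[0]                     # first day at or above threshold
    season_length = active_days[-1] - active_days[0]  # span between first and last active day
    return greening_day, season_length


# Illustrative values only, not data from the study.
days = [60, 90, 120, 150, 180, 210, 240, 270, 300]
ndvi = [0.20, 0.30, 0.55, 0.80, 0.85, 0.80, 0.70, 0.45, 0.25]
greening_day, season_length = phenology_metrics(days, ndvi)
print(greening_day, season_length)  # prints: 120 120
```

A dense time series matters here because the resolution of both metrics is limited by the spacing between usable observations.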
To address the issue of mixed pixels in S2 imagery, the authors defined pure, mixed, and sparse classes. Training data were meticulously labeled through visual interpretations and supplemented with synthetic data to enhance pixel classification accuracy. Validation was performed using data from the NFI, with a focus on mitigating spatial autocorrelation to ensure robust model validation.
A hybrid neural network model combining residual network (ResNet) and multilayer perceptron (MLP) architectures was employed for classification. The training process involved data standardization, parameter optimization, and rigorous validation to avoid overfitting. The model utilized cross-entropy loss and the Adam optimizer and was trained on an NVIDIA GeForce RTX 3070 Ti. Final model validation with NFI data confirmed the accuracy of classifications for both pure and mixed tree species.
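Two of the training ingredients named above, data standardization and cross-entropy loss, can be sketched in a few lines of plain Python. This is only an illustration of the general techniques under simple assumptions (per-feature z-scores, softmax outputs); it is not the authors' ResNet and MLP model, and it does not implement the Adam optimizer.

```python
import math


def standardize(column):
    """Scale one feature column to zero mean and unit variance (z-scores)."""
    mean = sum(column) / len(column)
    variance = sum((x - mean) ** 2 for x in column) / len(column)
    std = math.sqrt(variance) or 1.0  # avoid dividing by zero for constant features
    return [(x - mean) / std for x in column]


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_class):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[true_class])


scaled = standardize([1.0, 2.0, 3.0])
confident_and_right = cross_entropy([5.0, 0.0, 0.0], true_class=0)
confident_and_wrong = cross_entropy([5.0, 0.0, 0.0], true_class=1)
print(confident_and_right < confident_and_wrong)  # prints: True
```

The loss is small when the network puts most of its probability on the correct class and grows quickly when it is confidently wrong, which is the signal the optimizer minimizes during training.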
Performance Metrics and Validation Results
In the clustered spatial split distance analysis, three models were evaluated across ten spatial splits ranging from 125 to 5000 meters. Accuracy metrics, including the mean NFI-weighted overall accuracy (NFI-w-OA) and macro-averaged F1 score (MAF1), showed a decrease up to 3000 meters, stabilizing around 4000 meters. Increased split distances led to a more uneven holdout set distribution among classes.
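The basic idea behind a spatial split can be sketched as follows: samples are grouped by location, and whole groups, rather than individual pixels, are assigned to the holdout set. The sketch below uses square grid blocks whose edge length equals the split distance. That grid scheme, the function name, and the 20% holdout fraction are assumptions for illustration and should not be read as the authors' clustering procedure. Grid blocks keep neighboring samples together but do not guarantee a minimum distance between training and holdout samples in adjacent blocks.

```python
import random
from collections import defaultdict


def clustered_spatial_split(points, split_distance, holdout_fraction=0.2, seed=0):
    """Split sample indices into train and holdout by spatial block.

    points: list of (x, y, label) tuples in a metric coordinate system.
    split_distance: block edge length in meters.
    """
    blocks = defaultdict(list)
    for index, (x, y, _label) in enumerate(points):
        key = (int(x // split_distance), int(y // split_distance))
        blocks[key].append(index)

    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)  # reproducible block order
    n_holdout = max(1, round(len(keys) * holdout_fraction))

    holdout = [i for key in keys[:n_holdout] for i in blocks[key]]
    train = [i for key in keys[n_holdout:] for i in blocks[key]]
    return train, holdout


# Illustrative coordinates only, not data from the study.
points = [
    (10, 10, "spruce"), (40, 30, "spruce"),
    (1200, 50, "beech"), (1250, 90, "beech"),
    (2600, 2700, "larch"), (2650, 2750, "larch"),
]
train, holdout = clustered_spatial_split(points, split_distance=1000)
print(len(train), len(holdout))  # prints: 4 2
```

Because entire blocks move into the holdout set, larger split distances can leave some classes with few or no holdout samples, which is consistent with the uneven class distribution noted above.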
For the NFI validation buffer distance analysis, two experiments were conducted. The first series, which varied the buffer distance, showed stable accuracies up to around 5000 meters; beyond 7500 meters, accuracy dropped significantly and a large share of the training data was discarded. The second series, which held the share of discarded training data constant, showed steady accuracies up to 10,000 meters, with slight drops thereafter.
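A buffer experiment of this kind can be pictured as removing every training sample that lies within a given distance of a validation plot. The sketch below assumes that reading, with planar coordinates in meters and straight-line distance; the function name and the sample coordinates are hypothetical and not taken from the study.

```python
import math


def discard_near_plots(training_points, nfi_plots, buffer_distance):
    """Separate training points into kept and discarded lists.

    A point is discarded when it lies within buffer_distance (meters)
    of any validation plot.
    """
    kept, discarded = [], []
    for point in training_points:
        near = any(math.dist(point, plot) <= buffer_distance for plot in nfi_plots)
        (discarded if near else kept).append(point)
    return kept, discarded


# Illustrative coordinates only.
training_points = [(0, 0), (3000, 0), (9000, 0)]
nfi_plots = [(500, 0)]
kept, discarded = discard_near_plots(training_points, nfi_plots, buffer_distance=5000)
discard_share = len(discarded) / len(training_points)
print(kept)  # prints: [(9000, 0)]
```

Tracking the discarded share alongside accuracy helps separate two effects that grow together with the buffer: reduced spatial autocorrelation and a shrinking training set.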
Model validation showed that the base model reached 99% NFI-w-OA. Integrating synthetic training data significantly improved classifier performance. In the NFI validation assessment, the base model performed best, with an overall accuracy of 55.3% and an MAF1 of 42.0%.
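For readers less familiar with the two headline metrics, here is a minimal sketch of overall accuracy and the macro-averaged F1 score computed from reference and predicted labels. It uses the standard textbook definitions; the NFI-based weighting behind NFI-w-OA is not described in this summary and is not reproduced. The labels are made up for illustration.

```python
def overall_accuracy(y_true, y_pred):
    """Share of samples whose predicted class matches the reference class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores, so rare classes count equally."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Illustrative labels only, not data from the study.
y_true = ["spruce", "spruce", "beech", "beech", "larch", "spruce"]
y_pred = ["spruce", "beech", "beech", "beech", "spruce", "spruce"]
print(round(overall_accuracy(y_true, y_pred), 3))  # prints: 0.667
print(round(macro_f1(y_true, y_pred), 3))          # prints: 0.489
```

In this toy case the single larch sample is missed entirely, which pulls the macro-averaged F1 well below the overall accuracy. A gap in the same direction appears in the reported NFI figures.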
Insights into Classification and Validation
In this study, tree species classification was performed over a forest area of 40,178 square kilometers using dense S2 imagery. A comprehensive dataset of approximately 570,000 S2 pixels was collected and used to address challenges in spatial autocorrelation and mixed species classes.
The methodology used synthetic data to enhance model performance and addressed spatial autocorrelation through clustered spatial split validation (CSS-VAL) and ground-truth probability sampling. Validation results showed a significant accuracy drop due to spatial autocorrelation: CSS-VAL yielded 74% accuracy, compared with 99% under random validation. Mixed species classes were found to complicate training data labeling, though synthetic data improved results.
The ResNet architecture handled the complexities of phenology time series data effectively. Validation metrics, including overall misclassification score (OMS) and prediction in close phenological proximity (PCPP), highlighted improvements with synthetic data but also underscored challenges in generalizing from training to real-world data.
Conclusion
In conclusion, the researchers demonstrated the critical need to account for spatial complexity and autocorrelation in large-scale tree species classification using S2 imagery. The research effectively integrated mixed species classes into training datasets and employed synthetic data to improve model performance. Significant disparities in accuracy between random and clustered validations emphasized the importance of considering spatial factors in map evaluation.
Innovative methods, including tailored validation metrics and advanced neural network architectures, enhanced classification results. However, challenges remain with geolocation accuracy and the trade-offs between training data quality and representativity. Future research should focus on refining these methods and exploring advanced deep-learning techniques to further enhance tree species classification.
Journal reference:
- Schadauer, T., et al. (2024). Evaluating Tree Species Mapping: Probability Sampling Validation of Pure and Mixed Species Classes Using Convolutional Neural Networks and Sentinel-2 Time Series. Remote Sensing, 16(16), 2887. DOI: 10.3390/rs16162887, https://www.mdpi.com/2072-4292/16/16/2887