In a paper published in the journal Remote Sensing, researchers describe an open-source Python tool that makes machine learning (ML) models used for land cover mapping with Google Earth Engine easier to interpret. The tool supports both land cover classification and the analysis of land cover change, and it offers three feature importance metrics. Two case studies demonstrate its usability, and because it runs in a web browser, it is accessible to users worldwide for remote sensing applications with explainable artificial intelligence (AI).
Background
The study is set against the widespread application of AI in remote sensing (RS) and the Earth sciences. The opacity of AI models, often described as the "black box" problem, has led to the emergence of explainable AI (XAI), which aims to make AI models more transparent. Feature importance metrics such as SHapley Additive exPlanations (SHAP) are central to understanding how these models reach their predictions. Google Earth Engine (GEE) offers a powerful geospatial platform, but XAI has not yet been well integrated with it. Accurate, adaptable land use land cover (LULC) maps are needed, and demand for user-friendly Python-based tools is growing.
Proposed Method
The study presents an explainable ML tool implemented as a Jupyter Notebook and intended to run smoothly on Google Colaboratory (Google Colab). The notebook offers a user-friendly interface and needs no local setup: it runs entirely within a web browser by connecting to Google's cloud servers. The tool relies primarily on Python packages such as geemap and ipywidgets for interactive mapping with GEE, which lets users analyze and visualize Earth Engine datasets interactively in Jupyter notebooks.
Furthermore, the tool uses the scikit-learn and shap packages to calculate feature importance values. Colab's layout widgets organize the classification results and feature importance plots into user-friendly display tabs. While certain display tabs may function only on Colab, the primary functionality of the land cover mapping workflow is designed to work across different platforms running Jupyter notebooks.
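To make the idea behind SHAP values concrete, the following minimal sketch computes exact Shapley values for a single sample by enumerating every subset of features. It is an illustration only, not the tool's implementation: the shap package relies on much more efficient algorithms, and the function `shapley_values`, the `toy_model` scoring rule, and the sample and baseline values are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample, by enumerating feature subsets.

    Features that are not in a subset are replaced by their baseline value.
    The cost grows exponentially with the number of features.
    """
    n = len(x)

    def value(subset):
        # Model output when only the features in `subset` take their real values.
        masked = [x[j] if j in subset else baseline[j] for j in range(n)]
        return predict(masked)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for subsets of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy "model": a score built from three predictor values.
def toy_model(v):
    elevation, ndvi, ndwi = v
    return 2.0 * ndvi - 1.0 * ndwi + 0.5 * elevation * ndvi


sample = [1.0, 0.8, 0.2]    # hypothetical values for one pixel
baseline = [0.0, 0.0, 0.0]  # hypothetical reference input
phi = shapley_values(toy_model, sample, baseline)

for name, contribution in zip(["elevation", "NDVI", "NDWI"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the model output for the sample and for the baseline, which is the property that makes Shapley-based attributions additive.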
The workflow for land cover classification involves several key steps, including defining the region of interest (ROI), selecting data sources and dates, preparing labeled data, choosing a classifier, classifying the image, and conducting accuracy assessments and post-processing visualizations. Users can define their ROI using various methods, such as drawing geometries on the map, inputting coordinates, specifying bounding boxes, or uploading GeoJSON files. The available satellite data sources for land cover classification include Landsat Collection 2 and Sentinel-2, along with spectral indices and topographic variables.
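Spectral indices such as NDVI and NDWI are normalized differences of two bands. The sketch below shows the arithmetic for a single pixel in plain Python. It is not the tool's code: the article does not say which NDWI definition the tool uses, so the green/NIR form is an assumption here, the reflectance values are hypothetical, and returning 0.0 for a zero denominator is an illustrative choice. For Sentinel-2, green, red, and near-infrared correspond to bands B3, B4, and B8.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b); 0.0 if the denominator is zero (illustrative choice)."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """Normalized Difference Water Index, green/NIR form (assumed here)."""
    return normalized_difference(green, nir)


# Hypothetical surface reflectance values for one vegetated pixel.
pixel = {"green": 0.08, "red": 0.05, "nir": 0.45}

pixel_ndvi = ndvi(pixel["nir"], pixel["red"])
pixel_ndwi = ndwi(pixel["green"], pixel["nir"])
print(f"NDVI = {pixel_ndvi:.3f}, NDWI = {pixel_ndwi:.3f}")
```

Dense vegetation reflects strongly in the near-infrared, so this pixel gets a high NDVI and a negative NDWI.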
Users can provide labeled data either by uploading a Comma-Separated Values (CSV) file with sample locations and class labels or by generating samples from known land cover products. The tool supports supervised classifiers like Classification and Regression Trees (CART) and Random Forest (RF), with an option to customize classifier parameters. Classifier accuracy assessments and confusion matrices are provided. Additionally, the tool supports feature importance calculations, including impurity-based importance, permutation importance, and SHAP-value-based importance. Finally, users can calculate zonal areas occupied by different land cover classes in the classified image for further analysis and visualization.
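Of the three metrics, permutation importance is the simplest to illustrate without any library: shuffle one feature column, re-score the classifier, and record how much the accuracy drops. The sketch below is a plain-Python illustration under stated assumptions, not the tool's code (the tool uses scikit-learn for this). The function names, the rule-based `toy_classifier`, and the sample data are all hypothetical.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the table with only this column permuted.
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(base - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical classifier: water if NDWI > 0, otherwise land.
# It ignores the second feature, so that feature should score zero.
def toy_classifier(row):
    ndwi, _unused = row
    return "water" if ndwi > 0 else "land"


X = [[0.6, 0.1], [0.4, 0.9], [-0.3, 0.5], [-0.5, 0.2], [0.2, 0.7], [-0.1, 0.3]]
y = ["water", "water", "land", "land", "water", "land"]

scores = permutation_importance(toy_classifier, X, y)
for name, score in zip(["NDWI", "unused feature"], scores):
    print(f"{name}: {score:.3f}")
```

A feature the classifier never consults gets an importance of exactly zero, while shuffling a feature the classifier depends on lowers the accuracy.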
Experimental Results
Case Study 1: Land Cover Classification around San Francisco Bay
The land cover classification conducted around San Francisco Bay for the year 2021 using the developed tool yielded insightful results. Here are some of the key outcomes:
Classification Map: The tool classified the land cover for the region and generated a detailed map of the various land cover types.
Feature Importance: The feature importance analysis, conducted with both the GEE and scikit-learn random forest classifiers, identified the most influential features for the classification. Notably, 'elevation,' 'NDVI,' 'B11,' and 'NDWI' were among the most important predictors.
Classifier Metrics: The tool provided classifier metrics, including an overall accuracy score, precision, recall, and F1-score for each land cover class. This allowed for an assessment of the classification's accuracy and performance.
Visualizations: Various visualizations, such as labels, histograms, parallel coordinate plots, and confusion matrices, helped interpret the classification results. These visualizations provided valuable insights into the distribution of land cover classes and classifier performance.
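The classifier metrics listed above can all be derived from a confusion matrix. The following sketch shows the calculation in plain Python under stated assumptions: the class names and counts are hypothetical and are not results from the paper, and `per_class_metrics` is an illustrative helper rather than part of the tool.

```python
def per_class_metrics(matrix, labels):
    """Overall accuracy plus precision, recall, and F1-score for each class.

    matrix[i][j] is the number of samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(labels)))
    report = {}
    for k, name in enumerate(labels):
        tp = matrix[k][k]
        predicted = sum(row[k] for row in matrix)  # column sum
        actual = sum(matrix[k])                    # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[name] = {"precision": precision, "recall": recall, "f1": f1}
    return correct / total, report


# Hypothetical confusion matrix for three land cover classes.
labels = ["water", "vegetation", "built-up"]
matrix = [
    [50, 0, 0],
    [2, 40, 8],
    [0, 5, 45],
]

overall, report = per_class_metrics(matrix, labels)
print(f"overall accuracy: {overall:.3f}")
for name, m in report.items():
    print(f"{name}: precision={m['precision']:.3f} recall={m['recall']:.3f} f1={m['f1']:.3f}")
```

Reading the metrics per class matters because a high overall accuracy can hide a class that is frequently confused with another, as with vegetation and built-up land in this made-up matrix.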
Case Study 2: Land Cover Change off Dubai Coast
The investigation of land cover changes off the coast of Dubai, spanning the years 2000 to 2020, also revealed significant insights into the region's evolution. Here are the key results:
Land Reclamation Detection: The tool successfully detected and visualized land cover changes associated with major reclamation projects, such as the Palm Islands and the World Islands. These projects led to substantial increases in land area in the Dubai coastal area.
Temporal Evolution: The spatio-temporal analysis demonstrated that the World Islands were constructed around the same time as the Palm Islands, with no significant changes observed after 2008.
Zonal Area Calculations: The tool calculated the zonal area of each land cover class for every year, providing quantifiable data on the extent of land cover change over time. It showed that approximately 40 square kilometers of land were reclaimed along the Dubai coastline between 2001 and 2008, with little further change observed after the 2008 financial crisis.
Data Limitations: Although the classifier reported an accuracy of 1.0, the small sample size for certain classes may limit the reliability of the feature importance analysis.
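The zonal area calculation amounts to counting the pixels in each class and multiplying by the area of one pixel. The sketch below illustrates this on two tiny, hypothetical classified grids. It is not the tool's code, which runs the computation on Earth Engine, and it assumes square pixels of equal area, with 30 m chosen to match Landsat's nominal resolution.

```python
from collections import Counter


def zonal_areas_km2(class_grid, pixel_size_m):
    """Area in square kilometers covered by each class in a classified grid.

    Assumes square pixels of equal area (a simplification).
    """
    pixel_area_km2 = (pixel_size_m ** 2) / 1_000_000
    counts = Counter(value for row in class_grid for value in row)
    return {cls: n * pixel_area_km2 for cls, n in counts.items()}


# Hypothetical classified grids for the same location in two different years.
grid_before = [
    ["water", "water", "land"],
    ["water", "water", "land"],
]
grid_after = [
    ["water", "land", "land"],
    ["water", "land", "land"],
]

areas_before = zonal_areas_km2(grid_before, pixel_size_m=30)
areas_after = zonal_areas_km2(grid_after, pixel_size_m=30)
land_gain_km2 = areas_after["land"] - areas_before["land"]
print(f"land gained: {land_gain_km2:.4f} km^2")
```

Repeating the calculation for each year's classified image gives a time series of class areas, which is how a gain in land area can be tracked from one year to the next.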
Conclusion
To sum up, this paper introduced an open-source explainable ML tool that integrates XAI into land cover mapping and monitoring using Google Earth Engine. The tool supports classification and change detection workflows by offering impurity-based, permutation-based, and SHAP-value-based feature importance for assessing classifiers and identifying important features. While it currently lacks online label preparation, it provides flexibility for customization and expansion.