In a paper published in the journal Remote Sensing, researchers employed an unsupervised machine learning (ML) model to classify biases in sea surface salinity (SSS) data from the Soil Moisture Active Passive (SMAP) satellite.
The model identified significant biases in cold regions, mid-latitudes, and areas with strong ocean currents. These findings highlighted the need for careful interpretation of SSS data, particularly in dynamically active regions where satellite and in situ correlations remained strong.
Related Work
Previous studies have made extensive use of satellite-observed SSS data to study ocean salinity variations, relying on missions such as Soil Moisture and Ocean Salinity (SMOS), Aquarius, and SMAP. Nevertheless, persistent biases in the data, especially in cold regions, at high latitudes, and in areas with strong winds or precipitation, have remained a considerable issue. Conventional in situ measurements, such as those from Argo floats, have limited temporal and spatial coverage but complement satellite data.
Data Classification Overview
Salinity data from the SMAP satellite were sourced from the Jet Propulsion Laboratory's Physical Oceanography Distributed Active Archive Center (PO.DAAC). The study utilized Level 3 SMAP SSS data, which offer an 8-day running mean and standard mapping.
For validation, in situ salinity data from Argo floats, specifically measurements taken at depths shallower than 5 meters, were compared to the nearest satellite grid point. The team imposed a maximum distance of 19.5 km and a maximum time lag of 24 hours on the comparison. They defined the bias as the difference between satellite SSS and Argo SSS.
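The matching rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the dictionary keys (`lat`, `lon`, `time`, `sss`), the function names, and the sample values in the usage lines are assumptions made for the example, and only the 19.5 km and 24-hour limits and the satellite-minus-Argo bias definition come from the text.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0
MAX_DISTANCE_KM = 19.5              # distance limit quoted in the article
MAX_TIME_LAG = timedelta(hours=24)  # time-lag limit quoted in the article


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def collocated_bias(argo, grid_points):
    """Return satellite SSS minus Argo SSS for the nearest grid point.

    Returns None when no grid point satisfies both the time and distance limits.
    `argo` and each entry of `grid_points` are hypothetical dicts with the keys
    lat, lon, time, and sss.
    """
    in_time = [g for g in grid_points if abs(g["time"] - argo["time"]) <= MAX_TIME_LAG]
    if not in_time:
        return None

    def distance(g):
        return haversine_km(argo["lat"], argo["lon"], g["lat"], g["lon"])

    nearest = min(in_time, key=distance)
    if distance(nearest) > MAX_DISTANCE_KM:
        return None
    return nearest["sss"] - argo["sss"]


# Made-up values, purely to show the call shape.
argo_obs = {"lat": 10.0, "lon": -140.0, "time": datetime(2020, 1, 1, 12), "sss": 34.80}
grid = [
    {"lat": 10.1, "lon": -140.1, "time": datetime(2020, 1, 1, 0), "sss": 34.95},
    {"lat": 12.0, "lon": -140.0, "time": datetime(2020, 1, 1, 0), "sss": 35.10},
]
bias = collocated_bias(argo_obs, grid)
```

A positive result means the satellite reads saltier than the float, and a negative result means it reads fresher. A production workflow would more likely index into the gridded product directly than loop over candidate points.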
Additional environmental data were also integrated, including sea surface temperature (SST) from the 1/4° Daily Optimum Interpolation Sea Surface Temperature (OISST) product, precipitation from the Global Precipitation Measurement (GPM) mission, and wind speed from the Cross-Calibrated Multi-Platform (CCMP) product. These data points formed the profiles used for classification.
This study classified the satellite SSS biases using an unsupervised ML technique called the Gaussian mixture model (GMM). A GMM assumes that data samples are generated from a combination of Gaussian distributions, each representing a distinct class. Here, the model aimed to identify the different groups of environmental factors affecting the SSS bias.
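For reference, the standard textbook form of a GMM can be written as below. The notation (x for a profile, K for the number of classes, and π, μ, Σ for the class weights, means, and covariances) is assumed here for illustration and is not taken from the paper.

```latex
p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\right),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1

\gamma_k(\mathbf{x}) =
\frac{\pi_k \, \mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\right)}
     {\sum_{j=1}^{K} \pi_j \, \mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j\right)}
```

The first expression is the mixture density. The second is the posterior probability that a profile belongs to class k, and a profile is typically assigned to the class with the largest posterior.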
The number of classes, K, was determined using the Bayesian information criterion (BIC), which balances model fit and simplicity. Although the BIC score did not reach a definitive minimum, it flattened at K = 15, indicating an optimal balance. Thus, 15 classes were chosen for the final classification.
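To make the trade-off concrete, here is a minimal sketch of the BIC for a GMM together with one possible way to detect where the score flattens. It assumes full covariance matrices, and the 1% relative tolerance in `first_plateau` is a hypothetical choice. The article does not say which covariance structure the authors used or how they judged the flattening.

```python
import math


def gmm_param_count(n_components, n_features):
    """Number of free parameters in a full-covariance GMM."""
    weights = n_components - 1  # mixing weights sum to 1
    means = n_components * n_features
    covariances = n_components * n_features * (n_features + 1) // 2
    return weights + means + covariances


def bic(log_likelihood, n_params, n_samples):
    """BIC = p * ln(n) - 2 * ln(L); lower values are better."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def first_plateau(bic_by_k, rel_tol=0.01):
    """Smallest K after which the BIC drops by less than rel_tol (relative).

    rel_tol is a hypothetical threshold chosen for this sketch.
    """
    ks = sorted(bic_by_k)
    for k, k_next in zip(ks, ks[1:]):
        drop = bic_by_k[k] - bic_by_k[k_next]
        if drop < rel_tol * abs(bic_by_k[k]):
            return k
    return ks[-1]
```

The `p * ln(n)` term penalizes extra classes, which is why the score can level off even though the likelihood keeps improving as K grows.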
The GMM was implemented using the scikit-learn package. The model classified the profiles into 15 distinct classes, each representing a combination of environmental factors influencing the SSS bias.
This classification provided insights into the geoclimatic distribution of satellite SSS biases, highlighting areas where satellite measurements are less accurate and emphasizing the need to interpret satellite-derived salinity data carefully under varying environmental conditions.
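A minimal sketch of what such a scikit-learn workflow might look like is shown below. It is not the authors' code: the data are synthetic, the column layout (bias, SST, precipitation, wind speed) and `covariance_type="full"` are assumptions, and two classes are used instead of the 15 chosen in the study so that the toy data stay small.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-in for the profiles. Hypothetical columns:
# SSS bias, SST (deg C), precipitation (mm/day), wind speed (m/s).
profiles = np.vstack([
    rng.normal(loc=[0.0, 27.0, 0.5, 5.0], scale=[0.1, 1.0, 0.2, 1.0], size=(300, 4)),
    rng.normal(loc=[0.5, 5.0, 0.5, 11.0], scale=[0.3, 1.5, 0.2, 2.0], size=(300, 4)),
])

# The study used 15 classes; 2 are enough for this toy data set.
model = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
model.fit(profiles)

labels = model.predict(profiles)            # hard class assignment per profile
posteriors = model.predict_proba(profiles)  # class membership probabilities
score = model.bic(profiles)                 # BIC of the fitted model (lower is better)
```

Refitting with different `n_components` values and comparing `model.bic(...)` is the usual way to scan candidate values of K in scikit-learn.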
Satellite SSS Biases
The classification of satellite SSS biases using a GMM reveals distinct environmental patterns among the 15 identified classes. These classes are strongly associated with SST and wind speed. For example, wind speeds are usually low in classes with SSTs above 25°C and high in classes with SSTs below 10°C.
This relationship reflects global climate and atmospheric circulation patterns, with cooler SST and higher wind speeds in higher latitudes due to lower solar incidence and polar vortex influences. In addition, most classes have low precipitation of less than 1 mm/day. Because precipitation affects SSS measurements, including rainfall data in the classification helps explore its role in SSS bias.
Classes are grouped into three SST ranges for better analysis: warm (above 25°C), middle (10–20°C), and cold (below 10°C). In the warm SST range, classes span tropical and subtropical regions, with the largest class, K11, showing minimal bias in these areas.
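The grouping can be expressed as a small helper, sketched below. The thresholds come from the text, but the function names, the idea of using a per-class mean SST, and the handling of values between 20°C and 25°C are assumptions. The article does not say how that gap is treated, so the sketch labels it "unassigned" rather than guessing.

```python
def sst_range(sst_celsius):
    """Map an SST value to the ranges quoted in the article."""
    if sst_celsius > 25.0:
        return "warm"
    if 10.0 <= sst_celsius <= 20.0:
        return "middle"
    if sst_celsius < 10.0:
        return "cold"
    # Above 20 and up to 25 deg C: not covered by the ranges in the article.
    return "unassigned"


def group_classes(mean_sst_by_class):
    """Group class labels by the SST range of their (hypothetical) mean SST."""
    groups = {"warm": [], "middle": [], "cold": [], "unassigned": []}
    for label, mean_sst in sorted(mean_sst_by_class.items()):
        groups[sst_range(mean_sst)].append(label)
    return groups


# Placeholder labels and made-up mean SSTs, only to show the call shape.
example = group_classes({"class_a": 28.1, "class_b": 15.0, "class_c": 2.3})
```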
Smaller classes, such as K13, K3, K8, and K15, show increasing bias with higher SSTs. Class K8, associated with high rainfall in the intertropical convergence zone (ITCZ) and South Pacific convergence zone (SPCZ), exhibits a freshening effect, consistent with earlier findings that heavy rain can create a fresh lens in tropical waters.
In the middle SST range, classes show varying biases, with saline bias prevalent in northern and southern extremes and fresh bias in subtropical zones. In the cold SST range, classes K2 and K10 reveal significant biases due to low salinity retrieval sensitivity, with K2 showing a broad bias range and K10 a pronounced bias in polar winter, highlighting challenges from low temperatures and sea ice.
Outliers such as classes K12, K4, and K5 reveal challenges in unusual environments: K12's fresh bias is linked to sea ice melt, K4's freshening relates to extreme rainfall, and K5's discrepancies arise from strong ocean currents. These findings highlight the need for further investigation of the accuracy of satellite salinity data in these conditions.
Conclusion
To sum up, the study used GMM to classify satellite SSS data bias and assess geographical patterns influenced by environmental factors. This unsupervised approach uncovered patterns not easily detected by traditional methods and identified specific biases related to sea ice, extreme rainfall, and ocean currents. These insights will guide improvements in future satellite salinity instruments and algorithms by pinpointing areas of current underperformance.
Journal reference:
- Ouyang, Y., et al. (2024). Geoclimatic Distribution of Satellite-Observed Salinity Bias Classified by Machine Learning Approach. Remote Sensing, 16(16), 3084. DOI:10.3390/rs16163084, https://www.mdpi.com/2072-4292/16/16/3084