New dataset bridges hemispheres, enabling machine-learning models to forecast storm intensity, estimate typhoon centers, and compare cyclone behavior across basins.
Research: Machine Learning for the Digital Typhoon Dataset: Extensions to Multiple Basins and New Developments in Representations and Tasks. Image Credit: Triff / Shutterstock
*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.
In an article submitted to the arXiv preprint* server, researchers introduced the Digital Typhoon dataset V2, an updated version of the 40+ year typhoon satellite image collection, now including tropical cyclone data from the southern hemisphere. This extension enables cross-hemisphere comparisons and supports research on regional cyclone characteristics. The dataset also integrates metadata interpolation, covering 78.8% of images, to enhance machine learning applications.
They developed a self-supervised learning framework combined with long short-term memory (LSTM) models for intensity forecasting and proposed new tasks such as typhoon center estimation. Their approach utilized a 6-hour window to define positive pairs and applied data augmentations like solarization and Gaussian blur while ensuring physical realism. Their study also demonstrated improved model generalization across hemispheres using data from both regions.
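The article does not include the authors' code, but the forecasting setup can be sketched: a frozen self-supervised encoder turns each satellite frame into a feature vector, and an LSTM consumes the sequence of features to predict future intensity. The class below is a minimal illustration with made-up dimensions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class IntensityForecaster(nn.Module):
    """LSTM head that maps a sequence of per-frame feature vectors
    (e.g., from a frozen self-supervised encoder) to future intensity values.
    All dimensions here are illustrative, not the paper's settings."""

    def __init__(self, feat_dim=128, hidden_dim=256, horizon=12):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, horizon)  # one prediction per forecast step

    def forward(self, feats):          # feats: (batch, time, feat_dim)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])   # last hidden state -> (batch, horizon)


# Usage sketch with random features standing in for encoder outputs.
model = IntensityForecaster()
dummy = torch.randn(4, 8, 128)         # 4 storms, 8 past frames each
print(model(dummy).shape)              # torch.Size([4, 12])
```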
Related Work
Past work highlighted the critical societal impact of tropical cyclones and the potential of machine learning (ML) for their analysis and forecasting.
The release of the Digital Typhoon dataset in 2023 bridged meteorology and ML, offering a 40+ year dataset of typhoon satellite images. The updated dataset V2 adds southern hemisphere data, enabling research on cross-hemisphere model transferability and regional cyclone characteristics.
Additionally, the study proposed new tasks, such as typhoon center estimation, and explored representation learning with self-supervised frameworks to improve forecasting capabilities. The focus on integrating image features with metadata significantly enhanced accuracy, particularly for longer-term forecasts.
Typhoon Dataset Updates
The Digital Typhoon dataset V2 extends the collection in both its temporal and spatial dimensions, with 2023 typhoon data added for the northern hemisphere and new southern hemisphere data derived from Australia's Bureau of Meteorology. The dataset adopts an azimuthal equidistant map projection, allowing more accurate meteorological analyses, such as measuring the distance between a typhoon center and surrounding features. This extension enables cross-hemisphere comparisons and applies a unified processing pipeline for both regions.
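The practical benefit of the azimuthal equidistant projection is that distances measured from the projection's center correspond to great-circle distances on the globe. A minimal sketch of such a measurement, assuming the pyproj library and illustrative coordinates (not the dataset's own grid parameters), might look like this:

```python
from pyproj import Proj

def distance_from_center_km(center_lat, center_lon, feat_lat, feat_lon):
    """Distance between a typhoon center and a surrounding feature using an
    azimuthal equidistant projection centered on the storm (illustrative;
    the dataset's own grid spacing and projection parameters may differ)."""
    aeqd = Proj(proj="aeqd", lat_0=center_lat, lon_0=center_lon)
    x, y = aeqd(feat_lon, feat_lat)          # metres east/north of the center
    return (x ** 2 + y ** 2) ** 0.5 / 1000.0

# Example: distance from a storm centered at (20N, 135E) to a point at (22N, 137E).
print(round(distance_from_center_km(20.0, 135.0, 22.0, 137.0), 1))
```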
The switch to the azimuthal equidistant projection, introduced to better serve meteorological research, required a complete refresh of the dataset. Data augmentation is restricted so that physical semantics are preserved, while pre-processing includes horizontal flipping to align circulation direction across hemispheres. The dataset, available under a Creative Commons license, supports ML with metadata interpolation for 78.8% of images.
Forecasting Insights
Recent advancements in self-supervised learning (SSL) have enabled neural networks to extract valuable features from images without labels. Typhoon image representations were learned with the momentum contrast version 2 (MoCo v2) framework, which relies on contrastive learning and the InfoNCE loss. Positive pairs incorporated temporal aspects, with a 6-hour window between images, while augmentations included solarization, Gaussian blur, and controlled cropping to preserve key features like the typhoon eye. Representations were tested using an LSTM for intensity and extra-tropical storm (ETS) transition forecasting. Results showed improved forecasting accuracy, especially when combining image features with metadata.
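As a rough illustration of this contrastive setup, the sketch below pairs frames of the same storm that fall within the 6-hour window and scores them with a MoCo-style InfoNCE loss against a queue of negative keys. The function names, the details of the pairing rule, and the temperature value are assumptions for illustration, not the authors' code.

```python
from datetime import timedelta
import torch
import torch.nn.functional as F

def positive_pairs(frames, max_gap_hours=6):
    """Pair each frame with later frames of the same typhoon taken within the
    assumed 6-hour window. `frames` is a time-sorted list of (timestamp, image)."""
    pairs = []
    for i, (t_i, img_i) in enumerate(frames):
        for t_j, img_j in frames[i + 1:]:
            if t_j - t_i <= timedelta(hours=max_gap_hours):
                pairs.append((img_i, img_j))
            else:
                break  # frames are sorted, so later ones are even farther apart
    return pairs

def info_nce_loss(q, k_pos, queue, temperature=0.2):
    """MoCo v2-style InfoNCE: queries (B, D), positive keys (B, D) from the
    momentum encoder, and a queue (K, D) of negative keys."""
    q, k_pos, queue = (F.normalize(t, dim=1) for t in (q, k_pos, queue))
    l_pos = (q * k_pos).sum(dim=1, keepdim=True)   # similarity to the positive key
    l_neg = q @ queue.t()                          # similarities to the negatives
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)         # positive key sits at index 0
```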
Future work will explore advanced computer vision models, visible-channel data, and latent space analysis to enhance meteorological insights. Increasing the temporal frequency to 10-minute or even 2.5-minute intervals is also a priority for future dataset versions.
Cyclone Eye Localization
Estimating typhoon centers is challenging, especially for weak typhoons with unclear cloud patterns. To address this, an ML model was developed using a U-Net architecture paired with a weighted Hausdorff distance loss function, which avoids bounding boxes by generating heatmaps for center estimation.
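The weighted Hausdorff distance compares the network's per-pixel probability map against the ground-truth center point without committing to bounding boxes. The sketch below follows the published formulation of this loss (Ribera et al.) in simplified form; the exponent, normalization, and other details are assumptions and may differ from the paper's exact setup.

```python
import torch

def weighted_hausdorff_distance(prob_map, gt_points, alpha=-3.0, eps=1e-6):
    """Simplified weighted Hausdorff distance between a probability map and a
    set of ground-truth points (here, typically the single typhoon center).

    prob_map:  (H, W) tensor of per-pixel probabilities from the U-Net.
    gt_points: (N, 2) tensor of ground-truth (row, col) coordinates.
    """
    H, W = prob_map.shape
    device = prob_map.device
    ys, xs = torch.meshgrid(torch.arange(H, device=device),
                            torch.arange(W, device=device), indexing="ij")
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=1).float()
    p = prob_map.flatten()                           # (H*W,)
    d = torch.cdist(coords, gt_points.float())       # (H*W, N) pixel-to-point distances
    d_max = (H ** 2 + W ** 2) ** 0.5                 # image diagonal

    # Term 1: pixels with high probability should lie near a ground-truth point.
    term1 = (p * d.min(dim=1).values).sum() / (p.sum() + eps)

    # Term 2: every ground-truth point should be covered by a confident pixel;
    # a generalized mean with a negative exponent approximates a weighted minimum.
    weighted = p.unsqueeze(1) * d + (1.0 - p.unsqueeze(1)) * d_max
    gen_min = (weighted.clamp(min=eps) ** alpha).mean(dim=0) ** (1.0 / alpha)
    term2 = gen_min.mean()

    return term1 + term2
```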
The model was trained on a newly created dataset derived from the Digital Typhoon dataset, consisting of satellite images cropped around the typhoon's eye and resized to 256×256 pixels. Rotation augmentation was tested but found to introduce biases, likely due to distortions in wind and cloud patterns. The dataset was split into training, validation, and testing sets, with images shuffled to eliminate time dependencies.
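A minimal preprocessing sketch for this kind of input, assuming torchvision transforms and an illustrative crop size (not the paper's exact pipeline), could be:

```python
from torchvision import transforms

# Crop a window around the best-track center and resize to 256x256.
# The 512-pixel crop size is illustrative; the paper's pipeline may differ.
preprocess = transforms.Compose([
    transforms.CenterCrop(512),
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
])
```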
Results showed that the model effectively estimated typhoon centers across various grades, with stronger typhoons yielding lower errors due to more apparent eye structures. However, performance decreased for weaker typhoons with ambiguous cloud patterns, and rotation augmentation did not improve results. The study highlighted that uncertainty in heatmaps increases with weaker typhoons and occasionally results in multiple peaks, underscoring the need for human intervention in some cases. Future improvements could focus on scalability for full satellite images, exploring alternative data augmentations, and evaluating models like CenterNet for better accuracy and robustness.
Model Basin Generality
This part of the study evaluates the generality of ML models across typhoon basins through three tasks: center estimation, intensity classification, and intensity regression.
The models were trained on the Western Pacific (WP) dataset and tested on the Australian Region (AU) dataset. Because cyclones rotate in opposite directions in the two hemispheres, horizontal flipping was applied to the AU images as a preprocessing step to align circulation direction and improve homogeneity across datasets. For the center estimation task, the WP-trained model was evaluated on both the WP and AU test sets with this flip applied.
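The hemisphere alignment itself is a one-line operation; a hedged sketch, assuming single-image tensors and a simple hemisphere flag, is shown below.

```python
import torch

def align_circulation(image, hemisphere):
    """Flip southern-hemisphere images left-right so cyclonic rotation matches
    the northern-hemisphere convention before applying a WP-trained model.
    The (C, H, W) tensor layout and the hemisphere flag are assumptions."""
    if hemisphere == "south":
        # AU storms rotate clockwise, WP storms counter-clockwise.
        return torch.flip(image, dims=[-1])
    return image
```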
The intensity classification task aimed to predict the typhoon's grade from images using ResNet and vision transformer (ViT) architectures. The intensity regression task aimed to estimate the central pressure from typhoon images alone, with a data split that kept temporal sequences intact so the model could not simply interpolate between adjacent frames.
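For the regression task, a plausible minimal setup is a standard ResNet backbone with its classification head replaced by a single output for central pressure. The backbone depth, single-channel infrared input, and absence of pretrained weights are assumptions, not the paper's exact configuration.

```python
import torch.nn as nn
from torchvision.models import resnet18

def build_pressure_regressor(in_channels=1):
    """Image-only central-pressure regressor: ResNet-18 backbone, scalar head."""
    model = resnet18(weights=None)
    # Digital Typhoon imagery is single-channel infrared (assumption about the
    # input format), so swap the default 3-channel stem.
    model.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7, stride=2,
                            padding=3, bias=False)
    model.fc = nn.Linear(model.fc.in_features, 1)   # predicted central pressure (hPa)
    return model
```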
The results showed significant performance gaps between the two basins across all three tasks. The center estimation results confirmed the importance of the horizontal flip for homogeneity, while the gaps in intensity classification and regression were attributed largely to label inconsistencies between agencies and inherent regional differences. These findings suggest that flipping images alone is not enough to homogenize the datasets and highlight the need for more nuanced dataset standardization.
Conclusion
In summary, the Digital Typhoon dataset V2 was introduced to advance data-driven research on tropical cyclones. It is the first update since November 2023 and significantly expands the dataset with data from the southern hemisphere.
Future versions aim to include additional channels such as infrared, water vapor, and visible, along with increased temporal frequency. These updates are expected to enable new research opportunities and address societal and sustainability challenges.
Source:
Journal reference:
- Preliminary scientific report.
Kitamoto, A., Dzik, E., & Faure, G. (2024). Machine Learning for the Digital Typhoon Dataset: Extensions to Multiple Basins and New Developments in Representations and Tasks. arXiv. DOI: 10.48550/arXiv.2411.16421, https://arxiv.org/abs/2411.16421