Leveraging advanced algorithms such as random forest and neural networks, researchers at Georgia Southern University have markedly improved how earthquakes are forecast in Los Angeles, reporting accuracy gains that could strengthen disaster management and seismic risk preparedness.
Research: Improving earthquake prediction accuracy in Los Angeles with machine learning
In an article recently published in the journal Scientific Reports, researchers focused on improving earthquake prediction accuracy in Los Angeles using advanced machine learning techniques. By developing a comprehensive feature matrix and testing 16 different machine learning models, the authors identified random forest and other models like LightGBM and XGBoost as highly effective for predicting maximum earthquake magnitudes within the next 30 days. This work highlighted the potential of machine learning to enhance seismic risk management and preparedness in the region.
Background
Earthquake prediction is critical for enhancing preparedness and reducing the devastating impacts of seismic events. Previous work in earthquake forecasting has explored various techniques, including statistical models, geophysical data analysis, and, more recently, machine learning. Studies have highlighted the importance of local geological conditions and fault dynamics in predicting seismic activity, while others have introduced machine learning to improve prediction accuracy.
Despite these advancements, challenges remained in achieving reliable and timely earthquake forecasts. Prior models often faced limitations due to the complexity of seismic patterns, leading to inconsistent results. While promising, machine learning approaches still require refined feature extraction techniques and larger, more relevant datasets to improve accuracy.
This paper built on these foundations by applying advanced machine learning techniques, including random forest and neural network architectures such as CNNs, RNNs, and LSTMs, to enhance earthquake prediction accuracy in Los Angeles. The study improved upon earlier work by constructing a robust feature matrix and evaluating multiple machine learning algorithms, achieving a notable accuracy of 97.97% with the random forest model. It addressed gaps in previous models through a refined semi-supervised training approach, ultimately contributing to more reliable earthquake forecasts and better disaster management strategies.
Dataset and Magnitude Standardization
The researchers utilized earthquake data from the Southern California Earthquake Data Center (SCEDC) to develop predictive models for seismic activity in the Los Angeles area. The dataset, which included earthquakes from January 1, 2012, to September 1, 2024, was filtered to focus on events within a 100-kilometer (km) radius of Los Angeles. Various magnitude types were converted to a uniform local magnitude (ML) for consistency and analysis. Notably, the dataset included minor and negative magnitude events, which were retained to offer comprehensive insights into seismic patterns.
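The paper does not reproduce its preprocessing code, but the catalogue filtering it describes can be sketched in a few lines of Python. The snippet below is a minimal illustration assuming a pandas DataFrame exported from the SCEDC catalogue with hypothetical column names (time, latitude, longitude, mag); conversion of other magnitude types to local magnitude is omitted because the conversion relations used are not given here.

```python
import numpy as np
import pandas as pd

# Hypothetical SCEDC catalogue export; column names are assumptions.
catalog = pd.read_csv("scedc_catalog.csv", parse_dates=["time"])

# Study period reported in the paper.
in_period = (catalog["time"] >= "2012-01-01") & (catalog["time"] <= "2024-09-01")

# Approximate Los Angeles reference coordinates (assumption).
LA_LAT, LA_LON = 34.05, -118.25

def haversine_km(lat, lon, lat0, lon0):
    """Great-circle distance in kilometres."""
    lat, lon, lat0, lon0 = map(np.radians, (lat, lon, lat0, lon0))
    a = (np.sin((lat - lat0) / 2) ** 2
         + np.cos(lat0) * np.cos(lat) * np.sin((lon - lon0) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

dist_km = haversine_km(catalog["latitude"], catalog["longitude"], LA_LAT, LA_LON)

# Keep events within 100 km; minor and negative magnitudes are retained, as in the study.
events = catalog[in_period & (dist_km <= 100.0)].copy()
print(len(events), "events within 100 km of Los Angeles")
```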
Statistical tests revealed that earthquake magnitudes in the region were moderately right-skewed and did not follow a normal or exponential distribution. Visual analyses, including scatter plots, histograms, and geographic mappings, highlighted depth, magnitude, and time patterns. Advanced statistical tests, such as Kolmogorov-Smirnov and chi-square tests, were applied to ensure the robustness of these analyses.
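As a rough illustration of such distribution checks, the sketch below uses SciPy to measure skewness and run one-sample Kolmogorov-Smirnov tests against fitted normal and exponential distributions; the chi-square test and the exact parameters used in the study are not specified above, so this only approximates the workflow.

```python
from scipy import stats

mags = events["mag"].to_numpy()

# Positive skewness indicates the right skew described in the study.
print("skewness:", stats.skew(mags))

# One-sample Kolmogorov-Smirnov tests against distributions fitted to the data.
# Note: estimating parameters from the same sample makes the p-values approximate.
ks_normal = stats.kstest(mags, "norm", args=(mags.mean(), mags.std(ddof=1)))
ks_expon = stats.kstest(mags, "expon", args=(mags.min(), mags.mean() - mags.min()))
print("KS vs normal:", ks_normal.pvalue)
print("KS vs exponential:", ks_expon.pvalue)
```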
A 30-day prediction window was chosen for its practical benefits: it provides a critical lead time for disaster preparedness in densely populated urban areas, giving authorities time to plan and respond more effectively, especially in catastrophic scenarios like those seen in past global earthquakes. The study thereby aimed to improve earthquake forecasting and disaster management in the Los Angeles region.
Feature-Engineered Input Variables and Model Optimization
The researchers used a publicly available dataset hosted on Zenodo to predict earthquake magnitudes within 30 days using machine learning models, particularly a random forest model with 100 estimators. After standardizing the data, models trained on all features and on a subset of 15 features achieved accuracies of 0.9769 and 0.9797, respectively. The continuous target variable was converted into a classification problem, with class intervals defined by the Jenks natural breaks method, which outperformed other binning techniques in producing balanced intervals and improving model performance.
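As a minimal sketch of this setup, the example below bins a continuous 30-day maximum-magnitude target with Jenks natural breaks (via the jenkspy package, one possible implementation; the paper's own implementation is not named here) and trains a 100-estimator random forest on synthetic placeholder data. The number of classes and the train/test split are illustrative assumptions, not values taken from the study.

```python
import jenkspy
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the real feature matrix and target
# (maximum magnitude observed in the following 30 days).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 15))
y_mag = rng.gamma(2.0, 0.8, size=2000)

# Discretise the continuous target with Jenks natural breaks (4 classes is an assumption).
breaks = jenkspy.jenks_breaks(y_mag.tolist(), 4)
y_class = np.digitize(y_mag, breaks[1:-1])  # interior breaks define the class boundaries

X_train, X_test, y_train, y_test = train_test_split(
    X, y_class, test_size=0.2, stratify=y_class, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```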
Several new features were engineered to enhance the model’s predictive power, including the rolling mean of earthquake depth, time since the last earthquake, and variations in Gutenberg-Richter b-values. Additionally, pairwise distances between events, clustering coefficients, and Gutenberg-Richter law deviations were integrated to optimize the predictive capability. These features, alongside statistical validation tests (such as chi-square and Kolmogorov-Smirnov), helped refine the model.
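A hedged sketch of a few of these engineered features is shown below, building on the filtered events DataFrame from the earlier snippet; the window lengths, the completeness magnitude, and the use of the Aki maximum-likelihood b-value estimator are illustrative assumptions, and pairwise distances and clustering coefficients are omitted for brevity.

```python
import numpy as np

events = events.sort_values("time").reset_index(drop=True)

# Rolling mean of depth over the previous 50 events (window size is an assumption).
events["depth_roll_mean"] = events["depth"].rolling(50, min_periods=1).mean()

# Time since the preceding event, in hours.
events["hours_since_last"] = events["time"].diff().dt.total_seconds() / 3600

# Gutenberg-Richter b-value via the Aki maximum-likelihood estimator,
# computed over a trailing window of events above a completeness magnitude Mc.
def b_value(mags, mc=1.0):
    mags = mags[mags >= mc]
    if len(mags) < 30:  # too few events for a stable estimate
        return np.nan
    return np.log10(np.e) / (mags.mean() - mc)

events["b_value_win"] = events["mag"].rolling(250, min_periods=30).apply(b_value, raw=False)
```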
Multicollinearity was addressed using variance inflation factor (VIF) analysis, ensuring model stability and interpretability. These improvements resulted in a robust model capable of effectively predicting future seismic events.
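A VIF screen of the engineered features can be reproduced with statsmodels roughly as follows, assuming a numeric feature DataFrame X_df built from the steps above; the cutoff of 10 mentioned in the comment is a common rule of thumb rather than a threshold reported by the authors.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# X_df: numeric DataFrame of candidate features (assumed built from the steps above).
exog = sm.add_constant(X_df.dropna())

vif = pd.Series(
    [variance_inflation_factor(exog.values, i) for i in range(exog.shape[1])],
    index=exog.columns,
)
# A VIF above roughly 10 is a common rule-of-thumb flag for problematic collinearity.
print(vif.drop("const").sort_values(ascending=False))
```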
Methodology and Model Comparison
Using a seismic event dataset, the researchers assessed 16 machine learning algorithms and neural network models for classifying earthquake magnitudes within 30 days. The evaluated models included traditional machine learning methods such as logistic regression, decision trees, random forest, and support vector machines, alongside neural network models such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and transformers.
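A simplified version of such a model comparison, covering only a handful of the 16 candidates (the gradient-boosting and deep learning models would require their own libraries), might look like the following cross-validation loop over the feature matrix and Jenks-binned target from the earlier sketch.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# X, y_class: feature matrix and Jenks-binned target from the earlier sketch.
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "mlp": make_pipeline(StandardScaler(), MLPClassifier(max_iter=500, random_state=42)),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for name, clf in candidates.items():
    scores = cross_val_score(clf, X, y_class, cv=cv, scoring="accuracy")
    print(f"{name}: {scores.mean():.4f} +/- {scores.std():.4f}")
```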
Model comparison revealed that the random forest was the most effective, with performance further boosted by hyperparameter optimization. Key findings indicated that random forest, extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and multilayer perceptron (MLP) models provided strong predictive capability, while RNN and LSTM models were effective at processing time-series data. Integrating gated recurrent units (GRUs) with supplementary data sources further improved accuracy.
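Hyperparameter optimization of the random forest can be sketched with scikit-learn's GridSearchCV as below; the search grid is illustrative, since the authors' exact tuning ranges are not listed here.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative search grid; the exact ranges tuned by the authors are not listed here.
param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [None, 10, 20],
    "min_samples_leaf": [1, 2, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="accuracy",
    cv=5,
    n_jobs=-1,
)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print("best cross-validated accuracy:", search.best_score_)
```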
Statistical analysis included confidence intervals and other performance metrics, such as accuracy, precision, recall, F1-Score, and receiver operating characteristic-area under the curve (ROC-AUC), highlighting random forest, XGBoost, and LightGBM as consistently top performers. Class-specific analysis particularly noted random forest’s high accuracy (0.982) in predicting strong earthquakes.
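These metrics can be computed with scikit-learn, and a simple bootstrap yields approximate confidence intervals for accuracy; the sketch below assumes the fitted model and held-out test split from the earlier examples and is not the authors' exact evaluation code.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, roc_auc_score

y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)

# Per-class precision, recall, and F1-score.
print(classification_report(y_test, y_pred))

# Multi-class ROC-AUC with one-vs-rest averaging.
print("ROC-AUC:", roc_auc_score(y_test, y_proba, multi_class="ovr"))

# Simple bootstrap confidence interval for test-set accuracy.
rng = np.random.default_rng(42)
boot = []
for _ in range(1000):
    idx = rng.integers(0, len(y_test), len(y_test))
    boot.append(accuracy_score(np.asarray(y_test)[idx], y_pred[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"95% bootstrap CI for accuracy: [{lo:.4f}, {hi:.4f}]")
```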
Feature selection using information gain identified 15 key features that optimized model accuracy to 0.9797. Overall, the authors emphasized the effectiveness of advanced machine learning and neural network techniques in enhancing earthquake prediction and underscored the importance of careful feature selection and model comparison for accurate seismic event classification.
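In scikit-learn, an information-gain-style selection of 15 features can be approximated with mutual information, as in the sketch below (assuming a feature matrix with more than 15 candidate columns); the paper's exact information-gain implementation is not given, so mutual_info_classif stands in as a close analogue.

```python
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Keep the 15 features carrying the most information about the magnitude class.
selector = SelectKBest(score_func=mutual_info_classif, k=15)
X_train_selected = selector.fit_transform(X_train, y_train)
X_test_selected = selector.transform(X_test)

# Indices (or column names, if X_train is a DataFrame) of the retained features.
print(selector.get_support(indices=True))
```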
Conclusion
In conclusion, the researchers successfully enhanced earthquake prediction accuracy in Los Angeles using advanced machine learning techniques, particularly through a fine-tuned random forest model. By constructing a robust feature matrix and applying a subset of 15 selected features, the model achieved an impressive accuracy of 97.97% in predicting maximum earthquake magnitudes within 30 days.
However, the study also emphasized the importance of other models, such as LightGBM, XGBoost, and recurrent neural networks, in the overall analysis, highlighting the necessity of comparing different machine-learning approaches.
This research emphasized the critical role of machine learning in improving seismic risk management and preparedness strategies. By providing reliable forecasts, the study contributed valuable insights to the field of seismology, potentially aiding authorities in better disaster response and mitigation efforts in earthquake-prone regions.