In an article published in the journal Applied Sciences, researchers confirmed the reliability of predictive regression algorithms for filling in missing geophysical logging data in the Drava Basin, part of the Pannonian Super Basin, with a focus on the Gola Field. They evaluated tree-based and boosting algorithms for prediction accuracy and used long short-term memory (LSTM) neural networks to improve predictions on blind datasets. Unsupervised machine learning (ML) effectively distinguished lithological variations in well log datasets from 20 wells, aiding the understanding of subsurface lithological relations.
Background
Well logging data is crucial for characterizing subsurface properties, especially in complex regions like the Drava Basin. Previous research has applied ML techniques to identify lithofacies and characterize reservoirs. However, many studies have faced limitations due to incomplete or poor-quality well logs. Specific challenges arise from the complex lithology and sparse data in the Croatian region of the Drava Basin.
This study aimed to address these gaps by applying 17 ML algorithms, including tree-based models, boosting models, and LSTM neural networks, to predict missing well-log data and improve lithological characterization. Focusing on the Gola Field, the research also employed unsupervised learning for lithology pattern recognition. By recreating missing logs and analyzing lithology with advanced ML techniques, the study provided a more reliable and detailed understanding of subsurface relations, a significant improvement over previous methodologies.
Data Preparation and Machine Learning Methodology
The researchers investigated the use of ML algorithms to predict missing well log data and analyze lithological patterns in the Drava Basin, focusing on the Gola Field. The dataset comprised well log data from 20 wells. The available logs included density, acoustic, neutron, gamma ray, and resistivity. Data preparation involved handling inconsistencies, duplicate measurements, and outliers, followed by normalization and standardization.
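The preparation steps described above can be illustrated with a minimal pandas sketch. It is not the authors' pipeline: the column names (`DEPT`, `GR`, `RHOB`), the choice to average duplicate depths, and the 1.5 × IQR outlier rule are all assumptions made for illustration.

```python
import pandas as pd


def prepare_logs(df: pd.DataFrame, depth_col: str = "DEPT") -> pd.DataFrame:
    """Clean and standardize a well-log table (hypothetical column names)."""
    # Average duplicate measurements recorded at the same depth.
    df = df.groupby(depth_col, as_index=False).mean(numeric_only=True)
    logs = [c for c in df.columns if c != depth_col]

    # Assumed outlier rule: values outside 1.5 * IQR become missing (NaN).
    q1, q3 = df[logs].quantile(0.25), df[logs].quantile(0.75)
    iqr = q3 - q1
    inside = (df[logs] >= q1 - 1.5 * iqr) & (df[logs] <= q3 + 1.5 * iqr)
    df[logs] = df[logs].where(inside)

    # Standardize each log curve to zero mean and unit variance.
    df[logs] = (df[logs] - df[logs].mean()) / df[logs].std(ddof=0)
    return df


# Synthetic example: one duplicated depth and one gamma-ray spike.
raw = pd.DataFrame({
    "DEPT": [100.0, 100.0, 100.5, 101.0, 101.5, 102.0, 102.5],
    "GR":   [50.0, 52.0, 55.0, 53.0, 54.0, 900.0, 52.0],
    "RHOB": [2.40, 2.42, 2.45, 2.43, 2.44, 2.46, 2.41],
})
clean = prepare_logs(raw)
```

Setting outliers to NaN rather than dropping whole rows keeps the depth index intact, which matters when the gaps are later filled by a regression model.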
The authors employed 17 regression models to predict acoustic logs, combining supervised learning models, such as LSTM neural networks, with unsupervised clustering algorithms used to understand spatial distributions. The models were evaluated using performance metrics such as mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and the coefficient of determination (R²).
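For reference, the four metrics named above can be computed directly from measured and predicted log values. The sketch below uses the standard textbook definitions; the function name and the sample numbers are illustrative only and do not come from the study.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R² for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mae = np.mean(np.abs(residuals))       # average absolute error
    mse = np.mean(residuals ** 2)          # average squared error
    rmse = np.sqrt(mse)                    # same units as the log itself
    ss_res = np.sum(residuals ** 2)        # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Illustrative values only: three exact predictions and one miss.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```

RMSE is often the easiest of these to interpret for well logs because it is expressed in the units of the predicted curve, while R² allows comparison across curves with different scales.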
Clustering algorithms, including K-means, Gaussian mixture, spectral clustering, agglomerative clustering, and mean shift, were used to group data based on patterns, with the number of clusters optimized using the silhouette method. The research confirmed the lithology patterns, identifying nine different sandstone bodies.
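Choosing the number of clusters with the silhouette method can be sketched as follows. This is a minimal example on synthetic two-dimensional data, assuming K-means from scikit-learn; the helper name `best_k_by_silhouette` and the data are hypothetical, and the study's actual features and settings are not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, k_values, random_state=0):
    """Fit K-means for each k and return the k with the highest silhouette."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        # Mean silhouette over all samples; closer to 1 means tighter,
        # better-separated clusters.
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores


# Synthetic stand-in for log features: three well-separated groups.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

best_k, scores = best_k_by_silhouette(X, range(2, 7))
```

The same loop works with other clustering estimators that expose `fit_predict`, such as Gaussian mixture or agglomerative clustering, by swapping the model constructed inside the loop.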
All analyses were conducted using Python, leveraging libraries like scikit-learn, pandas, NumPy, and lasio. The results demonstrated the potential of ML techniques in improving lithological characterization and filling gaps in well-log data, offering a more reliable understanding of subsurface relations in the Drava Basin.
Prediction and Clustering Performance Results
Baseline regression models were tested on a blind dataset from a well in the same field but furthest from the training wells. Grid search cross-validation and Bayesian optimization were then used to tune model parameters and enhance performance. The artificial neural network, particularly with LSTM layers, performed best on unseen data, achieving an R² of 0.73. This model showed high prediction accuracy across different parts of the Drava Basin, indicating its potential for generalized use.
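The workflow of holding out a blind well and tuning on the remaining wells can be sketched with scikit-learn. Everything here is an assumption for illustration: the data are synthetic, the blind well is simply the last well rather than the most distant one, the model is a gradient boosting regressor, and only grid search is shown (Bayesian optimization is not).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, GroupKFold

# Synthetic stand-in: 5 wells, 80 samples each, 3 input logs, 1 target log.
rng = np.random.default_rng(42)
n_wells, n_samples = 5, 80
well_id = np.repeat(np.arange(n_wells), n_samples)
X = rng.normal(size=(n_wells * n_samples, 3))
y = (2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2]
     + rng.normal(scale=0.1, size=len(X)))

# Hold out one entire well as the blind dataset (here: the last well).
blind = well_id == n_wells - 1
X_train, y_train, groups = X[~blind], y[~blind], well_id[~blind]

# Grid search with well-wise folds, so no well is split across train/validation.
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    cv=GroupKFold(n_splits=4),
    scoring="r2",
)
search.fit(X_train, y_train, groups=groups)

# Score the tuned model on the well it has never seen.
blind_r2 = r2_score(y[blind], search.predict(X[blind]))
```

Grouping the cross-validation folds by well mirrors the blind test: samples from the same borehole are strongly correlated, so splitting them at random would tend to overstate how well a model generalizes to a new well.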
Clustering algorithms identified lithological patterns, recognizing distinct sandstone properties. The LSTM model, built using TensorFlow, effectively handled missing data, predicting acoustic, density, and neutron logs with high correlation to measured values. Clustering methods, validated by well WH data, revealed optimal grouping of Neogene sediments, accurately distinguishing Pliocene deposits and various sandstone bodies. The blue cluster, associated with higher resistivities, indicated potential gas-bearing reservoirs.
Analysis and Interpretation
The study demonstrated that regression ML algorithms effectively predicted well logs, particularly acoustic logs, with over 80% correlation on test datasets. Tree-based algorithms, boosting algorithms, and neural networks, including LSTM, showed over 90% correlation. Correlations on blind data were lower, but the predictions still displayed significant trendlines. Neural networks, especially LSTM, performed best on unseen data, achieving up to 45% correlation for acoustic values.
Unsupervised clustering algorithms successfully identified lithological patterns with five to 12 clusters, with 10 clusters optimally representing sandstone bodies. The results indicated that ML models can accurately predict well-log values and lithological properties, aiding in interpreting subsurface patterns and regional relations in the Drava Basin.
Conclusion
In conclusion, the researchers confirmed the reliability of predictive regression algorithms for filling in missing geophysical logging data in the Drava Basin, particularly the Gola Field. Evaluating 17 ML algorithms, including tree-based models, boosting models, and LSTM neural networks, they successfully predicted well logs and improved lithological characterization.
The artificial neural network, especially with LSTM layers, achieved the highest prediction accuracy on unseen data, demonstrating the potential for generalized use. Unsupervised clustering algorithms effectively distinguished lithological patterns, aiding in understanding subsurface relations.
The findings showed that ML models could accurately predict well-log values and lithological properties, providing a more reliable and detailed understanding of the Drava Basin's subsurface, especially in complex regions with sparse data. This study significantly advanced previous methodologies, offering improved subsurface characterization and interpretation.
Journal reference:
- Brcković et al. (2024). Enhancing the Understanding of Subsurface Relations: Machine Learning Approaches for Well Data Analysis in the Drava Basin, Pannonian Super Basin. Applied Sciences, 14(14), 6039. DOI: 10.3390/app14146039, https://www.mdpi.com/2076-3417/14/14/6039
Article Revisions
- Aug 15 2024 - General improvements to readability and fixed broken journal paper link.