Enhanced Workflow for High-Resolution Lithology Logs Using Multiple ML Algorithms

In an article published in Scientific Reports, researchers from Iran proposed a novel workflow that uses an enhanced weighted average ensemble approach with error-correcting output code (ECOC) and cost-sensitive learning (CSL) techniques to produce high-resolution lithology logs from conventional well logs. They demonstrated that the developed workflow can accurately predict underground lithofacies and outperform commonly used machine learning (ML) algorithms. The research addresses the challenges of multiclass imbalanced data classification and scalability, which arise from the complex geological heterogeneities and the large volume of data.

Study: Enhanced Workflow for High-Resolution Lithology Logs Using Multiple ML Algorithms. Image credit: Nil Kulp/Shutterstock

Background

Lithology logs are graphical representations of the subsurface rock types encountered during drilling operations. They provide valuable information for geologists and engineers to evaluate and correlate different formations. Well logs are measurements of the physical properties of subsurface rocks, such as gamma ray, neutron porosity, density, sonic, and photoelectric factor. These properties reflect changes in the lithology, texture, and structure of the rocks, and can therefore be used to infer the lithofacies, that is, the rock units that share similar characteristics and depositional environments.

However, well logs can also be affected by other factors, such as salinity, fluid content, diagenesis, fractures, and clay composition, which make the relationship between well logs and lithofacies complex and non-linear. Moreover, the distribution of lithofacies can be highly imbalanced due to the natural heterogeneity of the subsurface, which poses a challenge for traditional ML algorithms that assume balanced data and optimize for overall accuracy.

About the Research

The paper introduces a workflow that relies on an enhanced weighted average ensemble approach, which combines several baseline ML algorithms into a single model with better performance and stability. Support vector machine (SVM), random forest (RF), decision tree (DT), logistic regression (LR), and extreme gradient boosting (XGBoost) are some widely used baseline algorithms in lithofacies classification.
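As a rough illustration of the weighted averaging idea, the sketch below blends the per-class probabilities of two models using one weight per model and picks the class with the highest blended probability. The probabilities, the weights, and the function names are hypothetical, and the article does not say how the authors' "enhanced" scheme derives its weights, so this shows only plain weighted averaging.

```python
from typing import List, Sequence


def weighted_average_ensemble(
    probas: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Blend per-class probabilities from several models, one weight per model."""
    if len(probas) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(probas[0])
    blended = [0.0] * n_classes
    for model_proba, weight in zip(probas, weights):
        if len(model_proba) != n_classes:
            raise ValueError("all models must score the same classes")
        for k, p in enumerate(model_proba):
            # Normalize the weights so the blended scores still sum to 1.
            blended[k] += (weight / total) * p
    return blended


def predict_class(blended: Sequence[float], classes: Sequence[str]) -> str:
    """Return the class label with the highest blended probability."""
    best = max(range(len(blended)), key=lambda k: blended[k])
    return classes[best]


# Hypothetical class probabilities for one depth sample.
classes = ["shale", "limestone", "pyritic limestone"]
svm_proba = [0.50, 0.30, 0.20]
rf_proba = [0.20, 0.30, 0.50]

# Hypothetical model weights (for example, derived from validation scores).
weights = [0.4, 0.6]

blended = weighted_average_ensemble([svm_proba, rf_proba], weights)
print(predict_class(blended, classes))
```

With these made-up numbers, the higher-weighted model tips the decision toward its preferred class even though the other model disagrees.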

This study addresses the challenge of multiclass imbalanced data by using two strategies: (1) decomposing the multiclass problem into binary subproblems using ECOC and (2) applying CSL to assign different weights and penalties to different classes according to their importance and rarity.
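The ECOC decomposition can be sketched as follows: each class receives a binary codeword, each bit position defines one binary subproblem, and a prediction is decoded to the class whose codeword is nearest in Hamming distance. The four-class codebook here is an exhaustive code chosen for illustration only; it is an assumption, not the codebook used in the study.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical 7-bit codebook for four of the lithology classes.
# Every pair of codewords differs in 4 bits, so one wrong bit can be corrected.
CODEBOOK: Dict[str, Tuple[int, ...]] = {
    "shale": (1, 1, 1, 1, 1, 1, 1),
    "limestone": (0, 0, 0, 0, 1, 1, 1),
    "chalky limestone": (0, 0, 1, 1, 0, 0, 1),
    "pyritic limestone": (0, 1, 0, 1, 0, 1, 0),
}


def binary_targets(labels: Sequence[str], bit: int) -> List[int]:
    """Relabel multiclass targets as 0/1 for the binary subproblem at `bit`."""
    return [CODEBOOK[label][bit] for label in labels]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions where two codewords differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits: Sequence[int]) -> str:
    """Map the binary classifiers' outputs to the nearest class codeword."""
    return min(CODEBOOK, key=lambda label: hamming(CODEBOOK[label], bits))


# Training targets for the binary classifier assigned to bit 1.
labels = ["shale", "limestone", "pyritic limestone", "shale"]
print(binary_targets(labels, 1))

# The codeword of "chalky limestone" with bit 1 flipped by a wrong classifier.
noisy = (0, 1, 1, 1, 0, 0, 1)
print(decode(noisy))
```

In practice, one binary classifier would be trained per bit column, and libraries such as scikit-learn provide a ready-made wrapper (`OutputCodeClassifier`) that generates a random codebook instead of a hand-written one.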

The authors used a dataset from a Middle Eastern oilfield, which consists of conventional well logs and lithology logs from five wells. They selected one well as a blind well and used the data from the other four wells to train and test the ML algorithms. The lithology logs are divided into seven classes: shale, limestone, argillaceous limestone, chalky limestone, cherty limestone, pyritic limestone, and shaly limestone. The dataset exhibits a significant imbalance among the classes, with shale being the most dominant and pyritic limestone being the rarest.
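The blind-well hold-out and the class imbalance described above can be sketched together: samples are separated by well identifier, and class weights for cost-sensitive learning are computed from the training labels only. The well names, log readings, and the inverse-frequency weighting rule are assumptions for illustration; the article does not state which weighting scheme the authors used.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# (well_id, log readings, lithology label); all values below are made up.
Sample = Tuple[str, List[float], str]


def split_by_blind_well(
    samples: Sequence[Sample], blind_well: str
) -> Tuple[List[Sample], List[Sample]]:
    """Hold out every sample of one well so it is never seen during training."""
    train = [s for s in samples if s[0] != blind_well]
    blind = [s for s in samples if s[0] == blind_well]
    return train, blind


def balanced_class_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


samples: List[Sample] = [
    ("W1", [85.0, 2.45], "shale"),
    ("W1", [80.0, 2.50], "shale"),
    ("W2", [20.0, 2.71], "limestone"),
    ("W2", [90.0, 2.40], "shale"),
    ("W3", [25.0, 3.10], "pyritic limestone"),
    ("W3", [88.0, 2.48], "shale"),
    ("W5", [22.0, 2.70], "limestone"),
    ("W5", [86.0, 2.46], "shale"),
]

train, blind = split_by_blind_well(samples, blind_well="W5")
weights = balanced_class_weights([label for _, _, label in train])
print(weights)
```

Rare classes receive larger weights, so misclassifying them costs more during training. The same rule is what scikit-learn applies when an estimator is given `class_weight="balanced"`.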

Methodologies Used

Researchers used the following three-step workflow:

Data preparation: They checked for missing values and outliers, encoded categorical features, removed unnecessary columns, standardized the data, applied linear discriminant analysis (LDA) as a noise reduction technique, and split the dataset into train, test, and blind verification sets.

Model training: The hyperparameters of the baseline algorithms were tuned using grid search and cross-validation, and the tuned models were then trained with different combinations of ECOC and CSL. A voting ensemble classifier and an enhanced weighted average ensemble classifier were also trained.

Model evaluation: Two metrics suitable for imbalanced multiclass data were used to assess the performance of the models: the mean F-measure and the mean Kappa statistic. The results of the models on both the test set and the blind set were compared to identify the optimal workflow that achieves the best performance.
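To see why these two metrics suit imbalanced data better than plain accuracy, the sketch below computes a class-averaged F-measure and Cohen's kappa for a model that always predicts the majority class. Reading "mean F-measure" as the unweighted (macro) average of per-class F1 scores is an assumption, and the labels are made up.

```python
from collections import Counter
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores, so rare classes count equally."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        # Degenerate case (a single class everywhere): kappa is undefined,
        # so 0.0 is returned here by convention (an assumption of this sketch).
        return 0.0
    return (observed - expected) / (1.0 - expected)


# A model that ignores the rare class still reaches 80% accuracy.
y_true = ["shale"] * 8 + ["pyritic limestone"] * 2
y_pred = ["shale"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, macro_f1(y_true, y_pred), cohen_kappa(y_true, y_pred))
```

Accuracy looks acceptable here, while the class-averaged F-measure drops to roughly 0.44 and kappa to 0, because the model does no better than chance on the minority class.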

Research Findings

The outcomes show that the enhanced weighted average ensemble of SVM and RF coupled with ECOC and CSL performs the best among all the models, with a mean F-measure of 91.04% and a mean Kappa statistic of 84.50% on the blind set. Moreover, the ECOC and CSL strategies were effective in handling multiclass imbalanced data, as they improved the performance of the baseline algorithms and the ensemble models.

Furthermore, the lithology log generated by the optimal workflow had a reasonable similarity to the original one and could accurately identify the minority classes, such as shale and pyritic limestone, which are often neglected by other models. The study also provides a graphical comparison of the generated lithology log with the original log, which demonstrates the robustness and reliability of the optimal workflow.

The study has several applications for the geo-energy industry, as it provides a reliable and automated solution for generating high-resolution lithology logs from conventional well logs. It can help characterize and evaluate subsurface reservoirs and can also be applied to other fields and domains that deal with multiclass imbalanced data, such as image processing, medical diagnosis, and fraud detection.

Conclusion

In summary, the paper presents a versatile and robust workflow for classifying lithofacies and generating high-resolution lithology logs from conventional well logs. This workflow can handle the challenges of multiclass imbalanced data and scalability. Moreover, its performance was validated on a real-world dataset from a Middle Eastern oilfield with complex and heterogeneous geological formations.

In the future, researchers suggest using other ML techniques, such as deep learning and transfer learning, for lithofacies classification tasks, and considering other sources of uncertainty and noise in the data.

Written by

Muhammad Osama

Muhammad Osama is a full-time data analytics consultant and freelance technical writer based in Delhi, India. He specializes in transforming complex technical concepts into accessible content. He has a Bachelor of Technology in Mechanical Engineering with specialization in AI & Robotics from Galgotias University, India, and he has extensive experience in technical content writing, data science and analytics, and artificial intelligence.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Osama, Muhammad. (2023, December 14). Enhanced Workflow for High-Resolution Lithology Logs Using Multiple ML Algorithms. AZoAi. Retrieved on October 05, 2024 from https://www.azoai.com/news/20231212/Enhanced-Workflow-for-High-Resolution-Lithology-Logs-Using-Multiple-ML-Algorithms.aspx.

  • MLA

    Osama, Muhammad. "Enhanced Workflow for High-Resolution Lithology Logs Using Multiple ML Algorithms". AZoAi. 05 October 2024. <https://www.azoai.com/news/20231212/Enhanced-Workflow-for-High-Resolution-Lithology-Logs-Using-Multiple-ML-Algorithms.aspx>.

  • Chicago

    Osama, Muhammad. "Enhanced Workflow for High-Resolution Lithology Logs Using Multiple ML Algorithms". AZoAi. https://www.azoai.com/news/20231212/Enhanced-Workflow-for-High-Resolution-Lithology-Logs-Using-Multiple-ML-Algorithms.aspx. (accessed October 05, 2024).

  • Harvard

    Osama, Muhammad. 2023. Enhanced Workflow for High-Resolution Lithology Logs Using Multiple ML Algorithms. AZoAi, viewed 05 October 2024, https://www.azoai.com/news/20231212/Enhanced-Workflow-for-High-Resolution-Lithology-Logs-Using-Multiple-ML-Algorithms.aspx.
