Tree-Based ML Predicts 2D Material Impurity Energy

In an article published in the journal Machine Learning: Science and Technology, researchers explored the use of tree-based machine learning (ML) algorithms to predict the formation energy of impurities in two-dimensional (2D) materials. They trained several regression models on chemical and structural features, the latter derived from Jacobi–Legendre polynomials, to improve prediction accuracy.

Study: Tree-Based ML Predicts 2D Material Impurity Energy. Image Credit: Collagery/Shutterstock.com

The authors found that including structural features improved the prediction accuracy, with better results for adsorbates compared to interstitial defects. This approach reduced computational costs while providing valuable insights into impurity properties.

Background

Nano-structured materials, particularly impurity structures in 2D materials, are pivotal in various applications like optical quantum technologies, energy storage, and sensing. Understanding the formation energy of these impurities is crucial, as it indicates the stability and feasibility of the nano-structure. Traditional methods like density functional theory (DFT) for calculating formation energy are time-consuming, highlighting the need for more efficient approaches.
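
For reference, the formation energy of a single neutral impurity X in a host is commonly written as the energy difference below. This is the standard textbook form, given here as an assumption; the symbols are illustrative, and the paper's exact choice of reference energies may differ.

```latex
E_\mathrm{f} = E_\mathrm{tot}(\mathrm{host} + X) - E_\mathrm{tot}(\mathrm{host}) - \mu_X
```

Here $E_\mathrm{tot}(\mathrm{host} + X)$ is the total energy of the host containing the impurity, $E_\mathrm{tot}(\mathrm{host})$ is the total energy of the pristine host, and $\mu_X$ is the chemical potential of the impurity species. A lower formation energy indicates an impurity that is easier to form.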

Recent advances in ML offer a promising alternative. ML algorithms, including decision tree regression and gradient boosting methods, have shown potential in predicting material properties by leveraging physics-inspired descriptors. Despite these advancements, existing methods often lack integration of both chemical and structural features, which limits prediction accuracy and interpretability.

This paper addressed these gaps by employing ML techniques to predict the formation energy of impurities in 2D materials using a novel combination of chemical and structural features derived from Jacobi–Legendre polynomials. This approach enhanced prediction accuracy and reduced computational costs, offering a valuable tool for materials science research.

Methodological Approach and Data Analysis

The methodology for predicting the formation energy of impurities in 2D materials involved several key steps.

Data preprocessing: The study used the impurities in 2D materials (IMP2D) database, which contains 14,662 samples of adsorbate and interstitial impurities in 44 host materials. Each sample's formation energy was computed using DFT with the Perdew–Burke–Ernzerhof (PBE) exchange-correlation functional. PBE was chosen for its computational efficiency and consistency, despite its limitations. Samples with outlier formation energies or convergence issues were filtered out, leaving 5,906 samples for analysis.
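
The filtering step can be pictured with a minimal sketch like the one below. The `Sample` fields, the toy records, and the energy window are all hypothetical; the article does not state the actual cutoffs or the database schema.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    host: str                # host material label, e.g. "MoS2"
    dopant: str              # impurity element
    site: str                # "adsorbate" or "interstitial"
    formation_energy: float  # in eV, from DFT
    converged: bool          # whether the DFT run converged


def filter_samples(samples, e_min=-10.0, e_max=10.0):
    """Keep converged samples whose formation energy lies in [e_min, e_max].

    The window is a placeholder assumption, not the paper's criterion.
    """
    return [
        s for s in samples
        if s.converged and e_min <= s.formation_energy <= e_max
    ]


# Toy records, invented for illustration only.
raw = [
    Sample("MoS2", "Au", "adsorbate", 1.2, True),
    Sample("MoS2", "Li", "interstitial", 250.0, True),  # outlier energy
    Sample("W2Se4", "C", "adsorbate", 0.4, False),      # not converged
    Sample("host_A", "O", "interstitial", -3.1, True),
]
clean = filter_samples(raw)
print(len(raw), "->", len(clean))
```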

Feature creation: Features were divided into chemical and structural types. Chemical features included properties like atomic radius and electronegativity, while structural features were derived from Jacobi–Legendre polynomials, capturing the spatial arrangement of atoms around impurities.
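
To give a feel for how polynomial-based structural features can encode the arrangement of atoms around an impurity, the sketch below sums Legendre polynomials of the bond-angle cosine over all pairs of neighbors. It is a deliberately simplified illustration: it covers only an angular (Legendre) part, omits any radial weighting, and is not the descriptor used in the paper. The function names and the toy geometry are assumptions.

```python
import math


def legendre(n_max, x):
    """Return [P_0(x), ..., P_n_max(x)] using Bonnet's recurrence."""
    values = [1.0, x]
    for n in range(1, n_max):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: n_max + 1]


def angular_features(center, neighbors, n_max=3):
    """Sum P_n(cos theta) over every pair of neighbors around `center`."""
    vectors = [tuple(p - c for p, c in zip(pos, center)) for pos in neighbors]
    features = [0.0] * (n_max + 1)
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            a, b = vectors[i], vectors[j]
            dot = sum(u * v for u, v in zip(a, b))
            cos_theta = dot / (math.hypot(*a) * math.hypot(*b))
            for n, p in enumerate(legendre(n_max, cos_theta)):
                features[n] += p
    return features


# Toy geometry: an impurity at the origin with three neighbors.
impurity = (0.0, 0.0, 0.0)
neighbors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0)]
features = angular_features(impurity, neighbors, n_max=2)
print(features)
```

Because the features depend only on angles between neighbor directions, they do not change when the whole structure is rotated, which is one reason such expansions are attractive as ML inputs.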

ML models: The authors employed tree-based ML algorithms: random forest (RF) and several gradient boosting methods, namely gradient boosting regression, histogram-based gradient boosting regression, and light gradient boosting machine (LightGBM). These models used decision trees to predict formation energy from the prepared features. LightGBM, with techniques such as gradient-based one-side sampling (GOSS), was particularly noted for its efficiency in handling large datasets.
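
The core idea behind the gradient boosting family can be sketched in a few lines: start from the mean, then repeatedly fit a small tree to the current residuals and add a damped version of it to the prediction. The toy version below uses one-split trees (stumps) on a single feature and squared-error loss; it is an illustration under those assumptions, not the authors' setup or any library's implementation, and the data are invented.

```python
def fit_stump(x, residuals):
    """Return (threshold, left_mean, right_mean) minimizing squared error.

    Assumes `x` holds at least two distinct values.
    """
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosting(x, y, n_rounds=50, learning_rate=0.1):
    """Fit stumps one after another, each to the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss, residuals are the negative gradients.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        threshold, left_val, right_val = fit_stump(x, residuals)
        stumps.append((threshold, left_val, right_val))
        preds = [
            p + learning_rate * (left_val if xi <= threshold else right_val)
            for xi, p in zip(x, preds)
        ]
    return base, learning_rate, stumps


def predict(model, xi):
    """Add the damped contribution of every stump to the base value."""
    base, learning_rate, stumps = model
    return base + sum(
        learning_rate * (left_val if xi <= threshold else right_val)
        for threshold, left_val, right_val in stumps
    )


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: one hypothetical feature and a made-up target.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0.5, 0.6, 0.4, 2.1, 2.0, 2.2]

model = fit_boosting(x, y)
baseline_mse = mse(y, [sum(y) / len(y)] * len(y))
boosted_mse = mse(y, [predict(model, xi) for xi in x])
print(f"baseline MSE {baseline_mse:.3f}, boosted MSE {boosted_mse:.3f}")
```

A random forest differs in that its trees are trained independently on resampled data and averaged, rather than fitted in sequence to residuals.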

Computational details: Data was split into training, test, and blind-test sets. The blind-test set included samples from molybdenum disulfide (MoS2) and tungsten diselenide (W2Se4) hosts, not used during model training. Models were optimized using cross-validation, and their performance was assessed with metrics such as coefficient of determination (R²), mean absolute error (MAE), and root mean square error (RMSE).
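
The split and the three metrics can be sketched as follows. The dictionary layout, the 20% test fraction, the placeholder host labels, and the toy numbers are assumptions made for illustration; the article does not give the actual split ratios.

```python
import math
import random


def split_with_blind_hosts(samples, blind_hosts, test_fraction=0.2, seed=0):
    """Return (train, test, blind); blind hosts never reach train or test."""
    blind = [s for s in samples if s["host"] in blind_hosts]
    rest = [s for s in samples if s["host"] not in blind_hosts]
    random.Random(seed).shuffle(rest)
    n_test = int(len(rest) * test_fraction)
    return rest[n_test:], rest[:n_test], blind


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy samples: host labels follow the article; host_A..C are placeholders.
hosts = ["MoS2", "W2Se4", "host_A", "host_B", "host_C"] * 4
samples = [{"id": i, "host": h} for i, h in enumerate(hosts)]
train, test, blind = split_with_blind_hosts(samples, {"MoS2", "W2Se4"})
print(len(train), len(test), len(blind))

# Toy predictions to exercise the metrics.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

Holding out entire hosts, rather than random samples, tests whether a model generalizes to materials it has never seen, which is a stricter check than an ordinary test set.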

Results and Discussion

The features were categorized into three sets: chemical features only; both chemical and structural features; and a subset of chemical features excluding certain parameters. Chemical features, such as chemical potential and electronegativity, were expected to provide valuable insights due to their link with formation energy and structural stability. Structural features were incorporated to enhance model performance, but removing some features (e.g., host energy per atom) did not significantly affect accuracy.

The ML models, including RF, gradient boosting regression, histogram-based gradient boosting regression, and LightGBM, were evaluated for prediction accuracy using RMSE, MAE, and R². Combining chemical and structural features improved the models' performance in predicting the formation energy of both adsorbates and interstitial defects. LightGBM trained faster than the other models while achieving comparable prediction accuracy. Comparisons of RMSE scores and prediction times across the models demonstrated the efficiency and robustness of LightGBM for this task.
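
Part of LightGBM's speed is usually attributed to GOSS, the sampling technique mentioned in the methodology. The idea is to keep every sample with a large gradient, draw only a random subset of the small-gradient samples, and up-weight that subset so the overall gradient statistics stay roughly unbiased. The sketch below illustrates the sampling rule only; it is not LightGBM's code, and the rates and toy gradients are assumptions.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (index, weight) pairs selected by a GOSS-style rule.

    Keeps the top `top_rate` fraction by |gradient| with weight 1, then
    samples `other_rate` of all instances from the remainder and weights
    them by (1 - top_rate) / other_rate.
    """
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = round(top_rate * n)
    n_other = round(other_rate * n)
    top = order[:n_top]
    sampled = random.Random(seed).sample(order[n_top:], n_other)
    weight = (1.0 - top_rate) / other_rate
    return [(i, 1.0) for i in top] + [(i, weight) for i in sampled]


# Toy gradients with distinct magnitudes 0.00, 0.05, ..., 0.95.
gradients = [((-1) ** k) * 0.05 * k for k in range(20)]
selected = goss_sample(gradients)
print(selected)
```

With these rates, each boosting round would look at 6 of the 20 samples instead of all of them, which is where the saving in training time comes from.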

Conclusion

In conclusion, the researchers utilized tree-based ML algorithms to predict the formation energy of impurities in 2D materials, integrating chemical and structural features, including Jacobi–Legendre polynomials. They found that incorporating structural features improved accuracy, especially for adsorbates compared to interstitial defects.

The authors highlighted that while LightGBM provided the fastest training times with competitive prediction accuracy, overall predictions were effective without needing host-specific features. This approach reduced computational costs and offered valuable insights into impurity properties, showcasing the potential of combining physically meaningful features with ML for accurate predictions in materials science. Future work could explore additional properties and advanced feature integrations.

Journal reference:
  • Aniwat Kesorn, et al. (2024). Formation energy prediction of neutral single-atom impurities in 2D materials using tree-based machine learning. Machine Learning: Science and Technology, 5(3), 035039. DOI: 10.1088/2632-2153/ad66ae, https://iopscience.iop.org/article/10.1088/2632-2153/ad66ae

Written by

Soham Nandi

Soham Nandi is a technical writer based in Memari, India. His academic background is in Computer Science Engineering, specializing in Artificial Intelligence and Machine learning. He has extensive experience in Data Analytics, Machine Learning, and Python. He has worked on group projects that required the implementation of Computer Vision, Image Classification, and App Development.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Nandi, Soham. (2024, August 13). Tree-Based ML Predicts 2D Material Impurity Energy. AZoAi. Retrieved on October 22, 2024 from https://www.azoai.com/news/20240813/Tree-Based-ML-Predicts-2D-Material-Impurity-Energy.aspx.

  • MLA

    Nandi, Soham. "Tree-Based ML Predicts 2D Material Impurity Energy". AZoAi. 22 October 2024. <https://www.azoai.com/news/20240813/Tree-Based-ML-Predicts-2D-Material-Impurity-Energy.aspx>.

  • Chicago

    Nandi, Soham. "Tree-Based ML Predicts 2D Material Impurity Energy". AZoAi. https://www.azoai.com/news/20240813/Tree-Based-ML-Predicts-2D-Material-Impurity-Energy.aspx. (accessed October 22, 2024).

  • Harvard

    Nandi, Soham. 2024. Tree-Based ML Predicts 2D Material Impurity Energy. AZoAi, viewed 22 October 2024, https://www.azoai.com/news/20240813/Tree-Based-ML-Predicts-2D-Material-Impurity-Energy.aspx.
