In a recent publication in the journal npj Computational Materials, researchers introduced a framework that combines a deep learning model with an ensemble learning model, the Light Gradient Boosting Machine. This innovative approach facilitates swift and precise screening of organic photovoltaic molecules, effectively establishing the intricate relationship between molecular structure, properties, and device efficiency.
Background
Organic semiconducting materials are prized for their synthetic versatility, which allows precise control over characteristics such as energy levels, bandgap, and carrier mobility. This adaptability makes them appealing for applications such as organic solar cells (OSCs), which offer benefits over their inorganic counterparts, including light weight, flexibility, and semi-transparency.
Yet finding the right organic molecules within the vast space of candidate compounds remains challenging and time-consuming. Density functional theory (DFT) calculations provide insight into electronic properties without the need for synthesis, but no efficient model exists to compute power conversion efficiencies (PCEs) directly from molecular properties. Moreover, although DFT is cheaper than synthesis, its computational time hampers its use in high-throughput screening.
The need for a quantitative structure-property relationship (QSPR) model for rapid organic compound screening is evident. Machine learning, with its capacity to decipher complex relationships in extensive datasets, has made significant contributions to materials research. Recent applications in the OSC field have shown promise for high-throughput screening. While earlier models had limitations, the integration of molecular and microscopic properties has improved accuracy and met screening demands.
Nonetheless, the need to include costly calculations among the model inputs, particularly for excited states, remains a bottleneck. Therefore, developing a precise, easily accessible machine learning model is essential.
Model architecture and analysis
Researchers have introduced an automated framework for rapid PCE prediction in OSCs. First, an ensemble learning model is trained on a small dataset of high-quality experimental data to predict PCEs from molecular properties. Second, a deep learning model based on a graph neural network (GNN) is trained on a large dataset of molecular structures and properties to predict those properties from structure. Together, the two models form a framework that predicts PCEs directly from molecular structure.
Quantum Chemical Calculations: Ground-state structures of all molecules were optimized using DFT with the BP86 functional and the def2-SVP basis set. The energy levels of the molecules were then calculated from the optimized geometries. The energy of the electronic transition to the lowest-lying triplet state was computed using the 6-311G(d) basis set.
The proposed framework combines two models: the Property Model and the Efficiency Model.
Property Model: A graph neural network (GNN) was used to build the Property Model, which predicts molecular properties from molecular structures. Each molecular structure is converted into a graph, with nodes representing atoms and edges representing chemical bonds. Three matrices describe the node properties, edge properties, and adjacency relationships. The graph convolution layer captures structural influences, and the fully connected (FC) layers produce the final property predictions.
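As a rough illustration of the graph representation described above, the following minimal sketch builds a toy molecular graph and applies one graph convolution followed by a fully connected readout. It is not the authors' architecture: the molecule, the one-hot node features, the fixed weights, and the helper names (`build_adjacency`, `graph_conv`, `readout`) are all assumptions made for illustration, the weights are untrained, and the edge properties are folded into the adjacency matrix as bond orders rather than kept as a separate matrix.

```python
# Hypothetical toy molecule: formaldehyde (H2C=O), atom order C, O, H, H.
# Node features are a one-hot encoding over the element types (C, O, H).
node_features = [
    [1.0, 0.0, 0.0],  # C
    [0.0, 1.0, 0.0],  # O
    [0.0, 0.0, 1.0],  # H
    [0.0, 0.0, 1.0],  # H
]
# Bonds as (atom_i, atom_j, bond_order); the bond order is the edge property.
bonds = [(0, 1, 2.0), (0, 2, 1.0), (0, 3, 1.0)]


def build_adjacency(n_atoms, bonds):
    """Symmetric adjacency matrix weighted by bond order, with self-loops."""
    adj = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i in range(n_atoms):
        adj[i][i] = 1.0
    for i, j, order in bonds:
        adj[i][j] = order
        adj[j][i] = order
    return adj


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def graph_conv(adj, feats, weight):
    """Row-normalise the adjacency, aggregate neighbours, project, apply ReLU."""
    norm = [[v / sum(row) for v in row] for row in adj]
    hidden = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in hidden]


def readout(hidden, fc_weight, fc_bias):
    """Mean-pool the atoms into one vector, then apply a fully connected layer."""
    n = len(hidden)
    pooled = [sum(col) / n for col in zip(*hidden)]
    return sum(p * w for p, w in zip(pooled, fc_weight)) + fc_bias


# Arbitrary, untrained weights chosen only so the sketch runs.
conv_weight = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.6]]
adj = build_adjacency(len(node_features), bonds)
hidden = graph_conv(adj, node_features, conv_weight)
prediction = readout(hidden, fc_weight=[1.0, -1.0], fc_bias=0.1)
```

In this sketch the two hydrogen atoms end up with identical hidden vectors because they sit in equivalent positions in the graph, which is the sense in which a graph convolution encodes structural environment.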
Efficiency Model: The Efficiency Model is an ensemble learning model that predicts power conversion efficiencies (PCEs) from the physicochemical properties of molecules. The researchers compared common machine learning models: support vector machines (SVM), random forest (RF), gradient-boosting decision trees (GBDT), and the Light Gradient Boosting Machine (LightGBM). Each model was trained on the dataset to relate molecular properties to device PCEs. Leave-one-out cross-validation was used to assess generalization and guard against overfitting, and grid searches were used to optimize hyperparameters. GBDT and LightGBM performed best, and LightGBM was chosen for the Efficiency Model because of its strong generalization.
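The combination of leave-one-out validation and grid search can be sketched as follows. To keep the example runnable with the Python standard library alone, LightGBM is swapped for a tiny k-nearest-neighbour regressor, and the data are synthetic; the function names, the hyperparameter grid, and the error metric (mean absolute error) are assumptions for illustration, not details taken from the paper.

```python
import random


def knn_predict(train_x, train_y, query, k):
    """Stand-in regressor: average the targets of the k nearest training rows."""
    ranked = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    nearest = ranked[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def leave_one_out_mae(xs, ys, k):
    """Hold out each sample once, predict it from the rest, average the error."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        errors.append(abs(knn_predict(train_x, train_y, xs[i], k) - ys[i]))
    return sum(errors) / len(errors)


def grid_search(xs, ys, k_grid):
    """Score every candidate hyperparameter value and return (best_k, scores)."""
    scores = {k: leave_one_out_mae(xs, ys, k) for k in k_grid}
    return min(scores, key=scores.get), scores


# Synthetic, made-up data: two descriptors per molecule and a noisy target.
rng = random.Random(0)
xs = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(30)]
ys = [3.0 * a + 2.0 * b + rng.gauss(0, 0.1) for a, b in xs]

best_k, scores = grid_search(xs, ys, k_grid=[1, 3, 5, 7])
```

Leave-one-out validation suits small datasets because every sample serves as a test point exactly once, at the cost of fitting the model as many times as there are samples.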
Dataset: The dataset comprises 440 pairs of small molecules and fullerene molecules, along with their respective PCEs, sourced from published literature. To bolster the Property Model via transfer learning, 200,000 data points from the Clean Energy Project Database (CEPDB) were incorporated for pre-training. For a rigorous evaluation, the dataset was divided into training and test sets, ensuring compatibility between the distributions of PCEs in both sets. Nine properties were thoughtfully selected as learning features for the Efficiency Model. These selections factored in availability, relevance to the target property (PCE), and existing knowledge from the literature.
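One simple way to keep the PCE distributions of the training and test sets compatible is to sort the samples by PCE and pick test samples at a regular interval. The sketch below shows that idea on made-up values; the article does not say how the authors performed their split, so the function `stratified_split`, the interval of 11, and the resulting 400/40 sizes are assumptions chosen only to echo the numbers in the text.

```python
import random


def stratified_split(pces, test_every):
    """Sort sample indices by PCE and send one sample per block to the test set."""
    order = sorted(range(len(pces)), key=lambda i: pces[i])
    # Take the middle element of each consecutive block of `test_every` samples.
    test = [idx for rank, idx in enumerate(order) if rank % test_every == test_every // 2]
    test_set = set(test)
    train = [idx for idx in order if idx not in test_set]
    return train, test


def mean(values):
    return sum(values) / len(values)


# Made-up PCE values (in percent) standing in for the 440 literature entries.
rng = random.Random(42)
pces = [rng.uniform(0.0, 10.0) for _ in range(440)]

train_idx, test_idx = stratified_split(pces, test_every=11)
train_mean = mean([pces[i] for i in train_idx])
test_mean = mean([pces[i] for i in test_idx])
```

Because each test sample is drawn from a different slice of the sorted PCE range, the test set covers low, medium, and high efficiencies in roughly the same proportions as the training set.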
The Property Model and Efficiency Model, combining GNNs and LightGBM, were validated using 40 unseen data points. First, molecular structures were used as inputs for the Property Model, which predicted physical and chemical properties. These properties were subsequently fed into the Efficiency Model to forecast PCEs. Results demonstrated accurate predictions, highlighting the efficiency and generalization capabilities of the framework.
To assess the framework's screening capabilities, 375 molecules of various configurations were designed, with PCEs predicted in minutes. Four novel molecules were synthesized and tested experimentally. The predicted PCEs closely matched the experimental values, demonstrating the model's accuracy and rapid screening potential.
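The two-stage prediction and the screening step can be sketched as a simple composition of functions. Both models below are placeholders with no physical meaning: `toy_property_model` stands in for the trained GNN and `toy_efficiency_model` for the trained LightGBM model, and the descriptors, coefficients, and candidate SMILES strings are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple


def toy_property_model(smiles: str) -> Dict[str, float]:
    """Placeholder for the trained GNN: returns made-up descriptors."""
    return {
        "n_aromatic_c": float(smiles.count("c")),
        "n_aromatic_s": float(smiles.count("s")),
    }


def toy_efficiency_model(props: Dict[str, float]) -> float:
    """Placeholder for the trained ensemble model: an arbitrary linear rule."""
    return 0.3 * props["n_aromatic_c"] + 0.5 * props["n_aromatic_s"]


def predict_pce(
    smiles: str,
    property_model: Callable[[str], Dict[str, float]],
    efficiency_model: Callable[[Dict[str, float]], float],
) -> float:
    """Stage 1 maps structure to properties; stage 2 maps properties to PCE."""
    return efficiency_model(property_model(smiles))


def screen(
    candidates: List[str],
    property_model: Callable[[str], Dict[str, float]],
    efficiency_model: Callable[[Dict[str, float]], float],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Score every candidate and return the top_n by predicted PCE."""
    scored = [(predict_pce(s, property_model, efficiency_model), s) for s in candidates]
    scored.sort(reverse=True)
    return [(s, score) for score, s in scored[:top_n]]


# Hypothetical candidate structures written as SMILES strings.
candidates = ["c1ccccc1", "c1ccsc1", "CCO", "c1ccc2sccc2c1"]
top = screen(candidates, toy_property_model, toy_efficiency_model, top_n=2)
```

Because neither stage calls a quantum chemistry code at prediction time, scoring a few hundred candidates this way reduces to cheap model evaluations, which is consistent with the minutes-scale screening reported above.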
Conclusion
In summary, the researchers developed a machine learning approach that predicts the PCEs of OSCs without relying on DFT calculations. The proposed framework consists of two components, the Property Model and the Efficiency Model, and evaluates organic photovoltaic molecules directly from their chemical structures, which speeds up prediction. The work offers an effective route to developing new organic optoelectronic materials.