In an article published in the journal NPJ Computational Materials, researchers introduced a deep learning approach for identifying molecules from images acquired by frequency modulation atomic force microscopy (FM-AFM) with carbon monoxide (CO)-functionalized metal tips. The proposed model, a conditional generative adversarial network (CGAN), translated constant-height AFM images into detailed ball-and-stick depictions of the imaged molecules.
Background
AFM has emerged as a pivotal tool for nanoscale imaging and manipulation of materials and biological systems, especially in dynamic modes such as FM-AFM. The use of metal tips functionalized with a CO molecule at the apex enabled unprecedented resolution, revealing internal molecular structures. Previous attempts at molecular identification faced challenges in chemical discrimination and handling non-planar three-dimensional (3D) structures.
Earlier studies employed deep learning (DL), including convolutional neural networks (CNNs), to interpret AFM images. However, they struggled with chemical identification and failed to generalize to diverse organic molecules. This paper introduced a different approach that used a CGAN for molecular identification. By transforming stacks of constant-height, high-resolution (HR)-AFM images into ball-and-stick depictions, the CGAN provided comprehensive structural and compositional information.
Trained and tested on a large dataset that included simulations for 686,000 molecules, the CGAN generalized well and addressed the limitations of previous methods. The researchers demonstrated the potential of this technique for molecular identification, overcoming the challenges posed by the diverse chemical compositions and complex 3D structures encountered in organic chemistry.
Methods
The authors used the Quasar Science Resources-Autonomous University of Madrid atomic force microscopy image dataset (QUAM-AFM), which comprises simulated AFM images for 686,000 molecules containing various atomic species. The dataset focuses on quasi-planar molecules, with height variations of up to 183 picometers along the z-axis. It covers a range of AFM operating parameters, including oscillation amplitudes, tip-sample distances, and CO-metal bond stiffness values, for a total of 165 million grayscale images. AFM contrast depends on both the experimental parameters and variations in CO attachment, which is why the dataset spans so many parameter combinations.
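As a back-of-the-envelope check on the dataset figures, the short sketch below divides the quoted image count by the quoted molecule count. Both numbers are rounded in the text, so the result is approximate. Grouping the images into stacks of 10 (the input size the model uses) and reading each stack as one parameter combination is an assumption made here for illustration.

```python
# Figures quoted in the text (both rounded).
n_molecules = 686_000
n_images = 165_000_000

# Assumption for illustration: images are grouped into stacks of 10,
# matching the input size of the model described in this article.
stack_size = 10

images_per_molecule = n_images / n_molecules
stacks_per_molecule = images_per_molecule / stack_size

print(f"images per molecule: about {images_per_molecule:.0f}")
print(f"10-image stacks per molecule: about {stacks_per_molecule:.0f}")
```

Under these assumptions, each molecule contributes a few hundred simulated images, or a couple of dozen 10-image stacks.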
The proposed molecular identification model was a CGAN. The generator took a stack of 10 AFM images as input and passed it through several 3D convolutions, followed by encoder and decoder blocks. The discriminator, which received concatenated input images, consisted of convolutional layers. The dataset was divided into training, validation, and a substantial test set. During training, a random combination of AFM simulation parameters was chosen for each input stack to improve generalization. Image data generator (IDG) augmentations, including rotations and shifts, were applied to both the AFM images and the corresponding ball-and-stick depictions.
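The two training-time ideas described here, picking a random simulation-parameter combination per stack and applying the same geometric augmentation to input and target, can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the parameter labels, the function names, and the shift range are hypothetical, and only shifts (not rotations) are shown.

```python
import random

# Hypothetical labels standing in for the simulation parameter grids;
# the real values are not given in this article.
AMPLITUDES = ["amp_0", "amp_1", "amp_2"]
STIFFNESSES = ["k_0", "k_1"]


def sample_stack_params(rng):
    """Pick one random (amplitude, stiffness) combination for a whole stack."""
    return rng.choice(AMPLITUDES), rng.choice(STIFFNESSES)


def shift_image(img, dx, dy, fill=0):
    """Shift a 2D list-of-lists image by (dx, dy) pixels, padding with `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


def augment_pair(stack, target, rng, max_shift=2):
    """Apply one random shift to every AFM image and to the target depiction.

    Using the same (dx, dy) for input and target keeps them aligned,
    which an image-to-image model needs.
    """
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    new_stack = [shift_image(img, dx, dy) for img in stack]
    new_target = shift_image(target, dx, dy)
    return new_stack, new_target, (dx, dy)


if __name__ == "__main__":
    rng = random.Random(0)
    blank = [[0] * 5 for _ in range(5)]
    blank[2][2] = 1  # a single bright pixel in the centre
    stack = [[row[:] for row in blank] for _ in range(10)]
    new_stack, new_target, shift = augment_pair(stack, blank, rng)
    print("parameters:", sample_stack_params(rng), "shift:", shift)
```

The key design point is that the random transform is drawn once per training example and reused for all 10 input images and the ball-and-stick target, so the pixel-wise correspondence between them is preserved.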
The loss functions involved mean absolute error (MAE) for the generator and binary cross-entropy for the discriminator. Training used batches of 32 inputs and the adaptive moment estimation (Adam) optimizer. The model was trained for six epochs, with 300 validation-set predictions displayed every 10,000 iterations. The large test set enabled a comprehensive quantitative analysis, which showed that the model generalized well and identified diverse organic molecules accurately. Overall, the study addressed the challenge of molecular identification from AFM images and showcased the potential of the proposed CGAN-based approach.
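The two loss terms named above have standard definitions, sketched below in plain Python over flat lists of values. This is an illustrative sketch only: the function names are hypothetical, and how the paper weights or combines these terms during adversarial training is not stated in this article.

```python
import math


def mean_absolute_error(pred, target):
    """MAE between two equally sized flat sequences of pixel values."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy.

    `probs` are discriminator outputs in (0, 1); `labels` are 0 or 1.
    Probabilities are clipped to avoid log(0).
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def discriminator_loss(real_probs, fake_probs):
    """BCE with label 1 for real depictions and label 0 for generated ones."""
    real_term = binary_cross_entropy(real_probs, [1] * len(real_probs))
    fake_term = binary_cross_entropy(fake_probs, [0] * len(fake_probs))
    return real_term + fake_term


if __name__ == "__main__":
    print(mean_absolute_error([0.0, 1.0], [1.0, 1.0]))
    print(discriminator_loss([0.9, 0.8], [0.1, 0.2]))
```

A discriminator that separates real from generated depictions well gets a low loss, while the MAE term penalizes the generator for pixel-wise deviations from the reference depiction.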
Results
The researchers described the development and application of a CGAN for identifying molecules based on their AFM images. The CGAN was trained to generate ball-and-stick depictions of molecules from stacks of simulated AFM images.
The model was tested using theoretical AFM images from the QUAM-AFM dataset and achieved outstanding results in identifying molecular structures. The accuracy was quantified, and the model demonstrated its ability to recognize diverse chemical species and complex molecular configurations.
The CGAN was then applied to experimental AFM images obtained from various studies. Despite challenges such as limited input data and variations in AFM operation modes, the model successfully identified molecular structures in several cases, showcasing its robustness and potential for real-world applications. The study highlighted the model's capability to generalize from theoretical to experimental images, even when facing irregularities such as tip asymmetries.
While some challenges and limitations existed, the CGAN's performance with experimental data suggested its effectiveness in molecular identification through AFM images, offering a promising avenue for advancing the analysis of molecular structures in practical applications.
Discussion
The authors demonstrated the potential of using a CGAN for accurate chemical and structural identification of organic molecules based on HR-AFM images. The model translated constant-height HR-AFM image stacks into ball-and-stick molecular depictions, achieving generalization across diverse molecules. With training on theoretical images, it accurately identified molecular structures in both theoretical and experimental HR-AFM images, even when faced with incomplete information.
The model's ability to discern molecular configurations surpassed that of human experts. The authors attributed its performance to the consistency of CNNs in image analysis, the patch-based analysis performed by the discriminator, and an effective loss function. Challenges remained with highly corrugated structures, but these limitations were attributed to current AFM setups, suggesting that alternative operational modes could bring improvements. Future research could explore combining this approach with Bayesian inference and density functional theory calculations to extend molecular identification to highly corrugated structures.
Conclusion
In conclusion, the researchers presented a groundbreaking approach employing a CGAN for accurate molecular identification through HR-AFM images. Trained on a vast dataset, the CGAN showcased remarkable generalization, accurately identifying diverse organic molecules in both theoretical and experimental settings. The model's proficiency in handling complex 3D structures and diverse chemical compositions surpassed previous methods. Despite challenges, the CGAN demonstrated unprecedented capabilities, hinting at its potential for practical applications in nanoscale molecular analysis and suggesting avenues for future research in improving AFM methodologies.
Article Revisions
- Feb 2 2024 - Correction to journal name from Nature to NPJ Computational Materials.