In a paper published in the journal Nature Chemical Engineering, researchers highlighted the broad applications of protein engineering but noted that the process remains slow and labor-intensive. They introduced the self-driving autonomous machines for protein landscape exploration (SAMPLE) platform, which operated autonomously under the guidance of an intelligent agent. SAMPLE designed proteins and tested them with a fully automated robotic system.
In one application, four SAMPLE agents were tasked with improving the thermal tolerance of glycoside hydrolase (GH) enzymes. Despite differences in their individual search behavior, all agents quickly converged on thermostable enzymes. Self-driving laboratories such as SAMPLE represent a significant advance in automating scientific discovery in protein engineering and synthetic biology.
Related Work
Past approaches in biological engineering relied on a discovery-driven process led by human researchers and characterized by hypothesis generation, wet laboratory experiments, and iterative cycles. Despite notable achievements, this approach remains slow and inefficient.
Recent advancements have introduced robot scientists and self-driving laboratories, which use automated learning and reasoning to accelerate scientific discovery. These intelligent systems can outperform humans at tasks such as learning from diverse data sources and operating continuously. Although they are promising for protein engineering and synthetic biology, challenges persist because of the complex nature of biological phenotypes, high-dimensional genomic search spaces, and the intricate, error-prone steps involved in biological experiments. Some automated workflows exist in synthetic biology, but achieving full autonomy remains a hurdle.
Research Methodologies Overview
Researchers compiled a cytochrome P450 dataset to benchmark modeling and Bayesian optimization (BO) methods. It comprised 518 data points with binary active/inactive labels and thermostability measurements. The multi-output Gaussian process (GP) model, which used a linear Hamming kernel with an additive noise term, was evaluated through 10-fold cross-validation. Four sequence selection methods were compared across 10,000 simulated protein engineering trials: random selection, upper confidence bound (UCB), UCB positive, and expected UCB. The researchers also developed batch methods that select multiple sequences per round and tested how performance changed with batch size.
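The GP-plus-UCB selection loop described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes aligned, equal-length sequences, a zero-mean GP fitted to centered thermostability values, and hypothetical defaults for the noise term and the exploration weight `beta`. The linear Hamming kernel is taken here as the fraction of positions with identical residues. The batch step uses a "kriging believer" heuristic (each picked sequence is added back as a pseudo-observation at its predicted mean), which is one common way to build batches and may differ from the batch methods in the paper.

```python
import math
from typing import List, Sequence, Tuple

# Illustrative sketch only; noise and beta defaults are hypothetical, not from the paper.


def hamming_kernel(a: str, b: str) -> float:
    """Linear Hamming kernel: fraction of aligned positions with the same residue.

    Equals the dot product of one-hot encodings divided by sequence length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cholesky(m: List[List[float]]) -> List[List[float]]:
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def solve_lower(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Forward substitution: solve L x = b."""
    x: List[float] = []
    for i, row in enumerate(low):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def solve_upper_t(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Back substitution: solve L^T x = b."""
    n = len(low)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_posterior(train_x: Sequence[str], train_y: Sequence[float],
                 test_x: Sequence[str], noise: float = 0.1) -> Tuple[List[float], List[float]]:
    """Posterior mean and std of a zero-mean GP (Hamming kernel + noise term).

    train_y is assumed to be centered (e.g. thermostability minus its mean).
    """
    n = len(train_x)
    k = [[hamming_kernel(train_x[i], train_x[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    low = cholesky(k)
    alpha = solve_upper_t(low, solve_lower(low, train_y))  # alpha = K^-1 y
    means, stds = [], []
    for x in test_x:
        k_star = [hamming_kernel(x, t) for t in train_x]
        means.append(sum(a * b for a, b in zip(k_star, alpha)))
        v = solve_lower(low, k_star)
        var = hamming_kernel(x, x) - sum(vi * vi for vi in v)  # latent variance
        stds.append(math.sqrt(max(var, 0.0)))
    return means, stds


def select_batch_ucb(train_x: Sequence[str], train_y: Sequence[float],
                     candidates: Sequence[str], batch_size: int = 3,
                     beta: float = 2.0, noise: float = 0.1) -> List[str]:
    """Greedy batch UCB: take argmax(mean + beta * std), add it as a pseudo-observation
    at its predicted mean (kriging believer), and repeat until the batch is full."""
    xs, ys = list(train_x), list(train_y)
    pool = [c for c in candidates if c not in xs]
    batch: List[str] = []
    for _ in range(min(batch_size, len(pool))):
        mu, sd = gp_posterior(xs, ys, pool, noise)
        scores = [m + beta * s for m, s in zip(mu, sd)]
        best = max(range(len(pool)), key=scores.__getitem__)
        batch.append(pool[best])
        xs.append(pool[best])
        ys.append(mu[best])
        pool.pop(best)
    return batch
```

Treating the picked sequence as observed shrinks the predicted uncertainty of its close neighbors, so later picks in the same batch tend to move to other regions of sequence space instead of clustering around one optimum.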
Researchers designed a combinatorial sequence space for GH family 1 (GH1) by combining natural, Rosetta-designed, and evolution-designed sequence elements, which could be assembled into new sequences by Golden Gate cloning. They reverse-translated the protein sequence elements into deoxyribonucleic acid (DNA) and cloned the gene fragments into the pTwist Amp High Copy vector. They implemented an automated protein testing pipeline for gene assembly, expression, and characterization on a Tecan liquid-handling system and the Strateos cloud lab. Enzyme stability was assessed with T50 thermostability assays, where T50 is the temperature at which an enzyme retains half of its activity after heat treatment.
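To make the T50 readout concrete, the sketch below estimates T50 from residual activity measured after heating at several temperatures. It is a simplification under stated assumptions: it normalizes to the lowest-temperature measurement and linearly interpolates the 50% crossing, whereas a real analysis would typically fit a sigmoid to the inactivation curve. The function name and example data are hypothetical.

```python
from typing import Optional, Sequence


def estimate_t50(temps: Sequence[float], activity: Sequence[float]) -> Optional[float]:
    """Estimate T50 by linear interpolation of normalized residual activity.

    temps: heat-treatment temperatures in ascending order (deg C).
    activity: activity remaining after each treatment (any consistent units).
    Returns None if the normalized curve never crosses 0.5.
    (Hypothetical helper; a sigmoid fit is the more usual analysis.)
    """
    if len(temps) != len(activity) or len(temps) < 2:
        raise ValueError("need matching temperature and activity lists")
    top = activity[0]  # assume the lowest temperature retains full activity
    frac = [a / top for a in activity]
    for i in range(len(frac) - 1):
        f0, f1 = frac[i], frac[i + 1]
        if f0 >= 0.5 >= f1 and f0 != f1:
            t0, t1 = temps[i], temps[i + 1]
            return t0 + (f0 - 0.5) * (t1 - t0) / (f0 - f1)
    return None


# Made-up curve: activity halves between 50 and 60 deg C.
print(estimate_t50([30, 40, 50, 60, 70], [100, 95, 70, 30, 5]))
```

For the made-up curve, the normalized activity falls from 0.7 at 50 °C to 0.3 at 60 °C, so the interpolated T50 is close to 55 °C.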
The researchers expressed the top machine-designed enzymes in bacteria and purified them for characterization with conventional, human-run protocols. They used Golden Gate cloning for gene assembly and purified the enzymes by nickel nitrilotriacetic acid (Ni-NTA) agarose column chromatography. Thermostability assays and Michaelis–Menten kinetic assays were used to assess enzyme performance. The purified enzymes were assayed across an eight-point dilution series of the substrate 4-methylumbelliferyl-β-D-glucopyranoside, and the Michaelis–Menten equation was used to determine each enzyme's turnover number (kcat) and Michaelis constant (KM).
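The kinetic analysis can be illustrated with a short sketch. It swaps in the Hanes–Woolf linearization, which plots [S]/v against [S] so that the slope is 1/Vmax and the intercept is KM/Vmax, as a simple stand-in for the nonlinear Michaelis–Menten fit a study would normally use. The function name and units are hypothetical.

```python
from typing import Sequence, Tuple


def fit_michaelis_menten(substrate: Sequence[float], rates: Sequence[float],
                         enzyme_conc: float) -> Tuple[float, float]:
    """Return (kcat, KM) from initial rates via a Hanes-Woolf linear fit.

    Hanes-Woolf form of v = Vmax*S/(KM + S):  S/v = S/Vmax + KM/Vmax.
    Rates are product concentration per unit time; enzyme_conc is the total
    enzyme concentration in the same concentration units. (Hypothetical helper.)
    """
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, rates)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx             # = 1 / Vmax
    intercept = my - slope * mx   # = KM / Vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax / enzyme_conc, km


# Hypothetical eight-point, two-fold dilution series (uM) with noise-free rates.
s = [1000 / 2 ** i for i in range(8)]
v = [5.0 * x / (100.0 + x) for x in s]   # Vmax = 5 uM/s, KM = 100 uM
print(fit_michaelis_menten(s, v, enzyme_conc=0.01))
```

Because noise-free data lie exactly on the Hanes–Woolf line, this example recovers kcat ≈ 500 s⁻¹ and KM ≈ 100 µM. With real, noisy measurements, linearizations distort the error structure, which is why direct nonlinear fitting is generally preferred.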
Introduction to the SAMPLE System
The study aimed to develop a fully autonomous system for protein engineering inspired by the iterative learning and decision-making processes of human researchers in a laboratory setting. The system, named SAMPLE, utilizes an intelligent agent that autonomously learns, makes decisions, and takes actions in a laboratory environment, actively exploring protein sequence-function relationships and engineering proteins.
The SAMPLE system uses BO to navigate the protein fitness landscape efficiently. A multi-output GP model infers the fitness landscape from a limited number of experimental observations. In cross-validation, the model classified sequences as active or inactive with 83% accuracy and predicted thermostability with a correlation coefficient (R) of 0.84. Researchers adapted the UCB algorithm for protein engineering, and two heuristic BO methods, "UCB positive" and "expected UCB", showed improved efficiency in discovering thermostable proteins in the simulated trials.
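The two heuristic acquisition rules can be sketched as follows, under our reading of their names rather than the paper's exact definitions: "UCB positive" is taken to rank only candidates the classifier predicts are active, and "expected UCB" to weight each candidate's UCB score by its predicted probability of being active. The inputs (predicted mean, standard deviation, and probability of activity, for example from a multi-output GP) and the `beta`, `threshold`, and `inactive_value` parameters are hypothetical.

```python
from typing import Optional, Sequence

# Assumed formulations of the heuristics; the published definitions may differ.


def ucb(mean: float, std: float, beta: float = 2.0) -> float:
    """Standard upper confidence bound on predicted thermostability."""
    return mean + beta * std


def pick_ucb_positive(means: Sequence[float], stds: Sequence[float],
                      p_active: Sequence[float], beta: float = 2.0,
                      threshold: float = 0.5) -> Optional[int]:
    """Index of the highest-UCB candidate among those predicted to be active."""
    eligible = [i for i, p in enumerate(p_active) if p > threshold]
    if not eligible:
        return None
    return max(eligible, key=lambda i: ucb(means[i], stds[i], beta))


def pick_expected_ucb(means: Sequence[float], stds: Sequence[float],
                      p_active: Sequence[float], beta: float = 2.0,
                      inactive_value: float = 0.0) -> int:
    """Index maximizing p * UCB + (1 - p) * inactive_value, i.e. the expected
    UCB score if an inactive sequence is worth inactive_value."""
    scores = [p * ucb(m, s, beta) + (1 - p) * inactive_value
              for m, s, p in zip(means, stds, p_active)]
    return max(range(len(scores)), key=scores.__getitem__)
```

With made-up predictions means = [60, 70, 65], stds = [2, 5, 1], p_active = [0.9, 0.3, 0.8], and beta = 2, plain UCB favors candidate 1 (score 80) even though it is probably inactive, UCB positive picks candidate 2 (score 67), and expected UCB picks candidate 0 (0.9 × 64 = 57.6, versus 53.6 and 24). Both heuristics steer the search away from sequences that are likely to be nonfunctional.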
Researchers implemented a streamlined and robust experimental pipeline for automated gene assembly, cell-free protein expression, and biochemical characterization. The pipeline was applied to GH enzymes, and the automated procedure measured enzyme thermostability reliably, with minimal experimental error.
The study uses combinatorial sequence spaces, in which the number of possible sequences grows exponentially with the number of recombined elements, to sample protein fitness landscapes broadly. The design of the GH1 combinatorial sequence space incorporates natural, Rosetta-designed, and evolution-designed sequence elements. The space comprises 1,352 unique GH1 sequences, with diversity distributed across the protein structure.
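The exponential scaling of a combinatorial space can be shown with a small sketch: if a protein is split into segments and each segment can come from one of several interchangeable sequence elements, the number of assemblable sequences is the product of the per-segment choices. The segment layout and toy sequences below are hypothetical and do not reproduce the actual GH1 design.

```python
from itertools import product
from math import prod
from typing import Iterator, List, Sequence, Tuple

# Hypothetical segment layout; not the published GH1 design.


def count_designs(blocks: Sequence[Sequence[str]]) -> int:
    """Number of full-length sequences = product of choices at each segment."""
    return prod(len(options) for options in blocks)


def enumerate_designs(blocks: Sequence[Sequence[str]]) -> Iterator[Tuple[Tuple[int, ...], str]]:
    """Yield (choice indices, assembled sequence) for every combination."""
    for choice in product(*(range(len(opts)) for opts in blocks)):
        yield choice, "".join(blocks[seg][i] for seg, i in enumerate(choice))


toy_blocks: List[List[str]] = [["MK", "MR"], ["LV", "IV", "LA"], ["E", "D"]]
print(count_designs(toy_blocks))
for indices, seq in enumerate_designs(toy_blocks):
    print(indices, seq)
```

The toy layout with 2, 3, and 2 options per segment yields 12 sequences; ten segments with four options each would already yield 4^10 = 1,048,576. This multiplicative growth is why a modest set of building blocks can cover a broad region of sequence space.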
Researchers deployed SAMPLE on the Strateos cloud lab, where four independent agents explored the GH1 landscape autonomously. Each agent discovered thermostable sequences while searching only a small fraction of the combinatorial landscape. The agents' search trajectories, and how they ascended the landscape, varied, demonstrating the system's efficiency and adaptability in navigating diverse sequence spaces.
Researchers characterized the machine-designed proteins using standard, human-run protocols. The top sequences discovered by each agent were expressed in Escherichia coli, and lysate-based thermostability assays revealed substantial improvements in thermostability compared with the top natural sequence. The study highlights the successful integration of autonomous, machine-driven protein engineering with subsequent human validation.
Conclusion
In conclusion, the SAMPLE system represents a major step in autonomous protein engineering. By combining intelligent agents with BO, the system autonomously explores protein sequence-function landscapes and discovers thermostable enzymes. Experimental validation showed that the machine-designed proteins were more thermostable than the best natural sequence. This approach not only streamlines protein engineering but also holds considerable promise for accelerating the discovery of proteins with desired properties.
Article Revisions
- Jul 16 2024 - Minor edits to punctuation and fixed broken journal link.