AI Foundation Models Reshape Bioinformatics, Unlocking Genomic and Drug Discovery Advances

Scientists unveil how cutting-edge AI foundation models are revolutionizing bioinformatics—enhancing molecular research, accelerating drug discovery, and tackling fundamental biological challenges.

Research: Foundation models in bioinformatics. National Science Review. Image Credit: ZinetroN / Shutterstock

A study by Fei Guo, Renchu Guan, Yaohang Li, Qi Liu, Xiaowo Wang, and Can Yang, led by Prof. Jianxin Wang of the School of Computer Science and Engineering at Central South University in collaboration with researchers from Jilin University, Old Dominion University, Tongji University, Tsinghua University, and The Hong Kong University of Science and Technology, surveys recent advances in bioinformatics foundation models (FMs) and their application to downstream tasks spanning genomics, transcriptomics, proteomics, drug discovery, and single-cell analysis. The authors aim to help scientists select suitable FMs for bioinformatics by organizing them into four categories: language FMs, vision FMs, graph FMs, and multimodal FMs. Beyond enhancing our understanding of molecular landscapes, AI technology can provide theoretical and practical foundations for ongoing innovation in molecular biology. The study also highlights key challenges, including benchmarking, interpretability, and hallucination detection, that influence how effective these models are in real-world applications.

Lab Director Jianxin Wang and colleagues analyzed how bioinformatics FMs can be trained with both supervised and unsupervised learning techniques, for applications ranging from fundamental biological challenges to integrated biological problems. They emphasized the versatility of these models as essential tools in the field. Models such as DNABERT (for genomic sequence analysis), ProteinBERT (for proteomics applications), and scGPT (for single-cell analysis) illustrate the progress made in AI-driven biological research.
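
To make the idea of a language FM for DNA more concrete, here is a minimal sketch of overlapping k-mer tokenization, the input scheme used by the original DNABERT, followed by simple random masking of the kind used in self-supervised, masked-language-model pre-training. The function names, the `[MASK]` symbol, the 15% masking rate, and the example sequence are illustrative assumptions; they are not taken from the study, and real models use more elaborate masking and vocabulary handling.

```python
import random


def kmer_tokenize(sequence, k=6):
    """Split a DNA sequence into overlapping k-mers with stride 1."""
    sequence = sequence.upper()
    if len(sequence) < k:
        return []
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def mask_tokens(tokens, mask_rate=0.15, mask_symbol="[MASK]", seed=0):
    """Randomly replace tokens with a mask symbol (simplified, hypothetical scheme).

    Returns the masked token list and a dict mapping each masked position
    to the original token that a model would be trained to recover.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    masked = list(tokens)
    targets = {}
    for i, token in enumerate(tokens):
        if rng.random() < mask_rate:
            masked[i] = mask_symbol
            targets[i] = token
    return masked, targets


# Illustrative sequence, not real data from the study.
tokens = kmer_tokenize("ATGCGTACGTTAG", k=6)
masked, targets = mask_tokens(tokens, mask_rate=0.15)
```

The masked positions act as the training signal: no labels are needed, which is what lets such models pre-train on large unlabeled sequence collections before being fine-tuned on a labeled downstream task.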

The team summarized several prominent foundation models used to interpret high-throughput biological data, then discussed in depth how prediction and generation models are applied across downstream bioinformatics tasks. Their discussion covered biological databases, training strategies, hyperparameter configurations, pre-training frameworks, fine-tuning strategies, and the evaluation of model performance. They also explored how AI models can be optimized for specific bioinformatics challenges, such as predicting genome-wide variant effects, detecting RNA modifications, and predicting protein function.

By tracing the evolution of bioinformatics feature mapping, the researchers show how each revised model addresses the limitations of its predecessor. For instance, they reviewed how newer iterations such as AlphaFold3 have improved upon AlphaFold1 and AlphaFold2 by incorporating multimodal learning techniques and diffusion-based structural prediction. "Taking advantage of the latest bioinformatics FM, one can achieve unprecedented accuracy, realize an integrated AI model, and perform richer downstream analysis," Prof. Wang says.

"Taking the classic biological problem 'protein three-dimensional structure reconstruction' as a representative demonstration, DeepMind has developed three iterations of an artificial intelligence system over the past five years," Prof. Wang says. AlphaFold1 introduced deep residual convolutional blocks; AlphaFold2 added the Evoformer module, which jointly refines multiple-sequence-alignment and residue-pair representations; and AlphaFold3 further improved accuracy by extending structural predictions to small molecules and nucleic acids.

The researchers also shared their outlook on the trajectory of bioinformatics FMs, drawing on their experience with model pre-training frameworks, the selection of benchmarking methods, white-box approaches, interpretability, and the evaluation of model hallucinations. They emphasized the importance of future developments in AI-driven biological research, including large-scale pre-training paradigms, knowledge graph integration, and contrastive learning techniques to enhance model explainability.
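
For readers unfamiliar with contrastive learning, the sketch below shows one common formulation, the InfoNCE loss, for a single anchor embedding. It is offered as a general illustration, not as the method discussed by the researchers: the function names, the cosine similarity measure, the temperature of 0.1, and the toy vectors are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: negative log-softmax score of the positive.

    The loss is small when the anchor is more similar to the positive
    than to any of the negatives.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return log_sum - logits[0]


# Toy embeddings (hypothetical): the positive matches the anchor in the first
# case and is orthogonal to it in the second.
loss_aligned = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
loss_mismatched = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

Minimizing this loss pulls representations of related inputs together and pushes unrelated ones apart, so nearness in the embedding space becomes easier to interpret.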
