In a paper published in the journal Scientific Reports, researchers addressed biotic stress in maize caused by pathogens. They compared machine learning algorithms, including Naive Bayes (NB), K-Nearest Neighbor (KNN), Ensemble, Support Vector Machine (SVM), and Decision Tree (DT), combined with feature selection, against a novel Bidirectional Long Short-Term Memory (BiLSTM)-based deep learning model, which outperformed the other methods.
The study singled out significant genes, including (S)-beta-macrocarpene synthase and chitinase chem 5, that exhibited distinct upregulation in response to biotic stress. It also highlighted the role of these methods in various applications, including plant disease diagnosis, irrigation timing determination, gene classification, and fruit and vegetable cultivar classification in agriculture. These findings provide valuable insights for the development of disease-resistant maize varieties.
Background
Cereals account for approximately half of the world's food supply. Maize, also known as corn, originated in Mexico and is now one of the most prominent cereals globally, ranking behind only wheat and rice in significance. Global maize production has exceeded 1 billion metric tons per year since 2013, making the crop vital in sustaining the growing human population. It serves as both a direct source of dietary products and an indirect resource as livestock feed. Additionally, maize holds promise as a renewable source for bioethanol production, thus offering a sustainable alternative to fossil fuels.
Related Work
Several studies have reported using artificial intelligence (AI) in genomics. DeepChrome, a Convolutional Neural Network (CNN)-based architecture, has been used to classify gene expression. It automatically learns relationships between histone modification marks, although its effectiveness may be limited by data quality.
Other work has combined feature extraction with machine learning techniques to identify vital Pentatricopeptide Repeat (PPR)-encoding genes, although the effectiveness of these approaches may vary with data quality and diversity. The TIMgo application used SVM and feature selection to predict gene expression in rice Transfer Deoxyribonucleic Acid (T-DNA) mutants, but its generalizability to other plant species remains an open question. These studies showcase the potential of AI in genomics while highlighting data-related challenges and species-specific nuances.
Proposed Method
Data Preprocessing and Merging: The researchers began the analysis by preprocessing the raw gene expression files. This involved background correction and quantile normalization using the Robust Multichip Average (RMA) method from the affy Bioconductor package. Normalization was crucial for removing technical variation between experimental conditions, especially when merging datasets from multiple experiments. Batch effects were addressed using the ComBat function from the sva R package, which employs an empirical Bayes method. Both RMA and ComBat were applied with the default parameters defined in their respective packages.
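The quantile normalization step can be illustrated with a minimal Python sketch. This is not the RMA implementation from the affy package used in the study: the function name quantile_normalize and the toy matrix are hypothetical, ties are ignored, and background correction and ComBat batch adjustment are not shown.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a matrix with rows = genes and columns = samples.

    Simplified illustration: ties are broken by sort order rather than
    averaged, which differs from production implementations.
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]

    # Mean of the r-th smallest value across all samples.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(sc[r] for sc in sorted_cols) / n_samples for r in range(n_genes)
    ]

    # Replace each value with the mean for its rank within its sample.
    result = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        order = sorted(range(n_genes), key=lambda i: col[i])
        for rank, gene_idx in enumerate(order):
            result[gene_idx][j] = rank_means[rank]
    return result


# Hypothetical expression values: 4 genes measured in 3 samples.
expression = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 6.0, 7.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(expression)
```

After this step every sample shares the same distribution of values, so differences in overall intensity between arrays no longer dominate the comparison.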
AI-Based Gene Recognition: Because maize gene features have varying upper and lower limits, the data underwent min-max normalization to improve the effectiveness of the AI techniques. Subsequently, the researchers compared the performance of machine learning and deep learning methods for maize gene recognition under biotic stress conditions.
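Min-max normalization rescales each feature to the range 0 to 1 using its own minimum and maximum. The sketch below is a generic illustration, assuming rows are samples and columns are gene features; the function name min_max_scale and the example values are hypothetical rather than taken from the study.

```python
def min_max_scale(samples):
    """Scale each feature (column) to [0, 1] using its own min and max."""
    n_features = len(samples[0])
    lows = [min(row[j] for row in samples) for j in range(n_features)]
    highs = [max(row[j] for row in samples) for j in range(n_features)]
    scaled = []
    for row in samples:
        scaled.append([
            # A constant feature carries no information; map it to 0.0.
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ])
    return scaled


# Hypothetical data: 3 samples, 2 gene features on very different scales.
raw = [[2.0, 100.0], [4.0, 300.0], [6.0, 200.0]]
scaled = min_max_scale(raw)
# scaled == [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```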
Machine Learning-Based Gene Recognition: Traditional machine learning methods such as SVM, KNN, DT, NB, and Ensemble rely on robust feature selection and extraction to achieve effective classification or regression, and they work best when given the most discriminative features. In this study, gene characteristics were selected according to criteria suitable for these ML methods. Selecting these distinguishing features was a critical step in gene recognition because an excessive number of features can introduce noise and hinder learning algorithms. Feature selection was carried out using the Relief feature selection algorithm, which helps reduce the number of features per training sample and lowers the computational complexity of the classification algorithms.
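The idea behind Relief can be shown with a short Python sketch. It is a basic two-class variant written for illustration, not the implementation used in the study: it assumes features are already scaled to the range 0 to 1, visits every sample instead of drawing a random subset, and uses hypothetical names (relief_weights) and toy data. A feature gains weight when it differs between a sample and its nearest neighbor of the other class (the "miss") and loses weight when it differs from its nearest neighbor of the same class (the "hit").

```python
def relief_weights(X, y):
    """Return one Relief weight per feature for a two-class problem.

    Assumes every feature is already scaled to [0, 1] and that each
    class has at least two samples.
    """
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def dist(a, b):
        # Manhattan distance between two samples.
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            weights[f] += (
                abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])
            ) / n
    return weights


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 0.9], [0.2, 0.1], [0.0, 0.5], [0.9, 0.8], [1.0, 0.2], [0.8, 0.4]]
y = [0, 0, 0, 1, 1, 1]
weights = relief_weights(X, y)

# Rank features from most to least discriminative.
ranking = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
```

Keeping only the top-ranked features is what reduces the input dimensionality before the classifiers are trained.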
RNN-Based BiLSTM Network: This paper also explored deep learning techniques such as Recurrent Neural Networks (RNNs) and BiLSTM networks. The ability to model complex relationships in sequential data is a recognized strength of RNNs and LSTMs. LSTM mitigates the vanishing gradient problem, making it suitable for long time-series data. The study presented a BiLSTM network architecture featuring two BiLSTM layers with dropout to prevent overfitting. The network aimed to extract powerful features from the gene data and employed Fully Connected (FC) and Softmax layers for classification. Parameters for network training were determined through experimentation, with the Adam optimization algorithm used to update the weights during training.
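To make the data flow concrete, the following is a minimal pure-Python sketch of a bidirectional LSTM forward pass followed by a fully connected layer and Softmax. It is not the authors' network: it uses a single BiLSTM layer instead of two, omits dropout and training with Adam, and relies on random, untrained weights. All names, sizes, and input values are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(input_size, hidden_size, rng):
    """Random weights for the input (i), forget (f), candidate (g), output (o) gates."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        g: {"W": mat(hidden_size, input_size + hidden_size), "b": [0.0] * hidden_size}
        for g in "ifgo"
    }


def lstm_step(params, x, h, c):
    z = x + h  # concatenate current input with previous hidden state

    def gate(name, act):
        p = params[name]
        return [
            act(sum(w * v for w, v in zip(row, z)) + b)
            for row, b in zip(p["W"], p["b"])
        ]

    i, f = gate("i", sigmoid), gate("f", sigmoid)
    g, o = gate("g", math.tanh), gate("o", sigmoid)
    # The additive cell-state update is what eases vanishing gradients.
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def run_lstm(params, sequence, hidden_size):
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(params, x, h, c)
    return h  # final hidden state


def bilstm_features(fwd, bwd, sequence, hidden_size):
    """Concatenate final states of a forward pass and a reversed pass."""
    return run_lstm(fwd, sequence, hidden_size) + run_lstm(bwd, sequence[::-1], hidden_size)


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
hidden = 4
fwd, bwd = init_lstm(1, hidden, rng), init_lstm(1, hidden, rng)

# Hypothetical sample: five scaled expression values fed as 1-D time steps.
sample = [[0.1], [0.8], [0.3], [0.5], [0.9]]
features = bilstm_features(fwd, bwd, sample, hidden)

# Fully connected layer mapping features to two classes (control vs. stress).
fc = [[rng.uniform(-0.1, 0.1) for _ in range(2 * hidden)] for _ in range(2)]
logits = [sum(w * v for w, v in zip(row, features)) for row in fc]
probs = softmax(logits)
```

Reading the sequence in both directions gives the classifier a feature vector of twice the hidden size, so each position can be interpreted with context from both sides.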
Study Results
The BiLSTM model achieved the best performance, with an accuracy of 92.86% supported by high sensitivity, precision, and F1-score values, underscoring its effectiveness in classification. Furthermore, the analysis identified key differentially expressed genes associated with plant defense mechanisms and stress responses. Overall, this research underscores the potential of these computational approaches for accurately characterizing maize gene expression data and offers valuable insights for genomics research and applications.
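For reference, the four reported metrics can be computed from a binary confusion matrix as in the sketch below. The labels are hypothetical toy values chosen for illustration and do not reproduce the study's data or its 92.86% accuracy.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall), precision, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical labels: 1 = biotic stress, 0 = control.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```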
Conclusion
In summary, this study used a combination of meta-analysis, machine learning, and deep learning techniques to classify maize samples based on their gene expression profiles under both control and biotic stress conditions. The results highlighted the superior performance of the BiLSTM model in accurately categorizing these samples. Additionally, the research identified several key genes associated with biotic stress responses, shedding light on potential genetic factors influencing maize's defense mechanisms.
Journal reference:
- Nazari, L., et al. (2023). Integrated transcriptomic meta-analysis and comparative artificial intelligence models in maize under biotic stress. Scientific Reports, 13:1, 15899. DOI:10.1038/s41598-023-42984-4, https://www.nature.com/articles/s41598-023-42984-4.
Article Revisions
- Jun 24 2024 - Fixed broken journal paper URL