In an article recently published in the Journal of Human Genetics, researchers reviewed how tabular-to-image conversion can be combined with deep learning (DL) methods, specifically convolutional neural networks (CNNs), to improve omics analysis and predictive modeling in precision medicine.
Background
Advancements in high-throughput sequencing have led to a data explosion in omics. Although this data abundance offers significant predictive modeling opportunities in precision medicine, it also presents substantial challenges in data interpretation and analysis.
Conventional machine learning (ML) methods have had partial success in building predictive models for omics analysis. However, they cannot effectively exploit latent relationships among variables in the data to make more accurate predictions. In this paper, the authors reviewed promising solutions and synergies arising from recent advances in DL analysis, along with their potential applications in biomedical research.
DeepInsight and DeepFeature
Transformation methods like DeepInsight convert tabular omics data, in which variables are treated as independent, into image-like representations, enabling CNNs to capture latent features effectively. This approach improves predictive power and, by leveraging transfer learning, reduces computational time, enhancing overall performance.
The transformation comprises four steps. First, the elements or genes are placed on Cartesian coordinates using a method such as kernel PCA, UMAP, or t-SNE. Second, the convex hull algorithm is used to identify the smallest rectangle enclosing the feature spread, which is then rotated to align with the horizontal and vertical axes. Third, the Cartesian coordinates are converted to a pixel framework. Finally, the element or gene expression values are mapped to their corresponding positions in this pixel framework.
The similarity between factors of interest, such as genes, is represented by the closeness of their spatial positions: elements with similar characteristics are placed adjacent to each other, while dissimilar elements are placed far apart. The transformation generates image-like representations equivalent to the original feature vectors, which then serve as CNN input for predictive modeling.
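The steps above can be illustrated with a minimal sketch. It is not the DeepInsight implementation: as labeled assumptions, plain linear PCA (via SVD) stands in for kernel PCA, UMAP, or t-SNE, the convex hull and rotation step is omitted in favor of a simple axis-aligned bounding box, and features that land on the same pixel are averaged. The function names are hypothetical.

```python
import numpy as np

def feature_coordinates(X):
    """Place each feature (column of X) on a 2D plane.

    Assumption: linear PCA via SVD stands in for the kernel PCA,
    UMAP, or t-SNE step; features are treated as the observations.
    """
    F = X.T                                   # (n_features, n_samples)
    F = F - F.mean(axis=0, keepdims=True)     # center each sample column
    U, S, _ = np.linalg.svd(F, full_matrices=False)
    return U[:, :2] * S[:2]                   # (n_features, 2) coordinates

def to_pixels(coords, size):
    """Scale Cartesian coordinates to integer pixel indices in [0, size - 1].

    Assumption: an axis-aligned bounding box replaces the rotated
    minimum rectangle obtained from the convex hull.
    """
    lo = coords.min(axis=0)
    span = coords.max(axis=0) - lo
    span[span == 0] = 1.0                     # avoid division by zero
    scaled = (coords - lo) / span             # values in [0, 1]
    return np.minimum((scaled * size).astype(int), size - 1)

def tabular_to_images(X, size=8):
    """Map every sample (row of X) to a size x size image."""
    pixels = to_pixels(feature_coordinates(X), size)
    n_samples, n_features = X.shape
    sums = np.zeros((n_samples, size, size))
    counts = np.zeros((size, size))
    for j in range(n_features):
        r, c = pixels[j]
        sums[:, r, c] += X[:, j]              # write feature j into its pixel
        counts[r, c] += 1
    # Average features that share a pixel; empty pixels stay at zero.
    return sums / np.maximum(counts, 1), pixels
```

Calling `tabular_to_images(X, size=8)` on a samples-by-features matrix returns one 8 x 8 image per sample plus the feature-to-pixel mapping, which is the piece needed later to trace a pixel back to a gene.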
DeepInsight also allows the use of pre-trained CNN models, which are highly effective in image analysis. This capitalizes on the capabilities of existing CNN architectures and accelerates analysis, as no model has to be trained from scratch.
DeepInsight's performance was evaluated in multiple scenarios, such as cancer-type prediction, where the method outperformed several ML models. The analytical capabilities introduced by DeepInsight are complemented by DeepFeature, which addresses interpretability in the DL model context.
DeepFeature highlights and extracts the critical features that influence the decisions of a model, such as prediction, using a class activation map (CAM). Thus, DeepFeature can identify the key elements or genes that are pivotal for determining specific disease manifestations or phenotypic outcomes.
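As a rough illustration of the idea, the sketch below computes a generic class activation map and reads off the features located at strongly activated pixels. It is not DeepFeature's code. It assumes a CNN whose last convolutional layer feeds a global-average-pooling classifier, assumes the map has the same resolution as the input image, and clips negative evidence for simplicity; the function names, gene names, and threshold are hypothetical.

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """Generic CAM: weighted sum of the last conv layer's feature maps.

    feature_maps:  (K, H, W) activations for one input image
    class_weights: (K,) classifier weights for the class of interest
    """
    cam = np.tensordot(class_weights, feature_maps, axes=1)  # (H, W)
    cam = np.maximum(cam, 0)          # assumption: keep positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam   # normalize to [0, 1]

def influential_features(cam, pixels, names, threshold=0.5):
    """Return the features whose pixel lies in a highly activated region.

    pixels: (row, col) position of each feature in the image
    names:  feature names in the same order as pixels
    """
    return [name for name, (r, c) in zip(names, pixels)
            if cam[r, c] >= threshold]

# Toy example with two feature maps and three hypothetical genes.
feature_maps = np.zeros((2, 4, 4))
feature_maps[0, 1, 2] = 2.0
feature_maps[1, 3, 3] = 1.0
cam = class_activation_map(feature_maps, np.array([1.0, 0.5]))
selected = influential_features(
    cam, [(1, 2), (3, 3), (0, 0)], ["GENE_A", "GENE_B", "GENE_C"])
```

In this toy case only `GENE_A` sits on a pixel whose normalized activation reaches the threshold, so it is the only feature selected.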
DeepInsight-3D and pathway analysis
DeepInsight-3D, an extension of DeepInsight tailored for multi-omic analyses, was developed to address the heterogeneity of data modalities across multi-omic datasets. It integrates information from different omic types into a unified three-dimensional (3D) space. The 3D representation can capture synergistic interactions among several types of omic data, facilitating a more comprehensive understanding and providing better context for analysis.
DeepInsight-3D demonstrated a 7-29% improvement in AUC-ROC compared with several neural network architectures, three recently developed drug response prediction pipelines, a support vector machine (SVM)-based classifier, and an optimized random forest pipeline. Anti-cancer drug response prediction is one of DeepInsight-3D's applications in oncology.
In a study leveraging DeepInsight-3D, multi-omics data comprising gene expression, copy number alterations, and gene mutations were used as input to build a drug efficacy prediction model. The gene-to-pixel mapping was determined by applying DeepInsight to the expression data, and the copy number alterations and mutations were placed at the corresponding gene positions, with colors encoding their levels.
The Cancer Genome Atlas (TCGA) dataset was used for CNN training, and patient-derived xenograft (PDX) data were used for testing. The proposed DeepInsight-3D-based approach achieved 72% accuracy, outperforming other DL-based methods by over 7%.
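One way to picture this layering is as a multi-channel image in which every omic type reuses the same gene-to-pixel mapping. The sketch below is an illustration under stated assumptions rather than the DeepInsight-3D implementation: the mapping is taken as given, each omic layer is a vector aligned to the same gene order, genes sharing a pixel are averaged, and the function name is hypothetical.

```python
import numpy as np

def stack_omics(pixels, layers, size):
    """Build a (size, size, n_layers) image for one sample.

    pixels: (n_genes, 2) integer pixel position of each gene, e.g. derived
            from expression data
    layers: list of 1D arrays (one per omic type), aligned to gene order
    """
    image = np.zeros((size, size, len(layers)))
    counts = np.zeros((size, size))
    for r, c in pixels:
        counts[r, c] += 1                     # genes per pixel
    for k, values in enumerate(layers):
        for (r, c), v in zip(pixels, values):
            image[r, c, k] += v               # channel k holds omic type k
    # Average genes that collide on a pixel (assumption, see lead-in).
    return image / np.maximum(counts, 1)[:, :, None]

# Toy sample: three genes, two of which share pixel (1, 1).
pixels = np.array([[0, 0], [1, 1], [1, 1]])
expression = np.array([1.0, 2.0, 4.0])
copy_number = np.array([0.0, 1.0, -1.0])
mutation = np.array([1.0, 0.0, 1.0])
image = stack_omics(pixels, [expression, copy_number, mutation], size=2)
```

The three channels play the role of the color layers described above, so a standard image CNN can consume all omic types at once.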
Pathway analysis after DeepFeature extraction can decipher the biological significance of influential features. For instance, identifying pathways related to drug sensitivity or resistance in the anti-cancer drug response context can assist in understanding potential therapeutic strategies and molecular targets.
Many pathways involved in several drug responses, including protein degradation and recycling, Rho GTPase, JAK/STAT, PI3K/AKT, and STAT3, have been identified by DeepInsight-3D. Pathway analysis can also find new pathways, such as clathrin-dependent endocytosis and tryptophan metabolism.
Overall, DeepInsight and its derivatives, such as DeepInsight-3D, represent a major advance in analytical strategies as the boundaries of genomics continue to expand. However, integrating tabular-to-image conversion with CNNs for omics analysis still faces several challenges, including data heterogeneity, data size, and model interpretability. A multidisciplinary approach involving medical doctors, biologists, bioinformatics researchers, and ML experts is required to address these challenges effectively.
Journal reference:
- Sharma, A., Lysenko, A., Jia, S., Boroevich, K. A., Tsunoda, T. (2024). Advances in AI and machine learning for predictive medicine. Journal of Human Genetics, 1-11. https://doi.org/10.1038/s10038-024-01231-y, https://www.nature.com/articles/s10038-024-01231-y