In an article recently published in the journal npj Precision Oncology, researchers reviewed the potential of multimodal models and large language models (LLMs) in precision oncology.
Background
In oncology, the volume of patient-specific data is expanding rapidly due to advances in medical imaging, the integration of large-scale genomic analyses into clinical routines, and the extensive use of electronic health records (EHRs). Effectively using this vast quantity of data is crucial for providing optimal treatment to cancer patients.
Artificial intelligence (AI) has witnessed rapid technological progress since 2022, which has significant implications for cancer and oncology research. AI and machine learning can assist healthcare professionals in processing huge amounts of data in oncology. Currently, LLMs can perform text processing at human-level competency.
Additionally, image and text processing networks increasingly leverage transformer neural networks. This convergence can facilitate multimodal AI model development. These models can simultaneously take different data types as input, which indicates a qualitative shift from the specialized niche models common in the 2010s.
This paper reviews the recent innovations in AI, specifically in LLMs and multimodal models, which will impact precision oncology in the future.
LLMs for precision oncology
LLMs are primarily deep learning (DL) models that process and generate text-based data. They are trained on large and diverse text corpora, which can include a variety of medical data types. Recently, the most effective models have relied on transformer-based architectures and their attention mechanisms.
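As a generic illustration (not the architecture of any specific medical LLM), the scaled dot-product attention at the core of transformer models can be sketched in a few lines of NumPy; the token embeddings here are random placeholders:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Each token's output is a weighted average of the value vectors,
    with weights derived from query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise token similarities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

# Toy example: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(Q, K, V)
```

Because every token attends to every other token regardless of position or data type, the same mechanism extends naturally to images and other modalities, which is what makes transformers a unifying architecture.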
LLMs can also be applied to new tasks without any explicit training, which is known as zero-shot application. These models have been investigated for various applications in healthcare. Different approaches, such as training models exclusively on medical data, can be employed to apply LLMs to medical problems. For instance, BioBERT, one of the first LLMs in the medical domain, demonstrated good capabilities for understanding biomedical text.
Similarly, Med-PaLM, which was developed by fine-tuning Google's LLM PaLM on medical training data, showed superior performance in medical use cases compared to its base model. Med-PaLM 2, the subsequent iteration of Med-PaLM, scored 86.5% on the United States Medical Licensing Examination (USMLE).
Although solving USMLE questions is of limited practical use, fine-tuned LLMs have also addressed real-world problems such as clinical outcome prediction based solely on the unstructured text in EHRs. Generalist LLMs can be applied to medical tasks without fine-tuning, using only a detailed input prompt. Retrieval-augmented generation (RAG) is another alternative, in which domain knowledge is supplied to a trained LLM in a machine-readable format.
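A minimal sketch of the retrieval step in a RAG pipeline is shown below. The knowledge snippets, the word-overlap scoring, and the prompt layout are hypothetical illustrations only; a real system would retrieve from curated clinical sources using dense embeddings before passing the prompt to an LLM:

```python
# Hypothetical knowledge base of domain snippets (illustrative text,
# not clinical guidance).
KNOWLEDGE_BASE = [
    "NCCN guidelines recommend molecular testing in metastatic colorectal cancer",
    "EGFR mutations in NSCLC predict response to tyrosine kinase inhibitors",
    "HER2 amplification in breast cancer is targeted by trastuzumab",
]

def retrieve(question, documents, top_k=1):
    """Rank documents by naive word overlap with the question.
    Real systems use dense embedding similarity instead."""
    q_words = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(question, documents):
    """Assemble retrieved context and the question into one LLM prompt."""
    context = "\n".join(retrieve(question, documents))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

prompt = build_prompt(
    "Which breast cancer alteration is targeted by trastuzumab",
    KNOWLEDGE_BASE,
)
```

The design point is that the model's parametric knowledge is supplemented at query time, so the knowledge base can be updated without retraining the LLM.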
Multimodal models
Current LLMs are mostly transformer neural networks, which are suitable for all data types and thereby enable multimodality. Multimodal AI systems can interpret multiple data types together, and their development and validation require collaboration across several disciplines, including technology experts in hardware and software as well as medical experts from clinical specialties such as internal medicine or surgery and diagnostic specialties such as pathology or radiology.
These multimodal systems have been investigated for various precision oncology applications, such as outcome prediction. Models pre-trained on large and diverse data and then applied to specialized tasks are known as foundation models. They reduce the data requirements of specialized tasks, such as disease prediction from retinal photographs.
Similarly, by linking chest X-ray images to their corresponding report text, foundation models lessen the need for laborious, time-consuming manual annotation while maintaining human-level accuracy and surpassing the performance of supervised methods.
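The idea of linking images to report text can be sketched as retrieval in a shared embedding space, in the style of contrastive image-text models. Here the encoder outputs are simulated with random placeholder vectors, so this is an assumption-laden toy, not a trained system:

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products become cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(42)

# Placeholder embeddings for 3 X-rays; in practice these would come
# from an image encoder trained jointly with a text encoder.
image_emb = l2_normalize(rng.normal(size=(3, 16)))

# Simulate report embeddings that land near their paired images,
# which is what contrastive training is meant to achieve.
text_emb = l2_normalize(image_emb + 0.05 * rng.normal(size=(3, 16)))

similarity = image_emb @ text_emb.T   # cosine similarity matrix
matches = similarity.argmax(axis=1)   # best-matching report per image
```

Once images and reports share a space like this, the report text acts as a free supervision signal, which is why such models can sidestep manual image annotation.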
These models can be deployed as chatbot assistants that aid diagnosis interactively in clinical practice. In pathology, linking large image datasets with case-specific information and contextual knowledge can yield high performance in both biomarker prediction and disease detection.
Moreover, early generalist models have displayed consistently high performance across several medical tasks and domains by integrating knowledge from multiple domains. Recent advances in open-source models can enable de novo model development and training at significantly lower computational and financial cost.
Existing challenges
Although the expanding capabilities of foundation models make them suitable for potential applications in cancer and oncology research, such as drug discovery and multimodal diagnostics, several existing challenges must be overcome to unlock their full potential. For instance, the data used to train a model must be assessed for diversity, quantity, and quality.
Similarly, the design of systems integrating the foundation models must be guided by computer science experts, patient advocates, medical professionals, and the wider scientific community. Moreover, integrating these models into operable clinical software systems faces both regulatory and legal challenges as these models must obtain approval as medical devices. The lack of interpretability of AI models is another significant challenge. Although model explainability has been realized to a greater extent for image-related tasks, explainability issues remain in multimodal or text-processing tasks in medicine.
To summarize, advances in multimodal models and LLMs can potentially impact precision oncology through different applications. However, more scientific evidence is required to ensure that they provide quantifiable benefits in oncology.
Journal reference:
- Truhn, D., Eckardt, J., Ferber, D., & Kather, J. N. (2024). Large language models and multimodal foundation models for precision oncology. npj Precision Oncology, 8(1), 1-4. https://doi.org/10.1038/s41698-024-00573-2