In an article recently published in the journal npj Digital Medicine, researchers proposed a medical multimodal large language model (Med-MLLM) that requires little labeled data and can serve as an effective and accurate artificial intelligence (AI)-based decision-support tool for rare diseases and new pandemics.
Background
Integrating deep neural networks (DNNs) into the clinical decision-making process can improve diagnostic efficiency: DNNs can assist physicians during diagnosis and alleviate their heavy workload. As most DNNs are supervised, their performance depends primarily on the quality and volume of labeled data.
However, labeling clinical data is time-consuming and costly. Moreover, for rare or newly emerging diseases such as coronavirus disease 2019 (COVID-19), collecting and labeling sufficient data in time to train a deep learning (DL) model is difficult, delaying the rapid deployment of the DL models needed to combat these diseases.
The proposed approach
In this study, researchers proposed a Med-MLLM framework for radiograph representation learning, which can learn extensive medical knowledge, such as clinical phenotypes, text semantics, and image understanding, primarily from unlabeled data.
The framework can be deployed quickly and easily when labeled data are scarce, enabling a rapid response to rare or emerging diseases. Med-MLLM can learn comprehensive thorax knowledge from multimodal medical data spanning the textual and visual modalities.
Med-MLLM combined three forms of pre-training. It used unlabeled medical images from publicly available image datasets together with contrastive learning for image-only pre-training, learning the visual characteristics that capture diagnostic information in medical images. It used unlabeled medical text from existing public text datasets and a self-supervised LLM for text-only pre-training, learning clinical findings and text semantics in medical texts. Finally, it used an existing large knowledge base, the Unified Medical Language System (UMLS), together with contrastive learning for image-text pre-training, unifying the knowledge learned from unpaired texts and images to accurately capture clinical presentations and disease phenotypes.
In the framework, an image encoder was pre-trained on visual data such as computed tomography (CT) scans and chest X-rays (CXRs) with two losses: an image-level contrastive loss and a patient-level contrastive loss. A text encoder was pre-trained on medical texts, such as clinical notes and medical reports, with three losses: a findings-impression alignment loss, a sentence reconstruction loss, and a masked language modeling loss. In addition, a soft image-text alignment loss was introduced to pre-train the text and image encoders jointly on unpaired radiology reports and images.
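To make these contrastive objectives more concrete, the sketch below implements a generic symmetric InfoNCE-style contrastive loss in PyTorch, the kind of loss that pulls matched pairs of embeddings together while pushing apart all other pairs in a batch. The function name, embedding dimension, and temperature are illustrative assumptions and do not reproduce the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(emb_a, emb_b, temperature=0.07):
    """Generic symmetric InfoNCE contrastive loss over a batch of paired embeddings.

    Row i of emb_a and emb_b is treated as a matching pair (e.g., two augmented
    views of one image, or an image and its report); all other in-batch
    combinations act as negatives. Names and sizes are illustrative only.
    """
    # Normalize so the dot product becomes a cosine similarity.
    a = F.normalize(emb_a, dim=-1)
    b = F.normalize(emb_b, dim=-1)

    # Pairwise similarity matrix: logits[i, j] = sim(a_i, b_j) / temperature.
    logits = a @ b.t() / temperature

    # The positive for sample i is the i-th entry of the other view.
    targets = torch.arange(a.size(0), device=a.device)

    # Average both directions (a -> b and b -> a).
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Example: a toy batch of 8 image embeddings and 8 matching report embeddings.
img_emb = torch.randn(8, 256)
txt_emb = torch.randn(8, 256)
loss = info_nce_loss(img_emb, txt_emb)
```

A patient-level variant would typically build positives from images of the same patient rather than from the same image, although the paper's exact construction of each loss may differ.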
The researchers then fine-tuned the pre-trained Med-MLLM for the downstream COVID-19 decision-support tasks of reporting, diagnosis, and prognosis. For COVID-19 reporting, the image encoder was combined with an additional text decoder. For COVID-19 diagnosis, a classification layer was added on top of the text and/or image encoder output and trained with a binary cross-entropy loss; the same fine-tuning strategy was used for the COVID-19 prognosis task.
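As a minimal sketch of this fine-tuning step, assuming a PyTorch-style encoder, the snippet below stacks a linear classification layer on a pooled encoder output and trains it with binary cross-entropy; the encoder stub, layer sizes, and variable names are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn

class DiagnosisHead(nn.Module):
    """Hypothetical classification head placed on a pre-trained encoder's output."""

    def __init__(self, encoder, hidden_dim=768, num_labels=1):
        super().__init__()
        self.encoder = encoder                    # pre-trained image and/or text encoder
        self.classifier = nn.Linear(hidden_dim, num_labels)

    def forward(self, inputs):
        features = self.encoder(inputs)           # pooled representation, shape (B, hidden_dim)
        return self.classifier(features)          # raw logits, shape (B, num_labels)

# Minimal usage with a stand-in encoder and random data.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(224 * 224, 768), nn.ReLU())
model = DiagnosisHead(encoder)
criterion = nn.BCEWithLogitsLoss()                # binary cross-entropy on the logits

images = torch.randn(4, 1, 224, 224)              # toy single-channel "CXR" batch
labels = torch.randint(0, 2, (4, 1)).float()      # COVID-19 positive / negative
loss = criterion(model(images), labels)
loss.backward()
```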
This approach enabled Med-MLLM to handle multimodal, textual, and visual input. The researchers evaluated the effectiveness of the proposed framework using the COVID-19 pandemic as a test case, applying the model to COVID-19 reporting (medical report generation), diagnosis (disease classification), and prognosis (survival prediction) with limited labels during training, namely 1% of the labeled training data.
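The label-efficient setting can be pictured as fine-tuning on a small random subset of the labeled examples. The snippet below assumes a generic 1% subsample and does not reflect the paper's actual data splits or sampling protocol.

```python
import random

def sample_labeled_subset(example_ids, fraction=0.01, seed=0):
    """Draw a small labeled subset (e.g., 1%) for label-efficient fine-tuning.

    A sketch of the general idea only; real splits would typically be
    stratified by label and fixed across experiments.
    """
    rng = random.Random(seed)
    k = max(1, int(len(example_ids) * fraction))
    return rng.sample(example_ids, k)

# Example: keep roughly 1% of 10,000 labeled studies for fine-tuning.
all_ids = list(range(10_000))
train_subset = sample_labeled_subset(all_ids)
print(len(train_subset))  # -> 100
```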
The framework's performance was evaluated and compared with that of several existing methods on five COVID-19 datasets (COVID-HCH, BIMCV-COVID-19, COVID-19 CT, COVID-CXR, and COVIDx-CXR-2) spanning different modalities, regions, and languages, including Chinese, Spanish, and English.
The researchers performed both retrospective and prospective studies during model evaluation. In the retrospective setting, the pre-trained model was evaluated on early COVID-19 datasets, while in the prospective setting, it was assessed on the newer Omicron variant based on its predictions for Omicron cases. The researchers also investigated the performance of Med-MLLM on tuberculosis and 14 other common thorax diseases to evaluate its scalability.
Significance of the study
The results of the COVID-19 reporting, diagnosis, and prognosis tasks demonstrated the effectiveness of the proposed framework. In both the retrospective and prospective studies, the Med-MLLM framework trained with only 1% of the training data performed competitively with, or better than, previous self-supervised and contrastive learning methods trained with 100% of the training data on all three tasks. Moreover, when trained with 100% of the training data, the proposed model achieved the best performance among all models in the retrospective and prospective studies across different regions and languages on all tasks.
The proposed model also performed well when evaluated on tuberculosis and 14 other common thorax diseases. Med-MLLM trained with 1% of the labeled data performed competitively with, or better than, previous methods trained with 100% of the labeled training data, and it achieved the best performance among all methods when trained with 100% of the labeled data, indicating the scalability and generalization capability of the proposed approach. These results suggest that the proposed model can be trained with limited labels to promptly combat rare diseases and future pandemics, effectively reducing the current dependence on annotations.
Journal reference:
- Liu, F., Zhu, T., Wu, X., Yang, B., You, C., Wang, C., Lu, L., Liu, Z., Zheng, Y., Sun, X., Yang, Y., Clifton, L., & Clifton, D. A. (2023). A medical multimodal large language model for future pandemics. npj Digital Medicine, 6(1), 1-15. https://doi.org/10.1038/s41746-023-00952-2, https://www.nature.com/articles/s41746-023-00952-2