In a paper published in the journal Nature Communications, researchers investigated the potential of Chat Generative Pre-trained Transformer (ChatGPT) Advanced Data Analysis (ADA), an extension of GPT-4, to bridge the gap between machine learning (ML) developers and clinical practitioners.
They found that ChatGPT ADA autonomously developed ML models for clinical data analysis, achieving performance comparable to or better than manually crafted models. The findings suggest that ChatGPT ADA could democratize ML in medicine by simplifying complex analyses, while underscoring the importance of specialized training and resources alongside its use.
Related Work
Past work has shown that ML drives advancements in artificial intelligence (AI) and holds great promise for transforming medical research and practice, particularly in areas like diagnosis and outcome prediction. The adoption of ML in analyzing clinical data has expanded rapidly, with established and evolving roles in various public health and medicine domains, including image analysis, clinical trial performance, and operational organization.
Despite the potential benefits, the complexity of developing, implementing, and validating ML models has limited their accessibility to most clinicians and medical researchers. Automated ML (AutoML) platforms aim to address this challenge by making ML accessible to non-technical experts. While existing AutoML platforms have demonstrated feasibility and utility in medicine, they typically require users to provide instructions in a non-natural language format, such as through dedicated interfaces. However, converting natural language commands into Python code for ML model creation has yet to be widely implemented.
Methodological Overview: ChatGPT ADA Study
The methods section outlined the ethical considerations, stating that the research adhered to relevant guidelines and that the ethical committee of the Medical Faculty of Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen University approved the study. Researchers described the patient cohorts, specifying the public repositories from which the datasets were sourced and providing details on each included study, such as the endocrinologic oncology study on metastatic disease prediction and the esophageal cancer screening study.
Next, researchers detailed the experimental design, beginning with extracting the original training and test datasets from each clinical trial. They then sequentially prompted ChatGPT ADA to develop ML models using natural language commands based on each study's framework.
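The paper's own prompts are not reproduced here; as a rough illustration only, a sequence of natural-language instructions issued to ChatGPT ADA for one trial might resemble the following hypothetical prompts, with all wording and column names invented for this summary:

```python
# Hypothetical prompts illustrating the sequential, natural-language workflow;
# the wording and the 'metastasis' column name are invented for illustration
# and are not taken from the study.
prompts = [
    # 1. Introduce the trial and the uploaded training data.
    "This file contains the training data from a clinical trial on metastatic "
    "disease prediction; the target column is 'metastasis'. Build the best "
    "possible classification model from it.",
    # 2. Leave preprocessing and model selection to ChatGPT ADA.
    "Handle missing values and engineer features as you see fit, then choose "
    "and train a suitable classifier.",
    # 3. Request predictions on the held-out test data.
    "Here is the test dataset. Apply the trained model and return the predicted "
    "probability for each patient.",
]
```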
Researchers noted the version of ChatGPT used and avoided memory-retention bias by starting a new chat session for each trial. They explained the data pre-processing and ML model development in two phases. First, ChatGPT ADA autonomously developed ML models, and their performance metrics were calculated against the ground-truth labels from the original studies. Second, a data scientist re-implemented and optimized the best-performing ML models, termed the "benchmark validatory re-implementation," using the same training datasets. Researchers provided detailed trial-specific methodologies for data pre-processing and ML model development, covering missing-data handling, feature engineering, and classifier selection.
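For orientation, a validatory re-implementation of this kind could be expressed as a standard scikit-learn pipeline. The sketch below uses a synthetic table and placeholder column names rather than the study's actual trial data, and assumes a gradient boosting classifier as in the endocrinologic oncology example discussed later:

```python
# A minimal sketch of a validatory re-implementation pipeline: impute missing
# values, encode/scale features, and fit a gradient boosting classifier.
# The synthetic DataFrame and column names are placeholders, not study data.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.normal(60, 10, 300),
    "biomarker": np.where(rng.random(300) < 0.1, np.nan, rng.normal(1.0, 0.3, 300)),  # ~10% missing
    "sex": rng.choice(["female", "male"], 300),
    "outcome": rng.integers(0, 2, 300),
})
X, y = df.drop(columns=["outcome"]), df["outcome"]

numeric_cols = ["age", "biomarker"]
categorical_cols = ["sex"]

preprocess = ColumnTransformer([
    # Numeric features: median imputation followed by standardization.
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    # Categorical features: mode imputation followed by one-hot encoding.
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
])

model = Pipeline([("preprocess", preprocess),
                  ("clf", GradientBoostingClassifier(random_state=42))])
model.fit(X, y)
```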
Lastly, researchers described the explainability analysis using Shapley additive explanations (SHAP) and the reproducibility analysis to ensure consistency in ChatGPT ADA's responses. Statistical analysis involved calculating performance metrics and bootstrapping for comparisons between ChatGPT ADA-based and re-implemented models. Researchers set the family-wise alpha threshold at 0.05 and adjusted for multiple comparisons. This section was comprehensive, detailing the methodological steps undertaken to evaluate the performance and validity of ChatGPT ADA in developing ML models for clinical data analysis.
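For concreteness, the bootstrapped comparison at the stated family-wise alpha of 0.05 can be sketched as follows; the labels and predicted probabilities are synthetic stand-ins for the held-out trial data and the two models' outputs:

```python
# A minimal sketch of a bootstrapped comparison of two models' test-set AUROCs:
# resample the test set with replacement, recompute each model's AUROC, and
# inspect the confidence interval of the difference at alpha = 0.05.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)                                   # placeholder labels
proba_ada = np.clip(0.60 * y_true + rng.normal(0.20, 0.2, size=200), 0, 1)     # placeholder model A
proba_reimpl = np.clip(0.55 * y_true + rng.normal(0.25, 0.2, size=200), 0, 1)  # placeholder model B

n_boot, alpha = 10_000, 0.05
deltas = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, len(y_true), size=len(y_true))  # resample with replacement
    deltas[b] = (roc_auc_score(y_true[idx], proba_ada[idx])
                 - roc_auc_score(y_true[idx], proba_reimpl[idx]))

lo, hi = np.percentile(deltas, [100 * alpha / 2, 100 * (1 - alpha / 2)])
print(f"AUROC difference, {100 * (1 - alpha):.0f}% bootstrap CI: [{lo:.3f}, {hi:.3f}]")
```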
Analysis of ChatGPT ADA
Across four diverse clinical-trial datasets, ChatGPT ADA independently developed and executed advanced ML techniques for disease screening and prediction. Its performance was comparable to that of the hand-crafted, customized ML methods re-implemented from the original studies. The paper also presented an exemplary interaction with ChatGPT ADA, detailing the prompts and responses involved in autonomous prediction.
After summarizing each clinical trial and its associated dataset, researchers compared the ML methods head-to-head for each trial. This involved evaluating the performance metrics of the ML methods developed and executed by ChatGPT ADA against those reported in the original studies and those of the validatory ML methods re-implemented by a seasoned data scientist. The comparison revealed similar performance metrics between the ChatGPT ADA-based models and both the originally published and the re-implemented models.
Each clinical trial underwent a detailed analysis of the ML methods employed. For instance, ChatGPT ADA selected a gradient boosting machine (GBM) model in the endocrinologic oncology study on metastatic disease prediction, achieving slightly improved performance metrics compared to the best-performing published counterpart. Researchers observed similar trends in gastrointestinal oncology, otolaryngology, and cardiology studies.
Researchers conducted an explainability analysis using SHAP and a reproducibility analysis to ensure transparency and reliability. ChatGPT ADA autonomously performed the SHAP analysis, providing insight into the top 10 predictive features influencing the ML models' predictions. Additionally, the reproducibility analysis confirmed the consistency of ChatGPT ADA's responses across separate chat sessions, reinforcing the reliability of its performance. Overall, the comprehensive analysis conducted in this study demonstrates the efficacy and reliability of ChatGPT ADA in developing ML models for clinical data analysis while maintaining transparency and reproducibility throughout the process.
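To illustrate the kind of SHAP analysis described above, the following self-contained sketch fits a gradient boosting classifier on synthetic data and ranks the ten most influential features by mean absolute SHAP value; all data and feature names are placeholders, not the study's clinical variables:

```python
# A minimal sketch of a SHAP-based feature ranking, analogous to the top-10
# feature analysis described above. Data and feature names are synthetic.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, 12)),
                 columns=[f"feature_{i}" for i in range(12)])  # placeholder clinical features
y = (X["feature_0"] + 0.5 * X["feature_3"] + rng.normal(scale=0.5, size=300) > 0).astype(int)

clf = GradientBoostingClassifier(random_state=42).fit(X, y)

# TreeExplainer computes per-feature SHAP contributions for tree ensembles.
explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X)

# Rank features by mean absolute SHAP value and keep the top 10.
importance = np.abs(shap_values).mean(axis=0)
top10 = pd.Series(importance, index=X.columns).sort_values(ascending=False).head(10)
print(top10)
```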
Conclusion
In conclusion, the analysis demonstrates ChatGPT ADA's effectiveness and reliability in developing ML models for clinical data analysis. Across diverse clinical-trial datasets, ChatGPT ADA autonomously formulated advanced ML techniques for disease screening and prediction. The performance of the ChatGPT ADA-developed models closely matched that of the hand-crafted methods from the original studies and of the models re-implemented by a data scientist.
Additionally, explainability analysis using SHAP and reproducibility measures ensured transparency and reliability in ChatGPT ADA's predictions. This study underscores ChatGPT ADA's potential as a valuable tool in healthcare research, offering insights while maintaining methodological rigor and transparency.