In a paper published in the journal Nature Medicine, researchers examined the rise of artificial intelligence (AI)-driven chatbots such as Chat Generative Pre-trained Transformer (ChatGPT) in healthcare. They acknowledged concerns about the accuracy of the information these chatbots provide and highlighted inconsistent reporting in studies assessing them, which led to the proposal of the Chatbot Assessment Reporting Tool (CHART), a collaborative effort to create reporting standards. By engaging a diverse group of stakeholders, the CHART initiative seeks to develop a checklist and flow diagram that foster transparent reporting and help researchers, clinicians, and patients navigate chatbot assessment studies.
Background
AI-based chatbots, such as ChatGPT, increasingly rely on large language models (LLMs) and natural language processing (NLP) to interact with users through generated text. As their use in healthcare grows, concerns have surfaced about the accuracy of the clinical advice they provide, particularly given how often physicians and patients turn to the internet for health information.
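Assessment studies of this kind typically pose standardized clinical questions to a chatbot and have clinicians grade the responses against reference guidance. As a rough illustration only, the sketch below shows what such a query loop might look like; it assumes the OpenAI Python SDK, and the model name and questions are placeholders rather than anything specified in the paper.

```python
# Minimal sketch of a chatbot assessment query loop (illustrative only;
# assumes the OpenAI Python SDK, with placeholder model and questions).
import csv

from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Hypothetical clinical questions a study might standardize in advance.
QUESTIONS = [
    "What are first-line treatments for hypertension in adults?",
    "When should a patient with a sore throat seek medical care?",
]

with open("chatbot_responses.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["question", "response"])
    for question in QUESTIONS:
        reply = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[{"role": "user", "content": question}],
        )
        # Save responses for later accuracy grading by clinicians.
        writer.writerow([question, reply.choices[0].message.content])
```

In practice, studies vary widely in how they phrase prompts, how many times they repeat queries, and how they grade answers, which is precisely the reporting variability CHART targets.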
Inconsistent reporting in studies evaluating how accurately these chatbots deliver clinical guidance has prompted a collaborative response. A diverse international group of stakeholders is developing CHART to establish structured reporting standards, addressing the lack of transparency in this evolving field for the benefit of researchers, clinicians, and patients alike.
Building CHART for Chatbot Assessment
The proposed methodology centers on developing CHART, a reporting guideline that addresses inconsistent reporting in studies assessing the accuracy of AI-driven chatbots' clinical advice. The collaboration, which involves a diverse international group, will establish structured reporting standards through several key phases.
First, a broad range of stakeholders will be assembled, including statisticians, methodologists, developers, NLP researchers, journal editors, and patient partners. This multidisciplinary expertise ensures a holistic approach to developing CHART. The work will follow established, evidence-based methods for guideline development, drawing on the EQUATOR (Enhancing the QUAlity and Transparency Of health Research) international network to stay aligned with best practices.
The next step is a systematic review of the existing literature to identify pertinent reporting guidelines and relevant chatbot assessment studies, laying the groundwork for informed guideline development. After the literature review, stakeholders will conduct a Delphi consensus process that incorporates public engagement to ensure diverse perspectives are considered.
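In a typical Delphi process, panelists rate candidate checklist items over successive rounds, and items meeting a pre-specified agreement threshold are retained. The sketch below illustrates that aggregation step under assumed conventions (a 1 to 9 rating scale and a 75% agreement threshold, both common in Delphi studies but not confirmed as CHART's criteria); the item names are invented for illustration.

```python
# Illustrative Delphi-round aggregation (assumed conventions, not CHART's
# published criteria): ratings of 7-9 on a 1-9 scale count as agreement,
# and an item is retained when at least 75% of panelists agree.
from statistics import median

# Hypothetical ratings: candidate checklist item -> one rating per panelist.
ratings = {
    "Report the chatbot version and access date": [9, 8, 7, 9, 6, 8],
    "Describe the prompt phrasing verbatim":      [7, 9, 8, 8, 9, 7],
    "State the number of query repetitions":      [5, 6, 4, 7, 5, 6],
}

THRESHOLD = 0.75  # assumed agreement threshold

for item, scores in ratings.items():
    agreement = sum(s >= 7 for s in scores) / len(scores)
    verdict = "retain" if agreement >= THRESHOLD else "revise or drop"
    print(f"{item}: median={median(scores)}, "
          f"agreement={agreement:.0%} -> {verdict}")
```

Items that fall short of the threshold are typically reworded and re-rated in a subsequent round rather than discarded outright.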
Through this iterative approach, consensus meetings will refine and finalize the CHART reporting checklist and flow diagram, culminating in structured reporting standards. By encouraging transparent reporting in studies evaluating LLM-linked chatbots in healthcare, the initiative aims to benefit researchers, clinicians, and patients navigating this evolving landscape.
Enhancing Chatbot Assessment Reporting
The analysis will combine a comprehensive literature review, identifying relevant reporting guidelines and studies of chatbot performance, with contributions from stakeholders across statistics, research methodology, natural language processing, and journal editing.
The methodology will adhere to established approaches and the evidence-based EQUATOR toolkit for guideline development. A Delphi consensus among stakeholders, preceded by the literature review and incorporating public engagement, will refine the reporting checklist and flow diagram from diverse viewpoints. Once consensus is reached, stakeholders will hold focused synchronous meetings to finalize CHART's structured reporting standards.
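Once finalized, a structured checklist also lends itself to simple, automatable completeness checks. The sketch below is purely illustrative: it encodes a few hypothetical checklist items (the actual CHART items were still under development) and flags which ones a manuscript reports.

```python
# Hypothetical reporting-completeness check. The item names are invented
# for illustration; they are not the real CHART checklist items.
from dataclasses import dataclass

@dataclass
class ChecklistItem:
    identifier: str
    description: str

CHECKLIST = [
    ChecklistItem("M1", "Chatbot name, version, and access date"),
    ChecklistItem("M2", "Exact prompts used, reported verbatim"),
    ChecklistItem("M3", "Reference standard used to grade accuracy"),
]

def audit(reported_ids: set[str]) -> None:
    """Print which checklist items a manuscript reports or omits."""
    for item in CHECKLIST:
        status = "reported" if item.identifier in reported_ids else "MISSING"
        print(f"[{status}] {item.identifier}: {item.description}")

audit({"M1", "M3"})  # example manuscript reporting two of three items
```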
The envisioned outcome is transparent reporting in studies that evaluate LLM-linked chatbots' ability to summarize health evidence and offer clinical advice, benefiting researchers, clinicians, and patients navigating this evolving domain.
Conclusion and Future Work
In summary, the initiative to develop CHART for evaluating LLM-linked chatbots marks a crucial step toward structured reporting standards. Through the collaborative efforts of diverse stakeholders, rigorous methodology, and inclusive consensus-building, the endeavor aims to enhance transparency in assessing chatbot performance in healthcare.
The goal is to equip researchers, clinicians, and patients with reliable guidelines for navigating the expanding landscape of chatbot assessment studies, fostering a more informed and trustworthy approach to using AI-driven technologies for clinical advice.
Future work will focus on implementing and refining the CHART guidelines in real-world research contexts, piloting the reporting checklist and flow diagram across diverse studies of LLM-linked chatbots' performance in clinical settings.
Ongoing iteration based on feedback from stakeholders and users will be pivotal to improving CHART's applicability and effectiveness. Continued engagement with the evolving landscape of AI and healthcare will keep CHART adaptable and responsive to emerging advances, contributing to transparent and reliable reporting standards in this domain.