In an article recently submitted to the arXiv* preprint server, researchers introduced Farsight, an interactive tool that addresses challenges in identifying potential harms during prompt-based prototyping of artificial intelligence (AI) applications.
*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.
Farsight assisted users by highlighting relevant news articles about AI incidents and by enabling them to explore and edit large language model (LLM)-generated use cases, stakeholders, and associated harms. Co-designed with AI prototypers, the tool improved users' ability to independently identify potential harms and proved more useful and usable than existing resources, encouraging a focus on end-users and on broader impacts.
Background
As AI becomes increasingly ingrained in daily life, ensuring responsible and safe integration is paramount. Previous efforts in responsible AI primarily targeted machine learning (ML) experts, neglecting the evolving landscape introduced by LLMs and prompt-based interfaces. The surge in user roles, extending beyond traditional ML experts to include designers, writers, lawyers, and everyday users in AI prompt-based prototyping interfaces, presented unique challenges. Existing responsible AI practices struggled to adapt to the diverse and expansive capabilities of LLMs, leaving a gap in anticipating and mitigating potential harms associated with these technologies.
To address these challenges, the paper introduced Farsight, an in situ interactive tool designed for AI prototypers. Farsight facilitated the identification of potential harms in LLM-powered applications during the early prototyping stage. Leveraging novel techniques and interactive system designs, Farsight enhanced responsible AI awareness among diverse AI prototypers, providing a user-friendly interface for envisioning harms associated with their prompt-based AI applications. The tool employed a progressive disclosure design, used embedding similarity to surface relevant news articles about AI incidents, and presented node-link diagrams for interactive visualization.
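The embedding-similarity retrieval described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `embed` function below is a toy placeholder standing in for whatever sentence-embedding model Farsight actually uses, and the headlines are invented.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy embedding: normalized character-frequency vector.
    A real system would use a trained sentence-embedding model."""
    vec = np.zeros(128)
    for ch in text.lower():
        vec[ord(ch) % 128] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def top_incidents(prompt: str, headlines: list[str], k: int = 3) -> list[str]:
    """Rank incident headlines by cosine similarity to the prompt
    (vectors are unit-normalized, so a dot product suffices)."""
    q = embed(prompt)
    ranked = sorted(headlines, key=lambda h: -float(np.dot(q, embed(h))))
    return ranked[:k]

# Hypothetical incident headlines for illustration only.
headlines = [
    "Chatbot gives harmful medical advice",
    "Image model reproduces copyrighted art",
    "Translation system mistranslates legal documents",
]
print(top_incidents("Summarize patient symptoms for a doctor", headlines, k=2))
```

In practice such a tool would precompute embeddings for its incident database and query them with a vector index rather than re-embedding every headline per prompt.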
Through empirical findings from co-design and evaluation studies, Farsight proved effective in aiding AI prototypers in independently identifying potential harms, encouraging a shift in focus from AI models to end-users, and providing an open-source, web-based implementation for broader accessibility. This work addressed the gaps in existing responsible AI practices by offering a tailored solution for the evolving landscape of AI development, encompassing a wider spectrum of user roles and the complexities introduced by LLMs.
Formative Study and Design Goals
The researchers undertook a formative co-design study to understand the needs and challenges AI prototypers face in envisioning potential harm in their applications. Through interviews and prototype evaluations with 10 AI prototypers, the study shed light on the varying awareness and consideration of harm during the prototyping process. Participants without expertise in responsible AI often did not anticipate harm, while those with expertise tended to do so. The authors captured user feedback and generated valuable design insights, emphasizing the importance of guiding users in imagining use cases, assisting in understanding and organizing harms, fitting seamlessly into current workflows, promoting user engagement, and ensuring an open-source and adaptable implementation.
The five identified design goals drove the subsequent development of Farsight. The goals included guiding users in brainstorming use cases, helping them understand and prioritize harms, seamlessly integrating into existing workflows, promoting user engagement through compelling examples, and ensuring an open-source, adaptable implementation to accommodate the dynamic landscape of LLMs and prompt-crafting tools. The researchers provided a foundation for Farsight's development, addressing the unique challenges posed by the diverse backgrounds of AI prototypers and the evolving capabilities of AI technologies.
User Interface
Farsight operated as a plugin for web-based prompt-crafting tools, featuring three main components: the alert symbol, the awareness sidebar, and the harm envisioner. The alert symbol provided an always-on display indicating the alert level of a user's prompt, determined by assessing potential harms based on similarity to AI incident reports. The awareness sidebar expanded upon user interaction, presenting relevant AI incident headlines and generated use cases.
The harm envisioner allowed active user engagement by visualizing and editing AI-generated harms associated with prompts. Farsight was designed with a model-agnostic, environment-agnostic implementation, ensuring adaptability and open-source accessibility for integration into various AI prototyping interfaces, demonstrated through a Chrome extension and Python package.
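A similarity-driven alert level like the one the alert symbol displays could be implemented as a simple thresholding step over the prompt's best match against incident reports. The thresholds below are illustrative assumptions, not values from the paper:

```python
def alert_level(max_similarity: float) -> str:
    """Map the highest prompt-to-incident similarity score (0.0 to 1.0)
    to a discrete alert level. Cutoffs are hypothetical examples."""
    if max_similarity >= 0.8:
        return "high"
    if max_similarity >= 0.5:
        return "medium"
    return "low"

print(alert_level(0.9))   # prints "high"
print(alert_level(0.55))  # prints "medium"
print(alert_level(0.1))   # prints "low"
```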
User Study
The user study evaluated Farsight with 42 participants in diverse roles, each using Farsight, Farsight Lite, or an Envisioning Guide. The study addressed three research questions on harm identification, tool effectiveness, and challenges in envisioning harm. Farsight significantly increased harm identification, offered valuable AI-generated suggestions, and outperformed Farsight Lite; ambiguous prompts remained a challenge. The study employed a mixed-methods approach, combining quantitative metrics with qualitative analyses, and revealed Farsight's positive impact on users' ability to envision harms associated with AI applications. The tools shaped participants' approaches, with Farsight promoting a focus on specific use cases and the Envisioning Guide providing a structured harm taxonomy.
Participants reported shifts in their envisioning approaches, with Farsight inspiring brainstorming and thinking beyond immediate harms. Users perceived Farsight and Farsight Lite as more usable and helpful than the Envisioning Guide, emphasizing their practical value in AI application prototyping. However, variable content quality and a lack of actionability were noted as limitations, as were two aspects of the study design: participants were recruited solely from one technology company, and only a single post-task evaluation was conducted. Overall, Farsight demonstrated efficacy in helping users envision harms, while the findings highlighted areas for improvement and suggested the need for broader participant inclusion and extended evaluation periods for a comprehensive understanding of its impact.
Conclusion
In conclusion, Farsight emerged as an innovative solution, effectively addressing challenges in anticipating and mitigating potential harms associated with AI applications during early prototyping. Developed through a formative co-design study, Farsight's user-friendly interface empowered diverse AI prototypers, guiding them in harm envisioning and fostering responsible AI awareness. A comprehensive user study validated the tool's success, demonstrating its significant impact on harm identification and user preferences. Despite some limitations, Farsight's positive outcomes underscored its valuable contribution to responsible AI development, emphasizing adaptability, user engagement, and practical usability.
Journal reference:
- Preliminary scientific report.
Wang, Z. J., Kulkarni, C., Wilcox, L., Terry, M., & Madaio, M. (2024, February 23). Farsight: Fostering Responsible AI Awareness During AI Application Prototyping. arXiv. https://doi.org/10.1145/3613904.3642335, https://arxiv.org/abs/2402.15350