In an article published in the Journal of Imaging, researchers explored the implications of the European Union (EU) Artificial Intelligence Act (AIA) for high-risk AI systems, focusing on decision support systems with human control. They addressed the balance between accountability and the potential errors introduced by human-computer interaction. The paper examined the requirements for high-risk AI systems under the AIA, emphasizing the opportunities and limitations of decision support systems, and highlighted the importance of increasing system explainability.
Background
The EU AIA represents a recent regulatory framework for AI systems, employing a risk-based approach to categorize and address varying levels of risk. Despite recent amendments removing the direct classification of DeepFakes as high-risk applications, the AIA emphasized human oversight and control in high-risk AI contexts. This paper navigated the evolving landscape by extending the system causability scale (SCS) to suit the demands of DeepFake detection. While previous research has extensively focused on AI methods and decision performance, the communication of AI results to operators in decision support systems remains underexplored.
This study undertook a literature review of AI regulatory documents, identifying 15 requirements for high-risk AI systems. The authors delved into the implications of these requirements for DeepFake detection, addressing a gap in existing literature. Additionally, they discussed the challenges and possibilities of explainability in AI applications, considering diverse audiences. By incorporating operators' perspectives and proposing an adapted evaluation scheme, this paper contributed to the design and evaluation of high-risk AI systems, especially in the context of DeepFake detection within forensic applications.
The State of the Art
The state of the art in DeepFake detection for forensic applications was presented, addressing the evolving landscape of AI regulations, particularly the EU AIA. The paper emphasized the critical need for DeepFake detection, a field of research that has grown rapidly since 2017 and now exceeds 1000 publications. Existing and upcoming regulations, including the AIA, were discussed, with a focus on their implications for authenticity verification in court, especially in media forensics. The authors highlighted the German situation, referencing the guidelines provided by the Federal Office for Information Security (BSI) and the data-centric examination approach (DCEA) for media forensics.
Concerning AI usage, the authors considered regulations such as the Federal Rules of Evidence in the U.S. alongside the evolving AIA, which categorizes AI systems by risk level and thereby shapes transparency obligations and requirements. Various documents on "Trustworthy AI," "Responsible AI," and "Auditable AI" proposed guidelines, with the BSI and UNESCO emphasizing ethical aspects. Notably, the AIA's influence on DeepFake detection, initially considered high-risk, was explored, especially in law enforcement and biometric identification contexts.
Explainability in AI, a crucial requirement, was discussed under varying terminologies, including explainable AI (XAI). Different perspectives, such as Interpol's distinction between narrow and broad explainability and the four principles put forth by the National Institute of Standards and Technology (NIST), provided insight into the complexity of achieving meaningful explanations. The SCS was introduced for qualitative evaluation, addressing the user's perspective.
The authors delved into the AI lifecycle, specifically the training of DeepFake detection algorithms, highlighting certification and benchmarking requirements.
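To make the benchmarking aspect more concrete, the sketch below (not taken from the paper) computes the kind of error rates a certification or benchmark report for a hypothetical DeepFake detector might contain; the `benchmark` helper and the toy labels are assumptions for illustration only.

```python
# Illustrative sketch: benchmarking a hypothetical DeepFake detector on a
# labelled test set by reporting error rates relevant to certification.

def benchmark(predictions, labels):
    """predictions/labels: lists of booleans, True = classified/known as DeepFake."""
    tp = sum(p and l for p, l in zip(predictions, labels))
    tn = sum((not p) and (not l) for p, l in zip(predictions, labels))
    fp = sum(p and (not l) for p, l in zip(predictions, labels))
    fn = sum((not p) and l for p, l in zip(predictions, labels))
    total = len(labels)
    return {
        "accuracy": (tp + tn) / total,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,  # genuine media flagged as fake
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,  # fakes that slip through
    }

# Toy usage with fabricated labels, purely for illustration.
preds = [True, False, True, True, False, False]
truth = [True, False, False, True, False, True]
print(benchmark(preds, truth))
```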
Derived Requirements for DeepFake Detection
The authors outlined 15 key requirements relevant to DeepFake detection within the context of the EU AIA. These requirements were derived from the assessment list for trustworthy AI (ALTAI) and were categorized into groups such as data protection, reliability, accountability, transparency, fairness, and decision interpretability. The definitions provided in ALTAI, which aligned with the terminology in the AIA, served as the basis for these requirements.
The identified requirements covered crucial aspects such as privacy protection, algorithm auditability, explainability, user interface design, accuracy, fairness, and adherence to legal frameworks. Their relevance and applicability were then validated through a comprehensive projection onto existing documents discussing AI system usage, regulations, and ethical guidelines. The comparison emphasized the evolving nature of requirements over time, considering categories such as information technology (IT) systems, algorithms, user interface usability, decisions, and legal framework conditions.
Navigating Challenges
The researchers discussed the challenges of implementing the requirements for DeepFake detection, particularly in the context of high-risk AI systems. The focus was on explainability and human-in-the-loop (HITL) aspects, which are crucial for the effective integration of DeepFake detectors. The distinction between strong AI (independent decision-making) and weak AI (human-involved decisions) was explored, emphasizing the need for both quantitative and qualitative explanation methods. Quantitative methods provided insights into algorithmic decision processes, while qualitative methods involved human input in the decision-making and training processes, aiming to enhance data quality assurance.
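As an illustration of what a quantitative explanation can look like, the sketch below implements occlusion sensitivity, a widely used attribution technique, against a hypothetical `detector_score` stand-in for a trained DeepFake classifier. This is a minimal sketch under those assumptions, not the authors' method.

```python
# Illustrative sketch: occlusion sensitivity masks image regions and records
# how much the detector's "fake" score drops, yielding a coarse saliency map.
import numpy as np

def detector_score(image: np.ndarray) -> float:
    # Placeholder: a real system would return P(fake) from a trained classifier.
    return float(image.mean())

def occlusion_map(image: np.ndarray, patch: int = 8) -> np.ndarray:
    """Per-patch saliency: drop in the detector score when each patch is occluded."""
    base = detector_score(image)
    h, w = image.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, (h // patch) * patch, patch):
        for j in range(0, (w // patch) * patch, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = 0.0  # mask one region
            heat[i // patch, j // patch] = base - detector_score(occluded)
    return heat  # large values mark regions the decision depends on

# Toy usage on a random grayscale "image".
heat = occlusion_map(np.random.rand(32, 32))
print(heat.round(3))
```

Such a map is an explanation aimed at the operator; it does not by itself satisfy the qualitative, human-centred feedback the authors emphasize.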
The authors introduced the concept of HITL and discussed potential pitfalls such as confirmation bias and misinterpretations. Roles and actors in the context of forensic investigations for DeepFake detection were identified, including forensic experts, the person affected, data scientists, and legal representatives. The relevance of different categories of requirements, such as IT systems, algorithms, usability, and decision explainability, was mapped to these distinct roles.
Qualitative methods for assessing the conformity of AI systems with identified criteria were introduced. The SCS was presented, containing 10 questions for user feedback and quality estimation of AI system explanations. The questions, categorized and designed for user-friendly responses, served as a means to gauge the explainability of AI systems from various perspectives. The authors concluded by linking explanation properties, such as "meaningful," "explanation accuracy," and "knowledge limits," to the identified requirements and the SCS, providing a comprehensive approach to evaluate the performance and adherence of DeepFake detection systems.
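As an illustration of how such questionnaire feedback can be aggregated, the sketch below scores ten 5-point Likert responses in the style commonly reported for the SCS (sum of ratings divided by the maximum of 50). The scoring rule, validation, and example ratings are assumptions for illustration, not the paper's adapted evaluation scheme.

```python
# Illustrative sketch: aggregating operator feedback in the style of the
# system causability scale (SCS), ten Likert items rated 1 (disagree) to 5 (agree).

def scs_score(ratings: list[int]) -> float:
    """Return a normalized score in (0, 1] from ten 5-point Likert ratings."""
    if len(ratings) != 10 or not all(1 <= r <= 5 for r in ratings):
        raise ValueError("expected ten Likert ratings in the range 1..5")
    return sum(ratings) / 50.0  # 1.0 = best possible causability rating

# Example: feedback from one operator after reviewing a detector's explanation.
ratings = [4, 5, 3, 4, 4, 5, 3, 4, 4, 5]
print(f"SCS = {scs_score(ratings):.2f}")  # values near 1.0 suggest well-received explanations
```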
Conclusion
In conclusion, the authors highlighted the critical role of human oversight and qualitative feedback in high-risk AI applications like DeepFake detection, particularly in forensic scenarios. Emphasizing the need for efficient communication between AI systems and domain experts, they advocated for integrating qualitative feedback methods in system design and operational processes.
Specialized training for users was deemed crucial for effective human oversight. A remaining challenge lies in providing meaningful explanations to those affected by AI decisions, which necessitates further exploration. The paper underscored the importance of considering human-AI interaction early in system design, anticipating the forthcoming EU AIA's impact on AI application usage and decision explainability.
Journal reference:
- Siegel, D., Kraetzer, C., Seidlitz, S., & Dittmann, J. (2024). Media Forensic Considerations of the Usage of Artificial Intelligence Using the Example of DeepFake Detection. Journal of Imaging, 10(2), 46. https://doi.org/10.3390/jimaging10020046, https://www.mdpi.com/2313-433X/10/2/46