HEX tailors machine learning explanations to match human decision-making preferences, boosting trust and reliability in high-stakes scenarios.
HEX: Human-in-the-loop explainability via deep reinforcement learning
In a paper published in the journal Decision Support Systems, Michael T. Lash, an assistant professor in the Analytics, Information, and Operations Management area at the University of Kansas School of Business, addressed the challenges of using machine learning (ML) models in high-stakes decision-making, where humans ultimately bear responsibility for outcomes.
Lash introduced HEX, a human-in-the-loop (HITL) deep reinforcement learning (DRL) approach to machine learning explainability (MLX) that generates individualized, decider-specific explanations built around the features each user prefers.
HEX enhances explanation reliability by incorporating a novel 0-distrust projection mechanism and by explicitly considering the ML model's decision boundary. In empirical evaluations, including a randomized controlled experiment, HEX consistently outperformed competing methods in human-in-the-loop scenarios, increasing trust in and reliance on the decider's preferred features.
Related Work
Past work on MLX primarily focused on providing model predictions alongside explanations, but few methods explicitly incorporated the human element in decision-making.
Challenges included the lack of methods that customized explanations to a decision-maker's specific preferences and the reliance on surrogate models, which often failed to capture the true decision boundary of the original model.
Moreover, many existing approaches produced overly complex explanations that were difficult for human decision-makers to interpret or apply effectively in high-stakes situations.
Human-in-the-loop MLX
The proposed method introduces a human-in-the-loop MLX framework for decision-making in continuous state and action spaces. The environment, state, and action space are mathematically defined, with vector addition as the transition function. A novel reward function is introduced based on identifying explanatory points on the decision boundary.
The reward function encourages candidate explanations that lie close to the decision boundary and penalizes those farther away. By rewarding the discovery of the decision boundary itself, it improves solution quality and leads to more accurate policy learning.
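To make this setup concrete, the sketch below shows one way the vector-addition transition and a boundary-seeking reward could be written. The callable positive_proba and the threshold and scale parameters are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def transition(state, action):
    # Vector addition as the transition function: the next candidate
    # explanation point is the current point shifted by the action.
    return state + action

def boundary_reward(positive_proba, x, threshold=0.5, scale=1.0):
    # positive_proba is assumed to be a callable returning the model's
    # positive-class probability for a single point x.
    # The reward is 0 exactly on the decision boundary (probability equals
    # the threshold) and grows more negative the farther x is from it.
    return -scale * abs(positive_proba(np.asarray(x)) - threshold)
```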
To implement this, the paper proposes an actor-critic reinforcement learning framework. The actor generates actions, while the critic assesses their quality through the Q function.
Two variants are developed: a single-critic (SC) and a double-critic (DC) system, each tailored to MLX. The double-critic approach, inspired by prior work, prevents overestimation of rewards and incorporates target-network soft updates for more stable policy learning. The actor's parameters are optimized with policy gradient methods, ensuring the learned policy is accurate and reliable.
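The snippet below sketches what a double-critic target and a target-network soft update typically look like in this family of methods (in the spirit of TD3). The network interfaces and hyperparameters shown here are assumptions for illustration, not the exact HEX-DC update.

```python
import torch

def soft_update(target_net, online_net, tau=0.005):
    # Polyak averaging of target-network parameters (the standard "soft update").
    for t, o in zip(target_net.parameters(), online_net.parameters()):
        t.data.mul_(1.0 - tau).add_(tau * o.data)

@torch.no_grad()
def double_critic_target(reward, next_state, target_actor,
                         target_q1, target_q2, gamma=0.99):
    # Clipped double-Q target: taking the minimum of the two critics'
    # estimates curbs value overestimation when bootstrapping.
    next_action = target_actor(next_state)
    q_min = torch.min(target_q1(next_state, next_action),
                      target_q2(next_state, next_action))
    return reward + gamma * q_min
```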
A crucial component of the proposed method is the incorporation of a human into the decision loop through a 0-distrust projection, which ensures that explanations align with the decision-maker's preferences.
The technique allows decision-makers to trust predictions because explanations are grounded in these preferred features.
The explanation-decider disagreement score measures how well explanations match the decision-maker's expectations, ensuring that predictions are understandable and actionable for humans in high-stakes environments.
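As a rough illustration, assuming the decider's preferred features are given as a 0/1 mask, a projection and a disagreement score in this spirit could be sketched as follows; the paper's 0-distrust projection and explanation-decider disagreement score may be defined differently.

```python
import numpy as np

def zero_distrust_projection(proposed_change, preferred_mask):
    # Hypothetical projection: keep proposed feature changes only where the
    # decider has expressed trust (mask == 1) and zero out the rest, so the
    # resulting explanation uses only the decider's preferred features.
    return proposed_change * preferred_mask

def disagreement_score(explanation_change, preferred_mask):
    # Illustrative disagreement measure: the share of the explanation's total
    # change that falls on non-preferred features
    # (0 = full agreement, 1 = full disagreement).
    total = np.abs(explanation_change).sum()
    if total == 0.0:
        return 0.0
    return np.abs(explanation_change * (1 - preferred_mask)).sum() / total
```

For example, with preferred_mask = np.array([1, 0, 1]) and a proposed change of [0.4, -0.2, 0.1], the projection zeroes the second component and the disagreement score is 0.2 / 0.7 ≈ 0.29.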
Finally, the paper addresses the problem of buffer and policy degeneracy: if the replay buffer contains poor-quality examples, the policy learned from them will also be suboptimal.
To mitigate this, selective buffering is proposed, in which only high-reward examples are added to the buffer, preventing the policy from degrading over time. This strategy keeps the policy effective and the learned explanations meaningful throughout the learning process.
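A minimal sketch of such a buffer, assuming a simple reward-threshold admission rule rather than the paper's exact criterion:

```python
import random
from collections import deque

class SelectiveReplayBuffer:
    """Illustrative selective buffer: a transition is admitted only if its
    reward clears a threshold, so low-quality examples cannot accumulate
    and degrade the learned policy."""

    def __init__(self, capacity=10_000, reward_threshold=0.0):
        self.buffer = deque(maxlen=capacity)
        self.reward_threshold = reward_threshold

    def add(self, state, action, reward, next_state, done):
        # Only high-reward experiences are stored.
        if reward >= self.reward_threshold:
            self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)
```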
Empirical Evaluation Summary
The proposed method was empirically evaluated using five real-world datasets covering various decision-making domains: Bank, Medical Information Mart for Intensive Care (MIMIC), Movie, News, and Student.
Five common ML models (logistic regression, support vector machine (SVM), neural networks, decision trees (DT), and random forest (RF)) were applied.
The proposed explainability methods (HEX-SC, HEX-DC, SC, and DC) were compared to two model-agnostic methods: local interpretable model-agnostic explanations (LIME) and growing spheres (Grow).
The first set of experiments, which did not involve a human-in-the-loop (HITL) decider, showed that the SC methods, particularly HEX-SC, outperformed others by yielding the lowest average decision boundary deviance (DBD) on most models.
The DC methods were comparable but achieved fewer statistically significant wins. LIME and Grow performed poorly on the overall metrics.
In the second set of experiments, which incorporated a HITL decider sensitive to specific features, the HEX methods again excelled, even when constrained to produce explanations using only a limited number of features. These results further highlight the advantage of the HEX approach, especially when the decider's preferences are accounted for.
A laboratory experiment then assessed how humans rated explanations provided by the proposed methods. In this study, participants evaluated explanations generated with the HEX-SC method against explanations from non-HITL methods.
Results indicated that explanations from HEX-SC were preferred, trusted, and made more sense to the participants.
The study also demonstrated that participants' ratings varied depending on the scenario and whether the explanations aligned with their feature preferences, with positive classifications consistently receiving higher ratings.
These findings suggest that incorporating human preferences into explainability models significantly improves user satisfaction with the explanations.
Conclusion
To summarize, HEX, a HITL, model-agnostic classification explainability method based on DRL, was proposed. Through empirical and laboratory evaluations, HEX-SC and HEX-DC were compared with out-of-the-box DRL methods, LIME, and growing spheres.
The HEX variants consistently outperformed these baselines, demonstrating the benefits of incorporating a decider in the explanation process. Future research may explore multi-agent system extensions, human-in-the-loop image explanations, and follow-up action scenarios.