In a paper published in the journal Scientific Reports, researchers examined how humans interact with advice generated by artificial intelligence (AI) in personnel selection, through five experiments involving 1,403 participants. They presented both accurate and inaccurate advice and manipulated factors such as the source of the advice and its explainability.
Although the researchers expected accurate and explainable advice to enhance decision-making, the findings revealed that incorrect advice consistently led to performance decrements, indicating overreliance. Notably, neither the source of the advice nor its explainability reliably reduced participants' reliance on inaccurate guidance. This highlights the intricate nature of human-AI interaction and underscores the need for regulatory standards in human resource management (HRM).
Background
Prior studies note the increasing use of AI in HRM, particularly in resume screening for personnel selection. However, the impact of AI-generated advice on decision-making and the efficacy of explainability measures remain unclear. Previous research indicates that people rely heavily on advice and are influenced by factors such as its source and accuracy. While people generally prefer human advice, AI advice is also valued in certain contexts. The accuracy and explainability of advice are crucial, yet findings on the effectiveness of explanations are mixed.
Study Design
The study, preregistered and approved by the research ethics committee of the University of Regensburg, comprised five online experiments on personnel selection. Participants, recruited via university mailing lists and professional networking platforms, assumed the role of recruiters tasked with identifying suitable candidates for a position. They reviewed resumes designed specifically for the study and received advice attributed to either a human or an AI source, or no advice at all.
Additionally, the explainability of the AI advice varied across experiments, using methods such as heat maps or charts. Participants rated the suitability of each candidate, the quality of the advice, and their confidence in their decisions.
Dependent variables included performance, advice quality ratings, and confidence ratings. The team assessed performance by the correctness of decisions, measured advice quality through participants' perceived usefulness of and trust in the advice, and had participants self-report their confidence in their decisions.
Statistical analyses involved one-way analyses of variance (ANOVAs), mixed-effects regressions, and logistic/linear regressions to compare performance and to analyze advice quality and confidence ratings. Covariates such as attitude towards AI and professional experience were controlled for. The experiments were conducted in German, with materials and data publicly accessible for further review.
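For illustration, a minimal sketch of the kinds of models described above, written in Python with pandas and statsmodels, might look like the following. The file name and all column names (performance, condition, accuracy, source, quality_rating, ai_attitude, experience, participant_id, correct) are hypothetical placeholders, not the authors' materials.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical per-trial data: one row per participant-decision pair.
df = pd.read_csv("trials.csv")

# One-way ANOVA: does mean performance differ across advice conditions
# (e.g., no advice, human advice, AI advice)?
anova_fit = smf.ols("performance ~ C(condition)", data=df).fit()
print(anova_lm(anova_fit))

# Mixed-effects regression: advice quality ratings as a function of
# advice accuracy and source, with a random intercept per participant
# and the covariates mentioned above as controls.
mixed_fit = smf.mixedlm(
    "quality_rating ~ accuracy * source + ai_attitude + experience",
    data=df,
    groups=df["participant_id"],
).fit()
print(mixed_fit.summary())

# Logistic regression: probability that an individual decision is
# correct (outcome coded 0/1), with the same predictors and covariates.
logit_fit = smf.logit(
    "correct ~ accuracy * source + ai_attitude + experience", data=df
).fit()
print(logit_fit.summary())
```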
Experiment 1 Findings
In Experiment 1a, involving university students, participants made personnel selection decisions under varying advice conditions. The study employed a mixed-factors design, examining the impact of advice accuracy (correct vs. incorrect) and source (human vs. AI) on decision-making performance. Results indicated that participants' performance improved when they received correct advice, regardless of its source.
However, participants also relied on incorrect advice, leading to lower performance and indicating overreliance on inaccurate guidance. While human advice elicited slightly better performance and higher confidence ratings, the effect diminished once covariates were controlled for. Interestingly, participants often failed to discern the quality of incorrect advice, rating it similarly to correct advice and remaining highly confident even when the advice was wrong.
Experiment 1b replicated these findings with HRM employees, demonstrating consistent results across participant groups with varying levels of expertise. Accuracy of advice significantly influenced decision-making performance, advice quality ratings, and confidence levels, corroborating the importance of accurate guidance.
Notably, HRM professionals followed correct and incorrect advice in much the same pattern as the less experienced participants, suggesting that expertise did not improve discrimination between good and bad advice. These findings underscore the critical role of advice accuracy in personnel selection tasks and highlight the need for cautious reliance on AI-generated guidance, irrespective of its source.
AI Advice Explainability
In a series of experiments (2a, 2b, and 2c), the focus shifted to the impact of explainability on the efficacy of AI advice in personnel selection tasks. Experiment 2a, involving full-time and part-time students, investigated whether visual explanations, specifically saliency heat maps, helped participants identify incorrect AI advice. Although participants performed somewhat better when given explainable advice, their decision-making did not differ significantly from that of participants who received no advice.
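The paper's stimuli are not reproduced here, but a saliency heat map of the kind described can be sketched in a few lines of matplotlib: a grid of importance scores, one cell per resume section and job criterion. All labels and scores below are invented for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Invented saliency scores: rows are resume sections, columns are job
# criteria; higher values mean the AI weighted that cell more heavily.
sections = ["Education", "Work experience", "Skills", "Languages"]
criteria = ["Degree", "Tenure", "Software", "English"]
saliency = np.array([
    [0.9, 0.1, 0.2, 0.1],
    [0.2, 0.8, 0.3, 0.1],
    [0.1, 0.2, 0.7, 0.2],
    [0.1, 0.1, 0.1, 0.9],
])

fig, ax = plt.subplots()
im = ax.imshow(saliency, cmap="Reds", vmin=0, vmax=1)
ax.set_xticks(range(len(criteria)))
ax.set_xticklabels(criteria)
ax.set_yticks(range(len(sections)))
ax.set_yticklabels(sections)
fig.colorbar(im, ax=ax, label="AI attention")
ax.set_title("Saliency heat map for one candidate (illustrative)")
plt.show()
```

Even a cleanly rendered map like this still leaves the viewer to judge whether the highlighted weighting is actually appropriate, which may help explain why such displays did not catch incorrect advice.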
Experiment 2b replicated these findings with unlimited review time, indicating that even with more time, the complex visualized explanations did not help participants recognize incorrect advice. Experiment 2c introduced a simplified visualization, bar graphs representing the degree of suitability for each criterion (sketched below), yet it, too, failed to significantly improve decision-making.
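The simplified bar-graph explanation from Experiment 2c can be sketched in the same way; the criteria and suitability scores here are again invented placeholders.

```python
import matplotlib.pyplot as plt

# Invented per-criterion suitability scores for one candidate, in the
# style of the simplified explanation used in Experiment 2c.
criteria = ["Degree", "Tenure", "Software", "English"]
suitability = [0.85, 0.40, 0.70, 0.95]

fig, ax = plt.subplots()
ax.barh(criteria, suitability)
ax.set_xlim(0, 1)
ax.set_xlabel("Degree of suitability")
ax.set_title("AI-rated suitability per criterion (illustrative)")
plt.show()
```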
Surprisingly, participants expressed higher confidence in their decisions when incorrect advice came with an explanation, suggesting a nuanced relationship between explainability and decision confidence. Overall, while correct advice was rated as higher quality when it was explainable, none of the experiments demonstrated a substantial improvement in decision-making with explainable advice, challenging the notion that visualized explanations effectively prevent overreliance on incorrect AI guidance.
Conclusion
In summary, this study underscored the critical role of algorithmic accuracy in AI-enabled decision support systems for personnel selection. Despite efforts to enhance explainability, participants continued to rely heavily on inaccurate advice. The results highlight the complexity of human-AI interaction and the necessity for robust regulations and quality standards in HRM. Future research should explore alternative ways of presenting AI advice to mitigate blind reliance.