A new algorithm uses both noisy and ground truth data to assess expert accuracy, with potential applications in fields such as healthcare and auditing.
Years ago, as she sat in waiting rooms, Maytal Saar-Tsechansky began to wonder how people chose a good doctor when they had no way of knowing a doctor's track record on accurate diagnoses. Talking to other patients, she found they sometimes based choices on a physician's personality or even the quality of their office furniture.
"I realized all these signals people are using are just not the right ones," says Saar-Tsechansky, professor of information, risk, and operations management at Texas McCombs. "We were operating in complete darkness, like there's no transparency on these things."
In new research, she uses artificial intelligence to judge the judges: to evaluate the rates at which experts make successful decisions. Her team tested the algorithm using three distinct datasets: sales tax audits, IMDb movie reviews, and spam detection cases, showing its ability to adapt to diverse decision contexts. The machine learning algorithm, called MDE-HYB, can appraise both doctors and other kinds of experts—such as engineers who diagnose mechanical problems—when their success rates are not publicly available or not scrutinized beyond small groups of peers.
Saar-Tsechansky says prior research has studied the accuracy of doctors' diagnoses but not in ways that can be scaled up or monitored on an ongoing basis.
She adds that more effective methods are vital today when medical systems are deploying AI to help with diagnoses. It will be challenging to determine whether AI is helping or hurting successful diagnoses if observers can't tell how successful a doctor was without AI assistance.
Evaluating the Experts
Saar-Tsechansky, McCombs doctoral student Wanxue Dong, and Tomer Geva of Tel Aviv University in Israel designed MDE-HYB to integrate two forms of information: overall data about the quality of an expert's past decisions and more detailed evaluations of specific cases.
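To make the idea of combining two information sources concrete, here is a minimal Python sketch. It is not MDE-HYB itself, whose internals this article does not describe. It assumes a simple shrinkage rule: a rough accuracy estimate from noisy labels on all cases is blended with a handful of verified cases, and the verified cases count for more as they accumulate. The function names, the `prior_weight` parameter, and all of the data are hypothetical.

```python
def agreement_rate(decisions, labels):
    """Share of cases where the expert's decision matches the given label."""
    matches = sum(d == y for d, y in zip(decisions, labels))
    return matches / len(decisions)


def hybrid_accuracy(decisions, noisy_labels, verified, prior_weight=10.0):
    """Blend a noisy estimate with verified cases (illustrative assumption only).

    decisions    -- the expert's decision for every case
    noisy_labels -- imperfect labels available for every case
    verified     -- dict mapping case index -> ground truth label (a few cases)
    prior_weight -- hypothetical knob: how many verified cases the noisy
                    estimate is treated as being worth
    """
    noisy_estimate = agreement_rate(decisions, noisy_labels)
    if not verified:
        return noisy_estimate
    correct = sum(decisions[i] == y for i, y in verified.items())
    return (correct + prior_weight * noisy_estimate) / (len(verified) + prior_weight)


# Made-up data: ten decisions, noisy labels for all, ground truth for three.
decisions = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
noisy_labels = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
verified = {0: 1, 2: 1, 6: 0}

noisy_only = agreement_rate(decisions, noisy_labels)
blended = hybrid_accuracy(decisions, noisy_labels, verified)
print(f"noisy-only estimate: {noisy_only:.3f}")
print(f"blended estimate:    {blended:.3f}")
```

In this toy data the noisy labels disagree with the expert on two cases that the verified labels show the expert got right, so the blended estimate moves above the noisy-only one.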
They then compared MDE-HYB's results with those of other evaluators: three alternative algorithms and 40 human reviewers. To test the flexibility of MDE-HYB's ratings, the researchers analyzed datasets with varying levels of predictability, such as movie reviews with an 80% prediction accuracy and tax audits with lower predictive power.
In each case, evaluators judged prior decisions made by experts about the data, such as whether they accurately classified movie reviews as positive or negative. For all three sets, MDE-HYB equaled or bested all challengers.
Compared to other algorithms, its error rates were up to 95% lower. Compared to humans, they were up to 72% lower.
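Percentages like these describe a relative reduction in error, not a drop in percentage points. The short Python sketch below works through the arithmetic under one assumption: that an evaluator's error is the average gap between its estimate of each expert's accuracy and that expert's true accuracy. The article does not specify the study's metric, and every number here is invented for illustration.

```python
def mean_abs_error(estimates, true_values):
    """Average absolute gap between estimated and true expert accuracy."""
    gaps = [abs(e - t) for e, t in zip(estimates, true_values)]
    return sum(gaps) / len(gaps)


def relative_reduction(baseline_error, new_error):
    """Fraction by which new_error is lower than baseline_error."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical true accuracies for three experts.
true_accuracy = [0.90, 0.70, 0.80]

# Hypothetical estimates from a baseline evaluator and an improved one.
baseline_estimates = [0.80, 0.85, 0.70]
improved_estimates = [0.88, 0.73, 0.79]

baseline_error = mean_abs_error(baseline_estimates, true_accuracy)
improved_error = mean_abs_error(improved_estimates, true_accuracy)
reduction = relative_reduction(baseline_error, improved_error)

print(f"baseline error: {baseline_error:.4f}")
print(f"improved error: {improved_error:.4f}")
print(f"relative reduction: {reduction:.1%}")
```

Under this reading, a 95% lower error rate means the better evaluator's error is one-twentieth of the baseline's, whatever the baseline happened to be.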
The researchers also tested MDE-HYB on Saar-Tsechansky's original concern: selecting a doctor based on the doctor's history of correct diagnoses. Doctors selected with MDE-HYB had an average misdiagnosis rate 41% lower than doctors chosen by another algorithm.
In real-world use, such a difference could translate to better patient outcomes and lower costs, she says. However, the study noted that MDE-HYB may face challenges when ground truth labels are extremely costly or scarce, potentially limiting its scalability in some high-stakes domains.
She cautions that MDE-HYB needs more work before it can be used in such practical applications. "The main purpose of this paper was to get this idea out there, to get people to think about it, and hopefully, people will improve this method," she says.
However, she hopes it can one day help managers and regulators monitor expert workers' accuracy and decide when to intervene if improvement is needed. It might also help consumers choose service providers such as doctors.
"In every profession where people make these types of decisions, it would be valuable to assess the quality of decision-making," Saar-Tsechansky says. "I don't think that any of us should be off the hook, especially if we make consequential decisions."
Future research aims to refine the method's ability to handle scarce ground truth by leveraging active learning techniques and enhancing its adaptability across organizational settings.
"A Machine Learning Framework for Assessing Experts' Decision Quality" is published in the journal Management Science.