From Doctors to Engineers: AI Enhances Decision Quality Monitoring

A new machine learning algorithm combines noisy and ground-truth data to assess expert accuracy, offering a scalable way to monitor decision quality in fields such as healthcare and auditing.

Research: A Machine Learning Framework for Assessing Experts' Decision Quality. Image Credit: Mikko Lemola / Shutterstock

Years ago, as she sat in waiting rooms, Maytal Saar-Tsechansky began to wonder how people chose a good doctor when they had no way of knowing a doctor's track record on accurate diagnoses. Talking to other patients, she found they sometimes based choices on a physician's personality or even the quality of their office furniture.

"I realized all these signals people are using are just not the right ones," says Saar-Tsechansky, professor of information, risk, and operations management at Texas McCombs. "We were operating in complete darkness, like there's no transparency on these things."

In new research, she uses artificial intelligence to judge the judges: to evaluate the rates at which experts make successful decisions. Her team tested the algorithm using three distinct datasets: sales tax audits, IMDb movie reviews, and spam detection cases, showing its ability to adapt to diverse decision contexts. The machine learning algorithm, called MDE-HYB, can appraise both doctors and other kinds of experts—such as engineers who diagnose mechanical problems—when their success rates are not publicly available or not scrutinized beyond small groups of peers.

Saar-Tsechansky says prior research has studied the accuracy of doctors' diagnoses but not in ways that can be scaled up or monitored on an ongoing basis.

She adds that more effective methods are vital today, when medical systems are deploying AI to help with diagnoses. If observers can't tell how accurate a doctor's diagnoses were without AI assistance, it will be hard to determine whether AI is helping or hurting.

Evaluating the Experts

Saar-Tsechansky, McCombs doctoral student Wanxue Dong, and Tomer Geva of Tel Aviv University in Israel created an algorithm called MDE-HYB. It integrates two forms of information: overall data about the quality of an expert's past decisions and more detailed evaluations of specific cases.
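
The paper's exact method is more sophisticated, but the hybrid idea can be sketched simply: score an expert against the few cases with verified outcomes, score the rest against a model's predicted ("noisy") labels, and blend the two estimates. Everything below, including the function name and the `weight` knob, is illustrative rather than the authors' implementation.

```python
def estimate_quality_hybrid(expert_labels, true_labels, proxy_labels,
                            known_idx, weight=0.5):
    """Blend two accuracy estimates for one expert.

    - Ground-truth estimate: agreement with verified outcomes on the
      small set of cases in `known_idx`.
    - Noisy estimate: agreement with model-predicted labels on all
      remaining cases.
    `weight` (hypothetical) sets how much to trust the ground-truth part.
    """
    known = set(known_idx)
    gt_hits = [expert_labels[i] == true_labels[i] for i in known]
    proxy_hits = [expert_labels[i] == proxy_labels[i]
                  for i in range(len(expert_labels)) if i not in known]
    gt_acc = sum(gt_hits) / len(gt_hits) if gt_hits else 0.0
    proxy_acc = sum(proxy_hits) / len(proxy_hits) if proxy_hits else 0.0
    return weight * gt_acc + (1 - weight) * proxy_acc


# Toy usage: six decisions, ground truth known only for the first three.
score = estimate_quality_hybrid(
    expert_labels=[1, 0, 1, 1, 0, 1],
    true_labels=[1, 0, 0, 1, 1, 1],
    proxy_labels=[1, 1, 1, 0, 0, 1],
    known_idx=[0, 1, 2],
)  # ≈ 0.67
```

The appeal of blending is that verified outcomes are scarce and expensive, while model predictions are plentiful but imperfect; a combined estimate can outperform either source alone.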

They then compared MDE-HYB's results with those of other evaluators: three alternative algorithms and 40 human reviewers. To test the flexibility of MDE-HYB's ratings, they analyzed datasets with varying levels of predictability, from movie reviews with roughly 80% prediction accuracy to tax audits with lower predictive power.

In each case, evaluators judged prior decisions made by experts about the data, such as whether they accurately classified movie reviews as positive or negative. For all three sets, MDE-HYB equaled or bested all challengers.

Its error rates were 95% lower than those of the other algorithms, and up to 72% lower than those of the human reviewers.
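
Figures like "95% lower" are relative error reductions, computed from a baseline error rate and the new method's error rate. The numbers below are made up for illustration, not taken from the study's data:

```python
def relative_error_reduction(baseline_error, new_error):
    """Fraction by which new_error improves on baseline_error."""
    return (baseline_error - new_error) / baseline_error


# Illustrative: a baseline that errs 20% of the time vs. a method
# that errs 1% of the time is a 95% relative reduction.
relative_error_reduction(0.20, 0.01)  # 0.95
```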

The researchers also tested MDE-HYB on Saar-Tsechansky's original concern: selecting a doctor based on the doctor's history of correct diagnoses. MDE-HYB dropped the average misdiagnosis rate by 41% compared to doctors chosen by another algorithm.
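
Once each expert has an estimated accuracy, the selection step itself is straightforward: pick the expert with the best estimate. A minimal sketch, with hypothetical names and scores standing in for MDE-HYB's actual output:

```python
def pick_expert(estimated_accuracy):
    """Return the expert with the highest estimated decision accuracy.

    `estimated_accuracy` maps expert IDs to scores in [0, 1];
    these would come from a quality-estimation step such as MDE-HYB.
    """
    return max(estimated_accuracy, key=estimated_accuracy.get)


# Hypothetical scores for three doctors.
scores = {"dr_a": 0.91, "dr_b": 0.84, "dr_c": 0.88}
pick_expert(scores)  # "dr_a"
```

In the study's experiment, routing cases this way, rather than by a competing algorithm's estimates, is what produced the 41% drop in average misdiagnosis rate.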

In real-world use, such a difference could translate to better patient outcomes and lower costs, she says. However, the study noted that MDE-HYB may face challenges when ground truth labels are extremely costly or scarce, potentially limiting its scalability in some high-stakes domains.

She cautions that MDE-HYB needs more work before it can be used in such practical applications. "The main purpose of this paper was to get this idea out there, to get people to think about it, and hopefully, people will improve this method," she says.

However, she hopes it can one day help managers and regulators monitor expert workers' accuracy and decide when to intervene if improvement is needed. It might also help consumers choose service providers such as doctors.

"In every profession where people make these types of decisions, it would be valuable to assess the quality of decision-making," Saar-Tsechansky says. "I don't think that any of us should be off the hook, especially if we make consequential decisions."

Future research aims to refine the method's ability to handle scarce ground truth by leveraging active learning techniques and enhancing its adaptability across organizational settings.

"A Machine Learning Framework for Assessing Experts' Decision Quality" is published in the journal Management Science.


