01/24/2025 | News release | Distributed by Public on 01/24/2025 14:23
Based on the research of Maytal Saar-Tsechansky
Years ago, as she sat in waiting rooms, Maytal Saar-Tsechansky began to wonder how people chose a good doctor when they had no way of knowing a doctor's track record on accurate diagnoses. Talking to other patients, she found they sometimes based choices on a physician's personality, individual reviews without knowing the physician's overall performance, or even the quality of their office furniture.
"I realized all these signals people are using are just not the right ones," says Saar-Tsechansky, professor of information, risk, and operations management at Texas McCombs. "We were operating in complete darkness, like there's no transparency on these things."
In new research, she uses artificial intelligence to judge the judges: to evaluate the rates at which experts make successful decisions. Her machine learning algorithm can appraise both doctors and other kinds of experts, such as engineers who diagnose mechanical problems, when their success rates are not publicly available or are not scrutinized beyond small groups of peers.
Prior research has reported the diagnostic accuracy of groups of doctors, Saar-Tsechansky says, but not practical ways to assess the accuracy of individual doctors, particularly methods that can scale up to monitor their performance continuously. Such monitoring could help doctors improve while helping managers assign them to the right tasks.
More effective methods are critical today, she adds, as medical systems deploy AI to help with diagnoses. If observers can't tell how accurate a doctor was without AI assistance, it will be difficult to determine whether the AI is helping or hurting diagnostic accuracy.
Evaluating the Experts
With McCombs doctoral student Wanxue Dong and Tomer Geva of Tel Aviv University in Israel, Saar-Tsechansky created an algorithm they call MDE-HYB. It integrates various sets of information about experts' past cases to evaluate the quality of their decisions.
They then compared MDE-HYB's results with those of the best existing methods for assessing the accuracy of decision-makers' judgments. To test how flexible MDE-HYB's evaluations are, they analyzed very different kinds of data.
For all contexts, MDE-HYB equaled or bested all challengers. Against other algorithms, its error rates were up to 95% lower.
The researchers also tested MDE-HYB on Saar-Tsechansky's original concern: selecting a doctor based on the doctor's history of correct diagnoses. Compared with another algorithm, MDE-HYB's assessment of misdiagnosis rates was 41% more accurate.
In real-world use, such a difference could translate to better patient outcomes and lower costs, she says.
As a next step, she hopes others will apply, evaluate, and build upon MDE-HYB in real-world settings. Like any tool, she says, it has limitations, and its accuracy may vary across different contexts.
But she hopes it can one day allow experts to assess their own performance. It might also help managers and regulators ensure that an expert's accuracy is acceptable, or help them design an intervention to improve it. Additionally, it might help consumers choose service providers such as doctors.
"Ultimately, the ability to assess decision-making quality is valuable across all professions where consequential choices are made," Saar-Tsechansky says. "While respecting professional expertise, it's important to recognize that thoughtful evaluation can benefit both practitioners and those they serve."
"A Machine Learning Framework for Assessing Experts' Decision Quality" is published in Management Science.
Story by Omar L. Gallaga