Microsoft Boosts AI Speech Recognition Accuracy for Non-Standard English Speakers

By training AI on real-world speech diversity, Microsoft’s Azure AI Speech platform is breaking barriers in voice recognition, making technology more accessible for individuals with atypical speech patterns.

Image Credit: Digitala World / Shutterstock

Microsoft's Azure AI Speech platform achieved "significant improvements" in recognizing non-standard English speech thanks to recordings and transcripts from University of Illinois Urbana-Champaign Speech Accessibility Project participants. Its accuracy gains range from 18% to 60%, depending on the speaker's disability.

The changes are currently rolling out on Microsoft's Azure cloud endpoint for third-party customers.
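For third-party developers, that endpoint is typically reached through the Azure Speech SDK. The sketch below is only a minimal illustration of that path, assuming Python, the azure-cognitiveservices-speech package, and placeholder values for the subscription key, region, and audio file; it is not drawn from Microsoft's announcement.

```python
# Minimal sketch: transcribing one utterance with Azure AI Speech.
# "YOUR_KEY", "YOUR_REGION", and "sample.wav" are placeholders.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
audio_config = speechsdk.audio.AudioConfig(filename="sample.wav")
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config,
                                        audio_config=audio_config)

result = recognizer.recognize_once()  # recognize a single utterance
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Transcript:", result.text)
else:
    print("Recognition failed:", result.reason)
```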

Until now, most voice recognition technology has been trained on recordings and transcriptions of audiobooks. However, an audiobook narrator and an individual with aphasia after a stroke sound very different.

When the Speech Accessibility Project began, the largest database of atypical speech included 16 people with cerebral palsy. That database was also created at the University of Illinois, by Mark Hasegawa-Johnson, a professor of electrical engineering who leads the Project. The Project currently has about 1,500 participants, and Microsoft is a member of the coalition that funds it.

"Accessibility is a core value for Microsoft," said Aadhrik Kuila, a product manager at Microsoft working to integrate the Speech Accessibility Project data into Azure's Speech service. "These improvements are a testament to our commitment to building technologies that empower everyone, including people with non-standard speech. This collaboration not only enhances accessibility but also sets a benchmark for how industry and academia can work together to drive meaningful societal impact."

The Speech Accessibility Project records people with diverse speech patterns to improve voice recognition technology. The project is currently recruiting U.S., Canadian, and Puerto Rican adults with amyotrophic lateral sclerosis, cerebral palsy, Down syndrome, Parkinson's disease, and those who have had a stroke.

So, how do the project's recordings help improve speech recognition tools?

Think of the engineers training an artificial intelligence model as math teachers who have a pool of math problems (in this case, a training set made up of voice recordings). The engineers teach the computer how to solve the math problems by providing the answers (exact transcriptions of what the recordings say). They also set aside several math problems for a test at the end of the unit. The test set's problems are similar but new, so the engineers can see what the model learned.
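In code terms, the analogy corresponds to splitting labeled recordings into a training set and a held-out test set. The sketch below is purely illustrative, with invented file names and transcripts and scikit-learn's generic split helper; it is not Microsoft's pipeline.

```python
# Illustrative only: splitting (recording, transcript) pairs into
# training data ("problems with answers") and a held-out test set.
from sklearn.model_selection import train_test_split

samples = [
    ("clip_001.wav", "turn on the living room lights"),
    ("clip_002.wav", "set a timer for ten minutes"),
    ("clip_003.wav", "call my daughter"),
    ("clip_004.wav", "play the news"),
]

# The model sees the training pairs (audio plus exact transcript) while learning;
# the test pairs stay hidden until evaluation, like end-of-unit problems.
train_pairs, test_pairs = train_test_split(samples, test_size=0.25, random_state=0)

print(len(train_pairs), "training examples,", len(test_pairs), "held out for testing")
```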

"We also compare these results with our current production model to quantify the gains," Kuila said. "Importantly, we run our standard test sets focused on typical speech to ensure that incorporating (project) data doesn't cause regressions."

Microsoft is committed to enhancing the accessibility of AI systems by integrating disability-representative data into the development process.

"This iterative process allows us to fine-tune training parameters to strike the best balance," Kuila said, "improving performance for non-standard speech while maintaining or slightly enhancing accuracy for typical speech."

Hasegawa-Johnson said he's thrilled that Microsoft is already seeing improvements.

"It's the first result we've heard of a company running against production data and seeing significant improvements," he said. "It's exciting to see that a year and a half into the project, we're having an impact."

Other coalition members include Amazon, Apple, Google, and Meta. The project's data is first shared with coalition members and then made available to companies, universities, and researchers who agree to the data use agreement.
