Impact of Speech Pathology on DL-Based Automatic Speaker Verification Systems

In an article published in the journal Scientific Reports, researchers explored the interaction between speech pathology and the performance of deep learning-based automatic speaker verification (ASV) systems. The research focuses on how different speech disorders affect the accuracy of ASV systems and on identifying where in the ASV pipeline those effects arise.

Study: Exploring the Impact of Speech Pathology on DL-Based Automatic Speaker Verification Systems. Image credit: Panchenko Vladimir/Shutterstock.

By examining these effects closely, the research identifies potential weaknesses in such systems and improves our understanding of speaker identification across varied conditions. The researchers used a real-world dataset of approximately 200 hours of healthy and pathological recordings from both adults and children.
Background

ASV technology plays a crucial role in confirming a speaker's identity through voice analysis. The system works in two phases: enrollment, in which a reference voiceprint is built from a speaker's recordings, and verification, in which a new utterance is compared against that voiceprint. ASV technology is commonly used in security systems and voice-controlled devices.
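
The two-phase flow can be sketched as follows. This is an illustrative toy, not the system studied in the paper: the `embed` function here simply averages feature vectors, standing in for a real speaker-embedding network, and the similarity threshold is arbitrary.

```python
import numpy as np

def embed(recordings):
    """Placeholder for a speaker-embedding model (a real ASV system
    would use a trained neural network here). Averages feature vectors."""
    return np.mean(recordings, axis=0)

def enroll(recordings):
    """Enrollment phase: build a unit-length reference voiceprint
    from several recordings of the claimed speaker."""
    v = embed(recordings)
    return v / np.linalg.norm(v)

def verify(voiceprint, utterance, threshold=0.8):
    """Verification phase: score a new utterance against the
    enrolled voiceprint via cosine similarity."""
    e = utterance / np.linalg.norm(utterance)
    score = float(np.dot(voiceprint, e))
    return score >= threshold, score
```

An utterance close to the enrolled voiceprint is accepted; a dissimilar one is rejected.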

In ASV, deep learning plays a crucial role, enabling high-level feature learning from speech signals and improving accuracy and robustness. However, concerns arise about the efficacy of ASV for individuals with speech pathology, because these systems are trained primarily on datasets of healthy speakers. Addressing this gap, the present research examines how speech disorders influence the outcomes of deep learning-based ASV systems, thoroughly exploring the relationship between speech distortion and ASV accuracy.

About the Research

The researchers set out to answer the question: does pathological speech, when examined as a biomarker, increase susceptibility to re-identification attacks compared with healthy speech? To answer it, they used a comprehensive real-world pathological speech dataset of 3800 test subjects spanning different age groups and various speech disorders. The dataset contains recordings of German speakers reading phonetically rich text or naming pictograms.

The researchers analyzed both healthy recordings and recordings covering a diverse range of speech pathologies, including dysglossia, dysarthria, dysphonia, and cleft lip and palate (CLP), from adults and children. Twenty repeated experiments were conducted to address potential biases, with matched age distributions and speaker numbers ensuring fair comparisons among groups.
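
A repeated-experiment protocol of this kind can be sketched as below. This is a simplified illustration, not the authors' code: it draws equally sized random speaker subsets from each group per run, and omits the age-distribution matching the study also applied.

```python
import random

def repeated_balanced_trials(healthy_ids, patho_ids,
                             n_repeats=20, n_speakers=50, seed=0):
    """Draw equally sized, independently resampled speaker subsets
    from the healthy and pathological groups for each repeat, so
    that per-group results can be averaged over many random splits."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    trials = []
    for _ in range(n_repeats):
        trials.append({
            "healthy": rng.sample(healthy_ids, n_speakers),
            "pathological": rng.sample(patho_ids, n_speakers),
        })
    return trials
```

Averaging a metric such as the equal error rate over these repeats reduces the influence of any single unlucky speaker split.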

The researchers employed a deep neural network model, specifically the Generalized End-to-End (GE2E) text-independent speaker verification (TISV) model, to investigate the interplay between speech disorders and ASV accuracy. They used the dataset of pathological and healthy speech to train and evaluate speaker verification models based on recurrent neural networks.
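
The core idea of GE2E training is a loss that pulls each utterance embedding toward its own speaker's centroid and away from other speakers' centroids. The NumPy sketch below illustrates the GE2E softmax loss on a batch of embeddings; it is a didactic reimplementation, not the paper's code, and the scale and bias values are illustrative defaults.

```python
import numpy as np

def ge2e_softmax_loss(emb, w=10.0, b=-5.0):
    """GE2E softmax loss for a batch of shape (N speakers, M utterances, D).
    Embeddings are L2-normalized internally; w and b scale the
    cosine similarities before the softmax."""
    N, M, _ = emb.shape
    centroids = emb.mean(axis=1)  # one centroid per speaker, shape (N, D)
    loss = 0.0
    for j in range(N):
        for i in range(M):
            e = emb[j, i]
            sims = np.empty(N)
            for k in range(N):
                if k == j:
                    # exclude the utterance itself from its own centroid
                    c = (emb[j].sum(axis=0) - e) / (M - 1)
                else:
                    c = centroids[k]
                c = c / np.linalg.norm(c)
                sims[k] = w * np.dot(e / np.linalg.norm(e), c) + b
            # softmax cross-entropy against the true speaker index j
            loss += -sims[j] + np.log(np.exp(sims).sum())
    return loss / (N * M)
```

Batches whose utterances cluster tightly around their own speaker's centroid get a lower loss than batches where speakers' embeddings are mixed.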

The study indicates that several factors, including the age of subjects, recording quality, microphone type, background noise, and speech intelligibility, can impact speaker verification accuracy in the ASV system.

Research Findings

The findings show that pathological speech significantly affects ASV speaker verification performance, and that different speech pathologies affect it differently. The study reports a low mean equal error rate (EER) of 0.89% across the entire pathological dataset, lower than values commonly reported for non-pathological datasets. In other words, speakers with pathological speech, such as adults with voice problems (dysphonia) and children with CLP, were easier to identify than adults and children with typical speech.
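
The EER is the operating point where the false-acceptance rate (impostors wrongly accepted) equals the false-rejection rate (genuine speakers wrongly rejected); lower values mean easier identification. A minimal sketch of computing it from trial scores, assuming simple score arrays rather than any particular toolkit:

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Find the threshold where false-acceptance rate (FAR) and
    false-rejection rate (FRR) are closest, and return their mean.
    genuine: scores of same-speaker trials; impostor: scores of
    different-speaker trials."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best_gap, eer = np.inf, None
    for t in thresholds:
        far = np.mean(impostor >= t)  # impostors scoring above threshold
        frr = np.mean(genuine < t)    # genuine trials scoring below it
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

Perfectly separated genuine and impostor scores give an EER of 0%; the 0.89% reported in the study indicates near-perfect separation on the pathological dataset.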

The results also reveal that speech intelligibility does not influence ASV speaker verification performance, suggesting that these systems can operate effectively even when speech is not clearly understood. Increasing the training dataset size also improved verification performance: more data helps the neural network learn better speaker representations, reducing the error rate.

Applications

This research can be applied in fields such as healthcare, voice-controlled devices, access control, forensic investigation, and telecommunications. In healthcare, speech is used as a biomarker for the diagnosis, therapy, screening, and monitoring of speech and voice disorders. Beyond healthcare, the findings can strengthen the security and reliability of biometric authentication applications that use voice, including access control, banking, e-commerce, voice-controlled devices, forensic investigation, and telecommunications.

Conclusion

In conclusion, this paper presents a comprehensive study of the effect of speech pathology on speaker verification using deep learning-based ASV systems, drawing on a large-scale dataset of pathological and healthy speech. According to the researchers, pathological speech influences speaker verification performance in different ways depending on the type of pathology, the recording environment, the diversity of the recorded speech data, and the dataset size.

The study findings show that speech intelligibility does not affect ASV speaker verification performance. The paper concludes by highlighting the importance and challenges of speech pathology in speaker verification and suggests directions for future work, such as extending the dataset, developing anonymization techniques, and examining individual-level differences.

Written by

Muhammad Osama

Muhammad Osama is a full-time data analytics consultant and freelance technical writer based in Delhi, India. He specializes in transforming complex technical concepts into accessible content. He has a Bachelor of Technology in Mechanical Engineering with specialization in AI & Robotics from Galgotias University, India, and he has extensive experience in technical content writing, data science and analytics, and artificial intelligence.

Citations

Osama, Muhammad. (2023, December 14). Impact of Speech Pathology on DL-Based Automatic Speaker Verification Systems. AZoAi. Retrieved on July 04, 2024 from https://www.azoai.com/news/20231126/Impact-of-Speech-Pathology-on-DL-Based-Automatic-Speaker-Verification-Systems.aspx.

