Impact of Audio Quality, Familiarity, and Response Bias in Voice Recognition on Zoom

By Dr Silpaja Chandrasekar, PhD
Reviewed by Susha Cheriyedath, M.Sc.
Nov 3 2023

In a paper published in the journal Scientific Reports, researchers investigated how audio quality, particularly that of video conferencing (Zoom), affects human voice recognition. They found that listeners recognized voices about equally well in Zoom and studio-quality audio, and better in both than in telephone audio. Interestingly, participants familiarized with the voices via Zoom audio showed a trend towards improved recognition, highlighting the potential advantages of Zoom's speech coding mechanisms.

*Study: Impact of Audio Quality, Familiarity, and Response Bias in Voice Recognition on Zoom. Image credit: Generated using DALL·E 3*

Previous Research on Voice Recognition

Past work on voice recognition has predominantly examined traditional telephone and studio audio conditions. Standard landline and mobile telephony have been studied, with specific attention given to the bandwidth limitations and transmission characteristics of these channels. The dominant finding has been that voice recognition is more challenging under telephone audio than under studio-quality audio.

However, results have been mixed, with some studies revealing no significant differences between telephone and studio audio recognition, highlighting the need for further investigation into these factors.

Study Participants and Materials

Speakers: Nine female native speakers of Zurich German, aged between 22 and 27, participated in the study. They were all Zurich natives, spoke Zurich German daily, and had no reported speech, language, or hearing impairments.

Listeners: 63 native Swiss German speakers (18 male) participated in the study, divided into three groups: studio familiarization, Zoom familiarization, and telephone familiarization. These listeners were between 18 and 35 years old, born and raised in Switzerland, and had no reported speech, language, or hearing impairments. The University of Zurich recruited them from its student population.

Studio-quality stimuli: The nine female speakers were recorded reading 75 sentences each in Swiss Standard German in a controlled acoustic environment. The sentences varied in structure and were semantically unpredictable.

Zoom-quality stimuli: Audio with a bandwidth of 12 kHz was generated by playing the studio-quality stimuli over a Zoom call and recording them locally.

Telephone-quality stimuli: These were obtained by playing the studio-quality stimuli on one mobile phone and recording them on another during a phone call. The call used VoIP technology, resulting in a bandwidth of 8 kHz, wider than that of traditional narrowband telephony. All stimuli, totalling 2025 sentences, were normalized to 70 dB SPL.
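The stimulus total can be checked with a few lines of arithmetic. The sketch below assumes that each of the nine speakers contributed 75 sentences and that every sentence exists once per audio condition; the variable names are illustrative and do not come from the paper.

```python
# Counts taken from the description above; names are illustrative only.
N_SPEAKERS = 9
SENTENCES_PER_SPEAKER = 75
AUDIO_CONDITIONS = ("studio", "zoom", "telephone")

# Studio recordings: one per speaker per sentence.
studio_recordings = N_SPEAKERS * SENTENCES_PER_SPEAKER

# Each studio recording was re-recorded over Zoom and over a phone call,
# so every audio condition holds the same number of stimuli.
total_stimuli = studio_recordings * len(AUDIO_CONDITIONS)

print(f"{studio_recordings} recordings per condition, {total_stimuli} in total")
```

The result matches the 2025 sentences reported for the full stimulus set.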

Procedure: The experiment had two parts: familiarization and a voice recognition test (old–new judgment). Listeners were randomly assigned to one of the three familiarization groups. During familiarization, they were exposed to four female voices, hearing five sentences per voice for a total of 20 sentences. In the subsequent voice recognition test, listeners completed 96 trials, each containing a sentence from a familiarized or an unfamiliar voice, presented across all three audio conditions (studio, Zoom, telephone).
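Random assignment to familiarization groups might be sketched as follows. This is only an illustration: the article does not say how the assignment was carried out or whether the groups were equal in size, so the balanced round-robin deal, the listener IDs, the fixed seed, and the function name `assign_groups` are all assumptions.

```python
import random
from collections import Counter

GROUPS = ("studio", "zoom", "telephone")

def assign_groups(listener_ids, groups=GROUPS, seed=0):
    """Randomly assign listeners to familiarization groups (hypothetical helper).

    Listeners are shuffled and then dealt round-robin, so group sizes
    differ by at most one. Balanced groups are an assumption here.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    shuffled = list(listener_ids)
    rng.shuffle(shuffled)
    return {listener: groups[i % len(groups)] for i, listener in enumerate(shuffled)}

# 63 listeners, identified here by made-up integer IDs 1..63.
assignment = assign_groups(range(1, 64))
group_sizes = Counter(assignment.values())
print(dict(group_sizes))
```

With 63 listeners and three groups, this scheme places 21 listeners in each group.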

Statistical Analyses: Researchers conducted the analysis using signal detection theory (SDT), which measures sensitivity (dʹ) and response bias (c). Sensitivity measures the ability to distinguish familiar voices (signal) from unfamiliar ones (noise), while response bias quantifies the tendency to favor one response over the other. Statistical analysis involved a two-way mixed ANOVA with factors for familiarization audio and test audio, performed on dʹ and c values. Researchers carried out the analysis in R version 4.0.3.
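For readers unfamiliar with SDT, the two measures can be computed from a listener's old–new responses roughly as below. This is a minimal sketch using the standard equal-variance formulas, dʹ = z(H) − z(F) and c = −(z(H) + z(F))/2, where H is the hit rate and F the false-alarm rate. The log-linear correction, the function name `sdt_measures`, and the trial counts are assumptions for illustration; the article does not describe the authors' exact computation, which was done in R.

```python
from statistics import NormalDist

def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, c) for an old-new recognition task.

    hits / misses: responses to familiarized ("old") voices.
    false_alarms / correct_rejections: responses to unfamiliar ("new") voices.
    """
    # Log-linear correction (an assumption, not taken from the paper):
    # it keeps the rates away from 0 and 1, where z is infinite.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)           # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # response bias c
    return d_prime, criterion

# Hypothetical listener: 16 old and 16 new trials in one audio condition.
d_prime, c = sdt_measures(hits=12, misses=4, false_alarms=4, correct_rejections=12)
print(f"d' = {d_prime:.2f}, c = {c:.2f}")
```

Under this convention, a positive c indicates a tendency to answer 'new', a negative c a tendency to answer 'old', and a dʹ of zero indicates chance-level discrimination.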

Sensitivity (dʹ) and Response Bias (c) Analyses

Sensitivity (dʹ): Researchers first examined the interaction between familiarization audio and test audio on dʹ, to check whether the audio quality in which listeners had learned the voices influenced how well they recognized them in each test condition.

The interaction was not significant, suggesting that listeners' performance in the different audio conditions of the voice recognition test was not affected by the audio quality during familiarization. Consequently, researchers analyzed the main effects of familiarization audio and test audio separately. The analysis revealed a significant effect of familiarization audio on dʹ, indicating differences in overall voice recognition performance among listener groups, regardless of the audio quality during the test. Post hoc comparisons revealed that listeners familiarized with the voices via Zoom audio outperformed those familiarized via telephone audio.

Additionally, there was a notable trend of improved performance for Zoom-familiarized listeners compared to studio-familiarized listeners. Researchers did not observe a significant difference between listeners familiarized via studio audio and those familiarized via telephone audio. The main effect of test audio on dʹ was also significant. Post hoc comparisons suggested that listeners performed significantly better with Zoom and studio audio in the test than with telephone audio. Researchers found no significant difference in listeners' performance between Zoom and studio audio during the test.

Response bias (c): When examining the interaction between familiarization audio and test audio on response bias (c), researchers found a significant interaction, indicating that listeners' response bias across different audio qualities in the test varied depending on the audio quality during familiarization. Post hoc comparisons revealed that listeners familiarized with voices via telephone audio displayed a significant bias towards responding 'new' when they heard Zoom or studio audio compared to telephone-quality stimuli.

Listeners familiarized with voices via Zoom and studio audio showed no significant differences in response bias. Listeners familiarized via telephone audio showed no significant difference in bias between studio and Zoom audio in the test, suggesting that they perceived both as equally distinct from telephone audio quality.
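What a bias towards 'new' means in practice can be illustrated with the equal-variance Gaussian SDT model, in which the hit rate is Φ(dʹ/2 − c) and the false-alarm rate is Φ(−dʹ/2 − c). The sketch below is not taken from the paper: the dʹ and c values are invented, the function names are hypothetical, and equal numbers of old and new trials are assumed.

```python
from statistics import NormalDist

def predicted_rates(d_prime, c):
    """Hit and false-alarm rates implied by d' and c (equal-variance model)."""
    phi = NormalDist().cdf  # standard normal CDF
    hit_rate = phi(d_prime / 2 - c)
    false_alarm_rate = phi(-d_prime / 2 - c)
    return hit_rate, false_alarm_rate

def proportion_new(d_prime, c):
    """Share of 'new' responses, assuming equal numbers of old and new trials."""
    hit_rate, false_alarm_rate = predicted_rates(d_prime, c)
    return 1 - (hit_rate + false_alarm_rate) / 2

# Same sensitivity, different criterion (both values are made up).
unbiased = proportion_new(d_prime=1.5, c=0.0)
biased_to_new = proportion_new(d_prime=1.5, c=0.5)
print(f"'new' responses: {unbiased:.0%} unbiased vs {biased_to_new:.0%} with c = 0.5")
```

With sensitivity held constant, raising c lowers both the hit rate and the false-alarm rate, so the listener answers 'new' more often without discriminating the voices any better or worse.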

Conclusion

To sum up, this study revealed that Zoom audio quality during familiarization did not compromise voice recognition performance in the subsequent test. Listeners familiarized with voices via Zoom audio even tended to perform better in recognition, emphasizing its potential advantages.

The study also demonstrated that voices presented through Zoom or studio audio were recognized significantly better than voices presented through telephone audio. These findings underscore the importance of signal bandwidth in voice recognition. Response bias also played a role, particularly for telephone-familiarized listeners, whose bias varied with the audio quality of the test stimuli.

In conclusion, this research sheds light on the intricate relationship between audio quality, familiarity, and response bias in voice recognition, opening avenues for further exploration and understanding.

Journal reference:

Perepelytsia, V., & Dellwo, V. (2023). Acoustic compression in Zoom audio does not compromise voice recognition performance. Scientific Reports, 13(1), 18742. https://doi.org/10.1038/s41598-023-45971-x, https://www.nature.com/articles/s41598-023-45971-x

Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.

Citations

Chandrasekar, Silpaja. (2023, November 03). Impact of Audio Quality, Familiarity, and Response Bias in Voice Recognition on Zoom. AZoAi. Retrieved on April 03, 2025 from https://www.azoai.com/news/20231103/Impact-of-Audio-Quality-Familiarity-and-Response-Bias-in-Voice-Recognition-on-Zoom.aspx.