In a paper published in the journal Scientific Reports, researchers investigated the impact of audio quality, especially in video conferencing (Zoom), on human voice recognition. They found that Zoom and studio-quality audio performed similarly, surpassing telephone audio. Interestingly, participants familiarized with Zoom audio demonstrated a trend towards improved recognition, highlighting the potential advantages of Zoom's speech coding mechanisms.
Previous Research on Voice Recognition
Past work on voice recognition has predominantly examined performance under traditional telephone and studio audio conditions. Standard landline and mobile telephony have been studied, with specific attention given to the limitations of bandwidth and transmission characteristics in these contexts. The dominant finding has been that voice recognition is more challenging under telephone audio than under studio-quality audio.
However, results have been mixed, with some studies revealing no significant differences between telephone and studio audio recognition, highlighting the need for further investigation into these factors.
Study Participants and Materials
Speakers: Nine female native speakers of Zurich German, aged between 22 and 27, participated in the study. They were all Zurich natives, spoke Zurich German daily, and had no reported speech, language, or hearing impairments.
Listeners: 63 native Swiss German speakers (18 male) participated in the study, divided into three groups: studio familiarization, Zoom familiarization, and telephone familiarization. These listeners were between 18 and 35 years old, born and raised in Switzerland, and had no reported speech, language, or hearing impairments. They were recruited from the student population of the University of Zurich.
Studio-quality stimuli: The nine female speakers were each recorded reading 75 sentences in Swiss Standard German in a controlled acoustic environment. The sentences covered various structures and were semantically unpredictable.
Zoom-quality stimuli: Audio with a bandwidth of 12 kHz was generated by playing the studio-quality stimuli over a Zoom call and recording them locally.
Telephone-quality stimuli: These were obtained by playing the studio-quality stimuli on one mobile phone and recording them on another during a phone call. The call used VoIP technology, resulting in a bandwidth of 8 kHz, wider than that of conventional narrowband telephony. All stimuli, totalling 2025 sentences, were normalized to 70 dB SPL.
Procedure: The experiment had two parts: familiarization and a voice recognition test (old–new judgment). Listeners were randomly assigned to one of the three familiarization groups. During familiarization, they were exposed to four female voices, hearing five sentences per voice for a total of 20 sentences. In the subsequent voice recognition test, listeners completed 96 trials, each presenting a sentence from a familiar or unfamiliar voice, with the trials covering all three audio conditions (studio, Zoom, telephone).
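The counts quoted in the materials and procedure can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the text. The variable names are illustrative, and it assumes that each speaker read all 75 sentences and that every recording exists in each of the three audio conditions.

```python
# Figures taken from the study description; variable names are illustrative.
n_speakers = 9
sentences_per_speaker = 75
audio_conditions = ["studio", "Zoom", "telephone"]

# Assumption: every speaker-sentence recording exists in every audio condition.
total_stimuli = n_speakers * sentences_per_speaker * len(audio_conditions)

# Familiarization: four voices, five sentences per voice.
familiarization_voices = 4
sentences_per_voice = 5
familiarization_sentences = familiarization_voices * sentences_per_voice

print(total_stimuli)              # 2025
print(familiarization_sentences)  # 20
```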
Statistical Analyses: Researchers conducted the analysis using signal detection theory (SDT), which measures sensitivity (dʹ) and response bias (c). Sensitivity measures the ability to distinguish familiar voices (signal) from unfamiliar ones (noise), while response bias quantifies the tendency to favor one response over the other. Statistical analysis involved a two-way mixed ANOVA with factors for familiarization audio and test audio, performed on dʹ and c values. Researchers carried out the analysis in R version 4.0.3.
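For readers unfamiliar with signal detection theory, the two measures can be computed from the four outcome counts of an old–new task. The following is a minimal sketch, not the authors' analysis code (which was written in R). The function name, the example counts, and the log-linear correction are assumptions made for illustration. It uses the standard definitions dʹ = z(hit rate) − z(false-alarm rate) and c = −(z(hit rate) + z(false-alarm rate)) / 2, where a hit is a familiar voice judged "old" and a false alarm is an unfamiliar voice judged "old".

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, c) for an old-new recognition task.

    Hypothetical helper for illustration; not taken from the study.
    """
    # Log-linear correction (an assumption here, not stated in the article):
    # adding 0.5 to each count keeps rates away from 0 and 1, where the
    # z-transform would be infinite.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(fa_rate)

    d_prime = z_hit - z_fa        # sensitivity
    c = -0.5 * (z_hit + z_fa)     # response bias
    return d_prime, c


# Made-up counts for one listener with 48 "old" and 48 "new" trials.
balanced = sdt_measures(hits=40, misses=8, false_alarms=8, correct_rejections=40)
cautious = sdt_measures(hits=10, misses=38, false_alarms=2, correct_rejections=46)
print(balanced)
print(cautious)
```

Under this sign convention, a positive c indicates a tendency to respond "new" and a negative c a tendency to respond "old". In the made-up examples, the first listener has no bias either way, while the second rarely responds "old" and therefore has a positive c.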
Sensitivity (dʹ) and Response Bias (c) Analyses
Sensitivity (dʹ): Researchers first examined the interaction between familiarization audio and test audio on dʹ, to check whether the audio quality heard during familiarization influenced how listeners performed under each test audio condition.
The results showed that the interaction was not significant, suggesting that listeners' performance in different audio conditions during the voice recognition test was not affected by the audio quality during familiarization. Consequently, researchers analyzed the main effects of familiarization audio and test audio individually. The analysis revealed a significant effect of familiarization audio on dʹ, indicating differences in overall voice recognition performance among listener groups, regardless of the audio quality during the test. Post hoc comparisons revealed that listeners familiarized with the voices via Zoom audio outperformed those familiarized via telephone audio.
Additionally, there was a notable trend of improved performance for Zoom-familiarized listeners compared to studio-familiarized listeners. Researchers did not observe a significant difference between listeners familiarized via studio audio and those familiarized via telephone audio. The main effect of test audio on dʹ was also significant. Post hoc comparisons showed that listeners performed significantly better with Zoom audio and studio audio in the test than with telephone audio. Researchers found no significant difference in listeners' performance between Zoom and studio audio during the test.
Response bias (c): When examining the interaction between familiarization audio and test audio on response bias (c), researchers found a significant interaction, indicating that listeners' response bias across different audio qualities in the test varied depending on the audio quality during familiarization. Post hoc comparisons revealed that listeners familiarized with voices via telephone audio displayed a significant bias towards responding 'new' when they heard Zoom or studio audio compared to telephone-quality stimuli.
Listeners familiarized with voices via Zoom and studio audio showed no significant differences in response bias. Listeners familiarized with voices via telephone audio displayed no significant difference in bias between studio and Zoom audio in the test, indicating that they perceived studio and Zoom audio as equally distinct from telephone audio quality.
Conclusion
To sum up, this study revealed that familiarization via Zoom audio did not impair voice recognition performance in the subsequent test. Listeners familiarized with voices via Zoom audio even tended to perform better in recognition, emphasizing its potential advantages.
The study also demonstrated that voices presented through Zoom or studio audio were recognized significantly better than voices presented through telephone audio. These findings underscore the importance of signal bandwidth in voice recognition. Response bias also played a role, particularly for telephone-familiarized listeners, whose bias varied with the audio quality heard in the test.
Overall, this research sheds light on the relationship between audio quality, familiarity, and response bias in voice recognition, opening avenues for further exploration.