Machine Learning Matches Human Perception in Cross-linguistic Sound Classification

In a recent publication in the journal Scientific Reports, researchers evaluated how closely three machine learning algorithms, linear discriminant analysis (LDA), decision trees (C5.0), and neural networks (NNET), align with human speech perception. The algorithms were trained on first-language (L1) vowel formants and durations and then tested on second-language (L2) vowels, while adult L2 speakers completed an equivalent perceptual classification task.

Study: Machine Learning Matches Human Perception in Cross-linguistic Sound Classification. Image credit: Aree_S/Shutterstock

Background

In recent years, machine learning has been applied to predicting nonnative speech perception patterns. This work builds on the assumption, made directly or indirectly by various speech perception models, that acoustic and phonetic similarity between first-language (L1) and second-language (L2) sounds predicts how L2 sounds are perceived. For instance, the author's previous study successfully used machine learning to predict the classification of English /ɪ/ and /iː/ in terms of Cypriot Greek /i/. LDA has likewise been employed to predict nonnative sound mappings: Gilichinskaya and Strange estimated the assimilation of American English vowels into Russian listeners' L1 vowel categories and found that LDA predicted this assimilation effectively.

Acoustic analysis and methodologies

Experimental protocols received approval from the Ethics Committee of the Department of Languages and Literature at the University of Nicosia. All methods adhered to the ethical standards outlined in the Declaration of Helsinki and its subsequent amendments. Participation was entirely voluntary, and participants could withdraw at any time. Data were kept confidential, participant identities were anonymized using codes, and every participant gave informed consent.

A formant is a concentration of acoustic energy around a particular frequency in the speech wave. For speech feature extraction, the training data comprised F1, F2, and F3 formant frequencies and duration measurements of the Cypriot Greek vowels /i e a o u/ produced by 22 adults. The equivalent test data comprised the Standard Southern British English vowels /ɪ iː e ɜː æ ɑː ʌ ɒ ɔː uː ʊ/ produced by 20 English-speaking adults (10 female). Speakers were encouraged to speak naturally, and recordings were made at a sampling rate of 44.1 kHz.

The study also aimed to assess the classifiers' ability to generalize across phonetic contexts. Acoustic analysis was conducted in Praat with a pre-emphasis of 50 Hz, a window length of 0.025 s, and a spectrogram view range of 5500 Hz. Formant frequencies were extracted at designated vowel analysis points, and vocalic duration was measured manually.
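The paper reports these Praat settings but not the extraction script itself. A minimal sketch of equivalent measurements in Python, assuming the parselmouth interface to Praat, a hypothetical recording, and manually marked vowel boundaries, might look like this:

```python
import parselmouth  # Python interface to Praat (assumed installed: pip install praat-parselmouth)

# Hypothetical file and vowel boundaries; the study's recordings are not public.
sound = parselmouth.Sound("speaker01_heed.wav")
vowel_start, vowel_end = 0.210, 0.345          # seconds, marked manually as in the study

# Formant tracking with settings close to those reported: 50 Hz pre-emphasis and a
# 0.025 s analysis window; a 5500 Hz formant ceiling is assumed here to match the
# reported 5500 Hz spectrogram range.
formants = sound.to_formant_burg(
    max_number_of_formants=5,
    maximum_formant=5500,
    window_length=0.025,
    pre_emphasis_from=50,
)

midpoint = (vowel_start + vowel_end) / 2        # one possible analysis point
f1, f2, f3 = (formants.get_value_at_time(n, midpoint) for n in (1, 2, 3))
duration_ms = (vowel_end - vowel_start) * 1000

print(f"F1={f1:.0f} Hz, F2={f2:.0f} Hz, F3={f3:.0f} Hz, duration={duration_ms:.0f} ms")
```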

Machine learning employed three algorithms, LDA, C5.0, and NNET, to predict the classification of L2 sounds relative to L1 phonetic categories. The models were trained in R with cross-validation, and model optimization was carried out under the same cross-validation scheme. LDA achieved a cross-validated prediction accuracy of 0.94 (94 percent), while C5.0 and NNET each reached 0.95 (95 percent).
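The models themselves were built in R and are not detailed further in the article. As a rough Python analogue, the sketch below uses scikit-learn stand-ins: LinearDiscriminantAnalysis for LDA, a CART decision tree in place of C5.0, and a small multilayer perceptron in place of NNET, with cross-validated accuracy on the L1 data. The file and column names are hypothetical.

```python
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier        # stand-in for C5.0 (not the same algorithm)
from sklearn.neural_network import MLPClassifier       # stand-in for NNET
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Hypothetical data files: L1 (Cypriot Greek) tokens for training, L2 (English) tokens for testing.
l1 = pd.read_csv("cypriot_greek_vowels.csv")   # assumed columns: F1, F2, F3, duration, vowel
l2 = pd.read_csv("ssbe_vowels.csv")

features = ["F1", "F2", "F3", "duration"]
X_l1, y_l1 = l1[features], l1["vowel"]

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "Tree": DecisionTreeClassifier(random_state=0),
    "MLP": make_pipeline(StandardScaler(), MLPClassifier(max_iter=2000, random_state=0)),
}

for name, model in models.items():
    # Cross-validated accuracy on the L1 training data (the study reports roughly 0.94-0.95).
    scores = cross_val_score(model, X_l1, y_l1, cv=10)
    print(f"{name}: mean CV accuracy = {scores.mean():.2f}")

    # Fit on all L1 data, then map each L2 English vowel token onto an L1 Greek category.
    model.fit(X_l1, y_l1)
    l2[f"pred_{name}"] = model.predict(l2[features])
```

Because the tree and network implementations differ from C5.0 and NNET in R, such a sketch would not reproduce the study's numbers exactly.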

In the perception study, 20 Cypriot Greek speakers (10 female) participated. They reported their daily use of English, and their mean age of onset of English learning was 8.35 years. All had English proficiency at the B2/C1 level and reported normal sensory and cognitive function. The test stimuli consisted of the 11 English monophthongs embedded in /hVd/ words within the carrier phrase "They say <word> now," recorded by two adult female English speakers. Participants completed the classification test individually, clicking on the label that matched the vowel they heard. All gave written informed consent in accordance with the Declaration of Helsinki, and the University of Nicosia Ethics Committee approved the study.

Results and analysis

The machine learning algorithms classified the L2 English vowels in terms of L1 vowel categories. For the responses with the highest proportion, C5.0 and LDA agreed on 100 percent of vowels, while LDA and NNET agreed on 90 percent, as did NNET and C5.0. For the broader range of above-chance responses, C5.0 and LDA agreed on 63.6 percent, NNET and LDA on 72.7 percent, and NNET and C5.0 on 63.6 percent.

In the perceptual test, the L2 speakers classified the English vowels in a similar way. Comparing the machine learning predictions with the human responses, for the responses with the highest proportions, LDA and C5.0 each achieved 90.9 percent prediction accuracy, whereas NNET reached 100 percent, predicting the modal response for every English vowel. For the broader range of above-chance responses, LDA achieved 72.7 percent prediction accuracy, C5.0 45.5 percent, and NNET 81.8 percent.
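The agreement percentages above, both between classifiers and between classifiers and listeners, amount to checking, for each English vowel, whether two sources pick the same most frequent L1 category. A hedged sketch of that comparison, assuming each source has been summarized as a classification matrix with English vowels as rows and Greek response categories as columns:

```python
import pandas as pd

def modal_response(matrix: pd.DataFrame) -> pd.Series:
    """For each L2 (English) vowel (rows), return the L1 (Greek) category
    receiving the highest proportion of responses (columns)."""
    return matrix.idxmax(axis=1)

def percent_agreement(a: pd.DataFrame, b: pd.DataFrame) -> float:
    """Share of English vowels for which two sources pick the same modal L1 category."""
    return (modal_response(a) == modal_response(b)).mean() * 100

# Hypothetical classification matrices (rows: the 11 English vowels, columns: /i e a o u/),
# e.g. lda_matrix from the LDA predictions and human_matrix from the listeners' responses:
# print(f"LDA vs listeners: {percent_agreement(lda_matrix, human_matrix):.1f}% agreement")
```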

The results indicated strong performance by LDA and NNET but poorer performance by C5.0. This aligns with previous findings on LDA's effectiveness in mapping L2 sounds to L1 categories, although its accuracy dropped when predicting the wider range of above-chance responses. LDA, while slightly less accurate than NNET, still performed well, possibly because the relationship between L1 and L2 categories is not strongly nonlinear and because the dataset was relatively small. C5.0 struggled, potentially because of overfitting and its difficulty in handling continuous variables.

Conclusion

In summary, the study assessed whether machine learning algorithms, specifically NNET, C5.0, and LDA, trained on cross-linguistic acoustic data could match the accuracy of human L2 listeners in classifying sounds. NNET and LDA classified L2 sounds into L1 categories accurately, with potential implications for cross-linguistic speech studies, language learning, and speech technology. Future research could explore larger samples and a more diverse set of classifiers.

Journal reference:

Written by

Dr. Sampath Lonka

Dr. Sampath Lonka is a scientific writer based in Bangalore, India, with a strong academic background in Mathematics and extensive experience in content writing. He has a Ph.D. in Mathematics from the University of Hyderabad and is deeply passionate about teaching, writing, and research. Sampath enjoys teaching Mathematics, Statistics, and AI to both undergraduate and postgraduate students. What sets him apart is his unique approach to teaching Mathematics through programming, making the subject more engaging and practical for students.


