AI-Enhanced Early Detection of Parkinson's Disease Through Vocal Analysis


By Dr. Sampath Lonka. Reviewed by Susha Cheriyedath, M.Sc. Aug 9 2023

Parkinson’s disease (PD) affects the elderly, with symptoms including tremors and speech difficulties. As the aging population grows, early PD detection is essential. In a paper published in the journal Biomimetics, researchers proposed an artificial intelligence (AI) model that combines a transformer-based method with a neural network and outperforms the state-of-the-art (SOTA) method, Gradient-Boosted Decision Trees (GBDTs), by 1% in area under the curve (AUC) while also improving precision and recall.


Background

PD is the second most prevalent age-related neurodegenerative ailment following Alzheimer’s disease, with projections indicating a surge to 135 million cases worldwide by mid-century. The World Health Organization anticipates neurodegenerative conditions, including Parkinson’s and motor-related disorders, to surpass cancer as the second principal cause of death by 2040. PD, characterized by progressive nervous system impairment and movement issues, starts subtly and worsens over time, leading to stiffness and reduced movement speed.

Detecting PD early is crucial for minimizing its impact and expenses. Because rigid muscles hinder speech and writing, researchers have investigated cost-effective screening techniques such as dysphonia measures extracted from voice recordings.

Historically, machine learning (ML) approaches such as support vector machines (SVM), random forest (RF), and logistic regression were applied to smaller dysphonia datasets. Larger datasets have led to using gradient boosting machines (GBMs), particularly GBDTs, as the SOTA method. However, their limitations have inspired the introduction of neural network-based solutions. The researchers introduced the Vocal Tab Transformer network, a neural network architecture surpassing GBDTs in classifying PD.

Related work

Early studies of PD identification emphasized voice as a significant diagnostic attribute. Vocal attributes extracted from sustained phonation, such as harmonic-to-noise ratio, jitter, shimmer, and prosodic features, have been employed in various ML models for PD classification. More recent work introduces novel deep learning and ML approaches for diagnosing PD via speech signal analysis. Hybrid techniques such as SkipConNet with RF combine convolutional neural networks with RF for improved accuracy, while ensemble learning enhances prediction accuracy for the Unified Parkinson’s Disease Rating Scale (UPDRS) using large datasets. Despite the potential of deep learning techniques for predicting PD symptoms, data accessibility and interpretability remain challenges.

Materials and methods

Datasets: Two datasets were employed: the main PD dataset from the UCI Machine Learning Repository, which has enriched features, and a second PD dataset used to test generalizability. The main dataset comprises 188 PD patients (107 men, 81 women) aged 33 to 87 (mean 65.1 ± 10.9) and 64 healthy individuals (23 men, 41 women) aged 41 to 82 (mean 61.1 ± 8.9). Vocal features were derived from the recordings using various signal processing techniques.

Another dataset included 40 subjects, half with PD and half healthy, with six women and 14 men in the PD group and 10 women and 10 men in the non-PD group. Each subject contributed 26 voice recordings of varied types. This dataset had a total of 26 features.

Data pre-processing: The main PD dataset was standardized and then divided into stratified 10-fold training and testing sets. The Adaptive Synthetic (ADASYN) algorithm addressed class imbalance by oversampling the minority class in the training set. The second PD dataset was split in the same way, with ADASYN applied to each fold's training set.
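The splitting step can be illustrated with a minimal standard-library sketch. It is not the authors' pipeline: the function names `stratified_folds` and `standardize` are hypothetical, the labels are toy values that only mirror the class sizes reported above, and fitting the scaling statistics on the training split alone is an assumption made here to avoid leakage. ADASYN itself is not implemented; the comment only marks where oversampling would go.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def standardize(train_values, test_values):
    """Z-score one feature using the mean and std of the training split only."""
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # guard against constant features
    scaled_train = [(v - mu) / sigma for v in train_values]
    scaled_test = [(v - mu) / sigma for v in test_values]
    return scaled_train, scaled_test


# Toy labels mirroring the reported class sizes: 188 PD (1) and 64 healthy (0).
labels = [1] * 188 + [0] * 64
folds = stratified_folds(labels, k=10)

for fold_id, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # Oversampling (e.g. ADASYN) would be applied to train_idx only, never test_idx.
    pd_share = sum(labels[i] for i in test_idx) / len(test_idx)
    print(f"fold {fold_id}: {len(test_idx)} test samples, PD share {pd_share:.2f}")

# Scaling a single made-up feature column with training statistics.
train_z, test_z = standardize([1.0, 2.0, 3.0], [2.0])
```

Each fold keeps roughly the same PD share as the full dataset, which is the point of stratification when one class is about three times larger than the other.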

Methods: The Vocal Tab Transformer incorporates multi-head self-attention and position-wise feed-forward layers and uses vocal features for classification. It is a novel approach that integrates a transformer-based technique with feature selection to enhance accuracy and manage complexity. The procedure involves training XGBoost on the complete dataset, determining feature importance, ranking the features, and selecting the top N features to train the proposed network.
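The rank-and-select step can be sketched as follows. This is only an illustration under stated assumptions: the importance scores are invented placeholders for what a fitted gradient-boosted model would report, the feature names and values are toy data, and `select_top_features` and `keep_columns` are hypothetical helpers rather than anything from the paper.

```python
def select_top_features(importances, n):
    """Return the names of the n highest-scoring features, best first."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n]]


def keep_columns(rows, feature_names, selected):
    """Project each row (a list of values) onto the selected feature columns."""
    positions = [feature_names.index(name) for name in selected]
    return [[row[p] for p in positions] for row in rows]


# Hypothetical importance scores; the numbers are made up for illustration.
feature_names = ["jitter", "shimmer", "hnr", "pitch_mean"]
importances = {"jitter": 0.12, "shimmer": 0.31, "hnr": 0.45, "pitch_mean": 0.12}

selected = select_top_features(importances, n=2)
rows = [[0.01, 0.20, 21.5, 140.0], [0.03, 0.35, 15.2, 118.0]]
reduced = keep_columns(rows, feature_names, selected)

print(selected)  # ['hnr', 'shimmer']
print(reduced)   # [[21.5, 0.2], [15.2, 0.35]]
```

The reduced rows are what would then be passed to the downstream network, so N directly controls the input width and therefore the model's complexity.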

System model: Tree-based models such as RFs and decision trees are frequently used for feature selection. These models place high-performing features closer to the tree's root. GBDTs, which excel on complex and imbalanced datasets, were chosen for the main PD dataset. Empirical observations support the advantage of GBDTs over alternative techniques.

Feature embedding was heterogeneous: categorical and ordinal features were embedded as tokens, while continuous features were linearly projected. A transformer encoder block learned contextual relationships among the features, and a multi-layer perceptron (MLP) head produced the prediction.
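To make the embedding and attention steps concrete, here is a heavily simplified sketch. It assumes a single attention head with identity query, key, and value projections, whereas a real transformer encoder learns those projections and uses several heads plus feed-forward layers. The weights, biases, and feature values are invented, and the function names are hypothetical.

```python
import math


def embed_continuous(values, weights, biases):
    """Turn each scalar feature into a token: token_i = value_i * w_i + b_i."""
    return [
        [v * w + b for w, b in zip(w_vec, b_vec)]
        for v, w_vec, b_vec in zip(values, weights, biases)
    ]


def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product attention with identity Q/K/V projections."""
    dim = len(tokens[0])
    output = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        attn = softmax(scores)
        # Each output token is a weighted average of all input tokens.
        output.append(
            [sum(a * tok[d] for a, tok in zip(attn, tokens)) for d in range(dim)]
        )
    return output


# Three made-up continuous features, each projected to a 2-dimensional token.
values = [0.5, -1.0, 2.0]
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
biases = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

tokens = embed_continuous(values, weights, biases)
contextual = self_attention(tokens)
for before, after in zip(tokens, contextual):
    print(before, "->", [round(x, 3) for x in after])
```

Because every output token mixes information from all feature tokens, the representation of one vocal feature can depend on the values of the others, which is the contextual relationship the encoder is meant to capture.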

The evaluation adopted a 10-fold strategy of training and testing on different subsets and averaging AUC scores. The neural network comprised approximately 14 million trainable parameters. The training and inference times were documented on specific hardware.
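The fold-averaged AUC can be sketched with a small pure-Python example. It computes AUC as the fraction of positive/negative pairs in which the positive sample receives the higher score, which is one standard way to define it. The labels and scores are invented, only two folds are shown for brevity, and `roc_auc` and `mean_auc` are hypothetical names.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mean_auc(fold_results):
    """Average the per-fold AUC over (labels, scores) pairs, one pair per test fold."""
    aucs = [roc_auc(y, s) for y, s in fold_results]
    return sum(aucs) / len(aucs)


# Made-up test-fold predictions; a real run would have ten such folds.
fold_results = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),  # AUC 0.75
    ([1, 0, 1, 0], [0.8, 0.3, 0.7, 0.2]),  # AUC 1.0
]

print(mean_auc(fold_results))  # 0.875
```

Averaging over folds gives a single score that is less sensitive to how any one split happens to fall.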

Classifiers: XGBoost, Scikit-Learn's GradientBoostingClassifier, RandomForestClassifier, AdaBoostClassifier, KNeighborsClassifier, DecisionTreeClassifier, SVM, LogisticRegression, and GaussianNB were utilized as classifiers. Each classifier underwent training and testing on a CPU.

Study results

Evaluation used receiver operating characteristic (ROC)-AUC scores, precision, and recall. The researchers compared the proposed approach with MLP, XGBoost, and RF, assessed its performance on the second dataset, and examined the impact of hyper-parameters. The analysis revealed the significance of parameters such as batch size, number of attention heads, and feature count. Comparative graphs highlighted the ROC-AUC scores for the proposed approach and MLP.
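For readers less familiar with the other two metrics, the following sketch shows how precision and recall are computed from binary predictions. The labels and predictions are invented and unrelated to the study's results, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(labels, predictions):
    """Return (precision, recall) for binary labels, where 1 marks the PD class."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # PD correctly flagged
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # healthy flagged as PD
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # PD cases missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up example: 4 PD and 4 healthy subjects.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 1, 1, 0, 0]

print(precision_recall(labels, predictions))  # (0.6, 0.75)
```

In a screening setting, recall reflects how many true PD cases are caught, while precision reflects how many flagged subjects actually have PD, so the two are usually reported together.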

Conclusion

In summary, the proposed approach capitalizes on vocal features for early PD detection. Surpassing SOTA GBDTs, it offers enhanced accuracy and multi-modal potential. Future avenues include broader dataset application, coupling with k-nearest neighbors for greater accuracy, and exploring the transformer-based network's robustness to noise and its interpretability. With increasing PD cases, the proposed approach's significance in early detection and intervention becomes increasingly pronounced.

Journal reference:

Nijhawan, Rahul, et al. (2023). A Novel Artificial-Intelligence-Based Approach for Classification of Parkinson’s Disease Using Complex and Large Vocal Features. Biomimetics 8, no. 4: 351. DOI: https://doi.org/10.3390/biomimetics8040351, https://www.mdpi.com/2313-7673/8/4/351



