In a paper published in the journal PLOS ONE, researchers described their effort to improve dyslexia identification by introducing a multi-source dataset combining eye movement, demographic, and non-verbal intelligence data. Unlike previous binary-labeled datasets, this dataset provides three class labels along with reading speed information.
Through experiments with various artificial intelligence (AI) models, they established the effectiveness of the multi-layer perceptron (MLP), random forest (RF), gradient boosting (GB), and k-nearest neighbors (KNN) for dyslexia prediction. Notably, combining the diverse data sources proved pivotal to achieving reliable results and highlighted key features, such as intelligence quotient (IQ), gender, age, and specific eye fixation patterns, that contribute significantly to accurate dyslexia identification.
Background
Developmental dyslexia presents a challenge in reading despite typical intelligence and language skills, affecting comprehension and overall academic, mental, and social well-being. Early identification is crucial for timely intervention. Traditional assessment methods, relying on language and cognitive tasks, are time-consuming and require specialist involvement.
Recent strides in AI-based dyslexia detection leverage diverse data sources, including eye-tracking, Electroencephalogram (EEG), Magnetic Resonance Imaging (MRI), and reading tests, offering promising alternatives to conventional assessments. Notably, AI algorithms have been applied across various data types to identify dyslexia, with eye-tracking, EEG, and MRI among the most frequently utilized data sources.
Dyslexia Detection with AI
The present research aimed to comprehensively assess various AI methods for detecting dyslexia. Researchers studied four model families: artificial neural networks (ANN), including MLP and convolutional neural networks (CNN); non-parametric models, such as RF, AdaBoost, GB, KNN, and support vector machines (SVM); linear models, encompassing linear and logistic regression; and Bayesian models, namely Gaussian and multinomial naive Bayes. Although regression experiments yielded similar results, the study prioritized classification tasks.
ANN, specifically MLP and CNN, were chosen based on their historical success. MLP adjusts weights and biases across hidden layers to map input data to target variables. The key hyperparameters considered were neuron count, hidden layers, learning rate, activation functions, epochs, and optimization algorithm. Researchers utilized shallow networks with a single hidden layer to prevent overfitting caused by the dataset's limited size.
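The shallow-network choice described above can be sketched with scikit-learn's `MLPClassifier`. The dataset, feature meanings, and hyperparameter values below are illustrative placeholders, not the authors' actual configuration:

```python
# Illustrative sketch (not the paper's code): a shallow MLP with a
# single hidden layer, mirroring the choice of low-capacity networks
# for a small dataset. Features are a toy stand-in for demographics.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))                    # e.g. IQ, age, gender (toy)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # synthetic labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One hidden layer limits model capacity and the risk of overfitting
clf = MLPClassifier(hidden_layer_sizes=(16,), activation="relu",
                    learning_rate_init=1e-3, solver="adam",
                    max_iter=2000, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(round(accuracy, 2))
```

The hidden-layer size, learning rate, and activation correspond to the hyperparameter families the researchers tuned, but the specific values here are arbitrary.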
CNN addresses translation-invariance issues via convolutional and pooling layers: it detects features while preserving their location information, which is essential for image-like data such as eye fixation maps. For CNN, the tuned parameters included the number of convolutional-pooling layer pairs, filter size, number of filters, learning rate, activation functions, pooling size, and epochs. Researchers also devised a combined CNN-MLP model, leveraging MLP's strength on demographic data and CNN's on fixation data, to classify dyslexia from the two sources together.
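One way to picture the two-branch fusion idea is the PyTorch sketch below. The paper does not specify its framework or architecture details, so the layer sizes, input shapes, and class names here are all assumptions:

```python
# Hypothetical sketch of a CNN+MLP fusion model: a small CNN encodes a
# 1-channel fixation "image" while an MLP encodes demographic features;
# the two embeddings are concatenated for the final 3-class prediction.
import torch
import torch.nn as nn

class CNNMLPFusion(nn.Module):
    def __init__(self, n_demographic=3, n_classes=3):
        super().__init__()
        # CNN branch: one convolutional-pooling pair over a 32x32 map
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),                 # 8 * 16 * 16 = 2048 features
        )
        # MLP branch for demographic features (IQ, age, gender, ...)
        self.mlp = nn.Sequential(nn.Linear(n_demographic, 16), nn.ReLU())
        # Fusion head over the concatenated embeddings
        self.head = nn.Linear(8 * 16 * 16 + 16, n_classes)

    def forward(self, fixation, demographic):
        z = torch.cat([self.cnn(fixation), self.mlp(demographic)], dim=1)
        return self.head(z)

model = CNNMLPFusion()
logits = model(torch.randn(4, 1, 32, 32), torch.randn(4, 3))
print(logits.shape)  # one score per class per sample: (4, 3)
```

The concatenation step is the "amalgamation" described above: each branch specializes in one data source before a shared head makes the class prediction.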
Non-parametric models like RF, AdaBoost, and GB involve ensemble learning, aggregating predictions from multiple decision trees. Hyperparameters such as the number of estimators, minimum samples for split/leaf, and learning rate were pivotal. KNN predicted target values from the k nearest neighbors in the training set; tuned hyperparameters included the number of neighbors and the Minkowski distance's p value.
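The hyperparameter tuning described for these models can be sketched with scikit-learn's grid search. The grids and toy dataset below are illustrative, not the values the researchers searched:

```python
# Illustrative sketch of hyperparameter search for RF and KNN
# (grids and data are examples, not the authors' settings).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

# Random forest: tune tree count and the split/leaf sample thresholds
rf_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [50, 100],
     "min_samples_split": [2, 4],
     "min_samples_leaf": [1, 2]},
    cv=3,
)
rf_search.fit(X, y)

# KNN: tune neighbour count and the Minkowski distance exponent p
# (p=1 is Manhattan distance, p=2 is Euclidean)
knn_search = GridSearchCV(
    KNeighborsClassifier(),
    {"n_neighbors": [3, 5, 7], "p": [1, 2]},
    cv=3,
)
knn_search.fit(X, y)
print(rf_search.best_params_, knn_search.best_params_)
```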
SVM aimed to maximize margins between hyperplanes and support vectors using kernel extensions like linear, polynomial, RBF, or sigmoid kernels. Tuned parameters encompassed regularization term, kernel coefficient, and epsilon-tube value. Naive Bayes classifiers assumed feature independence given the class label, mainly using the multivariate Gaussian distribution. Logistic regression involved maximizing conditional likelihood, relying on regularization strength, iterations, and Elastic-Net mixing parameters.
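A minimal scikit-learn comparison of these remaining model families might look as follows; the hyperparameter values are placeholders chosen to match the families of settings described (regularization term, kernel, Elastic-Net mixing), not the paper's:

```python
# Hedged sketch: SVM, Gaussian naive Bayes, and Elastic-Net logistic
# regression compared by cross-validation on a toy dataset.
import numpy as np
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

models = {
    # RBF-kernel SVM: C is the regularization term, gamma the kernel coefficient
    "svm": SVC(kernel="rbf", C=1.0, gamma="scale"),
    # Gaussian naive Bayes: features assumed independent given the class
    "naive_bayes": GaussianNB(),
    # Logistic regression with Elastic-Net regularization (l1_ratio mixes L1/L2)
    "logreg": LogisticRegression(penalty="elasticnet", l1_ratio=0.5,
                                 solver="saga", max_iter=5000),
}
scores = {name: cross_val_score(m, X, y, cv=3).mean()
          for name, m in models.items()}
print(scores)
```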
Lastly, after identifying the best estimator, feature importance was determined using the Shapley Additive exPlanations (SHAP) approach, gauging the magnitude and influence of each feature towards classification. Visualizations like bar plots of mean absolute SHAP values and beeswarm summary plots helped comprehend the impact of features on predictions.
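To make the SHAP idea concrete, the brute-force sketch below computes exact Shapley values for a toy model by enumerating every feature coalition, with "absent" features replaced by their dataset mean. Real analyses (including, presumably, the paper's) use the optimized `shap` library; this version is only feasible for a handful of features:

```python
# Pure-NumPy illustration of the Shapley-value idea behind SHAP
# (not the shap library): exact attributions by coalition enumeration.
import itertools
import math
import numpy as np

def shapley_values(predict, x, background):
    """Exact Shapley attributions for one sample x, via mean imputation."""
    n = len(x)
    baseline = background.mean(axis=0)

    def value(subset):
        z = baseline.copy()
        z[list(subset)] = x[list(subset)]        # features "present"
        return predict(z[None, :])[0]

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            for s in itertools.combinations(others, r):
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                w = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
                phi[i] += w * (value(set(s) | {i}) - value(set(s)))
    return phi, value(set(range(n))) - value(set())

rng = np.random.default_rng(3)
background = rng.normal(size=(100, 3))
coef = np.array([2.0, -1.0, 0.5])
predict = lambda X: X @ coef                     # toy linear "model"
phi, total = shapley_values(predict, np.ones(3), background)

# Efficiency property: attributions sum to f(x) minus the baseline value
print(np.allclose(phi.sum(), total))
```

Averaging the absolute values of `phi` over many samples gives the per-feature importances shown in SHAP bar plots, while the beeswarm plot displays the per-sample values themselves.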
Insights from Dyslexia Detection Experiment
The experimental results showcased the performance of the different methods on the various data sets. GB emerged as the top performer in predicting dyslexia from demographic data, while CNN excelled with fixation data. Notably, combining demographic and fixation data significantly boosted the algorithms' performance, with MLP achieving the best results, closely followed by RF and KNN. Demographic features enhanced the discrimination of eye-fixation data among the classes, aligning with expectations about eye movement similarities within certain student groups.
When analyzing feature importance in MLP predictions, demographic features constituted the majority (93.2%) of the overall significance, with factors like IQ, age, and gender being pivotal. Fixation data, although contributing less (6.8%), highlighted the importance of eye movement patterns, particularly fixation along the y-axis, indicating differences in gaze behavior between students with dyslexia and typical readers.
An independent test on newly collected data reinforced the model's robustness, albeit with a slight drop in performance compared to prior assessments. Exploring individual predictions revealed misclassifications, particularly with female dyslexic students from specific grades, hinting at challenges arising from data imbalance, insufficient training data, distribution shifts, and the need for more informative features.
To tackle these challenges, researchers considered potential strategies, including employing data balancing techniques, utilizing augmentation methods, and exploring more complex algorithms or feature combinations across data sources. Collecting additional data emerged as a crucial step toward resolving these limitations and refining the dyslexia detection model's accuracy and reliability in real-world scenarios.
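One of the balancing strategies mentioned, though the authors do not name a specific technique, is random oversampling of minority classes, sketched here with scikit-learn's `resample`:

```python
# Hedged sketch of one possible balancing technique: random
# oversampling so every class matches the largest class's size.
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(4)
X = rng.normal(size=(60, 2))
y = np.array([0] * 40 + [1] * 15 + [2] * 5)    # imbalanced 3-class labels

target = np.bincount(y).max()                   # size of the largest class
parts = []
for label in np.unique(y):
    Xc, yc = X[y == label], y[y == label]
    # Sample with replacement up to the target class size
    Xc, yc = resample(Xc, yc, replace=True, n_samples=target, random_state=0)
    parts.append((Xc, yc))

X_bal = np.vstack([p[0] for p in parts])
y_bal = np.concatenate([p[1] for p in parts])
print(np.bincount(y_bal))                       # every class now has 40 samples
```

Oversampling duplicates existing minority-class samples rather than adding information, which is why the researchers also emphasize collecting additional data.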
Conclusion
In summary, the study aimed to enhance dyslexia detection through a comprehensive dataset of 307 annotated participants reading in Russian, addressing the limitations of earlier datasets. While individual data sources showed limitations for dyslexia prediction, the fusion of demographic and fixation data yielded promising results, with MLP emerging as the recommended AI model.
Despite constraints in training samples necessitating shallower neural networks, SHAP analysis highlighted key demographic features. However, fluctuations in independent test results and data insufficiency for specific grades and classes still need to be addressed. Future steps involve extensive data collection, clinical trials, feature augmentation, advanced classification methods, and investigating demographic feature nuances and dataset amalgamation for more robust dyslexia detection models.