Machine Learning for Early Dropout Prediction in an Active Aging App

In a recent paper posted to the arXiv* server, researchers presented a machine-learning approach for predicting early dropouts in an active and healthy aging app. The proposed algorithms were submitted to the International Federation for Medical and Biological Engineering (IFMBE) Scientific Challenge 2022. The results demonstrate that machine learning algorithms can offer high-quality adherence predictions.

Study: Machine Learning for Early Dropout Prediction in an Active Aging App. Image credit: Peshkova/Shutterstock

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.

Background

The IFMBE Scientific Challenge 2022 aimed to predict early dropouts by analyzing user acquisition patterns. The problem was framed as a binary classification task in which machine learning algorithms forecast user adherence over the next three acquisitions, based on scheduled acquisitions and user characteristics.

Dataset preparation

Dataset and Features: The Moving Active and Healthy Aging (MAHA) dataset contains approximately 400 users with demographic features, acquisitions, and answered questionnaires organized in 10 tables. The data cover users' acquisitions per activity, socio-demographic characteristics, acceptability of the application, application logs, and participants' quality of life. The questionnaires include the Self Perception Questionnaire (SPQ), the Unified Theory of Acceptance and Use of Technology (UTAUT) questionnaire, EQ-5D-3L, and the University of California, Los Angeles (UCLA) questionnaire. The challenge comprised two phases, and the study used the combined datasets from both phases.

Pre-processing: The data required pre-processing before they could be used for machine learning. For the dynamic features (number of acquisitions per activity), users with different statuses and limited interactions were discarded, and the dataset comprised 463 users after data cleansing. The active period of each user was calculated, and sessions were divided into weekly intervals. Finally, the researchers used a sliding window algorithm to produce all possible 15-session sets within each participant's active period. The resulting dataset comprised 84111 rows, each representing a set of session acquisitions. In each set, the first 12 sessions served as input, and the last three acquisitions were added together to obtain the corresponding target adherence.
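
The windowing step can be illustrated with a minimal Python sketch. It assumes a stride of one week and a 0/1 encoding of each weekly session; the function name `sliding_windows` and the sample data are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

WINDOW = 15  # sessions per set, as described in the text
TARGET = 3   # trailing sessions that are summed into the target


def sliding_windows(sessions: List[int]) -> List[Tuple[List[int], int]]:
    """Return (input sessions, summed target) pairs for one user.

    Assumption: the window moves forward one session at a time.
    """
    rows = []
    for start in range(len(sessions) - WINDOW + 1):
        window = sessions[start:start + WINDOW]
        inputs = window[:WINDOW - TARGET]        # first 12 sessions
        target = sum(window[WINDOW - TARGET:])   # last 3 sessions added together
        rows.append((inputs, target))
    return rows


# Hypothetical user with 17 weekly sessions (1 = acquisition made, 0 = missed).
user_sessions = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1]
rows = sliding_windows(user_sessions)
print(len(rows))  # 3 windows fit into 17 sessions
```

A user whose active period is shorter than 15 sessions yields no rows under this sketch, which is one way short histories could drop out of the final dataset.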

Static features: The questionnaires contained null values, and their reliability was assessed with Cronbach's alpha. The demographic features showed that most users were elderly with limited technological knowledge.
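
For readers unfamiliar with Cronbach's alpha, the standard formula is alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of questionnaire items. The sketch below computes it in plain Python. It assumes complete responses (no null values) and uses sample variance; the function name and the toy responses are illustrative only.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    items = list(zip(*responses))                       # one tuple per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(r) for r in responses])   # variance of total scores
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Toy data: three respondents, three items.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(consistent)
print(round(alpha, 3))
```

Items that move together across respondents push alpha toward 1, while unrelated items push it toward 0.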

Final datasets and imbalance: Combining dynamic and static features produced seven datasets, with dataset six containing both acquisitions and static information. The data were imbalanced, with most samples showing low adherence, so oversampling techniques were used to reduce the skew. Null values were handled using mode imputation, and the features were normalized to improve model performance.
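
Mode imputation and normalization can be sketched as follows. The article does not say which normalization was applied, so min-max scaling is an assumption here, as are the function names and the sample column.

```python
from statistics import mode


def impute_mode(column):
    """Replace None entries with the most frequent observed value."""
    observed = [v for v in column if v is not None]
    fill = mode(observed)
    return [fill if v is None else v for v in column]


def min_max(column):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical questionnaire item with two missing answers.
answers = [3, None, 5, 3, 1, None]
filled = impute_mode(answers)
scaled = min_max(filled)
print(filled)  # [3, 3, 5, 3, 1, 3]
print(scaled)  # [0.5, 0.5, 1.0, 0.5, 0.0, 0.5]
```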

Issue of duplicate session tuples

The MAHA dataset poses a challenge with duplicate session data. When each session is recorded only as a low or high acquisition, a 12-tuple of sessions can take at most 2^12 = 4096 distinct values, and the dataset contains only 3948 distinct rows out of 84111. Even when the number of acquisitions per session (up to four) is considered, the dataset has 26924 unique 12-tuple sessions, so many samples are included multiple times. This skew affects classification algorithms, biasing predictions toward the majority class (low adherence). Addressing the duplicate data issue is crucial to improving model generalization and preventing concept-learning problems, so that the minority class (high adherence) can be classified accurately.
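
The counting argument can be checked with a short Python sketch. The figures 3948 and 84111 come from the text above; the helper `duplicate_summary` and the toy rows are hypothetical.

```python
from collections import Counter

TUPLE_LEN = 12
possible = 2 ** TUPLE_LEN  # binary sessions give 4096 possible 12-tuples
print(possible)            # 4096

# Share of distinct rows reported in the article (binary encoding).
distinct_share = 3948 / 84111
print(f"{distinct_share:.1%} of rows are distinct")


def duplicate_summary(rows):
    """Count total rows, distinct rows, and the most repeated row."""
    counts = Counter(tuple(r) for r in rows)
    most_common_row, repeats = counts.most_common(1)[0]
    return {"rows": len(rows), "distinct": len(counts), "max_repeats": repeats}


# Toy rows: an all-zero session tuple appears twice.
toy_rows = [[0] * 12, [0] * 12, [1] * 12, [0] * 11 + [1]]
summary = duplicate_summary(toy_rows)
print(summary)
```

Running a summary like this before splitting the data makes it easier to see how often the same input tuple could end up in both training and evaluation samples.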

Study Results

Local Evaluation: Several classification algorithms were employed, including Random Forest (RF), k-nearest neighbor (kNN), XGBoost, and Multi-Layer Perceptron (MLP). The classifiers were evaluated locally on each dataset using 10-fold cross-validation. The MLP and XGBoost models showed superior performance, correctly predicting high adherence. Conventional machine learning algorithms performed nearly as well and were faster to train.
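
The 10-fold cross-validation protocol can be sketched with the standard library alone. The article does not say whether the folds were stratified or shuffled, so the shuffled, unstratified split below is an assumption, and `k_fold_indices` is a hypothetical helper rather than the authors' code.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # assumption: shuffled folds
    folds = [indices[i::k] for i in range(k)]     # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(25, k=10))
print(len(splits))  # 10
```

Each sample is used for testing exactly once and for training in the other nine rounds; the reported score is typically the average over the ten rounds.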

Feature importance: For the RF model, the last two acquisition sessions strongly influenced the classifier's behavior. In later datasets, the week number and demographic features became more influential, indicating the importance of certain dates and the influence of age and technological level on adherence.

Oversampling techniques: To address the class imbalance, four oversampling techniques were used: random oversampling, the Synthetic Minority Oversampling Technique (SMOTE), Adaptive Synthetic sampling (ADASYN), and the conditional tabular Generative Adversarial Network (CTGAN). The MLP model with oversampling provided better results, significantly improving the classification scores compared to the baseline. The SMOTE method outperformed the other techniques.
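
The core idea behind SMOTE is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbors. The following is a simplified sketch of that idea in plain Python, not the implementation used in the study; the function name `smote_like`, the neighbor count `k=2`, and the toy points are assumptions.

```python
import random
from math import dist


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating toward near neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbors of x (excluding x itself)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy minority class (high adherence) in a two-feature space.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(minority, n_new=5)
print(len(new_points))  # 5
```

Random oversampling, by contrast, would simply repeat existing minority rows, which adds no new feature values. A production pipeline would more likely rely on an existing library implementation than on a hand-written sketch like this one.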

Official challenge results: In Phases I and II of the challenge, different classifiers were submitted using various datasets. The MLP model performed consistently well in both phases, while XGBoost showed slightly better local evaluation but struggled to generalize. The ensemble method did not outperform the single MLP classifier. Oversampling techniques produced varying results, with SMOTE providing the highest-quality model and winning the challenge. There was a significant difference in classification performance between local evaluation and official results on some datasets, likely due to a dataset shift.

Conclusion

In summary, addressing the adherence problem has significant implications for improving the quality of life for elderly individuals and promoting healthy aging. The IFMBE Scientific Challenge 2022 involved predicting user adherence through binary classification. The researchers applied pre-processing techniques to generate the final datasets and tested various binary classification methods. MLP and XGBoost models slightly outperformed conventional algorithms in local evaluation, but conventional models were faster to train. Features other than consecutive session acquisitions did not significantly impact adherence prediction. Oversampling, especially with the SMOTE algorithm, yielded the highest classification performance. Challenges included dataset interpretation, concept learning, noisy data, and imbalanced datasets. Future work aims to address these issues and analyze feature performance in the evaluation set.

Written by

Dr. Sampath Lonka

Dr. Sampath Lonka is a scientific writer based in Bangalore, India, with a strong academic background in Mathematics and extensive experience in content writing. He has a Ph.D. in Mathematics from the University of Hyderabad and is deeply passionate about teaching, writing, and research. Sampath enjoys teaching Mathematics, Statistics, and AI to both undergraduate and postgraduate students. What sets him apart is his unique approach to teaching Mathematics through programming, making the subject more engaging and practical for students.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Lonka, Sampath. (2023, August 05). Machine Learning for Early Dropout Prediction in an Active Aging App. AZoAi. Retrieved on November 21, 2024 from https://www.azoai.com/news/20230805/Machine-Learning-for-Early-Dropout-Prediction-in-an-Active-Aging-App.aspx.

  • MLA

    Lonka, Sampath. "Machine Learning for Early Dropout Prediction in an Active Aging App". AZoAi. 21 November 2024. <https://www.azoai.com/news/20230805/Machine-Learning-for-Early-Dropout-Prediction-in-an-Active-Aging-App.aspx>.

  • Chicago

    Lonka, Sampath. "Machine Learning for Early Dropout Prediction in an Active Aging App". AZoAi. https://www.azoai.com/news/20230805/Machine-Learning-for-Early-Dropout-Prediction-in-an-Active-Aging-App.aspx. (accessed November 21, 2024).

  • Harvard

    Lonka, Sampath. 2023. Machine Learning for Early Dropout Prediction in an Active Aging App. AZoAi, viewed 21 November 2024, https://www.azoai.com/news/20230805/Machine-Learning-for-Early-Dropout-Prediction-in-an-Active-Aging-App.aspx.
