In an article published in the journal Heliyon, researchers explored using machine learning algorithms to predict life satisfaction, reporting an accuracy of 93.80% and a macro F1-score of 73.00%.
Utilizing data from a Danish government survey, the authors identified 27 key questions for assessing contentment and employed feature learning techniques. They also examined clinical and biomedical large language models (LLMs) for predictions, highlighting health conditions as a crucial determinant and deploying the best model publicly for unrestricted use.
Background
Life satisfaction is a critical aspect of human well-being, closely linked to better mental health outcomes, increased work engagement, and reduced burnout rates. Conversely, low life satisfaction is associated with poorer mental health and decreased productivity. Recognizing its importance, various governments, including those of the United Kingdom and Bhutan, have implemented programs to measure and promote life satisfaction. Traditionally, life satisfaction assessment relies on analog methods, which are time-consuming, expensive, and logistically challenging, especially for large populations, limiting their effectiveness in policy-making and intervention strategies.
Previous research on life satisfaction has explored numerous factors such as mental health, age, socioeconomic status, environmental influences, social media, and personality traits. Studies have utilized national surveys, personal interviews, and statistical analyses to understand these factors.
However, these methods often suffer from biases, limited sample sizes, and difficulties in establishing causality. Recent advancements include the use of machine learning to examine the relationship between life satisfaction and specific variables like age, but these studies often lack comprehensive model interpretability and generalizability.
This paper addressed these gaps by leveraging machine learning algorithms to predict life satisfaction with high accuracy and interpretability. The study identified 27 key questions to assess life satisfaction efficiently. It also explored LLMs like biomedical bidirectional encoder representations from transformers (BioBERT), converting the tabular data into meaningful text for these models to further enhance prediction accuracy.
The deployment of explainable AI (XAI) ensured transparency, aiding policymakers in making informed decisions. This research not only improved prediction accuracy but also offered a practical tool for assessing life satisfaction, thereby contributing to better policy-making and intervention strategies.
Materials and Methodology
The researchers aimed to delve into the complexities of human contentment through a blend of psychology and machine learning. They outlined three primary objectives: identifying factors influencing life satisfaction, optimizing prediction models using machine learning, and enhancing model interpretability through XAI.
The authors employed a comprehensive dataset from the Danish government's survey of health, impairment, and living conditions in Denmark (SHILD), encompassing various aspects of life. Rigorous data preprocessing was conducted, including handling missing values, categorical encoding, and outlier detection. Features were selected using recursive feature elimination with cross-validation (RFECV), resulting in a questionnaire with 27 questions.
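As a rough illustration of how recursive feature elimination with cross-validation narrows a long questionnaire down to a short one, the sketch below runs scikit-learn's RFECV on synthetic data. The dataset, the random forest base estimator, the three-fold split, and the macro F1 scoring are all assumptions made for this example; the article does not state which settings the authors used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for encoded survey answers (NOT the SHILD data):
# 300 respondents, 20 columns, only a handful of which carry signal.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_redundant=2,
    weights=[0.8, 0.2], random_state=0,
)

# RFECV repeatedly drops the least important column and keeps the
# feature count with the best cross-validated score.
selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    step=1,                      # remove one column per round
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    scoring="f1_macro",          # assumed metric for this sketch
    min_features_to_select=1,
)
selector.fit(X, y)

# support_ is a boolean mask over the original columns
selected = [i for i, keep in enumerate(selector.support_) if keep]
print("columns kept:", selector.n_features_)
print("column indices:", selected)
```

On the real survey, the retained columns would correspond to the questions kept in the shortened questionnaire.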
Machine learning algorithms such as random forest, gradient boosting, and logistic regression were utilized, with hyperparameter tuning to enhance model performance. An ensemble model combining random forest, gradient boosting, and light gradient boosting machine (LightGBM) yielded the best results. LLMs like BERT, BioBERT, ClinicalBERT, and COReBERT were employed for text-based analysis. Implementation details highlighted the use of PyTorch and Hugging Face's Transformers library.
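BERT-style models take text rather than table rows, so each survey record has to be rendered as a sentence before it can be tokenized. The sketch below shows one simple way this could be done with per-column templates. The column names, the template wording, and the helper row_to_sentence are hypothetical and are not taken from the paper or the SHILD survey.

```python
def row_to_sentence(row, templates):
    """Render one survey record as text using per-column sentence templates."""
    parts = []
    for column, template in templates.items():
        value = row.get(column)
        if value is None:  # skip unanswered questions rather than guess
            continue
        parts.append(template.format(value=value))
    return " ".join(parts)


# Hypothetical columns and wording, chosen only for illustration
TEMPLATES = {
    "age": "The respondent is {value} years old.",
    "health": "They rate their general health as {value}.",
    "employment": "Their employment status is {value}.",
}

record = {"age": 47, "health": "fair", "employment": None}
sentence = row_to_sentence(record, TEMPLATES)
print(sentence)
# The respondent is 47 years old. They rate their general health as fair.
```

Text produced this way could then be passed to a tokenizer and a sequence classifier from the Transformers library for fine-tuning.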
Performance metrics such as precision, recall, accuracy, F1-score, and AUC ROC were used to evaluate model performance.
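Accuracy and macro F1-score can diverge sharply when one class dominates, which is one plausible reading of a model scoring in the 90s on accuracy and in the 70s on macro F1. The worked example below uses made-up labels, not the study's data, to show the effect: the classifier gets every majority-class case right but finds only 3 of 10 minority-class cases.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so each class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up labels: 90 "satisfied" (1) and 10 "dissatisfied" (0)
y_true = [1] * 90 + [0] * 10
# Hypothetical predictions: all satisfied cases right, 3 of 10 dissatisfied found
y_pred = [1] * 90 + [0] * 3 + [1] * 7

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # accuracy = 0.930
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")  # macro F1 = 0.712
```

The seven missed minority cases barely move accuracy, but they pull the minority-class F1 down to about 0.46, and the macro average follows.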
Performance and Evaluation
The authors evaluated the performance of machine learning algorithms and LLMs in predicting life satisfaction. Machine learning models like random forest and gradient boosting showed high accuracy and F1 scores, with the ensemble classifier performing best. LLMs, including BERT and BioBERT, also demonstrated strong predictive capabilities. Statistical tests revealed significant differences in performance between certain models, highlighting the strengths of random forest and BioBERT.
Error analysis identified a tendency for models to produce false positives, with boosting algorithms showing a more balanced misclassification behavior. Ablation studies emphasized the importance of data resampling techniques and feature selection methods in enhancing model performance, with RFECV feature selection proving the most effective. XAI was employed to justify model decisions, offering insights into the determinants of life satisfaction.
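The article does not say which resampling techniques the ablation compared. As one simple example of the idea, the sketch below implements random oversampling, which duplicates minority-class rows until every class is as large as the biggest one. The function name random_oversample and the toy data are assumptions for illustration only.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == label]
        for _ in range(target - count):  # zero iterations for the largest class
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


# Toy training set: 8 rows of class 1 and 2 rows of class 0
rows = [[i] for i in range(10)]
labels = [1] * 8 + [0] * 2

bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))  # both classes now have 8 rows
```

Resampling of this kind is normally applied to the training split only, so that duplicated rows cannot leak into the evaluation data.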
Age group insights revealed varying factors influencing satisfaction across different age brackets, with health consistently playing a pivotal role. An interactive app was developed for real-time predictions of life satisfaction, facilitating broader accessibility and promoting the use of machine learning in understanding subjective well-being.
Insights into Life Satisfaction
Machine learning ensemble models outperformed others, emphasizing the importance of balancing false positives and negatives. Boosting models excelled due to their ability to refine accuracy sequentially, while random forests offered robustness. Support vector classifier (SVC) performed poorly, struggling with dataset imbalance and complex decision boundaries.
Feature analysis highlighted the impact of social, economic, cultural, physical, and mental health factors on life satisfaction. However, limitations included dataset specificity and the static nature of the data, which may limit the model's generalizability and its ability to account for the dynamic nature of life satisfaction. The results underscored the need for diverse model approaches and comprehensive feature analysis in understanding and predicting life satisfaction across varied contexts.
Conclusion
The researchers showcased the efficacy of machine learning algorithms and LLMs in predicting life satisfaction. By achieving high accuracy and F1 scores with a minimal set of questions, the authors offered a practical tool for mental health assessment. Incorporating XAI enhanced model transparency and trust. Insights into demographic variations underscored the multifaceted nature of life satisfaction. Future directions involve diversifying datasets and exploring deeper neural network architectures for enhanced predictive capability.
Journal reference:
- Khan, A. E., Hasan, M. J., Anjum, H., Mohammed, N., & Momen, S. (2024). Predicting life satisfaction using machine learning and explainable AI. Heliyon, 10(10), e31158. https://doi.org/10.1016/j.heliyon.2024.e31158, https://www.sciencedirect.com/science/article/pii/S2405844024071895
Article Revisions
- May 28 2024 - Correction to the journal name, ScienceDirect changed to Heliyon. Correction to research paper URL.