Unraveling Dataset Bias in Risky Choices through Machine Learning

In an article published in the journal Nature Human Behaviour, researchers from the Technical University of Darmstadt and the Hessian Center for Artificial Intelligence, Germany, analyzed the interplay between datasets of human risky choices between gambles and the machine learning models trained on them.

Study: Unraveling Dataset Bias in Risky Choices through Machine Learning. Image credit: Summit Art Creations/Shutterstock

Using machine learning (ML) techniques, they presented evidence of dataset bias: participants' choice behavior differed systematically between online and laboratory experiments. They also identified features of gambles that were predictive of this difference and proposed a hybrid model that accounts for the increased decision noise in the online dataset.

Background

Human choices between gambles or goods have been studied across economics, psychology, neuroscience, and cognitive science using normative and descriptive models. Normative models prescribe which decisions a person should make, while descriptive models capture how people actually decide. However, human decisions often deviate from the predictions of normative models, which has motivated alternative descriptive models that incorporate cognitive and behavioral factors.
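To make the distinction concrete, consider a minimal sketch (illustrative only, not from the paper): a normative expected-value rule weights payoffs by their stated probabilities, while a descriptive, prospect-theory-style rule applies a concave value function and an inverse-S probability weighting.

```python
# Illustrative sketch (not from the paper): a normative expected-value rule
# versus a prospect-theory-style descriptive rule for a two-outcome gamble.
import math

def expected_value(outcomes, probs):
    """Normative score: probability-weighted sum of payoffs."""
    return sum(x * p for x, p in zip(outcomes, probs))

def prospect_value(outcomes, probs, alpha=0.88, gamma=0.61):
    """Descriptive score with a concave value function and an inverse-S
    probability weighting function (parameter values are illustrative)."""
    def v(x):  # diminishing sensitivity to payoffs
        return math.copysign(abs(x) ** alpha, x)
    def w(p):  # overweights small, underweights large probabilities
        return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)
    return sum(v(x) * w(p) for x, p in zip(outcomes, probs))

# A gamble paying 100 with probability 0.05, else 0:
print(expected_value([100, 0], [0.05, 0.95]))  # 5.0
print(prospect_value([100, 0], [0.05, 0.95]))  # ~7.6: the rare win is overweighted
```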

In recent years, advances in ML, especially data-driven techniques involving neural networks (NNs), have enabled the discovery of new patterns and phenomena across scientific domains. This trend has extended to NN models of human decision-making trained on newly gathered datasets of human choices. These efforts aim not only at more precise descriptive models but also at explaining human decisions and improving the theory of human decision-making.

About the Research

In the present paper, the authors systematically investigated the interplay between decision datasets and ML models. They used three datasets of human choices between gambles: the Choice Prediction Competition 2015 (CPC15) dataset, containing choices by 446 participants in laboratory experiments; the Choice Prediction Competition 2018 (CPC18) dataset, containing choices by 1,000 participants in laboratory experiments; and the choices13k dataset, containing choices for roughly 13,000 gamble problems gathered in a large-scale online experiment.

The researchers trained several ML models on these datasets, including three classical methods that performed well in the CPC challenges and two neural network architectures previously proposed by Bourgin et al. and Peterson et al. They then compared the models' performance on the training and test data of the respective other datasets, as well as across the problem space of all gambles.
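A minimal sketch of such a cross-dataset transfer test follows, under assumed inputs: feature matrices of gamble descriptors and observed choice proportions. The variable names and synthetic data are placeholders, not the authors' pipeline.

```python
# Sketch of a cross-dataset transfer test (assumed setup, not the authors'
# code): fit on one dataset's gambles, then score on the other's.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

def fit_and_transfer(train_X, train_y, test_X, test_y):
    """Train on one dataset and report in-domain vs. transfer error."""
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(train_X, train_y)
    in_domain = mean_squared_error(train_y, model.predict(train_X))
    transfer = mean_squared_error(test_y, model.predict(test_X))
    return in_domain, transfer

# Synthetic stand-ins for the real datasets: rows are gamble problems,
# columns are gamble features, targets are observed choice proportions.
rng = np.random.default_rng(0)
cpc15_X, cpc15_y = rng.normal(size=(400, 10)), rng.uniform(size=400)
ch13k_X, ch13k_y = rng.normal(size=(2000, 10)), rng.uniform(size=2000)
print(fit_and_transfer(cpc15_X, cpc15_y, ch13k_X, ch13k_y))
# A large gap between in-domain and transfer error signals dataset bias.
```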

The model pool combined data-driven methods, including random forests and deep NNs, with a psychological model of risky choice to predict human choices on the different datasets. The researchers also analyzed features of the gambles, such as expected value, variance, stochastic dominance, and probability weighting, to understand how these features relate to the difference in predictions between models.

Research Findings

The results showed that models trained on choices13k, the largest and most diverse dataset to date, generalized poorly to the smaller laboratory datasets CPC15 and CPC18. Likewise, models trained on the CPC15 dataset did not transfer well to choices13k, indicating a dataset bias: participants' choice behavior differed systematically between laboratory and online experiments. The study also found that models trained on choices13k avoided predicting extreme choice proportions, such as choosing one gamble with probability close to 1 or 0, whereas models trained on CPC15 predicted more extreme choice proportions.
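One simple way to quantify this avoidance of extremes (an illustrative check, not the paper's analysis) is to measure how often a model's predicted choice proportions fall near 0 or 1:

```python
# Illustrative check (not the paper's analysis): how often do a model's
# predicted choice proportions fall near 0 or 1?
import numpy as np

def extremity(preds, eps=0.05):
    """Fraction of predictions within eps of 0 or 1."""
    preds = np.asarray(preds)
    return float(np.mean((preds < eps) | (preds > 1 - eps)))

# Hypothetical predictions from the two kinds of models over the same gambles:
cpc15_preds = np.array([0.02, 0.97, 0.51, 0.99, 0.03])
ch13k_preds = np.array([0.35, 0.72, 0.55, 0.81, 0.28])
print(extremity(cpc15_preds), extremity(ch13k_preds))  # 0.8 0.0
```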

To trace the source of this bias, the authors analyzed which gamble features predicted the difference in choice behavior between the datasets. They used linear regressions and an explainable artificial intelligence (XAI) technique, SHapley Additive exPlanations (SHAP), to quantify each feature's importance for the difference in model predictions. The features that explained the most variance in the difference were well known from the psychology and behavioral economics literature: stochastic dominance, probability of winning, and the difference in expected value.
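As a sketch of how such a SHAP attribution works (generic and synthetic; the paper's actual features, target, and models differ), one can fit a tree-based regressor to a feature table and average absolute SHAP values per feature:

```python
# Sketch of SHAP-based feature attribution for the between-dataset prediction
# gap (synthetic and illustrative; the paper's features and models differ).
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
features = rng.normal(size=(1000, 3))  # stand-ins, e.g. dominance, P(win), EV difference
gap = 0.8 * features[:, 2] + 0.1 * rng.normal(size=1000)  # synthetic prediction gap

model = GradientBoostingRegressor().fit(features, gap)
explainer = shap.TreeExplainer(model)         # exact Shapley values for tree ensembles
shap_values = explainer.shap_values(features)
print(np.abs(shap_values).mean(axis=0))       # mean |SHAP| per feature = importance
```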

These features all reflected the degree to which one gamble was expected to yield a higher payoff than the other. The paper also highlighted that choice behavior in the choices13k dataset was less sensitive to these features than in the CPC15 dataset, suggesting that online participants were noisier or more indifferent in their choices.

Based on these findings, the authors proposed a hybrid model that accounts for the increased decision noise in the choices13k dataset. It is a probabilistic generative model assuming that a proportion of participants in the online experiment guessed randomly, while the remaining participants chose according to a neural network trained on the CPC15 dataset, with added decision noise in log-odds space. Fitted to the choices13k dataset, this model improved prediction accuracy and reduced the discrepancy with the CPC15-trained network.
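A minimal sketch of such a mixture follows; the guessing proportion g and noise scale sigma are assumed free parameters, and the function is illustrative rather than the authors' implementation:

```python
# Sketch of the hybrid model described above (g and sigma are assumed free
# parameters; this is illustrative, not the authors' implementation).
import numpy as np

def hybrid_predict(p_nn, g=0.3, sigma=1.0, n_samples=10_000, seed=0):
    """Expected choice proportion: a fraction g of participants guess at
    random; the rest follow the network, with Gaussian noise added to its
    prediction in log-odds space."""
    rng = np.random.default_rng(seed)
    log_odds = np.log(p_nn / (1 - p_nn))                   # network output as log-odds
    noisy = log_odds + sigma * rng.normal(size=n_samples)  # decision noise
    p_noisy = 1.0 / (1.0 + np.exp(-noisy))                 # back to probability
    return g * 0.5 + (1 - g) * p_noisy.mean()              # mix with random guessing

# An extreme network prediction (0.95) is pulled toward 0.5, mirroring the
# less extreme choice proportions observed in the online data.
print(hybrid_predict(0.95))
```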

Conclusion

In summary, the research demonstrated the value of combining ML, data analysis, and theory-driven reasoning to understand the complex interactions between ML models and data on human risky choices. It also revealed the challenges and limitations of using large-scale online datasets to discover general theories of human decision-making, as the context of data collection can alter choice behavior and introduce dataset bias.

The researchers suggested that a careful combination of theory and data analysis is still required to understand the underlying cognitive mechanisms and contextual factors that influence human decisions. They also raised new questions for future work: how to account for the variability and noise in online data, how to compare and validate models across datasets and experimental settings, and how to design interpretable and robust ML models of human decision-making.


Written by

Muhammad Osama

Muhammad Osama is a full-time data analytics consultant and freelance technical writer based in Delhi, India. He specializes in transforming complex technical concepts into accessible content. He has a Bachelor of Technology in Mechanical Engineering with specialization in AI & Robotics from Galgotias University, India, and he has extensive experience in technical content writing, data science and analytics, and artificial intelligence.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Osama, Muhammad. (2024, January 18). Unraveling Dataset Bias in Risky Choices through Machine Learning. AZoAi. Retrieved on July 07, 2024 from https://www.azoai.com/news/20240118/Unraveling-Dataset-Bias-in-Risky-Choices-through-Machine-Learning.aspx.

  • MLA

    Osama, Muhammad. "Unraveling Dataset Bias in Risky Choices through Machine Learning". AZoAi. 07 July 2024. <https://www.azoai.com/news/20240118/Unraveling-Dataset-Bias-in-Risky-Choices-through-Machine-Learning.aspx>.

  • Chicago

    Osama, Muhammad. "Unraveling Dataset Bias in Risky Choices through Machine Learning". AZoAi. https://www.azoai.com/news/20240118/Unraveling-Dataset-Bias-in-Risky-Choices-through-Machine-Learning.aspx. (accessed July 07, 2024).

  • Harvard

    Osama, Muhammad. 2024. Unraveling Dataset Bias in Risky Choices through Machine Learning. AZoAi, viewed 07 July 2024, https://www.azoai.com/news/20240118/Unraveling-Dataset-Bias-in-Risky-Choices-through-Machine-Learning.aspx.
