An article published in the SAGE journal Perspectives on Psychological Science explores how standard psychometric tests designed for humans can be adapted to evaluate analogous psychological traits in large language models (LLMs). LLMs have become integral to natural language processing applications but may inadvertently acquire biases or views from their training data. The authors propose "AI psychometrics" – leveraging psychometric testing to systematically analyze LLMs' traits and behavior.
Understanding LLMs' "Psychological" Profiles
The massive datasets used to train LLMs contain traces of countless authors' personalities, values, and biases. Through their complex training process, models may absorb and exhibit similar psychological characteristics that manifest in their downstream behavior. Specifically, the corpora contain sediments of authors' non-cognitive traits such as personality, values, morality, and attitudes. Although LLMs are not sentient, their neural architecture and training techniques enable them to mimic such human traits. Just as human development shapes our psychological makeup, factors like model architecture and training-data curation shape how models acquire traits.
There are clear parallels to human socialization, but significant dissimilarities remain. LLMs' traits originate purely from language, and their behavioral range is limited relative to humans. Still, if deployed incautiously, their encoded biases could impact individuals or groups in applications like AI recruitment tools. Careful analysis is thus warranted.
Metaphorically, psychometrics can offer a "lens" into models' psychological profiles. Tests designed for humans can be repurposed, and models respond to verbal questionnaire items by generating a probability distribution over possible responses. Aggregated scores indicate models' trait levels, enabling standardized comparisons within and between models.
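As a rough illustration of this scoring idea, the following Python sketch aggregates a model's per-item response distributions into a trait score. The five-point Likert mapping, the items, and the probabilities are hypothetical illustrations, not the authors' actual materials.

```python
# Illustrative sketch: aggregating a model's per-item response distributions
# into a trait score. The Likert mapping and probabilities are hypothetical,
# not taken from the study's materials.

LIKERT = {
    "strongly disagree": 1, "disagree": 2, "neutral": 3,
    "agree": 4, "strongly agree": 5,
}

def item_score(response_probs, reverse_keyed=False):
    """Expected Likert value under the model's response distribution."""
    expected = sum(LIKERT[resp] * p for resp, p in response_probs.items())
    return 6 - expected if reverse_keyed else expected  # flip reverse-keyed items

def trait_score(items):
    """Mean expected score across all items keyed to one trait."""
    return sum(item_score(probs, rev) for probs, rev in items) / len(items)

# Two hypothetical extraversion items; the second is reverse-keyed.
probs_a = {"strongly disagree": 0.05, "disagree": 0.10, "neutral": 0.20,
           "agree": 0.40, "strongly agree": 0.25}
probs_b = {"strongly disagree": 0.30, "disagree": 0.35, "neutral": 0.20,
           "agree": 0.10, "strongly agree": 0.05}
print(f"Extraversion: {trait_score([(probs_a, False), (probs_b, True)]):.2f}")
```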
Linking Psychometrics and AI
Earlier attempts to apply psychometrics to machines focused narrowly on cognitive assessments, aiming to demonstrate that computer programs could compete with humans on intelligence tests. In the 1960s, basic emotional mechanisms were introduced into cognitive architectures to address critiques that such intelligent systems were "inhumane."
By the 2000s, some researchers proposed "psychometric AI," aiming to consolidate insights from experimental psychology into unified systems that could perform well on established mental-ability tests. However, most efforts remained concentrated on intelligence and cognitive evaluations.
Modern LLMs' natural language capabilities enable analysis across a broader range of socially relevant psychological traits using non-cognitive tests of personality, values, morality, and attitudes. Their language understanding and generation now match or surpass human performance on various benchmarks. Where previous models required explicitly engineered affective components, today's self-supervised LLMs inadvertently acquire rich psychological nuances from their training corpora.
Approaches for Psychometric Assessments
The authors describe three potential methods:
- Masked language prediction presents questionnaire items in sequence, with masked response words for the model to predict (see the sketch after this list). However, issues arise around item ordering effects and score aggregation.
- Next-word prediction elicits open-ended continuations of item stems. However, this risks inconsistent or stochastically varying responses that are hard to score.
- Zero-shot inference presents complete items together with the possible responses, avoiding these problems. The model selects the response it finds most probable.
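To make the first approach concrete, here is a minimal fill-mask sketch using the Hugging Face transformers pipeline. The model choice, template, and response words are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch of masked-language prediction: embed a questionnaire item in a
# template with a masked response slot and read off the model's fill-in scores.
# Model choice and wording are illustrative, not the study's exact materials.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
template = ('Statement: "I see myself as someone who is talkative." '
            "I [MASK] with this statement.")

# `targets` restricts scoring to the given words; each must exist as a single
# token in the model's vocabulary for the scores to be directly comparable.
for pred in fill(template, targets=["agree", "disagree"]):
    print(f'{pred["token_str"]}: {pred["score"]:.4f}')
```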
They focus their demonstrations on the third approach, zero-shot inference, presenting models with established inventory items and verbal response options to choose between. This resolves vulnerabilities around output randomness while leveraging robust psychometric questionnaires; the selected responses then determine models' trait levels.
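A minimal sketch of the zero-shot idea follows: score each verbal response option by the log-likelihood a causal language model assigns to it after the item, then select the highest-scoring option. The model, prompt wording, and options here are placeholder assumptions rather than the study's materials.

```python
# Minimal sketch of zero-shot inference: score each response option by the
# log-likelihood a causal LM assigns to it after the item, then pick the best.
# Model, prompt wording, and options are placeholders, not the study's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # any causal LM works here
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def option_log_likelihood(prompt, option):
    """Sum of token log-probs the model assigns to `option` given `prompt`."""
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    ids = tok(prompt + option, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(ids).logits, dim=-1)
    # The token at position i is predicted by the logits at position i - 1.
    return sum(log_probs[0, i - 1, ids[0, i]].item()
               for i in range(prompt_len, ids.shape[1]))

item = 'Statement: "I see myself as someone who is talkative." My response: I'
options = [" strongly disagree.", " disagree.", " neither agree nor disagree.",
           " agree.", " strongly agree."]
scores = {opt.strip(): option_log_likelihood(item, opt) for opt in options}
print(max(scores, key=scores.get))
```

Note that comparing raw sums slightly favors shorter options; length-normalizing the log-likelihoods is a common refinement.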
Assessing Models' Beliefs
Demonstrations applied well-validated inventories assessing the following:
- Big Five personality traits
- "Dark triad" traits
- Schwartz's fundamental human values
- Moral foundations
- Beliefs about gender/sex diversity
Personality results indicated balanced, socially positive profiles across models. However, inventories directly targeting dark traits revealed elevated Machiavellianism and narcissism scores in some models.
Comparing scores on "male" versus "female" versions of the values inventory revealed slight gender biases; one model's "male" achievement score noticeably exceeded its "female" score. On moral beliefs, models diverged from American respondents, placing greater emphasis on the purity, authority, and in-group foundations associated with social conservatism. For gender beliefs, models emphasized gender uniformity over diversity, showing little affirmation of non-traditional identities, which suggests potential difficulty in handling such aspects of gender appropriately.
Open Challenges and Conclusions
Many questions remain regarding reliability, validity, stability over time, deliberately engineering traits, multimodal assessments, integrating psychometrics into continual monitoring, and linking profiles to downstream behaviors.
Future priorities also include the following:
- Testing consistency using related questionnaires
- Comparing models trained on specific corpora
- Adversarially probing responses
- Synthetically sampling to simulate target populations
- Enabling trait manipulation for research ethics and safety
- Expanding assessments to other data modalities like visual, audio, and video
- Embedding improved monitoring in development lifecycles
- Uncovering profile influence on decision-making behaviors
However, "AI psychometrics" already offers exciting opportunities to apply human methods for rigorously and rigorously yet responsibly enhanced model transparency and oversight. As language remains the backbone of both psychometric questionnaires and modern LLMs, adapting standardized human tests represents a promising path toward illuminating model capabilities and limitations.
Metaphorically "assessing" LLMs avoids anthropomorphic pitfalls while providing empirical insights into their capacities and deficiencies as increasingly impactful sociotechnical systems. Continued psychometric analysis will further understand how models acquire and exhibit psychological traits that shape their real-world functioning. Researchers should leverage these tools for transparent and accountable AI advancement.
Journal reference:
- Pellert, M., Lechner, C. M., Wagner, C., Rammstedt, B., & Strohmaier, M. (2024). AI Psychometrics: Assessing the Psychological Profiles of Large Language Models Through Psychometric Inventories. Perspectives on Psychological Science. DOI: 10.1177/17456916231214460, https://journals.sagepub.com/doi/10.1177/17456916231214460