While children and adults effortlessly solve analogies across familiar and new domains, large language models falter, exposing the limits of artificial intelligence in understanding abstract relationships.
*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.
In an article submitted to the arXiv preprint* server, researchers at the University of Amsterdam, the Netherlands, and the Santa Fe Institute, USA, investigated whether large language models (LLMs) can generalize analogy solving to new domains as humans can.
Children, adults, and LLMs were tested on letter-string analogies in the Latin alphabet, the Greek alphabet, and a list of symbols. While humans easily generalized their knowledge across domains, LLMs struggled with transfer, a difference that highlights the significant challenge LLMs face in achieving human-like analogical reasoning.
Background
Past work has explored whether LLMs can generalize analogical reasoning to new domains as humans do. While children and adults can quickly transfer knowledge across domains, LLMs have difficulty, especially with more abstract or far-transfer analogies.
Letter-string analogies have been used to study this, showing that LLMs perform comparably to humans in familiar domains but face challenges in novel ones. This raises questions about whether LLMs truly understand analogical reasoning or simply mimic patterns.
Comparing Performance Across Models
The study compared the performance of 42 children (aged 7-9), 62 adults, and 54 attempts by each of four prominent LLMs (Claude-3.5, Gemma-2 27B, the generative pre-trained transformer GPT-4o, and Llama-3.1 405B) on a letter-string analogy task.
The analogies involved alphabetic string transformations, where participants had to generalize a transformation rule from one string to another.
The task, presented in three alphabets (Latin, Greek, and Symbol), tested how well participants could transfer learned patterns across familiar and unfamiliar alphabets. It was designed around simple transformations, such as successor and predecessor shifts or letter repetitions, that children were expected to recognize.
The letter-string analogy task was adapted for each alphabet. A series of transformations, such as "abc" changing to "abd," was used for the Latin alphabet and then adapted to Greek (for near transfer) and a unique Symbol alphabet (for far transfer).
The Greek alphabet was chosen because it visually resembles the Latin alphabet but is unfamiliar to children. In contrast, the Symbol alphabet was designed to be an entirely new and abstract domain. The goal was to test how participants generalized the transformation rules across different symbol sets.
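To make the transformation rules concrete, here is a minimal Python sketch of how a successor rule generalizes across alphabets. The Greek ordering and the Symbol set below are illustrative assumptions, not the study's actual materials.

```python
# Illustrative sketch of the letter-string analogy setup; the alphabets and
# Symbol set here are assumptions for demonstration, not the study's materials.

LATIN = list("abcdefghijklmnopqrstuvwxyz")
GREEK = list("αβγδεζηθικλμνξοπρστυφχψω")
SYMBOL = list("✦✧★☆◦●◉")  # hypothetical symbol alphabet

def successor_rule(letters, alphabet):
    """Increment the last element of the string, e.g. 'abc' -> 'abd'."""
    *head, last = letters
    return head + [alphabet[(alphabet.index(last) + 1) % len(alphabet)]]

print("".join(successor_rule(list("abc"), LATIN)))   # abd (source domain)
print("".join(successor_rule(list("αβγ"), GREEK)))   # αβδ (near transfer)
print("".join(successor_rule(list("✦✧★"), SYMBOL)))  # ✦✧☆ (far transfer)
```

The abstract rule ("increment the last element") stays the same; only the ordered set of symbols it operates over changes, which is exactly what the transfer conditions vary.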
Data collection for human participants was conducted online for adults and in person for children. Adults were recruited through Prolific and completed the task in a web browser, while children aged 7-9 were recruited from local Montessori schools and completed the task on tablets.
Both groups were given initial practice items to check understanding before completing the main task, which involved five items for each alphabet. Children were instructed verbally, while adults followed written instructions. In total, 42 children and 62 adults participated, with a few exclusions based on predefined criteria.
For the LLMs, data were collected from six different models, including Claude-3.5, Gemma-2 27B, GPT-4o, and Llama-3.1 405B. The models received the same task conditions as the human participants, including the Greek and Symbol alphabet versions, and were prompted in a zero-shot setting using prompt templates optimized for LLM performance.
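As a rough illustration of a zero-shot setup, a prompt for a single item might be assembled as below. The exact template wording used in the study is not reproduced here, so this phrasing is an assumption.

```python
# Hypothetical zero-shot prompt for one analogy item; the wording of the
# study's actual prompt templates may differ.

def build_prompt(source_pair, target_string):
    a, b = source_pair
    return (
        "Let's complete the pattern.\n"
        f"If {a} changes to {b}, then {target_string} changes to"
    )

print(build_prompt(("[a b c]", "[a b d]"), "[i j k]"))
# Let's complete the pattern.
# If [a b c] changes to [a b d], then [i j k] changes to
```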
Each model's performance was evaluated across task variations to ensure robust comparisons. The results showed that larger models outperformed smaller ones, while others, such as Mistral and Qwen, performed more poorly.
LLMs' Alphabet Performance
The study aimed to compare the performance of adults, children, and LLMs on letter-string analogy problems in different alphabets.
Mixed analyses of variance (ANOVAs) were conducted to evaluate (1) differences in performance between participant groups (adults, children, and LLMs) on the Latin alphabet and (2) the groups' ability to generalize analogy solving across alphabets (Latin, Greek, and Symbol).
The results revealed that, as expected, adults and some LLMs outperformed children in solving analogies with the Latin alphabet. OpenAI's GPT-4o performed similarly to adults, while Meta's Llama-3.1 405B followed closely behind. In contrast, Gemma-2 27B and Claude-3.5 had weaker performances in this domain.
The study found that while adults and children performed consistently across alphabets, LLMs' performance degraded from Latin to Greek and Symbol, particularly in the Symbol domain. LLMs excelled at simple transformations but struggled with more complex transformations, such as second successor rules.
To better understand the LLMs' struggles, a Next-Previous Letter Task was designed, where LLMs were asked to identify the previous and next letters in a sequence.
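A minimal sketch of such a probe might look like the following; the item format and scoring are assumptions for illustration, not the study's code.

```python
# Illustrative Next-Previous Letter probe: look up a letter's neighbors in a
# given alphabet to score a model's answer. Item format is an assumption.

def neighbors(letter, alphabet):
    i = alphabet.index(letter)
    previous = alphabet[i - 1] if i > 0 else None
    nxt = alphabet[i + 1] if i < len(alphabet) - 1 else None
    return previous, nxt

GREEK = list("αβγδεζηθικλμνξοπρστυφχψω")
print(neighbors("δ", GREEK))  # ('γ', 'ε')
```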
The results showed that while LLMs handled simple transformations successfully, they struggled with complex ones, particularly in less familiar alphabets.
Further error analysis revealed that the LLMs often relied on the "Literal rule," copying the final character rather than applying the correct transformation rule.
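For example, given "abc changes to abd; what does ijk change to?", the literal answer is "ijd" (copying the final "d") rather than the rule-based "ijl". A toy classifier along these lines (an illustration, not the paper's analysis code) could separate the two error types:

```python
# Toy classifier distinguishing a rule-based answer from the "literal rule"
# (copying the source answer's final character). Illustrative only.

LATIN = list("abcdefghijklmnopqrstuvwxyz")

def classify(response, target, source_answer, alphabet):
    correct = target[:-1] + alphabet[alphabet.index(target[-1]) + 1]
    literal = target[:-1] + source_answer[-1]
    if response == correct:
        return "correct"
    if response == literal:
        return "literal rule"
    return "other"

# Item: abc -> abd ; ijk -> ?
print(classify("ijl", "ijk", "abd", LATIN))  # correct
print(classify("ijd", "ijk", "abd", LATIN))  # literal rule
```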
Conclusion
To sum up, the study found that while LLMs performed well on letter-string analogies in the familiar Latin alphabet, their performance deteriorated in less familiar alphabets like Greek and Symbol.
The LLMs struggled to generalize abstract rules and were prone to simpler errors when transformations involved unfamiliar symbols. Unlike humans, who can quickly adapt to novel alphabets, the LLMs' rigid abstraction hindered their performance.
These findings highlight LLMs' challenges in transferring analogical reasoning across domains, indicating a fundamental difference between human and artificial general intelligence.
Journal reference:
- Preliminary scientific report.
Stevenson, C. E., Pafford, A., van der Maas, H. L. J., & Mitchell, M. (2024). Can Large Language Models generalize analogy solving like people can? arXiv. DOI: 10.48550/arXiv.2411.02348, https://arxiv.org/abs/2411.02348