Can AI Think Like Kids? Study Reveals Major Gaps in LLMs’ Analogical Skills

As children and adults effortlessly solve analogies across familiar and new domains, large language models falter, exposing the rigid limitations of artificial intelligence in understanding abstract relationships.

Research: Can Large Language Models generalize analogy solving like people can? Image Credit: Shutterstock AI

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.

In an article submitted to the arXiv preprint* server, researchers at the University of Amsterdam, the Netherlands, and the Santa Fe Institute, USA, investigated whether large language models (LLMs) can generalize analogy solving to new domains as humans can.

Children, adults, and LLMs were tested on letter-string analogies in the Latin alphabet, Greek alphabet, and a list of symbols. While humans easily generalized their knowledge across domains, LLMs encountered difficulties with transfer. This difference highlights the significant challenge LLMs face in achieving human-like analogical reasoning.

Background

Past work has explored whether LLMs can generalize analogical reasoning to new domains as humans do. While children and adults can quickly transfer knowledge across domains, LLMs have difficulty, especially with more abstract or far-transfer analogies.

Letter-string analogies have been used to study this, showing that LLMs perform comparably to humans in familiar domains but face challenges in novel ones. This raises questions about whether LLMs truly understand analogical reasoning or simply mimic patterns.

Comparing Performance Across Models

The study compared 42 children (aged 7-9) and 62 adults with four prominent LLMs (Claude-3.5, Gemma-2 27B, GPT-4o, and Llama-3.1 405B), each of which made 54 attempts at a letter-string analogy task.

The analogies involved alphabetic string transformations, where participants had to generalize a transformation rule from one string to another.

The task, presented in three alphabets (Latin, Greek, and Symbol), tested how well participants could transfer learned patterns from a familiar alphabet to unfamiliar ones. It was built from simple transformations that children were expected to recognize, such as successor and predecessor shifts or letter repetitions.

The letter-string analogy task was adapted for each alphabet. For the Latin alphabet, a series of transformations was used, such as "abc" changing to "abd." These transformations were then adapted to Greek (for near transfer) and to a unique Symbol alphabet (for far transfer).

The Greek alphabet was chosen because it visually resembles the Latin alphabet but is unfamiliar to children. In contrast, the Symbol alphabet was designed to be an entirely new and abstract domain. The goal was to test how participants generalized the transformation rules across different symbol sets.
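The transformation types described above can be pictured with a short Python sketch. It is only an illustration: the function names are hypothetical, and the SYMBOL string is a made-up stand-in, because the study's actual symbol list is not reproduced in this article.

```python
# Toy alphabets. LATIN and GREEK follow the usual lowercase order; SYMBOL is an
# invented stand-in (assumption), not the symbol list used in the study.
LATIN = "abcdefghijklmnopqrstuvwxyz"
GREEK = "αβγδεζηθικλμνξοπρστυφχψω"
SYMBOL = "*@%!^#~$&+"


def shift_last(string: str, alphabet: str, step: int) -> str:
    """Move the final item `step` places along `alphabet`.

    step=1 is a successor shift, step=-1 a predecessor shift,
    step=2 a second-successor shift.
    """
    position = alphabet.index(string[-1]) + step
    if not 0 <= position < len(alphabet):
        raise ValueError("shift runs off the end of the alphabet")
    return string[:-1] + alphabet[position]


def repeat_last(string: str) -> str:
    """Repeat the final item once."""
    return string + string[-1]


if __name__ == "__main__":
    print(shift_last("abc", LATIN, 1))    # abd  (successor, Latin)
    print(shift_last("αβγ", GREEK, 1))    # αβδ  (same rule, near transfer)
    print(shift_last("*@%", SYMBOL, 1))   # *@!  (same rule, far transfer)
    print(shift_last("abc", LATIN, 2))    # abe  (second successor)
    print(repeat_last("abc"))             # abcc (repetition)
```

The same function is applied to all three alphabets, which mirrors the point of the design: the rule stays fixed and only the symbol set changes.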

Data collection for human participants was conducted online for adults and in person for children. Adults were recruited through Prolific and completed the task in a web browser, while children aged 7-9 were recruited from local Montessori schools and completed the task on tablets.

Both groups were given initial practice items to check understanding before completing the main task, which involved five items for each alphabet. Children were instructed verbally, while adults followed written instructions. In total, 42 children and 62 adults participated, with a few exclusions based on predefined criteria.

For the LLMs, data was collected from six different models, including the four compared with the human participants: Claude-3.5, Gemma-2 27B, GPT-4o, and Llama-3.1 405B. The models were presented with the same task conditions as the human participants, including the Greek and Symbol alphabet versions. They were prompted in a zero-shot setting, using specialized prompt templates optimized for LLM performance.
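As a rough picture of what zero-shot prompting means here, the sketch below formats a single analogy item without any solved examples. The wording of the prompt and the function names are hypothetical; the study's actual templates are not given in this article.

```python
def spaced(string: str) -> str:
    """Separate items with spaces so each symbol is clearly delimited."""
    return " ".join(string)


def build_zero_shot_prompt(source: str, target: str, probe: str) -> str:
    """Format one item 'source -> target, probe -> ?' with no worked examples.

    The phrasing is an illustrative assumption, not the study's template.
    """
    return (
        "Solve the puzzle and answer with the missing string only.\n"
        f"If [{spaced(source)}] changes to [{spaced(target)}], "
        f"what does [{spaced(probe)}] change to?"
    )


if __name__ == "__main__":
    print(build_zero_shot_prompt("abc", "abd", "ijk"))
```

"Zero-shot" refers to the fact that the prompt contains only the item itself: the model sees no previously solved analogies before it answers.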

Each model's performance was evaluated across variations of the task to ensure robust comparisons. The results showed that the larger models outperformed the smaller ones, while other models, such as Mistral and Qwen, performed more poorly.

LLMs' Alphabet Performance

The study aimed to compare the performance of adults, children, and LLMs on letter-string analogy problems in different alphabets.

Mixed analyses of variance (ANOVAs) were conducted to evaluate (1) the differences in performance between participant groups (Adults, Children, and LLMs) on the Latin alphabet and (2) the ability of these groups to generalize analogy-solving across alphabets (Latin, Greek, and Symbol).

The results revealed that, as expected, adults and some LLMs outperformed children in solving analogies with the Latin alphabet. OpenAI's GPT-4o performed similarly to adults, while Meta's Llama-3.1 405B followed closely behind. In contrast, Gemma-2 27B and Claude-3.5 performed less well in this domain.

The study found that while adults and children performed consistently across alphabets, LLMs' performance degraded from Latin to Greek and Symbol, particularly in the Symbol domain. LLMs excelled at simple transformations but struggled with more complex transformations, such as second successor rules.

To better understand the LLMs' struggles, a Next-Previous Letter Task was designed, where LLMs were asked to identify the previous and next letters in a sequence.
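A minimal sketch of how such a task could be scored is shown below. The function names are hypothetical, and the handling of the first and last letters (returning None) is an assumption; the article does not say how the study treated the ends of each alphabet.

```python
LATIN = "abcdefghijklmnopqrstuvwxyz"
GREEK = "αβγδεζηθικλμνξοπρστυφχψω"


def neighbours(letter: str, alphabet: str):
    """Return (previous, next) for `letter`; None marks an end of the alphabet."""
    i = alphabet.index(letter)
    previous = alphabet[i - 1] if i > 0 else None
    following = alphabet[i + 1] if i + 1 < len(alphabet) else None
    return previous, following


def is_correct(letter: str, alphabet: str, said_previous, said_next) -> bool:
    """True only if both the previous and the next letter were named correctly."""
    return (said_previous, said_next) == neighbours(letter, alphabet)


if __name__ == "__main__":
    print(neighbours("b", LATIN))             # ('a', 'c')
    print(neighbours("β", GREEK))             # ('α', 'γ')
    print(is_correct("y", LATIN, "x", "z"))   # True
```

A check like this separates two possible failure points: not knowing the order of an alphabet at all, versus knowing the order but failing to apply a rule to it.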

The results showed that while LLMs successfully handled simple transformations, they struggled with complex transformations, particularly in less familiar alphabets.

Further error analysis revealed that the LLMs often relied on the "Literal rule," copying the final character rather than applying the correct transformation rule.
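The difference between a correct answer and a "Literal rule" answer can be made concrete with a small classifier. This is a sketch under stated assumptions: it reads the literal rule as copying the final character of the transformed example into the answer (so "abc" to "abd" applied to "ijk" gives "ijd"), it only covers items where the last item is shifted, and the function name is hypothetical.

```python
LATIN = "abcdefghijklmnopqrstuvwxyz"


def classify_response(source: str, target: str, probe: str,
                      response: str, alphabet: str) -> str:
    """Label a response to 'source -> target, probe -> ?' for a last-item shift.

    Returns "correct", "literal", or "other". The "literal" label reflects
    this sketch's interpretation of the Literal rule (an assumption).
    """
    # Infer the shift from the example pair, e.g. c -> d is a step of +1.
    step = alphabet.index(target[-1]) - alphabet.index(source[-1])
    position = alphabet.index(probe[-1]) + step
    if not 0 <= position < len(alphabet):
        raise ValueError("shift runs off the end of the alphabet")

    correct = probe[:-1] + alphabet[position]   # rule applied to the probe
    literal = probe[:-1] + target[-1]           # final character copied over

    if response == correct:
        return "correct"
    if response == literal:
        return "literal"
    return "other"


if __name__ == "__main__":
    print(classify_response("abc", "abd", "ijk", "ijl", LATIN))  # correct
    print(classify_response("abc", "abd", "ijk", "ijd", LATIN))  # literal
    print(classify_response("abc", "abd", "ijk", "ijk", LATIN))  # other
```

A literal answer shows that the responder noticed which position changed but did not abstract the relation ("move one step forward") that produced the change.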

Conclusion

To sum up, the study found that while LLMs performed well on letter-string analogies in the familiar Latin alphabet, their performance deteriorated in less familiar alphabets like Greek and Symbol.

The LLMs struggled to generalize abstract rules and were prone to simpler errors when transformations involved unfamiliar symbols. Unlike humans, who can quickly adapt to novel alphabets, LLMs’ rigid abstraction methods hindered their performance.

These findings highlight the challenges LLMs face in transferring analogical reasoning across domains and point to a fundamental difference between human and artificial intelligence.

Journal reference:
  • Preliminary scientific report. Stevenson, C. E., Pafford, A., van der Maas, H. L. J., & Mitchell, M. (2024). Can Large Language Models generalize analogy solving like people can? arXiv. DOI: 10.48550/arXiv.2411.02348, https://arxiv.org/abs/2411.02348

Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Chandrasekar, Silpaja. (2024, November 11). Can AI Think Like Kids? Study Reveals Major Gaps in LLMs’ Analogical Skills. AZoAi. Retrieved on November 13, 2024 from https://www.azoai.com/news/20241111/Can-AI-Think-Like-Kids-Study-Reveals-Major-Gaps-in-LLMse28099-Analogical-Skills.aspx.

  • MLA

    Chandrasekar, Silpaja. "Can AI Think Like Kids? Study Reveals Major Gaps in LLMs’ Analogical Skills". AZoAi. 13 November 2024. <https://www.azoai.com/news/20241111/Can-AI-Think-Like-Kids-Study-Reveals-Major-Gaps-in-LLMse28099-Analogical-Skills.aspx>.

  • Chicago

    Chandrasekar, Silpaja. "Can AI Think Like Kids? Study Reveals Major Gaps in LLMs’ Analogical Skills". AZoAi. https://www.azoai.com/news/20241111/Can-AI-Think-Like-Kids-Study-Reveals-Major-Gaps-in-LLMse28099-Analogical-Skills.aspx. (accessed November 13, 2024).

  • Harvard

    Chandrasekar, Silpaja. 2024. Can AI Think Like Kids? Study Reveals Major Gaps in LLMs’ Analogical Skills. AZoAi, viewed 13 November 2024, https://www.azoai.com/news/20241111/Can-AI-Think-Like-Kids-Study-Reveals-Major-Gaps-in-LLMse28099-Analogical-Skills.aspx.
