In a recent publication in the journal Nature, researchers introduced a technique known as meta-learning for compositionality (MLC), which enables human-like systematic generalization in neural networks.
Background
Individuals exhibit a remarkable capacity to learn new concepts and systematically integrate them with pre-existing ones. For example, once a child has learned to skip, they can immediately comprehend instructions such as skipping backwards or skipping around a cone twice, showcasing their compositional abilities.
Fodor and Pylyshyn, however, contended that neural networks lack such systematicity, questioning their suitability as cognitive models. Conversely, opposing viewpoints have surfaced, challenging the extent of systematicity in human compositional skills and proposing that neural networks, when endowed with advanced architectures, can manifest a greater degree of systematic behavior. While neural networks have made substantial progress in recent years, they still struggle with the systematicity tests that a minimally algebraic mind should pass. As technology progresses, the systematicity debate continues.
The present study provides compelling evidence that neural networks can attain human-like systematic generalization using MLC, an optimization technique designed to enhance systematicity via few-shot compositional tasks. MLC operates without additional symbolic components or hand-designed internal representations. Instead, it guides the desired behavior through high-level guidance or human examples, allowing the neural network to develop the necessary learning skills through meta-learning.
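To make this concrete, below is a minimal sketch of what a single MLC-style few-shot episode might contain. The pseudo-words, the 'fep' repetition rule, and the colour symbols are illustrative stand-ins rather than the study's actual materials.

```python
# Hypothetical MLC-style episode (illustrative vocabulary, not the paper's):
# a few in-context study examples plus a query whose answer must be
# composed from what the study examples imply.
episode = {
    "study": [
        ("dax", ["RED"]),                    # primitive word -> symbol
        ("wif", ["GREEN"]),                  # another primitive
        ("dax fep", ["RED", "RED", "RED"]),  # 'fep': repeat three times
    ],
    # Novel combination: apply the inferred 'fep' rule to a new primitive.
    "query": ("wif fep", ["GREEN", "GREEN", "GREEN"]),
}
```

Because the word-to-meaning mapping changes from episode to episode, the network is rewarded for learning how to generalize from study examples rather than for memorizing any fixed vocabulary.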
Behavioral results
The current study began with a novel assessment of human systematic generalization, distinct from previous approaches. Rather than judging grammaticality explicitly, participants processed pseudo-language instructions and generated abstract outputs. This design allowed computational systems to model the data directly, using sequence-to-sequence (seq2seq) techniques from machine learning.
In the systematic generalization test, 25 participants were presented with 14 study instructions and asked to generate outputs for 10 query instructions, a task that demanded an understanding of the underlying interpretation grammar and the ability to generalize from it. Impressively, participants matched the algebraic standard in 80.7 percent of cases, including generalizing to output sequences longer than those encountered during training, a challenging feat for neural networks.
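For illustration, the following sketch shows the flavour of such an interpretation grammar: primitive pseudo-words map to abstract output symbols, and a function word modifies them compositionally. The vocabulary and the 'fep' rule are hypothetical stand-ins, not the study's materials.

```python
# A toy interpretation grammar (hypothetical vocabulary and rules).
PRIMITIVES = {"dax": ["RED"], "wif": ["GREEN"], "lug": ["BLUE"]}

def interpret(instruction: str) -> list[str]:
    """Compositionally map a pseudo-word instruction to abstract outputs."""
    output: list[str] = []
    tokens = instruction.split()
    i = 0
    while i < len(tokens):
        symbols = PRIMITIVES[tokens[i]]
        # 'fep' is a toy function word: repeat the preceding output 3 times.
        if i + 1 < len(tokens) and tokens[i + 1] == "fep":
            symbols = symbols * 3
            i += 1
        output.extend(symbols)
        i += 1
    return output

assert interpret("dax") == ["RED"]
assert interpret("wif fep") == ["GREEN"] * 3   # a novel combination
```

A participant who has internalized the grammar can answer queries like 'wif fep' correctly despite never having studied that exact combination, which is precisely the generalization the test measures.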
Meta-learning for systematicity
More than 35 years ago, Fodor and Pylyshyn raised the challenge of systematicity for neural networks, and it remains relevant in today's landscape of advanced language models. Preliminary experiments indicate that systematicity remains a difficult problem even for large language models such as GPT-4.
To help resolve this debate and explore the extent to which neural networks can capture human-like compositional skills, a series of experiments was conducted. The results revealed that human responses often exhibit the algebraic, systematic patterns that Fodor and Pylyshyn described. However, humans also rely on inductive biases that deviate from purely algebraic processing.
Memory-based MLC emerges as a solution for enhancing the systematicity of neural networks. Through meta-learning, MLC enables standard neural networks to mimic or even surpass human systematic generalization, instilling systematic generalization and human-like biases from data rather than relying on inherent network properties. Despite its successes, MLC is not without limitations. It may struggle with unpracticed forms of generalization and with concepts outside the meta-learning distribution, and it does not handle nuances of inductive bias for which it was not explicitly optimized. Furthermore, its effectiveness on the full complexity of natural language and on other modalities remains to be fully explored.
Results and analysis
The proposed method, MLC, is designed to guide neural networks towards parameter values that enable these kinds of generalization, addressing previous limitations in systematicity. Importantly, MLC focuses on modeling adult compositional skills and does not delve into the skill acquisition process, an aspect considered in the broader discussion.
MLC employs the standard transformer architecture for memory-based meta-learning. It optimizes the transformer to respond to novel instructions, adapting dynamically across changing episodes that each define a different seq2seq task. These episodes are generated from randomly sampled latent grammars, requiring the transformer to extract the meanings of study words and compose responses from them; the gains come from this optimization procedure rather than from changes to the transformer architecture itself.
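As a rough sketch of how such episodes could be generated and serialized for a standard seq2seq transformer, consider the following; the separator tokens, sampling scheme, and vocabulary are assumptions for illustration, not the authors' implementation.

```python
import random

WORDS = ["dax", "wif", "lug", "zup"]
SYMBOLS = ["RED", "GREEN", "BLUE", "YELLOW"]

def sample_grammar(rng: random.Random) -> dict[str, str]:
    """Sample a fresh latent grammar: a random word-to-symbol mapping."""
    symbols = SYMBOLS[:]
    rng.shuffle(symbols)
    return dict(zip(WORDS, symbols))

def make_episode(rng: random.Random) -> tuple[list[str], list[str]]:
    """Build one seq2seq example: in-context study pairs plus a query."""
    grammar = sample_grammar(rng)
    source: list[str] = []
    for w in WORDS:                      # study examples shown in-context
        source += [w, "->", grammar[w], "|"]
    a, b = rng.sample(WORDS, 2)          # a novel two-word combination
    source += [a, b]                     # the query instruction
    target = [grammar[a], grammar[b]]    # expected compositional response
    return source, target

rng = random.Random(0)
src, tgt = make_episode(rng)
print(" ".join(src), "=>", " ".join(tgt))
```

Because the grammar is resampled every episode, gradient descent pushes the transformer towards an in-context strategy: infer the latent mapping from the study examples, then apply it compositionally to the query.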
In evaluations, MLC produced highly systematic behavior, matched human performance, and correctly handled output sequences longer than those seen during training, much as humans do. It exhibited human-like inductive biases, such as 'one-to-one' translations and iconic concatenation. MLC outperformed more rigidly systematic models in predicting human behavior, demonstrating its superior modeling capacity. When assessed on open-ended behavior, MLC excelled at replicating human patterns, even capturing nuanced responses that only partially follow these inductive biases, and it outperformed the other models in this setting as well.
MLC excels in two popular benchmarks: simplified versions of the CommAI Navigation tasks (SCAN) and a compositional generalization challenge based on semantic interpretation (COGS), particularly in systematic lexical generalization tasks, focusing on new words and word combinations. Despite using standard transformer components, MLC effectively handles longer sequences, enhancing its versatility. On the SCAN benchmark, MLC achieves remarkable accuracy, with error rates below 0.22 percent, even in challenging scenarios such as novel combinations of known words. In the COGS benchmark, MLC maintains an error rate of only 0.87 percent across 18 types of lexical generalization, surpassing basic seq2seq models without meta-learning by a significant margin.
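For reference, error rates on such benchmarks are typically computed by exact-match scoring over held-out compositional splits. The snippet below is a generic sketch of that metric; model_predict and the simplified SCAN-style tokens are hypothetical stand-ins.

```python
from typing import Callable, Sequence

def exact_match_error_rate(
    pairs: Sequence[tuple[str, list[str]]],
    model_predict: Callable[[str], list[str]],
) -> float:
    """Fraction of queries whose predicted sequence is not an exact match."""
    errors = sum(model_predict(src) != tgt for src, tgt in pairs)
    return errors / len(pairs)

# Toy usage on a simplified SCAN-style command (tokens are illustrative):
test_set = [("jump twice", ["JUMP", "JUMP"])]
print(exact_match_error_rate(test_set, lambda cmd: ["JUMP", "JUMP"]))  # 0.0
```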
Conclusion
In summary, the current study revisited the long-standing question of whether neural networks can generalize as systematically as people do. It introduced MLC, an optimization procedure that instills compositional skills through meta-learning, and employed tailored training and testing procedures to evaluate the model's generalization capabilities against both human behavior and machine-learning benchmarks.