To improve the adaptability of artificial intelligence (AI) in real-world scenarios, continual learning seeks a balance between memory stability (retaining old knowledge) and learning plasticity (acquiring new knowledge). Current methodologies predominantly focus on preserving memory stability and therefore struggle to accommodate incremental changes effectively.
In a recent publication in the journal Nature Machine Intelligence, researchers proposed a generic solution that employs multiple learning modules and actively regulates forgetting. By diminishing the influence of old memories on the parameter distribution, the approach enhances learning plasticity, while the multi-learner architecture keeps the model compatible with the differing distributions of old and new tasks.
Background
Continual learning, also known as lifelong learning, underpins the ability of AI systems to navigate dynamic and unpredictable real-world situations. Existing strategies concentrate primarily on preserving memory stability to mitigate catastrophic forgetting in neural networks, but this emphasis limits their plasticity for new tasks and their efficacy across diverse experimental settings.
To address this gap, the authors advocate a comprehensive approach that balances the memory stability of old tasks against the learning plasticity of new tasks while remaining compatible with their respective data distributions. Drawing inspiration from natural continual learning in the γ neurons of the Drosophila mushroom body (γMB), the study explores the functional advantages of the γMB system, particularly its ability to actively regulate old memories.
The proposed strategy integrates active forgetting with stability protection, achieving a nuanced trade-off between new and old tasks. The model, featuring parallel learning modules mirroring the compartmentalized organization of γMB, demonstrates superior generality and performance across various continual learning benchmarks.
Synaptic expansion-renormalization framework
The authors introduced a framework called synaptic expansion-renormalization for continual learning. The exposition focuses on the two-task case but extends to multiple tasks. The approach seeks a posterior distribution over network parameters that combines knowledge from different tasks: the learner optimizes a loss function that accounts for both the current task and the knowledge accumulated from previous tasks.
The method introduces active forgetting, governed by a forgetting rate, to enhance learning plasticity. The framework is further extended to multiple parallel continual learners, each with its own neural network. Theoretical analyses discuss the resulting generalization ability and the benefits of using multiple learners.
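Concretely, if each task's parameter posterior is approximated as a diagonal Gaussian, combining knowledge from two tasks can be pictured as multiplying the two densities. The sketch below uses the standard product-of-Gaussians identity to illustrate the idea of a combined posterior; it is an illustration only, not the paper's actual derivation:

```python
import numpy as np

def combine_gaussian_posteriors(mu1, var1, mu2, var2):
    """Combine two (diagonal) Gaussian task posteriors.

    The product of two Gaussian densities is again Gaussian:
    precisions add, and the mean is a precision-weighted average.
    Illustrative sketch of a "combined posterior" over parameters.
    """
    prec = 1.0 / var1 + 1.0 / var2      # precisions add
    var = 1.0 / prec
    mu = var * (mu1 / var1 + mu2 / var2)  # precision-weighted mean
    return mu, var
```

Because precisions add, the combined posterior is narrower than either source; this is one way to see why undiluted old memories can over-constrain the parameters available for a new task, and why loosening the old-task contribution restores plasticity.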
For practical implementation, the approach is evaluated on various benchmark datasets and compared with baseline methods such as elastic weight consolidation (EWC), synaptic intelligence (SI), memory-aware synapses (MAS), adaptive group sparsity-based continual learning (AGS-CL), progress and compress (P&C), and classifier-projection regularization (CPR). Evaluation metrics include average accuracy, forward transfer, backward transfer, and the diversity of the learners' predictions. The method is also applied to Atari reinforcement learning tasks, evaluated by normalized average reward, normalized plasticity, and normalized stability.
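The classification metrics can be computed from the usual task-accuracy matrix. The sketch below follows the widely used definitions of average accuracy, backward transfer, and forward transfer from the continual-learning benchmark literature; the paper's exact normalizations may differ:

```python
import numpy as np

def continual_metrics(R, baseline=None):
    """Compute continual-learning metrics from an accuracy matrix.

    R[i, j] is accuracy on task j after training on tasks 0..i (T x T).
    `baseline[j]` is an untrained reference model's accuracy on task j,
    required for forward transfer. Conventions are the common ones, not
    necessarily the paper's exact definitions.
    """
    R = np.asarray(R, dtype=float)
    T = R.shape[0]
    acc = R[-1].mean()  # average accuracy after the final task
    bwt = np.mean([R[-1, j] - R[j, j] for j in range(T - 1)])  # backward transfer
    fwt = None
    if baseline is not None:
        # accuracy on task j just before training it, minus the baseline
        fwt = np.mean([R[j - 1, j] - baseline[j] for j in range(1, T)])
    return acc, bwt, fwt
```

Negative backward transfer quantifies forgetting; positive forward transfer indicates that earlier tasks help later ones, the quantity the study reports active forgetting improving.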
The study concludes with a proposition on the benefits of combining multiple continual learners with active forgetting to tighten generalization bounds. Theoretical analyses based on probably approximately correct (PAC)-Bayes theory provide insight into the generalization error of the proposed solution.
Results and analysis
The precise recall of old tasks can impede the effective learning of new tasks when their data distributions differ. Drawing inspiration from biological active forgetting, a forgetting rate is introduced to modulate the influence of old knowledge. The loss function balances stability protection against active forgetting, with hyperparameters controlling the strength of each regularization term. Active forgetting is optimized in two equivalent ways (AF-1 and AF-2), both of which encourage the network parameters to renormalize while a new task is learned. Theoretical analysis shows that active forgetting raises the probability of learning new tasks well and reduces generalization error.
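As a rough illustration (not the paper's equations), such a loss can be written with a Fisher-weighted stability term plus a renormalization term that pulls parameters back toward a broad prior. All names, hyperparameters, and the exact functional form here are assumptions for the sketch:

```python
import numpy as np

def balanced_loss(task_loss, theta, theta_old, fisher, theta_prior,
                  lam_stab=1.0, lam_forget=0.1):
    """Hypothetical loss balancing stability protection and active forgetting.

    - `stability`: EWC-style quadratic penalty anchoring parameters to the
      old-task solution `theta_old`, weighted by importance `fisher`.
    - `forgetting`: pulls parameters toward a broad prior `theta_prior`,
      renormalizing them so old memories constrain new learning less.
    The hyperparameters lam_stab and lam_forget set the strength of each term.
    """
    stability = float(np.sum(fisher * (theta - theta_old) ** 2))
    forgetting = float(np.sum((theta - theta_prior) ** 2))
    return task_loss + lam_stab / 2 * stability + lam_forget / 2 * forgetting
```

Raising lam_forget relative to lam_stab plays the role of a higher forgetting rate: the old-task anchor is diluted and plasticity for the new task increases.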
Evaluations on visual classification benchmarks demonstrate the efficacy of active forgetting, particularly in improving average accuracy and forward transfer. The study further explores a γMB-like architecture with multiple parallel continual learners, showing that adaptive implementations of active forgetting improve performance by coordinating the diversity of expertise among learners. The approach scales well and outperforms single continual learners across various experimental settings, including task-incremental visual classification and Atari reinforcement learning tasks.
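The multi-learner idea can be sketched as a simple aggregation of per-learner predictions. The averaging rule below is an assumed illustration of how diverse experts might vote, not the paper's actual coordination mechanism:

```python
import numpy as np

def ensemble_predict(logit_list):
    """Aggregate class predictions from K parallel continual learners.

    Each learner outputs its own class logits; averaging the softmax
    probabilities lets learners with diverse expertise vote on the label.
    """
    probs = []
    for logits in logit_list:
        z = np.asarray(logits, dtype=float)
        e = np.exp(z - z.max())          # numerically stable softmax
        probs.append(e / e.sum())
    mean_probs = np.mean(probs, axis=0)  # average the learners' votes
    return int(np.argmax(mean_probs)), mean_probs
```

Under this view, active forgetting shapes *which* tasks each learner specializes in, and the aggregation step benefits from that diversity.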
Conclusion
In summary, the authors introduced a generic approach for continual learning in artificial neural networks inspired by biological learning systems. The proposed method demonstrates superior performance and generality, holding promise for applications in smartphones, robotics, and autonomous driving. Because continual learning avoids retraining on all previous data, it also enables energy-efficient deployment, aligning with eco-friendly AI development.
Active forgetting, which provides flexibility in the face of external changes, is supported by theoretical and empirical evidence in the computational model and offers testable hypotheses for further research. The study highlights the importance of generalized theories and methodologies for integrating advances in artificial and biological intelligence, promoting mutual progress and inspiration.