Leveraging Successor Representation for Behavior Learning in Artificial Agents

In a paper recently submitted to the arXiv* preprint server, the authors reviewed how the successor representation (SR) and its generalizations have facilitated advances in artificial agents that learn and transfer behaviors from experience, illustrating the ideas with several artificial intelligence (AI) applications.

Study: Leveraging Successor Representation for Behavior Learning in Artificial Agents. Image credit: frank60/Shutterstock

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.

Background

The SR is well suited to solving sets of tasks that share a similar transition structure but differ in their reward structures. It is defined as the expected discounted future occupancy of states, with occupancy serving as its cumulant; cumulants are, broadly, quantities of interest summed across time. As a predictive representation, the SR compiles the transition matrices into a predictive summary: it discards information about individual transitions and retains only their discounted cumulative effect.
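In a small tabular setting, this compilation can be written in closed form as a discounted sum of powers of the policy's transition matrix. A minimal sketch (the environment, policy, and discount factor below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

# Illustrative 3-state Markov chain induced by a fixed policy (assumed, not from the paper).
T = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.1, 0.0, 0.9]])  # T[s, s'] = P(s' | s) under the policy
gamma = 0.95

# Successor representation: M = sum_k gamma^k T^k = (I - gamma * T)^(-1).
# M[s, s'] is the expected discounted number of future visits to s' starting from s.
M = np.linalg.inv(np.eye(3) - gamma * T)
```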

Additionally, unlike model-based algorithms, the SR eliminates the need to iterate dynamic-programming updates or simulate roll-outs, because the relevant transition information is already compiled into a convenient form. State values are recomputed as the inner product between the SR and the immediate reward vector, so the efficiency of SR-based value computation is comparable to that of model-free algorithms. At the same time, the SR adapts quickly to specific types of environmental change, much like model-based algorithms.

Specifically, local changes to the reward structure of an environment induce local changes in the reward function, and these changes propagate instantly to the value estimates once they are combined with the SR. The flexibility of SR-based value computation is therefore similar to that of model-based algorithms for reward-structure changes. Transition-structure changes, however, require larger, non-local updates to the SR, because its internal model does not retain the detailed transition structure.
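Continuing the tabular sketch above, state values follow from a single inner product with the reward vector, and a local edit to the rewards propagates to all values without relearning the SR (the reward vectors are illustrative assumptions):

```python
# Rewards per state (illustrative).
r = np.array([0.0, 0.0, 1.0])

# Model-free-like efficiency: values are just an inner product with the SR.
V = M @ r

# Model-based-like flexibility for reward changes: a local edit to the reward
# structure updates all state values immediately, with no further learning.
r_new = np.array([0.0, 1.0, 0.0])
V_new = M @ r_new
```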

Successor models and successor features

The successor model (SM) is a generalization of the SR that defines an explicit model over temporally abstract trajectories; temporal abstraction here refers to a conditional distribution over future states within some time horizon. The SM uses a k-step conditional distribution as its cumulant and describes a valid probability distribution, since the model integrates to one. The SM can therefore be estimated with density-estimation techniques such as contrastive learning, variational inference, and generative adversarial learning.
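One way to see, in the tabular case, that the SM is a proper distribution is to normalize the SR rows by (1 − γ): each row then sums to one and can be sampled from directly. A minimal continuation of the sketch above (this normalization is only an illustration; the paper's point is that more general SMs are fit with the density-estimation techniques listed):

```python
# Tabular successor model: normalizing the SR by (1 - gamma) yields, for each
# start state, a valid probability distribution over temporally abstract
# future states (each row sums to one).
SM = (1.0 - gamma) * M
assert np.allclose(SM.sum(axis=1), 1.0)

# Sample a "future" state directly, skipping over intermediate time steps.
rng = np.random.default_rng(0)
future_state = rng.choice(3, p=SM[0])
```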

Additionally, the SM is useful for both model-based control and policy evaluation. Successor features (SFs) are a feature-based generalization of the SR. The SR cannot be learned or computed easily when the state space is unknown or large, because it requires maintaining occupancy expectations over every state. In such settings, the agent can instead maintain occupancy measures over cumulants that are features shared across states, rather than over the states themselves; this generalization of the SR defines the SFs.
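As a rough continuation of the tabular sketch, SFs replace per-state occupancy with expected discounted sums of shared state features; when rewards are (approximately) linear in those features, values again reduce to an inner product. The feature map and task weights below are illustrative assumptions:

```python
# phi[s] is a feature vector shared across states (illustrative 2-D features).
phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [0.5, 0.5]])

# Successor features: expected discounted sum of future features,
# psi[s] = sum_{s'} M[s, s'] * phi[s'].
psi = M @ phi

# If rewards are linear in the features, r(s) = phi[s] @ w, then values are
# again an inner product: V(s) = psi[s] @ w.
w = np.array([0.2, 0.8])  # task weights (illustrative)
V_task = psi @ w
```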

AI applications

Learning to act and explore in the environment before exposure to reward is a strategy in which the agent explores its environment for a certain period without receiving any reward or punishment. During this phase, the agent strives to learn a policy that will transfer to an as-yet-unknown task.

For instance, this strategy was leveraged to develop agents that explore Atari games for 250 million time-steps without any reward and are then given 100,000 time-steps in which reward is available. Agents trained this way achieved superhuman performance on most Atari games despite receiving no reward for the bulk of their experience.

This algorithm was further improved by incorporating an intrinsic reward function that drives the agent toward surprising, high-entropy parts of the state space relative to the memory of its own experience. This approach significantly enhanced sample efficiency on several Atari games.
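One common way such a bonus is instantiated (an assumption here, not necessarily the paper's exact method) is a particle-based entropy proxy: states whose embeddings lie far from their nearest neighbours in a memory of past embeddings receive a larger intrinsic reward. A minimal sketch:

```python
import numpy as np

def knn_intrinsic_reward(embedding, memory, k=5):
    # Particle-based entropy proxy: reward embeddings that are far from their
    # k nearest neighbours in the memory of previously visited embeddings.
    dists = np.linalg.norm(memory - embedding, axis=1)
    knn = np.sort(dists)[:k]
    return np.log(1.0 + knn.mean())
```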

Count-based exploration with a bonus of 1/√N(s), where N(s) is the number of times state s has been visited, is a method that provides optimal exploration in tabular settings. This exploration bonus improved exploration in sparse-reward Atari games such as Montezuma's Revenge.
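In a tabular setting, this bonus is straightforward to implement; a minimal sketch (in practice the bonus would typically be scaled by a tuning coefficient and added to the task reward):

```python
import math
from collections import Counter

visit_counts = Counter()

def exploration_bonus(state):
    # Count-based bonus: 1 / sqrt(N(s)), where N(s) counts visits to state s.
    visit_counts[state] += 1
    return 1.0 / math.sqrt(visit_counts[state])
```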

Learning about non-task goals is the strategy in which an agent learning one task simultaneously tries to do well on others. This strategy underlies hindsight experience replay, where an agent re-labels experiences that failed to reach the commanded task goal as successful experiences for whatever non-task goal was actually reached, which is especially useful for tasks with sparse rewards.
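A minimal sketch of the relabelling idea: a transition collected while failing to reach the commanded goal is stored a second time with the goal that was actually achieved, turning a failure into a sparse-reward success (the tuple layout and reward convention here are illustrative assumptions):

```python
def hindsight_relabel(trajectory, replay_buffer):
    """Re-label each transition with the goal that was actually achieved."""
    achieved_goal = trajectory[-1]["achieved_state"]  # final state reached
    for step in trajectory:
        # Original (possibly failed) transition.
        replay_buffer.append(step)
        # Relabelled copy: pretend the achieved outcome was the goal all along,
        # so the sparse reward now signals success for this alternative task.
        relabelled = dict(step)
        relabelled["goal"] = achieved_goal
        relabelled["reward"] = float(step["achieved_state"] == achieved_goal)
        replay_buffer.append(relabelled)
```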

Recently, an asymptotically unbiased importance-sampling algorithm was developed that leverages SFs to remove the bias introduced when estimating value functions with hindsight experience replay; it enabled learning on both simulated and real-world robotic manipulation tasks in environments with large state and action spaces.

One of the key benefits of SFs is that they enable transfer to tasks that are linear combinations of known tasks. For instance, in continuous control settings, an agent that has learned to move in a set of directions, such as right, left, down, and up, can instantly move at novel angles, such as up-right or down-left, as required to complete a task.
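A simplified, state-value sketch of this kind of transfer, continuing the SF example above (the stored SFs and weight vectors are illustrative; a full treatment would use action-conditioned SFs and generalized policy improvement over Q-values):

```python
# Successor features of two previously learned policies (illustrative values):
# psi_policies[i, s, :] are the SFs of policy i at state s.
psi_policies = np.stack([psi, psi[::-1]])  # stand-ins for "right" and "up" policies

# A novel task expressed as a linear combination of known task weights,
# e.g. "up-right" as a mix of "up" and "right" (illustrative weights).
w_new = np.array([0.5, 0.5])

# Evaluate every known policy on the new task without further learning ...
Q = psi_policies @ w_new           # shape: (num_policies, num_states)

# ... and, per state, follow the policy whose SFs promise the highest value
# on the new task (a generalized-policy-improvement-style selection).
best_policy_per_state = Q.argmax(axis=0)
```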

Overall, this paper reviewed how the SR and its variants help AI agents learn and transfer behaviors. The SR captures discounted future state occupancy that is shared across similar tasks, enabling efficient decision-making and rapid adaptation to reward changes. Examples include exploring Atari games without initial reward and transferring learned movement directions to novel angles.


Journal reference:
  • Preliminary scientific report. Carvalho, W., Tomov, M. S., De Cothi, W., Barry, C., & Gershman, S. J. (2024). Predictive representations: Building blocks of intelligence. arXiv. https://arxiv.org/abs/2402.06590

