Enhancing Deep Reinforcement Learning for Real-World Robotic Locomotion

In an article recently submitted to the arXiv* preprint server, researchers introduced Adaptive Policy Regularization (APRL), a framework that improves deep reinforcement learning for legged robots. APRL enables a quadrupedal robot to learn to walk in the real world within minutes and to keep improving with further experience, beyond the capabilities of previous methods. The researchers provide videos and code to reproduce their results. This work advances the field of deep reinforcement learning for real-world applications.

Study: Enhancing Deep Reinforcement Learning for Real-World Robotic Locomotion. Image credit: Generated using DALL·E 3

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.

Enhancing Quadrupedal Locomotion with APRL

In the complex and unpredictable real world, traditional manual engineering of robot behavior falls short. APRL, the Adaptive Policy Regularization system, offers a solution for efficient and adaptive learning in quadrupedal robots. By dynamically regulating the robot's action space, APRL balances exploration and efficiency, enabling rapid mastery of versatile locomotion. This approach empowers robots to navigate diverse real-world scenarios and adapt to changing dynamics.

In prior research, legged locomotion primarily relied on model-based optimal control or simulation-based learning, both limited by complex model engineering and an inability to learn from real-world failures. Past approaches that did learn directly in the real world mainly used higher-level trajectory actions or low-level proportional-derivative (PD) target actions, and they showed limitations in speed, adaptability to different terrains, and transferability. The current study instead explores direct, continually improving learning in the real-world environment.

Leveraging APRL for Efficient Quadrupedal Locomotion

The "learning to walk" task uses the Markov Decision Process (MDP) framework to enhance quadrupedal locomotion. MDP encompasses state and action spaces, initial state distribution, transition functions, reward mechanisms, and discount factors in Reinforcement Learning. The algorithm utilized integrates a critic to estimate and enhance the policy, maximizing the cumulative return. The approach builds upon actor-critic RL techniques and incorporates resets to improve plasticity. Researchers construct and train all neural networks using JAX, a library for numerical computing, and they conduct experiments on an Origin EON15-X laptop equipped with an NVIDIA GeForce Ray Tracing eXtensions (RTX) 2070 Graphics Processing Unit (GPU). 

For the "learning to walk" task, the Go1 quadruped robot is employed in real-world experiments, utilizing the MuJoCo Menagerie model for simulation analysis. The robot aims to walk as fast as possible while maintaining an upright orientation. An Intel RealSense T265 camera enhances velocity estimation, and researchers use target joint angles for action, following standard practice in learned locomotion controllers.

The reward function encourages high local linear velocity and an upright orientation, with penalties on angular velocity and non-smooth torques. To strike a balance between exploration and efficiency, soft constraints are placed on the policy's actions and adaptively regulated according to how familiar the robot's current situation is, as measured by a learned dynamics model. This familiarity heuristic encourages more exploratory behavior in familiar settings and promotes conservative actions in unfamiliar ones: the training strategy incrementally grows the joint action limits unless the dynamics model indicates a significant discrepancy, in which case it rapidly shrinks them to enable swift adaptation.
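
The grow-and-shrink logic can be sketched in a few lines. The rates, threshold, and the hard clip used here are assumptions made for clarity; the paper describes adaptively regulated soft constraints driven by the dynamics model's prediction error, not this exact rule:

```python
import jax.numpy as jnp

# Illustrative constants; not the values used in the APRL paper.
GROW_RATE, SHRINK_RATE = 1.01, 0.8   # slow expansion, fast contraction
ERR_THRESHOLD = 0.05                 # dynamics-model prediction-error cutoff
MIN_LIMIT, MAX_LIMIT = 0.1, 1.0      # bounds on the joint-angle limit (rad)

def update_action_limit(limit, pred_error):
    """Grow the action limit while the dynamics model predicts well (a
    familiar regime); shrink it quickly when prediction error spikes."""
    grown = jnp.minimum(limit * GROW_RATE, MAX_LIMIT)
    shrunk = jnp.maximum(limit * SHRINK_RATE, MIN_LIMIT)
    return jnp.where(pred_error < ERR_THRESHOLD, grown, shrunk)

def constrain_action(raw_action, nominal_pose, limit):
    """Keep target joint angles inside a box of half-width `limit` around a
    nominal pose (a hard clip standing in for the paper's soft penalty)."""
    return jnp.clip(raw_action, nominal_pose - limit, nominal_pose + limit)

# Example: a large prediction error contracts the limit before the next action.
limit = update_action_limit(jnp.asarray(0.6), pred_error=jnp.asarray(0.2))
action = constrain_action(jnp.ones(12), jnp.zeros(12), limit)
```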

The real-world experiments with APRL address vital questions, including whether it can enable rapid learning of quadrupedal locomotion without predefined action constraints, if it facilitates continuous improvement as more data is collected, and whether it allows learning in more challenging settings or amid changing dynamics. The presented results evaluate the robot's adaptability and efficiency in diverse real-world scenarios, including Grass, Ramp, Mattress, and Frozen Joint.

Experimental Setup and Real-World Results

In the experimental setup, the researchers address critical questions about APRL's capabilities. They compare the method with a Restricted baseline from previous work focused on real-world walking, and they introduce four new real-world scenarios (Mattress, Ramp, Grass, and Frozen Joint) to evaluate the adaptability and efficiency of the learned policies. These evaluations span diverse terrain challenges and measure velocity, fall counts, and relative finish times. The findings show that APRL enables real-world learning and continuous improvement, even in challenging scenarios, without manual action-space restrictions.

Simulated Analysis and Key Insights: In the simulated analysis, the researchers examine three further questions. First, does restricting the action space limit the robot's achievable velocity? The comparison shows that the exploration strategy strongly influences training performance, with APRL surpassing hard-constrained policies. Second, can APRL approximate optimal behavior in simulation? They find that it performs remarkably close to an "oracle" method without risking excessive falls. Third, how does it compare with an alternative that uses a safety critic? The results demonstrate the effectiveness of APRL's direct action penalization over critic-based approaches. Together, this investigation shows that APRL not only facilitates real-world learning and adaptability but also excels in simulated scenarios, offering a promising route to efficient legged locomotion learning in both domains.

Conclusion

In summary, the APRL system significantly improves real-world learning and adaptability in quadrupedal robots. While it offers more efficient learning and enhanced performance, it has limitations, including the lack of an all-encompassing safety mechanism and of visual perception. Nonetheless, APRL is a crucial step toward adaptive robotic systems that learn from real-world experience, shifting the emphasis away from policies that must never fail and toward policies that can learn from their mistakes.



Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.


