Quadrupedal Robot Learns Parkour Through Deep Reinforcement Learning

In a study posted to the arXiv* preprint server, researchers from Carnegie Mellon University demonstrated that a quadrupedal robot can learn highly dynamic, athletic behaviors such as parkour directly from pixel inputs, without relying on explicit mapping or planning. Their work shows the robot traversing obstacles up to twice its own height and length using end-to-end deep reinforcement learning.

Study: Quadrupedal Robot Learns Parkour Through Deep Reinforcement Learning. Image credit: Sergey Nivens/Shutterstock


*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.

Parkour Challenges for Robots

Parkour involves fluidly moving over obstacles by climbing, jumping, vaulting, rolling, and balancing. It demands precise coordination between perception and action, since small mistakes can lead to catastrophic failure. For robots, replicating parkour skills poses significant hardware and software challenges: on the hardware side, parkour pushes actuators close to the limits of their capabilities, and on the software side, perception and control must be tightly coupled to execute the right move at the right time.

Classical approaches decompose the problem into separate mapping, planning, and control modules. However, this requires precise sensing and actuation engineered to tight tolerances, and it restricts robots to predetermined obstacle courses in laboratory settings. In contrast, humans learn parkour through practice without changing their biological hardware. The Carnegie Mellon team takes a similar learning-based approach using a low-cost, off-the-shelf quadrupedal robot.

Extreme Parkour with Legged Robots via a Single Neural Net

End-to-End Learning Framework

The robot used is the Unitree A1, which has 12 joints and stands 26 cm tall. A single front-facing Intel RealSense depth camera running at 10 Hz provides perception. The researchers trained a neural network policy end-to-end using reinforcement learning (RL) in simulation. The policy outputs joint motor commands directly from depth images and proprioception, without building an explicit terrain map.
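
To make the pipeline concrete, the following is a minimal sketch (not the authors' code) of such an end-to-end policy in PyTorch: a depth frame and a proprioceptive state vector go in, and twelve joint position targets come out. The layer sizes, the 48-dimensional proprioception vector, and the 64x64 depth resolution are illustrative assumptions rather than values from the paper.

```python
# Minimal sketch of an end-to-end parkour policy: depth image + proprioception
# in, 12 joint targets out. Architecture details are illustrative assumptions.
import torch
import torch.nn as nn

class ParkourPolicy(nn.Module):
    def __init__(self, proprio_dim=48, num_joints=12):
        super().__init__()
        # Convolutional encoder compresses the depth image into a feature vector.
        self.depth_encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ELU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ELU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
            nn.Linear(32 * 4 * 4, 64),
        )
        # MLP fuses visual features with joint angles, velocities, and IMU readings.
        self.trunk = nn.Sequential(
            nn.Linear(64 + proprio_dim, 256), nn.ELU(),
            nn.Linear(256, 128), nn.ELU(),
            nn.Linear(128, num_joints),  # target joint positions for a PD controller
        )

    def forward(self, depth_image, proprioception):
        visual = self.depth_encoder(depth_image)          # (B, 64) visual features
        x = torch.cat([visual, proprioception], dim=-1)   # fuse the two modalities
        return self.trunk(x)                              # (B, 12) joint targets

# Example: one forward pass on a dummy 64x64 depth frame and zero proprioception.
policy = ParkourPolicy()
actions = policy(torch.zeros(1, 1, 64, 64), torch.zeros(1, 48))
```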

To let the robot dynamically adapt its heading direction based on the obstacles ahead, the researchers propose a novel dual distillation method. In Phase 1, an RL policy is trained with access to privileged information, including waypoint-based direction commands. In Phase 2, the policy is distilled so that it predicts its own heading direction from the depth image. At deployment, the network adjusts the heading on the fly and outputs agile control, all from the depth input.
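
Conceptually, Phase 2 amounts to a supervised learning loop. The sketch below is a simplified illustration under assumed interfaces (the HeadingPredictor module, distill_step helper, and dummy data shapes are hypothetical): a student network learns to regress the heading direction from depth images, with the privileged waypoint-based heading from Phase 1 serving as the target.

```python
# Illustrative sketch of the Phase 2 distillation idea, not the authors' code.
import torch
import torch.nn as nn

class HeadingPredictor(nn.Module):
    """Maps a depth image to a 2-D unit vector (cos, sin) of the heading."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ELU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ELU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
            nn.Linear(32 * 16, 2),
        )

    def forward(self, depth):
        return nn.functional.normalize(self.net(depth), dim=-1)  # keep a unit direction

student = HeadingPredictor()
optim = torch.optim.Adam(student.parameters(), lr=3e-4)

def distill_step(depth_batch, privileged_heading):
    """One supervised update: match the privileged (waypoint-based) heading."""
    pred = student(depth_batch)
    loss = nn.functional.mse_loss(pred, privileged_heading)
    optim.zero_grad()
    loss.backward()
    optim.step()
    return loss.item()

# Dummy call with placeholder shapes: a batch of 8 depth frames and 2-D headings.
loss = distill_step(torch.zeros(8, 1, 64, 64), torch.tensor([[1.0, 0.0]] * 8))
```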

Another essential contribution is the unified reward function design: simple inner-product-based rewards, which encourage progress along the commanded direction, are enough for diverse behaviors such as jumping, balancing, and crossing gaps to emerge.
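
As a rough illustration of what an inner-product reward looks like (a simplified sketch, not the paper's exact formulation), the function below rewards the component of the robot's planar velocity that lies along the commanded heading direction.

```python
import numpy as np

def heading_progress_reward(base_velocity_xy, desired_heading_xy):
    """Inner-product reward: velocity projected onto the commanded heading.

    base_velocity_xy: robot base velocity in the horizontal plane, shape (2,)
    desired_heading_xy: unit vector of the commanded direction, shape (2,)
    Moving along the heading earns positive reward; moving against it, negative.
    """
    return float(np.dot(base_velocity_xy, desired_heading_xy))

# Example: moving at about 1.2 m/s mostly along the commanded x-direction.
r = heading_progress_reward(np.array([1.2, 0.1]), np.array([1.0, 0.0]))
```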

Emergent Athletic Behaviors

The trained policy exhibits several complex athletic maneuvers that adapt seamlessly to the geometry of different terrains. For example, when approaching a box-shaped obstacle twice the robot's 0.26-meter hip height, the policy makes finely timed adjustments to its leg motions to jump and clear it. As the robot nears the obstacle, it shortens its stride and positions the front and hind feet at just the right distance from the box. This lets it kick off powerfully with the back legs while extending the front legs upward to catch the top edge of the box and pull itself over. The hind legs are also bent in mid-air, tucking closer to the body to avoid hitting the box. Finally, the robot returns to a stable gait after clearing the tall obstacle.

Similarly, when facing a gap twice as long as the robot's 0.4-meter body length, the policy adjusts its approach to jump across successfully. It lines up the front feet precisely at the edge of the gap while moving the hind feet as close as possible to maximize takeoff range. Right before takeoff, the back legs thrust powerfully to propel the robot forward and upward over the gap while the front legs stretch out to reach the far side. Through the jump, the robot keeps its hind legs extended to apply force for as long as possible. Once across, it lands stably and continues walking.

In addition to jumping and crossing gaps, the policy can make smooth transitions between walking on four legs and just the front two legs. Balancing on two legs is inherently less stable for quadrupeds, but the robot can learn the minute active adjustments needed to maintain balance. As a result, it can walk stably in a handstand position on the front legs even while descending stairs with irregular heights. This maneuver requires proprioceptive rather than visual adaptation.

The policy can also run up tilted ramps by fluidly modulating the heading direction based on visual input. As the ramp's angle changes partway across, the network automatically adjusts the robot's orientation to stay on course. Comparative tests on challenging parkour courses in simulation and in the real world show a 20-80% higher success rate for the learned end-to-end policy than for baseline methods. Fundamental limitations of the baselines include relying on human directional inputs, which are difficult to provide reactively, and training without penalties for unstable foot placements near obstacle edges.

Future Work

In simulation tests, the approach outperforms baselines that rely on noisy elevation maps or alternative reward structures, as measured by the average distance traveled before falling. On real-world parkour courses, the method achieves higher success rates in traversing obstacles twice the robot's size, while baselines suffer from untimely human direction adjustments or unstable stepping near obstacle edges. The handstand walking skill also proves surprisingly robust when deployed directly using proprioception, even down irregular outdoor steps.

The results demonstrate how a single learned neural network policy can achieve dynamic parkour on a real-world robot despite perception delays and imprecise actuation. This sets a new benchmark for agile vision-based quadrupedal skills. Future work includes extending the approach to mobile manipulation. The methodology of training adaptive policies directly from pixels using simple rewards provides a promising direction for further research in embodied AI.


Journal reference:

Written by

Aryaman Pattnayak

Aryaman Pattnayak is a Tech writer based in Bhubaneswar, India. His academic background is in Computer Science and Engineering. Aryaman is passionate about leveraging technology for innovation and has a keen interest in Artificial Intelligence, Machine Learning, and Data Science.

