In a study submitted to the arXiv* preprint server, researchers from Carnegie Mellon University demonstrated that a quadrupedal robot can learn extremely dynamic athletic behaviors like parkour directly from pixel inputs, without relying on explicit mapping or planning. Their work shows the robot clearing obstacles twice its height and gaps twice its length through end-to-end deep reinforcement learning.
Study: Quadrupedal Robot Learns Parkour Through Deep Reinforcement Learning. Image credit: Sergey Nivens/Shutterstock
*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.
Parkour Challenges for Robots
Parkour involves fluidly moving over obstacles by climbing, jumping, vaulting, rolling, and balancing. It requires precise coordination between perception and action, since small mistakes can lead to catastrophic failure. For robots, replicating parkour skills poses significant hardware and software challenges: on the hardware side, parkour pushes actuators close to the limits of their capabilities, and on the software side, perception and control must be tightly coupled to make the right move at the right time.
Classical approaches decompose the problem into mapping, planning, and control modules. However, this requires precise sensing and actuation engineered to tight tolerances, and it restricts robots to predetermined obstacle courses in lab settings. In contrast, humans learn parkour skills through practice without changing their biological hardware. The Carnegie Mellon team takes a similar learning-based approach using a low-cost, off-the-shelf quadrupedal robot.
Extreme Parkour with Legged Robots via a Single Neural Net
End-to-End Learning Framework
The robot used is the Unitree A1, which has 12 joints and stands 26 cm tall. For perception, it uses a single front-facing Intel RealSense depth camera running at 10 Hz. The researchers trained a neural network policy end-to-end using reinforcement learning (RL) in simulation. This policy outputs joint motor commands directly from depth images and proprioception, without building an explicit terrain map.
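To make this input-to-output mapping concrete, the sketch below shows what such a policy could look like. It is a minimal, illustrative sketch in PyTorch, not the authors' actual architecture: the layer sizes, the 48-dimensional proprioceptive state, the 64x64 depth resolution, and the class name `DepthParkourPolicy` are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class DepthParkourPolicy(nn.Module):
    """Hypothetical sketch: depth image + proprioception -> joint targets."""

    def __init__(self, proprio_dim=48, num_joints=12):
        super().__init__()
        # Small CNN that compresses a low-resolution depth image into a latent vector.
        self.depth_encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ELU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ELU(),
            nn.Flatten(),
            nn.LazyLinear(64), nn.ELU(),
        )
        # MLP that fuses the depth latent with proprioception (joint angles,
        # velocities, body orientation) and outputs target joint positions.
        self.mlp = nn.Sequential(
            nn.Linear(64 + proprio_dim, 256), nn.ELU(),
            nn.Linear(256, 128), nn.ELU(),
            nn.Linear(128, num_joints),
        )

    def forward(self, depth_image, proprio):
        latent = self.depth_encoder(depth_image)           # (B, 64)
        return self.mlp(torch.cat([latent, proprio], -1))  # (B, 12) joint targets

# Example forward pass with a dummy 64x64 depth frame and 48-dim proprioceptive state.
policy = DepthParkourPolicy()
actions = policy(torch.zeros(1, 1, 64, 64), torch.zeros(1, 48))
```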
The researchers propose a novel dual distillation method that enables the robot to adapt its heading direction to the obstacles ahead. In Phase 1, an RL policy is trained with privileged simulation information, including waypoint-based direction commands. In Phase 2, the policy is distilled into one that predicts both its actions and its heading direction from the depth image. At deployment, the network rapidly adjusts the heading and outputs agile control entirely from the depth input.
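As a rough illustration of the two-phase idea, the sketch below shows what one Phase 2 distillation step could look like. It is a hypothetical sketch, not the paper's implementation: the helper names `student`, `teacher`, and `batch`, the batch keys, and the equal weighting of the action and heading losses are all assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, batch, optimizer):
    """Phase 2 (sketch): supervise a depth-based student with the privileged teacher."""
    with torch.no_grad():
        # The Phase 1 teacher acts on privileged simulation state
        # (terrain geometry, waypoint-based heading commands).
        teacher_actions = teacher(batch["privileged_obs"])
    # The student sees only the depth image and proprioception, and must
    # predict both the joint actions and the heading direction to follow.
    student_actions, predicted_heading = student(batch["depth"], batch["proprio"])
    loss = F.mse_loss(student_actions, teacher_actions) + \
           F.mse_loss(predicted_heading, batch["target_heading"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```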
Another essential contribution is the unified reward function design: simple inner-product-based rewards lead to the emergence of diverse behaviors like jumping, balancing, and crossing gaps.
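For intuition, a reward of this kind can be sketched as the inner product of the robot's planar velocity with the unit vector pointing toward the next waypoint. The function below is an illustrative sketch under that assumption; the paper's exact reward terms and scaling may differ.

```python
import numpy as np

def direction_reward(base_velocity_xy, robot_position_xy, next_waypoint_xy):
    """Sketch: reward progress toward the next waypoint via a simple inner product."""
    to_goal = next_waypoint_xy - robot_position_xy
    to_goal = to_goal / (np.linalg.norm(to_goal) + 1e-8)  # unit direction to the waypoint
    # The same simple term rewards forward progress whether the robot is
    # jumping onto a box, crossing a gap, or balancing, so no task-specific
    # reward engineering is needed.
    return float(np.dot(base_velocity_xy, to_goal))

# Example: moving at 0.8 m/s straight toward a waypoint 2 m ahead yields a reward of ~0.8.
print(direction_reward(np.array([0.8, 0.0]), np.array([0.0, 0.0]), np.array([2.0, 0.0])))
```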
Emergent Athletic Behaviors
The trained policy exhibits several complex athletic maneuvers that seamlessly adapt to the geometry of different terrains. For example, when approaching a box-shaped obstacle twice the height of its 0.26-meter hip joint, the policy makes finely timed adjustments to the leg motions to jump and clear it. As the robot nears the obstacle, it shortens its stride and places the front and hind feet at just the right distance from the box. This lets it kick off powerfully with the back legs while extending the front legs upward to reach the top edge of the box and pull itself over. The hind legs also bend mid-air to tuck closer to the body and avoid hitting the box. Finally, the robot returns to a stable gait after clearing the tall obstacle.
Similarly, when facing a gap twice as long as its 0.4-meter body, the policy makes the adjustments needed to jump across successfully. It lines up the front feet precisely at the edge of the gap and brings the hind feet as close to the edge as possible to maximize the takeoff range. Right before takeoff, the back legs thrust powerfully to propel the robot forward and upward over the gap while the front legs stretch out to reach the far side. While in mid-air, the robot keeps its hind legs extended so that it can apply force for as long as possible. Once across, it lands stably and continues walking.
In addition to jumping and crossing gaps, the policy can make smooth transitions between walking on four legs and just the front two legs. Balancing on two legs is inherently less stable for quadrupeds, but the robot can learn the minute active adjustments needed to maintain balance. As a result, it can walk stably in a handstand position on the front legs even while descending stairs with irregular heights. This maneuver requires proprioceptive rather than visual adaptation.
The policy can also run up tilted ramps by fluidly modulating the heading direction based on visual input. As the tilt angle changes partway across a ramp, the network automatically adjusts the robot's orientation to stay on course. Comparative tests on challenging parkour courses in simulation and the real world show success rates 20-80% higher for the learned end-to-end policy than for baseline methods. Fundamental limitations of the baselines include relying on human directional inputs, which are difficult to provide reactively, and training without penalties for unstable foot placements near obstacle edges.
Conclusion and Future Work
In simulation tests, the approach outperforms baselines that rely on noisy elevation maps or alternative reward structures, as measured by the average distance traveled before falling. On real-world parkour courses, the method achieves higher success rates when traversing obstacles twice the robot's size, while the baselines either depend on human direction commands that cannot be provided quickly enough or step unstably near obstacle edges. The handstand walking skill also proves surprisingly robust when deployed directly from proprioception, even down irregular outdoor steps.
The results demonstrate how a single learned neural network policy can achieve dynamic parkour on a real-world robot despite perception delays and imprecise actuation. This sets a new benchmark for agile vision-based quadrupedal skills. Future work includes extending the approach to mobile manipulation. The methodology of training adaptive policies directly from pixels using simple rewards provides a promising direction for further research in embodied AI.