ROMAN: Advancing Robots with a Hybrid Hierarchical Learning Framework

By Aryaman Pattnayak | Reviewed by Susha Cheriyedath, M.Sc. | Sep 12 2023

A study published in Nature Machine Intelligence introduces a hybrid hierarchical learning framework called ROMAN that enables robots to solve complex long-horizon manipulation tasks. By orchestrating specialized neural networks trained on subtasks, ROMAN generalizes beyond the human demonstrations it learns from. Experiments show that the approach succeeds in randomized and unseen scenarios and can recover from failures.

*Study: ROMAN: Advancing Robots with a Hybrid Hierarchical Learning Framework. Image credit: Amorn Suriyan/Shutterstock*

Humans adeptly perform lengthy sequences of varied manipulation skills with minimal cognitive effort. However, robotic systems still struggle with such compound tasks over long timescales. This study presents ROMAN, a hierarchical mixture-of-experts architecture, to address this critical challenge in embodied AI.

ROMAN integrates imitation and reinforcement learning to acquire reusable motor skills from humans. A central controller then sequences appropriate skills to accomplish multi-step jobs. This divide-and-conquer methodology allows for solving elaborate jobs by reusing simpler building blocks.

Tests in a simulated laboratory environment demonstrate ROMAN’s effectiveness on complex, interdependent subtasks. The system displays resilience to uncertainties, generalization beyond demonstrations, and failure recovery, highlighting its versatility.

ROMAN Hierarchical Framework

ROMAN consists of specialized neural networks, called experts, that focus on distinct manipulation skills. Examples include pushing, picking, inserting, and rotating objects. A central master network coordinates these experts by activating them in the required sequence to achieve composite tasks. It focuses on high-level understanding rather than low-level control.
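The coordination described above can be illustrated with a toy mixture-of-experts loop. This is a minimal sketch, not the authors' implementation: the three stand-in experts, the single "task progress" observation, and the hand-written `master_scores` function are all hypothetical placeholders for what would be trained neural networks in ROMAN.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Action = List[float]
Expert = Callable[[Sequence[float]], Action]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical experts: each returns a fixed 3-D action for illustration.
def push_expert(obs: Sequence[float]) -> Action:
    return [1.0, 0.0, 0.0]


def pick_expert(obs: Sequence[float]) -> Action:
    return [0.0, 1.0, 0.0]


def insert_expert(obs: Sequence[float]) -> Action:
    return [0.0, 0.0, 1.0]


EXPERTS: Dict[str, Expert] = {
    "push": push_expert,
    "pick": pick_expert,
    "insert": insert_expert,
}


def master_scores(obs: Sequence[float]) -> List[float]:
    """Assumed stand-in for the master network.

    obs[0] is a made-up task-progress value in [0, 1]; each expert is
    scored by how close the progress is to its preferred stage.
    """
    progress = obs[0]
    centers = [0.0, 0.5, 1.0]
    return [-20.0 * (progress - c) ** 2 for c in centers]


def act(obs: Sequence[float]) -> Tuple[str, Action]:
    """Weight every expert's action by the master's softmax output."""
    weights = softmax(master_scores(obs))
    actions = [expert(obs) for expert in EXPERTS.values()]
    blended = [sum(w * a[i] for w, a in zip(weights, actions)) for i in range(3)]
    names = list(EXPERTS)
    dominant = names[max(range(len(weights)), key=weights.__getitem__)]
    return dominant, blended


if __name__ == "__main__":
    for progress in (0.0, 0.5, 1.0):
        print(progress, act([progress])[0])
```

The master only decides how much each expert contributes; the low-level motion stays inside the experts, which mirrors the division of labor the article describes.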

This hierarchy provides two key advantages. First, it breaks down sophisticated jobs into manageable pieces that are easier to learn from humans. Second, the modular structure limits the complexity of the master network. ROMAN’s training methodology combines imitation learning, which leverages human insight, with reinforcement learning, which adds exploration. This balances mimicking demonstrations against improving upon them.
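One simple way to picture the balance between imitation and reinforcement is a weighted objective. The sketch below is an assumption for illustration only: the article does not give ROMAN's actual loss, and `bc_loss`, `hybrid_objective`, and `imitation_weight` are hypothetical names. It combines a behavioral-cloning error (distance from demonstrated actions) with the negative of an episode return (reward from exploration).

```python
from typing import Sequence


def bc_loss(policy_actions: Sequence[Sequence[float]],
            demo_actions: Sequence[Sequence[float]]) -> float:
    """Mean squared error between policy actions and demonstrated actions."""
    total = 0.0
    count = 0
    for p, d in zip(policy_actions, demo_actions):
        for pi, di in zip(p, d):
            total += (pi - di) ** 2
            count += 1
    return total / count


def hybrid_objective(policy_actions: Sequence[Sequence[float]],
                     demo_actions: Sequence[Sequence[float]],
                     episode_return: float,
                     imitation_weight: float = 0.5) -> float:
    """Quantity to minimize: stay close to demos while earning reward.

    imitation_weight = 1.0 is pure imitation; 0.0 is pure reward seeking.
    """
    imitation_term = bc_loss(policy_actions, demo_actions)
    reward_term = -episode_return  # higher return lowers the objective
    return imitation_weight * imitation_term + (1.0 - imitation_weight) * reward_term


if __name__ == "__main__":
    demo = [[0.0, 0.0]]
    print(hybrid_objective([[1.0, 0.0]], demo, episode_return=2.0))
```

Lowering `imitation_weight` lets the policy drift further from the demonstrations when doing so earns more reward, which is the trade-off the paragraph above refers to.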

Simulated Experiments

The researchers evaluated ROMAN on a lengthy multi-step job inspired by laboratory workflows. The robot had to grab and insert a vial, mount it on a rack, slide it, and push a button. This required coordinating seven diverse skills: pushing, picking, rotating, inserting, unboxing, pulling, and pressing. The master network successfully learned to sequence these skills from just 42 human demonstrations.

Tests under randomized initial conditions demonstrated the system’s versatility. It adapted the order of skills to suit each scenario. ROMAN also showed resilience to sensory noise, offsets in object positions, and occlusion. It maintained high success rates despite uncertainties and variability.

Adaptive Recovery from Failures

A key advantage of ROMAN’s architecture emerged when the robot recovered from anomalous situations. Occasionally, it failed to retain a stable grasp and dropped the object. The master network quickly detected this and re-activated the appropriate skills to recover and complete the subtask.

In some cases, the gripper jammed under a shelf mid-operation. ROMAN reacted by taking alternative trajectories to free itself and continue the job. Such recovery highlights the benefits of blending imitation learning with reinforcement-driven exploration. The system extends beyond demonstrated behavior when required.
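The recovery behavior can be pictured as a controller that re-reads the world state at every step instead of following a fixed script. The following is a toy sketch under stated assumptions: the two-flag state, the rule-based `select_skill`, and the scripted drop in `run` are all hypothetical and stand in for ROMAN's learned master network and a physics simulation.

```python
from typing import Dict, Iterable, List


def select_skill(state: Dict[str, bool]) -> str:
    """Pick the next skill from the current state (assumed rule-based master)."""
    if state["inserted"]:
        return "done"
    if not state["holding"]:
        return "pick"
    return "insert"


def run(drop_at_steps: Iterable[int] = (), max_steps: int = 10) -> List[str]:
    """Execute skills step by step, optionally injecting grasp failures."""
    drops = set(drop_at_steps)
    state = {"holding": False, "inserted": False}
    trace: List[str] = []
    for step in range(max_steps):
        skill = select_skill(state)
        if skill == "done":
            break
        trace.append(skill)
        if skill == "pick":
            state["holding"] = True
        elif skill == "insert":
            if step in drops:
                state["holding"] = False  # simulated drop during insertion
            else:
                state["inserted"] = True
    return trace


if __name__ == "__main__":
    print(run())                   # nominal execution
    print(run(drop_at_steps=[1]))  # object dropped on the first insert attempt
```

Because the skill is chosen from the observed state at each step, a dropped object simply leads back to the pick skill, with no dedicated error-handling branch.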

Benefits Over Traditional Robot Programming

ROMAN’s learning-based approach confers several key advantages compared to conventional methods of manually programming industrial robots. Instead of requiring experts to hand-code motion sequences or tediously tune every trajectory via teach pendants, the system enables intuitive and quick programming by demonstrating reusable modular skills.

Firstly, robot motions can be acquired by simply kinesthetically guiding the arm through critical subtasks multiple times. The modular network architecture allows encapsulating these demonstrations into coherent skills without intensive additional engineering. This leverages a natural means of communicating desired physical capabilities.

Secondly, the approach inherently encodes nuanced force and motion signatures that are often challenging to specify analytically. Compliant insertion, controlled pushing, and delicate grasping behaviors emerge from observing humans. This facilitates precise and safe interaction skills demanded in many manipulation tasks.

Thirdly, the object-centric representation of skills learned from demonstrations allows flexible adaptation to varying configurations and workplace conditions. ROMAN’s skills remain robust to differing object poses and unexpected disturbances, unlike fixed programmed trajectories. The system exhibits reliable performance across randomized scenarios.

Fourthly, software integration with standard robot middleware enables deployment on physical platforms in the field. Learned skills can be transferred directly to real-world equipment rather than being confined to simulators. The modular design integrates with existing industrial control architectures.

Finally, the discrete, semantic nature of separately trained skills provides interpretability and explainability compared to end-to-end neural policies. Humans can comprehend the reusable building blocks and monitor the high-level task execution. This transparency will be essential for adopting learning-driven automation in safety-critical domains.

Future Outlook

While ROMAN shows significant progress, some key challenges remain for it and similar learning-based manipulation frameworks before they can be widely adopted in real-world settings. One limitation is that the time required to evaluate and sequence multiple skills grows exponentially as the number of variations increases. This quickly becomes intractable for online adaptation or replanning, which prevents the system from responding to rapid changes. More efficient search or approximation methods will be necessary to scale up skill repertoires.
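To get a feel for the exponential growth mentioned above, consider a brute-force planner that enumerates every ordered sequence of skills. This is an illustrative assumption, not a description of how ROMAN sequences skills: with k skills and a horizon of n steps there are k to the power n candidate sequences. The function names below are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def num_sequences(num_skills: int, horizon: int) -> int:
    """Number of ordered skill sequences of a given length, with repetition."""
    return num_skills ** horizon


def enumerate_sequences(skills: Sequence[str], horizon: int) -> List[Tuple[str, ...]]:
    """Explicitly list every sequence; only feasible for tiny inputs."""
    return list(product(skills, repeat=horizon))


if __name__ == "__main__":
    skills = ["push", "pick", "insert"]
    # The explicit enumeration agrees with the closed-form count.
    assert len(enumerate_sequences(skills, 3)) == num_sequences(3, 3)
    for horizon in (1, 3, 5, 7):
        print(horizon, num_sequences(7, horizon))
```

With the seven skills from the article and a seven-step horizon, exhaustive enumeration already covers 823,543 sequences, which is why more efficient search or approximation becomes necessary as repertoires grow.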

Additionally, the current approach assumes a static environment while each constituent skill executes open-loop. Dynamically moving objects will therefore cause failures. Tighter perception-action loops are needed, built on predictive state representations and reactive policies.

Another issue is that human demonstrations may not reveal optimal robot policies, since they are limited by human capabilities and often contain suboptimal behavior. Extracting the proper physical parameters and behaviors from imperfect examples remains an open research problem in imitation learning.

Furthermore, high-level task constraints are challenging to incorporate explicitly during the sequencing process. Requirements like stability, geometric limitations, and collision avoidance call for more sophisticated planning architectures. Finally, providing safety guarantees for autonomous skill execution around humans poses challenges that learning-based systems have yet to solve fully. Rigorous verification and explainable supervision will be essential for manipulation near people.

Future research should address these limitations through advances in real-time motion planning, multimodal state estimation, human-robot collaboration, physically constrained optimization, sim-to-real transfer, and formal policy verification. Overcoming these gaps will accelerate the safe and reliable deployment of intelligent robot assistants in human environments.

Journal reference:

Triantafyllidis, E., Acero, F., Liu, Z., & Li, Z. (2023). Hybrid hierarchical learning for solving complex sequential tasks using the robotic manipulation network ROMAN. Nature Machine Intelligence, 1–15. https://doi.org/10.1038/s42256-023-00709-2

Written by

Aryaman Pattnayak

Aryaman Pattnayak is a Tech writer based in Bhubaneswar, India. His academic background is in Computer Science and Engineering. Aryaman is passionate about leveraging technology for innovation and has a keen interest in Artificial Intelligence, Machine Learning, and Data Science.

Citation

Pattnayak, Aryaman. (2023, September 12). ROMAN: Advancing Robots with a Hybrid Hierarchical Learning Framework. AZoAi. Retrieved on July 11, 2025 from https://www.azoai.com/news/20230912/ROMAN-Advancing-Robots-with-a-Hybrid-Hierarchical-Learning-Framework.aspx.