RoboGen: Automating Robotic Skill Learning with GPT-4

By Aryaman Pattnayak
Reviewed by Susha Cheriyedath, M.Sc.
Nov 7 2023

In a paper submitted to the arXiv* server, researchers developed RoboGen, a generative framework that uses large language models to automatically generate diverse robotic skills and a continuous supply of training data. The system is a step towards scalable, automated robotic skill learning with minimal human involvement.

*Study: RoboGen: Automating Robotic Skill Learning with GPT-4. Image credit: Generated using DALL·E 3*

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.

Teaching robots diverse skills like manipulating deformable objects or performing dexterous in-hand object reorientation remains an open challenge. Physics simulators have become indispensable for robotic skill learning by providing unlimited exploration opportunities and accelerated data collection compared to the real world. However, constructing skill learning environments in these simulators requires extensive manual effort in designing tasks, selecting relevant assets, generating plausible spatial layouts, and crafting training supervisions. The laborious process of hand-crafting these components for countless real-world tasks severely bottlenecks robotic skill learning even in simulated settings.

Recent advances in foundation models like Generative Pretrained Transformer 3 (GPT-3) offer new pathways to automate robotic skill learning by drawing on the broad common-sense knowledge these models encode. Directly employing them as low-level control policies remains challenging because they lack a grounded understanding of physical interactions. The researchers therefore propose using them for higher-level scene and task generation, which is better suited to their capabilities.

The Study

The authors constructed an automated pipeline called RoboGen that generates all necessary components for robotic skill learning in simulation. Their system uses GPT-4 and comprises four key stages:

Task Proposal: This stage focuses on generating meaningful and diverse tasks for robots to learn. Instead of directly prompting GPT-4, RoboGen initializes with a robot type and randomly sampled object to provide a basis for GPT-4 to reason about possible tasks. The robot and object affordances inform what kinds of meaningful interactions can occur. For articulated objects, it also provides the category, articulation tree, and semantic link descriptions to aid task reasoning. With this conditioning information, GPT-4 proposes various tasks involving interactions between the robot and objects. By sampling different robots and objects, RoboGen elicits an endless stream of novel, valuable tasks.
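As a rough illustration of the conditioning step described above, the sketch below samples a robot and an object and assembles a prompt from them. It is a minimal sketch under stated assumptions, not the authors' code: the `ArticulatedObject` record, the `build_task_prompt` and `sample_prompt` helpers, the prompt wording, and the example robots and objects are all hypothetical, and no language model is actually called.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ArticulatedObject:
    """Hypothetical record for a sampled object; field names are assumptions."""
    category: str
    # Maps each link name to a short semantic description.
    links: dict = field(default_factory=dict)


def build_task_prompt(robot: str, obj: ArticulatedObject, n_tasks: int = 3) -> str:
    """Assemble the conditioning text that would be sent to the language model."""
    link_lines = "\n".join(f"- {name}: {desc}" for name, desc in obj.links.items())
    return (
        f"Robot: {robot}\n"
        f"Object category: {obj.category}\n"
        f"Links:\n{link_lines}\n"
        f"Propose {n_tasks} tasks the robot can perform with this object."
    )


def sample_prompt(robots, objects, rng: random.Random) -> str:
    """Pick one robot and one object at random and build the prompt for them."""
    return build_task_prompt(rng.choice(robots), rng.choice(objects))


# Illustrative inputs only; the real system samples from its own asset set.
ROBOTS = ["robot arm", "quadruped"]
OBJECTS = [
    ArticulatedObject("oven", {"door": "hinged front door", "knob": "temperature dial"}),
    ArticulatedObject("cabinet", {"drawer": "sliding drawer"}),
]

prompt = sample_prompt(ROBOTS, OBJECTS, random.Random(0))
print(prompt)
```

Calling `sample_prompt` repeatedly with different random states yields different robot and object pairings, which is the mechanism the article credits for task variety.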

Scene Generation: With a proposed task, RoboGen generates an entire 3D scene for training the skill. It queries GPT-4 to retrieve additional relevant objects from a database or generate them from text descriptions. To ensure valid sizes and spatial relationships for a plausible scene layout, it prompts GPT-4 to output physically reasonable object sizes and poses conditioned on the task description.

For articulated objects, it further prompts for valid initial configurations based on the needs of the task. The assets that populate the final scene are either retrieved from a database of over 800k 3D object meshes based on textual similarity or generated from text prompts. Verification steps using vision-and-language models filter out unsuitable retrieved objects.
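Retrieval by textual similarity can be sketched as follows. This is a simplified stand-in, assuming a bag-of-words cosine similarity in place of whatever text representation the actual system uses; the `CATALOG` entries, asset IDs, and function names are hypothetical, and the vision-and-language verification step is not modelled.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-case the text and count whitespace-separated words."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, catalog: dict, top_k: int = 2) -> list:
    """Return the IDs of the top_k assets whose descriptions best match the query."""
    q = bag_of_words(query)
    scored = [
        (cosine_similarity(q, bag_of_words(desc)), asset_id)
        for asset_id, desc in catalog.items()
    ]
    # Highest similarity first; ties broken by asset ID for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [asset_id for _, asset_id in scored[:top_k]]


# Hypothetical catalog: asset ID -> text description.
CATALOG = {
    "asset_001": "a white ceramic coffee mug",
    "asset_002": "a wooden kitchen table",
    "asset_003": "a red ceramic bowl",
    "asset_004": "a steel office desk lamp",
}

candidates = retrieve("ceramic mug", CATALOG)
print(candidates)
```

In a fuller pipeline, the returned candidates would then be passed to a verification step that rejects assets unsuitable for the task.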

Training Supervision: To facilitate skill learning, RoboGen elicits training supervision from GPT-4 tailored to the generated task. It first prompts GPT-4 to decompose long-horizon tasks into shorter sub-tasks. Then, it asks GPT-4 to choose suitable learning algorithms like reinforcement learning, trajectory optimization, or motion planning for each sub-task based on provided examples. It further prompts GPT-4 to author reward functions and suggest action spaces using provided simulator API calls. This provides the necessary supervision for acquiring the skills.
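One way to picture the output of this stage is a list of sub-tasks, each tagged with a learning algorithm and, where needed, a reward function. The sketch below is illustrative only: the `SubTask` record, the `door_open_reward` function, the `door_joint_angle` state key, and the 1.2 radian target are assumptions made for the example, not the simulator API or rewards used in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical flat simulator state: quantity name -> value.
SimState = Dict[str, float]

# The three algorithm families named in the article.
ALGORITHMS = {"reinforcement_learning", "trajectory_optimization", "motion_planning"}


@dataclass
class SubTask:
    """One step of a decomposed long-horizon task."""
    name: str
    algorithm: str
    # Only learning-based sub-tasks need a reward; planning steps may omit it.
    reward_fn: Optional[Callable[[SimState], float]] = None

    def __post_init__(self):
        if self.algorithm not in ALGORITHMS:
            raise ValueError(f"unknown algorithm: {self.algorithm}")


def door_open_reward(state: SimState, target_angle: float = 1.2) -> float:
    """Negative distance (in radians) between the door joint angle and the target."""
    return -abs(state["door_joint_angle"] - target_angle)


# A hand-written example of what a decomposition for "open the door" might look like.
plan = [
    SubTask("move gripper to door handle", "motion_planning"),
    SubTask("pull door open", "reinforcement_learning", door_open_reward),
]

for step in plan:
    print(step.name, "->", step.algorithm)
```

Keeping the reward as a plain function of simulator state mirrors the article's description of rewards written against simulator API calls, while the validation in `__post_init__` guards against an unsupported algorithm name.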

Skill Learning: Using all the generated information, RoboGen constructs training environments in the Genesis simulator and acquires skills using the algorithms chosen by GPT-4. It integrates reinforcement learning, trajectory optimization, motion planning, and evolutionary strategies, each suited to different tasks. For instance, it selects trajectory optimization for fine manipulation of soft bodies because Genesis is differentiable. The system learns policies for the proposed skills using the automatically generated training supervision.
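To show why differentiability matters for trajectory optimization, here is a toy gradient-based example. It is a sketch under strong simplifying assumptions: a 1D point whose position is the running sum of its controls stands in for the simulator, the gradient is written out by hand rather than obtained from a differentiable engine, and the cost weights and step size are chosen purely for illustration.

```python
def rollout(controls, x0=0.0):
    """Toy dynamics: the position is the start plus the sum of all control steps."""
    x = x0
    for u in controls:
        x += u
    return x


def cost(controls, target, effort_weight):
    """Squared distance to the target at the final step plus a control-effort penalty."""
    final = rollout(controls)
    return (final - target) ** 2 + effort_weight * sum(u * u for u in controls)


def optimize_trajectory(target, horizon=10, effort_weight=0.01, lr=0.05, steps=100):
    """Plain gradient descent on the control sequence.

    The step size is tuned for horizon=10; longer horizons need a smaller lr.
    """
    controls = [0.0] * horizon
    for _ in range(steps):
        error = rollout(controls) - target
        # Analytic gradient: d(cost)/d(u_t) = 2 * error + 2 * effort_weight * u_t
        grads = [2.0 * error + 2.0 * effort_weight * u for u in controls]
        controls = [u - lr * g for u, g in zip(controls, grads)]
    return controls


controls = optimize_trajectory(target=1.0)
final_position = rollout(controls)
print(round(final_position, 3))
```

Because the cost can be differentiated with respect to every control, the optimizer updates the whole trajectory at once instead of exploring by trial and error, which is the property the article attributes to the differentiable simulator.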

Results

The researchers demonstrated RoboGen’s capabilities across diverse manipulation skills, soft-body shaping, and locomotion. According to automated diversity metrics, it generates more varied tasks than prior benchmarks. The automatically retrieved assets and sizes produce more valid scenes than ablated variants of the pipeline, and the generated training supervision successfully induces complex, multi-step skills. Integrating multiple learning algorithms significantly improves success rates over using reinforcement learning alone.

Importantly, RoboGen’s fully automated pipeline can produce an endless stream of skill demonstrations when queried repeatedly. The authors have released videos of over 100 learned skills, spanning articulated object manipulation like opening doors, non-prehensile manipulation of soft bodies, and locomotion over uneven terrain.

While RoboGen shows promise, limitations exist around large-scale skill verification, sim-to-real transfer, and the robustness of current learning algorithms. Employing better foundation models, iterative reward refinement, improving simulation realism, and integrating more advanced policy learning techniques can help overcome these challenges in future work.

Future Outlook

This work represents an essential step towards unleashing infinite training data for robotic skill learning, enabled by the common sense knowledge embedded in large language models. While limitations exist around sim-to-real transfer and skill verification, the increasing realism of physics simulators and future improvements in foundation models can help address these challenges.

Overall, the proposed RoboGen framework marks a milestone in automating generalized robot learning by extracting versatile knowledge from data-driven generative models. The authors open-source RoboGen to spur more research into integrating learning-based methods with analytical robotics techniques.

Journal reference:

Preliminary scientific report. Wang, Y., Xian, Z., Chen, F., Wang, T.-H., Wang, Y., Fragkiadaki, K., Erickson, Z., Held, D., & Gan, C. (2023). RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation. ArXiv.org. https://doi.org/10.48550/arXiv.2311.01455, https://arxiv.org/abs/2311.01455

Written by

Aryaman Pattnayak

Aryaman Pattnayak is a Tech writer based in Bhubaneswar, India. His academic background is in Computer Science and Engineering. Aryaman is passionate about leveraging technology for innovation and has a keen interest in Artificial Intelligence, Machine Learning, and Data Science.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

APA
Pattnayak, Aryaman. (2023, November 07). RoboGen: Automating Robotic Skill Learning with GPT-4. AZoAi. Retrieved on July 18, 2025 from https://www.azoai.com/news/20231107/RoboGen-Automating-Robotic-Skill-Learning-with-GPT-4.aspx.
MLA
Pattnayak, Aryaman. "RoboGen: Automating Robotic Skill Learning with GPT-4". AZoAi. 18 July 2025. <https://www.azoai.com/news/20231107/RoboGen-Automating-Robotic-Skill-Learning-with-GPT-4.aspx>.
Chicago
Pattnayak, Aryaman. "RoboGen: Automating Robotic Skill Learning with GPT-4". AZoAi. https://www.azoai.com/news/20231107/RoboGen-Automating-Robotic-Skill-Learning-with-GPT-4.aspx. (accessed July 18, 2025).
Harvard
Pattnayak, Aryaman. 2023. RoboGen: Automating Robotic Skill Learning with GPT-4. AZoAi, viewed 18 July 2025, https://www.azoai.com/news/20231107/RoboGen-Automating-Robotic-Skill-Learning-with-GPT-4.aspx.