In a recent submission to the arXiv server*, researchers introduced Human-in-the-Loop Task and Motion Planning (HITL-TAMP), a system that combines Task and Motion Planning (TAMP) with teleoperation. The system uses a TAMP-gated control mechanism that trades control back and forth between a TAMP system and a human teleoperator.
*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.
Background
In the field of robotics, using human demonstrations to teach robots intricate manipulation skills has shown considerable promise. However, translating this paradigm to real-world, long-horizon tasks is challenging because providing prolonged manipulation demonstrations is labor-intensive. Not all parts of a task are equally difficult, and the easier segments could in principle be handed off to automated planning methods. Nonetheless, these planning methods often require precise dynamics models and accurate perception, limiting their effectiveness in contact-rich and low-tolerance manipulation scenarios.
To overcome these obstacles, the research aims to combine the benefits of learning and planning, focusing on enhancing TAMP systems. TAMP techniques are adept at devising strategies for complex manipulation tasks by searching over combinations of basic skills. The proposed approach reserves human teleoperation and closed-loop skill learning for the difficult portions of a task while automating the rest; the central challenge is integrating these components so that human effort is spent only where it is needed.
In the current study, the researchers modeled the robot's environment as a discrete-time Markov Decision Process (MDP) defined by a state space, an action space, a transition distribution, a reward function, and an initial state distribution. An offline dataset of partial demonstration trajectories was collected via the HITL-TAMP system, and policies were trained with behavioral cloning, minimizing a supervised loss that matches the policy's predicted actions to the demonstrated actions.
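As a concrete illustration, the snippet below sketches a behavioral-cloning update of the kind described above. It is a minimal sketch, not the paper's implementation: the network architecture, the L2 regression loss, and names such as `bc_update` are assumptions for a continuous action space.

```python
# Minimal behavioral-cloning sketch (illustrative; the paper's exact
# architecture and loss may differ). Assumes continuous robot actions
# and an offline dataset of (state, action) pairs from human segments.
import torch
import torch.nn as nn

class Policy(nn.Module):
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def bc_update(policy, optimizer, states, actions):
    """One gradient step: regress predicted actions onto demonstrated ones."""
    pred = policy(states)
    loss = nn.functional.mse_loss(pred, actions)  # supervised imitation loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```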
Integrating human teleoperation and TAMP
The researchers introduced two components to make TAMP compatible with traditional human teleoperation systems: a novel constraint-learning mechanism that lets TAMP plan to states from which a human can effectively take over, and the core TAMP-gated teleoperation algorithm. Incorporating human teleoperation into the planning process requires an approximate model of the teleoperation process.
This approach builds on Wang et al.'s high-level modeling: action schemas are defined for each skill, the constraints that classical techniques can model are specified by hand, and the remaining constraints are extracted from a limited number of teleoperation trajectories. Model learning aims to generatively represent the preconditions that enable each skill. Constraint learning bootstraps constraint models from a few human demonstrations, building a dataset of pose relations and grasp affordances.
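To make the constraint-learning idea concrete, here is a toy sketch of how a grasp constraint might be bootstrapped from a handful of demonstrations: demonstrated gripper poses, expressed in the object's frame, are stored and later sampled by the planner. The class and method names are illustrative assumptions, not the paper's representation.

```python
# Toy sketch of bootstrapping a grasp-pose constraint from demonstrations
# (illustrative; the paper's constraint representation may differ).
import random

class GraspConstraint:
    """Stores gripper poses relative to an object, observed in human demos."""

    def __init__(self):
        self.relative_poses = []  # e.g., (x, y, z, qx, qy, qz, qw) tuples

    def add_demo(self, gripper_pose_in_object_frame):
        self.relative_poses.append(gripper_pose_in_object_frame)

    def sample(self):
        """Propose a grasp for the planner by replaying a demonstrated pose."""
        return random.choice(self.relative_poses)
```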
The TAMP-gated teleoperation system determines when a human operator should take control of a task segment and when the planner should resume. Each task has a goal formula; on every TAMP iteration, the system assesses the current state, plans a sequence of actions, and hands control to the human when the next segment requires it. Once the human completes the segment, control returns to the TAMP system for further planning and execution, ensuring efficient cooperation between TAMP and human operators.
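The following sketch captures the gist of this gated loop. The planner, environment, and operator interfaces (`planner.plan`, `segment.automatable`, `human.teleoperate`) are assumed placeholders rather than the paper's API.

```python
# Minimal sketch of a TAMP-gated teleoperation loop (illustrative; the
# planner, environment, and handoff interfaces here are assumptions).
def tamp_gated_episode(env, planner, human, goal, max_iters=100):
    state = env.reset()
    for _ in range(max_iters):
        if goal(state):                   # goal formula satisfied?
            return True
        plan = planner.plan(state, goal)  # sequence of skill segments
        for segment in plan:
            if segment.automatable:       # TAMP executes directly
                state = env.execute(segment)
            else:                         # hand off to the human operator
                state = human.teleoperate(env, segment)
                break                     # replan from the post-human state
    return False
```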
Scaling data collection for learning
To boost data throughput, the researchers introduced a queuing system for managing multiple robots and data-collection sessions concurrently. The system involves a single human process, multiple robot processes, and a shared queue. Each robot process operates asynchronously in one of three modes: controlled by the TAMP system, awaiting human control, or under human control. This enables several robots to operate in parallel.
When the TAMP system requires human intervention, it enqueues the environment, allowing the human process to communicate with and control one robot at a time. After each human-controlled segment, the TAMP system regains control, and the next session in the queue is dequeued.
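A thread-based sketch of this queuing pattern is shown below; the paper's system uses separate processes, and the `tamp` and `human` objects here are stand-ins for the real control interfaces.

```python
# Sketch of the human-robot queuing pattern (illustrative; a thread-based
# stand-in for the paper's multi-process setup).
import queue
import threading

handoff_queue = queue.Queue()  # robots waiting for human control

def robot_process(robot_id, tamp):
    while True:
        segment = tamp.next_segment(robot_id)
        if segment is None:
            break
        if segment.automatable:
            tamp.execute(robot_id, segment)       # TAMP-controlled mode
        else:
            done = threading.Event()
            handoff_queue.put((robot_id, segment, done))
            done.wait()                           # block until human finishes

def human_process(human):
    while True:
        robot_id, segment, done = handoff_queue.get()  # one robot at a time
        human.teleoperate(robot_id, segment)
        done.set()                                # return control to TAMP
```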
Regarding policy deployment, HITL-TAMP demonstrations contain both TAMP-controlled and human-controlled segments. A policy is trained using behavioral cloning on only the human-controlled parts. To deploy the learned agent, a TAMP-gated control loop identical to the data-collection handoff logic is used, with the trained policy taking the place of the human.
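Under this deployment scheme, the learned policy can simply be dropped into the operator slot of the gated loop sketched earlier. The wrapper below is an illustrative assumption, reusing the hypothetical `tamp_gated_episode` interface from the earlier sketch.

```python
# Deployment reuses the same gated loop, with the learned policy standing
# in for the human operator (assumes the tamp_gated_episode sketch above;
# env.observe, env.step, and segment.done are illustrative placeholders).
class PolicyOperator:
    def __init__(self, policy):
        self.policy = policy

    def teleoperate(self, env, segment):
        state = env.observe()
        while not segment.done(state):
            action = self.policy(state)  # policy acts on human-only segments
            state = env.step(action)
        return state

success = tamp_gated_episode(env, planner, PolicyOperator(policy), goal)
```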
Experimental results
In the evaluation of HITL-TAMP, tasks were selected to validate its capability in contact-rich, long-horizon scenarios. Variants of these tasks involved objects placed in broad workspace regions, a challenging condition for previous imitation learning systems. A pilot user study involved 15 participants, comparing HITL-TAMP with a conventional teleoperation system.
Each participant performed demonstrations on three tasks for 10 minutes on each system. HITL-TAMP demonstrated significantly higher data throughput, with users collecting 2.5x to 4.5x more demonstrations on various tasks compared to the conventional system. Policies trained on HITL-TAMP data outperformed those trained with conventional teleoperation data.
HITL-TAMP efficiently enabled non-experts to collect high-quality demonstration data. Four of the 15 participants had no prior teleoperation experience, yet they too gathered more data with HITL-TAMP, and policies trained on their data achieved higher success rates than with the conventional system. In terms of policy learning, HITL-TAMP allowed proficient agents to be trained from multi-user data, achieving high success rates.
The system was broadly applicable to a range of contact-rich and long-horizon tasks, with near-perfect agents trained on several of them, even in challenging scenarios. HITL-TAMP outperformed conventional teleoperation even when an equal number of task demonstrations was used, and those demonstrations took less time to collect. Additionally, deploying learned agents under TAMP-gated control significantly increased their success rates, even for agents trained on datasets of complete human demonstration trajectories.
Validation on a physical robot setup showed that HITL-TAMP achieved successful policy learning on real-world tasks involving a robotic arm, cameras, and perception challenges, outperforming previous approaches. The TAMP-gated agent achieved success rates ranging from 62 percent to 74 percent across tasks.
Conclusion
In summary, the researchers introduced HITL-TAMP, a novel approach that integrates automated planning with human control to teach robots intricate manipulation skills. Through a TAMP-gated control mechanism, it collects human demonstrations only on the segments that need them and learns preimage models of human skills. HITL-TAMP enables a single human to efficiently supervise multiple worker robots, improving data-collection and policy-learning efficiency compared with traditional full-task human demonstrations.