In an article recently submitted to the arXiv* preprint server, researchers introduced LLM3, a novel task and motion planning (TAMP) framework that leverages large language models (LLMs) to bridge symbolic task planning and continuous motion generation.
LLM3 iteratively incorporated motion planning feedback to refine action sequences, alleviating the need for manually designed, domain-specific interfaces. Through simulations and physical experiments, LLM3 demonstrated effectiveness in solving TAMP problems and efficiently selecting action parameters, with motion failure reasoning contributing significantly to its success.
Background
TAMP is critical for autonomous robots to effectively navigate complex environments and accomplish diverse tasks. TAMP divides planning into symbolic task planning and low-level motion planning stages. Traditional TAMP methods often rely on manually designed interfaces between symbolic and continuous domains, leading to domain-specific solutions and limited generalizability.
Previous approaches have attempted to address this challenge by incorporating data-driven heuristics or designing specialized communication modules. However, these methods lack generalizability across domains and require substantial manual effort.
The paper introduced LLM3, a novel TAMP framework that leverages pre-trained LLMs to bridge the gap between symbolic and continuous planning domains. LLM3 used the LLM to propose symbolic action sequences and to generate continuous action parameters, benefiting from the implicit heuristics encoded in the model. Additionally, LLM3 reasoned over motion planning feedback to refine action sequences and parameters iteratively. By employing LLMs as both task planners and informed parameter samplers, LLM3 offered a domain-independent approach to TAMP, eliminating the need for manually designed symbolic domain files.
Methods
The researchers introduced LLM3, a TAMP framework leveraging pre-trained LLMs to reason on motion failure and generate refined action sequences. LLM3 iteratively refined symbolic actions and continuous parameters by integrating motion planning feedback. The framework alternated between reasoning with the LLM and verifying action feasibility with a motion planner, aiming to solve TAMP problems efficiently.
Each planning iteration involved generating action sequences guided by previous motion failure reasoning and updating the motion planning feedback trace. LLM3 aimed to improve action sequence quality incrementally, benefiting from the LLM's intrinsic heuristics and insights from previous failures. Given a system message and a task description as input, the LLM generated both its reasoning and the action sequences autonomously, facilitating domain-independent planning.
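To make the loop concrete, the sketch below outlines the alternation between LLM proposal and motion-planner verification described above. It is a minimal illustration under stated assumptions, not the authors' implementation: the propose and check callables stand in for the actual LLM prompt and motion planner, and the feedback record format is assumed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class MotionResult:
    feasible: bool
    failure_reason: str = ""  # e.g., "collision" or "unreachable"


def llm3_plan(propose: Callable[[str, List[dict]], List[object]],
              check: Callable[[object], MotionResult],
              task: str,
              max_iterations: int = 20) -> Optional[List[object]]:
    """Alternate between LLM proposal and motion-planner verification.

    propose: wraps the LLM call (task + feedback trace -> action sequence).
    check:   wraps the motion planner for one parameterized action.
    """
    feedback_trace: List[dict] = []  # accumulated failures across iterations
    for _ in range(max_iterations):
        plan = propose(task, feedback_trace)  # LLM refines using past failures
        for step, action in enumerate(plan):
            result = check(action)
            if not result.feasible:
                # Record which action failed and why, to guide the next query.
                feedback_trace.append({"step": step,
                                       "action": action,
                                       "reason": result.failure_reason})
                break
        else:
            return plan  # every action in the sequence verified feasible
    return None  # iteration budget exhausted without a feasible plan
```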
Two strategies, backtracking and planning from scratch, were employed to generate new action sequences, enabling the LLM to refine its outputs based on previous failures. Motion planning feedback was synthesized to provide meaningful insights into motion failures, aiding LLM3 in improving high-level planning. Feedback categorized failures as collisions or unreachable targets, enhancing the LLM's understanding of their causes. The researchers conducted simulations in a box-packing domain to quantify LLM3's effectiveness and efficiency, demonstrating its superiority over unguided planners.
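The sketch below illustrates how such feedback might be rendered into natural language for the LLM prompt. The two categories (collision and unreachability) follow the paper's description, but the message wording, fields, and function name are illustrative assumptions.

```python
def synthesize_feedback(step: int, action: str, category: str,
                        detail: str = "") -> str:
    """Render one motion failure as a natural-language feedback line."""
    if category == "collision":
        obstacle = detail or "an obstacle"
        return (f"Action {step} ('{action}') failed: the planned motion "
                f"collides with {obstacle}. Try a different placement or "
                f"move the obstruction first.")
    if category == "unreachable":
        return (f"Action {step} ('{action}') failed: the target pose lies "
                f"outside the robot's reachable workspace.")
    return f"Action {step} ('{action}') failed for an unspecified reason."


# Example line as it might appear in the feedback trace:
print(synthesize_feedback(2, "place(box_1, basket)", "collision", "box_3"))
```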
Ablation studies underscored the importance of motion failure reasoning in LLM3's success, and qualitative experiments on a physical manipulator showcased the framework's practical applicability in real-world settings.
Simulation and experiment
Through simulations and experiments, the effectiveness and efficiency of LLM3 were demonstrated, highlighting its potential for real-world applications. In simulations, LLM3 was evaluated in two settings: one with increasing object sizes and a constant basket size, and another with increasing basket sizes. LLM3's success rate (%SR), the number of LLM calls (#LM), and the number of motion planner calls (#MP) were quantitatively assessed.
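As a point of reference, the three metrics could be aggregated over repeated trials as in the short sketch below; the per-trial record fields are illustrative assumptions, not the paper's data format.

```python
def aggregate_metrics(trials: list) -> dict:
    """Aggregate per-trial records into the three reported metrics.

    Each trial is a dict with 'success' (bool), 'llm_calls' (int),
    and 'mp_calls' (int).
    """
    n = len(trials)
    return {
        "%SR": 100.0 * sum(t["success"] for t in trials) / n,  # success rate
        "#LM": sum(t["llm_calls"] for t in trials) / n,        # mean LLM calls
        "#MP": sum(t["mp_calls"] for t in trials) / n,         # mean planner calls
    }


# Example with three hypothetical trials:
print(aggregate_metrics([
    {"success": True,  "llm_calls": 3, "mp_calls": 12},
    {"success": True,  "llm_calls": 5, "mp_calls": 20},
    {"success": False, "llm_calls": 8, "mp_calls": 35},
]))
```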
Results showed that integrating motion planning feedback improved the %SR while reducing the #LM and #MP. Surprisingly, no clear advantage was observed between the backtracking and planning-from-scratch strategies. Additionally, an ablation study compared LLM3 with baseline methods, revealing the framework's superiority in %SR and efficiency: LLM3 required significantly fewer iterations and motion planner calls to find feasible action sequences than random sampling.
Furthermore, LLM3's ability to act as an informed action parameter sampler was investigated. Results indicated that leveraging LLMs for action parameter selection substantially reduced the sampling iterations and motion planner calls needed to generate feasible action sequences, with further improvements observed when incorporating motion planning feedback. In a real-world experiment with a physical robot manipulator, LLM3 successfully performed a box-packing task despite uncertainties in perception and execution. The robot accurately identified and manipulated objects, demonstrating the practicality and robustness of LLM3 in real-world scenarios.
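To illustrate the informed-sampler idea discussed above, the sketch below contrasts uniform rejection sampling with an LLM-guided sampler for a single placement action. The 2-D pose representation and the ask_llm callable are assumptions for demonstration only, not the authors' interface.

```python
import random
from typing import Callable, List, Optional, Tuple

Pose = Tuple[float, float]  # simplified 2-D placement pose


def random_sampler(region: dict, check: Callable[[Pose], bool],
                   max_tries: int = 100) -> Tuple[Optional[Pose], int]:
    """Uniform rejection sampling, the baseline LLM3 was compared against."""
    for i in range(1, max_tries + 1):
        pose = (random.uniform(region["xmin"], region["xmax"]),
                random.uniform(region["ymin"], region["ymax"]))
        if check(pose):
            return pose, i  # feasible pose and number of planner calls used
    return None, max_tries


def llm_informed_sampler(ask_llm: Callable[[List[str]], Pose],
                         check: Callable[[Pose], bool],
                         max_tries: int = 10) -> Tuple[Optional[Pose], int]:
    """LLM-guided sampling: each failure is fed back into the next query."""
    failures: List[str] = []
    for i in range(1, max_tries + 1):
        pose = ask_llm(failures)  # LLM proposes a pose given failures so far
        if check(pose):
            return pose, i
        failures.append(f"pose {pose} was infeasible")
    return None, max_tries
```

In the paper's experiments, the informed sampler needed far fewer sampling iterations and planner calls than the random baseline, consistent with the reported reductions in #MP.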
Conclusion
In conclusion, LLM3 represented a significant advancement in TAMP by leveraging pre-trained LLMs to bridge symbolic task planning and continuous motion generation. Through simulations and physical experiments, LLM3 demonstrated its effectiveness in solving TAMP problems efficiently, with motion failure reasoning playing a crucial role in refining action sequences. The framework's ability to integrate motion planning feedback and its practical applicability in real-world settings showcased its potential for autonomous robotic manipulation tasks.
*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.
Journal reference:
- Preliminary scientific report.
Wang, S., Han, M., Jiao, Z., Zhang, Z., Wu, Y. N., Zhu, S.-C., & Liu, H. (2024, March 18). LLM^3: Large Language Model-based Task and Motion Planning with Motion Failure Reasoning. arXiv. https://doi.org/10.48550/arXiv.2403.11552, https://arxiv.org/abs/2403.11552