SayPlan: Scaling Up LLM-Based Task Planning for Robotics Using 3D Scene Graphs

In an article posted to the arXiv* preprint server, researchers demonstrated the feasibility of a new scalable large language model (LLM)-based task planning framework for robotics.

Background

Recent advancements in LLMs have enabled robots to plan complex strategies for tasks that require substantial semantic comprehension and background knowledge.

However, to become efficient planners in robotics, LLMs must adhere to the constraints of the physical environment in which the robot operates, including the relevant predicates, the effects of actions on the current state, and the available affordances.

Additionally, to plan across the important regions of expansive environments, robots must be able to understand their location, identify items of interest, and grasp the topological arrangement of the environment.

Study: SayPlan: Scaling Up LLM-Based Task Planning for Robotics Using 3D Scene Graphs. Image credit: 3rdtimeluckystudio / Shutterstock

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.

Several studies have investigated the feasibility of grounding the output of LLM-based planners using Planning Domain Definition Language (PDDL) descriptions of a scene, object detectors, and vision-based value functions. However, these efforts were confined to single-room or small-scale environments in which information about all objects and assets present in the environment was pre-encoded.

Moreover, scaling these models is extremely challenging, as a larger number of entities and rooms increases the complexity and dimensions of the environment, making it increasingly infeasible to pre-encode all critical information in the LLM's context.

Thus, a new scalable approach is necessary to ground LLM-based task planners in expansive, multi-room, multi-floor environments.

A new approach for LLM-based large-scale task planning for robots

In this paper, researchers proposed SayPlan, a new scalable approach to large-scale, LLM-based task planning for robotics that uses three-dimensional scene graph (3DSG) representations. The approach was developed to ground LLM-based task planners across expansive environments spanning several floors and rooms by exploiting the growing body of 3DSG research.

The study addressed the challenge of long-range planning for autonomous agents operating in expansive environments based on natural language instructions. Accordingly, the experiments were designed to assess the 3DSG reasoning capabilities of LLMs on high-level task planning for a mobile manipulator robot.

This long-range planning involved comprehending ambiguous and abstract instructions, understanding the scene, and generating task plans for a mobile robot to navigate and manipulate objects within the environment.

3DSGs can capture a rich, hierarchically organized, topological, and semantic graph representation of an environment, encoding the information necessary for task planning, including predicates, attributes, affordances, and object states, in natural language that an LLM can parse. The JavaScript Object Notation (JSON) representation of this graph was leveraged as input to a pre-trained LLM.
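As a rough illustration of this idea, the snippet below sketches how a small fragment of such a scene graph might be expressed and serialized as JSON for inclusion in an LLM prompt. The schema, node names, states, and affordances shown here are simplified assumptions for illustration, not the authors' exact format.

```python
import json

# A minimal, hypothetical 3DSG fragment in the spirit of the JSON-based
# scene representation described above. The schema, node names, states,
# and affordances are illustrative assumptions, not the paper's format.
scene_graph = {
    "nodes": [
        {"id": "floor_1", "type": "floor"},
        {"id": "kitchen", "type": "room", "floor": "floor_1"},
        {"id": "fridge", "type": "asset", "room": "kitchen",
         "state": "closed", "affordances": ["open", "close"]},
        {"id": "apple", "type": "object", "room": "kitchen",
         "attributes": ["red", "pickable"]},
    ],
    "edges": [
        ["floor_1", "kitchen"],  # floor contains room
        ["kitchen", "fridge"],   # room contains asset
        ["kitchen", "apple"],    # room contains object
    ],
}

# Serialize the graph so it can be embedded directly in an LLM prompt.
print(json.dumps(scene_graph, indent=2))
```

The hierarchical floor-room-object structure is what later allows the graph to be collapsed to its upper levels, keeping the prompt small even for large scenes.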

Additionally, the scalability of the approach was ensured through three mechanisms. First, the hierarchical nature of 3DSGs was exploited to allow the LLM to perform a semantic search for task-relevant subgraphs, starting from a collapsed, smaller representation of the full graph. Second, the LLM's planning horizon was reduced by integrating a classical path planner. Third, an iterative replanning pipeline refined the initial plan using feedback from a scene graph simulator, correcting infeasible actions and avoiding planning failures.
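The following toy sketch illustrates the collapse-and-expand semantic search described above. A simple keyword matcher stands in for the LLM's decision-making, and the miniature graph and room names are hypothetical; the real system re-prompts the LLM with the serialized JSON view of the graph at each step.

```python
# Toy sketch of the collapse/expand semantic search. A keyword heuristic
# stands in for the LLM, and the miniature graph below is hypothetical.
FULL_GRAPH = {
    "kitchen": ["fridge", "apple", "sink"],
    "office": ["desk", "stapler"],
    "bathroom": ["sink", "towel"],
}

def mock_llm_pick_room(visible):
    """Stand-in for the LLM: choose the next still-collapsed room to
    expand. A real system would query GPT-4 with the serialized graph."""
    for room, contents in visible.items():
        if contents is None:  # room is still collapsed
            return room
    return None

def semantic_search(instruction, max_expansions=3):
    # Start from the fully collapsed view: rooms visible, contents hidden.
    visible = {room: None for room in FULL_GRAPH}
    for _ in range(max_expansions):
        # Return as soon as an expanded room contains a mentioned object.
        for room, contents in visible.items():
            if contents and any(obj in instruction for obj in contents):
                return room, contents  # task-relevant subgraph found
        room = mock_llm_pick_room(visible)
        if room is None:
            break
        visible[room] = FULL_GRAPH[room]  # expand node: reveal contents
    return None

print(semantic_search("find the stapler"))  # ('office', ['desk', 'stapler'])
```

In the full framework, irrelevant nodes can also be contracted again, which keeps the visible graph within the LLM's token budget in much larger scenes; this toy omits that step.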

The proposed approach was assessed across 90 tasks organized into four difficulty levels, including semantic search tasks and long-horizon, interactive tasks with ambiguous, multi-room objectives that required substantial common-sense reasoning.

These tasks were evaluated in two large-scale environments: a three-story house with 32 rooms and 121 objects, and a large office floor with 36 rooms and 150 interactable objects and assets.

Significance of the study

The findings of this study demonstrated the effectiveness of the approach in grounding long-horizon, large-scale task plans derived from abstract natural language instructions for execution by a mobile manipulator robot.

SayPlan with GPT-4 achieved 73.3% and 86.7% success in finding the desired subgraph on complex and simple search tasks, respectively. Additionally, the input tokens required to represent the home and office environments were reduced by 60.4% and 82.1%, respectively, owing to the semantic reasoning capabilities of LLMs and the hierarchical nature of 3DSGs, which allowed the agent to explore the scene graph from its highest hierarchical level.

Moreover, SayPlan attained near-perfect executability due to iterative replanning with feedback from a scene graph simulator, which ensured that the generated plans adhered to the predicates and constraints imposed by the environment.
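As a rough sketch of how such a verify-and-replan loop can work, the toy example below checks a candidate plan against a single hand-written precondition and feeds the violation back to a stand-in "replanner". The action names, the precondition rule, and the string-based feedback format are all hypothetical simplifications; in SayPlan, the feedback comes from a scene graph simulator and the refinement is performed by re-prompting the LLM.

```python
# Toy verify-and-replan loop. The precondition table, action strings, and
# string-based feedback are hypothetical; SayPlan instead verifies plans
# with a scene graph simulator and refines them by re-prompting the LLM.
PRECONDITIONS = {"pick(apple)": "open(fridge)"}  # the apple is in the fridge

def verify(plan):
    """Simulator stand-in: every action's precondition must occur earlier
    in the plan. Returns the first violation found, or None if feasible."""
    done = set()
    for action in plan:
        needed = PRECONDITIONS.get(action)
        if needed and needed not in done:
            return f"{action} failed: requires {needed} first"
        done.add(action)
    return None

def replan(plan, feedback):
    """Replanner stand-in: insert the missing precondition before the
    failing action. A real system would pass the feedback to the LLM."""
    action, needed = feedback.split(" failed: requires ")
    needed = needed.removesuffix(" first")
    i = plan.index(action)
    return plan[:i] + [needed] + plan[i:]

plan = ["goto(kitchen)", "pick(apple)"]
while (error := verify(plan)) is not None:
    plan = replan(plan, error)  # iterative replanning on simulator feedback
print(plan)  # ['goto(kitchen)', 'open(fridge)', 'pick(apple)']
```

Because every plan is checked before execution, infeasible actions are caught and corrected in the loop rather than on the robot, which is what drives the near-perfect executability reported above.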

Compared with current baseline techniques, the approach produced the highest number of correct, executable plans for a mobile robot. Thus, SayPlan successfully addressed two key issues: representing large-scale scenes within LLM token limitations, and mitigating erroneous LLM outputs and hallucinations while generating long-horizon plans in expansive environments.

Limitations of the approach and future outlook

A significant limitation of the proposed approach is that the graph-based reasoning capabilities of the underlying LLM fail at node negation, node count-based reasoning, and simple distance-based reasoning.

Additionally, the current SayPlan framework requires a pre-built 3DSG and assumes that all objects remain static after the map is generated, which significantly restricts the adaptability of the framework to dynamic real-world environments.

Thus, more research is required to fine-tune LLMs for large-scale robotic task planning, to incorporate more complex graph reasoning tools that facilitate decision-making, and to integrate online scene graph simultaneous localization and mapping (SLAM) systems within the SayPlan framework.
