Interactive Task Planning with Language Models

In an article recently submitted to the arXiv* server, researchers introduced an interactive robot framework that handles long-horizon task planning and adapts to new goals and tasks, even during execution. Unlike traditional methods built from predefined modules, the approach relies on Large Language Models (LLMs), reducing the need for extensive prompt engineering or domain-specific models.

Study: Interactive Task Planning with Language Models. Image credit: NicoElNino/Shutterstock

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.

The framework integrated high-level planning and low-level execution through language. It generated robust high-level instructions for unforeseen objectives and adapted across tasks when the task guidelines were substituted. The system also replanned in response to user requests.

LLMs in Robotics

The rise of LLMs and chatbots has underscored the significance of human interaction within AI systems. Past work in robotic task planning used symbolic planners and later task and motion planning (TAMP) but faced challenges with parameter definitions and search spaces. Recent research has explored using LLMs for planning in robotics, including zero-shot planning and code generation.

Interactive Task Planning (ITP) Framework

ITP integrates high-level planning and low-level execution powered by LLMs. Unlike prior work, ITP lets one LLM create high-level plans from contextual information; a second LLM then executes those plans through the robot's functional API, grounded by a pre-trained Vision-Language Model (VLM). The ITP framework consists of three primary building blocks:

Visual Scene Grounding: The VLM transforms observable inputs into concise language descriptions, which ITP can use for planning and execution. In a drink-making system, it identifies menu items and their locations using a mapping algorithm.

LLMs for Planning and Execution: ITP employs GPT-4 (Generative Pre-trained Transformer 4) as its language model. The high-level planner takes input prompts, task guidelines, and user requests and generates a step-by-step plan for the task. A second LLM, provided with scene information and robot skills, attempts to execute each step. Task guidelines written in natural language outline the robot's tasks and allow generalization to new drinks through few-shot learning.

Robot Skill Grounding: The language model interfaces with predefined Python skills that control the robot. These skills are exposed to the model as a functional API and described through natural language documentation of the functions, so the prompt does not need specific code examples or function details.
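
The scene grounding and skill grounding described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers' implementation: the `Detection` format, the skill names `grasp_cup` and `pour`, their signatures, and the prompt wording are all hypothetical.

```python
import inspect
from dataclasses import dataclass


@dataclass
class Detection:
    """One labeled object from a vision-language detector (hypothetical format)."""
    label: str
    x: float  # assumed table-frame coordinates
    y: float


def describe_scene(detections):
    """Turn detections into a short language description of the scene."""
    lines = [f"- {d.label} at ({d.x:.2f}, {d.y:.2f})" for d in detections]
    return "Objects on the table:\n" + "\n".join(lines)


# Hypothetical robot skills. The bodies are stubs; only the documentation
# is shown to the language model in this sketch.
def grasp_cup(x: float, y: float) -> None:
    """Move the gripper to (x, y) and grasp the cup found there."""


def pour(ingredient: str, x: float, y: float) -> None:
    """Pour the named ingredient into the container at (x, y)."""


def document_skills(skills):
    """Build natural-language API documentation from signatures and docstrings."""
    docs = []
    for fn in skills:
        docs.append(f"{fn.__name__}{inspect.signature(fn)}: {inspect.getdoc(fn)}")
    return "\n".join(docs)


def build_execution_prompt(step, detections, skills):
    """Assemble the prompt a low-level executor LLM could receive for one step."""
    return (
        f"{describe_scene(detections)}\n\n"
        f"Available functions:\n{document_skills(skills)}\n\n"
        f"Write Python calls that carry out this step: {step}"
    )
```

Calling `build_execution_prompt("Pour milk into the cup", detections, [grasp_cup, pour])` yields a single text prompt that contains the scene description and the function documentation but no worked code examples, which mirrors the idea of grounding skills through documentation alone.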

Beyond these components, ITP considers user requests as human-in-the-loop feedback, generating new plans based on completed steps, task guidelines, new requests, and chat history. 
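
The replanning loop can be illustrated with a short Python sketch. It assumes a generic `llm` callable that maps a prompt string to a reply string; the `PlannerState` class, the prompt layout, and the one-step-per-line reply format are hypothetical choices for the example, not details taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PlannerState:
    """Context the high-level planner carries between user requests."""
    guidelines: str
    completed_steps: List[str] = field(default_factory=list)
    chat_history: List[str] = field(default_factory=list)


def build_replan_prompt(state: PlannerState, new_request: str) -> str:
    """Combine guidelines, history, finished steps, and the new request."""
    done = "\n".join(
        f"{i}. {s}" for i, s in enumerate(state.completed_steps, 1)
    ) or "(none)"
    history = "\n".join(state.chat_history) or "(empty)"
    return (
        f"Task guidelines:\n{state.guidelines}\n\n"
        f"Chat history:\n{history}\n\n"
        f"Steps already completed:\n{done}\n\n"
        f"New user request: {new_request}\n"
        "List the remaining steps, one per line."
    )


def replan(state: PlannerState, new_request: str,
           llm: Callable[[str], str]) -> List[str]:
    """Ask the planner LLM for a fresh plan and record the request."""
    reply = llm(build_replan_prompt(state, new_request))
    state.chat_history.append(f"User: {new_request}")
    # Assumed reply format: one step per non-empty line.
    return [line.strip() for line in reply.splitlines() if line.strip()]
```

Because the guidelines are a plain string field, switching the planner to a different task in this sketch only means constructing `PlannerState` with different guideline text.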

Results and Comparative Analysis

The robot experiments focused on a drink-making system in which an overhead camera supplied visual scene information to the Grounded-DINO model. The robot was tasked with combining ingredients to create specific drinks. It was equipped with predefined skills, such as "grasp cup," "pour," and "scoop boba to location," which let it carry out high-level tasks. For instance, the "grasp cup" skill relied on a feedback policy for accurate gripper placement, and the "pour" skill was designed to handle various ingredients, with the robot adjusting the tilt angle accordingly.

The ITP system was compared with a baseline approach called Code as Policies. Both systems received identical task guidelines, including task-specific conditions and additional code prompts. The experiments evaluated the number of high-level steps correctly generated and whether the task was completed successfully. ITP outperformed the baseline, demonstrating robustness in high-level planning and the ability to generalize to novel instructions and unavailable materials.

The approach's adaptability was also evident in the experiments on the dishwashing task, which employed different task guidelines and function definitions for low-level execution. By simply replacing the task guidelines for drink-making with those for dishwashing, the system excelled in high-level planning and task execution for this entirely different task. Notably, the system generated accurate and novel instructions for various dishwashing scenarios, making it adaptable to new tasks. The simplicity of task guideline modification and minimal need for code examples or function details illustrate the system's ease of generalization.

The experiments demonstrated that ITP is a flexible and robust framework for task planning and execution, showcasing its ability to adapt to diverse tasks with minimal reconfiguration and prompting. It provides a solid and efficient solution for real-world applications in robotics and automation.

Conclusion and Future Work

To summarize, this method represents a crucial step towards developing a tool that can assist scientists in uncovering novel avenues for exploration. The outlined ideas and extensions pave the way for practical, personalized, interdisciplinary AI-based suggestions for impactful new discoveries. Such a tool could become an influential catalyst, transforming the way scientists approach research questions and collaborate in their respective fields.

As for future work, several possibilities remain to be explored. Further refinement of the AI algorithms and integration of additional data sources could enhance the tool's capabilities. Given the ever-evolving nature of scientific research, continuous updates and adaptations will be necessary to keep the tool relevant and effective. Expanding its application to domains and industries beyond scientific research could also open new avenues for innovation and discovery. The tool has the potential to shape how complex problems are approached and how valuable insights are generated.

Journal reference:
Silpaja Chandrasekar

Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Chandrasekar, Silpaja. (2023, October 19). Interactive Task Planning with Language Models. AZoAi. Retrieved on October 25, 2025 from https://www.azoai.com/news/20231019/Interactive-Task-Planning-with-Language-Models.aspx.

  • MLA

    Chandrasekar, Silpaja. "Interactive Task Planning with Language Models". AZoAi. 25 October 2025. <https://www.azoai.com/news/20231019/Interactive-Task-Planning-with-Language-Models.aspx>.

  • Chicago

    Chandrasekar, Silpaja. "Interactive Task Planning with Language Models". AZoAi. https://www.azoai.com/news/20231019/Interactive-Task-Planning-with-Language-Models.aspx. (accessed October 25, 2025).

  • Harvard

    Chandrasekar, Silpaja. 2023. Interactive Task Planning with Language Models. AZoAi, viewed 25 October 2025, https://www.azoai.com/news/20231019/Interactive-Task-Planning-with-Language-Models.aspx.
