In an article recently submitted to the arXiv* preprint server, researchers proposed AgentOhana, a platform designed to address the challenges of consolidating heterogeneous data sources of multi-turn large language model (LLM) agent trajectories.
*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.
Challenges in LLM-based autonomous agents
LLMs have demonstrated robust abilities in conversational AI, AI agents, mathematical reasoning, and code generation. Among these applications, autonomous agents powered by LLMs are attracting increasing research attention. For instance, LangChain, XAgent, BOLAA, OpenAgent, and AutoGPT are recent frameworks designed to support LLM agent tasks and have attracted substantial interest in the open-source community.
However, several existing agents are primarily powered by closed-source LLM application programming interfaces (APIs) like Gemini and GPT-4, as most open-source models cannot efficiently handle complex agent tasks or perform long-horizon reasoning. Recently, several efforts have been made to train open-source models instead of depending solely on commercial APIs.
However, fully harnessing the potential of LLMs for agent-based tasks remains challenging because agent-relevant data are drawn from many dataset collections with heterogeneous, non-standardized formats, typically featuring multi-turn trajectories. This heterogeneity in processing methods, labeling conventions, syntax, and data structures across datasets complicates the training and fine-tuning of LLMs.
Specifically, the absence of standardized formats makes it difficult to harmonize diverse data sources, potentially introducing inconsistencies and biases. Addressing these challenges requires effective preprocessing pipelines that ensure compatibility and unification across data formats, along with strategies to mitigate biases arising from non-standardized representations.
The proposed approach
In this study, researchers proposed a comprehensive agent data collection and training pipeline, designated AgentOhana, to address these challenges effectively. The study's objective was to establish an effective method for managing non-standardized data formats so that LLM agents can perform robustly across various applications, given the rising demand for diverse and comprehensive datasets.
AgentOhana can aggregate agent trajectories from different environments, spanning various scenarios. The platform can meticulously standardize and unify these trajectories into a consistent format to streamline the development of a generic data loader optimized for agent training.
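To make this concrete, a conversion into a unified multi-turn trajectory record might look roughly like the sketch below. The field names and structure are illustrative assumptions made for this article, not AgentOhana's published schema.

```python
# Illustrative sketch of mapping one dataset-specific record onto a unified
# multi-turn trajectory layout. Field names are assumptions for this article,
# not AgentOhana's actual schema.

def to_unified_trajectory(raw_record: dict, source_name: str) -> dict:
    turns = []
    for step in raw_record.get("steps", []):
        turns.append({
            "role": step.get("speaker", "assistant"),  # e.g., "user", "assistant", "tool"
            "content": step.get("text", ""),
            "action": step.get("action"),              # optional environment action
            "observation": step.get("observation"),    # optional environment feedback
        })
    return {
        "source": source_name,                 # provenance, useful for balancing later
        "task": raw_record.get("task", ""),
        "turns": turns,
        "success": raw_record.get("success"),  # outcome signal, if the source provides one
    }

# Example: unified = [to_unified_trajectory(r, "webshop") for r in webshop_records]
```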
AgentOhana employs specialized processes to transform data from several sources into a uniform format for seamless integration. Moreover, the collected data were subjected to a meticulous filtering process to ensure high-quality trajectories, adding a further layer of quality control.
Building on this unification and standardization, the proposed training pipeline maintains balance across the various data sources and preserves independent randomness on each device during dataset partitioning and model training, preventing the inadvertent introduction of biases.
The AgentOhana workflow first adopts a homogeneous multi-turn data format designed to consolidate trajectories from heterogeneous data sources. A method designated AgentRater is then introduced to assess and filter agent trajectories using strong closed-source models such as ChatGPT or public models such as Mistral.
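The filtering step can be pictured as an LLM judge scoring each standardized trajectory and discarding low-scoring ones. The prompt wording, 0-5 scale, threshold, and the `call_llm` stand-in below are assumptions used only for illustration, not the paper's exact AgentRater setup.

```python
import json

# Hypothetical sketch of LLM-based trajectory rating and filtering.
# The prompt, 0-5 scale, and threshold are illustrative, not the paper's exact setup.

RATER_PROMPT = (
    "Rate the following agent trajectory for correctness and helpfulness.\n"
    "Reply with a single integer from 0 (useless) to 5 (excellent).\n\n"
    "Trajectory:\n{trajectory}"
)

def rate_trajectory(trajectory: dict, call_llm) -> int:
    """Ask an LLM judge (e.g., ChatGPT or Mistral) for a 0-5 quality score."""
    reply = call_llm(RATER_PROMPT.format(trajectory=json.dumps(trajectory, indent=2)))
    digits = [int(ch) for ch in reply if ch.isdigit()]
    return digits[0] if digits else 0

def filter_trajectories(trajectories, call_llm, threshold=4):
    """Keep only trajectories whose rating meets the quality threshold."""
    return [t for t in trajectories if rate_trajectory(t, call_llm) >= threshold]
```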
Finally, a generic data loader serves as a central component that enables seamless integration of the different datasets into a distributed training process. The researchers also presented a large action model, designated xLAM-v0.1, tailored for AI agents.
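A loader of this kind plausibly interleaves the standardized datasets while keeping per-source balance and giving each device its own random stream, as described above. The following is a minimal sketch under those assumptions, not the actual AgentOhana implementation.

```python
import random

# Simplified sketch of a source-balanced loader with per-device random streams.
# The uniform source sampling and seeding scheme are illustrative assumptions.

class BalancedAgentLoader:
    def __init__(self, datasets: dict, rank: int, base_seed: int = 42):
        # datasets maps a source name to its list of unified trajectories
        self.datasets = datasets
        # each device (rank) gets its own seed, preserving independent randomness
        self.rng = random.Random(base_seed + rank)

    def __iter__(self):
        sources = list(self.datasets)
        while True:  # stream samples indefinitely; the trainer decides when to stop
            # draw the source first so no single dataset dominates the batches
            source = self.rng.choice(sources)
            yield self.rng.choice(self.datasets[source])

# Example: loader = BalancedAgentLoader({"webshop": ws, "hotpotqa": hq}, rank=0)
```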
A supervised fine-tuning approach was adopted to improve the performance of the xLAM-v0.1 agent model, which was initialized from the pre-trained Mixtral-8x7B-Instruct-v0.1 model. This fine-tuning process was executed by leveraging AgentOhana's capabilities. Four benchmarks, namely MINT-Bench, ToolEval, HotpotQA, and Webshop, were used for the experimental evaluations of the model.
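As a rough illustration of how unified multi-turn trajectories can feed supervised fine-tuning, the sketch below flattens a trajectory into token IDs and masks the loss on non-assistant turns. The -100 ignore index and the tokenizer interface follow common SFT practice and are assumptions here, not details confirmed by the paper.

```python
# Rough sketch of flattening a unified trajectory into a supervised fine-tuning
# example, computing loss only on the agent's own turns. The -100 ignore index
# and the tokenizer.encode interface are common-practice assumptions.

IGNORE_INDEX = -100

def build_sft_example(trajectory: dict, tokenizer) -> dict:
    input_ids, labels = [], []
    for turn in trajectory["turns"]:
        ids = tokenizer.encode(f"{turn['role']}: {turn['content']}\n")
        input_ids.extend(ids)
        if turn["role"] == "assistant":
            labels.extend(ids)                        # train on the agent's own turns
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # mask user/tool/observation turns
    return {"input_ids": input_ids, "labels": labels}
```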
Study findings
xLAM-v0.1 displayed exceptional performance across the benchmarks. In the Webshop environment, the model consistently outperformed both GPT-3.5-Turbo and GPT-3.5-Turbo-Instruct in every agent configuration and surpassed GPT-4-0613 in five out of six settings. Similarly, in the HotpotQA environment, xLAM-v0.1 outperformed Mixtral-8x7B-Instruct-v0.1 and GPT-3.5-Turbo in all settings, although GPT-4-0613 retained a slight performance edge over the proposed model.
On ToolEval, xLAM-v0.1 outperformed both GPT-3.5-Turbo-0125 and TooLlama V2 across all scenarios and also surpassed GPT-4-0125-preview in two out of three settings. In the comprehensive and challenging MINT-Bench environment, the xLAM-v0.1 model ranked third, outperforming agent-based models such as AgentLM-70b and Lemur-70b-Chatv1 as well as general LLMs such as GPT-3.5-Turbo-0613 and Claude-2, indicating the model's strong capability to navigate the complexities of multi-turn interactions and task resolution.
To summarize, the findings of this study demonstrated that AgentOhana can effectively address the inherent challenges of consolidating diverse multi-turn LLM agent trajectory data.