A recent study posted to the arXiv* preprint server introduces a framework called Graph of Thoughts (GoT) that enables more flexible and powerful reasoning in large language models (LLMs) such as GPT-3 and GPT-4 without requiring any model updates. GoT models an LLM’s reasoning process as an arbitrary graph, enabling new thought transformations that measurably improve performance on elaborate, multi-step tasks compared to prior prompting techniques.
*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.
In recent years, large language models (LLMs) like GPT-3 have achieved remarkable success on a diverse array of natural language tasks through prompt engineering. This technique involves carefully formulating the task prompt provided to the LLM. However, existing prompting methods impose rigid constraints on reasoning, such as linear chains or tree structures. This severely limits the flexibility and complexity of LLM deliberation compared to human cognition, which forms rich associative networks drawing nonlinear connections. Developing more powerful reasoning is essential for LLMs to solve hard problems across diverse real-world domains.
Advantages of GoT
Rationality as Associative Graphs: When humans tackle complex, novel challenges, their reasoning follows an intricate, networked associative process rather than a simple linear chain. Insights leap between subtasks, combining partial solutions and looping back to re-evaluate ideas from new perspectives. GoT builds on this observation: for artificial agents to approach human-like reasoning, their thoughts, too, should form rich, flexible graphs rather than rigid chains or trees.
Overcoming Limitations of LLMs: LLMs can perform a wide range of natural language processing tasks when the task is described in the prompt provided to the model. However, existing prompting paradigms such as Chain-of-Thought and Tree-of-Thoughts constrain the LLM’s reasoning by imposing linear chains or rigid tree structures. GoT removes these limitations by adopting a graph abstraction that represents LLM thoughts as vertices and their semantic dependencies as edges.
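To make the abstraction concrete, here is a minimal sketch of a thought graph as a data structure. The class and field names (`Thought`, `ThoughtGraph`, `add`) are illustrative assumptions, not the paper's actual implementation; the point is only that each thought is a vertex and each edge records which earlier thoughts a new thought depends on.

```python
from dataclasses import dataclass, field

@dataclass
class Thought:
    """A single reasoning step: one vertex in the thought graph."""
    content: str
    score: float = 0.0  # filled in later by a scoring step

@dataclass
class ThoughtGraph:
    """Directed graph of thoughts; edges point from a thought to the
    thoughts derived from it (semantic dependencies)."""
    thoughts: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (parent_index, child_index)

    def add(self, thought, parents=()):
        """Append a thought, linking it to any number of parents.
        Multiple parents are what a chain or tree cannot express."""
        self.thoughts.append(thought)
        idx = len(self.thoughts) - 1
        for p in parents:
            self.edges.append((p, idx))
        return idx
```

Note that a vertex with two parents (e.g. a merge of two partial solutions) is exactly the case that chain- and tree-shaped prompting schemes rule out.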
GoT-driven Thought Transformations: This graph abstraction enables new capabilities. The structure supports arbitrary thought transformations, for instance, aggregating the most promising intermediate thoughts into stronger combined ones, or looping over a thought to refine it iteratively, well beyond human patience. GoT thereby generalizes and extends prior prompting schemes by modelling reasoning as a rich, flexible network, bringing LLMs closer to the complex, associative way human thinking combines and revisits ideas.
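The two transformations named above, aggregation and iterative refinement, can be sketched as small functions. This is a hedged illustration, not the paper's API: it assumes a generic `llm` callable that maps a prompt string to a text completion, and the prompt wording is invented.

```python
def aggregate(llm, thoughts):
    """Aggregation: merge several partial thoughts into one combined thought
    by asking the model to fuse them (a many-to-one graph edge)."""
    prompt = "Combine these partial solutions into one:\n" + "\n".join(thoughts)
    return llm(prompt)

def refine(llm, thought, rounds=3):
    """Refinement loop: feed a thought back to the model repeatedly,
    forming a cycle in the reasoning graph."""
    for _ in range(rounds):
        thought = llm(f"Improve this solution:\n{thought}")
    return thought
```

Aggregation gives a thought multiple parents, and refinement gives it a self-loop; neither shape exists in chain- or tree-structured prompting.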
Strong Performance: The researchers showed that GoT is particularly well suited to elaborate tasks that can be decomposed into smaller sub-problems, which are solved independently and then merged. For example, on a sorting task, GoT split the numbers into subsets, sorted each subset individually, and then aggregated the sorted subsets into the final solution. This approach reduced errors when sorting a sequence of 128 numbers by over 60% compared to the state-of-the-art Tree-of-Thoughts prompting technique, while also cutting computational costs by more than 31%.
Modular Architecture: To unlock the potential of networked reasoning, the researchers carefully designed a modular software architecture for GoT with fine-grained control over thought generation, scoring, and selection. This enables interactively exploring different graph structures and thought transformations for a given task. Extensive experiments underscore GoT’s qualitative improvements across diverse tasks, including sorting numbers, set operations on lists, keyword counting for text summarization, and document merging.
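The fine-grained control over generation, scoring, and selection described above suggests a simple controller loop. The sketch below is an assumption about what such a controller might look like, not the paper's actual architecture: `generate` and `score` are caller-supplied callables, and `keep` plays the role of a beam width.

```python
def got_step(llm, frontier, generate, score, keep=2):
    """One controller step: expand every frontier thought into candidates,
    score all candidates, and keep only the best few for the next round."""
    candidates = [c for t in frontier for c in generate(llm, t)]
    candidates.sort(key=score, reverse=True)  # highest-scoring first
    return candidates[:keep]
```

Because generation, scoring, and selection are separate plug-in functions, swapping in a different graph shape or transformation only requires changing one component, which is the modularity the paragraph describes.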
Substantial Accuracy Gains: Across all experiments, GoT substantially increased accuracy over existing methods like Chain-of-Thought and basic input-output prompting. For instance, on the sorting task, GoT enhanced accuracy by approximately 70% over Chain-of-Thought and 83% over input-output prompting. Moreover, the gains grew considerably with larger and more difficult problem sizes, as GoT could methodically decompose them into more tractable sub-problems before combining the solutions. This strongly highlights the power of networked reasoning for handling complex tasks.
Volume of Thought: The researchers also introduced a new metric, the “volume of thought,” for evaluating and comparing prompting strategies. For a given thought, its volume is the number of preceding thoughts that could have contributed to it, that is, thoughts with a directed path to it in the reasoning graph. GoT’s aggregation mechanism lets individual thoughts reach much larger volumes than approaches like Tree-of-Thoughts, which restrict how thoughts can be combined.
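Under the definition above, computing a thought's volume is an ancestor count in the reasoning graph. A minimal sketch, assuming edges are stored as `(parent, child)` index pairs as in a simple adjacency list:

```python
def volume(edges, node):
    """Volume of a thought: how many earlier thoughts have a directed
    path to it, i.e. could have influenced its generation."""
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    seen, stack = set(), [node]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return len(seen)
```

In a tree, a thought's ancestors form a single root-to-node path, so volume grows only with depth; aggregation edges let a GoT thought inherit the ancestors of every merged branch at once, which is why its volumes can be dramatically larger.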
Future outlook
This study makes substantial contributions to the design and evaluation of prompting techniques for LLMs. Modeling the reasoning process as a graph confers far more flexibility than rigid tree architectures or linear chains of thought. The work also offers a modular, extensible architecture for rapidly prototyping and assessing new prompting ideas. Most significantly, the empirical results demonstrate that networked thought transformations improve LLM inference across diverse tasks. Future work can build on this paradigm to develop more sophisticated, human-like deliberative reasoning capabilities in LLMs.
Journal reference:
- Preliminary scientific report.
Besta, M., Blach, N., Kubicek, A., Gerstenberger, R., Gianinazzi, L., Gajda, J., Lehmann, T., Podstawski, M., Niewiadomski, H., Nyczyk, P., & Hoefler, T. (2023, August 21). Graph of Thoughts: Solving Elaborate Problems with Large Language Models. arXiv. https://doi.org/10.48550/arXiv.2308.09687, https://arxiv.org/abs/2308.09687