In a paper published in the journal Entropy, researchers introduced an augmented large language model (LLM) agent called the private agent, which uses private deliberation and deception in repeated games. Utilizing the partially observable stochastic game (POSG) framework, in-context learning (ICL), and chain-of-thought (CoT) prompting, the study found that the private agent achieved higher long-term payoffs than its baseline counterpart in both competitive and cooperative scenarios.
Despite the agent's success, the study also identified inherent algorithmic deficiencies in LLMs. These findings highlight the potential for improving LLM agents in multi-player games through advanced deception and communication strategies.
Related Work
Past work explored generative agents in cooperative settings, defining them as agents that simulate believable human behavior. Another study introduced the communicative agents for "mind" exploration of large language model society (CAMEL) framework for complex cooperative problem-solving.
Other research focused on cooperation and competition, examining buyer-seller negotiations to understand social interactions. Modeling social dynamics remains challenging because it demands extensive human participation and complex system design. Techniques such as prototyping and jury learning help refine social system designs and resolve group disagreements, but fully overcoming these challenges requires innovative, iterative approaches.
Agent Decision Strategies
This study introduces two types of agents within a POSG framework, focusing on their decision-making processes: a private thought process agent (private agent) and a public thought process agent (public agent). The private agent strategizes by considering future actions privately, keeping its strategic thoughts hidden from other agents using techniques such as CoT and ICL.
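A minimal sketch of how such a prompt might be scaffolded, assuming a tag-based separation between hidden deliberation and the public message (the tag names, template wording, and helper function below are illustrative assumptions, not taken from the paper):

```python
import re

# Hypothetical prompt scaffold for the private agent: the model is asked to
# reason step by step (CoT) in a hidden section before committing to a
# public message. Tag names and wording are illustrative assumptions.
PRIVATE_AGENT_TEMPLATE = """You are playing a repeated game.
Game rules:
{rules}

Conversation so far:
{history}

First, reason step by step about the opponent's likely strategy and your
best response inside <private> tags; this text is never shown to the
opponent. Then write the message to send inside <public> tags."""

def split_response(raw: str) -> tuple[str, str]:
    """Separate the hidden deliberation from the public message."""
    private = re.search(r"<private>(.*?)</private>", raw, re.S)
    public = re.search(r"<public>(.*?)</public>", raw, re.S)
    return (private.group(1).strip() if private else "",
            public.group(1).strip() if public else raw.strip())
```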
In contrast, the public agent communicates all thought processes openly without employing additional reasoning techniques like CoT. Both agent types are implemented as separate instances of an LLM, interacting with an environment class that facilitates communication, action delivery, and reward assignment according to the game rules. The POSG is formalized with a finite set of agents, states, actions, observations, and a state transition function. Each agent receives initial prompts containing game rules and policies, with the private agent having an additional layer of private thoughts.
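In conventional notation (the paper's exact symbols may differ), a POSG is the tuple:

```latex
\left\langle I,\; S,\; \{A_i\}_{i \in I},\; \{O_i\}_{i \in I},\; T,\; \{R_i\}_{i \in I} \right\rangle,
\qquad T : S \times A \to \Delta(S \times O), \qquad R_i : S \times A \to \mathbb{R}
```

Here, I is the finite set of agents, S the set of states, A_i and O_i agent i's actions and observations, A the joint action set, T the combined state-transition and observation function, and R_i agent i's reward function.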
The environment manages interactions, ensuring that the private thoughts of the private agent are concealed from other agents. Agents make decisions based on observations and beliefs, with the private agent forming a communication strategy in private thoughts before revealing selected information publicly. The game dynamics involve agents receiving observations, choosing actions, and receiving rewards based on joint actions and states, with the environment ensuring that observations and rewards are accurately relayed.
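A minimal sketch of such a mediating environment, assuming the structure described above (the class and field names are hypothetical, not the paper's implementation):

```python
from dataclasses import dataclass

@dataclass
class Turn:
    action: str
    public_message: str
    private_thoughts: str = ""  # stored for logging, never broadcast

class Environment:
    """Illustrative mediator: collects each agent's turn, forwards only
    public messages to the other agents, and assigns rewards from the
    game's payoff rule."""

    def __init__(self, payoff_fn):
        self.payoff_fn = payoff_fn  # maps {agent: action} -> {agent: reward}
        self.log = []

    def step(self, turns: dict) -> dict:
        self.log.append(turns)
        rewards = self.payoff_fn({name: t.action for name, t in turns.items()})
        observations = {}
        for name in turns:
            others = {other: t.public_message  # private_thoughts are dropped
                      for other, t in turns.items() if other != name}
            observations[name] = {"messages": others, "reward": rewards[name]}
        return observations
```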
In exploring the LLM's capabilities, the study assessed its ability to generate outputs aligned with a given policy using ICL and CoT. The first hypothesis was that the LLM could sample from a specified probability distribution; the second, that it could calculate near-optimal action selections.
However, experiments indicated that the LLM could not reliably sample from Gaussian, Poisson, and uniform distributions, refuting the first hypothesis. The second hypothesis concerned action choice probabilities in a two-player game, evaluating the LLM's capability to select actions based on conversation history and to discern the opponent's type.
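One way a sampling test of this kind can be run (a hypothetical harness, not the paper's protocol): prompt the model repeatedly for draws from a named distribution, then compare the collected numbers against the target with a goodness-of-fit test.

```python
import numpy as np
from scipy import stats

def consistent_with_target(samples, target: str, alpha: float = 0.05) -> bool:
    """Rough goodness-of-fit check on numbers an LLM was prompted to 'draw'.
    Parameters are estimated from the sample itself, so p-values are only
    approximate; this is a screening tool, not a rigorous test."""
    x = np.asarray(samples, dtype=float)
    if target == "gaussian":
        _, p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
    elif target == "uniform":
        _, p = stats.kstest(x, "uniform", args=(x.min(), x.max() - x.min()))
    elif target == "poisson":
        # Poisson is discrete, so use a chi-square test against the MLE rate.
        ks = np.arange(int(x.max()) + 1)
        expected = stats.poisson.pmf(ks, x.mean()) * len(x)
        observed = np.array([(x == k).sum() for k in ks])
        _, p = stats.chisquare(observed, expected * observed.sum() / expected.sum())
    else:
        raise ValueError(f"unknown target: {target}")
    return p > alpha
```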
The empirical studies aimed to compare the two types of agents and evaluate their decision-making efficacy. The private agent's use of CoT in private thoughts demonstrated enhanced reasoning capabilities, whereas the public agent's transparent communication highlighted different strategic interactions. Evaluating the LLM's performance in these scenarios yielded insights into its strengths and limitations in executing gameplay tasks and aligning outputs with defined policies, contributing to an understanding of LLMs' computational abilities in multiplayer game settings.
LLM Decision-Making Experiments
In the study, experiments were conducted to investigate agents' decision-making processes in different game scenarios using advanced LLMs. The investigation focused on two types of agents: a private agent strategizing by keeping its thoughts concealed using techniques like CoT and ICL, and a public agent openly communicating its decision-making processes.
These experiments aimed to compare the performance of these agents across various games, such as the prisoner's dilemma, stag hunt, chicken game, and others. The LangChain framework was utilized to implement the agents, ensuring they maintained context from previous interactions and interacted effectively with the environment application programming interface (API).
The experimental setup involved multiple rounds and iterations, and agents could only recall the last two iterations of context, which influenced their strategy formulation and responses to game observations. For instance, private agents tended to produce longer and more contextually rich responses than public agents, indicating deeper internal deliberation. These differences were crucial for evaluating LLMs' strategic capabilities and adaptability in dynamic and competitive game environments.
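A plain-Python sketch of such a sliding context window (the study implemented this through LangChain; the class below is a framework-free illustration of the same idea):

```python
from collections import deque

class WindowedMemory:
    """Framework-free illustration of a two-iteration context window:
    only the most recent `window` iterations are replayed into the prompt."""

    def __init__(self, window: int = 2):
        self.turns = deque(maxlen=window)  # older iterations fall off

    def remember(self, observation: str, response: str) -> None:
        self.turns.append((observation, response))

    def as_prompt_context(self) -> str:
        return "\n".join(
            f"Observation: {obs}\nYour response: {resp}"
            for obs, resp in self.turns
        )
```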
The experiments tested whether LLM agents could reach game-theoretic solution concepts, including correlated equilibrium, Nash equilibrium, Pareto efficiency, and focal points, across different game scenarios. The findings highlighted generative pre-trained transformer 4's (GPT-4) superior performance in developing optimal strategies under varying conditions, emphasizing its role in strategic decision-making tasks.
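For concreteness, whether a joint action in a two-player matrix game is a pure-strategy Nash equilibrium can be checked mechanically: a cell qualifies when neither player gains by deviating unilaterally. The sketch below uses the canonical prisoner's dilemma payoffs, which are not necessarily those used in the study.

```python
import numpy as np

def pure_nash_equilibria(payoff_a: np.ndarray, payoff_b: np.ndarray):
    """Enumerate pure-strategy Nash equilibria of a bimatrix game: a cell
    is an equilibrium when each player's payoff is a best response to the
    other's action."""
    equilibria = []
    rows, cols = payoff_a.shape
    for i in range(rows):
        for j in range(cols):
            row_best = payoff_a[i, j] >= payoff_a[:, j].max()
            col_best = payoff_b[i, j] >= payoff_b[i, :].max()
            if row_best and col_best:
                equilibria.append((i, j))
    return equilibria

# Canonical prisoner's dilemma payoffs (index 0 = cooperate, 1 = defect).
A = np.array([[3, 0],
              [5, 1]])
B = A.T  # symmetric game
print(pure_nash_equilibria(A, B))  # [(1, 1)]: mutual defection
```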
Conclusion
To sum up, this research explored the effectiveness of GPT-4 and introduced the private agent in two-player repeated games. Implemented through ICL and CoT, the private agent deliberated on interactions privately, which led to superior long-term payoffs compared with public and heuristic agents across various scenarios within the POSG framework.
Challenges remain in opponent-type identification and sampling from diverse probability distributions, suggesting avenues for future research. The private agent's strategic advantages in competitive settings and its potential for deception highlight its promising applications beyond gaming, including interactive simulations and decision support systems.