Gaming, historically a niche economic sector, is expected to see a massive rise in revenue by leveraging recent advances in artificial intelligence (AI). For instance, techniques developed within the fields of reinforcement learning (RL) and machine learning (ML) can analyze and learn from gameplay experience, enabling more engaging, immersive, and interactive games. This article discusses the multi-faceted impact of AI on the gaming industry, focusing especially on RL.
Importance of AI in Gaming
AI in gaming is built around generating situations or characters. The most basic type of game AI is the non-playable character (NPC). NPCs are script-based and interact according to their scripts, although they can be tweaked slightly to appear more realistic.
For instance, the game Left 4 Dead employed an adaptive dramatic pacing algorithm. Prediction plays a major role in games such as Dota Auto Chess and Hearthstone, which has led to the use of neural networks in these titles. AI is also used in games to mine data on user behavior: researchers probe players' minds by placing them in specific situations and then assessing their problem-handling techniques and reactions.
Card games have remained a persistent challenge for AI as typical imperfect-information games. Libratus and DeepStack are AI systems that have defeated professional poker players at heads-up no-limit Texas hold'em (HUNL). Both share the same core technique, known as counterfactual regret minimization (CFR). DouDiZhu and Mahjong have proven even more challenging for AI.
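As an illustration of the regret-matching rule at the heart of CFR, the minimal sketch below converts accumulated counterfactual regrets at a single information set into a strategy. The action names and regret values are hypothetical; this is not the Libratus or DeepStack implementation.

```python
import numpy as np

def regret_matching(cumulative_regrets):
    """Turn accumulated counterfactual regrets into a strategy.

    Actions with positive regret are played in proportion to that regret;
    if no action has positive regret, play uniformly at random.
    """
    positive = np.maximum(cumulative_regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(cumulative_regrets), 1.0 / len(cumulative_regrets))

# Hypothetical regrets for three actions (fold, call, raise) at one information set.
regrets = np.array([-1.0, 2.0, 6.0])
print(regret_matching(regrets))  # -> [0.   0.25 0.75]
```

In full CFR, this rule is applied at every information set on every iteration, and the average of the strategies over iterations converges toward an equilibrium strategy.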
The first AI system to outperform many top human Mahjong players, designated Suphx, was built by Microsoft Research Asia. Similarly, the DouZero AI system designed for DouDiZhu ranked first among 344 AI agents on the Botzone leaderboard.
Deep RL has demonstrated significant potential in the gaming industry, with several deep RL methods being applied to ATARI games, board games, and newer multiplayer online battle arena (MOBA) and strategy games.
Deep Q-network (DQN)
RL is known to be unstable, and can even diverge, when a nonlinear function approximator is used to estimate the action-value (Q) function. The correlations within a sequence of observations, the correlation between the temporal difference (TD) target and the Q-values, and the fact that minute changes in the Q-value function can significantly change the agent's policy all contribute to this instability.
DeepMind developed DQN, a variant of Q-learning that employs two core ideas, to address this instability. First, an experience replay mechanism removes the correlations within a sequence of observations by sampling random past observations for learning.
Second, the Q-value function is moved toward the TD target only periodically rather than after every observation, which weakens the association between the two. This is achieved by regularly copying the Q-network into a second, target Q-network that is used to estimate the TD target while the online Q-network's parameters are updated.
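A minimal sketch of these two ideas is shown below, written with PyTorch; the network size, hyperparameters, and transition format are illustrative assumptions rather than DeepMind's original ATARI configuration.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small fully connected Q-network (the ATARI version used convolutions)."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_actions))
    def forward(self, x):
        return self.net(x)

obs_dim, n_actions, gamma = 8, 4, 0.99
q_net = QNet(obs_dim, n_actions)
target_net = QNet(obs_dim, n_actions)
target_net.load_state_dict(q_net.state_dict())   # periodic copy weakens the TD-target correlation
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)                     # experience replay buffer of (s, a, r, s2, done) tuples

def train_step(batch_size=32):
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)     # random sampling breaks sequence correlations
    s, a, r, s2, done = map(torch.tensor, zip(*batch))
    s, s2 = s.float(), s2.float()
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                         # TD target comes from the frozen target network
        td_target = r.float() + gamma * (1 - done.float()) * target_net(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q, td_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In a full training loop, each environment transition would be appended to the replay buffer, and the copy into the target network would be repeated every few thousand steps.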
DQN represents states with deep convolutional neural networks, reducing a complexity that classic tabular methods cannot handle in real-world problems. DeepMind trained this DQN on ATARI games using only the video frames, the set of available moves, the rewards, and the terminal signals.
No game-specific knowledge was provided to the network. The goal was to build a single neural-network agent able to learn to play many games. Out of the seven games tested, the DeepMind Technologies agent with DQN beat the other algorithms in six and surpassed the best scores set by human players in three.
Dueling Networks
The dueling network architecture is based on the idea that estimating the value of every possible action is not always necessary, whereas determining the value of states is essential for bootstrapping methods. Google DeepMind developed the dueling network to realize this idea. Instead of the convolutional layers being followed by a single sequence of fully connected layers, two streams of fully connected layers are employed.
The streams are set up to produce separate estimates of the state-value and advantage functions. The two streams are then combined to produce a single Q-function output. Because the network's output is a Q function, the dueling network can be trained with multiple existing techniques, including SARSA and double DQN (DDQN).
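As a concrete sketch of how the two streams are combined, the module below follows the standard formulation Q(s, a) = V(s) + A(s, a) - mean(A(s, .)); the layer sizes are assumptions for illustration, not the published architecture.

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Dueling head: separate value and advantage streams combined into Q-values."""
    def __init__(self, feature_dim, n_actions, hidden=128):
        super().__init__()
        self.value_stream = nn.Sequential(nn.Linear(feature_dim, hidden), nn.ReLU(),
                                          nn.Linear(hidden, 1))
        self.advantage_stream = nn.Sequential(nn.Linear(feature_dim, hidden), nn.ReLU(),
                                              nn.Linear(hidden, n_actions))

    def forward(self, features):
        v = self.value_stream(features)              # (batch, 1) state value
        a = self.advantage_stream(features)          # (batch, n_actions) advantages
        return v + a - a.mean(dim=1, keepdim=True)   # (batch, n_actions) Q-values

# Hypothetical usage: features would come from the shared convolutional torso.
q_values = DuelingHead(feature_dim=512, n_actions=6)(torch.randn(2, 512))
```

Subtracting the mean advantage keeps the value and advantage streams identifiable, since many (V, A) pairs would otherwise produce the same Q-values.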
Using the new dueling Q-network architecture, DeepMind outperformed previous approaches on most ATARI games, with significantly higher top scores than the original DQN in Asterix and than double DQN in Atlantis.
The dueling DQN has been successful in all ATARI games except Bowling, where the top score dropped by 1.89%. In Atlantis, Space Invaders, and Tennis, the dueling DQN outperformed the ordinary DQN by over 100%.
AlphaGo
AlphaGo is an RL algorithm developed by Google DeepMind to play the board game Go. It was the first AI algorithm to defeat a professional Go player on a full-size 19 × 19 board. AlphaGo utilizes the Monte Carlo tree search (MCTS) algorithm, integrating branching ML approaches with rigorous training on human and computer games. In addition, it uses deep learning networks that receive a description of the game board as input at every state and pass it through several layers.
The policy network then selects the next best action for the computer player, while the value network estimates the value of the current state. AlphaGo Zero, the more recent development of AlphaGo, accumulated the equivalent of thousands of years of human experience in a few days, ultimately surpassing AlphaGo, until then the best Go player in the world.
AlphaGo Zero learns by playing against itself, whereas AlphaGo learned to play professionally after thousands of games against novice and professional players. As a result, AlphaGo Zero quickly surpassed all of its earlier versions.
At the start, the single neural network knows nothing about Go. The algorithm then plays games against itself, combining the deep neural network (DNN) with an efficient tree-search algorithm, and the network is updated and tuned to evaluate moves.
The updated network is then recombined with the same search algorithm, producing a stronger version of Zero. The whole process iterates many times, gradually enhancing system performance with each repetition. Moreover, Zero uses a single neural network that integrates the logic of the two separate policy and value networks present in the initial implementations. These changes improved Zero's performance in terms of both computing requirements and algorithmic strength. AlphaGo Zero requires only four TPUs, making it the most power-efficient version of the system.
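As a rough sketch of such a merged design, the network below shares a convolutional torso between a policy head (a distribution over board moves plus a pass move) and a value head (an estimate of the game outcome). The three-plane board encoding and all layer sizes are illustrative assumptions, not AlphaGo Zero's published architecture.

```python
import torch
import torch.nn as nn

class PolicyValueNet(nn.Module):
    """Single network with a shared torso and separate policy and value heads."""
    def __init__(self, board_size=19, channels=32):
        super().__init__()
        self.torso = nn.Sequential(
            nn.Conv2d(3, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        n_cells = board_size * board_size
        # Policy head: logits over every board point plus one pass move.
        self.policy_head = nn.Sequential(nn.Flatten(),
                                         nn.Linear(channels * n_cells, n_cells + 1))
        # Value head: predicted game outcome in [-1, 1] from the current player's view.
        self.value_head = nn.Sequential(nn.Flatten(),
                                        nn.Linear(channels * n_cells, 1), nn.Tanh())

    def forward(self, board):
        features = self.torso(board)
        return self.policy_head(features), self.value_head(features)

# Assumed 3-plane encoding: own stones, opponent stones, player to move.
logits, value = PolicyValueNet()(torch.zeros(1, 3, 19, 19))
```

During self-play, the search uses the policy logits to bias move selection and the value output in place of random rollouts, and the network is then trained toward the search's visit counts and the final game result.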
OpenAI Five
Dota 2 is a highly challenging MOBA game because of the many calculations required, the large number of moves available to a player, and the several goals pursued during a match. In particular, players can only see specific areas of the world, which makes the environment partially observable. This nuanced style of play imposes a steep learning curve on any system. OpenAI Five recently overcame this challenge by winning the OpenAI Five Finals in a match of five agents against the world champions.
Five consisted of five neural networks that observe the game environment as a list of 20,000 numbers encoding the visible field of play and act by choosing moves from an 8-number list. Each of the team's five neural networks is a single long short-term memory (LSTM) network with 1,024 units that receives the game state through a Bot application programming interface (API) and issues moves with semantic meaning.
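A rough sketch of such an LSTM policy core is given below, using the observation size, hidden size, and move-list length described above; the flat observation encoding and single move head are simplifying assumptions, as the real system's interface and action space are far more elaborate.

```python
import torch
import torch.nn as nn

class LSTMPolicy(nn.Module):
    """LSTM policy core: encodes a flat observation vector, carries recurrent
    state across timesteps, and emits logits over a small set of moves."""
    def __init__(self, obs_dim=20_000, hidden=1024, n_moves=8):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.move_head = nn.Linear(hidden, n_moves)

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, time, obs_dim) observation vectors from the game API
        x = torch.relu(self.encoder(obs_seq))
        out, state = self.lstm(x, state)
        return self.move_head(out), state   # per-timestep move logits plus carried LSTM state

# Hypothetical single step for one of the five agents.
policy = LSTMPolicy()
logits, state = policy(torch.zeros(1, 1, 20_000))
action = torch.distributions.Categorical(logits=logits[:, -1]).sample()
```

The carried LSTM state is what lets each agent keep a memory of the partially observable match between timesteps.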
Each neural network makes its decisions independently, based on the current match goal and the assistance required by a teammate agent. With sufficient reward shaping, OpenAI Five successfully handled situations that were previously considered risky.
Additionally, after many days of practice, spending 80% of the time playing against itself and 20% against previous models, the system also adopted advanced gaming practices, such as stealing vital items from the opponent and exerting team pressure on the opponent's area.
AlphaStar
In 2019, Google's DeepMind introduced AlphaStar, the first AI to defeat one of the best human StarCraft II players in a series of test matches held under professional match conditions. The system combines DNNs, RL, and deep learning techniques and takes raw game data from the StarCraft II environment as input. The data are interpreted as a list of available properties and units, and the output is a collection of commands comprising the action performed at each time step.
DeepMind designated this architecture a transformer, built on attention mechanisms and combined with convolutional and recurrent neural networks. The transformer body employs a deep LSTM core, a pointer network, and a centralized value estimate.
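To illustrate the attention mechanism underlying such a transformer body, the sketch below applies self-attention over a variable-size set of game entities (units) and pools them into a summary vector for downstream heads. The entity features, dimensions, and pooling choice are assumptions for illustration, not AlphaStar's actual interface.

```python
import torch
import torch.nn as nn

class EntityEncoder(nn.Module):
    """Self-attention over a set of game entities, followed by mean pooling."""
    def __init__(self, entity_dim=64, model_dim=128, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(entity_dim, model_dim)
        self.attention = nn.MultiheadAttention(model_dim, n_heads, batch_first=True)

    def forward(self, entities):
        # entities: (batch, n_entities, entity_dim) feature vectors, one per visible unit
        x = self.embed(entities)
        attended, _ = self.attention(x, x, x)   # each unit attends to every other unit
        return attended.mean(dim=1)             # (batch, model_dim) summary for a recurrent core

# Hypothetical observation with 50 visible units.
summary = EntityEncoder()(torch.randn(1, 50, 64))
```

Self-attention suits this setting because the number of visible units changes constantly, and the relationships between units matter more than their order in the observation.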
In conclusion, RL and deep RL are revolutionizing the game industry, creating incredibly skilled AI players. However, ethical considerations, computational costs, and explainability must be taken into account when implementing AI algorithms in this field.