In an article published in the journal Information Sciences, researchers explored the transformative potential of cooperative DRL and its synergy with a Shapley value reward system to improve traffic signal management, reducing congestion and enhancing traffic flow efficiency.
Urban traffic congestion remains a pressing challenge, impacting daily commutes, environmental sustainability, and overall urban productivity. Traditional traffic signal control methods have shown limitations in adapting to the complex and dynamic nature of traffic patterns. Recently, artificial intelligence and machine learning have emerged as potential game-changers. In particular, deep reinforcement learning (DRL) and the concept of cooperation between traffic intersections have gained prominence.
Cooperative deep reinforcement learning
Deep reinforcement learning (DRL) is an area of machine learning that holds immense promise in solving intricate challenges. In the context of traffic signal control, intersections can be likened to intelligent agents. Each agent learns how to time traffic signals by interacting with the traffic environment, adapting its actions to maximize a predefined reward, typically reduced travel times and minimized congestion. However, the real power of DRL emerges when these agents collaborate. Cooperative DRL introduces the concept of communication and collaboration between agents. By sharing information, such as queue lengths and vehicle counts, agents can collectively work to optimize traffic flow. This approach proves particularly effective in urban scenarios where intersections are interconnected, such as traffic grids in cities.
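The idea above can be made concrete with a minimal sketch. This is not the authors' implementation: it uses simple tabular Q-learning rather than deep networks, and the state and action encodings (queue lengths, a binary keep/switch phase action) are illustrative assumptions. The cooperative element is that each agent's state includes its neighbours' queue lengths, so its decisions account for upstream and downstream congestion.

```python
import random
from collections import defaultdict

class IntersectionAgent:
    """Minimal Q-learning agent for one intersection (hypothetical encoding)."""
    def __init__(self, actions=(0, 1), alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)   # (state, action) -> estimated value
        self.actions = actions        # 0 = keep current phase, 1 = switch phase
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy action selection.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def learn(self, state, action, reward, next_state):
        # Standard Q-learning temporal-difference update.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])

# Cooperation: an agent's state combines its own queue with its
# neighbours' shared queue lengths.
def joint_state(own_queue, neighbour_queues):
    return (own_queue,) + tuple(neighbour_queues)

agent = IntersectionAgent()
s = joint_state(own_queue=5, neighbour_queues=[3, 7])
a = agent.act(s)
# Negative reward = vehicles still queued after this step (a common choice).
agent.learn(s, a, reward=-5, next_state=joint_state(4, [3, 6]))
```

In a deep variant, the `defaultdict` Q-table would be replaced by a neural network that maps the joint state to action values, but the information-sharing pattern stays the same.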
Shapley value reward system
Cooperation among agents necessitates a well-designed and equitable reward system. This is where the Shapley value comes into play. The Shapley value is a concept from cooperative game theory that fairly distributes the contribution of each agent in a cooperative setting. In the context of traffic signal control, the Shapley value assigns a reward to each intersection based on its individual contribution to reducing traffic congestion. This elegant mechanism encourages intersections to work together harmoniously, as their actions directly influence the collective goal of optimizing traffic flow.
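The Shapley value can be computed exactly for small groups by averaging each agent's marginal contribution over every possible joining order. The sketch below does this for three hypothetical intersections; the coalition "values" (congestion reduction in vehicles per hour) are made-up numbers for illustration, not results from the paper.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over all orderings in which the coalition could form."""
    contrib = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            contrib[p] += value(frozenset(coalition)) - before
    return {p: c / len(perms) for p, c in contrib.items()}

# Hypothetical congestion reduction achieved by each coalition of
# intersections A, B, C (note AB together beat the sum of A and B alone).
reductions = {
    frozenset(): 0, frozenset("A"): 10, frozenset("B"): 20, frozenset("C"): 5,
    frozenset("AB"): 40, frozenset("AC"): 18, frozenset("BC"): 28,
    frozenset("ABC"): 50,
}
rewards = shapley_values("ABC", reductions.__getitem__)
```

A useful sanity check is the efficiency property: the individual Shapley rewards always sum to the value of the full coalition (here, 50), so no contribution is double-counted or lost. This is exactly what makes the reward split feel fair to each agent.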
Putting theory into practice
Implementing cooperative DRL with the Shapley value reward system involves several pivotal steps:
Problem Framing: Intersections are considered intelligent agents responsible for traffic signal control. The primary objective is to enhance collaboration among these agents to achieve optimal traffic flow.
Cooperative Learning: Agents engage in communication, sharing their localized observations and relevant information with neighboring intersections. This facilitates joint decision-making, thereby improving traffic signal synchronization.
Optimized Learning: Agents learn from their experiences through deep neural networks. To ensure stable and effective learning, outdated experiences are eliminated using the Kullback-Leibler divergence technique.
Shapley Value Reward: The Shapley value reward system calculates rewards for each agent based on their contributions toward mitigating traffic congestion. By doing so, the system encourages intersections to harmonize their actions and work collectively to enhance traffic flow.
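The experience-filtering step above can be sketched in miniature. One plausible reading (an assumption, not the paper's exact mechanism) is that a stored experience is considered "outdated" when the policy that generated it has drifted too far, in KL divergence, from the current policy for that state:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete action distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def filter_replay(buffer, current_policy, threshold=0.5):
    """Keep only experiences whose recorded behaviour policy is still
    close (in KL divergence) to the current policy for that state."""
    kept = []
    for state, action, reward, next_state, old_policy in buffer:
        if kl_divergence(old_policy, current_policy(state)) <= threshold:
            kept.append((state, action, reward, next_state, old_policy))
    return kept

# Hypothetical two-entry replay buffer: each entry stores the action
# distribution the agent followed when the experience was recorded.
buffer = [
    ("s0", 1, -3.0, "s1", [0.5, 0.5]),    # still close to current policy
    ("s2", 0, -1.0, "s3", [0.99, 0.01]),  # recorded under a stale policy
]
current = lambda s: [0.5, 0.5]
fresh = filter_replay(buffer, current)  # only the first experience survives
```

Discarding drifted experiences keeps the training distribution close to what the current policy would actually encounter, which is what gives the learning process its stability under changing traffic conditions.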
Results and comparison
To validate the effectiveness of the proposed approach, a series of experiments was conducted using both simulated and real-world traffic datasets. The outcomes demonstrate the substantial benefits of cooperative DRL with Shapley value rewards compared to conventional fixed-time signal control methods:
Remarkable congestion reduction: The approach significantly reduces average travel times, thereby alleviating congestion and enhancing overall traffic flow efficiency.
Strengthened collaborative efforts: Cooperative DRL fosters deeper collaboration among traffic intersections, leading to smoother traffic flow and reduced bottlenecks.
Consistency and stability: Combining the optimized loss function and the Shapley value reward system ensures stable learning, even in the face of dynamically changing traffic conditions.
Outperforming conventional approaches: The cooperative approach consistently outperforms traditional methods across diverse traffic grid sizes and complexities.
Conclusion
Traffic congestion continues to be a persistent urban challenge, but the convergence of cooperative DRL and the Shapley value reward system offers a promising avenue for resolution. By empowering intersections to communicate and collaborate effectively, this innovative approach paves the way for traffic signal control systems optimized for all road users' benefit. As cooperative DRL evolves and matures, it presents a powerful solution for enhancing traffic flow, reducing congestion, and elevating the overall urban commuting experience. As urban populations continue to grow, the potential impact of this synergy becomes increasingly profound, suggesting a brighter and less congested future for cities.
Journal reference:
- Liu, J., Qin, S., Su, M., Luo, Y., Wang, Y., & Yang, S. (2023). Multiple Intersections Traffic Signal Control based on Cooperative Multi-agent Reinforcement Learning. Information Sciences. https://doi.org/10.1016/j.ins.2023.119484