In an article published in the journal Nature, researchers presented an enhanced method for traffic signal control using deep reinforcement learning (DRL). They addressed slow convergence and robustness issues in DRL by incorporating dueling networks, double Q-learning, prioritized experience replay, and noisy network parameters (PN_D3QN).
The approach processed high-dimensional traffic data with a realistic reward function, achieving faster convergence, shorter queue lengths and waiting times, and robust performance across varied traffic conditions.
Background
The global increase in private car usage has led to frequent traffic congestion, contributing significantly to greenhouse gas emissions and economic disruptions. Traditional traffic signal control (TSC) methods, such as fixed-time control and induction control, cannot adapt well to dynamic traffic conditions. Adaptive TSC (ATSC) methods, such as the split cycle offset optimization technique (SCOOT) and traffic-responsive urban control (TUC), dynamically adjust signal timing but still face limitations.
Reinforcement learning (RL) offers promise for real-time adaptive TSC, with DRL combining deep learning's hierarchical data abstraction with RL's adaptive strategy adjustment. Despite notable successes, DRL suffers from inefficient training-sample selection and slow convergence, and its models need greater robustness to varying traffic conditions.
This paper proposed a comprehensive TSC model, PN_D3QN, integrating dueling networks, double Q-learning, prioritized experience replay, and noise injection. It introduced a phase-cycle action space and a realistic reward function. The model's effectiveness and robustness were validated across various traffic scenarios, addressing previous gaps in training efficiency and adaptability.
Preliminary Analysis and Problem Formulation
The researchers focused on TSC at urban intersections, particularly four-leg intersections, using DRL. The TSC problem was modeled as a Markov decision process (MDP) involving states, actions, transition probabilities, rewards, and a discount factor. The neural network-based agent learned to adaptively select optimal actions to manage traffic congestion.
Traditional RL methods using Q-tables were limited by large state spaces, so DRL employed neural networks to approximate optimal action-value functions, allowing for more flexible and efficient handling of complex traffic environments.
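As a rough illustration of this idea (not the authors' exact architecture), a small convolutional network can stand in for the Q-table, mapping an encoded intersection state directly to one Q-value per candidate signal action; the layer sizes below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Approximates Q(s, a): maps a stacked position/velocity state tensor
    to one Q-value per signal action. Layer sizes are illustrative only."""
    def __init__(self, in_channels: int, n_actions: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
        )
        self.head = nn.Linear(16 * 4 * 4, n_actions)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(state))  # shape: (batch, n_actions)
```

Given such an approximator, the agent selects the action with the highest predicted Q-value for the current state, without ever enumerating the state space.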
TSC Model
The TSC model utilized discrete state encoding techniques to define states based on vehicle positions, velocities, and signal phases, employing convolutional networks for feature extraction. Traffic data was acquired through loop and video detectors, as well as autonomous vehicles acting as mobile detectors.
The state space included position and velocity matrices for all lanes, while the action space defined green light durations for current phases to ensure safety and efficiency. The reward function evaluated agent performance by measuring queue length differences between time steps, guiding the agent to alleviate traffic congestion effectively.
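A minimal sketch of how such a state and reward could be assembled is shown below; the discretization of each lane into fixed-length cells and the speed normalization are assumptions for illustration, not the paper's exact encoding.

```python
import numpy as np

def encode_state(vehicles, n_lanes, n_cells, cell_len, max_speed):
    """Build position/velocity matrices from per-vehicle (lane index,
    distance to stop line, speed) tuples reported by the detectors."""
    position = np.zeros((n_lanes, n_cells), dtype=np.float32)
    velocity = np.zeros((n_lanes, n_cells), dtype=np.float32)
    for lane, dist, speed in vehicles:
        cell = min(int(dist // cell_len), n_cells - 1)  # which cell the vehicle occupies
        position[lane, cell] = 1.0                      # occupancy flag
        velocity[lane, cell] = speed / max_speed        # normalized speed
    return np.stack([position, velocity])               # shape: (2, n_lanes, n_cells)

def reward(prev_queue_length, curr_queue_length):
    """Positive when the total queue shrinks between consecutive time steps."""
    return prev_queue_length - curr_queue_length
```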
Algorithm and Model Training
The dueling network and double Q-learning algorithm was an enhanced version of the deep Q-network (DQN). It modified the network structure by splitting the final fully connected layer into two streams, decomposing the Q-value into a state-value function and an advantage function, which provided more accurate action-value estimates. The double Q-learning component used the main network to select the optimal action and a target network to evaluate its Q-value, which helped reduce overestimation bias.
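A hedged PyTorch sketch of both ideas follows: a dueling head that recombines the value and advantage streams into Q-values, and a double Q-learning target in which the main network selects the action and the target network evaluates it. The hidden width and discount factor are placeholders.

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Splits the final fully connected stage into a state-value stream V(s)
    and an advantage stream A(s, a), then recombines them into Q(s, a)."""
    def __init__(self, in_features: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.value = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 1))
        self.advantage = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU(),
                                       nn.Linear(hidden, n_actions))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        v, a = self.value(x), self.advantage(x)
        return v + a - a.mean(dim=1, keepdim=True)  # subtract mean advantage for identifiability

def double_q_target(rewards, next_features, dones, main_head, target_head, gamma=0.99):
    """Main network selects the best next action; target network evaluates it."""
    with torch.no_grad():
        best_action = main_head(next_features).argmax(dim=1, keepdim=True)
        next_q = target_head(next_features).gather(1, best_action).squeeze(1)
        return rewards + gamma * next_q * (1.0 - dones)
```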
To expedite training, the prioritized experience replay (PER) mechanism assigned higher sampling probabilities to more informative samples, improving convergence speed. Additionally, injecting noise into the network parameters (their means and standard deviations) increased the model's robustness by enabling it to better adapt to variations in traffic conditions.
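The sketch below illustrates both mechanisms under common assumptions: proportional prioritization with typical exponents (alpha for priority scaling, beta for importance-sampling correction) and a simple per-weight noise draw from learnable mean and standard deviation parameters; none of these values are taken from the paper.

```python
import numpy as np

def sample_prioritized(priorities, batch_size, alpha=0.6, beta=0.4):
    """Sample indices with probability proportional to priority**alpha and
    return normalized importance-sampling weights for the loss."""
    scaled = np.asarray(priorities, dtype=np.float64) ** alpha
    probs = scaled / scaled.sum()
    idx = np.random.choice(len(probs), size=batch_size, p=probs)
    weights = (len(probs) * probs[idx]) ** (-beta)
    return idx, weights / weights.max()

def noisy_weight(mu, sigma):
    """Draw a perturbed weight: learnable mean plus learnable std times Gaussian noise."""
    return mu + sigma * np.random.randn(*np.shape(mu))
```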
The training process involved using a convolutional network to extract vehicle state information and a fully connected network to output Q values. These values guided the agent in selecting actions to control traffic signals. Experience samples were stored and prioritized for training, gradually updating the network parameters to approximate the optimal action-value function. This process ultimately helped the agent learn a policy that maximized expected rewards.
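Putting the pieces together, one gradient update might look roughly like the sketch below, where the networks, optimizer, and prioritized batch are assumed to come from components like those above; the absolute TD errors returned would be used to refresh the replayed samples' priorities.

```python
import torch

def train_step(main_net, target_net, optimizer, batch, weights, gamma=0.99):
    """One update: compute a double-Q target, weight the TD loss by the
    importance-sampling corrections, and back-propagate through main_net."""
    states, actions, rewards, next_states, dones = batch
    q = main_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        best = main_net(next_states).argmax(dim=1, keepdim=True)     # double Q: select
        next_q = target_net(next_states).gather(1, best).squeeze(1)  # double Q: evaluate
        target = rewards + gamma * next_q * (1.0 - dones)
    td_error = target - q
    loss = (weights * td_error.pow(2)).mean()  # PER importance-weighted loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return td_error.detach().abs()             # new priorities for the sampled transitions
```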
Experimental Setup and Evaluation Analysis
The experimental setup used the simulation of urban mobility (SUMO) platform with the PyTorch framework to evaluate the PN_D3QN algorithm for TSC. Vehicles were randomly generated, and key metrics such as cumulative reward, average waiting time, and average queue length were used to assess performance. The PN_D3QN algorithm was compared against fixed-time control (FTC), max-pressure (MP) control, and the dueling double deep Q-network (D3QN).
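For context, the evaluation metrics named above can be logged from SUMO through its TraCI Python interface roughly as follows; the configuration path, lane IDs, and episode length are placeholders rather than the paper's settings.

```python
import traci  # SUMO's TraCI Python interface

def evaluate_episode(sumo_cfg, incoming_lanes, steps=3600):
    """Run one SUMO episode and record average queue length and waiting time."""
    traci.start(["sumo", "-c", sumo_cfg])
    queue_samples, wait_samples = [], []
    for _ in range(steps):
        traci.simulationStep()
        queue_samples.append(sum(traci.lane.getLastStepHaltingNumber(l) for l in incoming_lanes))
        vehicles = traci.vehicle.getIDList()
        if vehicles:
            wait_samples.append(sum(traci.vehicle.getWaitingTime(v) for v in vehicles) / len(vehicles))
    traci.close()
    avg_queue = sum(queue_samples) / len(queue_samples)
    avg_wait = sum(wait_samples) / max(len(wait_samples), 1)
    return avg_queue, avg_wait
```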
Results showed that PN_D3QN outperformed other methods, demonstrating faster learning, quicker convergence, and higher stability. PN_D3QN's optimizations, including noise networks and PER, enabled it to achieve better performance in reducing average waiting time and queue length.
In tests across different traffic scenarios, PN_D3QN consistently performed best, particularly under high-density and complex traffic conditions, where it significantly reduced waiting times and queue lengths compared with D3QN, MP, and FTC. These results highlight the effectiveness and robustness of the PN_D3QN algorithm in dynamic traffic environments.
Conclusion
The PN_D3QN method enhanced TSC using advanced DRL techniques, achieving faster convergence and robust performance across diverse traffic conditions. By integrating dueling networks, double Q-learning, prioritized experience replay, and noisy network parameters, it effectively reduced queue lengths and waiting times, demonstrating superior efficiency compared with traditional methods such as FTC and MP. Future research should address initial training challenges and extend the approach to multi-agent signal control for broader applicability in real-world scenarios.