Innovative DRL Approach to Alleviate Traffic Congestion

In an article published in the journal Nature, researchers presented an enhanced method for traffic signal control using deep reinforcement learning (DRL). They addressed slow convergence and robustness issues in DRL by incorporating dueling networks, double Q-learning, prioritized sampling, and noise parameters (PN_D3QN).

Study: Innovative DRL Approach to Alleviate Traffic Congestion. Image Credit: ddisq/Shutterstock

The approach processed high-dimensional traffic data and used realistic reward functions, achieving faster convergence and improved performance in reducing queue lengths and waiting times, and demonstrating robustness across various traffic conditions.

Background

The global increase in private car usage has led to frequent traffic congestion, contributing significantly to greenhouse gas emissions and economic disruptions. Traditional traffic signal control (TSC) methods like fixed-time control and induction control are limited in addressing dynamic traffic conditions. Adaptive TSC (ATSC) methods, such as split cycle offset optimization technique (SCOOT) and traffic-responsive urban control (TUC), dynamically adjust signal timing but still face limitations.

Reinforcement learning (RL) offers promise for real-time adaptive TSC, with DRL combining deep learning's hierarchical data abstraction with RL's adaptive strategy adjustments. Despite these successes, DRL suffers from inefficient training sample selection and slow convergence, and its models need improved robustness across varying traffic conditions.

This paper proposed a comprehensive TSC model, PN_D3QN, integrating dueling networks, double Q-learning, prioritized experience replay, and noise injection. It introduced a phase-cycle action space and a realistic reward function. The model's effectiveness and robustness were validated across various traffic scenarios, addressing previous gaps in training efficiency and adaptability.

Preliminary Analysis and Problem Formulation

The researchers focused on TSC at urban intersections, particularly four-leg intersections, using DRL. The TSC problem was modeled as a Markov decision process (MDP) involving states, actions, transition probabilities, rewards, and a discount factor. The neural network-based agent learned to adaptively select optimal actions to manage traffic congestion.

Traditional RL methods using Q-tables were limited by large state spaces, so DRL employed neural networks to approximate optimal action-value functions, allowing for more flexible and efficient handling of complex traffic environments.
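For context, the sketch below gives the standard action-value formulation that such a neural approximation targets, written in common DQN notation rather than the paper's exact symbols.

```latex
% Optimal action-value function (Bellman optimality equation)
Q^{*}(s, a) = \mathbb{E}\!\left[\, r_{t} + \gamma \max_{a'} Q^{*}(s_{t+1}, a') \;\middle|\; s_{t} = s,\ a_{t} = a \,\right]

% A deep Q-network approximates Q^{*} with parameters \theta, trained to minimize
L(\theta) = \mathbb{E}_{(s,a,r,s')}\!\left[ \big( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \big)^{2} \right]
```

Here, θ⁻ denotes the parameters of a periodically synchronized target network.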

TSC Model

The TSC model utilized discrete state encoding techniques to define states based on vehicle positions, velocities, and signal phases, employing convolutional networks for feature extraction. Traffic data was acquired through loop and video detectors, as well as autonomous vehicles acting as mobile detectors.

The state space included position and velocity matrices for all lanes, while the action space defined green light durations for current phases to ensure safety and efficiency. The reward function evaluated agent performance by measuring queue length differences between time steps, guiding the agent to alleviate traffic congestion effectively.
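To illustrate how such a state tensor and reward could be assembled, here is a minimal sketch assuming a fixed cell discretization of each approach lane; the cell size, matrix dimensions, and helper names are illustrative assumptions, not the authors' code.

```python
import numpy as np

CELL_LENGTH = 7.0   # assumed cell size (m) for discretizing each lane
NUM_CELLS = 20      # assumed number of cells per approach lane


def encode_state(vehicles, num_lanes):
    """Build position/velocity matrices from (lane_idx, dist_to_stopline, speed) tuples."""
    position = np.zeros((num_lanes, NUM_CELLS), dtype=np.float32)
    velocity = np.zeros((num_lanes, NUM_CELLS), dtype=np.float32)
    for lane_idx, dist, speed in vehicles:
        cell = min(int(dist // CELL_LENGTH), NUM_CELLS - 1)
        position[lane_idx, cell] = 1.0      # cell occupied by a vehicle
        velocity[lane_idx, cell] = speed    # speed of the occupying vehicle
    return np.stack([position, velocity])   # shape: (2, num_lanes, NUM_CELLS)


def reward(prev_queue_length, curr_queue_length):
    """Positive reward when the total queue shrinks between time steps."""
    return prev_queue_length - curr_queue_length
```

Under this sign convention, a positive reward indicates that the total queue shrank over the last control interval, which is one plausible reading of the queue-length-difference reward described above.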

Algorithm and Model Training

The dueling double deep Q-network (D3QN) algorithm was an enhanced version of the deep Q-network (DQN). It improved the neural network structure by splitting the final fully connected layer into two streams, decomposing the Q-value into a state-value function and an advantage function, which provided more accurate action-value estimates. The double Q-learning component used the main network to select the optimal action and a target network to evaluate its Q-value, reducing overestimation bias.
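The two mechanisms described above correspond to the standard dueling and double Q-learning formulations, summarized here in common notation (the paper's exact symbols may differ).

```latex
% Dueling decomposition: Q recomposed from a state value V and a mean-centered advantage A
Q(s, a; \theta) = V(s; \theta_{V}) + \Big( A(s, a; \theta_{A}) - \tfrac{1}{|\mathcal{A}|} \sum_{a'} A(s, a'; \theta_{A}) \Big)

% Double Q-learning target: the main network selects the action, the target network evaluates it
y_{t} = r_{t} + \gamma \, Q\big(s_{t+1}, \arg\max_{a'} Q(s_{t+1}, a'; \theta);\ \theta^{-}\big)
```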

To expedite training, the prioritized experience replay (PER) mechanism assigned higher sampling probabilities to more important samples, enhancing convergence speed. Additionally, injecting noise into the network parameters (their means and standard deviations) increased the model's robustness by enabling it to better adapt to variations in traffic conditions.
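These two ingredients are typically implemented with the standard formulations below, drawn from the prioritized experience replay and noisy-network literature that PN_D3QN builds on; the specific hyperparameter values used in the paper are not reproduced here.

```latex
% Prioritized experience replay: sampling probability from a TD-error-based priority p_i
P(i) = \frac{p_{i}^{\alpha}}{\sum_{k} p_{k}^{\alpha}}, \qquad p_{i} = |\delta_{i}| + \epsilon

% Importance-sampling weight correcting the bias of non-uniform sampling
w_{i} = \left( \frac{1}{N \cdot P(i)} \right)^{\beta}

% Noisy layer: each weight and bias has a learnable mean and standard deviation plus sampled noise
w = \mu_{w} + \sigma_{w} \odot \varepsilon_{w}, \qquad b = \mu_{b} + \sigma_{b} \odot \varepsilon_{b}
```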

The training process involved using a convolutional network to extract vehicle state information and a fully connected network to output Q values. These values guided the agent in selecting actions to control traffic signals. Experience samples were stored and prioritized for training, gradually updating the network parameters to approximate the optimal action-value function. This process ultimately helped the agent learn a policy that maximized expected rewards.
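A hedged outline of one training update consistent with this description is sketched below in PyTorch; the replay-buffer interface, network classes, and hyperparameters are illustrative assumptions rather than the authors' implementation.

```python
import torch

GAMMA = 0.99  # assumed discount factor


def train_step(main_net, target_net, optimizer, buffer, batch_size=32):
    """One prioritized-replay update with a double Q-learning target."""
    # buffer.sample and buffer.update_priorities are a hypothetical PER interface
    (states, actions, rewards, next_states, dones), indices, weights = buffer.sample(batch_size)

    q_values = main_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        best_actions = main_net(next_states).argmax(dim=1, keepdim=True)      # main net selects
        next_q = target_net(next_states).gather(1, best_actions).squeeze(1)   # target net evaluates
        targets = rewards + GAMMA * next_q * (1.0 - dones)

    td_errors = targets - q_values
    loss = (weights * td_errors.pow(2)).mean()   # importance-sampling weighted TD loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    buffer.update_priorities(indices, td_errors.abs().detach())  # refresh sample priorities
    return loss.item()
```

In practice the target network's parameters are synchronized with the main network at fixed intervals, which is the standard way to keep the double Q-learning evaluation stable.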

Experimental Setup and Evaluation Analysis

The experimental setup used the Simulation of Urban Mobility (SUMO) traffic simulation platform with the PyTorch framework to evaluate the PN_D3QN algorithm for TSC. Vehicles were randomly generated, and key metrics such as cumulative reward, average waiting time, and average queue length were used to assess performance. The PN_D3QN algorithm was compared against fixed-time control (FTC), max-pressure (MP) control, and the dueling double deep Q-network (D3QN).
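For readers unfamiliar with SUMO, a minimal sketch of how an agent can drive such an evaluation through SUMO's TraCI Python API is shown below; the configuration file, lane IDs, traffic-light ID, and agent interface are placeholders, and the control logic is simplified relative to the paper.

```python
import traci  # SUMO's Python control interface (requires a local SUMO installation)

SUMO_CMD = ["sumo", "-c", "intersection.sumocfg"]             # placeholder config file
TLS_ID = "center"                                             # placeholder traffic-light ID
INCOMING_LANES = ["north_0", "south_0", "east_0", "west_0"]   # placeholder lane IDs


def total_queue_length():
    """Number of halted vehicles across all incoming lanes."""
    return sum(traci.lane.getLastStepHaltingNumber(lane) for lane in INCOMING_LANES)


def run_episode(agent, max_steps=3600):
    """Run one simulated hour, letting the agent pick signal phases each step."""
    traci.start(SUMO_CMD)
    prev_queue = 0
    try:
        for _ in range(max_steps):
            phase = agent.select_phase()                 # hypothetical agent API
            traci.trafficlight.setPhase(TLS_ID, phase)
            traci.simulationStep()
            queue = total_queue_length()
            agent.observe(reward=prev_queue - queue)     # queue-difference reward signal
            prev_queue = queue
    finally:
        traci.close()
```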

Results showed that PN_D3QN outperformed the other methods, demonstrating faster learning, quicker convergence, and greater stability. Its optimizations, including noisy networks and PER, enabled it to reduce average waiting time and queue length more effectively.

In tests with different traffic scenarios, PN_D3QN consistently performed better, particularly in high-density and complex traffic conditions, where it significantly reduced waiting times and queue lengths compared to D3QN, MP, and FTC. These results highlight the effectiveness and robustness of the PN_D3QN algorithm in dynamic traffic environments.

Conclusion

In conclusion, the PN_D3QN method enhanced TSC using advanced DRL techniques, achieving faster convergence and robust performance across diverse traffic conditions. By integrating dueling networks, double Q-learning, prioritized experience replay, and noise parameters, it effectively reduced queue lengths and waiting times, demonstrating superior efficiency compared to traditional methods like FTC and MP. Future research should address initial training challenges and expand to multi-agent signal control for broader applicability in real-world scenarios.


Written by

Soham Nandi

Soham Nandi is a technical writer based in Memari, India. His academic background is in Computer Science Engineering, specializing in Artificial Intelligence and Machine learning. He has extensive experience in Data Analytics, Machine Learning, and Python. He has worked on group projects that required the implementation of Computer Vision, Image Classification, and App Development.

