In an article published in the journal Nature, researchers introduced a novel approach to address beam management challenges in vehicle-to-vehicle (V2V) communication scenarios, leveraging a deep reinforcement learning (DRL) framework based on real-world traffic flow datasets. The proposed method aimed to enhance spectral efficiency and reliability in intelligent connected vehicles, crucial for developing smart cities and intelligent transportation systems.
Background
The growing importance of intelligent connected vehicles in smart cities and transportation systems has propelled research into optimizing vehicle-to-everything (V2X) communication. The 5G New Radio (NR) Frequency Range 2 (FR2) frequency band, recommended by the 3rd Generation Partnership Project (3GPP), poses challenges such as high path loss and mobility issues in V2V scenarios. While existing research has explored various beam management methods, they often overlook the impact of vehicle mobility and lack comprehensive traffic data distribution analysis.
This paper addressed these gaps by proposing a DRL-assisted intelligent beam management method for V2V communication. The method utilized a traffic flow dataset-based DRL framework, carefully structuring states, actions, and rewards to improve algorithm effectiveness. Statistical analysis revealed high self-similarity in the temporal dimension of traffic flow data, prompting the introduction of a Recurrent Neural Network (RNN) structure to the DRL framework to address this self-similarity and enhance network performance.
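To make the notion of temporal dependence concrete, the sketch below computes a lag-k sample autocorrelation over a traffic count series. This is only one simple proxy for self-similarity; the article does not say which statistic the authors used, and the function name and the toy series are illustrative assumptions.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive integer `lag`.

    Hypothetical helper: values near 1 mean the series closely repeats
    itself after `lag` steps, which is the kind of temporal structure a
    recurrent network can exploit.
    """
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    mean = sum(series) / n
    variance_sum = sum((x - mean) ** 2 for x in series)
    if variance_sum == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    covariance_sum = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance_sum / variance_sum


# Toy vehicle counts per interval with a period of four intervals (made-up data).
counts = [1, 2, 3, 4] * 5
in_phase = autocorrelation(counts, 4)      # strongly positive
out_of_phase = autocorrelation(counts, 2)  # negative
```

A series whose autocorrelation stays high at lags matching its period is a reasonable candidate for sequence models such as the RNN described above.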
Furthermore, the paper adopted the Twin Delayed Deep Deterministic Policy Gradient (TD3) model, finding it better suited to V2V scenarios than existing models. The resulting framework, ITD3 with an RNN, optimized spectral efficiency while meeting communication latency and reliability requirements.
Network architecture
This study focused on a V2V network simulated using AnyLogic 8.8, emphasizing mobility patterns in a typical highway scenario. With DRL handling beam management, the network operated in time slots, and vehicles determined their beam patterns at the start of each frame. V2V communication comprised two main phases: beam alignment, and beam tracking with data transmission. The beam alignment phase established the initial connection, while the beam tracking phase maintained link quality. Unpredictable vehicle movements, modeled as small location errors, could require the beam alignment phase to be repeated. The study introduced a DRL-assisted beam tracking method to address challenges such as uplink transmission failures and blockages by other vehicles in the mmWave frequency band.
Performance evaluation
Researchers addressed the beam management process during the beam alignment and tracking phases in V2V communication. The study employed a DRL approach for optimal beam pattern selection, focusing on the 5G NR FR2 frequency band. It introduced a codebook structure for beamforming and defined the Signal to Interference plus Noise Ratio (SINR) as the metric for link quality. The codebook, comprising multi-level codewords, allowed for adaptive beam pattern adjustments. The problem was formulated as a Markov decision process (MDP), and a DRL-based method was proposed to optimize spectral efficiency by selecting suitable beam patterns under dynamic channel conditions.
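The relationship between SINR and spectral efficiency can be illustrated with a short sketch. It assumes the standard Shannon expression log2(1 + SINR) and a greedy pick over per-codeword beamforming gains; the function names, gain values, and power figures are hypothetical and are not taken from the paper.

```python
import math


def sinr_linear(signal_w, interference_w, noise_w):
    """SINR as a linear ratio: signal / (total interference + noise)."""
    return signal_w / (sum(interference_w) + noise_w)


def spectral_efficiency(sinr):
    """Shannon spectral efficiency in bit/s/Hz for a linear SINR."""
    return math.log2(1.0 + sinr)


def best_codeword(gains, tx_power_w, interference_w, noise_w):
    """Return (index, spectral efficiency) of the codeword with the best link.

    `gains` holds one assumed end-to-end power gain per codeword.
    """
    efficiencies = [
        spectral_efficiency(sinr_linear(tx_power_w * g, interference_w, noise_w))
        for g in gains
    ]
    index = max(range(len(gains)), key=lambda i: efficiencies[i])
    return index, efficiencies[index]


# Three candidate beams; the second has the largest assumed gain.
index, efficiency = best_codeword(
    gains=[0.25, 2.0, 0.5], tx_power_w=4.0, interference_w=[0.5], noise_w=0.5
)
```

An exhaustive search like this is only practical for small codebooks with known gains; the DRL method described here instead learns which beam pattern to select under changing channel conditions.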
DRL
The study utilized Independent Proximal Policy Optimization (IPPO), a multi-agent DRL algorithm, to address beam management in V2V communication. This approach accommodated the dynamic and distributed nature of V2V networks, with each agent updating its policy independently. IPPO was chosen over Multi-Agent PPO (MAPPO) because the rapidly changing topology of V2V networks makes centralized control impractical. The state of the environment was defined by the coordinates and velocity of vehicles, incorporating localization and sensing data.
The action was the selection of a beam pattern from a codebook, and the reward was based on achieving beam alignment and spectral efficiency. The training process involved generating synthetic V2V communication scenarios using AnyLogic 8.8. Performance was evaluated against a baseline method that used Extended Kalman Filter (EKF) predictions. The study also modified the state and reward definitions to improve learning efficiency.
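The state, action, and reward described above can be sketched as plain data structures. Everything below is an illustrative assumption rather than the paper's definition: the binary multi-level codebook layout, the use of relative position and velocity as the state, and a reward equal to the spectral efficiency when the chosen beam covers the receiver and zero otherwise.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Codeword:
    center_rad: float  # boresight direction of the beam
    width_rad: float   # angular width of the beam


def build_codebook(levels=3, sector_rad=math.pi):
    """Multi-level codebook: level l splits the sector into 2**(l+1) beams."""
    codebook = []
    for level in range(levels):
        beams = 2 ** (level + 1)
        width = sector_rad / beams
        for k in range(beams):
            center = -sector_rad / 2 + (k + 0.5) * width
            codebook.append(Codeword(center, width))
    return codebook


def make_state(tx_xy, rx_xy, tx_v, rx_v):
    """State: receiver position and velocity relative to the transmitter."""
    return (rx_xy[0] - tx_xy[0], rx_xy[1] - tx_xy[1],
            rx_v[0] - tx_v[0], rx_v[1] - tx_v[1])


def is_aligned(codeword, state):
    """True when the receiver's bearing falls inside the beam."""
    bearing = math.atan2(state[1], state[0])
    return abs(bearing - codeword.center_rad) <= codeword.width_rad / 2


def reward(codeword, state, spectral_eff):
    """Spectral efficiency if the beam covers the receiver, else zero."""
    return spectral_eff if is_aligned(codeword, state) else 0.0


codebook = build_codebook()  # 2 + 4 + 8 = 14 codewords
state = make_state((0.0, 0.0), (50.0, 10.0), (30.0, 0.0), (28.0, 0.0))
action = 10                  # index of a narrow beam from the finest level
r = reward(codebook[action], state, spectral_eff=3.0)
```

Wider codewords are more forgiving of location errors, while narrower ones are more demanding to keep aligned, which is the kind of trade-off an adaptive multi-level codebook lets the agent exploit.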
Furthermore, an analysis of temporal dependence in the training data suggested the adoption of RNNs, specifically Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU), to enhance the DRL model's performance. The Iterative Twin Delayed Deep Deterministic (ITD3) framework with GRU was proposed as a modification, and the performance was assessed, indicating improvements in beam tracking accuracy and data transmission capacity.
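For readers unfamiliar with TD3, the sketch below shows the critic target used by the standard algorithm: target policy smoothing followed by the minimum of two target critics. It illustrates plain TD3 only, not the ITD3 modifications or the GRU layers, whose details the article does not give. The callables, the scalar action, and the default hyperparameters are assumptions.

```python
import random


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, next_state, done, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               act_low=-1.0, act_high=1.0, rng=random):
    """Standard TD3 critic target for a scalar continuous action.

    target_actor(state) -> action and target_q*(state, action) -> value
    are hypothetical callables standing in for the target networks.
    """
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise, act_low, act_high)
    # Clipped double Q-learning: use the smaller of the two critic estimates.
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))
    # Terminal transitions do not bootstrap from the next state.
    return reward + gamma * (0.0 if done else 1.0) * q_min


# Toy usage with constant stand-in networks and smoothing noise disabled.
y = td3_target(
    reward=1.0, next_state=None, done=False,
    target_actor=lambda s: 0.5,
    target_q1=lambda s, a: 2.0,
    target_q2=lambda s, a: 1.0,
    gamma=0.5, noise_std=0.0,
)
```

Taking the minimum of two critics counters the value overestimation that single-critic methods are prone to. How a continuous action is mapped onto a discrete codeword index is not covered in this summary and is left out of the sketch.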
Results and discussion
The study employed the ITD3 framework with GRU in DRL to address beam management in V2V communication. Results indicated improved average spectral efficiency and tracking accuracy compared to IPPO and EKF methods. ITD3 outperformed EKF in tracking accuracy, but both exhibited similar spectral efficiency. Testing showed ITD3's superior performance, achieving over 90% tracking accuracy even at high transmission frequencies. The model adapted well to varied settings, demonstrating resilience in the face of interference challenges posed by increased vehicle density.
Testing at different carrier frequencies and transmit powers revealed consistent advantages of ITD3 over EKF in spectral efficiency. The ITD3 framework efficiently selected beam patterns, and despite a slight increase in latency, it outperformed the 5G-based method. Future directions include exploring advanced DRL models, real-world testbed implementation, integration with emerging technologies, enhanced training strategies, and holistic network analysis to optimize vehicular communication systems.
Conclusion
In conclusion, this paper introduced a novel DRL approach, specifically the ITD3 framework with GRU, for effective beam management in V2V communication. Addressing challenges like short time slots, high vehicle velocities, and frequency-related path loss, the proposed method surpassed existing IPPO and EKF-based techniques. The ITD3 framework exhibited superior spectral efficiency, tracking accuracy, and lower latency in simulations, highlighting its efficacy in optimizing V2V communication under dynamic and challenging conditions.