In an article recently published in the journal Drones, researchers demonstrated the effectiveness of the stepwise soft actor-critic (SeSAC) method for autonomous flight control of unmanned aerial vehicles (UAVs).
Background
UAVs are utilized extensively in smart agriculture, networking, entertainment, and defense. They are also deployed in missions that are inaccessible or dangerous to humans, such as counter-terrorism, natural disaster risk assessment, surveillance, and search and rescue.
Although the technology for UAV control has advanced steadily in recent years, the primary UAV control techniques still depend on preprogramming or wireless remote operation. Wireless remote operation poses the risk of real-time communication link failure. Additionally, the controllable distance in such a mode of operation is limited.
Preprogramming is free of communication-link constraints, but it cannot adapt to sudden or unexpected events and is suitable for only a narrow range of missions. These limitations have motivated the development of autonomous flight algorithms that maximize the potential of UAVs.
Several studies have investigated rule-based and reinforcement learning (RL) techniques to realize autonomous flight for UAVs, particularly fixed-wing UAVs. Traditional autonomous flight has relied on rule-based approaches, but these allow the UAV to perform only predetermined maneuvers under specific conditions, making it difficult to respond properly to new or unexpected situations.
In recent years, studies have assessed the feasibility of RL techniques, which can make decisions quickly in uncertain or unpredictable situations, for autonomous fixed-wing UAV flight. For instance, Bayesian optimization and multi-agent deep deterministic policy gradient have been used to optimize the network formation and trajectories of UAVs employed as repeaters in a wireless network, speeding up data transmission while reducing transmission delay and energy consumption.
Although studies have proposed RL-based approaches for maintaining aircraft altitude and landing under different flight conditions, most were conducted in overly simplified simulation environments or with a limited action space. These approaches therefore require further verification in more realistic and diverse environments and real-world UAV operation scenarios.
Autonomous UAV flight in realistic environments
In this paper, the authors proposed a novel SeSAC method for efficient learning of fixed-wing UAVs in environments with continuous state and action spaces, addressing the limitations of previous studies on training these UAVs for autonomous flight and working toward autonomous real-world UAV operation in complex environments.
The SeSAC algorithm performs stepwise learning to overcome the learning inefficiency caused by attempting difficult tasks from the outset. First, a positive buffer storing past successful experiences was added to learn the high-dimensional state and action spaces of the autonomous flight environment effectively.
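The positive buffer can be pictured as a second replay store reserved for transitions from successful episodes, from which a fixed fraction of each training batch is drawn. The sketch below is illustrative only; the class name, capacities, and mixing ratio are assumptions, not details from the paper.

```python
import random
from collections import deque

class PositiveReplayBuffer:
    """Replay buffer augmented with a 'positive' buffer that retains
    transitions from successful episodes (illustrative sketch)."""

    def __init__(self, capacity=100_000, positive_capacity=10_000,
                 positive_ratio=0.25):
        self.buffer = deque(maxlen=capacity)              # all transitions
        self.positive = deque(maxlen=positive_capacity)   # successes only
        self.positive_ratio = positive_ratio  # fraction sampled from successes

    def add_episode(self, transitions, succeeded):
        self.buffer.extend(transitions)
        if succeeded:                 # keep successful experiences separately
            self.positive.extend(transitions)

    def sample(self, batch_size):
        # Mix a fixed share of successful transitions into every batch.
        n_pos = min(int(batch_size * self.positive_ratio), len(self.positive))
        batch = random.sample(self.positive, n_pos)
        batch += random.sample(self.buffer, batch_size - n_pos)
        return batch
```

Over-sampling past successes this way gives the critic a steadier stream of high-reward targets early in training, when ordinary exploration rarely reaches the goal.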
Subsequently, a new technique was applied that suppresses alpha, the temperature parameter encouraging exploration, once the goal has been achieved, thereby keeping the performance of the SAC algorithm stable. Finally, the proposed SeSAC assigns easier missions at the start of training and then increases the difficulty level stepwise until the desired goal is reached.
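The two ideas in this paragraph, cooling down the temperature and stepping up the mission difficulty, can each be sketched as a small update rule. The thresholds, decay rate, and level count below are invented for illustration and do not come from the paper.

```python
def cool_down_alpha(alpha, goal_achieved, decay=0.99, alpha_min=1e-3):
    """Once the current goal score is met, exponentially shrink the SAC
    temperature alpha so exploration no longer destabilizes the policy
    (illustrative schedule)."""
    return max(alpha * decay, alpha_min) if goal_achieved else alpha

def next_difficulty(level, success_rate, threshold=0.9, max_level=5):
    """Stepwise learning: advance to a harder mission variant only after
    the agent reliably completes the current one (illustrative rule)."""
    return min(level + 1, max_level) if success_rate >= threshold else level
```

In a training loop, both rules would be evaluated once per epoch: alpha is decayed while the current target is being met, and the mission difficulty is raised whenever the recent success rate clears the threshold.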
Optimal actions, states, and rewards were designed, and past states were integrated into the learning process using a one-dimensional (1D) convolutional layer to train the UAV agent in six degrees of freedom (6-DOF) effectively. Researchers implemented the SeSAC algorithm in realistic simulation environments built using JSBSim, a 6-DOF flight dynamics model, in place of simplified environments. Specifically, experimental scenarios with two separate missions, including a precise approach mission (PAM) and a moving target chasing mission (MTCM), were constructed to evaluate the effectiveness of the proposed method.
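Folding past states into the agent's observation via a 1D convolution means sliding temporal filters along a window of recent state vectors. The minimal NumPy sketch below shows the shape arithmetic only; the window length, filter sizes, and function name are assumptions, not the paper's actual architecture.

```python
import numpy as np

def conv1d_features(states, weight, bias):
    """Apply a 1D convolution along the time axis of stacked past states
    so the policy input summarizes recent motion (illustrative sketch).

    states: (D, T) array -- D state variables over the last T timesteps
    weight: (F, D, k) array -- F temporal filters of width k
    bias:   (F,) array
    returns: (F, T - k + 1) feature map
    """
    D, T = states.shape
    F, _, k = weight.shape
    out = np.empty((F, T - k + 1))
    for f in range(F):
        for t in range(T - k + 1):
            # Each output is a filter applied to a length-k window of history.
            out[f, t] = np.sum(weight[f] * states[:, t:t + k]) + bias[f]
    return out
```

Conditioning the policy on a short history this way gives the 6-DOF agent access to velocity-like information that a single state snapshot cannot provide.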
In PAM, a disaster-management scenario, the UAV agent must approach a suitable point to perform firefighting activities or enter the disaster site, while in MTCM, a counter-terrorism scenario, the agent must close to within a specific distance of a moving target to reduce the threat.
Comparative experiments on proximal policy optimization (PPO), conventional SAC, SAC + positive buffer (SAC-P), SAC-P + cool-down alpha (SAC-PC), and SAC-PC + stepwise learning (SeSAC) were performed to verify the effects of positive buffer, cool-down alpha, and SeSAC.
Significance of the study
The UAV agent trained using the proposed SeSAC algorithm successfully completed missions in both challenging scenarios with a higher average reward and fewer learning epochs. The SeSAC approach outperformed the baseline conventional SAC and PPO approaches in score, first convergent episode, and overall number of successful episodes, indicating stable learning and faster convergence.
When the first convergent episode was used as an indicator of learning efficiency, SAC-PC and SAC-P converged in 1,951 and 1,602 episodes, respectively, while conventional SAC and PPO did not converge at all. SeSAC, however, reached the desired score in only 660 episodes, demonstrating the proposed methodology's effectiveness. All three techniques used in SeSAC (stepwise learning, cool-down alpha, and the positive buffer) individually contributed to performance stability and improvement.
To summarize, the findings of this study demonstrated the feasibility of using the SeSAC-based approach for autonomous flight control of fixed-wing UAVs and potentially other UAVs, including flexi-wing and rotary-wing UAVs. However, more research is required to develop an approach that allows UAV agents to adapt to different situations by training complex missions individually as modular units and then connecting those units.
Journal reference:
- Hwang, H. J., Jang, J., Choi, J., Bae, J. H., Kim, S. H., Kim, C. O. (2023). Stepwise Soft Actor–Critic for UAV Autonomous Flight Control. Drones, 7(9), 549. https://doi.org/10.3390/drones7090549, https://www.mdpi.com/2504-446X/7/9/549