Enhancing Autonomous Flight Control of UAVs Using Stepwise Soft Actor-Critic Method

In an article recently published in the journal Drones, researchers demonstrated the effectiveness of the stepwise soft actor-critic (SeSAC) method for autonomous flight control of unmanned aerial vehicles (UAVs).

Study: Enhancing Autonomous Flight Control of UAVs Using Stepwise Soft Actor-Critic Method. Image credit: Spiffy Digital Creative/Shutterstock

Background

UAVs are utilized extensively in smart agriculture, networking, entertainment, and defense. They are also deployed in missions that are inaccessible or dangerous to humans, such as counter-terrorism, natural disaster risk assessment, surveillance, and search and rescue.

Although the technology for UAV control has advanced steadily in recent years, the primary UAV control techniques still depend on preprogramming or wireless remote operation. Wireless remote operation poses the risk of real-time communication link failure. Additionally, the controllable distance in such a mode of operation is limited.

Preprogramming avoids the communication-link limitation. However, it cannot adapt to sudden or unexpected events and suits only a narrow set of missions. These shortcomings have driven the development of autonomous flight algorithms to maximize the potential of UAVs and address their existing limitations.

Several studies have investigated rule-based and reinforcement learning (RL) approaches to realize autonomous flight for UAVs, specifically fixed-wing UAVs. Traditional autonomous flight has relied on rule-based approaches, but these allow the UAV to perform only predetermined maneuvers under specific conditions, making it difficult to respond properly to new or unexpected situations.

In recent years, studies have assessed the feasibility of RL techniques, which can make decisions quickly in uncertain or unpredictable situations, for autonomous fixed-wing UAV flight. For instance, Bayesian optimization and multi-agent deep deterministic policy gradient have been used to optimize the network formation and trajectories of UAVs employed as repeaters within a wireless network, enabling rapid data transmission and reducing transmission delay and energy consumption.

Although studies have proposed RL-based approaches for maintaining aircraft altitude and landing under different flight conditions, most were performed in overly simplified simulation environments or with a limited action space. Further verification of these approaches is therefore needed in more realistic and diverse environments and in real-world UAV operation scenarios.

Autonomous UAV flight in realistic environments

In this paper, the authors proposed a novel SeSAC method for efficient learning of fixed-wing UAVs in environments with continuous state and action spaces, addressing the limitations of previous studies on training these UAVs for autonomous flight and enabling autonomous operation in different complex, realistic environments.

The SeSAC algorithm performs stepwise learning to overcome the inefficiency caused by attempting difficult tasks from the outset. First, a positive buffer storing past successful experiences was added so that the agent could effectively learn the high-dimensional states and action spaces of the autonomous flight environment.
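
The paper does not include code, but the positive-buffer idea can be illustrated with a short sketch. The class below keeps a separate buffer of transitions from successful episodes and mixes them into each training batch; the class name, capacities, and mixing ratio are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import deque

class MixedReplayBuffer:
    """Standard replay buffer plus a 'positive' buffer holding transitions
    from successful episodes (illustrative sketch, not the authors' code)."""

    def __init__(self, capacity=100_000, positive_capacity=20_000, positive_ratio=0.25):
        self.buffer = deque(maxlen=capacity)             # all transitions
        self.positive = deque(maxlen=positive_capacity)  # transitions from successful episodes
        self.positive_ratio = positive_ratio             # fraction of each batch drawn from successes

    def add_episode(self, transitions, succeeded):
        self.buffer.extend(transitions)
        if succeeded:                                    # replay past successes more often
            self.positive.extend(transitions)

    def sample(self, batch_size):
        n_pos = min(int(batch_size * self.positive_ratio), len(self.positive))
        batch = random.sample(self.positive, n_pos) if n_pos else []
        batch += random.sample(self.buffer, batch_size - n_pos)
        return batch
```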

Subsequently, a cool-down technique was applied that suppresses alpha, the temperature parameter that encourages exploration, once the goal has been achieved, thereby maintaining the stable performance of the SAC algorithm. Building on these components, the novel SeSAC assigns easier tasks at the start of training and then increases the difficulty stepwise so that the desired final goal is reached successfully.
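
As a rough illustration of these two ideas, the helper functions below sketch how a temperature parameter might be cooled down once a goal is achieved and how a stepwise success criterion might be tightened stage by stage. The function names, decay schedule, and radius-based goal are assumptions made for illustration, not the paper's exact formulation.

```python
def cool_down_alpha(alpha, goal_achieved, decay=0.99, alpha_min=1e-3):
    """Once the current goal has been achieved, shrink the SAC temperature so the
    policy exploits what it has learned instead of continuing to explore.
    (Illustrative sketch; the decay schedule is an assumption.)"""
    return max(alpha * decay, alpha_min) if goal_achieved else alpha

def stepwise_goal(stage, n_stages=5, start_radius_m=500.0, final_radius_m=50.0):
    """Stepwise learning: start with a loose success criterion (large target radius)
    and tighten it stage by stage toward the final mission goal.
    (Illustrative values; not the paper's mission parameters.)"""
    frac = stage / max(n_stages - 1, 1)
    return start_radius_m + frac * (final_radius_m - start_radius_m)
```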

Optimal actions, states, and rewards were designed, and past states were integrated into the learning process using a one-dimensional (1D) convolutional layer to train the UAV agent effectively in six degrees of freedom (6-DOF). The researchers implemented the SeSAC algorithm in realistic simulation environments built with JSBSim, a 6-DOF flight dynamics model, rather than in simplified environments. Specifically, experimental scenarios with two separate missions, a precise approach mission (PAM) and a moving target chasing mission (MTCM), were constructed to evaluate the effectiveness of the proposed method.
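
A minimal PyTorch sketch of the history-encoding idea is shown below: a short window of past states is passed through a 1D convolution over the time axis before feeding the actor and critic networks. The layer sizes, window length, and state dimension are assumptions; the paper's exact architecture may differ.

```python
import torch
import torch.nn as nn

class StateHistoryEncoder(nn.Module):
    """Encode a short window of past 6-DOF states with a 1D convolution before
    feeding the SAC actor/critic (illustrative sketch with assumed dimensions)."""

    def __init__(self, state_dim=12, history_len=8, out_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(state_dim, 32, kernel_size=3, padding=1),  # convolve over time
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.head = nn.Linear(32 * history_len, out_dim)

    def forward(self, states):
        # states: (batch, history_len, state_dim) -> (batch, state_dim, history_len)
        x = self.conv(states.transpose(1, 2))
        return self.head(x.flatten(1))
```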

In PAM, a disaster-management scenario, the UAV agent must reach a suitable point to perform firefighting activities or enter the disaster site. In MTCM, a counter-terrorism scenario, the agent must approach a moving target to within a specified distance to reduce the threat.
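
To make the mission setup concrete, the hypothetical sketch below rewards the agent for closing the distance to a (possibly moving) target point and grants a bonus once it comes within the required distance. This is purely illustrative and is not the reward design used in the paper.

```python
import math

def approach_reward(uav_pos, target_pos, prev_distance, success_radius_m=100.0):
    """Hypothetical shaping reward for PAM/MTCM-style missions: positive when the
    UAV reduces its distance to the target, plus a terminal bonus inside the
    success radius (assumed values, not the paper's reward)."""
    distance = math.dist(uav_pos, target_pos)  # 3D Euclidean distance
    reward = prev_distance - distance          # progress toward the target
    done = distance <= success_radius_m
    if done:
        reward += 100.0                        # mission-success bonus (assumed value)
    return reward, distance, done
```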

Comparative experiments were performed on proximal policy optimization (PPO), conventional SAC, SAC with a positive buffer (SAC-P), SAC-P with cool-down alpha (SAC-PC), and SAC-PC with stepwise learning (SeSAC) to verify the effects of the positive buffer, cool-down alpha, and stepwise learning.

Significance of the study

The UAV agent trained using the proposed SeSAC algorithm successfully completed missions in both challenging scenarios with a higher average reward and fewer learning epochs. The SeSAC approach outperformed the baseline conventional SAC and PPO approaches in score, episodes to first convergence, and number of successful episodes, indicating stable learning and faster convergence.

When the first convergent episode was used as an indicator of learning efficiency, SAC-PC and SAC-P converged in 1951 and 1602 episodes, respectively, while traditional SAC and PPO did not converge at all. SeSAC, however, reached the desired score in only 660 episodes, demonstrating the proposed methodology's effectiveness. All three techniques used in SeSAC, namely stepwise learning, cool-down alpha, and the positive buffer, individually contributed to performance stability and improvement.

To summarize, the findings of this study demonstrated the feasibility of using the SeSAC-based approach for autonomous flight control of fixed-wing UAVs and potentially other UAVs, including flexi-wing and rotary-wing UAVs. However, more research is required to develop a new approach that allows UAV agents to adapt to different situations by training complex missions individually as modular units and then connecting those units.

