In a paper published in the journal Remote Sensing, researchers addressed the challenges posed by the space-air-ground information network (SAGIN), which provides extensive global connectivity across diverse sensors and devices and consequently generates vast volumes of data. Traditional deep learning methods are limited by the need to transmit these large data volumes to central servers, raising privacy concerns. Federated learning (FL) emerged as a solution but still suffers from delays and high energy consumption.
To tackle these issues, the researchers established delay and energy consumption models and introduced a strategic node-selection approach to minimize training costs. They proposed an innovative double deep Q network (DDQN)-based algorithm, low-cost node selection in FL (LCNSFL), which enables edge servers to select the optimal subset of devices for federated training. Simulation-based comparisons demonstrated LCNSFL's superior performance, highlighting its efficacy for practical deployment in networks such as SAGIN.
Related Work
Past work has extensively explored the challenges and opportunities arising from the proliferation of Internet of Things (IoT) devices and the evolution of communication technologies like 6G. Researchers have highlighted the significance of satellite communication (SatCom) networks in bridging connectivity gaps, particularly in remote areas.
Additionally, the application of FL has gained traction, offering a distributed approach to artificial intelligence (AI) training while preserving data privacy. However, existing FL methodologies face hurdles, including high latency and energy consumption, especially in dynamic network environments. Prior studies have proposed various techniques to address these issues, such as deep deterministic policy gradient (DDPG) and fuzzy logic-based client selection. Nonetheless, most approaches assume stable network connections, which may not hold in practical scenarios.
Investigating FL Energy Consumption
The paper investigates FL's time and energy consumption dynamics in IoT environments, considering various IoT devices and fluctuating channel states. It begins by defining the time and energy consumption metrics and subsequently focuses on selecting nodes to minimize consumption during each federated training step. Researchers propose a DDQN-based LCNSFL algorithm to address this challenge, dynamically balancing time and energy costs.
Researchers depicted the architecture of the FL system, illustrating the involvement of satellites, unmanned aerial vehicles (UAVs), ground base stations, and IoT devices such as cameras and robotic arms. These devices transmit local model parameters to an edge cloud server, which aggregates them to update the global federated model. Moreover, a reinforcement learning intelligence (RLI) component aids in strategically selecting devices to participate in the training process, optimizing resource utilization.
The paper then delves into the detailed model aggregation process and the energy consumption framework. It categorizes the time and energy requirements into computational time, data transmission duration, and idle device waiting periods. The energy consumption model considers central processing unit (CPU) cycles, CPU frequency, channel gain, noise power, and transmission power. The paper provides equations to calculate the energy consumed by each device during local model training, model transmission, and idle waiting periods. Additionally, researchers formulate the optimization problem for node selection to minimize training time and energy consumption costs while satisfying constraints related to device operating frequency, bandwidth allocation, and the number of selected devices.
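To make the cost structure concrete, the following Python sketch assembles a per-round cost in the spirit of the model described above. It is a minimal illustration under stated assumptions, not the paper's exact formulation: the device fields, the noise power, the model size in bits, and the Shannon-rate transmission model are placeholders chosen for the example, and the objective simply trades time against energy with a weighting coefficient alpha.

```python
import math
from dataclasses import dataclass

@dataclass
class Device:
    """Per-device parameters (illustrative names, not taken from the paper)."""
    samples: int              # number of local training samples
    cycles_per_sample: float  # CPU cycles needed per sample
    cpu_freq: float           # operating frequency (cycles/s)
    kappa: float              # effective capacitance coefficient of the chip
    tx_power: float           # transmission power (W)
    channel_gain: float       # current channel gain
    bandwidth: float          # allocated bandwidth (Hz)
    idle_power: float         # power drawn while waiting (W)

NOISE_POWER = 1e-9   # assumed noise power at the receiver
MODEL_BITS = 5e6     # assumed size of one model update in bits

def compute_cost(dev: Device):
    """Local training time and energy for one round."""
    t_cmp = dev.cycles_per_sample * dev.samples / dev.cpu_freq
    e_cmp = dev.kappa * dev.cycles_per_sample * dev.samples * dev.cpu_freq ** 2
    return t_cmp, e_cmp

def transmit_cost(dev: Device):
    """Upload time and energy using a Shannon-capacity rate model."""
    rate = dev.bandwidth * math.log2(1 + dev.tx_power * dev.channel_gain / NOISE_POWER)
    t_tx = MODEL_BITS / rate
    e_tx = dev.tx_power * t_tx
    return t_tx, e_tx

def round_cost(selected, alpha=0.5):
    """Weighted time/energy cost of one federated round for a selected subset.

    The round time is set by the slowest device; faster devices incur idle
    energy while they wait for synchronous aggregation.
    """
    per_dev = []
    for d in selected:
        t_cmp, e_cmp = compute_cost(d)
        t_tx, e_tx = transmit_cost(d)
        per_dev.append((t_cmp + t_tx, e_cmp + e_tx, d))
    t_round = max(t for t, _, _ in per_dev)
    e_round = sum(e + d.idle_power * (t_round - t) for t, e, d in per_dev)
    # Weighted sum trading time against energy, mirroring the paper's objective
    return alpha * t_round + (1 - alpha) * e_round
```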
Challenges in FL Optimization
Solving the optimization problem for node selection in FL is immensely challenging due to the complexity of the constraints and the unpredictable network state of each device. The DDQN algorithm dynamically balances time and energy costs and forms the basis of the proposed node selection scheme, LCNSFL. Researchers abstracted the problem as a Markov decision process (MDP) comprising system states, an action space, a policy, a reward function, and state transitions.
The system state encompasses parameters such as the data transmission rate, operating frequency, signal transmission power, and number of samples on each device. This state evolves dynamically owing to device heterogeneity and network instability, requiring multiple data samplings before each round of federated training. The action space consists of binary variables indicating each device's selection status, while the policy maps states to actions with the aim of maximizing the expected reward. Researchers designed the reward function to minimize the weighted sum of time and energy costs, fostering efficient training rounds.
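A rough encoding of this MDP might look like the sketch below, which builds on the per-device parameters from the earlier cost example. The choice of per-device features, the flattened state layout, the binary action vector, and the reward weighting are illustrative assumptions rather than the paper's exact design.

```python
import numpy as np

def build_state(devices):
    """Stack per-device observations into one flat state vector.

    Each device contributes its current data rate (approximated here by
    bandwidth), operating frequency, transmission power, and local sample
    count; the feature choice is illustrative only.
    """
    feats = [(d.bandwidth, d.cpu_freq, d.tx_power, d.samples) for d in devices]
    return np.asarray(feats, dtype=np.float32).flatten()

def reward(round_time, round_energy, alpha=0.5):
    """Negative weighted time/energy cost: cheaper rounds earn higher rewards."""
    return -(alpha * round_time + (1 - alpha) * round_energy)

# Action: a binary selection vector over devices, e.g. [1, 0, 1, 1, 0]
# selects devices 0, 2, and 3 for the next round of federated training.
```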
The DDQN reinforcement learning model tackles the FL node selection problem, enabling dynamic trade-offs between energy consumption and training time. Tailored to FL in IoT scenarios, DDQN processes continuous states and generates discrete actions through neural networks. It offers algorithmic simplicity, sample efficiency, and flexibility in parameter tuning while mitigating issues such as Q-value overestimation. Deploying edge cloud servers near production or sensing devices improves data processing efficiency, reducing transmission delays and enhancing system responsiveness.
The DDQN comprises a Q network and a target Q network, with the latter updated periodically to improve stability. During training, decoupling action selection (performed by the Q network) from value estimation (performed by the target network) curbs overestimation and makes learning more efficient. The loss function guides parameter updates via gradient descent, minimizing temporal-difference errors and improving the accuracy of the network's value estimates. Through iterative training, the DDQN algorithm converges toward an optimal node selection strategy that minimizes overall training cost while maximizing efficiency.
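The double-DQN update itself can be sketched in PyTorch as follows. The network width, discount factor, and mean-squared-error loss are assumptions made for illustration; the essential point is that the online network selects the next action, the target network evaluates it, and the target network is synchronized periodically.

```python
import torch
import torch.nn as nn

def make_qnet(state_dim, n_actions):
    """Small MLP mapping a continuous state to discrete action values."""
    return nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                         nn.Linear(128, 128), nn.ReLU(),
                         nn.Linear(128, n_actions))

def ddqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One gradient step on the double-DQN temporal-difference loss."""
    s, a, r, s_next, done = batch  # tensors sampled from a replay buffer
    # Q(s, a) from the online network for the actions actually taken
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Double DQN: the online network picks the next action ...
        a_next = q_net(s_next).argmax(dim=1, keepdim=True)
        # ... and the target network evaluates it, reducing overestimation bias
        q_next = target_net(s_next).gather(1, a_next).squeeze(1)
        target = r + gamma * (1 - done) * q_next
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def sync_target(q_net, target_net):
    """Periodically copy online weights into the target network for stability."""
    target_net.load_state_dict(q_net.state_dict())
```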
Experimental Evaluation Summary
The experimental evaluation of the LCNSFL algorithm demonstrates its efficacy in reducing both time and energy costs while maintaining high global-model accuracy. Compared with traditional node selection strategies such as random selection and best network quality (Bnq) selection, LCNSFL significantly reduces time, energy, and weighted costs, achieving reductions of 63.7%, 25.1%, and 32.9%, respectively, over the first 50 training rounds. Moreover, LCNSFL remains robust and resilient in dynamic network scenarios, converging quickly to a highly accurate global model while optimizing resource utilization through precise node selection.
Conclusion
In summary, the LCNSFL algorithm optimized FL in SAGIN environments by dynamically selecting device subsets to minimize time and energy costs. Through simulation experiments, LCNSFL demonstrated superior performance to traditional strategies like random and Bnq selection, achieving effective convergence and reducing resource consumption without compromising accuracy in dynamic network scenarios.