In an article published in the journal Heliyon, researchers addressed the challenge of modeling uncertainty in power systems with large-scale intermittent renewable sources such as wind and solar. They introduced a hierarchical deep reinforcement learning (HDRL) approach that handles this complexity by combining a global deep reinforcement learning (DRL) stage with local heuristic algorithms.
The HDRL scheme effectively managed the sparse reward problem and high-dimensional state-action spaces that hamper standard DRL, improving both decision-making speed and efficiency in power system economic dispatch under uncertain conditions.
Background
Power systems integrated with large-scale intermittent renewable sources, like wind and photovoltaic generation, face significant challenges due to their inherent volatility and uncertainty. Accurately modeling these uncertainties and making optimal dispatch decisions is complex.
Previous work in this area has relied on traditional optimization methods that struggle with the high-dimensional nature of the problem, while direct DRL approaches suffer from sparse rewards. As a result, these methods often fail to handle the complexity of the uncertainty efficiently or to provide timely decisions.
To address these gaps, this paper introduced an HDRL scheme. By decomposing the problem into a global RL stage and a local heuristic algorithm stage, the HDRL approach effectively managed system uncertainties and improved the speed and efficiency of decision-making. Simulation results confirmed that this approach enhanced performance in both deterministic and uncertain scenarios, filling the gaps left by earlier methods.
HDRL for Uncertainty Management in Power Systems
The researchers introduced an HDRL scheme to address economic dispatch problems in power systems with uncertainty arising from intermittent renewable sources. The approach was divided into two stages: a global stage and a local stage. The global stage used DRL for long-term strategy optimization, focusing on energy storage (ES) strategies and system stability. The local stage employed particle swarm optimization (PSO) to optimize immediate system costs and outputs, addressing the high-dimensional search space and sparse reward challenges.
HDRL was trained using historical data, with the DRL agent learning optimal ES strategies based on system states, including load predictions and ES state-of-charge (SOC). The local PSO algorithm provided real-time optimization of operational costs, which served as feedback for the DRL agent. This hierarchical structure enabled continuous learning and adaptation to system uncertainties while simplifying reward function design.
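To make this interaction concrete, the sketch below pairs a stand-in linear policy (the global stage) with a small particle swarm optimizer (the local stage) on a toy quadratic cost model, with the PSO-optimized cost fed back as the reward for each ES decision. The cost function, state variables, PSO coefficients, and synthetic load profile are all illustrative assumptions; the paper's actual network architecture, cost model, and update rule are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

N_GEN, HORIZON = 6, 24            # six dispatchable units, 24 hourly steps

def operating_cost(gen_output, net_load, es_power):
    """Toy quadratic fuel cost plus a penalty on any power imbalance.
    Stand-in for the full dispatch cost model used in the paper."""
    fuel = np.sum(0.5 * gen_output**2 + 2.0 * gen_output)
    imbalance = abs(np.sum(gen_output) + es_power - net_load)
    return fuel + 100.0 * imbalance

def local_pso(net_load, es_power, n_particles=30, n_iters=40):
    """Local stage: particle swarm search over generator set-points that
    minimizes the immediate operating cost for a fixed ES action."""
    pos = rng.uniform(0.0, 1.0, (n_particles, N_GEN))
    vel = np.zeros_like(pos)
    p_best = pos.copy()
    p_cost = np.array([operating_cost(p, net_load, es_power) for p in pos])
    g_best = p_best[p_cost.argmin()]
    for _ in range(n_iters):
        r1, r2 = rng.random(2)
        vel = 0.7 * vel + 1.5 * r1 * (p_best - pos) + 1.5 * r2 * (g_best - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)
        cost = np.array([operating_cost(p, net_load, es_power) for p in pos])
        better = cost < p_cost
        p_best[better], p_cost[better] = pos[better], cost[better]
        g_best = p_best[p_cost.argmin()]
    return g_best, p_cost.min()

# Global stage stand-in: a linear policy mapping [net-load forecast, SOC]
# to an ES charge/discharge action; a trained DRL network would sit here.
theta = rng.normal(size=2)

def es_action(state):
    return float(np.tanh(theta @ state))     # ES power in [-1, 1] p.u.

# One illustrative episode: the PSO-optimized cost is fed back as the
# (negative) reward for the ES decision taken at each hour.
soc = 0.5
for hour in range(HORIZON):
    net_load = 3.0 + np.sin(2 * np.pi * hour / HORIZON)   # synthetic profile
    state = np.array([net_load, soc])
    a = es_action(state)                                   # global stage
    _, cost = local_pso(net_load, a)                       # local stage
    reward = -cost                                         # feedback to the DRL agent
    soc = np.clip(soc - 0.1 * a, 0.0, 1.0)                 # advance the ES state
    # A real agent would store (state, a, reward, next_state) and update theta here.
```

The split is what simplifies reward design: the global agent only has to value the ES trajectory, while the local optimizer absorbs the detailed, high-dimensional dispatch decisions at each step.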
The HDRL scheme improved decision-making speed and accuracy, overcoming limitations of traditional DRL methods by combining DRL with PSO to handle the complexity of economic dispatch under varying conditions.
Case Study and Performance Analysis
The case study evaluated the HDRL scheme's effectiveness in managing a power system with high renewable energy integration. The researchers used a modified Institute of Electrical and Electronics Engineers (IEEE) 30-node system, featuring six generators, two wind turbines, two photovoltaic units, and two ES units. The system operated over a 24-hour period with hourly steps and included technical parameters for ES and intermittent generation.
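For reference, the reported test-system composition can be summarized in a small configuration dictionary; detailed unit parameters are documented in the original paper and are not reproduced here.

```python
# Test-system composition reported for the case study (modified IEEE
# 30-node system); technical parameters for ES and intermittent
# generation are specified in the original paper.
case_study = {
    "base_system": "modified IEEE 30-node system",
    "thermal_generators": 6,
    "wind_turbines": 2,
    "photovoltaic_units": 2,
    "energy_storage_units": 2,
    "horizon_hours": 24,
    "time_step_hours": 1,
}
```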
The HDRL scheme's training involved 2,000 episodes, during which the agent explored and learned to optimize system operations. Rewards were initially unstable because of constraint violations, but stabilized as the agent learned optimal strategies.
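A generic way to express this constraint handling is to penalize violations inside the reward, so early episodes with many violations score poorly and the signal settles as the policy improves. The weighting below is an assumption for illustration, not the paper's exact formulation.

```python
def shaped_reward(operating_cost, violations, penalty_weight=1e3):
    """Reward used in this sketch: negative operating cost, minus a heavy
    penalty whenever operational constraints are violated. Early episodes
    with many violations therefore score very poorly, matching the
    unstable rewards observed before the policy converges."""
    return -operating_cost - penalty_weight * sum(violations)

# Example: a 50-unit cost with two small constraint violations.
print(shaped_reward(50.0, violations=[0.02, 0.01]))   # -> -80.0
```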
Three cases compared HDRL against mixed-integer linear programming (MILP) and PSO under deterministic conditions. HDRL performed competitively, closely matching PSO in cost efficiency and outperforming MILP. In scenarios with system uncertainties, such as fluctuating renewable outputs, HDRL achieved a significant reduction in operational and risk costs compared with systems without ES and with PSO.
The HDRL method demonstrated adaptability to varying source-load fluctuations, producing optimal scheduling strategies quickly from its trained neural network, whereas PSO required lengthy recalculation each time conditions changed. The efficiency analysis showed that HDRL's online decision-making was faster than PSO's, with a notable reduction in computation time, making it a practical and effective solution for balancing energy production and consumption in complex systems.
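The speed advantage follows from amortizing optimization offline: online, the trained policy needs only a forward pass, whereas PSO must re-run its search whenever conditions change. The toy timing below uses stand-in sizes and a random-search placeholder for PSO purely to illustrate that gap; it is not a benchmark of the authors' implementation.

```python
import time
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(64, 8))          # stand-in for a small trained policy network
state = rng.normal(size=8)

t0 = time.perf_counter()
_ = np.tanh(W @ state)                # online HDRL decision: one forward pass
t_policy = time.perf_counter() - t0

t0 = time.perf_counter()
best = np.inf
for _ in range(40):                   # stand-in for a PSO re-optimization loop
    candidates = rng.uniform(size=(30, 6))
    best = min(best, np.min(np.sum(candidates**2, axis=1)))
t_pso = time.perf_counter() - t0

print(f"forward pass: {t_policy*1e3:.3f} ms, re-optimization: {t_pso*1e3:.3f} ms")
```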
Conclusion
The study introduced an HDRL scheme to address the economic dispatch challenges in power systems with significant renewable energy integration. By combining global RL with local heuristic algorithms, HDRL effectively managed uncertainties related to intermittent renewable sources. The HDRL approach improved decision-making speed and efficiency by decomposing the problem into global and local stages, optimizing long-term strategies and immediate system costs, respectively.
The case study demonstrated that HDRL performed competitively in both deterministic and uncertain environments. It closely matched the cost efficiency of PSO and significantly outperformed MILP. HDRL also showed superior adaptability to source-load fluctuations and achieved faster, more efficient decision-making compared to PSO.
While HDRL offered substantial improvements, the initial training phase could be lengthy and unstable. Future research will focus on refining the HDRL algorithm to enhance learning efficiency and stability, particularly in handling sparse reward challenges.