In a paper published in the journal Ocean Engineering, researchers described using deep reinforcement learning (DRL) to optimize flow control around two square cylinders: a larger main cylinder and a smaller one positioned upstream of it. They minimized wake flow oscillation by adaptively positioning the smaller front cylinder.
The authors replaced flow simulations with a feature detection model based on a convolutional neural network to speed up the time-consuming training. This accelerated approach can be applied to similar scenarios in engineering projects.
Related Work
Past work has explored wake flow fluctuation suppression using various flow control methods, including proportional-integral-derivative (PID) controllers and DRL. DRL has shown promise in managing complex flow scenarios, with enhancements such as accelerated training through convolutional neural networks and model-based RL.
Notable advancements include using DRL for optimal flow control in both simulations and experiments, as well as combining DRL with reduced-order models for efficiency. These techniques are crucial for improving flow stability and efficiency in practical applications.
Flow Optimization Approach
The flow field geometry involved an inlet boundary on the left and an outflow boundary on the right, with the main square cylinder positioned at the center of the domain. The front square cylinder was movable, and the flow was simulated at a Reynolds number of 100.
The team used the lattice Boltzmann method (LBM) with a BGK collision model to compute velocity fields. A grid-refinement study showed only minimal differences between the original and refined resolutions, validating the sufficiency of the initial grid, and comparisons confirmed the accuracy of the numerical method.
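For readers unfamiliar with the solver, the sketch below shows a single collision-and-streaming step of a D2Q9 BGK lattice Boltzmann update in Python/NumPy. The grid size, relaxation time, periodic boundaries, and absence of an obstacle are illustrative assumptions, not the paper's actual setup.

```python
# One D2Q9 BGK lattice Boltzmann step (minimal sketch, assumed parameters).
import numpy as np

nx, ny, tau = 200, 100, 0.6          # grid size and relaxation time (assumed)

# D2Q9 lattice velocities and weights
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)

f = np.ones((9, nx, ny)) * w[:, None, None]   # start from rest (rho = 1, u = 0)

def bgk_step(f):
    """One collision + streaming step of the BGK lattice Boltzmann method."""
    rho = f.sum(axis=0)                                   # density
    u = np.einsum('qi,qxy->ixy', c, f) / rho              # velocity field
    cu = np.einsum('qi,ixy->qxy', c, u)
    usq = (u ** 2).sum(axis=0)
    feq = w[:, None, None] * rho * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)
    f = f - (f - feq) / tau                               # BGK collision
    for q in range(9):                                    # streaming
        f[q] = np.roll(np.roll(f[q], c[q, 0], axis=0), c[q, 1], axis=1)
    return f

f = bgk_step(f)
```

A real setup would additionally impose the inlet, outflow, and cylinder boundary conditions described above.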
DRL was utilized to optimize the position of the front cylinder to minimize wake flow fluctuations. The deep deterministic policy gradient (DDPG) algorithm was chosen for this continuous action problem and implemented using Google TensorFlow.
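To make the algorithm choice concrete, the following sketch defines a DDPG-style actor and critic in TensorFlow/Keras. The state and action dimensions, layer sizes, and activations are assumptions for illustration, not the authors' reported configuration.

```python
# Minimal DDPG actor/critic sketch in TensorFlow/Keras (assumed dimensions).
import tensorflow as tf
from tensorflow.keras import layers

state_dim, action_dim = 2, 1     # e.g. (position, velocity) -> impulse (assumed)

def make_actor():
    """Deterministic policy: maps a state to a continuous action in [-1, 1]."""
    s = tf.keras.Input(shape=(state_dim,))
    x = layers.Dense(64, activation='relu')(s)
    x = layers.Dense(64, activation='relu')(x)
    a = layers.Dense(action_dim, activation='tanh')(x)   # bounded action
    return tf.keras.Model(s, a)

def make_critic():
    """Action-value function: maps a (state, action) pair to an expected return."""
    s = tf.keras.Input(shape=(state_dim,))
    a = tf.keras.Input(shape=(action_dim,))
    x = layers.Concatenate()([s, a])
    x = layers.Dense(64, activation='relu')(x)
    x = layers.Dense(64, activation='relu')(x)
    q = layers.Dense(1)(x)
    return tf.keras.Model([s, a], q)

actor, critic = make_actor(), make_critic()
```

DDPG pairs these networks with target copies and a replay buffer, which are omitted here for brevity.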
The state variable included the position and velocity of the front cylinder, while actions involved applying impulses to adjust its position. Rewards were designed to minimize velocity fluctuations in the wake, with penalties for actions that moved the cylinder out of bounds. The DRL process involved initializing the state, exploring actions, and updating policies based on feedback.
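A minimal sketch of that reward logic might look as follows; the bounds, penalty magnitude, and use of a standard deviation over sampled wake velocities are illustrative assumptions, since the paper's exact reward formula is not reproduced here.

```python
# Reward sketch: favor low wake-velocity fluctuation, penalize leaving the
# allowed region. Bounds and penalty value are assumed for illustration.
import numpy as np

X_MIN, X_MAX = -2.0, 2.0          # allowed range of the front cylinder (assumed)
OUT_OF_BOUNDS_PENALTY = -10.0     # assumed penalty magnitude

def reward(wake_velocity_probes, new_position):
    """Higher reward when the sampled wake velocities fluctuate less."""
    if not (X_MIN <= new_position <= X_MAX):
        return OUT_OF_BOUNDS_PENALTY
    fluctuation = np.std(wake_velocity_probes)   # measure of wake unsteadiness
    return -fluctuation
```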
A convolutional neural network (CNN) was used as a surrogate model to predict flow fields and address the computational cost of the flow simulations. The CNN architecture, comprising multiple convolutional and dense layers, was designed to detect flow features and serve as a fast replacement for the time-consuming LBM simulations.
The input consisted of flow obstacle shapes, and the output was a reward metric reflecting flow fluctuations. Training used a dataset of 1248 scenarios, with 10% reserved for testing. The researchers integrated this CNN model into the DRL framework to enhance training efficiency.
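A surrogate of this kind could be sketched roughly as below, with an obstacle-shape map as input and a scalar reward metric as output. The grid resolution, layer sizes, and training settings are assumptions, and the random arrays only stand in for the 1248 simulated scenarios and the 90/10 split.

```python
# CNN surrogate sketch: obstacle-shape map in, scalar reward metric out.
# Architecture and training settings are illustrative assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def make_surrogate(grid_shape=(128, 128, 1)):
    model = tf.keras.Sequential([
        tf.keras.Input(shape=grid_shape),
        layers.Conv2D(16, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(1),                      # predicted reward metric
    ])
    model.compile(optimizer='adam', loss='mse')
    return model

# Placeholder data: 1248 obstacle-shape maps with their simulated reward metrics.
x = np.random.rand(1248, 128, 128, 1).astype('float32')
y = np.random.rand(1248, 1).astype('float32')

surrogate = make_surrogate()
surrogate.fit(x, y, validation_split=0.1, epochs=5, batch_size=32)
```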
The CNN-based model effectively predicted flow field characteristics, accelerating the DRL training process by replacing LBM simulations. The model’s architecture was optimized for handling local flow features, and it was trained on a comprehensive dataset to ensure accuracy.
The combined DRL-CNN framework included periodic updates from LBM simulations to maintain precision, improving the overall efficiency of the RL process and enabling faster, more effective training.
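Reduced to its scheduling logic, that two-way coupling can be sketched as below: most episodes query the fast CNN surrogate, while the full LBM simulation is run periodically and its result is used to re-anchor the surrogate. Every function here is a hypothetical stub, and the update frequency and toy reward shapes are assumptions.

```python
# Two-way coupling schedule (schematic; all functions are hypothetical stubs).
import random

LBM_EVERY = 50                                 # assumed high-fidelity frequency

def lbm_reward(position):                      # stub for a full LBM episode
    return -abs(position - 0.3) + random.gauss(0.0, 0.01)

def cnn_reward(position):                      # stub for the CNN surrogate
    return -abs(position - 0.3)

def refit_surrogate(position, true_reward):    # stub for the surrogate update
    pass

position = 0.0
for episode in range(2000):
    action = random.uniform(-0.1, 0.1)         # stand-in for the DDPG policy
    position += action
    if episode % LBM_EVERY == 0:
        reward = lbm_reward(position)          # ground-truth reward from LBM
        refit_surrogate(position, reward)      # keep the surrogate accurate
    else:
        reward = cnn_reward(position)          # cheap surrogate reward
    # ...the DDPG update with (state, action, reward) would follow here...
```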
Training Efficiency Boosted
The training results for the one-dimensional case involved 2000 episodes, tracking the average reward per episode (ep-reward) to assess the learning progress. The ep-reward initially showed significant fluctuations in the early stages, reflecting the agent's exploration and frequent boundary penalties.
As training progressed, the ep-reward improved, indicating that the agent was reinforcing its learned actions and reducing boundary-related penalties, though it continued to explore different strategies. By the later stages, the agent achieved a stable ep-reward, demonstrating effective training and optimal positioning of the front cylinder, as evidenced by high rewards and improved flow stability.
Integrating a CNN with the DRL algorithm significantly enhanced training efficiency and accuracy. The CNN-based model, designed to predict flow features and quantities, reduced the computational burden of LBM simulations.
The CNN model effectively learned to predict flow field characteristics, with accuracy improving over training epochs. Integrating this model into the DRL framework expedited the training process, leading to a significant reduction in training time and improved flow prediction accuracy.
The combined DRL and CNN approach notably improved training efficiency and flow feature detection. The DRL model's training time was reduced to 10.5% of the original with the CNN surrogate. This integration allowed the model to handle more complex scenarios and perform better. The two-way coupling of DRL with the CNN-based surrogate model proved to be a highly efficient method for optimizing flow control and reducing wake flow fluctuations.
Conclusion
To sum up, this study introduced a two-way coupling approach integrating a CNN-based flow feature model with DRL, enhancing efficiency and accuracy. The DRL method successfully stabilized wake flow around square cylinders.
At the same time, the CNN model effectively served as a surrogate, reducing training time to 10.5% of the original and improving prediction accuracy by 31.4%. The method showed promise for flow control problems and for potential applications in ocean engineering and aerodynamics. Future research could explore super-resolution techniques to expand its applicability.