In an article recently published in the journal npj Robotics, researchers proposed an online model update algorithm that uses a self-attention mechanism embedded in neural networks and can run directly on real-world robot systems for risk-sensitive robot control.
Background
Dynamics and kinematics models of robots are crucial for precise control because they enable stable and effective task completion. Many robot control schemes, such as prioritized control and motion optimization, depend on these models, so constructing accurate models is central to conventional control. However, deriving the dynamic properties of each component is typically tedious and introduces modeling errors.
Model-based reinforcement learning (RL) methods are well suited to robotics applications such as robot control because they use data efficiently and avoid much of the computational burden of iterative policy evaluation and improvement. Despite these advantages, control policies generated by model-based methods can still produce undesired behaviors or unexpected motions during policy learning.
Applying RL directly to real-world robots, without first training in a simulated environment, therefore poses a significant risk. Approaches that make RL usable in real-world applications, such as uncertainty-based modeling and simulation-to-real (sim2real) transfer, still need to be improved to reduce this risk.
For instance, a Kullback–Leibler (KL) divergence term can be added to the optimization objective to limit unexpected changes in the desired trajectory. Similarly, a Gaussian process-based model can quantify the degree of uncertainty during model learning, which helps determine the controller's robustness.
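The article does not give the exact form of the KL-regularized objective. As a minimal sketch, assume the desired trajectory is represented by a diagonal Gaussian (a mean and a variance per dimension) and that a hypothetical weight `beta` scales a KL penalty measured against the previous trajectory distribution. The function names are illustrative, not taken from any cited work.

```python
import math

def kl_diag_gaussian(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for diagonal Gaussians given per-dimension means and variances."""
    kl = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        # Per-dimension term: 0.5 * (log(vq / vp) + (vp + (mp - mq)^2) / vq - 1)
        kl += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return kl

def regularized_cost(task_cost, mu_new, var_new, mu_old, var_old, beta=1.0):
    """Task cost plus a KL penalty (hypothetical weight beta) that discourages
    large jumps away from the previous trajectory distribution."""
    return task_cost + beta * kl_diag_gaussian(mu_new, var_new, mu_old, var_old)

# Example: shifting a 1-D mean by one unit costs KL = 0.5, scaled by beta = 0.5
cost = regularized_cost(2.0, [0.0], [1.0], [1.0], [1.0], beta=0.5)
```

A larger `beta` keeps each update closer to the previous trajectory; `beta = 0` recovers the unregularized task cost.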
The proposed approach
In this study, researchers proposed an online model update algorithm that can run directly on real-world robot systems for risk-sensitive robot control. The study aimed to demonstrate the use of a self-attention mechanism in model learning for real-world robotics applications without relying on simulation. Robot motion behavior was represented by two model types: a dynamics model and a kinematics model.
In total, the online model identification algorithm used four neural networks: two that model the kinematics and the dynamics, and two self-attention networks, one for each of these models. The approximated model includes redundant self-attention paths to the time-independent dynamics and kinematics models, which allows abnormalities to be detected from the trace values of the self-attention matrices. This design reduces randomness during exploration and allows detected perturbations to be rejected while the model is being updated.
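The trace-based abnormality check can be illustrated with a short NumPy sketch. This is not the authors' code: it assumes scaled dot-product attention over a sliding window of `T` time steps, normalizes the trace by `T` so it reads as the average weight each step places on itself, and flags a window with a hypothetical z-score rule (more than `k` standard deviations from statistics gathered on nominal data). The projection matrices are random placeholders rather than trained parameters.

```python
import numpy as np

def softmax_rows(x):
    """Numerically stable softmax applied to each row."""
    z = x - x.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def attention_matrix(window, w_q, w_k):
    """Scaled dot-product self-attention weights for a (T, d_in) window; rows sum to 1."""
    q = window @ w_q
    k = window @ w_k
    return softmax_rows(q @ k.T / np.sqrt(q.shape[1]))

def normalized_trace(attn):
    """Average weight each time step assigns to itself (1.0 = fully self-referential)."""
    return float(np.trace(attn)) / attn.shape[0]

def is_abnormal(trace_value, baseline_mean, baseline_std, k=3.0):
    """Hypothetical rule: flag a window whose trace is more than k std devs from nominal."""
    return abs(trace_value - baseline_mean) > k * baseline_std

# Placeholder (untrained) projections and nominal data used to build a baseline
rng = np.random.default_rng(0)
d_in, d_att, T = 4, 8, 10
w_q = rng.normal(size=(d_in, d_att))
w_k = rng.normal(size=(d_in, d_att))
nominal = [normalized_trace(attention_matrix(rng.normal(size=(T, d_in)), w_q, w_k))
           for _ in range(50)]
mu, sigma = float(np.mean(nominal)), float(np.std(nominal))
```

At run time, each new window's `normalized_trace` would be compared against `mu` and `sigma`; the threshold rule and the value of `k` are design choices assumed here for illustration.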
The algorithm embedded the self-attention mechanism in the neural networks for both the dynamics and the kinematics models of the target system. In both models, the self-attention layer addresses problems that arise when RL methods are applied directly in a real-world environment.
Specifically, the kinematics model's self-attention layer determines the exploration region by adjusting the cost function according to the robot's range of motion, while the dynamics model's self-attention layer detects possible perturbations during learning and manages the quality of the dataset.
In the kinematics model, the self-attention layer can make behavior more predictable and improve control performance. Although the kinematics model is not strictly time-dependent, building a time-based self-attention chain allows closer analysis of data with low self-referential rates. The kinematics model's self-attention matrix can also be used to set the scaling of the desired trajectory.
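The article does not specify the cost function or how the trajectory scale is computed. Purely as an illustration, the sketch below assumes a box-shaped exploration region around a center point, a quadratic penalty for leaving it, and a scalar trajectory scale. The parameters `radius`, `w`, and `s` are hypothetical stand-ins for quantities that the kinematics self-attention layer would set.

```python
import numpy as np

def scale_trajectory(x_des, center, s):
    """Shrink (s < 1) or expand (s > 1) a desired trajectory about a center point."""
    return center + s * (x_des - center)

def exploration_cost(x, x_des, center, radius, w=10.0):
    """Tracking error plus a quadratic penalty for leaving a box-shaped exploration region.

    radius (half-width of the box) and w (penalty weight) are hypothetical knobs.
    """
    tracking = np.sum((x - x_des) ** 2)
    violation = np.maximum(0.0, np.abs(x - center) - radius)  # zero inside the box
    return tracking + w * np.sum(violation ** 2)

center = np.zeros(2)
x_des = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0]])  # waypoints of a desired path
x_des_small = scale_trajectory(x_des, center, s=0.5)      # more cautious path while exploring
cost_inside = exploration_cost(np.array([0.5, 0.0]), np.array([0.0, 0.0]), center, radius=1.0)
cost_outside = exploration_cost(np.array([1.5, 0.0]), np.array([1.5, 0.0]), center, radius=1.0)
```

Inside the region only the tracking error counts (`cost_inside` is 0.25), while a point that tracks perfectly but leaves the region is still penalized (`cost_outside` is 2.5).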
In the dynamics model, a self-attention chain enables the detection of perturbations caused by unintended external forces acting on the robot. The dynamics model learned with the proposed approach captures the relationship between the robot's configuration states and control inputs while excluding the effects of external forces.
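Rejecting perturbed data before a model update can be sketched as follows. Assumptions: a linear least-squares model stands in for the authors' neural dynamics model, the perturbation flags are assumed to come from an attention-based detector, and the synthetic data (true dynamics s' = 0.9s + 0.5a, plus a constant offset of 2.0 on one window to mimic an external force) is made up for illustration. The helper names are hypothetical.

```python
import numpy as np

def fit_linear_dynamics(states, actions, next_states):
    """Least-squares fit of next_state ≈ [state, action] @ theta (stand-in for a neural model)."""
    x = np.hstack([states, actions])
    theta, *_ = np.linalg.lstsq(x, next_states, rcond=None)
    return theta

def update_with_gating(windows, flags):
    """Drop windows flagged as perturbed, then refit the model on the remaining data."""
    kept = [w for w, perturbed in zip(windows, flags) if not perturbed]
    states = np.vstack([s for s, _, _ in kept])
    actions = np.vstack([a for _, a, _ in kept])
    next_states = np.vstack([s_next for _, _, s_next in kept])
    return fit_linear_dynamics(states, actions, next_states), len(windows) - len(kept)

# Synthetic 1-D data: true dynamics s' = 0.9 s + 0.5 a (made up for illustration)
rng = np.random.default_rng(1)
windows = []
for i in range(5):
    s = rng.normal(size=(20, 1))
    a = rng.normal(size=(20, 1))
    s_next = 0.9 * s + 0.5 * a
    if i == 2:
        s_next = s_next + 2.0  # constant offset mimicking an unmodeled external force
    windows.append((s, a, s_next))

flags = [False, False, True, False, False]  # assumed output of a perturbation detector
theta, n_rejected = update_with_gating(windows, flags)
theta_all, _ = update_with_gating(windows, [False] * 5)
```

With the flagged window removed, `theta` recovers the coefficients 0.9 and 0.5; `theta_all`, fitted with the perturbed window included, is biased by the injected offset, which is the dataset-quality problem that gating avoids.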
To implement the self-attention mechanism, the input time series was fed to an encoder network connected in series to a decoder network, both built from feedforward neural networks (FNNs), with the self-attention layer placed between the encoder and the decoder.
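A forward pass through this encoder, self-attention, and decoder stack might look like the following NumPy sketch. Layer sizes, tanh activations, random initialization, and the use of scaled dot-product attention are assumptions for illustration; the article only states that FNN encoder and decoder networks surround a self-attention layer.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random weights and zero biases for a feedforward network with the given layer sizes."""
    return [(rng.normal(scale=1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(x, layers):
    """Feedforward network: tanh on hidden layers, linear output layer."""
    for w, b in layers[:-1]:
        x = np.tanh(x @ w + b)
    w, b = layers[-1]
    return x @ w + b

def self_attention(h, w_q, w_k, w_v):
    """Scaled dot-product self-attention across the time axis of h (shape T x d_h)."""
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)        # each row sums to 1
    return attn @ v, attn

def forward(x_seq, params):
    """Encoder FNN -> self-attention over time -> decoder FNN."""
    h = mlp(x_seq, params["encoder"])             # per-time-step features
    z, attn = self_attention(h, *params["attn"])  # mix information across time steps
    return mlp(z, params["decoder"]), attn        # per-time-step predictions

# Hypothetical sizes: 6 inputs per step (e.g. state + action), 4 outputs per step
rng = np.random.default_rng(0)
T, d_in, d_h, d_att, d_out = 16, 6, 32, 16, 4
scale = 1.0 / np.sqrt(d_h)
params = {
    "encoder": init_mlp(rng, [d_in, d_h, d_h]),
    "attn": (rng.normal(scale=scale, size=(d_h, d_att)),
             rng.normal(scale=scale, size=(d_h, d_att)),
             rng.normal(scale=scale, size=(d_h, d_h))),
    "decoder": init_mlp(rng, [d_h, d_h, d_out]),
}
y, attn = forward(rng.normal(size=(T, d_in)), params)
```

The returned `attn` is a T x T matrix of attention weights, the kind of matrix whose trace the article uses for abnormality detection.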
Experimental evaluation and validation
Researchers validated the proposed method in simulation and on real-world robotic systems in three application scenarios: gait generation for a legged robot, kinesthetic teaching and behavior cloning with an industrial robotic arm, and trajectory tracking with a soft robotic manipulator.
The PyBullet physics simulator was used to demonstrate the effectiveness of the proposed approach in simulation. The real-world demonstrations were achieved without any simulation-based pretraining or prior knowledge of the models, indicating that the method is broadly applicable across different robotics applications.
In the soft robotic manipulator trajectory-tracking scenario, the results showed that the proposed algorithm could serve as an accelerator for RL algorithms tackling high-level tasks such as interaction, grasping, or other model-based RL problems.
In the robotic arm scenario, the proposed algorithm successfully performed complex online trajectory tracking with assistance from a human expert. In the quadruped gait-training scenario, the proposed approach achieved the desired locomotion in only three minutes.
Journal reference:
- Kim, D., Lee, S., Hong, T. H., & Park, Y. (2023). Exploration-based model learning with self-attention for risk-sensitive robot control. npj Robotics, 1(1), 1-15. https://doi.org/10.1038/s44182-023-00006-5, https://www.nature.com/articles/s44182-023-00006-5