In a paper published in the journal Actuators, researchers introduced a deep reinforcement learning (DRL) model that combined pushing and grasping actions to improve robotic manipulation in cluttered environments.
The model used two convolutional neural networks (CNNs), push-net and grasp-net, to predict actions from heightmap images. It achieved a grasp success rate of 87%, significantly outperforming traditional grasp-only methods, and it generalized well across varied scenarios, showing potential for real-world applications.
Background
Past work in robotic manipulation highlighted advancements in grasping techniques, especially in cluttered environments, but significant gaps remained. Earlier approaches often struggled in extreme clutter, which requires tighter integration of non-prehensile actions such as pushing, and they had difficulty generalizing robustly across diverse object shapes and sizes. The challenge of transferring learned behaviors from simulation to real-world scenarios also persisted.
Simulated Robotic Manipulation
The section details a simulation setup involving a Universal Robots 5 (UR5) arm equipped with a two-finger parallel gripper and an Intel RealSense D435 depth camera. The team used this arm to test the proposed manipulation model in various environments, including cluttered and well-ordered configurations with a mix of known and novel objects.
The hardware comprised the UR5 arm with six degrees of freedom, the parallel-jaw gripper, and a red-green-blue-depth (RGB-D) camera to capture scene data, processed on a system with an Intel Core i7 processor.
The simulation was conducted using CoppeliaSim and the PyTorch framework. The environment was controlled, and the robot's tasks included identifying, planning, and executing grasps. The UR5's mathematical model was based on Denavit-Hartenberg (D-H) parameters for forward kinematics, allowing precise end-effector control. Data collection involved capturing RGB-D images and converting them into heightmaps for processing by the densely connected convolutional network (DenseNet-121) model.
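The forward-kinematics step can be illustrated with a short computation. The sketch below uses nominal, publicly available D-H parameters for the UR5 and the standard D-H transform; the exact values and conventions used in the paper may differ.

```python
import numpy as np

# Nominal UR5 D-H parameters (approximate published values; the paper's
# exact parameterization may differ).
A     = [0.0, -0.425, -0.39225, 0.0, 0.0, 0.0]          # link lengths (m)
D     = [0.089159, 0.0, 0.0, 0.10915, 0.09465, 0.0823]  # link offsets (m)
ALPHA = [np.pi / 2, 0.0, 0.0, np.pi / 2, -np.pi / 2, 0.0]  # link twists (rad)

def dh_transform(theta, d, a, alpha):
    """Homogeneous transform for one joint in the standard D-H convention."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(joint_angles):
    """Compose the six joint transforms to obtain the end-effector pose."""
    T = np.eye(4)
    for theta, d, a, alpha in zip(joint_angles, D, A, ALPHA):
        T = T @ dh_transform(theta, d, a, alpha)
    return T  # 4x4 pose of the end effector in the base frame

# Example: end-effector position at the zero joint configuration.
print(forward_kinematics([0.0] * 6)[:3, 3])
```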
DRL was employed to optimize the robot's actions, using a Markov decision process to model state transitions and rewards. The agent learned through Q-learning, with experience replay and a target network to enhance training efficiency. The DenseNet-121 architecture processed heightmaps to predict Q-values for pushing and grasping actions, guiding the robot in its decision-making. The learning process was designed to maximize long-term rewards, with the agent selecting actions based on the highest predicted Q-values from the model.
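A minimal PyTorch sketch of this training loop is shown below. The module and variable names (QNet, push_net, grasp_net, select_action, train_step) are hypothetical, and the published push-net/grasp-net architecture is more elaborate (e.g., rotated heightmaps and fully convolutional outputs); the sketch only illustrates Q-value prediction on a DenseNet-121 trunk, greedy action selection, experience replay, and a target-network update.

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torchvision

class QNet(nn.Module):
    """Hypothetical pixel-wise Q-value head on a DenseNet-121 trunk."""
    def __init__(self):
        super().__init__()
        self.trunk = torchvision.models.densenet121(weights=None).features
        self.head = nn.Conv2d(1024, 1, kernel_size=1)   # per-pixel Q-value

    def forward(self, heightmap):                # (B, 3, H, W) heightmap tensor
        return self.head(self.trunk(heightmap))  # (B, 1, h, w) Q-value map

# Separate networks for the two action primitives, plus target copies.
push_net, grasp_net = QNet(), QNet()
push_target, grasp_target = QNet(), QNet()
push_target.load_state_dict(push_net.state_dict())
grasp_target.load_state_dict(grasp_net.state_dict())

replay = deque(maxlen=10_000)   # experience replay buffer
gamma = 0.5                     # discount factor (illustrative value)
opt = torch.optim.SGD(
    list(push_net.parameters()) + list(grasp_net.parameters()),
    lr=1e-4, momentum=0.9)

def select_action(heightmap):
    """Greedy policy: pick the primitive and pixel with the highest Q-value."""
    with torch.no_grad():
        q_push, q_grasp = push_net(heightmap), grasp_net(heightmap)
    if q_push.max() > q_grasp.max():
        return "push", q_push.argmax()
    return "grasp", q_grasp.argmax()

def train_step(batch_size=8):
    """One Q-learning update from replayed transitions (sketch)."""
    if len(replay) < batch_size:
        return
    for state, primitive, idx, reward, next_state in random.sample(replay, batch_size):
        net = push_net if primitive == "push" else grasp_net
        with torch.no_grad():
            # Bootstrap from the target networks for stability.
            next_q = max(push_target(next_state).max(), grasp_target(next_state).max())
            target = reward + gamma * next_q
        q = net(state).flatten()[idx]            # Q-value of the executed pixel
        loss = nn.functional.smooth_l1_loss(q, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

The use of SGD with momentum here mirrors the comparison reported later in the article, where the proposed model is contrasted with a variant trained by SGD without momentum.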
Model Performance Summary
The results section presents the findings from the proposed model's training and testing sessions. After training the model with self-supervised DRL, the team evaluated it in test scenarios featuring different levels of clutter and novel objects. Performance was assessed in cluttered environments, in challenging well-ordered configurations, and on novel objects.
The proposed model demonstrated significant improvements in cluttered environments over traditional grasping-only policies. While conventional approaches achieved a grasping success rate of 60%, the new model reached 87%, thanks to the effective integration of pushing and grasping actions. The model's approach involved pushing actions to create space around the target object, facilitating a better grasp.
The introduction of rewards for successful pushes further enhanced grasping efficiency. Comparison graphs indicated that the proposed model outperformed other methods, including those without rewards for pushing or using stochastic gradient descent (SGD) without momentum.
Testing in environments with randomly arranged objects showed that the model could effectively handle dense clutter. With an increased number of objects, the model achieved a grasp success rate of 50.5% and a grasp-to-push ratio of 77.1% in highly cluttered scenes. This performance underscores the model's ability to generalize to various configurations, demonstrating robustness in managing complex environments.
The model excelled in challenging, well-ordered configurations, where objects were stacked closely and placed in difficult orientations. Its performance in these test cases, including high grasp success and completion rates, further validated its capability. The model also proved its generalization ability by maintaining high performance on novel objects not seen during training, including items such as bottles, bananas, and screwdrivers, highlighting its versatility and robustness for real-world applications.
Conclusion
In summary, the proposed model improved grasping in cluttered environments by integrating pushing and grasping actions, achieving an 87% success rate. This approach surpassed grasping-only policies, no-reward-for-pushing policies, and SGD-without-momentum strategies by 27%, 16%, and 8%, respectively.
The model demonstrated robustness against lighting variations through RGB-D camera data and PyTorch data augmentation. Future work will incorporate domain randomization, transfer learning, and robust reward structures to enhance real-world applicability. Additionally, further improvements will be essential to address object properties like fragility and deformability.
Journal reference:
- Shiferaw, B. A., Agidew, T. F., et al. (2024). Synergistic Pushing and Grasping for Enhanced Robotic Manipulation Using Deep Reinforcement Learning. Actuators, 13(8), 316. DOI: 10.3390/act13080316, https://www.mdpi.com/2076-0825/13/8/316