Multi-fingered robots equipped with tactile sensing enhance precision and dexterity in object manipulation. In a recent submission to the arXiv* server, researchers introduce the Tactile Adaptation from Visual Incentives (TAVI) framework, which optimizes dexterous actions using vision-based rewards.
*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.
Background
In the context of human development, dexterity has historically played a pivotal role, enabling effective tool creation and use. Although two-fingered grippers have received considerable attention in robotics, they lack the fingertip-level control required for dexterous manipulation. Multi-fingered hands provide these capabilities and broaden the range of achievable tasks, but they also introduce higher-dimensional action spaces. In tasks involving visual occlusion, effective use of tactile data becomes crucial, an aspect that has received limited attention in dexterity research.
Various frameworks exist for training dexterous policies, including model-based control and simulation-to-reality transfer (sim2real). Yet simulating rich tactile sensing remains a challenge. Consequently, prior work on multi-fingered dexterity often relies solely on visual feedback or binary touch signals.
TAVI framework
In the context of dexterous manipulation and tactile sensing, researchers have long sought to control multi-fingered robots. Recent approaches learn policies in simulation and transfer them to the real world, but such simulations typically lack fine-grained touch sensing. Physics-based grasping models have also been explored, but they are susceptible to sensor and controller noise. Tactile sensors such as GelSight and skin-like sensors mitigate this issue by providing high-resolution tactile data for dexterous policy learning.
Researchers proposed TAVI, a novel framework for tactile-based dexterity. Requiring only one successful demonstration, TAVI generalizes to new object configurations and learns to correct behaviors from failures. It accomplishes this by continuously adjusting the dexterity policy, using an optimal transport (OT) technique to maximize the match between the sensory observations produced by the policy and those from the human demonstration. Unlike traditional inverse reinforcement learning (IRL), TAVI computes rewards from visual observations exclusively, sidestepping the spatially local nature of tactile signals, and a contrastive learning objective strengthens these visual rewards.
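The OT-based reward matching can be illustrated with a short, self-contained sketch. The entropic Sinkhorn solver, cosine-distance cost, and embedding shapes below are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def sinkhorn_plan(cost, reg=0.05, n_iters=100):
    """Entropic-regularized OT plan between two uniform marginals (Sinkhorn iterations)."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-cost / reg)                       # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):                      # alternating marginal projections
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]            # transport plan

def ot_reward(policy_emb, demo_emb):
    """Negative transport cost between policy and demonstration visual embeddings."""
    p = policy_emb / np.linalg.norm(policy_emb, axis=1, keepdims=True)
    d = demo_emb / np.linalg.norm(demo_emb, axis=1, keepdims=True)
    cost = 1.0 - p @ d.T                          # cosine-distance cost matrix
    plan = sinkhorn_plan(cost)
    return -np.sum(plan * cost)                   # higher reward = closer match
```

In this sketch, a rollout whose visual embeddings closely track the demonstration receives a reward near zero, while divergent behavior is penalized in proportion to the transport cost.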
The TAVI process consists of several essential steps:
Robot Setup and Expert Data Collection: TAVI employs a robot system comprising a 6-DOF Jaco arm and a 16-DOF Allegro hand, equipped with 15 XELA uSkin tactile sensors and an RGB camera for visual data capture. Data collection utilizes the teaching dexterity framework HOLO-DEX, which synchronizes arm and hand states, tactile readings, and camera images using timestamps.
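A minimal sketch of timestamp-based alignment across asynchronous streams is shown below; the stream layout and the nearest-timestamp rule are assumptions for illustration, not the HOLO-DEX API.

```python
import numpy as np

def align_streams(reference_ts, streams):
    """For each reference (camera) timestamp, pick the nearest-in-time sample from each stream.

    reference_ts: 1-D array of timestamps in seconds
    streams: dict mapping name -> (timestamps, samples) for arm, hand, and tactile data
    """
    aligned = {}
    for name, (ts, samples) in streams.items():
        idx = np.searchsorted(ts, reference_ts)            # insertion index per reference time
        idx = np.clip(idx, 1, len(ts) - 1)
        pick_left = (reference_ts - ts[idx - 1]) < (ts[idx] - reference_ts)
        nearest = np.where(pick_left, idx - 1, idx)        # choose the closer neighbor
        aligned[name] = samples[nearest]
    return aligned
```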
Representation Learning for Vision and Tactile Observations: To reduce the reliance on explicit state estimation, TAVI employs self-supervised learning to map high-dimensional observations to a lower-dimensional latent state. It utilizes an image encoder trained with noise-contrastive estimation (InfoNCE) loss and a change loss to predict state differences between nearby observations. The tactile encoder is pre-trained on tactile-based play data.
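A hedged PyTorch sketch of the two objectives described above follows; the temperature, the predictor module, and the use of robot-state differences as the "change" target are illustrative assumptions rather than the paper's exact losses.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(anchor, positive, temperature=0.1):
    """InfoNCE: each anchor embedding should match its own positive against the batch."""
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.T / temperature                      # (B, B) similarity matrix
    labels = torch.arange(a.size(0), device=a.device)   # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

def change_loss(predictor, emb_t, emb_t_next, state_t, state_t_next):
    """Predict the state difference between nearby observations from their embeddings."""
    pred_delta = predictor(torch.cat([emb_t, emb_t_next], dim=1))
    true_delta = state_t_next - state_t
    return F.mse_loss(pred_delta, true_delta)
```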
Policy Learning through Online Imitation: TAVI utilizes the Fast Imitation of Skills from Humans (FISH) imitation algorithm on a single demonstrated trajectory, with the base policy being an open-loop rollout of the expert demonstration. Visual information is used to calculate the OT reward, while tactile information is excluded from the reward to avoid suboptimal behaviors. The exploration strategy enables selective learning in the action space using additive Ornstein-Uhlenbeck (OU) noise.
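A minimal sketch of exploration around the open-loop base policy is given below; the residual/offset formulation and the per-dimension mask are assumptions used to illustrate selective exploration, not the exact FISH update.

```python
import numpy as np

class MaskedOUNoise:
    """Ornstein-Uhlenbeck noise applied only to the action dimensions being learned."""
    def __init__(self, action_dim, mask, theta=0.15, sigma=0.2, dt=1e-2):
        self.mask = np.asarray(mask, dtype=float)   # 1 = explore this dimension, 0 = freeze it
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.state = np.zeros(action_dim)

    def sample(self):
        # Discretized OU process: mean-reverting noise with temporal correlation.
        dx = (-self.theta * self.state * self.dt
              + self.sigma * np.sqrt(self.dt) * np.random.randn(*self.state.shape))
        self.state = self.state + dx
        return self.state * self.mask

# At each step (hypothetical composition): start from the expert's open-loop action,
# add a learned offset, then perturb only the selected dimensions for exploration.
# action = base_action[t] + offset_policy(obs) + noise.sample()
```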
Additionally, TAVI matches the last 10 frames of the robot trajectory to the last frame of the expert trajectory when calculating rewards, allowing the model to learn task completion without requiring immediate feedback at every step. Because learning can be enabled or disabled on subsets of the action space, OU-noise exploration is applied only to the dimensions relevant to the task.
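Building on the illustrative OT reward sketch above, this frame selection amounts to restricting the reward computation to the trailing frames; the 10-frame window comes from the description above, while `ot_reward` is the hypothetical function defined earlier.

```python
# Only the last 10 robot frames are compared against the final expert frame,
# so the reward reflects how the episode ends rather than every intermediate step.
reward = ot_reward(robot_embeddings[-10:], expert_embeddings[-1:])
```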
Experiments and results
Six dexterous tasks are explored, including peg insertion, sponge flipping, eraser turning, bowl unstacking, plier picking, and mint opening. Each task involves precise control and manipulation of objects, focusing on different fingers and joints. TAVI's performance is compared to several baselines, including dexterity from touch (T-DEX), behavior transformers (BC-BeT), Tactile Only, Tactile and Image Reward, and No Tactile Information. Evaluation involves robot performance and visual representation quality assessment.
TAVI undergoes extensive experimental evaluation to address several key questions:
- TAVI significantly outperforms the baselines in terms of task success rates.
- While some baselines struggle with specific tasks or fail to adapt when objects move, TAVI demonstrates robust performance across tasks, highlighting the importance of visual feedback.
- Visual representations are evaluated using different encoders. TAVI's encoder, combined with contrastive and joint-prediction loss, achieves superior results.
- Experiments reveal that including all frames in the reward calculation can lead to suboptimal results, as the policy may converge to local minima. Selectively matching frames in the reward calculation improves performance.
- TAVI's ability to generalize to unseen objects is assessed. It demonstrates the capacity to adapt to new objects in some cases but faces challenges when object shapes or properties change substantially.
- TAVI is evaluated on sequencing sub-policies for long-horizon tasks and remains robust when chaining sub-tasks with different objectives, enabling longer-horizon behaviors.
- TAVI's robustness to changes in camera view is explored. It performs well with small variations but experiences a drop in performance with larger variations, highlighting the need for consistent representations across multiple views.
Conclusion
In summary, the current study introduces TAVI, which enhances dexterous manipulation using tactile feedback and optimal-transport-based imitation learning. It outperforms vision-only approaches but has limitations. First, the observational representation lacks historical context. Second, performance depends on the camera view. Third, the exploration mechanism currently requires manual specification and would need to be automated for broader applications. These areas offer promising directions for extending TAVI.
Journal reference:
- Preliminary scientific report.
Irmak Guzey, Yinlong Dai, Ben Evans, Soumith Chintala, and Lerrel Pinto (2023). See to Touch: Learning Tactile Dexterity through Visual Incentives. arXiv. DOI: 10.48550/arXiv.2309.12300, https://arxiv.org/abs/2309.12300