DexGrasp-Diffusion: Diffusion-based Unified Functional Grasp Synthesis Pipeline for Multi-Dexterous Robotic Hands

Zhengshen Zhang¹, Lei Zhou¹, Chenchen Liu¹, Zhiyang Liu¹, Chengran Yuan¹, Sheng Guo¹, Ruiteng Zhao¹, Marcelo H. Ang Jr.¹, Francis EH Tay¹

¹National University of Singapore

Abstract

Human grasping is versatile and adaptable, and this has driven progress in dexterous robotic manipulation. While significant strides have been made in dexterous grasp generation, current research is shifting towards manipulating objects while preserving their function, with an emphasis on synthesizing functional grasps that follow desired affordance instructions. This paper addresses the challenge of synthesizing functional grasps tailored to diverse dexterous robotic hands by proposing DexGrasp-Diffusion, an end-to-end modularized diffusion-based pipeline. DexGrasp-Diffusion integrates MultiHandDiffuser, a novel unified data-driven diffusion model for grasp estimation across multiple dexterous hands, with DexDiscriminator, which employs a Physics Discriminator and a Functional Discriminator in an open-vocabulary setting to filter physically plausible functional grasps based on object affordances. Experiments on the MultiDex dataset show that MultiHandDiffuser outperforms the baseline model in success rate, grasp diversity, and collision depth. Moreover, we demonstrate that DexGrasp-Diffusion reliably generates functional grasps for household objects aligned with specific affordance instructions.

Video

Pipeline

Denoising Process

Starting from random initial hand poses, our diffusion model iteratively denoises the hand poses over T steps.
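The iterative denoising described above can be illustrated with a minimal sketch. It assumes a standard DDPM-style ancestral sampler with a linear noise schedule, which the page does not confirm as the sampler used by MultiHandDiffuser. The names `make_schedule`, `dummy_noise_predictor`, `denoise`, and the parameter `pose_dim` are hypothetical, and the noise predictor is a placeholder for a trained, object-conditioned network.

```python
import numpy as np


def make_schedule(T, beta_start=1e-4, beta_end=2e-2):
    """Linear beta schedule (an assumption; other schedules are common)."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def dummy_noise_predictor(x, t, cond):
    """Hypothetical stand-in for a trained network that predicts the noise
    in hand poses x at step t, conditioned on object features cond."""
    return np.zeros_like(x)


def denoise(noise_predictor, cond, pose_dim, T=100, n_samples=4, seed=0):
    """Run T reverse-diffusion steps starting from random hand poses."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(T)

    # Random initial hand poses, one row per candidate grasp.
    x = rng.standard_normal((n_samples, pose_dim))

    for t in range(T - 1, -1, -1):
        eps = noise_predictor(x, t, cond)
        # Posterior mean of the previous, less noisy pose.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Re-inject a small amount of noise on all but the final step.
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean
    return x


poses = denoise(dummy_noise_predictor, cond=None, pose_dim=25, T=10, n_samples=3)
```

Here `pose_dim=25` is an arbitrary illustrative value; the real pose dimension depends on the hand model. Sampling several poses in one batch reflects the idea of generating multiple candidate grasps that later stages can filter.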

Functional Discriminator

Given an open-vocabulary affordance label, the functional discriminator performs point cloud segmentation. Green points mark the regions where the robot can grasp the object without impeding its intended function.
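One way a segmentation like this could be used to filter candidate grasps is sketched below. This is an illustrative assumption, not the method described on the page: the scoring rule (fraction of hand contact points whose nearest object point is labeled graspable), the threshold, and the names `functional_score` and `filter_grasps` are all hypothetical. The segmentation itself is taken as a given boolean mask over the object point cloud.

```python
import numpy as np


def functional_score(contact_points, object_points, graspable_mask):
    """Fraction of contact points whose nearest object point is graspable.

    contact_points: (C, 3) hand contact locations for one grasp.
    object_points:  (N, 3) object point cloud.
    graspable_mask: (N,) boolean mask, True for points in the graspable region.
    """
    # Pairwise distances between contacts and object points, shape (C, N).
    dists = np.linalg.norm(
        contact_points[:, None, :] - object_points[None, :, :], axis=-1
    )
    nearest = dists.argmin(axis=1)
    return float(graspable_mask[nearest].mean())


def filter_grasps(grasp_contacts, object_points, graspable_mask, threshold=0.8):
    """Return indices of grasps whose score meets the (assumed) threshold."""
    return [
        i
        for i, contacts in enumerate(grasp_contacts)
        if functional_score(contacts, object_points, graspable_mask) >= threshold
    ]


# Toy object: four points on a line; only the first two are graspable.
object_points = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0], [3.0, 0, 0]])
graspable_mask = np.array([True, True, False, False])

grasp_a = np.array([[0.1, 0, 0], [0.9, 0, 0]])  # touches the graspable region
grasp_b = np.array([[2.1, 0, 0], [2.9, 0, 0]])  # touches the functional region

kept = filter_grasps([grasp_a, grasp_b], object_points, graspable_mask)
```

In this toy case only the first grasp is kept, because all of its contacts land on points labeled graspable.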