Publications

A collection of my research work.

MiMo-Embodied: X-Embodied Foundation Model Technical Report

Xiaomi Embodied Intelligence Team

Technical Report 2025

The first open-source foundation model unifying embodied AI and autonomous driving.

PDF | Code

Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos

Qixiu Li, Yu Deng, Yaobo Liang, Lin Luo, Lei Zhou, Chengtang Yao, Lingqi Zeng, Zhiyuan Feng, Huizhi Liang, Sicheng Xu, Yizhong Zhang, Xi Chen, Hao Chen, Lily Sun, Dong Chen, Jiaolong Yang, Baining Guo

Under Review 2025

Pretraining VLA models using 1M+ episodes of unlabeled human hand videos.

PDF | Project Page | Code

From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors

Zhengshen Zhang, Hao Li, Yalun Dai, Zhengbang Zhu, Lei Zhou, Chenchen Liu, Dong Wang, Francis E. H. Tay, Sijin Chen, Ziwei Liu, Yuxiao Liu, Xinghang Li, Pan Zhou

Under Review 2025

Injecting 3D spatial tokens into VLA models for better geometric understanding.

PDF | Project Page

UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping

Wenbo Wang, Fangyun Wei, Lei Zhou, Xi Chen, Lin Luo, Xiaohan Yi, Yizhong Zhang, Yaobo Liang, Chang Xu, Yan Lu, Jiaolong Yang, Baining Guo

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025

A Transformer-based network for dexterous robotic grasping using policy distillation.

PDF | Project Page

DexGrasp-Diffusion: Diffusion-based Unified Functional Grasp Synthesis Pipeline for Multi-Dexterous Robotic Hands

Zhengshen Zhang, Lei Zhou, Chenchen Liu, Zhiyang Liu, Sheng Guo, Ruiteng Zhao, Marcelo H. Ang Jr., Francis EH Tay

International Symposium on Robotics Research (ISRR) 2024

End-to-end pipeline for functional grasp synthesis using diffusion models.

PDF | Project Page

You Only Scan Once: A Dynamic Scene Reconstruction Pipeline for 6-DoF Robotic Grasping of Novel Objects

Lei Zhou, Haozhe Wang, Zhengshen Zhang, Zhiyang Liu, Francis EH Tay, Marcelo H. Ang Jr.

IEEE International Conference on Robotics and Automation (ICRA) 2024

Dynamic scene reconstruction using NeRF for 6-DoF robotic grasping.

PDF | Project Page

DR-Pose: A Two-stage Deformation-and-Registration Pipeline for Category-level 6D Object Pose Estimation

Lei Zhou, Zhiyang Liu, Runze Gan, Haozhe Wang, Marcelo H. Ang Jr.

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2023

Enhancing category-level 6D object pose estimation via deformation and registration.

PDF | Project Page | Code