I am a Ph.D. candidate at IRMV Lab, Shanghai Jiao Tong University, advised by Prof. Hesheng Wang.

My research explores how robots can learn from human behavior and understand dynamic 3D environments, with a focus on visual imitation learning, egocentric hand-object interaction prediction, place recognition, and occupancy forecasting. I am particularly interested in extracting transferable manipulation priors from human egocentric videos to alleviate the robot data bottleneck and support robotic manipulation and world modeling.

Hand motion forecasting demo in egocentric views
A demo of hand motion forecasting in egocentric views, combining our proposed Uni-Hand (T-PAMI'26) with V-JEPA 2.

🔥 News

  • May 2026: Selected as an Outstanding Reviewer of IEEE Robotics and Automation Letters (RA-L).
  • May 2026: Uni-Hand has been accepted by T-PAMI.🎉
  • Nov. 2025: MADiff has been accepted by T-PAMI.🎉
  • Jun. 2025: Four papers have been accepted by IROS 2025.🎉
  • Feb. 2025: EfficientOCF has been accepted by CVPR 2025.🎉

📝 Publications

#: Equal contribution, *: Corresponding author.

Learning from Human Videos

Robot Learning from Human Videos survey

Robot Learning from Human Videos: A Survey

Junyi Ma, Erhang Zhang, Haoran Yang, Ditao Li, Chenyang Xu, Guangming Wang, Hesheng Wang*

arXiv, 2026

Uni-Hand

Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views

Junyi Ma, Wentao Bao, Jingyi Xu, Guanzhong Sun, Yu Zheng, Erhang Zhang, Xieyuanli Chen, Hesheng Wang*

IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2026

MADiff

MADiff: Motion-Aware Mamba Diffusion Models for Hand Trajectory Prediction on Egocentric Videos

Junyi Ma#, Xieyuanli Chen#, Wentao Bao, Jingyi Xu, Hesheng Wang*

IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2025

Zero-Shot Temporal Interaction Localization

Zero-Shot Temporal Interaction Localization for Egocentric Videos

Erhang Zhang#, Junyi Ma#, Yin-Dong Zheng, Yixuan Zhou, Hesheng Wang*

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025

EgoLoc

EgoLoc: A Generalizable Solution for Temporal Interaction Localization in Egocentric Videos

Junyi Ma#, Erhang Zhang#, Yin-Dong Zheng, Yuchen Xie, Yixuan Zhou, Hesheng Wang*

arXiv, 2025

Hand-Object Interaction (HOI) Prediction

MMTwin

MMTwin: Novel Diffusion Models for Multimodal 3D Hand Trajectory Prediction

Junyi Ma, Wentao Bao, Jingyi Xu, Guanzhong Sun, Xieyuanli Chen, Hesheng Wang*

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025

Diff-IP2D

Diff-IP2D: Diffusion-Based Hand-Object Interaction Prediction on Egocentric Videos

Junyi Ma, Jingyi Xu, Xieyuanli Chen, Hesheng Wang*

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025

Place Recognition and SLAM

OverlapTransformer

OverlapTransformer: An Efficient and Yaw-Angle-Invariant Transformer Network for LiDAR-Based Place Recognition

Junyi Ma, Jun Zhang, Jintao Xu, Rui Ai, Weihao Gu, Xieyuanli Chen*

IEEE Robotics and Automation Letters (RA-L), 2022; IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022

SeqOT

SeqOT: A Spatial-Temporal Transformer Network for Place Recognition Using Sequential LiDAR Data

Junyi Ma, Xieyuanli Chen, Jingyi Xu, Guangming Xiong*

IEEE Transactions on Industrial Electronics (TIE), 2022

CVTNet

CVTNet: A Cross-View Transformer Network for Place Recognition Using LiDAR Data

Junyi Ma, Guangming Xiong, Jingyi Xu, Xieyuanli Chen*

IEEE Transactions on Industrial Informatics (TII), 2023

LCPR

LCPR: A Multi-Scale Attention-Based LiDAR-Camera Fusion Network for Place Recognition

Zijie Zhou, Jingyi Xu, Guangming Xiong, Junyi Ma*

IEEE Robotics and Automation Letters (RA-L), 2024; IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024

Point Cloud and Occupancy Forecasting

EfficientOCF

Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting

Jingyi Xu, Xieyuanli Chen, Junyi Ma, Jiawei Huang, Jintao Xu, Yue Wang, Ling Pei

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

Cam4DOcc

Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications

Junyi Ma#, Xieyuanli Chen#, Jiawei Huang, Jingyi Xu, Zhen Luo, Jintao Xu, Weihao Gu, Rui Ai, Hesheng Wang*

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

PCPNet

PCPNet: An Efficient and Semantic-Enhanced Transformer Network for Point Cloud Prediction

Zhen Luo, Junyi Ma, Zijie Zhou, Guangming Xiong (Mentorship)

IEEE Robotics and Automation Letters (RA-L), 2023; IEEE International Conference on Robotics and Automation (ICRA), 2024

🏆 Honors and Awards

  • Outstanding Reviewer of IEEE Robotics and Automation Letters (RA-L), 2026.
  • Outstanding Master's Thesis, Beijing Institute of Technology, 2023.
  • National Scholarship for Graduate Students, Ministry of Education of China, 2022.
  • National Scholarship for Undergraduate Students, Ministry of Education of China, 2019.
  • Outstanding Master's Graduate of Beijing, 2023.
  • Outstanding Bachelor's Graduate of Beijing, 2020.
  • Best Paper Award at IEEE International Conference on Unmanned Systems (ICUS), 2021.
  • Outstanding Paper at the Autonomous Robotic Technology Seminar (ARTS), 2023.

🎓 Education

  • Shanghai Jiao Tong University, Ph.D. candidate at IRMV Lab. Supervisor: Prof. Hesheng Wang.
  • Beijing Institute of Technology, M.S. in Mechanical Engineering, 2023. Supervisors: Prof. Guangming Xiong and Prof. Xieyuanli Chen.
  • Beijing Institute of Technology, B.S. in Mechanical Engineering, 2020. Bachelor's thesis advisor: Prof. Oliver Dürr.

📦 Datasets

  • Haomo Dataset: mobile-robot LiDAR dataset collected in urban Beijing.
  • Cues-Poses Dataset: a toy dataset for mapping multiple cues to mutual poses of robots.
  • Cam4DOcc: benchmark for camera-only 4D occupancy forecasting.
  • CABH Benchmark: egocentric videos capturing human hands performing simple object manipulation tasks.

📄 Patents

  • [China Utility Model] Huilong Yu, Ziang Tian, Junyi Ma, Haotian Dong, Junqiang Xi, and Guangming Xiong. A multifunctional unmanned platform for subterranean space. ZL202123083457.8
  • [China Appearance Design] Huilong Yu, Ziang Tian, Junyi Ma, Haotian Dong, Junqiang Xi, and Guangming Xiong. A multifunctional unmanned caterpillar for subterranean space. ZL202130813635.4
  • [China Invention Publication] Guangming Xiong, Junyi Ma, Jingyi Xu, and Jiarui Song. A reliability analysis-based multi-robot cooperative localization and mapping method. ZL202110318362.5

🎤 Invited Talks

  • "人类视频驱动的机器人技能学习 (Learning to Manipulate by Watching Humans: Visual Imitation for Robotic Skills)" at ARTS Webinar & 深蓝学院
  • Breakout session "Temporal Interaction Localization for Embodied AI" at the Embodied Intelligence Conference 2026
  • "面向自动驾驶应用的激光雷达地点识别 (LiDAR-Based Place Recognition for Autonomous Driving Applications)" at 深蓝学院

🤝 Service

  • Reviewer for TRO, TMM, RA-L, TASE, TCSVT, ICRA, and IROS.
  • Session Chair for the Localization I session at IROS'24.
  • Lead of the Publicity and Translation Group in the Volunteer Department for IROS'25.
  • Student Executive Committee Member of Autonomous Robotic Technology Seminar (ARTS).