EgoPAT3Dv2: Predicting 3D Action Target from 2D Egocentric Vision for Human-Robot Interaction
arXiv (2024)
Abstract
A robot's ability to anticipate the 3D action target location of a hand's
movement from egocentric videos can greatly improve safety and efficiency in
human-robot interaction (HRI). While previous research predominantly focused on
semantic action classification or 2D target region prediction, we argue that
predicting the action target's 3D coordinate could pave the way for more
versatile downstream robotics tasks, especially given the increasing prevalence
of headset devices. This study expands EgoPAT3D, the sole dataset dedicated to
egocentric 3D action target prediction. We augment both its size and diversity,
enhancing its potential for generalization. Moreover, we substantially enhance
the baseline algorithm by introducing a large pre-trained model and human prior
knowledge. Remarkably, our novel algorithm can now achieve superior prediction
outcomes using solely RGB images, eliminating the previous need for 3D point
clouds and IMU input. Furthermore, we deploy our enhanced baseline algorithm on
a real-world robotic platform to illustrate its practical utility in
straightforward HRI tasks. The demonstrations showcase the real-world
applicability of our advancements and may inspire more HRI use cases involving
egocentric vision. All code and data are open-sourced and can be found on the
project website.