POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World
MM '23: Proceedings of the 31st ACM International Conference on Multimedia (2024)
Abstract
We humans are good at translating third-person observations of hand-object
interactions (HOI) into an egocentric view. However, current methods struggle
to replicate this ability of view adaptation from third-person to first-person.
Although some approaches attempt to learn view-agnostic representation from
large-scale video datasets, they ignore the relationships among multiple
third-person views. To this end, we propose a Prompt-Oriented View-agnostic
learning (POV) framework in this paper, which enables this view adaptation with
only a few egocentric videos. Specifically, we introduce interactive masking prompts
at the frame level to capture fine-grained action information, and view-aware
prompts at the token level to learn view-agnostic representation. To verify our
method, we establish two benchmarks for transferring from multiple third-person
views to the egocentric view. Our extensive experiments on these benchmarks
demonstrate the efficiency and effectiveness of our POV framework and prompt
tuning techniques in terms of view adaptation and view generalization. Our code
is available at .
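The abstract mentions learnable view-aware prompts at the token level. As a rough sketch of what token-level prompt tuning generally looks like (the names `view_prompts` and `add_view_prompts`, and all shapes, are illustrative assumptions, not taken from the paper):

```python
import numpy as np

# Illustrative sketch of token-level prompt tuning: a frozen backbone
# consumes sequences of D-dim tokens, and only small per-view prompt
# banks would be trained. Names and sizes here are assumptions.

D = 8          # token embedding dimension (illustrative)
N_PROMPTS = 2  # learnable prompt tokens per view (illustrative)

rng = np.random.default_rng(0)

# One learnable prompt bank per camera view; in prompt tuning, only
# these parameters are updated while the backbone stays frozen.
view_prompts = {
    "ego":   rng.normal(size=(N_PROMPTS, D)),
    "third": rng.normal(size=(N_PROMPTS, D)),
}

def add_view_prompts(tokens: np.ndarray, view: str) -> np.ndarray:
    """Prepend the view's learnable prompt tokens to a (T, D) token sequence."""
    return np.concatenate([view_prompts[view], tokens], axis=0)

frame_tokens = rng.normal(size=(16, D))      # 16 patch/frame tokens
prompted = add_view_prompts(frame_tokens, "ego")
print(prompted.shape)  # (18, 8): 2 prompt tokens + 16 original tokens
```

The prompted sequence is then fed to the frozen backbone, so adapting to a new view only requires learning `N_PROMPTS × D` parameters per view rather than fine-tuning the whole model.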