Self-Supervised Object Detection from Egocentric Videos

Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023

Abstract
Understanding the visual world from human perspectives has been a long-standing challenge in computer vision. Egocentric videos exhibit high scene complexity and irregular motion flows compared to typical video understanding tasks. With the egocentric domain in mind, we address the problem of self-supervised, class-agnostic object detection, aiming to locate all objects in a given view, without any annotations or pre-trained weights. Our method, self-supervised object detection from egocentric videos (DEVI), generalizes appearance-based methods to learn features end-to-end that are category-specific and invariant to viewing angle and illumination. Our approach leverages natural human behavior in egocentric perception to sample diverse views of objects for our multi-view and scale-regression losses, and our cluster residual module learns multi-category patches for complex scene understanding. DEVI yields gains of up to 4.11% AP50, 0.11% AR1, 1.32% AR10, and 5.03% AR100 on recent egocentric datasets, while significantly reducing model complexity. We also demonstrate competitive performance on out-of-domain datasets without additional training or fine-tuning.
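To make the multi-view idea concrete, the sketch below shows a generic view-invariance loss that pulls together embeddings of the same object patch observed from two different egocentric views. This is a minimal illustration assuming matched patch pairs are already available; the function name, tensor shapes, and the cosine-similarity formulation are assumptions for exposition and are not DEVI's exact multi-view or scale-regression losses as defined in the paper.

```python
import torch
import torch.nn.functional as F

def multi_view_invariance_loss(feat_view_a: torch.Tensor,
                               feat_view_b: torch.Tensor) -> torch.Tensor:
    """Encourage matched patches from two views to have similar embeddings.

    feat_view_a, feat_view_b: (N, D) features for N matched object patches.
    Illustrative cosine-consistency loss, not the paper's formulation.
    """
    a = F.normalize(feat_view_a, dim=-1)
    b = F.normalize(feat_view_b, dim=-1)
    # Minimize (1 - cosine similarity) between matched view pairs.
    return (1.0 - (a * b).sum(dim=-1)).mean()

# Toy usage: 8 matched patches with 128-d embeddings from two egocentric frames.
f1, f2 = torch.randn(8, 128), torch.randn(8, 128)
loss = multi_view_invariance_loss(f1, f2)
```

In practice such a term would be combined with the other objectives (e.g., a scale-regression loss) and computed on patch pairs sampled from the diverse viewpoints that naturally arise in egocentric video.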