Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities

IEEE Conference on Computer Vision and Pattern Recognition (2022)

Abstract
Assembly101 is a new procedural activity dataset featuring 4321 videos of people assembling and disassembling 101 “take-apart” toy vehicles. Participants work without fixed instructions, and the sequences feature rich and natural variations in action ordering, mistakes, and corrections. Assembly101 is the first multi-view action dataset, with simultaneous static (8) and egocentric (4) recordings. Sequences are annotated with more than 100K coarse and 1M fine-grained action segments, and 18M 3D hand poses. We benchmark on three action understanding tasks: recognition, anticipation, and temporal segmentation. Additionally, we propose a novel task of detecting mistakes. The unique recording format and rich set of annotations allow us to investigate generalization to new toys, cross-view transfer, long-tailed distributions, and pose vs. appearance. We envision that Assembly101 will serve as a new challenge to investigate various activity understanding problems.
Keywords
Datasets and evaluation, Action and event recognition