Unsupervised method for video action segmentation through spatio-temporal and positional-encoded embeddings.

ACM SIGMM Conference on Multimedia Systems (MMSys)(2022)

引用 0|浏览2
暂无评分
摘要
Action segmentation consists of temporally segmenting a video and labeling each segmented interval with a specific action label. In this work, we propose a novel action segmentation method that requires no prior video analysis and no annotated data. Our method involves extracting spatio-temporal features from videos using a pre-trained deep network. Data is then transformed using a positional encoder, and finally a clustering algorithm is applied, where each produced cluster presumably corresponds to a different single and distinguishable action. In experiments, we show that our method produces competitive results on the Breakfast and Inria Instructional Videos dataset benchmarks.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要