Instructional Videos For Unsupervised Harvesting And Learning Of Action Examples
MM '14: Proceedings of the 2014 ACM Multimedia Conference, Orlando, Florida, USA, November 2014
Abstract
Online instructional videos have become a popular way for people to learn new skills encompassing art, cooking, and sports. As watching instructional videos is a natural way for humans to learn, machines can analogously gain knowledge from these videos. We propose to utilize the large amount of instructional videos available online to harvest examples of various actions in an unsupervised fashion. The key observation is that in instructional videos, the instructor's action is highly correlated with the instructor's narration. By leveraging this correlation, we can exploit the timing of action-corresponding terms in the speech transcript to temporally localize actions in the video and harvest action examples. The proposed method is scalable as it requires no human intervention. Experiments show that the harvested examples are of reasonably good quality, and action detectors trained on data collected by our unsupervised method yield comparable performance with detectors trained on manually collected data on the TRECVID Multimedia Event Detection task.
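The harvesting idea sketched in the abstract can be illustrated in a few lines: given a time-aligned speech transcript, find mentions of known action terms and cut a fixed-length window of video around each mention as a candidate action example. This is a hypothetical sketch under assumed details (the function names, the term list, and the window length are illustrative, not the paper's actual pipeline).

```python
# Illustrative sketch (assumed details, not the paper's implementation):
# localize candidate action clips by the timing of action terms
# in a time-aligned speech transcript.

WINDOW = 4.0  # seconds of video kept around each mention (assumed)

def harvest_clips(transcript, action_terms, video_length):
    """transcript: list of (word, start_time_sec) pairs from ASR alignment.
    Returns (term, clip_begin, clip_end) for each action-term mention."""
    clips = []
    for word, start in transcript:
        if word.lower() in action_terms:
            begin = max(0.0, start - WINDOW / 2)
            end = min(video_length, start + WINDOW / 2)
            clips.append((word.lower(), begin, end))
    return clips

# Toy cooking-video transcript with per-word timestamps.
transcript = [("now", 1.0), ("we", 1.4), ("whisk", 2.1),
              ("the", 2.5), ("eggs", 2.7), ("then", 10.0),
              ("fold", 10.6), ("in", 11.0), ("the", 11.2), ("flour", 11.4)]
clips = harvest_clips(transcript, {"whisk", "fold"}, video_length=60.0)
# each clip is (action term, clip start in sec, clip end in sec)
```

Because the procedure needs only the transcript timestamps and a vocabulary of action terms, it involves no human annotation, which is what makes the collection scalable.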
Keywords
Semantic Action Detection, Unsupervised Data Collection, Multimedia Event Detection