Screencast Tutorial Video Understanding

CVPR (2020)

Abstract
Screencast tutorials are videos created by people to teach how to use software applications or to demonstrate procedures for accomplishing tasks. Compared to other tutorial media such as text, they are popular with both novice and experienced users for learning new skills because of their visual guidance and ease of understanding. In this paper, we propose visual understanding of screencast tutorials as a new research problem for the computer vision community. We collect a new dataset of Adobe Photoshop video tutorials and annotate it with both low-level and high-level semantic labels. We introduce a bottom-up pipeline to understand Photoshop video tutorials. We leverage state-of-the-art object detection algorithms with domain-specific visual cues to detect important events in a video tutorial and segment it into clips according to the detected events. We propose a visual cue reasoning algorithm for two high-level tasks: video retrieval and video captioning. We conduct extensive evaluations of the proposed pipeline. Experimental results show that it is effective at understanding video tutorials. We believe our work will serve as a starting point for future research on this important application domain of video understanding.
Keywords
screencast tutorials,software applications,tutorial media,visual guidance,visual understanding,computer vision community,Adobe Photoshop video tutorials,high-level semantic labels,object detection algorithms,domain specific visual cues,video tutorial,detected events,visual cue reasoning algorithm,high-level tasks,video retrieval,video captioning,screencast tutorial video understanding