Action Spotting in Soccer Videos Using Multiple Scene Encoders.

ICPR (2022)

Abstract
Action spotting, which temporally localizes specific actions in a video, is an important task for understanding high-level semantic information. In this paper, we formulate action spotting as a scene sequence recognition task and propose a model with multiple scene encoders to capture scene changes around the timestamp at which an action occurs. We divide the input into multiple subsets to reduce the influence of temporally distant scene context, and feed each subset into a scene encoder to learn the scene context within that subset. Because the optimal temporal length of the time windows (chunks) differs between actions, we analyze the influence of chunk size on action spotting. Experimental results on the public SoccerNet-v2 dataset demonstrate state-of-the-art accuracy: using embedding features, our method obtains an Average-mAP of 75.3%. In addition, we confirm that performance can be improved by using the optimal chunk size for each action.
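The chunking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the encoder architecture, so `mean_pool_encoder` is a hypothetical stand-in that simply averages frame features, and the function names and the toy 2-dimensional features are assumptions made for illustration only.

```python
from typing import List, Sequence


def split_into_chunks(
    features: Sequence[Sequence[float]], chunk_size: int
) -> List[List[Sequence[float]]]:
    """Split a window of per-frame feature vectors into consecutive chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        list(features[i:i + chunk_size])
        for i in range(0, len(features), chunk_size)
    ]


def mean_pool_encoder(chunk: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical placeholder for a scene encoder: average the frames in a chunk."""
    dim = len(chunk[0])
    return [sum(frame[d] for frame in chunk) / len(chunk) for d in range(dim)]


def encode_window(
    features: Sequence[Sequence[float]], chunk_size: int
) -> List[List[float]]:
    """Encode each chunk separately so distant frames do not mix in one encoding."""
    return [mean_pool_encoder(c) for c in split_into_chunks(features, chunk_size)]


# Toy window of 6 frames, each with a 2-dimensional feature vector.
window = [[float(t), float(2 * t)] for t in range(6)]
encoded = encode_window(window, chunk_size=2)
```

Changing `chunk_size` changes how much temporal context each chunk covers, which is the quantity the abstract says is analyzed per action class.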
Keywords
action spotting, chunk sizes, embedding features, high-level semantic information, optimal temporal length, scene changes, scene encoder, scene sequence recognition, soccer videos, SoccerNet-v2 dataset, time windows