Audio Visual Graph Attention Networks for Event Detection in Sports Video.

European Signal Processing Conference (EUSIPCO), 2022

Abstract
Interest in event detection for sports video analytics has been increasing because it helps teams devise game plans and can reduce the substantial labor costs of broadcast production. Sports event detection is a challenging task because it requires both audio and video features. Several recent methods have modified the transformer model to extract multi-modal features: they feed the combined audio and video into a multi-modal transformer and thus take multi-modal interactions into account. However, transformer-based methods do not focus on the significant modal interactions, namely those that emphasize audio, those that emphasize video, and those that consider audio and video together, because they attend to all interactions at once rather than to specific ones. In this paper, we propose the Audio Visual Graph Attention Networks (AVGAT) model, which prepares a different graph structure for each type of modal interaction. Consequently, our model can attend to the significant modal interactions. Experiments on datasets of actual wheelchair rugby games showed that AVGAT outperforms the transformer-based model. In addition, an ablation study showed that each component of our model functions well empirically at detecting individual event labels.
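The abstract does not give implementation details, but the core idea (separate graph attention passes over audio-emphasis, video-emphasis, and joint audio-video edge structures, later fused) can be sketched as follows. This is a minimal illustrative sketch, not the authors' code: the node features, adjacency masks, layer sizes, and fusion step are all assumptions made for clarity.

```python
# Hypothetical AVGAT-style block: one graph attention pass per modal interaction.
# Assumes audio and video segments have already been encoded into a shared
# feature space and stacked as graph nodes.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Single-head GAT-style layer restricted to edges allowed by `adj`."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        # x: (N, in_dim) node features; adj: (N, N) binary mask of allowed edges
        h = self.proj(x)                                  # (N, out_dim)
        N = h.size(0)
        hi = h.unsqueeze(1).expand(N, N, -1)              # source node features
        hj = h.unsqueeze(0).expand(N, N, -1)              # target node features
        e = F.leaky_relu(self.attn(torch.cat([hi, hj], dim=-1)).squeeze(-1))
        e = e.masked_fill(adj == 0, float('-inf'))        # attend only along graph edges
        alpha = torch.softmax(e, dim=-1)
        alpha = torch.nan_to_num(alpha)                   # rows with no edges contribute 0
        return alpha @ h                                  # (N, out_dim)

class AVGATSketch(nn.Module):
    """One attention pass per modal interaction, then a simple fusion (assumed)."""
    def __init__(self, dim):
        super().__init__()
        self.audio_gat = GraphAttentionLayer(dim, dim)    # audio-emphasis graph
        self.video_gat = GraphAttentionLayer(dim, dim)    # video-emphasis graph
        self.cross_gat = GraphAttentionLayer(dim, dim)    # joint audio-video graph
        self.fuse = nn.Linear(3 * dim, dim)

    def forward(self, nodes, adj_audio, adj_video, adj_cross):
        out = torch.cat([self.audio_gat(nodes, adj_audio),
                         self.video_gat(nodes, adj_video),
                         self.cross_gat(nodes, adj_cross)], dim=-1)
        return self.fuse(out)
```

The point of the sketch is the contrast with a multi-modal transformer: each `GraphAttentionLayer` is masked by its own adjacency matrix, so attention is computed only over one kind of modal interaction at a time instead of over all token pairs at once.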
Keywords
Sports Analytics, Event Detection, Audio Visual Processing, Graph Attention Networks