P2ANet: A Large-Scale Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos

ACM Trans. Multim. Comput. Commun. Appl. (2024)

Abstract
While deep learning has been widely used for video analytics, such as video classification and action detection, dense action detection with fast-moving subjects in sports videos remains challenging. In this work, we release yet another sports video benchmark, P2ANet, for Ping Pong-Action detection, which consists of 2,721 video clips collected from broadcast videos of professional table tennis matches at the World Table Tennis Championships and the Olympic Games. Working with a crew of table tennis professionals and referees, and using a specially designed annotation toolbox, we obtained fine-grained action labels (in 14 classes) for every ping-pong action appearing in the dataset, and we formulate two action detection problems: action localization and action recognition. We evaluate a number of widely used action recognition models (e.g., TSM, TSN, Video Swin Transformer, and SlowFast) and action localization models (e.g., BSN, BSN++, BMN, and TCANet) on P2ANet for both problems under various settings. These models achieve only 48% area under the AR-AN curve for localization and 82% top-1 accuracy for recognition, since the ping-pong actions are dense and involve fast-moving subjects while the broadcast videos are recorded at only 25 FPS. The results confirm that P2ANet remains a challenging benchmark for dense action detection from videos. We invite readers to examine our dataset at https://github.com/Fred1991/P2ANET.
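As a rough illustration of the localization metric quoted above, the Python sketch below approximates the area under an AR-AN curve (average recall versus average number of proposals) with the trapezoidal rule. The function name, the normalization by the AN range, and the sample curve are assumptions made here for illustration; they are not code or results from the paper.

import numpy as np

def ar_an_auc(an, ar, an_max=100.0):
    # Trapezoidal approximation of the area under the AR-AN curve,
    # normalized by the AN range so the score lies in [0, 1].
    an = np.asarray(an, dtype=float)
    ar = np.asarray(ar, dtype=float)
    area = float(np.sum((ar[1:] + ar[:-1]) / 2.0 * np.diff(an)))
    return area / (an_max - an.min())

# Illustrative values only, not results reported in the paper.
an = np.linspace(1, 100, 100)
ar = 1.0 - np.exp(-an / 30.0)  # a made-up average-recall curve
print(f"AR-AN AUC: {ar_an_auc(an, ar):.3f}")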
Keywords
Datasets, annotation toolbox, video analysis, action recognition and localization, table tennis