Action recognition in compressed domains: A survey

Neurocomputing (2024)

Abstract
Human action recognition (HAR) is the task of having a computer analyze video data to determine the category of action shown in the video. It has a wide range of applications, including video surveillance, human-computer interaction, and autonomous driving. The spatio-temporal features required for video analysis are typically extracted from RGB pixels, which entails significant computational cost and makes real-time action recognition challenging. In the compressed domain, however, sparse representations such as motion vectors, quantization parameters, transform coefficients, and residuals provide comparable scene semantic information at reduced complexity. This paper surveys research on action recognition in the compressed domain. It comprehensively reviews both traditional and deep learning-based methods published between 2000 and 2023, with a focus on compression standards and compression parameters. First, we summarize the compression standards and compression algorithms used to compress videos. Second, we classify compressed-domain action recognition methods into traditional and deep learning-based approaches. Third, we introduce public datasets and evaluation metrics, analyze the characteristics of these methods, and compare their performance. Finally, we highlight open challenges and suggest future research directions.
Keywords
Video action recognition, Compressed domain, MPEG4, Transform coefficients, Motion vectors, Residual