LSN: Long-Term Spatio-Temporal Network for Video Recognition.

International Conference of Pioneering Computer Scientists, Engineers and Educators (ICPCSEE)(2022)

引用 0|浏览3
暂无评分
摘要
Although recurrent neural networks (RNNs) are widely leveraged to process temporal or sequential data, they have attracted too little attention in current video action recognition applications. Therefore, this work attempts to model the long-term spatio-temporal information of the video based on a variant of RNN, i.e., higher-order RNN. Moreover, we propose a novel long-term spatio-temporal network (LSN) for solving this video task, the core of which integrates the newly constructed high-order ConvLSTM (HO-ConvLSTM) modules with traditional 2D convolutional blocks. Specifically, each HO-ConvLSTM module consists of an accumulated temporary state (ATS) module as well as a standard ConvLSTM module, and several previous hidden states in the ATS module are accumulated to one temporary state that will enter the standard ConvLSTM to determine the out-put together with the current input. The HO-ConvLSTM module can be inserted into different stages of the 2D convolutional neural network (CNN) in a plug-andplay manner, thus well characterizing the long-term temporal evolution at various spatial resolutions. Experiment results on three commonly used video benchmarks demonstrate that the proposed LSN model can achieve competitive performance with the representative models.
更多
查看译文
关键词
Video action recognition,High-order RNN,Long-term spatio-temporal,ConvLSTM,HO-ConvLSTM
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要