LCSNet: End-to-end Lipreading with Channel-aware Feature Selection.

ACM Trans. Multim. Comput. Commun. Appl. (2023)

Abstract
Lipreading is the task of decoding the movement of a speaker's lip region into text. In recent years, lipreading methods based on deep neural networks have attracted widespread attention, and their accuracy has far surpassed that of experienced human lipreaders. However, the visual differences between some phonemes are extremely subtle and pose a great challenge to lipreading. Most existing lipreading methods do not further process the extracted visual features and therefore suffer from two main problems. First, the extracted features contain a lot of useless information, such as noise caused by differences in speech speed and lip shape. Second, the extracted features are not abstract enough to distinguish phonemes with similar pronunciation. Both problems degrade lipreading performance. To extract features from the lip region that are more distinguishable and more relevant to the speech content, this paper proposes an end-to-end deep neural network-based lipreading model (LCSNet). The proposed model extracts short-term spatio-temporal features and motion trajectory features from the lip region in the video clips. The extracted features are filtered by a channel attention module to eliminate useless features and are then fed to the proposed Selective Feature Fusion Module (SFFM) to extract high-level abstract features. These features are then fed in temporal order to a bidirectional GRU network for temporal modeling, which yields long-term spatio-temporal features. Finally, a Connectionist Temporal Classification (CTC) decoder generates the output text. Experimental results show that the proposed model achieves a 1.0% CER and a 2.3% WER on the GRID corpus, improvements of 52% and 47%, respectively, over LipNet.
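The two reported metrics, character error rate (CER) and word error rate (WER), are conventionally computed as the edit distance between the predicted and reference transcripts divided by the reference length. The following is a minimal sketch of that standard definition, not the paper's evaluation code; the function names and the example sentences are hypothetical and chosen only for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference tokens to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, tokenize):
    """Corpus-level error rate: total edits divided by total reference tokens."""
    edits = 0
    total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = tokenize(ref), tokenize(hyp)
        edits += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    return edits / total


def cer(references, hypotheses):
    # Characters as tokens (spaces included; an assumption, conventions vary).
    return error_rate(references, hypotheses, list)


def wer(references, hypotheses):
    # Whitespace-separated words as tokens.
    return error_rate(references, hypotheses, str.split)


# Hypothetical reference/prediction pair with one wrong letter.
refs = ["bin blue at f two now"]
hyps = ["bin blue at s two now"]
print(f"CER = {cer(refs, hyps):.3f}, WER = {wer(refs, hyps):.3f}")
```

A single wrong letter costs one character edit but also one whole word, so on the same output WER is typically higher than CER, in line with the 1.0% CER and 2.3% WER reported above.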
Keywords
Lipreading, deep neural network, channel attention mechanism, selective feature fusion module