Attention-Based CLDNNs for Short-Duration Acoustic Scene Classification

18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), Vols 1-6: Situated Interaction (2017)

Abstract
Recently, neural networks with deep architectures have been widely applied to acoustic scene classification. Both Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs) have shown improvements over fully connected Deep Neural Networks (DNNs). Motivated by the fact that CNNs, LSTMs and DNNs are complementary in their modeling capabilities, we apply the CLDNN (Convolutional, Long Short-Term Memory, Deep Neural Network) framework to short-duration acoustic scene classification in a unified architecture. CLDNNs take advantage of frequency modeling with CNNs, temporal modeling with LSTMs, and discriminative training with DNNs. Based on the CLDNN architecture, several novel attention-based mechanisms are proposed and applied to the LSTM layer to predict the importance of each time step. We evaluate the proposed method on the truncated version of the 2016 TUT acoustic scenes dataset, which consists of recordings from 15 different scenes. Using CLDNNs with bidirectional LSTMs, we achieve higher performance than conventional neural network architectures. Moreover, by combining the attention-weighted output with the LSTM's final time step output, further significant improvements are achieved.
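To make the described pipeline concrete, the following is a minimal sketch of a CLDNN with attention over BiLSTM time steps, where the attention-weighted summary is concatenated with the final time step output before the DNN classifier. The framework (PyTorch), layer sizes, and the single-head additive attention are illustrative assumptions, not the authors' exact configuration.

```python
# Illustrative sketch only: PyTorch, dimensions, and additive attention are assumptions;
# the paper does not specify its exact layer configuration here.
import torch
import torch.nn as nn


class AttentionCLDNN(nn.Module):
    def __init__(self, n_mels=40, n_classes=15, lstm_hidden=128):
        super().__init__()
        # CNN front end: frequency modeling on log-mel spectrogram patches
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),                      # pool along frequency only
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),
        )
        cnn_out_dim = 32 * (n_mels // 4)
        # BiLSTM: temporal modeling over the CNN feature sequence
        self.lstm = nn.LSTM(cnn_out_dim, lstm_hidden, batch_first=True,
                            bidirectional=True)
        # Additive attention scoring one weight per time step
        self.attn = nn.Sequential(
            nn.Linear(2 * lstm_hidden, 64), nn.Tanh(), nn.Linear(64, 1)
        )
        # DNN back end: classify from attention summary + final time step
        self.dnn = nn.Sequential(
            nn.Linear(4 * lstm_hidden, 256), nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, x):                  # x: (batch, 1, time, n_mels)
        h = self.cnn(x)                    # (batch, 32, time, n_mels // 4)
        b, c, t, f = h.shape
        h = h.permute(0, 2, 1, 3).reshape(b, t, c * f)
        out, _ = self.lstm(h)              # (batch, time, 2 * lstm_hidden)
        scores = self.attn(out)            # (batch, time, 1)
        weights = torch.softmax(scores, dim=1)
        context = (weights * out).sum(dim=1)   # attention-weighted output
        final_step = out[:, -1, :]             # LSTM final time step output
        return self.dnn(torch.cat([context, final_step], dim=-1))


# Usage sketch: a batch of short log-mel patches (frame count and mel bands assumed).
model = AttentionCLDNN()
logits = model(torch.randn(8, 1, 150, 40))   # (8, 15) class scores
```

The concatenation in the last line of `forward` reflects the abstract's combination of the attention-weighted output with the final time step output; how the paper's individual attention variants compute the per-step scores is not specified here.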
Keywords
acoustic scene classification, short-duration, CLDNNs, attention mechanisms