Soundtrack classification by transient events

Acoustics, Speech and Signal Processing (2011)

Citations: 29 | Views: 29
Abstract
We present a method for video classification based on information in the soundtrack. Unlike previous approaches which describe the audio via statistics of mel-frequency cepstral coefficient (MFCC) features calculated on uniformly-spaced frames, we investigate an approach to focusing our representation on audio transients corresponding to sound-track events. These event-related features can reflect the "foreground" of the soundtrack and capture its short-term temporal structure better than conventional frame-based statistics. We evaluate our method on a test set of 1873 YouTube videos labeled with 25 semantic concepts. Retrieval results based on transient features alone are comparable to an MFCC-based system, and fusing the two representations achieves a relative improvement of 7.5% in mean average precision (MAP).
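The abstract contrasts frame-based MFCC statistics with features anchored on audio transients ("foreground" events). As a rough illustration of the transient-detection step, the sketch below flags frames whose short-time energy jumps well above the local background. This is a minimal stand-in, not the paper's actual detector: the frame sizes, the median-based background estimate, and the threshold `ratio` are all illustrative assumptions.

```python
import numpy as np

def detect_transients(signal, sr, frame_len=1024, hop=512, ratio=4.0):
    """Flag frames whose short-time energy exceeds `ratio` times the
    median frame energy -- a crude illustrative transient detector."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    energy = np.array([
        np.sum(signal[i * hop : i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])
    # Median energy as a background level; floor avoids division issues
    # on all-silent input.
    background = np.maximum(np.median(energy), 1e-12)
    onset_frames = np.where(energy > ratio * background)[0]
    return onset_frames * hop / sr  # onset times in seconds

# Toy example: one second of silence with two short noise bursts.
sr = 8000
x = np.zeros(sr)
x[2000:2200] = np.random.default_rng(0).standard_normal(200)
x[6000:6200] = np.random.default_rng(1).standard_normal(200)
times = detect_transients(x, sr)
print(times)
```

In the toy example the detector fires on the frames overlapping the two bursts (around 0.25 s and 0.75 s) and stays silent elsewhere; in the paper's pipeline, features would then be computed around each detected event rather than on uniformly spaced frames.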
Keywords
acoustic signal processing, audio signal processing, cepstral analysis, signal classification, statistical analysis, video databases, video retrieval, MAP, MFCC features, MFCC-based system, YouTube videos, audio transients, conventional frame-based statistics, event-related features, mean average precision, mel-frequency cepstral coefficient features, retrieval results, short-term temporal structure, sound-track events, soundtrack classification, soundtrack information, transient events, transient features, uniformly-spaced frames, video classification, multimedia databases, video indexing