Computationally Efficient Clustering Of Audio-Visual Meeting Data
Multimedia Interaction and Intelligent User Interfaces: Principles, Methods and Applications (2010)
Abstract
This chapter presents novel, computationally efficient algorithms for extracting semantically meaningful acoustic and visual events for each participant in a group discussion, using business meeting recordings as the example. The recording setup involves relatively few audio-visual sensors: a limited number of cameras and microphones. We first demonstrate computationally efficient algorithms that identify who spoke and when, a problem known in speech processing as speaker diarization. We also extract visual activity features efficiently from MPEG-4 video by reusing the processing already done for video compression. We then present a method for associating the audio and visual data so that the content of each participant can be managed individually. The methods presented in this chapter can serve as a principal component enabling many of the higher-level semantic analysis tasks needed in search, retrieval, and navigation.
Keywords
video compression, speech processing, principal component, speaker diarization