Combined Gesture-Speech Analysis and Synthesis

msra(2005)

Abstract
Multi-modal speech and speaker modelling and recognition are widely accepted as vital aspects of state-of-the-art human-machine interaction systems. While correlations between speech and lip motion, as well as between speech and facial expressions, have been widely studied, relatively little work has investigated the correlations between speech and gesture. Detection and modelling of head, hand and arm gestures of a speaker have been studied extensively in (3)-(6), and these gestures were shown to carry linguistic information (7),(8); a typical example is the head gesture accompanying the word "yes". In this project, the correlation between gestures and speech is investigated. Speech features are Mel Frequency Cepstrum Coefficients (MFCC). Gesture features are composed of hand and elbow positions together with global motion parameters calculated over the head region. Prior to gesture detection, a discrete symbol set for gestures is defined manually, and for each symbol a model is trained on the calculated features. Using these symbol models, sequences of gesture features are clustered and probable gestures are detected. The correlation between gestures and speech is modelled by examining co-occurring speech and gesture patterns. This correlation is used to fuse the gesture and speech modalities for edutainment applications (e.g. video games, 3-D animations) in which the natural gestures of talking avatars are animated from speech.
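As a rough illustration of the co-occurrence modelling described above, the sketch below pairs a time-aligned stream of discrete speech patterns with detected gesture symbols, estimates P(gesture | speech pattern) from joint counts, and maps each incoming speech pattern to its most probable gesture, the kind of lookup a speech-driven avatar animator could apply. This is a minimal assumption-laden sketch, not the paper's implementation: the label names, the one-label-per-window alignment, and the simple count-based probabilities are all assumptions the abstract does not specify.

```python
import numpy as np

# Hypothetical discrete alphabets. The paper defines the gesture symbol
# set manually; these particular labels are illustrative assumptions.
# The speech patterns would in practice be derived upstream from the
# MFCC feature stream mentioned in the abstract.
SPEECH_PATTERNS = ["rising_pitch", "falling_pitch", "stress", "pause"]
GESTURE_SYMBOLS = ["head_nod", "head_shake", "hand_raise", "rest"]

def cooccurrence_model(speech_seq, gesture_seq):
    """Estimate P(gesture | speech pattern) from time-aligned label
    sequences by counting co-occurring pairs and normalizing rows."""
    counts = np.zeros((len(SPEECH_PATTERNS), len(GESTURE_SYMBOLS)))
    for s, g in zip(speech_seq, gesture_seq):
        counts[SPEECH_PATTERNS.index(s), GESTURE_SYMBOLS.index(g)] += 1
    # Laplace smoothing keeps unseen pairs at a small nonzero probability.
    counts += 1e-3
    return counts / counts.sum(axis=1, keepdims=True)

def synthesize_gestures(speech_seq, cond_prob):
    """Map each speech pattern to its most probable co-occurring gesture."""
    return [GESTURE_SYMBOLS[int(np.argmax(cond_prob[SPEECH_PATTERNS.index(s)]))]
            for s in speech_seq]

if __name__ == "__main__":
    # Toy time-aligned training labels (one label per analysis window).
    speech  = ["rising_pitch", "stress", "pause", "stress", "falling_pitch"]
    gesture = ["head_nod", "hand_raise", "rest", "hand_raise", "head_shake"]
    model = cooccurrence_model(speech, gesture)
    print(synthesize_gestures(["stress", "pause", "rising_pitch"], model))
    # -> ['hand_raise', 'rest', 'head_nod'] given these toy counts
```

A table of conditional probabilities like this is only the simplest possible fusion of the two modalities; per-symbol sequence models (as the abstract's per-symbol gesture models suggest) would capture temporal structure that raw co-occurrence counts ignore.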
Keywords
keyword spotting, audio-visual correlation analysis, gesture recognition, prosody analysis, gesture synthesis, facial expression