THE TIMING OF SPEECH-ACCOMPANYING GESTURES WITH RESPECT TO PROSODY

Journal of the Acoustical Society of America (2004)

Abstract
The hypothesis that some of the hand and head movements produced during speaking are timed with respect to the prosodic structure of the utterance was tested by comparing the timing of separately labelled video and sound files from videotaped lectures by three male speakers of English. Word boundaries and prosody were labelled in the sound files using the ToBI system to specify the location of pitch-accented words and syllables, and of intonational phrase boundaries. Gestures were labelled in the video files using a two-way distinction between DISCRETE movements (characterized by a sudden stop suggesting target attainment) and more CONTINUOUS movements. Gesture times were expressed in terms of frame location in a 30-frames-per-second video display, and the video frames corresponding to target attainment for the discrete gestures (2 speakers) or to onset and offset for all gestures (1 speaker) were aligned with the prosodic markings in the sound files. Preliminary analysis suggests that a) discrete gestures may be timed with respect to pitch-accented syllables (or possibly the prominence-related constituents they define), and b) gestures which span the boundaries between adjacent intonational phrases may indicate larger structural groupings. If ongoing studies of additional utterances and speakers confirm these results, they will provide evidence that speech planning models need to generate a speaking plan and a gesturing plan in tandem.
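The alignment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the only value taken from the abstract is the 30 frames-per-second rate. The function names, the zero-based frame indexing, the nearest-accent matching rule, and all sample times are assumptions made purely for illustration.

```python
from bisect import bisect_left

FPS = 30  # frame rate stated in the abstract


def frame_to_seconds(frame, fps=FPS):
    """Convert a video frame index to the start time of that frame in seconds.

    Assumes zero-based frame indices (an assumption, not stated in the abstract).
    """
    return frame / fps


def nearest_accent(gesture_time, accent_times):
    """Return (accent_time, offset) for the pitch accent closest to gesture_time.

    accent_times must be a non-empty list sorted in ascending order.
    offset is gesture_time minus accent_time, so a negative offset means
    the gesture event precedes the accent.
    """
    i = bisect_left(accent_times, gesture_time)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = accent_times[max(i - 1, 0):i + 1]
    best = min(candidates, key=lambda t: abs(t - gesture_time))
    return best, gesture_time - best


# Hypothetical data: accent times (s) from the sound-file labels and
# target-attainment frames for discrete gestures from the video labels.
accent_times = [0.42, 1.10, 2.35]
apex_frames = [12, 36, 70]

for frame in apex_frames:
    t = frame_to_seconds(frame)
    accent, offset = nearest_accent(t, accent_times)
    print(f"frame {frame}: t={t:.3f}s, nearest accent={accent:.2f}s, offset={offset:+.3f}s")
```

At 30 frames per second each frame spans about 33 ms, so any offset computed this way is only resolved to roughly that granularity.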
Keywords
speech production, frames per second