High quality lip-sync animation for 3D photo-realistic talking head

ICASSP (2012)

Abstract
We propose a new 3D photo-realistic talking head with high-quality, lip-synced animation. It extends our prior high-quality 2D photo-realistic talking head to 3D. We first make an audio-visual (A/V) recording, about 20 minutes long, of a person speaking a set of prompted sentences with good phonetic coverage. We then use a 2D-to-3D reconstruction algorithm to automatically adapt a general 3D head mesh model to the person. In training, 3D geometry, texture, and speech features are combined into super feature vectors, which are used to train a statistical, multi-stream Hidden Markov Model (HMM). The HMM is then used to synthesize both the trajectories of head motion animation and the corresponding texture dynamics. The resulting 3D talking head animation is controlled by the model-predicted geometric trajectory, while the movements of articulators such as the lips are rendered with dynamic 2D texture image sequences. Head motion and facial expression can also be controlled separately by manipulating the corresponding parameters. In a real-time demonstration, the life-like 3D talking head takes arbitrary input text, converts it into speech, and renders the lip-synced speech animation photo-realistically.
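As a rough illustration of the super feature vectors mentioned in the abstract, the sketch below concatenates time-aligned per-frame geometry, texture, and speech features into one vector per frame and records which index range belongs to each stream. The abstract does not specify feature dimensions, feature types, or how the streams are aligned, so the function names (`build_super_vectors`, `stream_slices`), the toy dimensions, and the plain concatenation are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List, Tuple

Frame = List[float]


def build_super_vectors(
    geometry: List[Frame], texture: List[Frame], speech: List[Frame]
) -> List[Frame]:
    """Concatenate time-aligned per-frame features from three streams.

    Assumption: all streams were already resampled to the same frame rate,
    so frame i of each stream describes the same instant.
    """
    if not (len(geometry) == len(texture) == len(speech)):
        raise ValueError("all streams must have the same number of frames")
    return [g + t + s for g, t, s in zip(geometry, texture, speech)]


def stream_slices(
    geometry_dim: int, texture_dim: int, speech_dim: int
) -> Dict[str, Tuple[int, int]]:
    """Return the (start, end) index range of each stream in a super vector.

    A multi-stream model needs this bookkeeping to treat each stream
    separately (hypothetical helper, not from the paper).
    """
    g_end = geometry_dim
    t_end = g_end + texture_dim
    s_end = t_end + speech_dim
    return {
        "geometry": (0, g_end),
        "texture": (g_end, t_end),
        "speech": (t_end, s_end),
    }


# Toy data: 2 frames, with 2 geometry, 1 texture, and 3 speech values each.
geometry = [[0.1, 0.2], [0.3, 0.4]]
texture = [[1.0], [2.0]]
speech = [[5.0, 6.0, 7.0], [8.0, 9.0, 10.0]]

super_vectors = build_super_vectors(geometry, texture, speech)
slices = stream_slices(2, 1, 3)

# Recover the texture stream of the first frame from its super vector.
start, end = slices["texture"]
first_frame_texture = super_vectors[0][start:end]
```

Keeping the per-stream index ranges alongside the concatenated vectors is one simple way to let a multi-stream model weight or synthesize each stream on its own; real systems would typically use array libraries and much higher-dimensional features.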
Keywords
speech processing, face recognition, talking head, computer animation, audio/visual synthesis, head motions, hmm, 3d image texture, 2d-to-3d reconstruction algorithm, video recording, motion estimation, phonetic coverage, dynamic 2d texture image sequences, facial expression, hidden markov model, 3d photo-realistic talking head, image sequences, super feature vectors, high quality lip-sync animation, a-v recording, 3d geometry, 3d, 3d speech processing, photorealistic, image texture, high-quality 2d photo-realistic talking head, lip-sync, hidden markov models, geometry, 3d head mesh model, articulator movements, solid modeling, visualization, animation, face