X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention
arXiv (2024)
Abstract
We propose X-Portrait, an innovative conditional diffusion model tailored for
generating expressive and temporally coherent portrait animation. Specifically,
given a single portrait as appearance reference, we aim to animate it with
motion derived from a driving video, capturing both highly dynamic and subtle
facial expressions along with wide-range head movements. At its core, we
leverage the generative prior of a pre-trained diffusion model as the rendering
backbone, while achieving fine-grained head pose and expression control with
novel controlling signals within the framework of ControlNet. In contrast to
conventional coarse explicit controls such as facial landmarks, our motion
control module learns to interpret the dynamics directly from the original
driving RGB inputs. Motion accuracy is further improved by a patch-based
local control module that effectively enhances motion attention to
small-scale nuances such as eyeball positions. Notably, to mitigate identity
leakage from the driving signals, we train our motion control modules with
scaling-augmented cross-identity images, ensuring maximized disentanglement
from the appearance reference modules. Experimental results demonstrate the
universal effectiveness of X-Portrait across a diverse range of facial
portraits and expressive driving sequences, and showcase its proficiency in
generating captivating portrait animations with consistently maintained
identity characteristics.
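
The abstract does not spell out how the scaling augmentation is applied. As a rough illustration only, the sketch below shows one plausible reading: the driving frame is randomly rescaled about its center before it reaches the motion control module, so that face size and proportions in the driving image are a less reliable identity cue. The function names, the scale range, and the nearest-neighbor resampling are all assumptions made for this example and are not taken from the paper; plain Python lists are used to keep it dependency-free.

```python
import random


def scale_about_center(image, scale, fill=0):
    """Rescale image content about its center while keeping the output size fixed.

    `image` is a list of rows (a 2D grid of pixel values). Nearest-neighbor
    sampling is used; pixels that map outside the source are set to `fill`.
    Hypothetical helper for illustration, not the paper's implementation.
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        # Map each output row back to the source row it samples from.
        sy = round((y - cy) / scale + cy)
        if not 0 <= sy < h:
            continue
        for x in range(w):
            sx = round((x - cx) / scale + cx)
            if 0 <= sx < w:
                out[y][x] = image[sy][sx]
    return out


def random_scale_augment(image, rng, scale_range=(0.8, 1.2)):
    """Apply a random scale factor drawn uniformly from `scale_range` (assumed range)."""
    scale = rng.uniform(*scale_range)
    return scale_about_center(image, scale), scale


# Usage: augment a toy 8x8 "driving frame" with a seeded generator.
frame = [[row * 8 + col for col in range(8)] for row in range(8)]
augmented, used_scale = random_scale_augment(frame, random.Random(0))
```

In a cross-identity training setup, an augmentation of this kind would be applied to the driving image only, leaving the appearance reference untouched.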