SEEG: Semantic Energized Co-speech Gesture Generation

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022

Abstract
Talking gesture generation is a practical yet challenging task that aims to synthesize gestures in line with speech. Gestures with meaningful signs can better convey useful information and arouse sympathy in the audience. Current works focus on aligning gestures with speech rhythms, which makes it difficult to mine semantics and to model semantic gestures explicitly. This paper proposes a novel Semantic Energized Generation (SEEG) method for semantic-aware gesture generation. Our method contains two parts: a DEcoupled Mining module (DEM) and a Semantic Energizing Module (SEM). DEM decouples semantic-irrelevant information from the inputs and separately mines cues for beat gestures and semantic gestures. SEM conducts semantic learning and produces semantic gestures: beyond representational similarity, it requires the predictions to express the same semantics as the ground truth. In addition, a semantic prompter is designed in SEM to apply semantic-aware supervision to the predictions, which encourages the network to learn and generate semantic gestures. Experimental results on three metrics across different benchmarks show that SEEG efficiently mines semantic cues and generates semantic gestures, outperforming other methods in all semantic-aware evaluations on different datasets. Qualitative evaluations also indicate the superiority of SEEG in semantic expressiveness. Code is available at https://github.com/akira-l/SEEG.
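To make the two-module design concrete, below is a minimal PyTorch sketch of the decoupled-then-energized pipeline the abstract describes. Only the DEM/SEM split and the idea of a semantic prompter supervising predictions beyond pose similarity come from the abstract; all dimensions, layer choices (GRU encoder/decoder, MLP prompter), and the MSE form of the losses are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DEM(nn.Module):
    """Sketch of the DEcoupled Mining module: encodes speech features and
    splits them into a beat-cue branch and a semantic-cue branch.
    (Dimensions and layer types are assumptions.)"""
    def __init__(self, audio_dim=128, hidden_dim=256):
        super().__init__()
        self.encoder = nn.GRU(audio_dim, hidden_dim, batch_first=True)
        self.beat_head = nn.Linear(hidden_dim, hidden_dim)      # cues for beat gestures
        self.semantic_head = nn.Linear(hidden_dim, hidden_dim)  # cues for semantic gestures

    def forward(self, audio):
        feat, _ = self.encoder(audio)                 # (B, T, hidden_dim)
        return self.beat_head(feat), self.semantic_head(feat)

class SEM(nn.Module):
    """Sketch of the Semantic Energizing Module: fuses both cue streams into
    pose predictions, and uses a 'semantic prompter' to embed motion so that
    predictions can be supervised for semantics, not just pose similarity."""
    def __init__(self, hidden_dim=256, pose_dim=48):
        super().__init__()
        self.decoder = nn.GRU(2 * hidden_dim, hidden_dim, batch_first=True)
        self.pose_head = nn.Linear(hidden_dim, pose_dim)
        self.prompter = nn.Sequential(                # hypothetical semantic embedding
            nn.Linear(pose_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim))

    def forward(self, beat_cues, semantic_cues):
        fused, _ = self.decoder(torch.cat([beat_cues, semantic_cues], dim=-1))
        return self.pose_head(fused)                  # (B, T, pose_dim)

    def semantic_loss(self, pred_pose, gt_pose):
        # Require predictions to carry the same semantics as the ground truth,
        # not merely to match it frame-by-frame (loss form is an assumption).
        return nn.functional.mse_loss(self.prompter(pred_pose),
                                      self.prompter(gt_pose))

# Toy usage: batch of 2 clips, 30 frames, 128-dim audio features, 48-dim poses.
dem, sem = DEM(), SEM()
audio = torch.randn(2, 30, 128)
gt_pose = torch.randn(2, 30, 48)
beat, semantic = dem(audio)
pred = sem(beat, semantic)
loss = nn.functional.mse_loss(pred, gt_pose) + sem.semantic_loss(pred, gt_pose)
loss.backward()
```

The key design point this sketch tries to capture is that the prompter-based loss supervises predictions in a semantic embedding space on top of the ordinary pose-reconstruction loss.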
Keywords
Vision + X, Face and gestures