InterGen: Diffusion-Based Multi-human Motion Generation Under Complex Interactions

Han Liang, Wenqian Zhang, Wenxuan Li, Jingyi Yu, Lan Xu

International Journal of Computer Vision (2024)

Abstract
We have recently seen tremendous progress in diffusion-based approaches for generating realistic human motions. Yet, existing methods largely disregard multi-human interactions. In this paper, we present InterGen, an effective diffusion-based approach that enables layman users to customize high-quality two-person interaction motions with only text guidance. We first contribute a multimodal dataset, named InterHuman. It consists of about 107M frames of diverse two-person interactions, with accurate skeletal motions and 23,337 natural language descriptions. On the algorithm side, we carefully tailor the motion diffusion model to our two-person interaction setting. To handle the symmetry of human identities during interactions, we propose two cooperative transformer-based denoisers that explicitly share weights, with a mutual attention mechanism to further connect the two denoising processes. We then propose a novel representation for motion input in our interaction diffusion model, which explicitly formulates the global relations between the two performers in the world frame. We further introduce two novel regularization terms to encode spatial relations, equipped with a corresponding damping scheme during the training of our interaction diffusion model. Extensive experiments validate the effectiveness of InterGen (https://tr3e.github.io/intergen-page/). Notably, it generates more diverse and compelling two-person motions than previous methods and enables various downstream applications for human interactions.
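The shared-weight denoiser pair with mutual attention described in the abstract can be pictured with a short sketch. The PyTorch code below is a minimal illustration under our own assumptions, not the authors' implementation; every name and hyperparameter in it (MutualAttentionBlock, SharedDenoiser, dim, depth) is hypothetical.

```python
# Minimal sketch of a shared-weight, mutually attentive two-person denoiser.
# Not the authors' code: all names and hyperparameters here are assumptions.
import torch
import torch.nn as nn


class MutualAttentionBlock(nn.Module):
    """Self-attention over one person's motion tokens plus cross-attention
    to the other person's tokens, coupling the two denoising streams."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        # x, other: (batch, frames, dim) noisy motion features of persons A and B
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x)
        # Mutual attention: person A's tokens attend to person B's (and vice
        # versa when the block is reused with the arguments swapped).
        x = x + self.cross_attn(h, other, other, need_weights=False)[0]
        return x + self.ff(self.norm3(x))


class SharedDenoiser(nn.Module):
    """A single stack of blocks applied to BOTH persons, i.e. the two
    'cooperative denoisers' explicitly share all weights."""

    def __init__(self, dim: int = 256, depth: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(MutualAttentionBlock(dim) for _ in range(depth))

    def forward(self, xa: torch.Tensor, xb: torch.Tensor):
        for blk in self.blocks:
            # Compute both updates from the same pre-update states, then swap roles.
            xa, xb = blk(xa, xb), blk(xb, xa)
        return xa, xb


# Usage: jointly denoise two 120-frame motion feature sequences.
model = SharedDenoiser()
xa = torch.randn(2, 120, 256)
xb = torch.randn(2, 120, 256)
ya, yb = model(xa, xb)
print(ya.shape, yb.shape)  # torch.Size([2, 120, 256]) twice
```

Applying the same block stack to both persons with the arguments swapped is what realizes the explicit weight sharing across the symmetric identities; the cross-attention call is where the two denoising processes exchange information at every layer.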
Keywords
Motion synthesis, Multimodal generation, Diffusion model, Text-driven generation