Volumetric video – acquisition, interaction, streaming and rendering

Immersive Video Technologies (2023)

Abstract
Due to rapid advances in extended reality (XR) technologies and related devices (e.g., tablets, headsets), realistic 3D representations of humans have attracted significant attention, especially in use cases where a convincing visualization of humans is essential. Relevant use cases include interactive teaching and training, new formats for film and cinema, immersive experiences with contemporary witnesses, meet-ups with celebrities, and many more. However, current character animation techniques often do not provide the necessary level of realism. The motion capture process is time-consuming and cannot represent all the detailed motions of a person, especially facial expressions and the motion of clothing and objects. Volumetric video is therefore regarded worldwide as the next important development in media production, especially in the context of the rapidly evolving virtual and augmented reality markets, where it is becoming a key technology. Although volumetric video offers high visual quality and realism, interaction with the virtual humans is restricted to free-viewpoint navigation through a 3D virtual scene. It would therefore be highly desirable to enable manipulation and animation of the volumetric content, similar to classical computer graphics models. We achieve this by automatic rigging of a computer graphics (CG) template mesh that serves as a semantic annotation and can be used as an input interface to modify and animate body and face poses. In addition, facial expressions and lip movements can be synthesized from text or speech, allowing the virtual character to react to user behavior. This enables the creation of realistic virtual characters that can interact with potential users in novel application formats and fields, such as e-learning, tele-collaboration, or human-machine interaction. The 3D representation of volumetric video, however, places high demands on storage and network capacities.
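The rigging described above is what makes the captured mesh animatable: each vertex of the template is weighted to skeleton joints, so posing the joints deforms the geometry. A minimal sketch of that idea using linear blend skinning, a standard skinning technique (not necessarily the chapter's exact formulation); the 2D setup and all names are illustrative:

```python
# Illustrative 2D linear blend skinning (LBS): a rigged vertex is deformed
# as a weighted sum of its position under each joint's transform.
# Generic textbook technique, not the authors' specific rigging pipeline.

import math

def rot2d(theta):
    """2x2 rotation matrix for angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def apply(mat, v):
    """Apply a 2x2 matrix to a 2D point."""
    return (mat[0][0] * v[0] + mat[0][1] * v[1],
            mat[1][0] * v[0] + mat[1][1] * v[1])

def skin(vertex, joint_transforms, weights):
    """LBS: blend the per-joint transformed positions by skinning weights."""
    x = y = 0.0
    for mat, w in zip(joint_transforms, weights):
        px, py = apply(mat, vertex)
        x += w * px
        y += w * py
    return (x, y)

# A vertex weighted equally to two joints: one at rest (identity rotation),
# one bent by 90 degrees. The skinned position blends both poses.
rest = rot2d(0.0)
bent = rot2d(math.pi / 2)
v = skin((1.0, 0.0), [rest, bent], [0.5, 0.5])
print(round(v[0], 3), round(v[1], 3))  # 0.5 0.5
```

Changing the joint angles alone re-poses the vertex, which is exactly the property that turns a static volumetric capture into an animatable character.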
This requires efficient coding and streaming of the 3D mesh and texture sequences. In addition, the content needs to be rendered on virtual reality (VR) and augmented reality (AR) end-user devices with restricted computational capabilities. With our split-rendering approach, we decode the content and render stereoscopic views on a nearby server, streaming these views as 2D video to the glasses. In this way, even high-resolution content can be displayed on devices with limited computational resources.
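The split-rendering idea can be sketched as a simple loop: the device reports its head pose, the server renders and encodes stereoscopic views, and the device only decodes and displays 2D video. All names below (`Pose`, `render_stereo_view`, `encode_frame`, `ThinClient`) are hypothetical stand-ins for illustration, not the authors' actual API:

```python
# Hedged sketch of split rendering: the heavy 3D decoding/rendering happens
# on a nearby server; the XR device receives only lightweight 2D video.
# All classes and functions here are illustrative placeholders.

from dataclasses import dataclass

@dataclass
class Pose:
    """Head pose reported by the XR device."""
    position: tuple
    yaw_deg: float

def render_stereo_view(pose):
    # Stand-in for server-side volumetric rendering: one 2D image per eye
    # for the given head pose (labeled placeholder strings here).
    return {
        "left":  f"frame@{pose.position}/yaw={pose.yaw_deg}/eye=L",
        "right": f"frame@{pose.position}/yaw={pose.yaw_deg}/eye=R",
    }

def encode_frame(views):
    # Stand-in for 2D video encoding (H.264/H.265 in a real system).
    return repr(sorted(views.items())).encode()

class ThinClient:
    """Device with limited compute: it only decodes and displays 2D video."""
    def __init__(self):
        self.displayed = []

    def receive(self, packet):
        self.displayed.append(packet.decode())

# One iteration of the loop: pose up, rendered + encoded stereo frame down.
client = ThinClient()
pose = Pose(position=(0.0, 1.6, 2.0), yaw_deg=15.0)
client.receive(encode_frame(render_stereo_view(pose)))
print(len(client.displayed))  # 1
```

The design trade-off is latency for compute: the device never touches the mesh and texture sequences, but the round trip to the server must stay short enough for comfortable head-tracked viewing.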