Self-Supervised RGB-NIR Fusion Video Vision Transformer Framework for rPPG Estimation

IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT(2022)

引用 3|浏览10
暂无评分
摘要
Remote photoplethysmography (rPPG) is a technology that can estimate noncontact heart rate (HR) using facial videos. Estimating rPPG signals requires low cost, and thus, it is widely used for noncontact health monitoring. Recent HR estimation studies based on rPPG heavily rely on the supervised feature learning on normal RGB videos. However, the RGB-only methods are significantly affected by head movements and various illumination conditions, and it is difficult to obtain large-scale labeled data for rPPG to determine the performance of supervised learning methods. To address these problems, we present the first of its kind self-supervised transformer-based fusion learning framework for rPPG estimation. In our study, we propose an end-to-end fusion video vision transformer (Fusion ViViT) network that can extract long-range local and global spatiotemporal features from videos and convert them into video sequences to enhance the rPPG representation. In addition, the self-attention of the transformer integrates the spatiotemporal representations of complementary RGB and near-infrared (NIR), which, in turn, enable robust HR estimation even under complex conditions. We use contrastive learning as a self-supervised learning (SSL) scheme. We evaluate our framework on public datasets containing both RGB, NIR videos and physiological signals. The result of near-instant HR (approximately 6 s) estimation on the large-scale rPPG dataset with various scenarios was 14.86 of root mean squared error (RMSE), which was competitive with the state-of-the-art accuracy of average HR (approximately 30 s). Furthermore, transfer learning results on the driving rPPG dataset showed a stable HR estimation performance with 16.94 of RMSE, demonstrating that our framework can be utilized in the real world.
更多
查看译文
关键词
Near-infrared (NIR),remote heart rate (HR) measurement,remote photoplethysmography (rPPG),RGB,self-supervised learning (SSL),video vision transformer (ViViT)
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要