DSFNet: dynamic selection-fusion networks for video salient object detection
Multimedia Tools and Applications(2023)
摘要
How to effectively fuse spatiotemporal clues is the key to improve the accuracy of video salient object detection. Although most existing methods have achieved great success regarding fusion strategies, the issue of reliability of spatiotemporal clues needs further investigation, and the use of unreliable spatiotemporal clues can corrupt the final saliency results. In this work, we propose a novel dynamic selection-fusion network (DSFNet) for video salient object detection, and DSFNet is jointly constructed by two branches. The one is the spatial learning network, which completes the learning of video sequences to obtain the spatial saliency of images. The other is the spatiotemporal contrast network, which creatively obtains the dynamic spatiotemporal saliency in the synchronized state by learning the video sequence and the corresponding optical flow images. To further screen and fuse the spatiotemporal clues, a series of joint modules for selection were developed, mainly including contrast transformation module (CTM), contrast analysis module (CAM) and selection guidance module (SGM) which play an important role in selecting spatiotemporal features. In addition, a fusion refinement module (FRM) is designed to further refine and enhance the input features. The experimental results show that the proposed method is significantly better than other algorithms in solving the problem of motion information distortion and spatiotemporal salient irrelevance.
更多查看译文
关键词
Video salient object detection,Dynamic selection-fusion network,Spatiotemporal features
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要