SVGC-AVA: 360-Degree Video Saliency Prediction With Spherical Vector-Based Graph Convolution and Audio-Visual Attention

IEEE TRANSACTIONS ON MULTIMEDIA(2024)

Cited 0|Views20
No score
Abstract
Viewers of 360-degree videos are provided with both visual modality to characterize their surrounding views and audio modality to indicate the sound direction. Though both modalities are important for saliency prediction, little work has been done by jointly exploiting them, which is mainly due to the lack of audio-visual saliency datasets and insufficient exploitation of the multi-modality. In this article, we first construct an audio-visual saliency dataset with 57 360-degree videos watched by 63 viewers. Through a deep analysis of the constructed dataset, we find that the human gaze can be attracted by the auditory cues, resulting in a more concentrated saliency map if the sound source's location is further provided. To jointly exploit the visual and audio features and their correlation, we further design a saliency prediction network for 360-degree videos (SVGC-AVA) based on spherical vector-based graph convolution and audio-visual attention. The proposed spherical vector-based graph convolution can process visual and audio features directly in the sphere domain, thus avoiding projection distortion incurred by traditional CNN-based predictors. In addition, the audio-visual attention scheme explores self-modal and cross-modal correlation for both modalities, which are further hierarchically processed with the U-Net's multi-scale structure of SVGC-AVA. Evaluations on both our and public datasets validate that SVGC-AVA can achieve higher prediction accuracy, both qualitatively and subjectively.
More
Translated text
Key words
Visualization,Feature extraction,Convolution,Streaming media,Correlation,Position measurement,Kernel,360-degree videos,saliency prediction,spherical vector-based graph convolution,audio-visual attention
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined