Knowledge-integrated Multi-modal Movie Turning Point Identification

Depei Wang, Ruifeng Xu, Lianglun Cheng, Zhuowei Wang

ACM Transactions on Multimedia Computing, Communications, and Applications (2024)

Abstract
The rapid development of artificial intelligence provides rich technologies and tools for the automated understanding of literary works. As a comprehensive carrier of storylines, movies are natural multimodal data sources that offer a sufficient data foundation, and how to fully exploit these data remains a sustained research hotspot. Moreover, the efficient representation of multi-source data poses new challenges for information fusion. We therefore propose a knowledge-enhanced turning point identification (KTPi) method for multimodal scene recognition. First, a BiLSTM encodes the scene text, integrating contextual information into scene representations to complete text sequence modeling. Then, all scenes are modeled as a graph, which strengthens long-range semantic dependencies between scenes, and the scene representations are enhanced with a graph convolutional network. Next, a self-supervised method is used to determine the optimal number of neighboring nodes in the sparse graph. Then, actor and verb knowledge extracted from the scene text is added to the multimodal data to enrich the scene feature representations. Finally, the KTPi model is trained with a teacher-student network strategy. Experimental results show that KTPi outperforms baseline methods on scene role recognition tasks, and ablation experiments show that incorporating knowledge into the multimodal model improves its performance.
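To make the described pipeline concrete, the sketch below illustrates the kind of architecture the abstract outlines: a BiLSTM encodes each scene's text, a k-nearest-neighbor scene graph is built over the resulting representations, and a single graph-convolution layer refines them before per-scene classification. This is a minimal PyTorch sketch under assumed shapes and hyperparameters, not the authors' implementation; the class name, dimensions, and the fixed k are illustrative, and the paper's knowledge integration, self-supervised neighbor selection, and teacher-student training are omitted.

```python
# Minimal sketch (not the authors' code): BiLSTM scene encoding, a k-NN scene
# graph, one GCN-style layer, and per-scene turning-point classification.
# All names, dimensions, and the value of k are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SceneTurningPointSketch(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=128, hid_dim=128, n_classes=2, k=5):
        super().__init__()
        self.k = k
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # BiLSTM encodes the token sequence of each scene.
        self.bilstm = nn.LSTM(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        # GCN-style layer: normalized adjacency @ features @ W.
        self.gcn = nn.Linear(2 * hid_dim, 2 * hid_dim)
        self.classifier = nn.Linear(2 * hid_dim, n_classes)

    def encode_scenes(self, scene_tokens):
        # scene_tokens: (num_scenes, max_len) token ids for one movie.
        out, _ = self.bilstm(self.embed(scene_tokens))
        return out.mean(dim=1)                       # (num_scenes, 2 * hid_dim)

    def build_knn_graph(self, feats):
        # Sparse scene graph: connect each scene to its k most similar scenes.
        sim = F.cosine_similarity(feats.unsqueeze(1), feats.unsqueeze(0), dim=-1)
        idx = sim.topk(self.k + 1, dim=-1).indices   # +1 keeps the self-loop
        adj = torch.zeros_like(sim).scatter_(1, idx, 1.0)
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        return adj / deg                             # row-normalized adjacency

    def forward(self, scene_tokens):
        feats = self.encode_scenes(scene_tokens)
        adj = self.build_knn_graph(feats)
        feats = F.relu(self.gcn(adj @ feats))        # graph convolution over scenes
        return self.classifier(feats)                # per-scene logits


# Usage: one movie with 12 scenes, each padded to 40 tokens.
model = SceneTurningPointSketch()
logits = model(torch.randint(0, 30000, (12, 40)))
print(logits.shape)  # torch.Size([12, 2])
```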
Keywords
Knowledge enhancement, multimodal representation, text tagging