See, hear, read: Leveraging multimodality with guided attention for abstractive text summarization

Knowledge-Based Systems (2021)

Abstract
In recent years, abstractive text summarization with multimodal inputs has started drawing attention due to its ability to accumulate information from different source modalities and generate a fluent textual summary. However, existing methods use short videos as the visual modality and short summaries as the ground truth, and therefore perform poorly on lengthy videos and long ground-truth summaries. Additionally, there exists no benchmark dataset to generalize this task on videos of varying lengths.
Keywords
Abstractive text summarization, Multimodality, Attention, Factorized multimodal transformer, Language model