Automatic ultrasound image report generation with adaptive multimodal attention mechanism

Neurocomputing (2021)

Abstract
Writing text reports for medical images is a fundamental task for diagnosis and treatment in clinical medicine. However, this work is tedious and time-consuming because of the special features of such reports (e.g., boundary conditions and fixed templates). Existing works mainly adopt image captioning methods for medical report generation, but these models do not fully account for the special report features. This paper proposes an Adaptive Multimodal Attention network (AMAnet) to generate high-quality medical image reports. First, a Multi-Label Classification network is designed to predict the essential local properties; the word embedding vectors of these properties then serve as semantic features to aid report generation. Second, we develop a semantic attention mechanism that imitates spatial attention. Third, we introduce an adaptive attention mechanism with a sentinel gate that controls how much attention is paid to the current visual features versus the language model memories when generating the next word. Experimental results demonstrate that AMAnet outperforms state-of-the-art image captioning methods, with a CIDEr score improvement of more than 1.
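The adaptive attention step described in the abstract can be illustrated with a minimal sketch. The abstract gives no equations, so this is not AMAnet's actual formulation: it assumes the generic sentinel-gate scheme from the image captioning literature, in which a sentinel vector (standing in for the language model memory) competes with the visual region features inside a single softmax. All names here (`adaptive_attention`, `region_feats`, `sentinel`, `sentinel_score`) are hypothetical, and the attention scores are taken as given rather than computed by a learned network.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_attention(region_feats, region_scores, sentinel, sentinel_score):
    """Blend visual region features with a sentinel (language memory) vector.

    Assumption: the sentinel score is appended to the region scores and one
    softmax is taken over all of them. The weight that lands on the sentinel
    is the gate value beta: beta near 1 means the next word relies mostly on
    language memory, beta near 0 means it relies mostly on the image.
    """
    weights = softmax(list(region_scores) + [sentinel_score])
    beta = weights[-1]
    vectors = list(region_feats) + [sentinel]
    dim = len(sentinel)
    # Weighted sum of all candidate vectors, one dimension at a time.
    context = [
        sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)
    ]
    return context, beta


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]  # two toy visual region features
    memory = [0.5, 0.5]                 # toy sentinel vector
    context, beta = adaptive_attention(regions, [0.0, 0.0], memory, 0.0)
    print(context, beta)
```

With equal scores, the gate splits attention evenly between the two regions and the sentinel; raising `sentinel_score` shifts the context vector toward the sentinel, which is the behavior the abstract attributes to the sentinel gate.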
Keywords
Medical report generation, Spatial attention, Semantic attention, Adaptive attention