Improving Radiology Report Generation with D2-Net: When Diffusion Meets Discriminator

ICASSP 2024 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

Abstract
Radiology report generation (RRG) aims to automatically provide observations on and insights into a patient's condition from radiology images, which can greatly reduce physicians' workload while maintaining the quality of medical care. Existing works use a Transformer decoder to generate reports word by word. However, unlike image captions, radiology reports are long texts containing many semantic words; autoregressive approaches such as Transformer-based methods accumulate errors during generation and produce unsatisfactory reports. Building on the recent success of diffusion models, we propose a novel diffusion-based paradigm for RRG that uses visual information as a condition, so that the generation process focuses on pathological features within the radiology image. Meanwhile, we integrate a discriminator into each layer of the diffusion model to actively judge whether the generated words are meaningful, which, on the one hand, controls the length of predicted reports and, on the other, calibrates confidence scores and token generation results, improving the quality of the generated reports. Extensive experimental results demonstrate the superiority of our proposed method. Source code is available at: https://github.com/Yuda-Jin/D-2-Net.
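Very loosely, the discriminator-gated reverse diffusion the abstract describes can be sketched as follows. This is a toy illustration only: the sizes, the interpolation-style "denoiser," and the distance-based discriminator heuristic are all assumptions for demonstration, not the paper's D2-Net implementation (which uses learned networks).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes -- illustrative only; none of these match the paper's setup.
VOCAB, DIM, LEN, STEPS = 20, 8, 6, 10
E = rng.normal(size=(VOCAB, DIM))        # token embedding table
img_feat = rng.normal(size=(DIM,))       # pooled visual feature (the condition)

def denoise(x, cond, step, steps):
    """Stand-in for one visually conditioned denoising call: interpolate the
    noisy embeddings toward the condition as steps progress. A real model
    would run a learned Transformer layer here."""
    alpha = (step + 1) / steps
    return (1 - alpha) * x + alpha * cond

def discriminator(x):
    """Toy per-position discriminator: probability that a position holds a
    meaningful word, here a heuristic on distance to the visual condition."""
    d = np.linalg.norm(x - img_feat, axis=-1)
    return 1.0 / (1.0 + np.exp(d - 1.0))

# Reverse diffusion from pure noise, with the discriminator masking
# low-confidence positions after every layer -- this is the mechanism that
# both calibrates token confidence and controls report length.
x = rng.normal(size=(LEN, DIM))
for step in range(STEPS):
    x = denoise(x, img_feat, step, STEPS)
    p = discriminator(x)
    x = np.where(p[:, None] > 0.5, x, 0.0)  # zero out "meaningless" slots

# Round surviving embeddings to their nearest vocabulary tokens.
keep = discriminator(x) > 0.5
dists = np.linalg.norm(x[:, None, :] - E[None, :, :], axis=-1)
report = dists.argmin(axis=-1)[keep]     # token ids of the generated report
```

The point of the sketch is the control flow: generation is iterative refinement of a whole sequence in parallel (not left-to-right decoding), and a discriminator check after each layer decides which positions survive as words.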
Keywords
Radiology Reports,Medical Imaging,Image Captioning,Semantic Word,Autoregressive Method,Transformer Decoder,Computational Cost,Chest X-ray,Visual Features,Autoregressive Model,Attention Mechanism,Reversible Process,Diffusion Model,Learnable Parameters,Hidden State,Inference Time,Decoder Layer,Text Generation,Transformer Encoder,Chest X-ray Images