ControlCap: Controllable Captioning via No-Fuss Lexicon

Qiujie Xie, Qiming Feng, Yuejie Zhang, Rui Feng, Tao Zhang, Shang Gao

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2024)

Abstract
Controllable captioning has received much attention in recent years. Although substantial progress has been made, existing methods still face challenges such as high training costs, intricate control signals, and limited control capabilities. To address these issues, we propose a straightforward and unified framework called ControlCap. It uses a no-fuss lexicon as the control signal and controls the style and content of visual descriptions through Soft Guidance (a global guide to the caption distribution) and Hard Force (integrating signals without additional training). Extensive experiments, both quantitative and qualitative, have been conducted on three benchmark captioning tasks. Results demonstrate the control ability of ControlCap: it can produce controlled captions that are coherent and diverse while keeping the core content intact.
Keywords
Captioning, Controllable Generation, No-fuss Lexicon