Rate-Distortion Optimization for Cross Modal Compression

2023 Data Compression Conference (DCC)(2023)

引用 0|浏览11
暂无评分
摘要
Recently, cross modal compression (CMC) is proposed to compress highly redundant visual data into a compact, common, human-comprehensible domain (such as text) to preserve semantic fidelity for semantic-related applications. However, CMC only achieves a certain level of semantic fidelity at a constant rate, and the model aims to optimize the probability of the ground truth text but not directly semantic fidelity. To tackle the problems, we propose a novel scheme named rate-distortion optimized CMC (RDO-CMC). Specifically, we model the text generation process as a Markov decision process and propose rate-distortion reward which is used in reinforcement learning to optimize text generation. In rate-distortion reward, the distortion measures both the semantic fidelity and naturalness of the encoded text. The rate for the text is estimated by the sum of the amount of information of all the tokens in the text since the amount of information of each token is a lower bound of coding bits. Experimentally, RDO-CMC effectively controls the rate in the CMC framework and achieves competitive performance on MSCOCO dataset.
更多
查看译文
关键词
Cross modal compression,Rate distortion optimization
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要