Achieving the Optimum Rate for Cross-Modal Source Coding

IEEE Transactions on Multimedia (2024)

Abstract

Multi-modal applications are expected to dominate in the 5G and B5G era. However, traditional source coding methods are neither efficient nor reliable, because they neglect the semantic redundancy and mutual influence between sources of different modalities. To address this, cross-modal source coding (CMSC) has been proposed as a promising solution. Two main challenges remain: determining the optimum rate of CMSC under delay and reliability constraints, and designing a practical CMSC scheme that operates near the optimum rate. To tackle these challenges, this paper studies the optimum source coding rate of CMSC and its practical implementation. On the theoretical side, an $(n,\epsilon)$-achievable rate region is derived, representing the source coding rates subject to a fixed blocklength $n$ and a target error probability $\epsilon$. Additionally, the optimum source coding rate can be approximated by calculating the infimum of the $(n,\epsilon)$-achievable rate region with a rate dispersion function. On the technical side, a general implementation for CMSC is proposed, which fully leverages channel coding and artificial intelligence (AI) semantic analysis to achieve the optimum rate. Numerical results demonstrate that, when the multi-modal sources are semantically correlated, CMSC achieves a 50% improvement in theory and a 37.5% improvement in practice over the baseline model abstracted from traditional schemes.

Keywords

Cross-modal, source coding, semantic relevance, achievable rate region, optimum rate, video and haptic coding
