Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models
arXiv (2023)
Abstract
We introduce a Depicted image Quality Assessment method (DepictQA),
overcoming the constraints of traditional score-based methods. DepictQA allows
for detailed, language-based, human-like evaluation of image quality by
leveraging Multi-modal Large Language Models (MLLMs). Unlike conventional Image
Quality Assessment (IQA) methods relying on scores, DepictQA interprets image
content and distortions descriptively and comparatively, aligning closely with
humans' reasoning process. To build the DepictQA model, we establish a
hierarchical task framework, and collect a multi-modal IQA training dataset. To
tackle the challenges of limited training data and multi-image processing, we
propose to use multi-source training data and specialized image tags. With these
designs, DepictQA outperforms score-based approaches
on multiple benchmarks. Moreover, compared with general MLLMs, DepictQA
generates more accurate descriptive reasoning. Our work demonstrates
the utility of our full-reference dataset in non-reference applications, and
indicates that language-based IQA methods have the potential to be customized
for individual preferences.