On the Content Bias in Fréchet Video Distance
arXiv (2024)
Abstract
Fréchet Video Distance (FVD), a prominent metric for evaluating video
generation models, is known to occasionally conflict with human perception. In
this paper, we aim to explore the extent of FVD's bias toward per-frame quality
over temporal realism and identify its sources. We first quantify FVD's
sensitivity to the temporal axis by decoupling frame quality from motion quality, and
find that FVD increases only slightly even under large temporal corruption. We
then analyze generated videos and show that, by carefully sampling from a
large set of generated videos that contain no motion, one can drastically
decrease FVD without improving temporal quality. Both studies point to FVD's
bias toward the quality of individual frames. We further observe that the bias
can be attributed to the features extracted from a supervised video classifier
trained on a content-biased dataset. We show that FVD with features extracted
from recent large-scale self-supervised video models is less biased toward
image quality. Finally, we revisit a few real-world examples to validate our
hypothesis.
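
For readers unfamiliar with how a Fréchet-style metric is computed, the following is a minimal sketch, not the authors' implementation. It assumes video features have already been extracted by some backbone (the extractor itself is not shown) and fits a Gaussian to each feature set before comparing them. The function name `frechet_distance` and the array shapes are illustrative assumptions.

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """Squared Frechet distance between Gaussians fitted to two feature sets.

    Each input is an (n_videos, feature_dim) array of precomputed features;
    the feature extractor is assumed to exist elsewhere and is not shown.
    """
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    sigma_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    sigma_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))

    # Tr((S_r S_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of S_r S_g, which are real and non-negative for
    # covariance matrices (tiny negative values are numerical noise).
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 4))   # stand-in for real-video features
    shifted = real + 3.0               # same covariance, shifted mean
    print(frechet_distance(real, real))
    print(frechet_distance(real, shifted))
```

Because the distance depends only on the mean and covariance of the features, whatever the feature extractor ignores (for example, motion) cannot influence the score, which is why the choice of backbone matters for the bias discussed above.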