Generalized People Diversity: Learning a Human Perception-Aligned Diversity Representation for People Images
CoRR (2024)
Abstract
Capturing the diversity of people in images is challenging: recent literature
tends to focus on diversifying one or two attributes, which requires either
expensive attribute labels or building classifiers. We introduce a diverse people image
ranking method which more flexibly aligns with human notions of people
diversity in a less prescriptive, label-free manner. The Perception-Aligned
Text-derived Human representation Space (PATHS) aims to capture all or many
relevant features of people-related diversity, and, when used as the
representation space in the standard Maximal Marginal Relevance (MMR) ranking
algorithm, is better able to surface a range of types of people-related
diversity (e.g. disability, cultural attire). PATHS is created in two stages.
First, a text-guided approach is used to extract a person-diversity
representation from a pre-trained image-text model. Then this representation is
fine-tuned on perception judgments from human annotators so that it captures
the aspects of people-related similarity that humans find most salient.
Empirical results show that the PATHS method achieves diversity better than
baseline methods, according to side-by-side ratings from human annotators.
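
To make the ranking step concrete, here is a minimal sketch of generic greedy Maximal Marginal Relevance over an arbitrary embedding space. It is not the authors' implementation: the function names `cosine` and `mmr_rank`, the trade-off parameter `lam`, the use of cosine similarity, and the toy vectors are all assumptions made for illustration. In the setting described above, the embeddings would presumably be PATHS representations of the candidate images.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_rank(relevance, embeddings, k, lam=0.5):
    """Greedy MMR: return indices of up to k items, in selection order.

    relevance  -- per-item relevance scores (assumed precomputed)
    embeddings -- per-item vectors in the diversity representation space
    lam        -- hypothetical trade-off weight: 1.0 ranks by relevance only,
                  lower values penalize similarity to already selected items
    """
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:

        def score(i):
            # Redundancy is the highest similarity to any item picked so far.
            redundancy = max(
                (cosine(embeddings[i], embeddings[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1.0 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: items 0 and 1 are near-duplicates, item 2 is different.
toy_relevance = [0.9, 0.85, 0.8]
toy_embeddings = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
print(mmr_rank(toy_relevance, toy_embeddings, k=2, lam=0.5))
```

With `lam=0.5` the near-duplicate item is skipped in favor of the dissimilar one, whereas `lam=1.0` reduces to a plain relevance ranking. The quality of the diversity this produces depends entirely on what the embedding space treats as similar, which is the role the abstract assigns to PATHS.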