SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples
arxiv(2023)
摘要
While vision-language models (VLMs) have achieved remarkable performance
improvements recently, there is growing evidence that these models also posses
harmful biases with respect to social attributes such as gender and race. Prior
studies have primarily focused on probing such bias attributes individually
while ignoring biases associated with intersections between social attributes.
This could be due to the difficulty of collecting an exhaustive set of
image-text pairs for various combinations of social attributes. To address this
challenge, we employ text-to-image diffusion models to produce counterfactual
examples for probing intersectional social biases at scale. Our approach
utilizes Stable Diffusion with cross attention control to produce sets of
counterfactual image-text pairs that are highly similar in their depiction of a
subject (e.g., a given occupation) while differing only in their depiction of
intersectional social attributes (e.g., race gender). Through our
over-generate-then-filter methodology, we produce SocialCounterfactuals, a
high-quality dataset containing 171k image-text pairs for probing
intersectional biases related to gender, race, and physical characteristics. We
conduct extensive experiments to demonstrate the usefulness of our generated
dataset for probing and mitigating intersectional social biases in
state-of-the-art VLMs.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要