Chameleon: Foundation Models for Fairness-aware Multi-modal Data Augmentation to Enhance Coverage of Minorities
CoRR (2024)
Abstract
The potential harms of the under-representation of minorities in training
data, particularly in multi-modal settings, are a well-recognized concern. While
there has been extensive effort in detecting such under-representation,
resolution has remained a challenge. With recent advancements in generative AI,
large language models and foundation models have emerged as versatile tools
across various domains. In this paper, we propose Chameleon, a system that
efficiently utilizes these tools to augment a data set with a minimal addition
of synthetically generated tuples, in order to enhance the coverage of the
under-represented groups. Our system follows a rejection sampling approach to
ensure the generated tuples are of high quality and follow the underlying
distribution. In order to minimize the rejection chance of the generated
tuples, we propose multiple strategies for providing a guide for the foundation
model. Our experimental results, in addition to confirming the efficiency of our
proposed algorithms, illustrate the effectiveness of our approach, as the
unfairness of the model in a downstream task significantly dropped after data
repair using Chameleon.
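
The rejection sampling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `augment_for_coverage`, the coverage `threshold`, and the `generate` and `accept` callables are hypothetical stand-ins for the foundation-model call and the quality/distribution test, which the abstract does not specify. The sketch only shows the general loop: generate a candidate tuple for an under-covered group, keep it if it passes the test, and stop as soon as the group reaches the threshold so that as few synthetic tuples as possible are added.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


def augment_for_coverage(
    data: List[Record],
    groups: Iterable[str],
    group_of: Callable[[Record], str],
    threshold: int,
    generate: Callable[[str], Record],   # hypothetical: wraps a generative model
    accept: Callable[[Record], bool],    # hypothetical: quality/distribution test
    max_attempts: int = 1000,
) -> List[Record]:
    """Return synthetic records that lift each group to `threshold` records.

    Candidates are drawn with `generate` and kept only if `accept` passes
    (rejection sampling). Generation for a group stops once it is covered
    or after `max_attempts` tries.
    """
    counts = Counter(group_of(r) for r in data)
    added: List[Record] = []
    for group in groups:
        attempts = 0
        while counts[group] < threshold and attempts < max_attempts:
            attempts += 1
            candidate = generate(group)
            # Reject candidates that miss the target group or fail the test.
            if group_of(candidate) == group and accept(candidate):
                added.append(candidate)
                counts[group] += 1
    return added
```

In this sketch, every rejected candidate is a wasted generation call, which is why lowering the rejection chance (for example by guiding the generator, as the abstract proposes) directly reduces cost.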