Faster Algorithms for Fair Max-Min Diversification in ℝ^d
arXiv (2024)
Abstract
Extracting a diverse subset from a dataset, often referred to as
maximum diversification, is central to many real-world applications.
In this work, we study fairness-aware data subset selection, focusing on the
problem of selecting a diverse set of size k from a large collection of n
data points (FairDiv).
The FairDiv problem is well studied in the data management and theory
communities. In this work, we develop the first constant approximation algorithm
for FairDiv that runs in near-linear time using only linear space. In contrast,
all previously known constant approximation algorithms run in super-linear time
(with respect to n or k) and use super-linear space. Our approach achieves
this efficiency by employing a novel combination of the Multiplicative Weight
Update method and advanced geometric data structures to implicitly and
approximately solve a linear program. Furthermore, we improve the efficiency of
our techniques by constructing a coreset. Using our coreset, we also propose
the first efficient streaming algorithm for the FairDiv problem whose
efficiency does not depend on the distribution of data points. Empirical
evaluation on million-sized datasets demonstrates that our algorithm achieves
the best diversity within a minute. All prior techniques are either highly
inefficient or do not generate a good solution.
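
To make the objective concrete, the following is a minimal sketch of the problem statement only, not of the near-linear-time algorithm described above. It assumes the common formulation in which every point carries a group label and the selection must contain a fixed number of points from each group, with diversity measured as the smallest pairwise Euclidean distance. The names `diversity`, `is_fair`, `brute_force_fairdiv`, and the toy data are hypothetical, and the exhaustive search is exponential, so it is suitable only for tiny inputs.

```python
from collections import Counter
from itertools import combinations
from math import dist


def diversity(points):
    """Max-min objective: the smallest pairwise Euclidean distance."""
    return min(dist(p, q) for p, q in combinations(points, 2))


def is_fair(groups, quotas):
    """True if the selection holds exactly quotas[g] points of each group g."""
    counts = Counter(groups)
    return all(counts[g] == q for g, q in quotas.items())


def brute_force_fairdiv(data, quotas):
    """Exhaustive search over all subsets of size k (assumed k >= 2).

    data   : list of (point, group) pairs; an assumed input layout
    quotas : dict mapping group -> number of points required from it
    """
    k = sum(quotas.values())
    best, best_val = None, -1.0
    for subset in combinations(data, k):
        if not is_fair([g for _, g in subset], quotas):
            continue
        val = diversity([p for p, _ in subset])
        if val > best_val:
            best, best_val = subset, val
    return best, best_val


# Toy instance: two groups, one point required from each.
data = [((0, 0), "a"), ((1, 0), "a"), ((10, 0), "b"), ((0.5, 0), "b")]
quotas = {"a": 1, "b": 1}
best, best_val = brute_force_fairdiv(data, quotas)
```

On this toy instance the search selects one point from each group so that the two chosen points are as far apart as possible. The brute force examines every size-k subset, which is exactly the cost that approximation algorithms for FairDiv are designed to avoid.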