Hardness of Learning Boolean Functions from Label Proportions
Foundations of Software Technology and Theoretical Computer Science (2024)
Abstract
In recent years, the framework of learning from label proportions (LLP) has
been gaining importance in machine learning. In this setting, the training
examples are aggregated into subsets, or bags, and only the average label per
bag is available for learning an example-level predictor. This generalizes
traditional PAC learning, which is the special case of unit-sized bags. The
computational learning aspects of LLP were studied in recent works (Saket,
NeurIPS'21; Saket, NeurIPS'22), which gave algorithms and hardness results for
learning halfspaces in the LLP setting. In this work we focus on the
intractability of LLP learning Boolean functions. Our first result shows that,
given a collection of bags of size at most 2 which are consistent with an OR
function, it is NP-hard to find a CNF of constantly many clauses which
satisfies any constant fraction of the bags. This is in contrast with the work
of Saket (NeurIPS'21), which gave a (2/5)-approximation for learning ORs
using a halfspace. Thus, our result provides a separation between constant-
clause CNFs and halfspaces as hypotheses for LLP learning ORs.
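
To make the setup concrete, here is a minimal Python sketch of bags consistent
with an OR function. The names (or_function, bag_proportion) and the notion of
a hypothesis "satisfying" a bag, namely matching the bag's label proportion
exactly, are our illustrative assumptions, not code from the paper.

# Illustrative LLP sketch (our assumptions): a hypothesis satisfies a bag
# if its average predicted label equals the bag's given label proportion.

def or_function(relevant):
    # Target concept: OR over the coordinates in `relevant`.
    return lambda x: int(any(x[i] for i in relevant))

def bag_proportion(h, bag):
    # Average label that hypothesis h assigns to the bag.
    return sum(h(x) for x in bag) / len(bag)

# Bags of size at most 2 over {0,1}^3, labelled by the OR of coordinates {0, 1}.
target = or_function({0, 1})
bags = [[(0, 0, 1), (1, 0, 0)], [(0, 1, 1)]]
proportions = [bag_proportion(target, bag) for bag in bags]  # [0.5, 1.0]

# Fraction of bags satisfied by a (wrong) candidate: the OR of coordinate 0 only.
candidate = or_function({0})
satisfied = sum(bag_proportion(candidate, b) == p for b, p in zip(bags, proportions))
print(satisfied / len(bags))  # 0.5: the first bag is matched, the second is not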
Next, we prove the hardness of satisfying more than a (1/2 + o(1))-fraction of
such bags using a t-DNF (i.e., a DNF in which each term has ≤ t literals) for
any constant t. In usual PAC learning such a hardness was known (Khot-Saket,
FOCS'08) only for learning noisy ORs. We also study the learnability of
parities and show that it is NP-hard to satisfy more than a (q/2^{q-1} +
o(1))-fraction of q-sized bags which are consistent with a parity using a
parity, while a random-parity-based algorithm achieves a
(1/2^{q-2})-approximation.
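
As a rough illustration of the random-parity baseline mentioned above, the
sketch below samples a uniformly random parity (a XOR over a random subset of
coordinates) and measures the fraction of bags whose label proportion it
matches. The uniform sampling and the exact-match satisfaction criterion are
our assumptions and need not coincide with the algorithm analyzed in the paper.

# Hedged sketch (our assumptions, not the paper's algorithm): sample a random
# parity and count the fraction of bags whose label proportion it reproduces.
import random

def random_parity(n):
    # A random parity: XOR over a uniformly random subset of n coordinates.
    subset = [i for i in range(n) if random.random() < 0.5]
    return lambda x: sum(x[i] for i in subset) % 2

def fraction_satisfied(h, bags, proportions):
    hits = sum(sum(h(x) for x in bag) / len(bag) == p
               for bag, p in zip(bags, proportions))
    return hits / len(bags)

def target(x):
    # Bags below are labelled by the parity of coordinates {0, 2}.
    return (x[0] + x[2]) % 2

# Bags of size q = 2 over {0,1}^4, consistent with the target parity.
bags = [[(1, 0, 0, 0), (0, 0, 1, 1)], [(1, 1, 1, 0), (0, 0, 0, 0)]]
proportions = [sum(target(x) for x in bag) / len(bag) for bag in bags]

random.seed(0)
print(fraction_satisfied(random_parity(4), bags, proportions))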