# Fast Learning Requires Good Memory: A Time-Space Lower Bound for Parity Learning

2016 IEEE 57TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE (FOCS), pp. 266-275, 2016.

Keywords:

bounded storage cryptography; time-space tradeoffs; branching programs; parity learning; learning problem

Abstract:

We prove that any algorithm for learning parities requires either a memory of quadratic size or an exponential number of samples. This proves a recent conjecture of Steinhardt, Valiant and Wager [SVW15] and shows that for some learning problems a large storage space is crucial. More formally, in the problem of parity learning, an unknown string x ∈ {0,1}^n was chosen uniformly at random. A learner tries to learn x from a stream of samples (a_1, b_1), (a_2, b_2), ..., where each a_t is uniformly distributed over {0,1}^n and b_t is the inner product of a_t and x, modulo 2. We show that any algorithm for parity learning that uses less than n²/25 bits of memory requires an exponential number of samples.

Introduction

- Parity learning can be solved in polynomial time, by Gaussian elimination, using O(n) samples and O(n²) memory bits.
- For a vertex v in a branching program for parity learning, the authors denote by P_{x|v} the distribution of the random variable x, conditioned on the event that the vertex v was reached by the computation-path.
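The Gaussian-elimination upper bound can be made concrete. Below is a minimal sketch (our own illustration; the name `learn_parity` and the bit-packing are ours, not the paper's): each sample (a, b) is packed into one (n+1)-bit integer, so storing up to n independent rows takes O(n²) memory bits, and O(n) samples suffice to reach full rank with high probability.

```python
import random

def inner_product_mod2(a, x):
    """Inner product of two 0/1 vectors, modulo 2."""
    return sum(ai * xi for ai, xi in zip(a, x)) % 2

def learn_parity(samples, n):
    """Recover the hidden string x from samples (a, b) with
    b = <a, x> mod 2, by Gaussian elimination over GF(2).
    Each equation is packed into one (n+1)-bit integer (coefficient
    a[i] at bit i+1, label b at bit 0), so n independent rows take
    O(n^2) memory bits."""
    pivots = {}  # leading bit position -> reduced row
    for a, b in samples:
        row = b
        for i, ai in enumerate(a):
            row |= ai << (i + 1)
        # reduce the new row against existing pivot rows
        for col in range(n, 0, -1):
            if not (row >> col) & 1:
                continue
            if col in pivots:
                row ^= pivots[col]
            else:
                pivots[col] = row
                break
        if len(pivots) == n:
            break
    if len(pivots) < n:
        return None  # system not yet of full rank
    # back-substitution, solving pivot columns in increasing order
    x = [0] * n
    for col in sorted(pivots):
        row = pivots[col]
        b = row & 1
        for j in range(1, col):
            if (row >> j) & 1:
                b ^= x[j - 1]
        x[col - 1] = b
    return x
```

With roughly n independent samples the system becomes full rank and x is determined exactly; the point of the paper is that this quadratic memory footprint cannot be substantially reduced without an exponential blow-up in samples.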

Highlights

- There was no non-trivial lower bound on the number of samples needed, for any learning problem, even if the allowed memory size is O(n)
- We show an encryption scheme that requires a private key of length n, as well as time complexity of n per encryption/decryption of each bit, and is provably and unconditionally secure as long as the attacker uses fewer than n²/25 memory bits and the scheme is used at most an exponential number of times
- Parity learning can be solved in polynomial time, by Gaussian elimination, using O(n) samples and O(n²) memory bits
- We prove that any algorithm for parity learning requires either n²/25 memory bits, or an exponential number of samples
- We show that there exist concept classes that can be efficiently learnt from a polynomial number of samples, if the learner has access to a quadratic-size memory, but require an exponential number of samples if the memory used by the learner is of less than quadratic size
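The encryption scheme in the second bullet can be sketched as follows (our own illustrative code, under the assumption that each bit b is sent as a fresh parity sample of the shared key x, masked by b; the security guarantee is the paper's theorem, not anything the code itself demonstrates):

```python
import random

def encrypt_bit(x, b):
    """Encrypt one bit b under the shared key x ∈ {0,1}^n:
    output a fresh uniformly random vector a together with
    <a, x> + b (mod 2). Time is O(n) per bit."""
    a = [random.randint(0, 1) for _ in range(len(x))]
    c = (sum(ai * xi for ai, xi in zip(a, x)) + b) % 2
    return a, c

def decrypt_bit(x, a, c):
    """Recover b by subtracting <a, x> (mod 2)."""
    return (c + sum(ai * xi for ai, xi in zip(a, x))) % 2
```

An eavesdropper observes a stream of masked parity samples of x; by the paper's lower bound, an attacker with fewer than n²/25 memory bits would need an exponential number of such samples to recover x, which is what yields unconditional security in the bounded-storage model.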

Results

- Note that the edge (u, (v, s)) satisfies the soundness property in the definition of an affine branching program: if s ≠ ∗, the vertex (v, s) is labeled by s = σv(w) and, by Property 2 of Lemma 4.3, w ⊆ σv(w).
- For a vertex v in layer j of Pj−1, let σv : A(n) → A(n) be the partial function whose existence is guaranteed by Lemma 4.3, applied to the random variable Wv = W |(V = v), and extend σv : A(n) → A(n) so that it outputs the special value ∗ on every element where it was previously undefined.
- The authors define Uj = (V, σV (W )) ∈ Lj. Let yj be a random variable uniformly distributed over the subspace w(Uj), and let Vj be the vertex in Lj, reached by the computation-path of Pj. The authors need to prove that
- The authors start with a lemma that will be used, in the proof of Theorem 2, to obtain time-space lower bounds for affine branching programs.
- Let P be a length-m affine branching program for parity learning such that, for every vertex u of P, dim(w(u)) ≥ k.
- There exist constants α, c > 0 such that the following holds: let P be a branching program of length at most 2^{αn} and width at most 2^{cn²} for parity learning, such that the output of P is always an affine subspace of dimension …
- Let P be a branching program of length m = 2^{αn} and width d = 2^{cn²} for parity learning, such that the output of P is always an affine subspace of dimension …
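The role of affine subspaces above can be illustrated concretely: conditioned on the samples seen so far, the set of strings x consistent with them is an affine subspace of {0,1}^n, and each additional sample lowers its dimension by at most 1, mirroring the soundness property under which dimensions along the computation-path only decrease. A small sketch (our own illustration, not the paper's construction):

```python
import random

def consistent_dimensions(samples, n):
    """For each prefix of the sample stream, return the dimension of
    the affine subspace of {0,1}^n consistent with the samples so far:
    n minus the GF(2)-rank of the vectors a seen up to that point."""
    pivots = {}  # leading bit position -> reduced coefficient row
    dims = []
    for a, _ in samples:
        row = 0
        for i, ai in enumerate(a):
            row |= ai << i
        # Gaussian elimination step: reduce against existing pivots
        for col in range(n - 1, -1, -1):
            if not (row >> col) & 1:
                continue
            if col in pivots:
                row ^= pivots[col]
            else:
                pivots[col] = row
                break
        dims.append(n - len(pivots))
    return dims
```

Each sample contributes at most one new independent linear constraint, so consecutive dimensions differ by 0 or 1; a vertex of an affine branching program is labeled by exactly such a subspace of remaining candidates.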

Conclusion

- (Otherwise, the authors can just remove u, as it is unreachable from the start vertex, since the authors defined all vertices labeled by subspaces of dimension k to be leaves and since, by the soundness property in Definition 5.2, the dimensions along the computation-path can only decrease.)
- By Lemma 7.1, and by substituting the values of m, d, k, r, the probability that the computation-path of P reaches some vertex that is labeled with an affine subspace of dimension k is at most

Funding

- Research supported by the Israel Science Foundation grant No. 1402/14, by the I-CORE Program of the Planning and Budgeting Committee and the Israel Science Foundation, by the Simons Collaboration on Algorithms and Geometry, by the Fund for Math at IAS, and by the National Science Foundation grant No. CCF-1412958.

References

- [A99a] Miklos Ajtai: Determinism versus Non-Determinism for Linear Time RAMs. STOC 1999: 632-641
- [A99b] Miklos Ajtai: A Non-linear Time Lower Bound for Boolean Branching Programs. FOCS 1999: 60-70
- [ADR02] Yonatan Aumann, Yan Zong Ding, Michael O. Rabin: Everlasting security in the bounded storage model. IEEE Transactions on Information Theory 48(6): 1668-1680 (2002)
- [AR99] Yonatan Aumann, Michael O. Rabin: Information Theoretically Secure Communication in the Limited Storage Space Model. CRYPTO 1999: 65-79
- [B86] David A. Mix Barrington: Bounded-Width Polynomial-Size Branching Programs Recognize Exactly Those Languages in NC1. J. Comput. Syst. Sci. 38(1): 150-164 (1989) (also in STOC 1986)
- [BJS98] Paul Beame, T. S. Jayram, Michael E. Saks: Time-Space Tradeoffs for Branching Programs. J. Comput. Syst. Sci. 63(4): 542-572 (2001) (also in FOCS 1998)
- [BSSV00] Paul Beame, Michael E. Saks, Xiaodong Sun, Erik Vee: Time-space trade-off lower bounds for randomized computation of decision problems. J. ACM 50(2): 154-195 (2003) (also in FOCS 2000)
- [CM97] Christian Cachin, Ueli M. Maurer: Unconditional Security Against Memory-Bounded Adversaries. CRYPTO 1997: 292-306
- [DM04] Stefan Dziembowski, Ueli M. Maurer: On Generating the Initial Key in the Bounded-Storage Model. EUROCRYPT 2004: 126-137
- [F97] Lance Fortnow: Time-Space Tradeoffs for Satisfiability. J. Comput. Syst. Sci. 60(2): 337-353 (2000) (also in CCC 1997)
- [FLvMV05] Lance Fortnow, Richard J. Lipton, Dieter van Melkebeek, Anastasios Viglas: Time-space lower bounds for satisfiability. J. ACM 52(6): 835-865 (2005)
- [M92] Ueli M. Maurer: Conditionally-Perfect Secrecy and a Provably-Secure Randomized Cipher. J. Cryptology 5(1): 53-66 (1992)
- [vM07] Dieter van Melkebeek: A Survey of Lower Bounds for Satisfiability and Related Problems. Foundations and Trends in Theoretical Computer Science 2: 197-303 (2007)
- [S14] Ohad Shamir: Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation. NIPS 2014: 163-171
- [SVW15] Jacob Steinhardt, Gregory Valiant, Stefan Wager: Memory, Communication, and Statistical Queries. Electronic Colloquium on Computational Complexity (ECCC) 22: 126 (2015)
- [V03] Salil P. Vadhan: Constructing Locally Computable Extractors and Cryptosystems in the Bounded-Storage Model. J. Cryptology 17(1): 43-77 (2004) (also in CRYPTO 2003)
- [W06] Ryan Williams: Inductive Time-Space Lower Bounds for Sat and Related Problems. Computational Complexity 15(4): 433-470 (2006)
- [W07] Ryan Williams: Time-Space Tradeoffs for Counting NP Solutions Modulo Integers. IEEE Conference on Computational Complexity 2007: 70-82

Best Paper

Best Paper of FOCS, 2016
