Structure vs Combinatorics in Computational Complexity.

Bulletin of the EATCS (2014)

Abstract
Some computational problems seem to have a certain "structure" that is manifested in non-trivial algorithmic properties, while others are more "unstructured" in the sense that they are either "very easy" or "very hard". I survey some of the known results and open questions about this classification and its connections to phase transitions, average-case complexity, quantum computing and cryptography.∗

Computational problems come in all different types and from all kinds of applications, arising from engineering as well as the mathematical, natural, and social sciences, and involving abstractions such as graphs, strings, numbers, and more. The universe of potential algorithms is just as rich, and so a priori one would expect that the best algorithms for different problems would have all kinds of flavors and running times. However, natural computational problems "observed in the wild" often display a curious dichotomy: either the running time of the fastest algorithm for the problem is some small polynomial in the input length (e.g., O(n) or O(n^2)) or it is exponential (i.e., 2^(εn) for some constant ε > 0). Moreover, while there is indeed a great variety of efficient algorithms for those problems that admit them, there are some general principles, such as convexity (i.e., the ability to make local improvements to suboptimal solutions or local extensions to partial ones), that seem to underlie a large number of these algorithms.¹

This phenomenon is also related to the "unreasonable effectiveness" of the notion of NP-completeness in classifying the complexity of thousands of problems arising from dozens of fields. While a priori you would expect problems in the class NP (i.e., those whose solution can be efficiently certified) to have all types of complexities, for natural problems it is often the case that they are either in P (i.e., efficiently solvable) or NP-hard (i.e., as hard as any other problem in NP, which often means complexity of 2^(εn), or at least 2^(n^ε), for some constant ε > 0).

To be sure, none of these observations are universal laws. In fact, there are theorems showing exceptions to such dichotomies: the Time Hierarchy Theorem [20] says that for essentially any time-complexity function T(·) there is a problem whose fastest algorithm runs in time (essentially) T(n). Also, Ladner's Theorem [27] says that, assuming P ≠ NP, there are problems that are neither in P nor NP-complete. Moreover, there are some natural problems with apparent "intermediate complexity"; perhaps the most well-known example is the Integer Factoring problem mentioned below. Nevertheless, the phenomenon of dichotomy, and the related phenomenon of recurring algorithmic principles across many problems, seem far too prevalent to be just an accident, and it is these phenomena that are the topic of this essay.

∗ This is an adaptation of the blog post http://windowsontheory.org/2013/10/07/structure-vs-combinatorics-in-computational-complexity/.

¹ The standard definition of "convexity" of the solution space of some problem only applies to continuous problems and means that any weighted average of two solutions is also a solution. However, I use "convexity" here in a broad sense, meaning having some non-trivial ways to combine several (full or partial) solutions to create another solution; for example, having a matroid structure, or what is known as "polymorphisms" in the constraint-satisfaction literature [5, 25, 31].
I believe that one reason underlying this pattern is that many computational problems, in particular those arising from combinatorial optimization, are unstructured. The lack of structure means that there is not much for an algorithm to exploit, and so the problem is either "very easy" (e.g., the solution space is simple enough that the problem can be solved by local search or convex optimization²) or "very hard" (e.g., it is NP-hard and one cannot do much better than exhaustive search). On the other hand, there are some problems that possess a certain (often algebraic) structure, which typically can be exploited in some non-trivial algorithmic way. These structured problems are hence never "extremely hard", but they are also typically not "extremely easy", since the algorithms solving them tend to be more specialized, taking advantage of their unique properties. In particular, it is harder to understand the complexity of these algebraic problems, and they are more likely to yield algorithmic surprises.

I do not know of a good way to formally classify computational tasks into combinatorial/unstructured vs. algebraic/structured ones, but in the rest of this essay I try to use some examples to get a better sense of the two sides of this divide. The observations below are not novel, though I am not aware of explicit expositions of such a classification (and would appreciate any pointers, as well as any other questions or critique). As argued below, more study of these questions would be of significant interest, in particular for cryptography and average-case complexity.

1 Combinatorial/Unstructured problems

The canonical example of an unstructured combinatorial problem is SAT: the task of determining, given a Boolean formula φ in variables x1, ..., xn with the operators ¬, ∧, ∨, whether there exists an assignment x to the variables that makes φ(x) true. SAT is an NP-complete problem, which means it cannot be solved efficiently unless P = NP. In fact, the Exponential Time Hypothesis [21] posits that every algorithm solving SAT must take at least 2^(εn) time for some ε > 0. SAT illustrates the above dichotomy in the sense that its natural restrictions are either as hard as the general problem or become easily solvable, as in the case of the 2SAT problem (where the formula is in conjunctive normal form with each clause of arity 2), which can be solved efficiently via a simple propagation algorithm. This observation applies much more generally than SAT: in particular, the widely believed Feder-Vardi dichotomy conjecture [17] states that every constraint satisfaction problem (CSP) is either NP-hard or in P.

Figure 1: An illustration of the solution-space geometry of a random SAT formula, where each point corresponds to an assignment, with height being the number of constraints violated by that assignment. The left figure depicts the "ball" regime, where a satisfying assignment can be found at the bottom of a smooth "valley" and hence local algorithms will quickly converge to it. The right figure depicts the "shattered" regime, where the surface is very ragged, with an exponential number of crevices and local optima, so local algorithms (and, as far as we know, any algorithm) will likely fail to find a satisfying assignment. Figures courtesy of Amin Coja-Oghlan.

² Of course, even if the algorithm is simple, analyzing it can be quite challenging, and actually obtaining the fastest algorithm, as opposed to simply one that runs in polynomial time, often requires additional highly non-trivial ideas.
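To make the 2SAT remark above concrete, here is a minimal sketch (mine, not part of the original essay) of why the problem sits on the "very easy" side. The essay mentions a simple propagation algorithm; the sketch below instead uses the closely related implication-graph method (Aspvall-Plass-Tarjan), which is also polynomial time: each clause (a ∨ b) contributes the implications ¬a → b and ¬b → a, and the formula is satisfiable if and only if no variable lies in the same strongly connected component as its negation. The function name and literal encoding are my own choices for illustration.

```python
# Illustrative sketch (not from the essay): 2SAT via the implication graph.
# Each clause (a OR b) yields implications (NOT a -> b) and (NOT b -> a);
# the formula is satisfiable iff no variable shares a strongly connected
# component (SCC) with its negation.

from collections import defaultdict

def two_sat(num_vars, clauses):
    """clauses: pairs of nonzero ints; literal v > 0 means x_v, v < 0 means NOT x_v.
    Returns a satisfying assignment {var: bool}, or None if unsatisfiable."""
    def node(v):  # literal -> node index in the implication graph
        return 2 * (abs(v) - 1) + (0 if v > 0 else 1)

    n = 2 * num_vars
    graph, rgraph = defaultdict(list), defaultdict(list)
    for a, b in clauses:
        graph[node(-a)].append(node(b))
        graph[node(-b)].append(node(a))
        rgraph[node(b)].append(node(-a))
        rgraph[node(a)].append(node(-b))

    # Kosaraju's SCC algorithm, with iterative DFS to avoid deep recursion.
    order, seen = [], [False] * n
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(graph[s]))]
        while stack:
            u, it = stack[-1]
            advanced = False
            for v in it:
                if not seen[v]:
                    seen[v] = True
                    stack.append((v, iter(graph[v])))
                    advanced = True
                    break
            if not advanced:
                order.append(u)
                stack.pop()

    comp, label = [-1] * n, 0
    for s in reversed(order):          # second pass on the reversed graph
        if comp[s] != -1:
            continue
        comp[s] = label
        stack = [s]
        while stack:
            u = stack.pop()
            for v in rgraph[u]:
                if comp[v] == -1:
                    comp[v] = label
                    stack.append(v)
        label += 1

    assignment = {}
    for var in range(1, num_vars + 1):
        pos, neg = node(var), node(-var)
        if comp[pos] == comp[neg]:
            return None                # x equivalent to NOT x: unsatisfiable
        # Labels follow topological order of SCCs; the later literal is set true.
        assignment[var] = comp[pos] > comp[neg]
    return assignment

# Example: (x1 OR x2) AND (NOT x1 OR x3) AND (NOT x2 OR NOT x3)
print(two_sat(3, [(1, 2), (-1, 3), (-2, -3)]))  # prints a satisfying assignment
```

The whole procedure runs in time linear in the size of the formula, which is exactly the kind of "very easy" behavior the dichotomy describes; for clauses of arity 3 the implication trick is no longer available, and the problem becomes NP-complete.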
In fact, researchers conjecture [5] (and have partially confirmed) the stronger statement that every CSP either can be solved by specific low-polynomial-time algorithms (such as propagation or generalizations of Gaussian elimination) or is NP-hard via a linear-blowup reduction from SAT, and hence (under the Exponential Time Hypothesis) cannot be solved faster than 2^(εn) time for some ε > 0.³

Random SAT formulas also display a similar type of dichotomy. Recent research into random k-SAT (based also on tools from statistical physics) suggests that these formulas have multiple thresholds where the problem changes its nature (see, e.g., [12, 16, 14] and the references within). When the density α (i.e., the ratio of constraints to variables) of the formula is larger than some number α_s (roughly equal to 2^k ln 2), then with high probability the formula is "overconstrained" and no satisfying assignment exists. There is some number α_d < α_s (roughly equal to 2^k ln k / k) such that for α < α_d, the space of satisfying assignments of a random formula looks roughly like a discrete ball, and, due to this relatively simple geometry, some local-search-type algorithms can succeed in finding satisfying assignments. However, for α ∈ (α_d, α_s), satisfying assignments still exist, but the geometry of the solution space becomes vastly different, as it shatters into exponentially many clusters, each cluster separated from the others by a sea of assignments that violate a large number of the constraints (see Figure 1). In this regime no efficient algorithm is known to find a satisfying assignment, and it is possible that this is inherently hard [2, 37].⁴

Dichotomy means that when combinatorial problems are hard, they are typically very hard, not just in the sense of not having a subexponential algorithm: they also cannot be solved non-trivially in intermediate computational models that are stronger than P but cannot solve all of NP, such as quantum computers, statistical zero knowledge, and others. In particular, it has been observed by several people that for combinatorial problems the existence of a good characterization (i.e., the ability to efficiently verify both the existence and non-existence of a solution) goes hand-in-hand with the existence of a good algorithm. Using complexity jargon, in

³ The main stumbling block for completing the proof is dealing with those CSPs that require a Gaussian-elimination-type algorithm to solve; one can argue that those CSPs actually belong to the algebraic side of our classification, further demonstrating that obtaining precise definitions of these notions is still a work in progress. Depending on how it is resolved, the Unique Games Conjecture, discussed in [3], might also give rise to CSPs with "intermediate complexity" in the realm of approximation algorithms. Interestingly, both these issues go away when considering random, noisy CSPs, as in this case solving linear equations becomes hard, and solving Unique Games becomes easy.
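As a small numerical illustration of the two densities quoted above (my own sketch, not part of the essay), the snippet below simply evaluates the asymptotic estimates α_s ≈ 2^k ln 2 and α_d ≈ 2^k ln k / k for a few values of k. The function names are mine, and for small k (e.g., k = 3) these large-k formulas are only rough approximations of the true thresholds.

```python
# Illustration (not from the essay) of the asymptotic threshold estimates for
# random k-SAT at clause density alpha = (#clauses)/(#variables):
#   alpha_s ~ 2^k * ln 2       -- satisfiability threshold
#   alpha_d ~ 2^k * ln(k) / k  -- clustering/"shattering" threshold
# For alpha in (alpha_d, alpha_s), satisfying assignments exist but no efficient
# algorithm is known to find them.  These formulas are asymptotic in k; for
# small k (e.g., k = 3) they are only rough approximations.
import math

def alpha_s(k: int) -> float:
    """Approximate density above which a random k-SAT formula is w.h.p. unsatisfiable."""
    return 2 ** k * math.log(2)

def alpha_d(k: int) -> float:
    """Approximate density at which the solution space shatters into clusters."""
    return 2 ** k * math.log(k) / k

if __name__ == "__main__":
    for k in (3, 4, 5, 8, 10):
        lo, hi = alpha_d(k), alpha_s(k)
        print(f"k={k:2d}: alpha_d ~ {lo:9.1f}, alpha_s ~ {hi:9.1f}, "
              f"satisfiable-but-apparently-hard window width ~ {hi - lo:9.1f}")
```

Even for moderate k the window (α_d, α_s), in which formulas are satisfiable but no efficient algorithm is known to find a solution, is already quite wide, since α_s grows faster than α_d by a factor of roughly k ln 2 / ln k.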