
Towards a Tight Hardness-Randomness Connection Between Permanent and Arithmetic Circuit Identity Testing

Maurice Jansen
LFCS, School of Informatics, The University of Edinburgh

April 22, 2011

Abstract

In this paper we make progress on establishing a tight connection between the problem of derandomization of arithmetic circuit identity testing (ACIT) and the arithmetic circuit complexity of the permanent, defined by $\mathrm{per}_n = \sum_{\sigma \in S_n} \prod_{i=1}^{n} x_{i\sigma(i)}$. We develop an ACIT-based derandomization hypothesis, and show that it is a necessary condition for proving that the permanent requires super-polynomial size arithmetic circuits over F, for fields F of characteristic zero. Informally, this hypothesis postulates the existence of a subexponential size hitting set $H_n$ (see footnote 1), computable by subexponential size uniform TC⁰ circuits, against size n arithmetic circuits with m ≤ n variables whose output is multilinear. Assuming the Generalized Riemann Hypothesis (GRH), it can be shown that this hypothesis is sufficient for showing that either the permanent does not have polynomial size (nonuniform) arithmetic circuits, or that the Boolean circuit class uniform TC⁰ is strictly contained in uniform NC². Without GRH, the hypothesis implies such a disjunction, but with the first item stating that the permanent does not have polynomial size constant-free (see footnote 2) arithmetic circuits. In this setting the converse also goes through, but based on the slightly stronger assumption that all constant multiples $a_n \cdot \mathrm{per}_n$ require super-polynomial constant-free arithmetic circuits, for $a_n \in \mathbb{Z} \setminus \{0\}$ computable by poly(n) size constant-free circuits.

1 Introduction

Let F[X] denote the polynomial ring over some field F of characteristic zero and a set $X = \{x_1, x_2, \ldots, x_n\}$ of indeterminates. An arithmetic circuit Φ over F is given by a directed acyclic graph where nodes of in-degree zero are labeled by elements of F ∪ X, and where all other nodes are labeled by + and ×. To each node in Φ, we can associate a polynomial in F[X] in the obvious way. The polynomial computed by Φ is the polynomial associated to some designated output node. The size of Φ, denoted by |Φ|, is measured by counting edges. For a polynomial f, its arithmetic circuit size $L_F(f)$ is the minimum size of an arithmetic circuit computing f. We consider the problem of proving super-polynomial arithmetic circuit size lower bounds for the permanent polynomial $\mathrm{per}_n = \sum_{\sigma \in S_n} \prod_{i=1}^{n} x_{i\sigma(i)}$. Permanent is complete for Valiant's algebraic complexity class VNP_F, which is the nondeterministic counterpart of the class VP_F of poly degree polynomials computable by poly size arithmetic circuits over F. Proving VP_F ≠ VNP_F is the central open problem in algebraic complexity theory, and this is equivalent to showing that $L_F(\mathrm{per}_n) \neq n^{O(1)}$, cf. [BCS97].

Footnote 1. A set $H_n \subseteq F^n$ is a hitting set against some class of arithmetic circuits C of size n, if for every nonzero polynomial $f(x_1, x_2, \ldots, x_m)$ in m ≤ n variables computed by a circuit in C, there exists $(h_1, h_2, \ldots, h_n) \in H_n$ with $f(h_1, h_2, \ldots, h_m) \neq 0$.

Footnote 2. For a constant-free arithmetic circuit the only allowed constant labels are in {−1, 1} (and we disallow divisions).

Arithmetic circuit identity testing (ACIT) over F is the problem of deciding, for a given arithmetic circuit Φ over F, whether the polynomial computed by Φ is identical to the zero polynomial of F[X]. This problem is efficiently solvable using randomization. Using the Schwartz-Zippel-DeMillo-Lipton Lemma [DL78, Sch80, Zip79], Ibarra and Moran [IM83] show the problem is in coRP. Derandomization of ACIT and the problem of proving explicit circuit lower bounds are closely connected. This connection already appears in the 1980 work of Heintz and Schnorr [HS80]. In recent years our understanding of this connection has improved remarkably. In a landmark paper, Kabanets and Impagliazzo [KI04] show that giving an NSUBEXP time algorithm for ACIT(Q) implies that either the permanent polynomial requires super-polynomial arithmetic circuits over Q, or that NEXP ⊄ P/poly. Agrawal [Agr05] shows that explicitly constructing a poly(n) size hitting set against the class of size n arithmetic circuits yields an exponential arithmetic circuit size lower bound for a polynomial with coefficients computable in PSPACE. Going in the converse direction, Ref. [KI04] shows how arithmetic lower bounds can be leveraged to yield low-degree identity testing by using Nisan-Wigderson designs [NW94] and a result about polynomial factorization independently obtained by Kaltofen [Kal89] and Bürgisser [Bür04].
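To make the two objects of this introduction concrete, the following Python sketch computes the permanent directly from its defining sum and runs a randomized black-box identity test in the spirit of the Schwartz-Zippel-DeMillo-Lipton Lemma. It is only an illustration under stated assumptions: the names `permanent` and `probably_zero`, the sampling range, and the number of trials are choices made here, and the polynomial is passed as a Python callable rather than as an arithmetic circuit.

```python
import itertools
import random


def permanent(matrix):
    """Permanent via its defining sum over all permutations (exponential time)."""
    n = len(matrix)
    total = 0
    for sigma in itertools.permutations(range(n)):
        term = 1
        for i in range(n):
            term *= matrix[i][sigma[i]]
        total += term
    return total


def probably_zero(poly, num_vars, degree_bound, trials=20, rng=None):
    """One-sided randomized identity test for a black-box integer polynomial.

    Points are sampled from {0, ..., 2*degree_bound}^num_vars. A nonzero
    polynomial of total degree at most degree_bound vanishes at such a point
    with probability at most degree_bound / (2*degree_bound + 1) < 1/2.
    """
    rng = rng or random.Random(0)
    sample_size = 2 * degree_bound + 1
    for _ in range(trials):
        point = [rng.randrange(sample_size) for _ in range(num_vars)]
        if poly(*point) != 0:
            return False  # a witness was found: the polynomial is nonzero
    return True  # no witness found: zero, up to error probability < 2**-trials
```

For example, `probably_zero(lambda a, b, c, d: permanent([[a, b], [c, d]]) - (a*d + b*c), 4, 2)` tests the identity per_2 = x_11 x_22 + x_12 x_21. A `False` answer is always correct; a `True` answer is correct only with high probability, which is the one-sided error behind the coRP bound.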
The above mentioned results make it conceivable that the statement $L_F(\mathrm{per}_n) \neq n^{O(1)}$ is equivalent to some ACIT-based derandomization assumption. In this paper we make progress towards this aim. We establish a necessary derandomization condition for proving super-polynomial lower bounds for permanent. For the converse, we prove a result similar in flavor to Ref. [KI04], but for the black-box setting (see footnote 3). Namely, we show that our derandomization condition is sufficient to imply that either permanent requires super-polynomial (nonuniform) arithmetic circuits, or that the Boolean circuit class uniform TC⁰ is strictly contained in uniform NC².

1.1 Results

Before we state our results formally we need some definitions, in particular the notion of succinctness from [JS11], which is a relaxation of uniformity. For a family of arithmetic circuits $\{\Phi_n\}$ and functions a(n) and b(n), one says $\{\Phi_n\}$ is (a(n), b(n))-succinct if there exists a non-uniform family of Boolean circuits $\{C_n\}$ of size O(b(n)) with a(n) inputs, such that $C_n$ can correctly answer direct connection language queries about $\Phi_n$ (see Section 2 for formal definitions). DLOGTIME uniformity implies (O(log n), O(log n))-succinctness.

Given an integer sequence $a_n(i, j)$ with $0 \le i < n$, $0 \le j < p(n)$ for some function p(n), we think of this as defining a collection $\{H_n\}$ of subsets $H_n \subseteq \mathbb{Z}^n$ with $|H_n| = p(n)$, where $H_n = \{(a_n(0, j), a_n(1, j), \ldots, a_n(n-1, j)) : 0 \le j < p(n)\}$. Say t(n) bounds the bit size of $a_n(i, j)$. The sequence $a_n(i, j)$ is computed by a circuit family $\{C_n\}$ if, for every n, $C_n$ takes i and j as input in binary and outputs the t(n) bits of $a_n(i, j)$. In this case we say $C_n$ encodes the set $H_n$, and more generally that $\{C_n\}$ encodes $\{H_n\}$. Our central hypothesis, Hypothesis 1 below, is formulated in terms of such encoded sets.

Footnote 3. In the black-box setting the testing algorithm does not get to inspect the circuit computing the polynomial f that is being tested. Only query access to f is available.
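The way an integer sequence $a_n(i, j)$ defines a point set $H_n$, and what it means for that set to hit a polynomial, can be sketched in a few lines of Python. The sequence `kronecker_sequence` used here is a hypothetical toy choice and not the construction of this paper: it sends $x_i$ to $j^{2^i}$, so distinct multilinear monomials map to distinct powers of j, and the $2^n$ points j = 0, ..., $2^n - 1$ therefore hit every nonzero multilinear polynomial in at most n variables. This toy set has exponential size, whereas the hypothesis below asks for subexponential size together with an efficient encoding.

```python
def hitting_set(a, n, p):
    """H_n = {(a(0, j), ..., a(n-1, j)) : 0 <= j < p} for an integer sequence a(i, j)."""
    return [tuple(a(i, j) for i in range(n)) for j in range(p)]


def hits(points, poly, m):
    """True if some point h in `points` satisfies poly(h_1, ..., h_m) != 0."""
    return any(poly(*h[:m]) != 0 for h in points)


def kronecker_sequence(i, j):
    """Toy sequence (an assumption for illustration): a(i, j) = j^(2^i)."""
    return j ** (2 ** i)
```

With `H = hitting_set(kronecker_sequence, 3, 8)`, the call `hits(H, lambda x, y, z: x*y - z, 3)` evaluates the polynomial only through black-box queries on the first m coordinates of each point, exactly as in the definition of a hitting set in footnote 1.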

Hypothesis 1. Let d ∈ N be a constant. Let γ = γ(n) be a monotone function such that γ = ω(1) and γ = O(log log n). There exists a family $\{H_n\}$ of subsets of $\mathbb{Z}^n$ encoded by $(n^{1/\gamma}, n^{1/\gamma})$-succinct (see footnote 4) TC⁰ circuits of size $2^{n^{1/\gamma}}$ and depth d, where for all but finitely many n, the coordinate-wise bit sizes of elements of $H_n$ and $|H_n|$ are bounded by $2^{n^{1/\gamma}}$, and where the following condition is satisfied:

• for any arithmetic circuit Φ over F of size n over m ≤ n variables, if $\Phi(x_1, x_2, \ldots, x_m)$ computes a nonzero multilinear polynomial, then there exists $(h_1, h_2, \ldots, h_n) \in H_n$ such that $\Phi(h_1, h_2, \ldots, h_m) \neq 0$.

Informally, Hypothesis 1 asserts the existence of 'close to uniform' TC⁰ circuits of subexponential size, computing a map whose image is a hitting set, with a subexponential bound on its size and on the bit sizes of its elements, against the class of arithmetic circuits over F of size n with multilinear output. The above should be contrasted with Ref. [Agr05], which proposes an approach towards VP_F ≠ VNP_F. There, hitting sets are required (see footnote 5) to be of polynomial size, but this is not supported as a necessary condition by any known hardness-to-randomness result. Our hypothesis is a necessary condition for super-polynomial lower bounds for permanent in the following precise sense:

Theorem 1. Let γ′ = γ′(n) be a monotone function such that γ′ = ω(1) and γ′ = O(log log n). If for all large enough n, $L_F(\mathrm{per}_n) \ge n^{\gamma'(n)}$, then Hypothesis 1 is satisfied with γ = γ′/c, for some constant c.

For the converse, we do not know of any derandomization assumption for multivariate ACIT that implies super-polynomial arithmetic circuit lower bounds for permanent. We do know such a connection for the univariate case, due to Koiran [Koi10], but there the hardness-to-randomness direction is open. This raises the question whether ideas of Ref. [Koi10] can be used for the multivariate case, which is exactly what our approach does.
Unfortunately, at present we cannot prove that Hypothesis 1 implies super-polynomial lower bounds for the permanent, but we nevertheless establish the following interesting connection:

Theorem 2 (see footnote 6). We assume the Generalized Riemann Hypothesis (GRH). Hypothesis 1 implies that at least one of the following must hold:
1. $L_F(\mathrm{per}_n)$ is not polynomially bounded.
2. uniform TC⁰ is strictly contained in uniform NC².

We remark that a similar randomness-to-hardness theorem (Theorem 3) can be established without assuming GRH for the constant-free model. Actually, most of our technical work will be done in this direction for the constant-free model first, and then we use the result of Ref. [Bür09], which requires GRH, to scale up to the general setting. We will also observe that one can establish a hardness-to-randomness result for the constant-free model, similar to Theorem 1.

Footnote 4. We prefer to use the notion of succinctness for technical convenience. It is not an essential ingredient. Clearly for the randomness-to-hardness direction one may replace this by DLOGTIME uniformity.

Footnote 5. Step 3, page 10 of Ref. [Agr05].

Footnote 6. We will obtain this theorem as a corollary to a stronger statement (Theorem 4).


1.2 Techniques

For Theorem 1 we will use Viola's [Vio04] result for computing Nisan-Wigderson designs [NW94] by uniform AC⁰ circuits, and the algebraic hardness-to-randomness approach of [KI04]. We will observe that the results of [Kal89, Bür04] can be omitted, and replaced by an elementary argument using formal power series.

For the technical part of the randomness-to-hardness direction we work in the constant-free setting, and afterwards generalize using Ref. [Bür00], which assumes GRH. We use a 'compression argument'. Such arguments have been instrumental previously in works by several authors [Bür09, Koi10, JS11]. In Ref. [JS11] a randomness-to-hardness theorem (Theorem 2) is given for the succinct constant depth arithmetic circuit model. For our Theorem 2 we show that the proof of (Theorem 2, [JS11]) can be optimized to work with the more appealing derandomization assumption Hypothesis 2 (see footnote 7), rather than the provisional (Working Hypothesis 1, [JS11]). The proof in Ref. [JS11], which goes by contradiction, at a crucial juncture leverages the assumption that permanent can be computed by succinct circuits to succinctly/implicitly solve a 'large' system of linear equations. What we will prove here instead is that we can leverage the assumption TC⁰ = NC² (even a weaker assumption in terms of succinct circuits). Since the other parts of the proof there do not require the succinctness assumption for the circuits for permanent, one gets into a win-win situation that leads to our result. Namely, if TC⁰ ≠ NC², we are done; otherwise we use the assumption that TC⁰ = NC² to deal with solving 'large' systems of equations more or less implicitly, to conclude that permanent does not have (constant-free) arithmetic circuits of polynomial size.

To give some perspective, in a way getting an equivalent formulation of VP_F ≠ VNP_F in terms of derandomization of ACIT requires getting the right notion of what an 'algebraic pseudorandom generator' is. Both Ref. [Agr05] and Ref. [Koi10] contain their own proposals. Our approach is closest to Ref. [Koi10], but also contains elements of Ref. [Agr05]. However, it seems that in the algebraic setting we are still trying to figure out this basic notion. This is in strong contrast to the Boolean case, where the salient notions have been worked out, together with tight hardness-randomness relations (see footnote 8). It is plausible that a refinement of our methods can result in erasing the second condition from Theorem 2, in order to arrive at such a more desirable state of knowledge.

2 Preliminaries

We first introduce some additional measures for arithmetic circuits. The formal degree of gates in an arithmetic circuit is defined by induction. Gates with labels in X ∪ F have formal degree 1. For a +-gate we take the maximum of the formal degrees of its inputs. For a ×-gate we add the formal degrees of its inputs. For a constant-free arithmetic circuit, constant gate labels must be in {−1, 1}. The τ-complexity of f, denoted by τ(f), is defined to be the size of a smallest constant-free arithmetic circuit computing f, cf. [Bür09, KP07]. A family $\{f_n\}$ of polynomials belongs to the class VP⁰ if there exists a family of constant-free arithmetic circuits $\{\Phi_n\}$ with size and formal degrees $n^{O(1)}$, where for all n, $\Phi_n$ computes $f_n$. In case the circuits $\{\Phi_n\}$ are over F, we obtain the class VP_F. The classes VNP⁰ and VNP_F are defined as follows. For polynomials s(n), t(n), VNP⁰ is the class of families of polynomials $\{f_n\}$ for which there exists $\{g_n\} \in$ VP⁰ such that

$f_n = \sum_{e \in \{0,1\}^{s(n)-t(n)}} g_n(x_1, \ldots, x_{t(n)}, e_1, \ldots, e_{s(n)-t(n)})$.

Similarly, if the family $\{g_n\} \in$ VP_F, we obtain VNP_F.

Footnote 7. This is the analog of Hypothesis 1 for the constant-free model.

Footnote 8. For example, the equivalence between the existence of one-way functions and the existence of cryptographic PRGs.

The following two results will be of interest to us:

Proposition 1 (Proposition 2.10 in [Bür09]). If $\tau(\mathrm{per}_n) = n^{O(1)}$, then for any $(h_n) \in$ VNP⁰, there exists a polynomial p(n) such that $\tau(2^{p(n)} h_n) = n^{O(1)}$.

Lemma 1 (Valiant's Criterion, cf. [Koi10]). Suppose that p(n) is a polynomial, and that for f : N × N → Z the map $1^n 0 j \mapsto f(j, n)$, where n is given in unary and j in binary, is in GapP/poly. Then the family of polynomials $\{g_n\}$ defined by $g_n(x_1, x_2, \ldots, x_{p(n)}) = \sum_{j \in \{0,1\}^{p(n)}} f(j, n)\, x_1^{j_1} x_2^{j_2} \cdots x_{p(n)}^{j_{p(n)}}$ is in VNP⁰, where $j_k$ is the kth bit of j.

Next we define some Boolean complexity classes. The collection of functions f : {0, 1}* → {0, 1}* for which there exists a language A ∈ P and a polynomial p(n) such that $f(x) = |\{w \in \{0,1\}^{p(|x|)} : (x, w) \in A\}|$ is denoted by #P. The collection of functions of the form f − g with f, g ∈ #P is denoted by GapP. We define the majority operator C. acting on a complexity class. Given a class C, C.C is the class of all languages A for which there exists A′ ∈ C and a polynomial p(n) such that $x \in A \Leftrightarrow |\{w \in \{0,1\}^{p(|x|)} : (x, w) \in A'\}| > 2^{p(|x|)-1}$. The counting hierarchy, defined by Wagner [Wag86], is given by $\mathrm{CH} := \bigcup_{i \ge 0} C_i P$, where $C_0 P = P$, and for all i ≥ 1, $C_i P = C.C_{i-1} P$. The first level $C_1 P$ equals the standard complexity class PP. Torán [Tor91] characterizes the counting hierarchy by $C_{i+1} P = \mathrm{PP}^{C_i P}$, for all i ≥ 0. An advice function is a function of type a : N → {0, 1}*. For a complexity class C, define C/poly to be the class of languages L for which there exists L′ ∈ C and an advice function a with $|a(n)| = n^{O(1)}$, such that x ∈ L ⇔ (x, a(|x|)) ∈ L′. We need the following lemma:

Lemma 2 ([Bür09]). If $\tau(\mathrm{per}_n) = n^{O(1)}$, then CH/poly = P/poly.
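Returning to the circuit measures defined at the start of this section, the following Python sketch computes the size (number of edges) and the formal degrees of a circuit given as a list of gates in topological order. The `Gate` record and the list encoding are assumptions made for illustration only; they are not the representation used later in the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Gate:
    op: str                          # "in" for a variable or constant, "+" or "*"
    children: Tuple[int, ...] = ()   # indices of earlier gates feeding this gate


def size(circuit):
    """Circuit size, measured by counting edges."""
    return sum(len(g.children) for g in circuit)


def formal_degrees(circuit):
    """Formal degree of every gate: 1 at inputs, max at '+', sum at '*'."""
    deg = []
    for g in circuit:
        if g.op == "in":
            deg.append(1)
        elif g.op == "+":
            deg.append(max(deg[c] for c in g.children))
        elif g.op == "*":
            deg.append(sum(deg[c] for c in g.children))
        else:
            raise ValueError(f"unknown gate type: {g.op}")
    return deg
```

A chain of k squarings, `[Gate("in")] + [Gate("*", (i, i)) for i in range(k)]`, has size 2k but formal degree $2^k$. This is why the definition of VP⁰ bounds the formal degree separately from the size.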
AC⁰ is the class of all languages decidable by polynomial size constant depth circuits with unbounded fan-in gates in {∨, ∧, ¬}. TC⁰ is the class of all languages that are decidable by polynomial size constant depth unbounded fan-in threshold circuits. For the latter, the gates are negation or majority gates. NC¹ is the class of all languages that can be computed by polynomial size O(log n) depth circuits of bounded fan-in. We have AC⁰ ⊆ TC⁰ ⊆ NC¹.

We import some definitions from Ref. [JS11]. An integer sequence of bit size q(n) is given by a function $a_n(k_1, k_2)$, such that there exists a polynomial p(n) so that $a_n(k_1, k_2) \in \mathbb{Z}$ is defined for all n ≥ 0 and $0 \le k_1, k_2 < 2^{p(n)}$, and where the bit size of $a_n(k_1, k_2)$ is bounded by q(n). The language uBit(a) is the set of all tuples $(1^n, k_1, k_2, j, b)$ such that the jth bit of $a_n(k_1, k_2)$ equals b. Here $k_1$, $k_2$ and j are encoded in binary, while $1^n$ is a unary encoding of n. For a sequence $a_n(k_1, k_2)$ and a complexity class C, if uBit(a) ∈ C, then we say that the sequence $a_n(k_1, k_2)$ is weakly-definable in C.

For the set $\{x_1, x_2, \ldots, x_n\} \cup \{-1, 1\} \cup \{+, \times\}$, it is assumed we have fixed a way of assigning to each element an O(log n)-bit binary string, which is called a type. Circuit gates are assumed to be labeled by unique binary strings, part of which contains the type. A representation of a constant-free arithmetic circuit Φ is given by a Boolean circuit $C_n$ that accepts precisely all tuples (t, a, b, q) such that
1) in case q = 1 (connection query), a and b are numbers of gates in Φ, b is a child of a, and a has type t;
2) in case q = 0 (type query only), a is a number of a gate in Φ, and a is of type t.

Let a(n), b(n) be two functions. A family of arithmetic circuits $\{\Phi_n\}$ is (a(n), b(n))-succinct if there exists a non-uniform family of Boolean {∨, ∧, ¬}-circuits $\{C_n\}$ such that $C_n$ represents $\Phi_n$, where for all large enough n, $C_n$ has ≤ a(n) inputs and is of size ≤ b(n). As a matter of convention, if a(n) = O(log n), we drop it from the notation, and just write b(n)-succinct. We define (a(n), b(n))-succinct Boolean circuits analogously, by letting type names refer to elements of $\{x_1, x_2, \ldots, x_n\} \cup \{0, 1\} \cup \{\vee, \wedge, \neg, \mathrm{MAJ}\}$. A poly size Boolean circuit family $\{C_n\}$ is DLOGTIME-uniform if, given (n, t, a, b, q) with n in binary, we can answer the connection query in time O(log n) on a Turing machine. We observe that DLOGTIME-uniformity implies O(log n)-succinctness. In this paper, uniformity in conjunction with a complexity class will always refer to DLOGTIME-uniformity.
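The notion of a representation can be illustrated with a small Python sketch, under simplifying assumptions that are not part of the definition above: the representing Boolean circuit is replaced by an ordinary predicate, gate names are plain integers, and types are short strings. The example circuit is a complete binary tree of × gates over $n = 2^k$ variables with heap numbering, so gate a < n has children 2a and 2a + 1, and gates n, ..., 2n − 1 are the variables.

```python
def make_representation(k):
    """Predicate answering (t, a, b, q) queries for a product tree on 2**k variables."""
    n = 2 ** k

    def gate_type(a):
        if 1 <= a < n:
            return "*"                 # internal multiplication gate
        if n <= a < 2 * n:
            return f"x{a - n + 1}"     # leaf labeled by a variable
        return None                    # not a gate of the circuit

    def accepts(t, a, b, q):
        if gate_type(a) is None or gate_type(a) != t:
            return False
        if q == 0:                     # type query only
            return True
        # connection query: b must be one of the two children of a
        return gate_type(a) == "*" and b in (2 * a, 2 * a + 1)

    return accepts
```

The point of succinctness is visible even in this toy case: the predicate only does arithmetic on gate names, so its description stays small while the circuit it describes has 2n − 2 edges. Enumerating all pairs (a, b) accepted with q = 1 recovers the full wiring of the circuit.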

3 Randomness to Hardness

First we extract the subroutine F from the proof of Theorem 2 in [JS11] for finding solutions to systems of linear equations. Let ⟨x, y⟩ be a pairing function. We let F : {0, 1}* → {0, 1}* be the following mapping. On input x of length n, try to parse x as $x = \langle 1^{2^m} 0 1^r 0 y, \langle e, j \rangle \rangle$, for some integers r, m and y ∈ {0, 1}* with $|y| = 2^{2m} r$, and with $e \in \{0,1\}^m$. If x is not of this form, output 0. Otherwise, construct the $2^m \times 2^m$ matrix M whose (left-to-right, top-to-bottom) entries are given by consecutive r-bit blocks of y. Next we try to compute a nonzero integer $2^m$-vector v such that Mv = 0. If no such v can be found, output 0. Otherwise, index this $2^m$-vector v by $e \in \{0,1\}^m$, and output the jth bit of the eth component of v.

Lemma 3. F can be computed by uniform NC² circuits.

Proof. Decoding the input and checking the format can easily be done in uniform NC², so let us assume we have found the matrix M as described above. We want to do the following: find a maximal set of independent rows $r'_1, r'_2, \ldots, r'_k$ of M, and extend this with a set of $2^m - k$ standard basis row vectors to form a nonsingular matrix M′ of order $2^m$. Then we can compute the adjugate Adj(M′). For this matrix we have Adj(M′)M′ = M′ Adj(M′) = det(M′)I. Observe that we can obtain an integer solution v to Mv = 0 by selecting a column of Adj(M′), in case $k < 2^m$ (and if $k = 2^m$ the only solution is v = 0). Indeed, the column of Adj(M′) indexed by an added standard basis row is nonzero and is annihilated by every other row of M′, hence by all rows of M. We need to argue this computation can be done in uniform NC². The rank of an integer matrix can be computed in logspace-uniform NC², cf. [ABO99]. Due to [Ruz81], logspace-uniform NC² is known to equal DLOGTIME-uniform NC². Let $r_1, r_2, \ldots, r_{2^m}$ be the rows of M, and let $r_{2^m+1}, r_{2^m+2}, \ldots, r_{2^{m+1}}$ be the standard basis vectors. For each i in parallel we can check whether $\mathrm{rank}(r_1, \ldots, r_i) < \mathrm{rank}(r_1, \ldots, r_i, r_{i+1})$. This gives us a bit vector w of length $2^{m+1}$ which specifies exactly which are the rows of M′. We conclude we can obtain M′ in uniform NC².
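The linear algebra behind this construction (select independent rows, complete them with standard basis vectors to a nonsingular M′, and read off a column of the adjugate) can be checked on small matrices with the following sequential Python sketch. It is an illustration only and not an NC² circuit: the helper names are hypothetical, the rank is computed by rational Gaussian elimination, and determinants use cofactor expansion, which is adequate only for tiny inputs.

```python
from fractions import Fraction


def rank(rows):
    """Rank over the rationals by Gaussian elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            factor = m[i][col] / m[rk][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[rk])]
        rk += 1
    return rk


def det(matrix):
    """Exact integer determinant by cofactor expansion along the first row."""
    if not matrix:
        return 1
    total = 0
    for j, entry in enumerate(matrix[0]):
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def kernel_vector(M):
    """Integer v with M v = 0; nonzero whenever the square matrix M is singular."""
    n = len(M)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    candidates = [list(r) for r in M] + identity
    chosen, added = [], None           # rows of M'; position of a basis row in M'
    for idx, row in enumerate(candidates):
        if rank(chosen + [row]) > len(chosen):
            if idx >= n and added is None:
                added = len(chosen)
            chosen.append(row)
    if added is None:                  # M itself is nonsingular
        return [0] * n
    # Column `added` of Adj(M'): entry p is the cofactor of position (added, p).
    v = []
    for p in range(n):
        minor = [r[:p] + r[p + 1:] for k, r in enumerate(chosen) if k != added]
        v.append((-1) ** (added + p) * det(minor))
    return v
```

For the singular matrix `[[1, 2], [2, 4]]` the sketch completes the first row with the basis vector (1, 0) and returns a nonzero integer vector orthogonal to both rows.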
Computing Adj(M′) can be done in uniform NC², due to [Ber84] (to argue the extra stringent uniformity one may use [Ruz81] as before). By checking w, if no standard basis vectors were added, return the zero vector. Otherwise, letting i be an index in M′ of a selected standard basis vector, we return the ith column of Adj(M′). Putting all these circuit parts together yields a uniform NC² circuit computing F.

The following lemma will be applied contrapositively in the case TC⁰ = NC² (which we have when reasoning by contradiction in the main theorem). This will be a key step in using F to implicitly solve 'large' systems of equations (which will also require compounding other elements of the proof there).

Lemma 4. If for all d, F cannot be computed by TC⁰ circuits of size $2^{\mathrm{polylog}(n)}$ and depth d that are (polylog(n), polylog(n))-succinct, then uniform TC⁰ ⊊ uniform NC².


Proof. If F can be computed by uniform TC⁰, then there exists some d and a family $\{C_n\}$ of O(log n)-succinct TC⁰ circuits of depth d and size $n^{O(1)}$ computing F, which contradicts the assumption of the lemma. Hence F cannot be computed by uniform TC⁰. This separates the function classes TC⁰ and NC², due to Lemma 3. By an easy reduction this implies the separation of the language classes TC⁰ and NC² as well. Namely, if a function f(x) computed by NC² circuits does not have TC⁰ circuits, then the single output function g(x, i) := 'ith bit of f(x)' corresponds to a language in NC² − TC⁰.

For the constant-free setting we have the following analogue of Hypothesis 1:

Hypothesis 2. Let d ∈ N be a constant. Let γ = γ(n) be a monotone function such that γ = ω(1) and γ = O(log log n). There exists a family $\{H_n\}$ of subsets of $\mathbb{Z}^n$ encoded by $(n^{1/\gamma}, n^{1/\gamma})$-succinct TC⁰ circuits of size $2^{n^{1/\gamma}}$ and depth d, where for all but finitely many n, the coordinate-wise bit sizes of elements of $H_n$ and $|H_n|$ are bounded by $2^{n^{1/\gamma}}$, and where for any constant-free arithmetic circuit Φ of size n over m ≤ n variables, if $\Phi(x_1, x_2, \ldots, x_m)$ computes a nonzero multilinear polynomial, then there exists $(h_1, h_2, \ldots, h_n) \in H_n$ such that $\Phi(h_1, h_2, \ldots, h_m) \neq 0$.

Theorem 3. If Hypothesis 2 is true, then one of the following must hold:
1. $\tau(\mathrm{per}_n)$ is not polynomially bounded.
2. For all d″, the function F cannot be computed by TC⁰ circuits of size $2^{\mathrm{polylog}(n)}$ and depth d″ that are (polylog(n), polylog(n))-succinct.

Proof. We will show that we can leverage the second item of the theorem statement in a proof similar to the proof of Theorem 2 in [JS11], instead of using the assumption made there that permanent has succinct circuits. We also tighten some of the parameters so as to have the randomness-to-hardness result work with the more appealing derandomization Hypothesis 2. The proof proceeds by contradiction. Hence, we assume that $\tau(\mathrm{per}_n) = n^{O(1)}$. Furthermore, assume also that the second item does not hold, i.e.
for some d″ there exists a family $\{C_n^F\}$ of TC⁰ circuits of size at most $2^{\mathrm{polylog}(n)}$ and depth d″ computing F for input size n that is (polylog(n), polylog(n))-succinct.

Let $a_n(i, j)$ be the integer sequence given by Hypothesis 2. Let $\{H_n\}$ be the associated hitting set (see the remark before Hypothesis 1). Let $N = \lfloor n^{\gamma} \rfloor$. Let

$f_{n+1} = \sum_{e=0}^{2^{n+1}-1} c_{n+1}(e)\, x_1^{e_1} x_2^{e_2} \cdots x_{n+1}^{e_{n+1}}$,   (3)

where $e_j$ denotes the jth bit of e. To obtain a hard function given a hitting set, one uses the simple but elegant idea of defining a function $f_{n+1}$ that vanishes on the hitting set, but that is not identically zero. For us this means solving a system of linear equations, which is an idea going back to Ref. [Agr05]. Namely, we want to find $c_{n+1}(e)$ that is a nonzero integer solution to the following system: $f_{n+1}(b) = 0$, for all $b \in H_N$. These are at most $2^{N^{1/\gamma}} \le 2^n$ linear equations in $2^{n+1}$ variables, so we can get a nonzero solution. We encode this system as an integer sequence. For this, we think of it as given by a $2^{n+1} \times 2^{n+1}$ matrix M. Let $A_n$ be the integer represented by the binary string $1^{2^{n+1}} 0 1^r 0\, \mathrm{list}(M)$, where $r := 2^{N^{1/\gamma}}(n+1)$ is an upper bound on the bit length of entries of M, and list(M) is the concatenation of the length r bit entries of M in left-to-right, top-to-bottom order. Define ℓ(n) to be the bit length of $A_n$. For the bit size of $A_n$ we can give an upper bound of $2^{n+1} + 1 + 2^{N^{1/\gamma}}(n+1) + 2^{2n+2}\, 2^{N^{1/\gamma}}(n+1)$. This is at most $2^{4n}$, provided n is large enough, which gives that $\ell(n) \le 2^{4n}$ for all large enough n.

Next we want to apply F. We let $c_{n+1}(e)$ be the integer whose jth bit equals $F(\langle A_n, \langle e, j \rangle \rangle)$. Furthermore, we let $f_{n+1}$ be the polynomial given by fixing these integer coefficients in Equation (3). Thus $f_{n+1}$ is a well-defined nonzero multilinear polynomial such that $f_{n+1}(b) = 0$, for all $b \in H_N$. Note that this implies that $f_{n+1}$ does not have constant-free arithmetic circuits of size N.

Next we want to obtain Boolean formulas computing the coefficients of $f_{n+1}$. We will do this in two stages. First we will construct exponential size TC⁰ circuits that are $(n^{O(1)}, n^{O(1)})$-succinct. Then in a 'first compression step' we show that, under our assumptions, we get TC⁰ circuits of polynomial size that compute the coefficients of $f_{n+1}$. This can be done by trading in the succinctness property of the previously constructed exponential size TC⁰ circuits. Once we have small formulas for computing the coefficients of $f_{n+1}$, we will be able to write $f_{n+1}$ as a projection of the permanent polynomial. This will allow us to derive a lower bound for permanent.

The solution finding subroutine of F runs in polynomial time. This subroutine is applied to an input of length ℓ(n). Hence for some absolute integer constant α, the output is at most $\ell(n)^{\alpha} \le 2^{4\alpha n}$ bits long. This gives that $c_{n+1}(e)$ has bit length at most $2^{4\alpha n}$. Lemma 5 below completes the first stage of the above-mentioned plan. This is where we leverage the assumption regarding F.
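Before the lemma, the construction of $f_{n+1}$ described above can be illustrated on a toy scale by the following Python sketch: every point b of a given set contributes one linear equation in the coefficients of a multilinear polynomial, and any nonzero integer solution gives a polynomial vanishing on the whole set. The sketch makes several assumptions that differ from the proof: it uses plain rational elimination instead of the subroutine F, the point set is arbitrary rather than a hitting set from Hypothesis 2, the function names are hypothetical, and `math.lcm` requires Python 3.9 or later.

```python
from fractions import Fraction
from itertools import product
from math import lcm, prod


def monomial_row(point):
    """Values at `point` of all multilinear monomials x^e, e in {0,1}^n."""
    n = len(point)
    return [prod(b for b, bit in zip(point, e) if bit)
            for e in product((0, 1), repeat=n)]


def integer_kernel_vector(rows, ncols):
    """Nonzero integer v with rows * v = 0; assumes rank(rows) < ncols."""
    m = [[Fraction(x) for x in r] for r in rows]
    pivots = []                        # pivot column of each reduced row
    r = 0
    for col in range(ncols):
        piv = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        m[r] = [x / m[r][col] for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
    free = next(c for c in range(ncols) if c not in pivots)
    v = [Fraction(0)] * ncols
    v[free] = Fraction(1)              # set one free variable to 1
    for row_idx, col in enumerate(pivots):
        v[col] = -m[row_idx][free]     # back-substitute the pivot variables
    scale = lcm(*(x.denominator for x in v))
    return [int(x * scale) for x in v]


def vanishing_polynomial(points):
    """Coefficients c(e) of a nonzero multilinear f with f(b) = 0 on all points."""
    n = len(points[0])
    return integer_kernel_vector([monomial_row(b) for b in points], 2 ** n)


def evaluate(coeffs, point):
    """Evaluate the multilinear polynomial with the given coefficients."""
    return sum(c * m for c, m in zip(coeffs, monomial_row(point)))
```

With fewer than $2^n$ points in n variables the system is underdetermined, so a nonzero solution always exists. This mirrors the counting in the proof: at most $2^n$ equations against $2^{n+1}$ unknowns.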

Lemma 5. There exists a TC⁰-circuit family $\{D_n\}$ such that 1) $|D_n| = 2^{n^{O(1)}}$, 2) $D_n$ has depth d′ + d″ + O(1), 3) $\{D_n\}$ is $(n^{O(1)}, n^{O(1)})$-succinct, 4) $D_n$ has input gates so it can be given $e \in \{0,1\}^{n+1}$, $j \in \{0,1\}^{4\alpha n}$, and 5) $D_n(e, j)$ outputs the jth bit of $c_{n+1}(e)$.

Proof. For $e' \in \{0,1\}^{n+1}$, $0 \le j' < 2^n$, let $d_N(e', j') = \prod_{p=0}^{n} a_N(p, j')^{e'_p}$.

Claim 1. There exists a TC⁰-circuit family $\{D'_n\}$ such that
• $|D'_n| = 2^{O(n)}$.
• $d(D'_n) = d' + O(1)$.
• The family $\{D'_n\}$ is (O(n), O(n))-succinct.
• $D'_n$ has input gates so it can be given $e' \in \{0,1\}^{n+1}$, $0 \le j' < 2^n$ in binary.
• $D'_n(e', j')$ outputs the r bits of $d_N(e', j')$.

Proof. Hypothesis 2 gives that we have an $(n^{1/\gamma}, n^{1/\gamma})$-succinct family $\{E_n\}$ of TC⁰ circuits of size $2^{n^{1/\gamma}}$ and depth d′ (the constant depth d of Hypothesis 2), such that $E_n(i, j)$ computes $a_n(i, j)$, for 0 ≤ i < n and $0 \le j < 2^n$. The circuit $D'_n$ looks as follows. It has n + 1 copies of $E_N$, one for each 0 ≤ i ≤ n. Each i is hardcoded into the corresponding copy, and j′ is used as input for j in each copy. A single copy of $E_N$ is of size $2^{N^{1/\gamma}} \le 2^n$. It is represented by Boolean circuits of size $N^{1/\gamma} \le n$. The (n + 1)-fold duplication can be achieved by adding O(log n) bits to gate names, and adding polylog(n) to the size of the Boolean circuit that represents the individual copy. Hence we can get TC⁰ circuits that are (2n, 2n)-succinct and that compute n + 1 many bit sequences encoding $a_N(0, j'), a_N(1, j'), \ldots, a_N(n, j')$. We obtain n + 1 many bit strings giving $a_N(0, j')^{e'_0}, a_N(1, j')^{e'_1}, \ldots, a_N(n, j')^{e'_n}$, by using the bits $e'_0, e'_1, \ldots, e'_n$ as a mask. This can be done with the required succinctness. We can bound the size of the TC⁰ circuitry constructed so far by $O(n \cdot 2^n)$. Next we add the uniform TC⁰ circuits for iterated multiplication from Ref. [HAB01]. The input size for these circuits is $O(n 2^n)$. Hence we have a Boolean circuit of size O(n) representing the circuit for iterated multiplication.
We conclude that we have an (O(n), O(n))-succinct TC⁰ circuit family of size at most $2^{O(n)}$ and depth d′ + O(1), by simply merging the representation for the part computing iterated multiplication with the representation of the circuit computing $a_N(0, j'), a_N(1, j'), \ldots, a_N(n, j')$.

The circuit $D_n$ is formed by taking $2^{n+1} \cdot 2^n$ copies of $D'_n$, ranging over all $e' \in \{0,1\}^{n+1}$ and $0 \le j' < 2^n$. The values of e′ and j′ are hardcoded in these copies. This circuitry computes list(M). By adding constant gates computing $1^{2^{n+1}} 0 1^r$, and padding out with 0 gates, we obtain circuitry computing $A_n$. The size can be bounded by $2^{2n+1} \cdot |D'_n| = 2^{O(n)}$. As we have seen before with duplication of this kind, this circuitry is easily seen to be (O(n), O(n))-succinct.

$D_n$ receives inputs $e \in \{0,1\}^{n+1}$ and $j \in \{0,1\}^{4\alpha n}$. We now use our assumption regarding F. Using the family $\{C_n^F\}$, we add circuitry to $D_n$ so that $F(\langle A_n, \langle e, j \rangle \rangle)$ is computed. This means we apply F with input size $n' = 2^{O(\alpha n)}$, and we add $C^F_{n'}$ to achieve the computation. The latter circuit is of size $2^{\mathrm{polylog}(n')} = 2^{n^{O(1)}}$ and has depth d″. Furthermore, $C^F_{n'}$ is represented by Boolean circuits with size bounded by $\mathrm{polylog}(n') = n^{O(1)}$. Merging the two representations gives us that the circuit family $\{D_n\}$ we have just constructed has size $2^{n^{O(1)}}$ and depth d′ + d″ + O(1). Furthermore, $\{D_n\}$ is $(n^{O(1)}, n^{O(1)})$-succinct.

Next we prove a lemma where we trade in the succinctness of the family $\{D_n\}$. This will give poly size nonuniform Boolean circuits that compute the coefficients of $f_{n+1}$. The proof follows the scaling-up to CH part of the proof of Lemma 5 of [JS11].

Lemma 6 ("First Compression Step"). There exists a Boolean circuit family $\{D''_n\}$ such that 1) $|D''_n| = n^{O(1)}$, 2) $D''_n$ has input gates so it can be given $e \in \{0,1\}^{n+1}$, $j \in \{0,1\}^{4\alpha n}$, and 3) $D''_n(e, j)$ outputs the jth bit of $c_{n+1}(e)$.

Proof. Let $\{D_n\}$ be the circuit family provided by Lemma 5. The family $\{D_n\}$ is $(n^{O(1)}, n^{O(1)})$-succinct.
Let {Bn } be the corresponding family of Boolean circuits of size nO(1) , where Bn represents Dn . In this representation names of gates in Dn are nO(1) bits long. Let d000 = d0 + d00 + O(1) be the depth of Dn . For 0 ≤ i ≤ d000 , let Li be the language of tuples (G, 1n , e, j, b) for which it holds that • G is the name of a gate on level i in Dn . It outputs b when Dn is given input e and j. • e ∈ {0, 1}n+1 , j ∈ {0, 1}4αn . Claim 2. Under our assumptions, for each 0 ≤ i ≤ d00 , Li ∈ CH/poly. Proof. We will prove the claim by induction on i. We assume that Bn is given as advice. For technical convenience we can assume proper length of gates names in Dn (which is nO(1) ) are given as advice in unary. We can easily check that e and j are of length n + 1 and 4αn respectively. Similarly, we can discard any inputs where G does not have the proper length. Hence, for the below argument, let us consider the case where we have a well-formed inputs input (G, 1n , e, j, b). Consider the base case i = 0. One can easily verify which variable the gate G is labeled with. Namely, for a gate labeled with a variable x` , ` in binary is a substring of the name for the gate. Fetch the `th bit of (e, j) can easily be done. Gates labeled by Boolean constants are also trivial to deal with. To check whether G is on level 0 we can assume wlog. this information can be obtained

9

from the gate name, or add another level of querying. All of the above computation can be done within $n^{O(1)}$ time.

For the induction hypothesis, assume that the claim holds for $L_i$. By Torán's [Tor91] characterization of the counting hierarchy, it is sufficient to show that $L_{i+1} \in \mathrm{PP}^{L_i}$/poly. Given input $(G, 1^n, e, j, b)$, we assume the gate $G$ is of majority type. Negation gates can be dealt with by a similar argument. Let $N$ be an NTM that on input $(G, 1^n, e, j, b)$ nondeterministically guesses the $n^{O(1)}$ size name of a gate $H$, and uses the advice $B_n$ to check that $H \to G$ is a wire in $D_n$. If this is not true, $N$ will nondeterministically flip a bit $b'$ and accept if $b' = 1$, reject if $b' = 0$. Otherwise, $N$ queries $(H, 1^n, e, j, b) \overset{?}{\in} L_i$. $N$ accepts if the answer to this query is yes, and rejects otherwise. We have that $N$ accepts on the majority of its nondeterministic guesses iff the majority of the inputs to $G$ are outputting $b$ in $D_n(e, j)$. We have shown that $L_{i+1} \in \mathrm{PP}^{L_i}$/poly.

Since we are assuming that $\tau(\mathrm{per}_n) = n^{O(1)}$, Lemma 2 gives that CH/poly = P/poly, and hence the lemma follows from the above claim.

We now finish the proof using Valiant's Criterion and the completeness of the permanent. Recall the bit sizes of $c_{n+1}(e)$ are bounded by $2^{4\alpha n}$. Let $c_{n+1}(e)_i$ denote the $i$th bit of $c_{n+1}(e)$. Write

$$f_{n+1} = \sum_{e=0}^{2^{n+1}-1} \sum_{i=0}^{2^{4\alpha n}-1} c_{n+1}(e)_i \, 2^i \, x_1^{e_1} x_2^{e_2} \cdots x_{n+1}^{e_{n+1}}.$$

By Lemma 6, there exists a Boolean circuit family $\{D''_n\}$ such that $|D''_n| = n^{O(1)}$. $D''_n(e, j)$ outputs the $j$th bit of $c_{n+1}(e)$. Let

$$g_{n+1}(x_1, \ldots, x_{n+1}, y_1, \ldots, y_{4\alpha n}) = \sum_{e=0}^{2^{n+1}-1} \sum_{i=0}^{2^{4\alpha n}-1} c_{n+1}(e)_i \, y_1^{i_1} y_2^{i_2} \cdots y_{4\alpha n}^{i_{4\alpha n}} \, x_1^{e_1} x_2^{e_2} \cdots x_{n+1}^{e_{n+1}}.$$

Note that $f_{n+1} = g_{n+1}(x_1, \ldots, x_{n+1}, 2^{2^0}, 2^{2^1}, \ldots, 2^{2^{4\alpha n - 1}})$. We now apply a 'second compression step', where we leverage a collapse result for $\mathrm{VNP}^0$, and use the fact that large powers like $x^{2^i}$ are succinctly representable by size $O(i)$ circuits computing repeated squaring. By Valiant's Criterion 1 we have that $\{g_{n+1}\} \in \mathrm{VNP}^0$. By Proposition 1, and since we are assuming that $\tau(\mathrm{per}_n) = n^{O(1)}$, for some polynomial $p(n)$, $\tau(2^{p(n)} g_{n+1}) = n^{O(1)}$. Letting $f'_{n+1} := 2^{p(n)} f_{n+1}$, we then have $\tau(f'_{n+1}) = n^{O(1)}$, since the needed powers of 2 can be computed by a constant-free arithmetic circuit of size $O(n)$. Observe that $f'_{n+1}$ is a nonzero multilinear polynomial in $n + 1 = o(N)$ variables that vanishes on $H_N$. Hence we have that $\tau(f'_{n+1}) \ge N$. Recall that $N = \lfloor n^{\gamma} \rfloor$ and that $\gamma = \omega(1)$. We have arrived at a contradiction.

Using Lemma 4 the following corollary is immediate:

Corollary 1. If Hypothesis 2 is true, then at least one of the following must hold: 1) $\tau(\mathrm{per}_n)$ is not polynomially bounded, or 2) uniform $\mathrm{TC}^0$ is strictly contained in uniform $\mathrm{NC}^2$.

We note that Hypothesis 2 is satisfied if the hitting set $H_n$ mentioned therein can be encoded by uniform $\mathrm{TC}^0$ circuits. Hence, our results also imply that if we can encode a hitting set against size $n$ constant-free arithmetic circuits computing multilinear polynomials in $m \le n$ variables in uniform $\mathrm{TC}^0$, then either 1) $\tau(\mathrm{per}_n)$ is not polynomially bounded, or 2) uniform $\mathrm{TC}^0$ is strictly contained in uniform $\mathrm{NC}^2$. The generalization to circuits over $\mathbb{F}$ can be obtained by an easy adaptation.

Theorem 4. We assume (GRH). Hypothesis 1 implies that either $\{\mathrm{per}_n\} \notin \mathrm{VP}_{\mathbb{F}}$, or $\forall d''$, the function $F$ cannot be computed by size $2^{n^{1/\gamma}}$ $\mathrm{TC}^0$ circuits of depth $d'$ that are $(n^{1/\gamma}, n^{1/\gamma})$-succinct (or both).

Proof. We sketch an easy adaptation of the proof of Theorem 3. Suppose that Hypothesis 1 holds, but that both of the conditions mentioned in the theorem are false. From the first item, we have
that $\{\mathrm{per}_n\} \in \mathrm{VP}_{\mathbb{F}}$. By Corollary 1.2 in [Bür00], this means #P/poly = FP/poly. Therefore, CH/poly = P/poly. We can now invoke the proof of Theorem 3 to define a polynomial $f_{n+1}$ in $n + 1$ variables that requires circuits of size $n^{\gamma}$ over $\mathbb{F}$. We use the collapse CH/poly = P/poly at the end of the scaling-up argument. This yields that the coefficients of $f_{n+1}$ are integers computable by Boolean circuits of polynomial size. By Valiant's Criterion over $\mathbb{F}$, this puts $f_{n+1} \in \mathrm{VNP}_{\mathbb{F}}$. Since we assume that $\{\mathrm{per}_n\} \in \mathrm{VP}_{\mathbb{F}}$, we get that $\mathrm{VNP}_{\mathbb{F}} = \mathrm{VP}_{\mathbb{F}}$. Hence we get polynomial size circuits for $f_{n+1}$ over $\mathbb{F}$, which is a contradiction.

We immediately get the following (which in turn implies Theorem 2):

Corollary 2. We assume (GRH). Hypothesis 1 implies that either $\mathrm{VP}_{\mathbb{F}} \neq \mathrm{VNP}_{\mathbb{F}}$, or uniform $\mathrm{TC}^0$ is strictly contained in uniform $\mathrm{NC}^2$ (or both).
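As a small illustration of the second compression step used in the proof of Theorem 3, the following Python sketch checks numerically that substituting $y_k = 2^{2^{k-1}}$ turns the monomial $y_1^{i_1} \cdots y_w^{i_w}$ into $2^i$, where $i_1, \ldots, i_w$ are the bits of $i$ (least significant first), and that $x^{2^k}$ needs only $k$ multiplications by repeated squaring. The function names and the bit ordering are assumptions made for this sketch only; they do not come from the paper.

```python
def repeated_squaring_gates(x, k):
    """Compute x**(2**k) by squaring k times; also return the number of multiplications used."""
    value, gates = x, 0
    for _ in range(k):
        value = value * value  # one multiplication gate per squaring
        gates += 1
    return value, gates


def bits_lsb_first(i, width):
    """Bits i_1, ..., i_width of i, least significant bit first (assumed convention)."""
    return [(i >> k) & 1 for k in range(width)]


def monomial_in_y(i, width):
    """Evaluate y_1^{i_1} * ... * y_width^{i_width} at y_k = 2^(2^(k-1))."""
    result = 1
    for k, bit in enumerate(bits_lsb_first(i, width)):
        y_k, _ = repeated_squaring_gates(2, k)  # equals 2^(2^k) for 0-indexed k
        if bit:
            result *= y_k
    return result


if __name__ == "__main__":
    width = 5
    # Every coefficient weight 2^i is recovered from the bit pattern of i.
    print(all(monomial_in_y(i, width) == 2 ** i for i in range(2 ** width)))
    print(repeated_squaring_gates(3, 4))
```

The point of the substitution is that the exponentially many weights $2^i$ in $f_{n+1}$ are replaced by a product over only $4\alpha n$ fresh variables, each of which is later set to a constant computable by a short chain of squarings.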

4

Hardness to Randomness: Proof of Theorem 1

Suppose$^{9}$ that for all large enough $n$, $L_{\mathbb{F}}(\mathrm{per}_n) \ge n^{3\gamma'(n)}$. Let $m = m(n) = \lfloor n^{1/\gamma'} \rfloor^2$. One may verify that the approach in Ref. [Vio04] for computing Nisan-Wigderson designs [NW94] can be adapted to obtain a set system $S_1, S_2, \ldots, S_n \subset [m^4]$ satisfying: 1) for every $i$, $|S_i| = m$, and 2) for every $i < j$, $|S_i \cap S_j| \le \log n$. This system can be constructed by uniform $\mathrm{AC}^0$ circuits of size $2^{O(n^{2/\gamma'})}$, if $m$ in unary is constructible by such circuits. The latter is not an issue, as we only require $(n^{O(1/\gamma')}, n^{O(1/\gamma')})$-succinct circuits computing the design. So we may assume that $m$ in unary is hardcoded in the representation. This yields that we can compute the design by $(n^{O(1/\gamma')}, n^{O(1/\gamma')})$-succinct $\mathrm{AC}^0$ circuits of size $2^{n^{O(1/\gamma')}}$.

We think of $[m^4]$ as corresponding to the set of variables $z_1, z_2, \ldots, z_{m^4}$. For $1 \le i \le n$, let $p_i = \mathrm{per}_{\sqrt{m}}(S_i)$, where the latter means taking the permanent of a matrix of order $\sqrt{m}$ with variables corresponding to $S_i$ (put in the matrix in some fixed order).

Claim 3. For large enough $n$, for any nonzero multilinear polynomial $f(x_1, \ldots, x_m)$ with $m \le n$ computed by a circuit of size $n$, $f(p_1, \ldots, p_m) \not\equiv 0$.

Proof. Suppose the claim is false. Let $i$ be the smallest number so that $f' := f(p_1, \ldots, p_i, x_{i+1}, x_{i+2}, \ldots, x_m) \not\equiv 0$, but $f(p_1, \ldots, p_i, p_{i+1}, x_{i+2}, \ldots, x_m) \equiv 0$. Let $g$ be a nonzero polynomial obtained by substitution of field constants in $f'$ for all $z$-variables not in $S_{i+1}$ and variables in $\{x_{i+2}, x_{i+3}, \ldots, x_m\}$ (such field constants must exist). Note that $g$ is of the form $g_1 \cdot x_{i+1} + g_2$, where $g_1, g_2$ are over $z$-variables corresponding to $S_{i+1}$. Since for $i < j$, $|S_i \cap S_j| \le \log n$, after substitution, for each $j \ne i+1$, $p_j$ can be computed by arithmetic circuits of size $O(n)$. This means that $g$ can be computed by an arithmetic circuit of size $O(n^2)$, which in turn implies both $g_1$ and $g_2$ are computable by size $O(n^2)$ circuits. We have that $g(x_{i+1} := p_{i+1}) \equiv 0$, so $p_{i+1} = -g_2/g_1$. We assume that $g_1$ has a constant term (in case this is not true one can deal with it by applying a coordinate shift at ignorable loss), i.e. assume $\xi := g_1(0) \ne 0$. Write $1/g_1 = \frac{1}{\xi} \cdot \frac{1}{1-h}$. We have that $h$ has $O(n^2)$ size circuits. Using formal power series, we get that $p_{i+1} = -\frac{1}{\xi} \cdot g_2 \cdot \sum_{k=0}^{\sqrt{m}} h^k$, up to terms of degree larger than $\sqrt{m}$: since $h$ has no constant term, the powers $h^k$ with $k > \sqrt{m}$ only contribute in degrees above that of $p_{i+1}$. We conclude that we have $O(n^2) + O(m) = O(n^2)$ circuits for $p_{i+1}$. Translating back by setting $n := N^{\gamma'}$, we get that $L_{\mathbb{F}}(\mathrm{per}_N) = O(N^{2\gamma'})$. This is a contradiction, since we assumed $L_{\mathbb{F}}(\mathrm{per}_n) \ge n^{3\gamma'}$.
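The division step in the proof of Claim 3 can be illustrated on a toy case. The sketch below works with univariate polynomials over the rationals (an assumption made purely for readability; the proof concerns multivariate polynomials over $\mathbb{F}$). It writes $g_1 = \xi(1 - h)$, sums the geometric series $\sum_k h^k$ up to a degree bound $d$, and recovers $p$ from the identity $g_1 \cdot p + g_2 = 0$. All function names are hypothetical.

```python
from fractions import Fraction


def poly_mul(a, b, d):
    """Multiply coefficient lists a and b (lowest degree first), dropping terms of degree > d."""
    out = [Fraction(0)] * (d + 1)
    for i, ai in enumerate(a):
        if i > d:
            break
        for j, bj in enumerate(b):
            if i + j > d:
                break
            out[i + j] += ai * bj
    return out


def truncated_inverse(g1, d):
    """Return 1/g1 modulo x^(d+1), using 1/g1 = (1/xi) * sum_{k=0}^{d} h^k."""
    xi = Fraction(g1[0])
    if xi == 0:
        raise ValueError("g1 needs a nonzero constant term (apply a shift first)")
    # g1 = xi * (1 - h), hence h = 1 - g1/xi, which has zero constant term.
    h = [Fraction(0)] + [-Fraction(c) / xi for c in g1[1:]]
    total = [Fraction(0)] * (d + 1)
    power = [Fraction(1)] + [Fraction(0)] * d  # h^0
    for _ in range(d + 1):
        total = [t + p for t, p in zip(total, power)]
        power = poly_mul(power, h, d)
    return [t / xi for t in total]


def recover_p(g1, g2, d):
    """Given g1 * p + g2 = 0 with deg p <= d, return p = -g2 / g1 via the truncated series."""
    inverse = truncated_inverse(g1, d)
    return [-c for c in poly_mul(g2, inverse, d)]


if __name__ == "__main__":
    # p = 1 + 2x + 3x^2, g1 = 2 + x, g2 = -g1 * p = -(2 + 5x + 8x^2 + 3x^3)
    print(recover_p([2, 1], [-2, -5, -8, -3], 2))
```

Truncating at degree $d$ is harmless here for the same reason as in the proof: $h$ has no constant term, so $h^k$ only affects degrees $k$ and higher.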

9 Wlog. we start with this assumption rather than $L_{\mathbb{F}}(\mathrm{per}_n) \ge n^{\gamma'(n)}$, since we do not care about the value of the constant $c$ in the statement of Theorem 1.

Note that for $f$ as in the claim, $f(p_1, p_2, \ldots, p_m)$ is a polynomial in $m^4$ many $z_i$, where the individual degrees are bounded by $n$. This implies there exist$^{10}$ assignments to the variables $a \in [n+1]^{m^4}$ such that $f(p_1, p_2, \ldots, p_m)(a) \ne 0$. In other words, $\{(p_1(a|_{S_1}), p_2(a|_{S_2}), \ldots, p_n(a|_{S_n})) : a \in [n+1]^{m^4}\}$ is a hitting set against multilinear polynomials computed by size $n$ circuits, where $a|_{S_i}$ means restriction to indices in $S_i$. Using Ryser's Formula for permanent, and using that both iterated addition (cf. [Vol99]) and iterated multiplication are in uniform $\mathrm{TC}^0$ [HAB01], we get that $\mathrm{per}_{\sqrt{m}}$ on inputs of $1 + \lfloor \log n \rfloor$ bits can be computed by $(n^{O(1/\gamma')}, n^{O(1/\gamma')})$-succinct $\mathrm{TC}^0$ circuits of size $2^{n^{O(1/\gamma')}}$. We use that for inputs in $[n+1]$, $\mathrm{per}_{\sqrt{m}}$ has magnitude at most $(\sqrt{m})!\,(n+1)^{\sqrt{m}}$, which is representable by $O(\sqrt{m} \log(\sqrt{m}) + \sqrt{m} \log(n+1)) = o(n)$ bits, so we can truncate the arithmetic to $n$-bit numbers.

Finally, we must put together the circuits computing the bit vectors of the design $(S_1, S_2, \ldots, S_n)$ and circuits for computing permanent on order $\sqrt{m}$ matrices with $n$-bit entries. For this we use $n$ copies $\Phi_i$ of such a circuit computing permanent. The input $x$ to the circuit is given by $m^4$ blocks of $1 + \lfloor \log n \rfloor$ bits each. For each $i$, $\Phi_i$ must receive inputs selected from $x$ according to the bit vector $S_i$. We can implement such a gadget by $(n^{O(1/\gamma')}, n^{O(1/\gamma')})$-succinct $\mathrm{TC}^0$ circuits. Namely, when dealing with $S_i$, to determine the $j$th input to permanent (say left-to-right, top-to-bottom), for each bit $b$ of the vector in $\{0,1\}^{m^4}$, we compute the number of bits set to 1 in $S_i$ with lower index than $b$, using the fact that iterated addition can be computed in uniform $\mathrm{TC}^0$, cf. [Vol99]. One next compares whether this count equals (hardcoded) $j$ in binary. There is exactly one such bit $b$ for which this is true together with $b = 1$. This signals that the corresponding $(1 + \lfloor \log n \rfloor)$-bit block of the input with the same index as $b$ must be fed into the $j$th input of $\Phi_i$. Since this is true for a unique $b$, using masking we can do this for all bits $b$ of the vector in $\{0,1\}^{m^4}$ in parallel and compute a bitwise $\vee$. This ends our description. We conclude Hypothesis 1 is satisfied with $\gamma = \gamma'/c$, for some constant $c$.

We remark that due to $\frac{1}{\xi}$ appearing in the above proof, there is an issue extending this to the constant-free setting. To resolve this, one can make the stronger assumption that for each integer $e \ne 0$, $\tau(e \cdot \mathrm{per}_n) \ge n^{3\gamma'(n)}$. A straightforward analysis of bit sizes of constants we need in our construction yields that we may weaken this to assume this only for integers $e$ with $\tau(e) = O(n^2 \log n)$. The latter is significant as it shows the construction does not crucially depend on artifacts related to representation of integers in the constant-free model.
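The two ingredients of the construction above can be sketched sequentially in Python: Ryser's Formula for the permanent, and the selection gadget that routes the $j$th selected input block to the permanent circuit by counting lower-index ones and OR-ing masked blocks. This is only an illustrative sequential rendering under assumed conventions ($j$ is 0-indexed, blocks are plain integers), not the $\mathrm{TC}^0$ circuits themselves, and all function names are hypothetical.

```python
from itertools import combinations, permutations
from math import factorial, prod


def permanent_ryser(a):
    """Ryser's Formula: per(A) = sum over nonempty column sets S of
    (-1)^(n-|S|) * prod_i sum_{j in S} a[i][j]."""
    n = len(a)
    total = 0
    for size in range(1, n + 1):
        sign = (-1) ** (n - size)
        for cols in combinations(range(n), size):
            total += sign * prod(sum(row[j] for j in cols) for row in a)
    return total


def permanent_naive(a):
    """Definition of the permanent as a sum over all permutations (for cross-checking)."""
    n = len(a)
    return sum(prod(a[i][s[i]] for i in range(n)) for s in permutations(range(n)))


def magnitude_bound(order, n):
    """The bound (order)! * (n+1)^order on the permanent for entries in [n+1]."""
    return factorial(order) * (n + 1) ** order


def select_inputs(blocks, s_bits):
    """Route blocks to permanent inputs: input j receives the block at the unique
    index b with s_bits[b] == 1 and exactly j ones at lower indices."""
    selected = []
    for j in range(sum(s_bits)):
        value = 0
        for b, bit in enumerate(s_bits):
            lower_ones = sum(s_bits[:b])           # iterated addition
            mask = bit == 1 and lower_ones == j    # true for exactly one b
            value |= blocks[b] if mask else 0      # bitwise OR of masked blocks
        selected.append(value)
    return selected


if __name__ == "__main__":
    print(permanent_ryser([[1, 2], [3, 4]]), permanent_naive([[1, 2], [3, 4]]))
    print(select_inputs([5, 7, 9, 11], [0, 1, 0, 1]))
```

Ryser's Formula still ranges over all $2^{\sqrt{m}}$ column subsets, which is why it is applied to permanents of order $\sqrt{m}$ only; its appeal in this setting is that each term is an iterated product of iterated sums.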

10 E.g. use the combinatorial Nullstellensatz by Alon [Alo99].

References

[ABO99] E. Allender, R. Beals, and M. Ogihara. The complexity of matrix rank and feasible systems of linear equations. Comp. Complexity, 8:99–126, 1999.

[Agr05] M. Agrawal. Proving lower bounds via pseudo-random generators. In Proc. 25th FSTTCS, pages 92–105, 2005.

[Alo99] N. Alon. Combinatorial Nullstellensatz. Combinatorics, Probability and Computing, 8(1–2):7–29, 1999.

[BCS97] P. Bürgisser, M. Clausen, and M.A. Shokrollahi. Algebraic Complexity Theory. Springer-Verlag, 1997.

[Ber84] S. Berkowitz. On computing the determinant in small parallel time using a small number of processors. Inf. Proc. Lett., 18:147–150, 1984.

[Bür00] P. Bürgisser. Cook's versus Valiant's hypothesis. TCS, 235:71–88, 2000.

[Bür04] P. Bürgisser. The complexity of factors of multivariate polynomials. Found. Comput. Math., 4(4):369–396, 2004.

[Bür09] P. Bürgisser. On defining integers and proving arithmetic circuit lower bounds. Comp. Complexity, 18:81–103, 2009.

[DL78] R. DeMillo and R. Lipton. A probabilistic remark on algebraic program testing. Inf. Proc. Lett., 7:193–195, 1978.

[HAB01] W. Hesse, E. Allender, and D.A.M. Barrington. Uniform constant-depth threshold circuits for division and iterated multiplication. J. Comp. Sys. Sci., 64(4):695–716, 2001.

[HS80] J. Heintz and C.P. Schnorr. Testing polynomials which are easy to compute (extended abstract). In Proc. 12th STOC, pages 262–272, 1980.

[IM83] O. Ibarra and S. Moran. Probabilistic algorithms for deciding equivalence of straight-line programs. J. Assn. Comp. Mach., 30:217–228, 1983.

[JS11] M. Jansen and R. Santhanam. Permanent does not have succinct polynomial size arithmetic circuits of constant depth. To appear, ICALP 2011.

[Kal89] E. Kaltofen. Factorization of polynomials given by straight-line programs. In Randomness and Computation, pages 375–412. JAI Press, 1989.

[KI04] V. Kabanets and R. Impagliazzo. Derandomizing polynomial identity testing means proving circuit lower bounds. Comp. Complexity, 13(1–2):1–44, 2004.

[Koi10] P. Koiran. Shallow circuits with high powered inputs. In Proc. 2nd Symp. on Innovations in Computer Science, 2010.

[KP07] P. Koiran and S. Perifel. Interpolation in Valiant's theory, 2007. To appear.

[NW94] N. Nisan and A. Wigderson. Hardness versus randomness. J. Comp. Sys. Sci., 49:149–167, 1994.

[Ruz81] W. Ruzzo. On uniform circuit complexity. J. Comp. Sys. Sci., 22:365–383, 1981.

[Sch80] J.T. Schwartz. Fast probabilistic algorithms for polynomial identities. J. Assn. Comp. Mach., 27:701–717, 1980.

[Tor91] J. Torán. Complexity classes defined by counting quantifiers. J. Assn. Comp. Mach., 38(3):753–774, 1991.

[Vio04] E. Viola. The complexity of constructing pseudorandom generators from hard functions. Comp. Complexity, 13(3–4):147–188, 2004.

[Vol99] H. Vollmer. Introduction to Circuit Complexity. Springer-Verlag, 1999.

[Wag86] K. Wagner. The complexity of combinatorial problems with succinct input representation. Acta Informatica, 23:325–356, 1986.

[Zip79] R. Zippel. Probabilistic algorithms for sparse polynomials. In Proc. ISSAM (EUROSAM '79), volume 72 of Lect. Notes in Comp. Sci., pages 216–226.