Approximating the Little Grothendieck Problem over the Orthogonal Group

Afonso S. Bandeira*, Christopher Kennedy†, Amit Singer‡
August 26, 2013
Abstract

The little Grothendieck problem (a special case of Boolean quadratic optimization) consists of maximizing $\sum_{ij} C_{ij} x_i x_j$ over binary variables $x_i \in \{\pm 1\}$, where $C$ is a positive semidefinite matrix. In this paper we focus on a natural generalization of this problem, the little Grothendieck problem over the orthogonal group. Given a positive semidefinite matrix $C \in \mathbb{R}^{dn \times dn}$, the objective is to maximize $\sum_{ij} \operatorname{tr}\left(C_{ij}^T O_i O_j^T\right)$, restricting $O_i$ to take values in the group of orthogonal matrices $O(d)$, where $C_{ij}$ denotes the $(i,j)$-th $d \times d$ block of $C$. We propose an approximation algorithm, which we refer to as Orthogonal-Cut, to solve the little Grothendieck problem over the group of orthogonal matrices $O(d)$, and show a constant approximation ratio. Our method is based on semidefinite programming, where the relaxation is inspired by the work of Goemans and Williamson in the context of the Max-Cut problem. For a given $d \geq 1$, we show a constant approximation ratio of $\alpha_d^2$, where $\alpha_d$ is the expected average singular value of a $d \times d$ matrix with random Gaussian $\mathcal{N}(0, \frac{1}{d})$ i.i.d. entries. For $d = 1$ we recover the known $\alpha_1^2 = 2/\pi$ approximation guarantee for the classical little Grothendieck problem. Orthogonal-Cut also serves as an approximation algorithm for several applications, including the Procrustes problem, where it improves over the best previously known approximation ratio of $\frac{1}{2\sqrt{2}}$. The little Grothendieck problem falls under the larger class of problems approximated by an algorithm recently proposed in the context of the non-commutative Grothendieck inequality. Nonetheless, our approach is simpler and gives a better approximation ratio.
Keywords: Approximation algorithms, Procrustes problem, Semidefinite programming, Max-Cut.
* Program in Applied and Computational Mathematics (PACM), Princeton University, Princeton, New Jersey 08544, USA ([email protected]).
† Department of Mathematics, The University of Texas at Austin, Austin, Texas 78712, USA ([email protected]).
‡ Department of Mathematics and PACM, Princeton University, Princeton, New Jersey 08544, USA ([email protected]).
1  Introduction
The Grothendieck problem [AN04] in combinatorial optimization is written as
$$\max_{x_i, y_i \in \{\pm 1\}} \sum_{i=1}^n \sum_{j=1}^n C_{ij} x_i y_j,$$
where $C$ is an $n \times n$ real matrix. If $C$ is a positive semidefinite matrix, then there is an optimal solution for which $x = y$. This special case is called the little Grothendieck problem and can be written as
$$\max_{x_i \in \{\pm 1\}} \sum_{i=1}^n \sum_{j=1}^n C_{ij} x_i x_j, \qquad (1)$$
where $C$ is a positive semidefinite matrix.

In this paper we focus on a natural generalization of problem (1), the little Grothendieck problem over the orthogonal group, where the variables are now elements of the orthogonal group $O(d)$ instead of $\{\pm 1\}$. More precisely, given a positive semidefinite matrix $C \in \mathbb{R}^{dn \times dn}$, we consider the problem
$$\max_{O_1, \ldots, O_n \in O(d)} \sum_{i=1}^n \sum_{j=1}^n \operatorname{tr}\left(C_{ij}^T O_i O_j^T\right), \qquad (2)$$
where $C_{ij}$ denotes the $(i,j)$-th $d \times d$ block of $C$, and $O(d)$ is the group of $d \times d$ orthogonal matrices (i.e., $O \in O(d)$ iff $OO^T = O^T O = I_{d \times d}$).

As we will see in Section 3, several problems can be written in the form (2), such as the Procrustes problem [Sch66, Nem07, So11], Synchronization [BSS12, Sin11], and Global Registration [CKSC13]. Moreover, the approximation ratio we obtain for (2) translates into the same approximation ratio for these applications, improving over the best previously known approximation ratio of $\frac{1}{2\sqrt{2}}$, given by [NRV13] for these problems.

Problem (2) belongs to a wider class of problems considered by Nemirovski [Nem07] called QO-QC (Quadratic Optimization under Orthogonality Constraints), which itself is a subclass of QC-QP (Quadratically Constrained Quadratic Programs). When $d = 1$, problem (2) reduces to problem (1), which is a special case of Boolean quadratic optimization [Nes98]. If furthermore $C$ is a Laplacian, then (2) reduces to the Max-Cut problem [GW95]. In a seminal paper, Goemans and Williamson [GW95] provided a semidefinite relaxation for the Max-Cut instances of (1) and showed that a simple rounding technique is guaranteed to produce a solution whose objective value is, in expectation, at least $\frac{2}{\pi}\min_{0 \leq \theta \leq \pi}\frac{\theta}{1-\cos\theta} \approx 0.878$ of the optimum. Nesterov [Nes98] showed an approximation ratio of $2/\pi$ for $d = 1$ and $C \succeq 0$ using the same relaxation as [GW95]. The work of Nesterov was extended [SZY07] to the complex plane (corresponding to the special orthogonal group $SO(2)$) with an approximation ratio of $\pi/4$ for $C \succeq 0$. In fact, one of the main ideas used to show our main result is inspired by the techniques in [SZY07]. More recently, Naor et al. [NRV13] proposed an efficient rounding for the non-commutative Grothendieck inequality that provides an approximation algorithm for a vast set of problems involving orthogonality constraints, including problems of the form of (2). Although the little Grothendieck problem does not encode as many problems as the method in [NRV13] addresses, it does encode several important problems (see Section 3), and we show that it can be tackled using a simpler approach that also has a better approximation ratio than [NRV13] (see Section 1.1).
Using a simple generalization of the relaxation and rounding in [GW95], we show, for any $d \geq 1$, a constant factor approximation of $\alpha_d^2$ (where $\alpha_d$ is the expected average singular value of a $d \times d$ matrix with random Gaussian $\mathcal{N}(0, \frac{1}{d})$ i.i.d. entries; see Definition 2) for a relaxation of problem (2). For $d = 1$ we recover the $\alpha_1^2 = \frac{2}{\pi}$ result of Nesterov [Nes98]. Similarly to the Max-Cut relaxation, we formulate a semidefinite program where the variables are vectorized versions of the original variables. We call this relaxation the Orthogonal-Cut SDP¹:
$$\max_{\substack{U_i \in \mathbb{R}^{d \times dn} \\ U_i U_i^T = I_{d \times d}}} \sum_{i=1}^n \sum_{j=1}^n \operatorname{tr}\left(C_{ij}^T U_i U_j^T\right). \qquad (3)$$
Problem (3) is equivalent to the semidefinite program
$$\max_{\substack{G \in \mathbb{R}^{dn \times dn},\; G \succeq 0 \\ G_{ii} = I_{d \times d}}} \operatorname{tr}(CG), \qquad (4)$$
which can be solved, to arbitrary precision, in polynomial time [VB96]. The main contribution of this paper is showing that Algorithm 3 (see Section 2) gives a constant factor approximation to (2).

Theorem 1. Let $C \succeq 0$. Let $V_1, \ldots, V_n \in O(d)$ be the (random) output of Algorithm 3. Then
$$\mathbb{E}\left[\sum_{i=1}^n \sum_{j=1}^n \operatorname{tr}\left(C_{ij}^T V_i V_j^T\right)\right] \geq \alpha_d^2 \max_{O_1, \ldots, O_n \in O(d)} \sum_{i=1}^n \sum_{j=1}^n \operatorname{tr}\left(C_{ij}^T O_i O_j^T\right),$$
where $\alpha_d$ is the constant defined below.

Definition 2. Let $S \in \mathbb{R}^{d \times d}$ be a Gaussian random matrix with i.i.d. entries $\mathcal{N}(0, d^{-1})$. We define
$$\alpha_d := \mathbb{E}\left[\frac{1}{d}\sum_{j=1}^d \sigma_j(S)\right],$$
where $\sigma_j(S)$ is the $j$-th singular value of $S$.

Although we do not have a complete understanding of the behavior of $\alpha_d$ as a function of $d$, we can, for each $d$ separately, compute a closed form expression (see Section 5). One can also show that $\lim_{d\to\infty} \alpha_d^2 = \left(\frac{8}{3\pi}\right)^2 > \frac{2}{\pi}$. Our computations strongly suggest that $\alpha_d^2$ is monotonically increasing, although we were unable to provide a proof of this fact (Conjecture 5). Nevertheless, we can show that $\alpha_d^2$ is uniformly bounded below by a constant larger than $\frac{1}{2\sqrt{2}}$, the approximation ratio provided by the approach based on the non-commutative Grothendieck inequality [NRV13].
¹ The name was inspired by the term “PhaseCut” used in [WdM12] in the context of a complex version of Max-Cut.
1.1  Relation to non-commutative Grothendieck inequality
The approximation algorithm proposed in [NRV13] can also be used to approximate problem (2). In fact, the method in [NRV13] deals with problems of the form
$$\sup_{X, Y \in O(N)} \sum_{pqkl} M_{pqkl} X_{pq} Y_{kl}, \qquad (5)$$
where $M$ is an $N \times N \times N \times N$ 4-tensor. Problem (2) can be encoded in the form of (5) by taking $N = dn$ and making, for each $i, j$, the $d \times d$ block of $M$ obtained by having the first two indices range from $(i-1)d+1$ to $id$ and the last two from $(j-1)d+1$ to $jd$ equal to $C_{ij}$, and the rest of the tensor equal to zero [NRV13]. Note that since $C$ is positive semidefinite, problem (2) is equivalent to its bipartite counterpart.

In order to describe the relaxation, one needs to first define the space of vector-valued orthogonal matrices $O(N; m) = \{X \in \mathbb{R}^{N \times N \times m} : XX^T = X^T X = I_{N \times N}\}$, where $XX^T$ and $X^T X$ are $N \times N$ matrices defined as $\left(XX^T\right)_{pq} = \sum_{k=1}^N \sum_{r=1}^m X_{pkr} X_{qkr}$ and $\left(X^T X\right)_{pq} = \sum_{k=1}^N \sum_{r=1}^m X_{kpr} X_{kqr}$. The relaxation proposed in [NRV13] is given by
$$\sup_{m \in \mathbb{N}}\; \sup_{U, V \in O(N; m)} \sum_{pqkl} M_{pqkl} U_{pq} V_{kl}, \qquad (6)$$
and there exists a rounding procedure [NRV13] that achieves an approximation ratio of $\frac{1}{2\sqrt{2}}$, which is smaller than $\alpha_d^2$ for all $d \geq 1$ (see Section 5). Note also that to approximate (2) with this approach one needs to have $N = dn$ in (6).
2  Algorithm
We now present the (randomized) approximation algorithm that we propose to solve (2).

Algorithm 3. Compute $G$, a solution of the semidefinite program (4). Since $G \succeq 0$, its Cholesky decomposition can be written as $G = UU^T$. Write
$$U = \begin{bmatrix} U_1 \\ U_2 \\ \vdots \\ U_n \end{bmatrix} \in \mathbb{R}^{nd \times nd}, \qquad U_i \in \mathbb{R}^{d \times nd}.$$
Let $R \in \mathbb{R}^{nd \times d}$ be a Gaussian random matrix whose entries are i.i.d. $\mathcal{N}(0, \frac{1}{d})$. The candidate solution for (2) is computed as $V_i = \mathcal{P}(U_i R)$, where $\mathcal{P}(X) = \operatorname{argmin}_{Y \in O(d)} \|Y - X\|_F$ for any $X \in \mathbb{R}^{d \times d}$.

Note that the semidefinite program (4) has a positive semidefinite matrix variable of size $dn \times dn$ and $d^2 n$ linear constraints. It can be solved, to arbitrary precision, in polynomial time [VB96], and the Cholesky decomposition of the solution produces a solution to problem (3). In fact, the semidefinite program (4) has a very similar structure to the classical Max-Cut SDP.
This potentially allows one to adapt methods designed specifically for the Max-Cut SDP, such as the row-by-row method [WGS12]. Also, given $X \in \mathbb{R}^{d \times d}$, $\mathcal{P}(X)$ can be easily computed via the singular value decomposition $X = U\Sigma V^T$ as $\mathcal{P}(X) = UV^T$ (see [FH55, Kel75, Hig86]).
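The following is a minimal numerical sketch of Algorithm 3. The use of CVXPY for the SDP and the function names (`orthogonal_cut`, `polar_factor`) are our own illustrative choices, not part of the paper; any SDP solver would do.

```python
import numpy as np
import cvxpy as cp

def polar_factor(X):
    """P(X) = argmin_{Y in O(d)} ||Y - X||_F, computed as U V^T from the SVD of X."""
    U, _, Vt = np.linalg.svd(X)
    return U @ Vt

def orthogonal_cut(C, d, n, rng=None):
    """Approximate max of sum_ij tr(C_ij^T O_i O_j^T) over O_1, ..., O_n in O(d)."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Solve the SDP (4): max tr(CG) s.t. G PSD and every diagonal block G_ii = I_d.
    G = cp.Variable((d * n, d * n), PSD=True)
    blocks = [G[i*d:(i+1)*d, i*d:(i+1)*d] == np.eye(d) for i in range(n)]
    cp.Problem(cp.Maximize(cp.trace(C @ G)), blocks).solve()
    # Factor G = U U^T (an eigendecomposition tolerates numerical rank deficiency).
    w, V = np.linalg.eigh(G.value)
    U = V * np.sqrt(np.clip(w, 0.0, None))  # scales column k of V by sqrt(w_k)
    # Random rounding: V_i = P(U_i R) with R Gaussian, i.i.d. entries N(0, 1/d).
    R = rng.normal(0.0, np.sqrt(1.0 / d), size=(d * n, d))
    return [polar_factor(U[i*d:(i+1)*d, :] @ R) for i in range(n)]
```

The objective value of a rounded solution can be read off as $\operatorname{tr}(C\,VV^T)$, where $V \in \mathbb{R}^{nd \times d}$ stacks the $V_i$, since $(VV^T)_{ij} = V_i V_j^T$.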
3  Applications
Problem (2) can describe several problems of interest. As examples, we describe below how it encodes the orthogonal Procrustes problem, Global Registration over Euclidean Transforms, and the Synchronization problem.
3.1  Orthogonal Procrustes
Given $n$ point clouds in $\mathbb{R}^d$ of $k$ points each, the orthogonal Procrustes problem [Sch66] consists of finding $n$ orthogonal transformations that best simultaneously align the point clouds. If the points are represented as the columns of matrices $A_1, \ldots, A_n$, where $A_i \in \mathbb{R}^{d \times k}$, then the orthogonal Procrustes problem consists of solving
$$\min_{O_1, \ldots, O_n \in O(d)} \sum_{i,j=1}^n \left\|O_i^T A_i - O_j^T A_j\right\|_F^2. \qquad (7)$$
Since $\left\|O_i^T A_i - O_j^T A_j\right\|_F^2 = \|A_i\|_F^2 + \|A_j\|_F^2 - 2\operatorname{tr}\left((A_i A_j^T)^T O_i O_j^T\right)$, (7) has the same solution as
$$\max_{O_1, \ldots, O_n \in O(d)} \sum_{i,j=1}^n \operatorname{tr}\left((A_i A_j^T)^T O_i O_j^T\right). \qquad (8)$$
Since $C \in \mathbb{R}^{dn \times dn}$ given by $C_{ij} = A_i A_j^T$ is positive semidefinite, problem (8) is encoded by (2), and Algorithm 3 provides a solution with an approximation ratio guaranteed (Theorem 1) to be at least $\alpha_d^2$.

As discussed above, Naor et al. [NRV13] recently proposed an approximation algorithm for a wide class of problems that includes problem (8), providing for it an approximation ratio of $\frac{1}{2\sqrt{2}}$. We show in Section 5 that our approximation ratio of $\alpha_d^2$ is larger than $\frac{1}{2\sqrt{2}}$ for all $d \geq 1$. Also, our approach is considerably simpler than the one in [NRV13] (see Section 1.1 for more details).

Nemirovski [Nem07] proposed a different semidefinite relaxation (with a variable matrix of size $d^2 n \times d^2 n$ instead of $dn \times dn$ as in Orthogonal-Cut) for the orthogonal Procrustes problem. In fact, his algorithm approximates the slightly different problem
$$\max_{O_1, \ldots, O_n \in O(d)} \sum_{i \neq j} \operatorname{tr}\left((A_i A_j^T)^T O_i O_j^T\right), \qquad (9)$$
which is an additive constant (independent of $O_1, \ldots, O_n$) smaller than (8). The best known approximation ratio for this semidefinite relaxation, due to So [So11], is $O\left(\frac{1}{\log(n+k+d)}\right)$. Although an approximation to (9) would technically be stronger than an approximation to (8), the two quantities are essentially the same provided that the point clouds are indeed perturbations of orthogonal transformations of the same original point cloud, which is the case in most applications (see [NRV13] for a more thorough discussion of the differences between formulations (8) and (9)).
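As a concrete illustration of this encoding, here is a small sketch (our own; the name is hypothetical) that packs the point clouds into the matrix $C$ of problem (2):

```python
import numpy as np

def procrustes_cost_matrix(As):
    """Build C with blocks C_ij = A_i A_j^T from a list of d x k point clouds A_i.

    Stacking A = [A_1; ...; A_n] (size dn x k) gives C = A A^T, which is
    positive semidefinite by construction, as required by problem (2).
    """
    A = np.vstack(As)
    return A @ A.T
```

The output can be fed directly to the `orthogonal_cut` sketch from Section 2.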
3.2  Global Registration over Euclidean Transforms
The problem of global registration over Euclidean rigid motions is an extension of the generalized Procrustes problem. In global registration, one is required to estimate the positions x1 , . . . , xk of k points in Rd and the unknown rigid transforms of n local coordinate systems given (perhaps noisy) measurements of the local coordinates of each point in some (though not necessarily all) of the local coordinate systems. The problem differs from Procrustes in two aspects: First, for each local coordinate system, we need to estimate not only an orthogonal transformation but also a translation in Rd ; Second, each point may appear in only a subset of the coordinate systems. Despite those differences, it is shown in [CKSC13] that global registration can also be reduced to the form (2) with a matrix C that is positive semidefinite.
3.3  Synchronization
The Synchronization problem [BSS12, Sin11] consists of estimating orthogonal transformations $O_i \in O(d)$ from (potentially noisy) pairwise ratio measurements $\rho_{ij} = O_i O_j^T$ for some pairs $(i,j)$, represented as the directed edge set $E$ of a graph. One attempts to find orthogonal matrices $O_i$ that best match the edge measurements by solving
$$\min_{O_1, \ldots, O_n \in O(d)} \sum_{(i,j)\in E} \left\|O_i - \rho_{ij} O_j\right\|_F^2. \qquad (10)$$
Similarly to the orthogonal Procrustes problem, since $\|O_i - \rho_{ij} O_j\|_F^2 = \|O_i\|_F^2 + \|\rho_{ij} O_j\|_F^2 - 2\operatorname{tr}\left(\rho_{ij}^T O_i O_j^T\right) = 2d - 2\operatorname{tr}\left(\rho_{ij}^T O_i O_j^T\right)$, the solution to (10) is the same as the solution to
$$\max_{O_1, \ldots, O_n \in O(d)} \sum_{(i,j)\in E} \operatorname{tr}\left(\rho_{ij}^T O_i O_j^T\right). \qquad (11)$$
If $C \in \mathbb{R}^{dn \times dn}$, whose $(i,j)$-th block is given by $C_{ij} = \rho_{ij}$ if $(i,j) \in E$ and $C_{ij} = 0$ otherwise, is positive semidefinite², then (11) is of the form of (2), and Theorem 1 guarantees that Algorithm 3 gives an approximation ratio of $\alpha_d^2$ for problem (11). An approximation ratio of $\pi/4$ is known [SZY07] for the case where the transformations are in $SO(2)$ and the matrix is positive semidefinite. When the noise in the pairwise measurements is stochastic, both the semidefinite relaxation corresponding to Orthogonal-Cut and a simple spectral relaxation [BSS12] are known to perform well [DJ13, Sin11, WS12].
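The sketch below (ours, with hypothetical names) builds the matrix $C$ from edge measurements and applies the diagonal shift of footnote 2 to enforce positive semidefiniteness:

```python
import numpy as np

def synchronization_cost_matrix(rhos, n, d):
    """Build C with C_ij = rho_ij on measured edges (i, j) and zero elsewhere.

    rhos maps an edge (i, j) to its d x d measurement rho_ij. The diagonal is
    shifted just enough to make C positive semidefinite; this adds a constant
    (independent of the O_i) to the objective of (11), leaving the maximizer unchanged.
    """
    C = np.zeros((n * d, n * d))
    for (i, j), rho in rhos.items():
        C[i*d:(i+1)*d, j*d:(j+1)*d] = rho
        C[j*d:(j+1)*d, i*d:(i+1)*d] = rho.T  # rho_ji = rho_ij^T
    lam_min = np.linalg.eigvalsh(C).min()
    if lam_min < 0:
        C -= lam_min * np.eye(n * d)
    return C
```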
4  Proof of the Main Result
In this section we prove our main result, Theorem 1. As (3) is a relaxation of problem (2), its maximum is necessarily at least as large as that of (2). This means that Theorem 1 is a direct consequence of the following theorem.
² One can always make $C$ positive semidefinite by making $\rho_{ii}$ as large as needed, which will not affect the solution of (11).
Theorem 4. Let $C \succeq 0$. Let $U_1, \ldots, U_n$ be a feasible solution to (3). Let $V_1, \ldots, V_n \in O(d)$ be the output of the (random) rounding procedure described in Algorithm 3. Then
$$\mathbb{E}\left[\sum_{i=1}^n \sum_{j=1}^n \operatorname{tr}\left(C_{ij}^T V_i V_j^T\right)\right] \geq \alpha_d^2 \sum_{i=1}^n \sum_{j=1}^n \operatorname{tr}\left(C_{ij}^T U_i U_j^T\right),$$
where $\alpha_d$ is the constant in Definition 2.

Proof. Let $R \in \mathbb{R}^{nd \times d}$ be a Gaussian random matrix with i.i.d. entries $\mathcal{N}(0, \frac{1}{d})$. We want to lower bound
$$\mathbb{E}\left[\sum_{i=1}^n \sum_{j=1}^n \operatorname{tr}\left(C_{ij}^T V_i V_j^T\right)\right] = \mathbb{E}\left[\sum_{i=1}^n \sum_{j=1}^n \operatorname{tr}\left(C_{ij}^T \mathcal{P}(U_i R)\mathcal{P}(U_j R)^T\right)\right].$$
One of the main ingredients of the proof is Lemma 8, which states that, for any $B \in \mathbb{R}^{d \times d}$ and $M, N \in \mathbb{R}^{d \times dn}$ such that $MM^T = NN^T = I_{d \times d}$, we have
$$\mathbb{E}\left[\operatorname{tr}\left(B\mathcal{P}(MR)(NR)^T\right)\right] = \alpha_d \operatorname{tr}\left(BMN^T\right).$$
Define $S \in \mathbb{R}^{dn \times dn}$ such that its $(i,j)$-th block is given by
$$S_{ij} = \left(U_i R - \alpha_d^{-1}\mathcal{P}(U_i R)\right)\left(U_j R - \alpha_d^{-1}\mathcal{P}(U_j R)\right)^T.$$
We have
$$\begin{aligned}
\mathbb{E}S_{ij} &= \mathbb{E}\left[U_i R (U_j R)^T - \alpha_d^{-1}\mathcal{P}(U_i R)(U_j R)^T - \alpha_d^{-1} U_i R\,\mathcal{P}(U_j R)^T + \alpha_d^{-2}\mathcal{P}(U_i R)\mathcal{P}(U_j R)^T\right] \\
&= U_i \mathbb{E}\left[RR^T\right] U_j^T - \alpha_d^{-1}\mathbb{E}\left[U_i R\,\mathcal{P}(U_j R)^T\right] - \alpha_d^{-1}\mathbb{E}\left[\mathcal{P}(U_i R)(U_j R)^T\right] + \alpha_d^{-2}\mathbb{E}\left[V_i V_j^T\right] \\
&= U_i U_j^T - 2U_i U_j^T + \alpha_d^{-2}\mathbb{E}\left[V_i V_j^T\right] \\
&= \alpha_d^{-2}\mathbb{E}\left[V_i V_j^T\right] - U_i U_j^T.
\end{aligned}$$
By construction $S \succeq 0$. Since $C \succeq 0$, $\operatorname{tr}(CS) \geq 0$, which means
$$0 \leq \mathbb{E}\left[\operatorname{tr}(CS)\right] = \operatorname{tr}\left(C\,\mathbb{E}[S]\right) = \sum_{i=1}^n \sum_{j=1}^n \operatorname{tr}\left(C_{ij}^T\left(\alpha_d^{-2}\mathbb{E}\left[V_i V_j^T\right] - U_i U_j^T\right)\right).$$
Thus,
$$\mathbb{E}\left[\sum_{i=1}^n \sum_{j=1}^n \operatorname{tr}\left(C_{ij}^T V_i V_j^T\right)\right] \geq \alpha_d^2 \sum_{i=1}^n \sum_{j=1}^n \operatorname{tr}\left(C_{ij}^T U_i U_j^T\right).$$
5  The approximation ratio $\alpha_d^2$
Theorem 1 states that the approximation ratio of Algorithm 3 is given by $\alpha_d^2$, where $\alpha_d$ is the expected average singular value of a $d \times d$ Gaussian matrix $S$ with i.i.d. $\mathcal{N}(0, \frac{1}{d})$ entries. These singular values correspond to the square roots of the eigenvalues of a Wishart matrix $W = SS^T$, which are well-studied objects (see [She01]). In fact, their joint probability distribution is known to be given by
$$p_d(\lambda_1, \ldots, \lambda_d) = C_d\, e^{-\sum_{i=1}^d \lambda_i/2} \prod_{i=1}^d \lambda_i^{-1/2} \prod_{i<j} |\lambda_i - \lambda_j|,$$
where $C_d$ is a normalization constant (so that the probability integrates to 1). Since $\alpha_d = \mathbb{E}\left[d^{-1}\sum_{i=1}^d \lambda_i^{1/2}\right]$, we can write
$$\alpha_d = \int_{\mathbb{R}_+^d} d^{-1}\left(\sum_{i=1}^d \lambda_i^{1/2}\right) p_d(\lambda_1, \ldots, \lambda_d)\, d\lambda_1 \cdots d\lambda_d. \qquad (12)$$
For $d = 1$, the singular value is simply the absolute value of a standard Gaussian random variable. Thus
$$\alpha_1 = \int_{-\infty}^{\infty} |x|\, \frac{e^{-x^2/2}}{\sqrt{2\pi}}\, dx = 2\int_0^{\infty} \frac{x e^{-x^2/2}}{\sqrt{2\pi}}\, dx = \sqrt{\frac{2}{\pi}}.$$
This means that, for $d = 1$, we obtain an approximation ratio of $\alpha_1^2 = \frac{2}{\pi} \approx 0.63662$, which matches the result of Nesterov [Nes98].

It is also easy to evaluate the limit $\lim_{d\to\infty} \alpha_d$. In fact, the distribution of the $d$ eigenvalues of the Wishart matrix we are interested in converges in probability, as $d \to \infty$, to the Marchenko-Pastur distribution [She01] with density
$$\mathrm{mp}(x) = \frac{1}{2\pi x}\sqrt{x(4-x)}\,\chi_{[0,4]}.$$
This means that the average of the singular values (which corresponds to the average of the square roots of the eigenvalues of the Wishart matrix) converges to
$$\int_0^4 \sqrt{x}\, \frac{1}{2\pi x}\sqrt{x(4-x)}\, dx = \frac{8}{3\pi},$$
which implies that
$$\lim_{d\to\infty} \alpha_d = \frac{8}{3\pi}.$$
For any $d$, the integral (12) can be written analytically in terms of integrals involving Laguerre polynomials (we omit the details and formulas due to their length, but we direct the reader to [GL11] for the tools to obtain computable expressions for integrals of the form (12)). These integrals can then be computed, for each $d$ separately, using Mathematica. In fact, we computed the values of $\alpha_d$ for all $d \leq 44$ (see the table below and Figure 1).
d    $\alpha_d$                               $\alpha_d \approx$    $\alpha_d^2 \approx$
1    $\sqrt{2/\pi}$                           0.7979                0.6366
2    $(2\sqrt{2}-1)\sqrt{\pi}/4$              0.8102                0.6564
3    $(4\sqrt{2}+3\pi)/(6\sqrt{3\pi})$        0.8188                0.6704
∞    $8/(3\pi)$                               0.8488                0.7205
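The table values (and the limit $8/(3\pi) \approx 0.8488$) can also be reproduced numerically by straightforward Monte Carlo estimation of Definition 2; the sketch below is our own, not the authors' Mathematica computation.

```python
import numpy as np

def alpha_d(d, trials=200_000, rng=None):
    """Monte Carlo estimate of alpha_d = E[(1/d) * sum of singular values of S]."""
    rng = np.random.default_rng(0) if rng is None else rng
    total = 0.0
    for _ in range(trials):
        S = rng.normal(0.0, np.sqrt(1.0 / d), size=(d, d))  # i.i.d. N(0, 1/d) entries
        total += np.linalg.svd(S, compute_uv=False).sum() / d
    return total / trials

# alpha_d(1) ~ 0.7979, alpha_d(2) ~ 0.8102, alpha_d(3) ~ 0.8188
```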
Recall that the approximation ratio obtained in the context of the non-commutative Grothendieck inequality [NRV13] is $\frac{1}{2\sqrt{2}} \approx 0.3536$, which is smaller than any of the computed values of $\alpha_d^2$. Our calculations suggest the monotonicity of $\alpha_d$; however, we were not able to provide a proof of this fact and leave it as a conjecture.
Conjecture 5. Let $\alpha_d$ be the expected average singular value of a $d \times d$ matrix with random Gaussian $\mathcal{N}(0, \frac{1}{d})$ i.i.d. entries (see Definition 2). Then, for any $d \geq 1$, $\alpha_{d+1} \geq \alpha_d$.

Motivated by comparing our approximation ratio $\alpha_d^2$ with $\frac{1}{2\sqrt{2}}$ (the one obtained in [NRV13]), we show a lower bound for $\alpha_d$.
Lemma 6. For $d \geq 1$, let $\alpha_d$ be defined as in Definition 2, and let $\chi_k$ be the $\chi$ distribution with $k$ degrees of freedom (defined to be the square root of the $\chi_k^2$ distribution). Then
$$\alpha_d \geq \frac{1}{d^{3/2}}\sum_{k=1}^d \mathbb{E}\chi_k.$$
Proof. Let $S$ be a $d \times d$ Gaussian matrix with i.i.d. entries $\mathcal{N}(0, \frac{1}{d})$. Let $S = U\Sigma V^T$ be the SVD of $S$; it is easy to verify that $\max_{O \in O(d)} \operatorname{tr}(O\Sigma) = \operatorname{tr}(\Sigma)$. This means that
$$\max_{O \in O(d)} \operatorname{tr}(OS) = \max_{O \in O(d)} \operatorname{tr}(OU\Sigma V^T) = \operatorname{tr}(\Sigma). \qquad (13)$$
Let $S_1, \ldots, S_d$ denote the columns of $S$ and $S_k'$ denote the projection of the $k$-th column onto the orthogonal complement of $S_1, \ldots, S_{k-1}$. Consider the orthogonal transformation $O$ that sends $S_k'/\|S_k'\|$ to the $k$-th element of the canonical basis. It is easy to see that
$$\operatorname{tr}(OS) = \sum_{k=1}^d \|S_k'\|,$$
which together with (13) gives $\sum_{k=1}^d \|S_k'\| \leq \operatorname{tr}(\Sigma)$. Taking expectations on both sides then gives
$$\sum_{k=1}^d \mathbb{E}\left\|\sqrt{d}\,S_k'\right\| \leq d^{3/2}\alpha_d.$$
Since $\sqrt{d}\,S_k$ are Gaussian vectors in $\mathbb{R}^d$ with i.i.d. entries $\mathcal{N}(0,1)$, and $S_k'$ is a projection onto a subspace chosen independently of $S_k$ (of dimension $d-k+1$, the dimension of the orthogonal complement of $S_1, \ldots, S_{k-1}$), we have $\|\sqrt{d}\,S_k'\| \sim \chi_{d-k+1}$. Summing over $k$ and reindexing implies
$$\sum_{k=1}^d \mathbb{E}\chi_k \leq d^{3/2}\alpha_d.$$
Lemmas 6 and 11 imply that, for all $d > 40$, $\alpha_d^2 \geq \frac{1}{2\sqrt{2}}$, which together with computer verification for all $d \leq 40$ (see Figure 1) gives
$$\alpha_d^2 > \frac{1}{2\sqrt{2}}.$$
This confirms that the Orthogonal-Cut approximation ratio is larger than the one in [NRV13] for all $d \geq 1$.
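A quick computational check of this claim (our own, not from the paper) combines the Lemma 6 lower bound with the exact formula $\mathbb{E}\chi_k = \sqrt{2}\,\Gamma\left(\frac{k+1}{2}\right)/\Gamma\left(\frac{k}{2}\right)$:

```python
import math

def chi_mean(k):
    """E[chi_k] = sqrt(2) * Gamma((k+1)/2) / Gamma(k/2)."""
    return math.sqrt(2) * math.gamma((k + 1) / 2) / math.gamma(k / 2)

# Lemma 6: alpha_d >= d^{-3/2} * sum_{k=1}^d E[chi_k]; checking that this lower
# bound exceeds (1/(2*sqrt(2)))^{1/2} certifies alpha_d^2 > 1/(2*sqrt(2)).
threshold = math.sqrt(1 / (2 * math.sqrt(2)))
for d in range(1, 101):
    bound = sum(chi_mean(k) for k in range(1, d + 1)) / d**1.5
    assert bound > threshold
```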
Figure 1: Plot showing the computed values of $\alpha_d^2$ for $d \leq 44$, the limit of $\alpha_d^2$ as $d \to \infty$, the lower bound for $\alpha_d^2$ given by Lemma 11, and the approximation ratio of $\frac{1}{2\sqrt{2}}$ obtained in [NRV13].
6  Open Problems and Future Work
Besides Conjecture 5, there are several extensions of this work that the authors consider to be interesting directions for future work.

A natural extension is to consider the little Grothendieck problem (2) over other groups of matrices. One example is the group $U(d)$ of unitary matrices in $\mathbb{C}^{d \times d}$. For a Hermitian positive semidefinite matrix $C$, it can be formulated as
$$\max_{U_1, \ldots, U_n \in U(d)} \sum_{i=1}^n \sum_{j=1}^n \operatorname{tr}\left(C_{ij}^* U_i U_j^*\right).$$
Since $U(1)$ is isomorphic to $SO(2)$, the case $d = 1$ corresponds to PhaseCut (see [WdM12]), for which a simple relaxation is shown to have an approximation ratio of $\pi/4$ [SZY07] (the unitary group case is also considered in [NRV13]). Another interesting extension would be to consider the special orthogonal group $SO(d)$ and the special unitary group $SU(d)$; these seem more difficult since they are not described by quadratic constraints.³

In some applications, like Synchronization, the positive semidefiniteness condition is not natural. It would be useful to better understand approximation algorithms for a version of (2) where $C$ is not assumed to be positive semidefinite. Previous work in the special case $d = 1$ [NRT99, CW04, AMMN05] and for the $SO(2)$ case [SZY07] suggests that it is possible to obtain an approximation ratio for (2) depending logarithmically on the size of the problem. It would also be interesting to understand whether the techniques in [AN04] can be adapted to obtain an approximation algorithm for the bipartite Grothendieck problem over the orthogonal group; this would be closer in spirit to the non-commutative Grothendieck inequality [NRV13].

The common-lines problem arising in molecular structure determination from cryo-electron microscopy [SS11] has a very similar formulation to (2), with the distinction that the variables $O_i$ are $2 \times 3$ matrices satisfying $O_i O_i^T = I_{2\times 2}$. This problem requires a different rounding of the solution of the SDP, into a rank-3 rather than a rank-2 matrix. In the context of the classical Grothendieck inequality, there has been work on how this can be exploited in order to improve the approximation ratio [BFV12].

³ The additional constraint that forces a matrix to be in the special orthogonal or unitary group is having determinant equal to 1, which is not quadratic.
Acknowledgments A. S. Bandeira thanks Moses Charikar, Alexander Iriza, Dustin Mixon, and Zhizhen Zhao for insightful discussions on the topic of this paper. A. S. Bandeira was supported by AFOSR Grant No. FA9550-12-1-0317. A. Singer was partially supported by Award Number FA9550-12-1-0317 and FA9550-13-1-0076 from AFOSR, by Award Number R01GM090200 from the NIGMS, and by Award Number LTR DTD 06-05-2012 from the Simons Foundation. Parts of this work have appeared in C. Kennedy’s senior thesis at Princeton University.
References

[AMMN05] N. Alon, K. Makarychev, Y. Makarychev, and A. Naor. Quadratic forms on graphs. Invent. Math., 163:486–493, 2005.

[AN04] N. Alon and A. Naor. Approximating the cut-norm via Grothendieck's inequality. In Proc. of the 36th ACM STOC, pages 72–80. ACM Press, 2004.

[AS64] M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover, New York, 1964.

[BFV12] J. Briet, F. M. O. Filho, and F. Vallentin. Grothendieck inequalities for semidefinite programs with rank constraint. Available online at arXiv:1011.1754v2 [math.OC], 2012.

[BSS12] A. S. Bandeira, A. Singer, and D. Spielman. A Cheeger inequality for the graph connection Laplacian. Available online at arXiv:1204.3873 [math.SP], 2012.

[CKSC13] K. N. Chaudhury, Y. Khoo, A. Singer, and D. Cowburn. Global registration of multiple point clouds using semidefinite programming. arXiv:1306.5226 [cs.CV], 2013.

[CW04] M. Charikar and A. Wirth. Maximizing quadratic programs: Extending Grothendieck's inequality. In Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science, FOCS '04, pages 54–60, Washington, DC, USA, 2004. IEEE Computer Society.

[DJ13] L. Demanet and V. Jugnon. Convex recovery from interferometric measurements. Available online at arXiv:1307.6864 [math.NA], 2013.

[FH55] K. Fan and A. J. Hoffman. Some metric inequalities in the space of matrices. Proceedings of the American Mathematical Society, 6(1):111–116, 1955.

[GL11] G. Livan and P. Vivo. Moments of Wishart-Laguerre and Jacobi ensembles of random matrices: application to the quantum transport problem in chaotic cavities. Acta Physica Polonica B, 42:1081, 2011.

[GW95] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the Association for Computing Machinery, 42:1115–1145, 1995.

[Hig86] N. J. Higham. Computing the polar decomposition – with applications. SIAM J. Sci. Stat. Comput., 7:1160–1174, October 1986.

[Kel75] J. B. Keller. Closest unitary, orthogonal and Hermitian operators to a given operator. Mathematics Magazine, 48(4):192–197, 1975.

[Nem07] A. Nemirovski. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109(2-3):283–317, 2007.

[Nes98] Y. Nesterov. Semidefinite relaxation and nonconvex quadratic optimization. Optimization Methods and Software, 9(1-3):141–160, 1998.

[NRT99] A. Nemirovski, C. Roos, and T. Terlaky. On maximization of quadratic form over intersection of ellipsoids with common center. Mathematical Programming, 86(3):463–473, 1999.

[NRV13] A. Naor, O. Regev, and T. Vidick. Efficient rounding for the noncommutative Grothendieck inequality. In Proceedings of the 45th Annual ACM Symposium on Theory of Computing, STOC '13, pages 71–80, New York, NY, USA, 2013. ACM.

[Rob55] H. Robbins. A remark on Stirling's formula. The American Mathematical Monthly, 62(1):26–29, 1955.

[Sch66] P. H. Schonemann. A generalized solution of the orthogonal Procrustes problem. Psychometrika, 31(1):1–10, 1966.

[She01] J. Shen. On the singular values of Gaussian random matrices. Linear Algebra and its Applications, 326(1-3):1–14, 2001.

[Sin11] A. Singer. Angular synchronization by eigenvectors and semidefinite programming. Appl. Comput. Harmon. Anal., 30(1):20–36, 2011.

[So11] A.-C. So. Moment inequalities for sums of random matrices and their applications in optimization. Mathematical Programming, 130(1):125–151, 2011.

[SS11] A. Singer and Y. Shkolnisky. Three-dimensional structure determination from common lines in cryo-EM by eigenvectors and semidefinite programming. SIAM J. Imaging Sciences, 4(2):543–572, 2011.

[SZY07] A. So, J. Zhang, and Y. Ye. On approximating complex quadratic optimization problems via semidefinite programming relaxations. Math. Program. Ser. B, 2007.

[TV04] A. M. Tulino and S. Verdú. Random matrix theory and wireless communications. Commun. Inf. Theory, 1(1):1–182, June 2004.

[VB96] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Review, 38:49–95, 1996.

[WdM12] I. Waldspurger, A. d'Aspremont, and S. Mallat. Phase recovery, MaxCut and complex semidefinite programming. arXiv:1206.0102, 2012.

[WGS12] Z. Wen, D. Goldfarb, and K. Scheinberg. Block coordinate descent methods for semidefinite programming. In Handbook on Semidefinite, Conic and Polynomial Optimization, volume 166 of International Series in Operations Research & Management Science, pages 533–564. Springer US, 2012.

[WS12] L. Wang and A. Singer. Exact and stable recovery of rotations for robust synchronization. Information and Inference: A Journal of the IMA, accepted for publication. Available online at arXiv:1211.2441v4 [cs.IT], 2012.
A  Technical proofs
Lemma 7. Let $S$ be a $d \times d$ random matrix with i.i.d. $\mathcal{N}(0, \frac{1}{d})$ entries and let $\alpha_d$ be as defined in Definition 2. Then
$$\mathbb{E}\left[\mathcal{P}(S)S^T\right] = \alpha_d I_{d\times d}.$$
Proof. Let $S = U\Sigma V^T$ be the singular value decomposition. Since $SS^T = U\Sigma^2 U^T$ is a Wishart matrix, its eigenvalues and eigenvectors are independent and $U$ is distributed according to the Haar measure on $O(d)$ [TV04]. To resolve ambiguities, we consider $\Sigma$ ordered such that $\Sigma_{11} \geq \Sigma_{22} \geq \cdots \geq \Sigma_{dd}$ and the columns of $U$ to have random sign. Let $Y = \mathcal{P}(S)S^T$. Since $\mathcal{P}(S) = \mathcal{P}(U\Sigma V^T) = UV^T$, we have
$$Y = \mathcal{P}(U\Sigma V^T)(U\Sigma V^T)^T = UV^T V\Sigma U^T = U\Sigma U^T.$$
Since $Y_{ij} = u_i \Sigma u_j^T$, where $u_1, \ldots, u_d$ are the rows of $U$, and $U$ is distributed according to the Haar measure, we have $u_j \sim -u_j$ conditioned on any $u_i$ and $\Sigma$. This implies that, if $i \neq j$, $Y_{ij} = u_i \Sigma u_j^T$ is a symmetric random variable, and so $\mathbb{E}Y_{ij} = 0$. Also, $u_i \sim u_j$ implies that $Y_{ii} \sim Y_{jj}$. This means that $\mathbb{E}Y = cI_{d\times d}$ for some constant $c$. To obtain $c$,
$$c = \frac{1}{d}c\operatorname{tr}(I_{d\times d}) = \frac{1}{d}\mathbb{E}\operatorname{tr}(Y) = \frac{1}{d}\mathbb{E}\operatorname{tr}(U\Sigma U^T) = \frac{1}{d}\mathbb{E}\operatorname{tr}(U^T U\Sigma) = \frac{1}{d}\mathbb{E}\operatorname{tr}(\Sigma) = \alpha_d,$$
which shows the lemma.
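As a sanity check (ours, not from the paper), Lemma 7 is easy to verify numerically; the SVD-based projection below is the map $\mathcal{P}$ from Section 2.

```python
import numpy as np

def check_lemma_7(d=3, trials=100_000, rng=None):
    """Average P(S) S^T over Gaussian draws; the result should approach alpha_d * I."""
    rng = np.random.default_rng(0) if rng is None else rng
    acc = np.zeros((d, d))
    for _ in range(trials):
        S = rng.normal(0.0, np.sqrt(1.0 / d), size=(d, d))
        U, _, Vt = np.linalg.svd(S)
        acc += (U @ Vt) @ S.T  # P(S) S^T with P(S) = U V^T
    return acc / trials  # ~ alpha_d * I, e.g. ~ 0.8188 * I for d = 3
```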
Lemma 8. Let $C \in \mathbb{R}^{d\times d}$ and $M, N \in \mathbb{R}^{d\times nd}$ such that $MM^T = NN^T = I_{d\times d}$. Let $R \in \mathbb{R}^{nd\times d}$ be a Gaussian random matrix with i.i.d. entries $\mathcal{N}(0, \frac{1}{d})$. Then
$$\mathbb{E}\operatorname{tr}\left(C\mathcal{P}(MR)(NR)^T\right) = \alpha_d\operatorname{tr}\left(CMN^T\right),$$
where $\alpha_d$ is the constant in Definition 2.

Proof. Let $A = \left[M^T\; N^T\right] \in \mathbb{R}^{dn\times 2d}$ and let $A = QB$ be the QR decomposition of $A$, with $Q \in \mathbb{R}^{nd\times nd}$ an orthogonal matrix and $B \in \mathbb{R}^{nd\times 2d}$ upper triangular with non-negative diagonal entries; note that only the first $2d$ rows of $B$ are nonzero. We can write
$$Q^T A = B = \begin{bmatrix} B_{11} & B_{12} \\ 0_d & B_{22} \\ 0_d & 0_d \\ \vdots & \vdots \\ 0_d & 0_d \end{bmatrix} \in \mathbb{R}^{dn\times 2d},$$
where $B_{11} \in \mathbb{R}^{d\times d}$ and $B_{22} \in \mathbb{R}^{d\times d}$ are upper triangular matrices with non-negative diagonal entries. Since
$$(Q^T M^T)_{11}^T (Q^T M^T)_{11} = (Q^T M^T)^T(Q^T M^T) = MQQ^T M^T = M I_{nd\times nd} M^T = MM^T = I_{d\times d},$$
$B_{11} = (Q^T M^T)_{11}$ is an orthogonal matrix, which together with the non-negativity of the diagonal entries (and the fact that $B_{11}$ is upper triangular) forces $B_{11} = I_{d\times d}$.

Since $R$ has i.i.d. Gaussian entries and $Q$ is an orthogonal matrix, $QR \sim R$, which implies
$$\mathbb{E}\operatorname{tr}\left(C\mathcal{P}(MR)(NR)^T\right) = \mathbb{E}\operatorname{tr}\left(C\mathcal{P}(MQR)(NQR)^T\right).$$
Since $MQ = [B_{11}^T, 0_{d\times d}, \cdots, 0_{d\times d}] = [I_{d\times d}, 0_{d\times d}, \cdots, 0_{d\times d}]$ and $NQ = [B_{12}^T, B_{22}^T, 0_{d\times d}, \cdots, 0_{d\times d}]$,
$$\mathbb{E}\operatorname{tr}\left(C\mathcal{P}(MR)(NR)^T\right) = \operatorname{tr}\left(C\,\mathbb{E}\left[\mathcal{P}(R_1)\left(B_{12}^T R_1 + B_{22}^T R_2\right)^T\right]\right),$$
where $R_1$ and $R_2$ are the first two $d \times d$ blocks of $R$. Since these blocks are independent, the second term vanishes and we have
$$\mathbb{E}\operatorname{tr}\left(C\mathcal{P}(MR)(NR)^T\right) = \operatorname{tr}\left(C\,\mathbb{E}\left[\mathcal{P}(R_1)R_1^T\right]B_{12}\right).$$
The lemma now follows from using Lemma 7 to obtain $\mathbb{E}\left[\mathcal{P}(R_1)R_1^T\right] = \alpha_d I_{d\times d}$ and noting that $B_{12} = (Q^T M^T)^T(Q^T N^T) = MN^T$.

Lemma 9. Let $d \geq 2$ and let $\chi_k$ be the $\chi$ distribution with $k$ degrees of freedom. Then
$$\frac{1}{d^{3/2}}\sum_{k=1}^d \mathbb{E}\chi_k \geq \frac{2}{3}\left(1 - \frac{3}{d}\right)^{3/2}.$$
Before proving Lemma 9 we need the following proposition.

Proposition 10. For every $j \geq 0$,
$$\mathbb{E}\chi_{2j+1} \geq \sqrt{2j},$$
where $\chi_k$ is the $\chi$ distribution with $k$ degrees of freedom.

Proof. The expected value of a $\chi$ distribution is given in terms of the Gamma function as
$$\mathbb{E}\chi_{2j+1} = \sqrt{2}\,\frac{\Gamma\left(\frac{(2j+1)+1}{2}\right)}{\Gamma\left(\frac{2j+1}{2}\right)} = \sqrt{2}\,\frac{\Gamma(j+1)}{\Gamma\left(j+\frac{1}{2}\right)}.$$
Since $\Gamma\left(j+\frac{1}{2}\right) = \frac{(2j)!}{4^j j!}\sqrt{\pi}$ (see [AS64]), we have
$$\mathbb{E}\chi_{2j+1} = \sqrt{2}\,\frac{j!\,4^j j!}{(2j)!\sqrt{\pi}} = \sqrt{\frac{2}{\pi}}\,\frac{(j!)^2\,4^j}{(2j)!}. \qquad (14)$$
Recall Stirling's formula [Rob55], which states that, for an integer $n$,
$$n! = \sqrt{2\pi}\,n^{n+\frac{1}{2}}e^{-n}e^{r_n},$$
where $\frac{1}{12n+1} \leq r_n \leq \frac{1}{12n}$. Plugging this into (14), we get
$$\mathbb{E}\chi_{2j+1} = \sqrt{\frac{2}{\pi}}\,\frac{\left(\sqrt{2\pi}\,j^{j+\frac{1}{2}}e^{-j}e^{r_j}\right)^2 4^j}{\sqrt{2\pi}\,(2j)^{2j+\frac{1}{2}}e^{-2j}e^{r_{2j}}} = \sqrt{\frac{2}{\pi}}\,\frac{\sqrt{2\pi}\,j^{\frac{1}{2}}\,4^j}{2^{2j+\frac{1}{2}}}\,e^{2r_j - r_{2j}} = \sqrt{2j}\,e^{2r_j - r_{2j}}.$$
Since $2r_j - r_{2j} \geq \frac{2}{12j+1} - \frac{1}{24j} > 0$, we get $\mathbb{E}\chi_{2j+1} \geq \sqrt{2j}$.
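The closed form (14) also makes Proposition 10 easy to spot-check numerically (a check of our own):

```python
import math

# E[chi_{2j+1}] = sqrt(2/pi) * 4^j * (j!)^2 / (2j)!  (equation (14));
# Proposition 10 asserts this is at least sqrt(2j).
for j in range(8):
    mean = math.sqrt(2 / math.pi) * 4**j * math.factorial(j)**2 / math.factorial(2 * j)
    assert mean >= math.sqrt(2 * j)
```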
Proof (of Lemma 9). For $k \geq 1$, let
$$\beta_k = \sum_{i=1}^{2k} \mathbb{E}\chi_i.$$
We have
$$\beta_k = \sum_{j=0}^{k-1}\left(\mathbb{E}\chi_{2j+1} + \mathbb{E}\chi_{2j+2}\right) \geq 2\sum_{j=0}^{k-1}\mathbb{E}\chi_{2j+1} \geq 2\sqrt{2}\sum_{j=0}^{k-1}\sqrt{j}.$$
Since $\sqrt{x}$ is an increasing function,
$$\sum_{j=0}^{k-1}\sqrt{j} = \sum_{j=1}^{k-1}\sqrt{j} \geq \int_0^{k-1}\sqrt{t}\,dt = \frac{2}{3}(k-1)^{3/2}.$$
If $d = 2k+1$,
$$\frac{1}{d^{3/2}}\sum_{i=1}^d \mathbb{E}\chi_i \geq \frac{1}{(2k+1)^{3/2}}\sum_{i=1}^{2k}\mathbb{E}\chi_i \geq \frac{2\sqrt{2}}{(2k+1)^{3/2}}\cdot\frac{2}{3}(k-1)^{3/2} = \frac{2}{3}\left(\frac{d-3}{d}\right)^{3/2},$$
and if $d = 2k$,
$$\frac{1}{d^{3/2}}\sum_{i=1}^d \mathbb{E}\chi_i = \frac{\beta_k}{d^{3/2}} \geq \frac{2\sqrt{2}}{(2k+1)^{3/2}}\cdot\frac{2}{3}(k-1)^{3/2} = \frac{2}{3}\left(\frac{d-2}{d+1}\right)^{3/2} \geq \frac{2}{3}\left(1-\frac{3}{d}\right)^{3/2}.$$
This means that, for any $d \geq 2$,
$$\frac{1}{d^{3/2}}\sum_{i=1}^d \mathbb{E}\chi_i \geq \frac{2}{3}\left(1-\frac{3}{d}\right)^{3/2}.$$
Lemma 9, together with the fact that for $d > 40$ we have
$$\left(\frac{2}{3}\left(1-\frac{3}{d}\right)^{3/2}\right)^2 \geq \frac{1}{2\sqrt{2}},$$
gives:

Lemma 11. Let $d > 40$ and let $\chi_k$ be the $\chi$ distribution with $k$ degrees of freedom. Then
$$\frac{1}{d^{3/2}}\sum_{k=1}^d \mathbb{E}\chi_k \geq \sqrt{\frac{1}{2\sqrt{2}}}.$$