Invertibility of symmetric random matrices
Roman Vershynin, University of Michigan
Workshop on Random Matrices, Bonn, May 30, 2012
The Invertibility Problem for random matrices

For an n × n random matrix H with a given distribution:

1. What is the singularity probability P{H is singular}?
2. What is the typical value of the spectral norm of the inverse, ‖H⁻¹‖?

Part 2 is equivalent to estimating the smallest singular value s_min(H) = 1/‖H⁻¹‖. Here s_min(H) is the largest number s such that
    ‖Hx‖_2 ≥ s · ‖x‖_2   for all x.
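A minimal numerical sketch of the identity s_min(H) = 1/‖H⁻¹‖, assuming Gaussian entries purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
H = rng.standard_normal((n, n))                    # illustrative random matrix

s_min = np.linalg.svd(H, compute_uv=False).min()   # smallest singular value of H
inv_norm = np.linalg.norm(np.linalg.inv(H), 2)     # spectral norm of H^{-1}

print(s_min, 1.0 / inv_norm)                       # the two values agree up to rounding
```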
The Invertibility Problem for Random Matrices
The invertibility problem has been studied for several distributions of H:

- general Ginibre ensembles: all entries of H are iid, zero mean
- general Wigner ensembles: H is symmetric, above-diagonal entries are iid, zero mean (this talk)
- general Wishart ensembles: H = XXᵀ, where X is a rectangular random matrix with iid zero-mean entries
- unitary perturbations: H = D + U, where D is fixed (deterministic) and U ∈ U(n) is random, uniformly distributed (Mark Rudelson's talk)
- etc.
Invertibility of Ginibre matrices

1940–2010+: Goldstine-von Neumann, Smale, Edelman, Szarek, Komlos, Kahn-Komlos-Szemeredi, Tao-Vu, Rudelson-V, Bourgain-Vu-Wood, ...

[Rudelson-V '08]: if H has subgaussian entries, then
1. P{H is singular} ≤ c^n, where c ∈ (0, 1) is a constant;
2. ‖H⁻¹‖ ∼ √n with high probability. Equivalently, s_min(H) ∼ 1/√n.

More precisely,
    P{ s_min(H) ≤ ε/√n } ≤ Cε + c^n,   ε > 0.

Much simpler is to see that ‖H‖ ∼ √n. Then the result above complies with the heuristic: "the average gap between the n singular values is ∼ √n/n ∼ 1/√n."
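A Monte Carlo sketch of the 1/√n scaling, assuming Gaussian entries for convenience (the theorem covers general subgaussian entries):

```python
import numpy as np

rng = np.random.default_rng(1)

def median_smin(n, trials=50):
    """Median smallest singular value of an n x n iid Gaussian (Ginibre) matrix."""
    vals = [np.linalg.svd(rng.standard_normal((n, n)), compute_uv=False).min()
            for _ in range(trials)]
    return np.median(vals)

for n in (100, 200, 400):
    # sqrt(n) * s_min(H) should stay of order 1 as n grows
    print(n, np.sqrt(n) * median_smin(n))
```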
Invertibility of Wigner matrices

2006–2011: Costello-Tao-Vu, Erdős-Schlein-Yau, Tao-Vu, V., Nguyen, ...
Theorem (V '11). Let H be a symmetric random matrix whose above-diagonal entries are iid random variables with mean zero, unit variance, and subgaussian. Then for every z ∈ ℝ, the eigenvalues λ_k(H) satisfy:
    P{ min_k |λ_k(H) − z| ≤ ε/√n } ≤ C ε^{1/9} + exp(−n^c),   ε ≥ 0.
In terms of the invertibility problem, this yields:

1. P{H is singular} ≤ exp(−n^c). Previously known: ≲ n^{−1/8} for the symmetric Bernoulli matrix [Costello-Tao-Vu '10].
2. ‖H⁻¹‖ ∼ √n with high probability. Equivalently, min_k |λ_k(H)| ∼ 1/√n.
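The analogous experiment for the symmetric case; the ±1 (symmetric Bernoulli) entries below are one illustrative choice of a subgaussian distribution:

```python
import numpy as np

rng = np.random.default_rng(2)

def smallest_eig_magnitude(n):
    """min_k |lambda_k(H)| for a symmetric random sign matrix H."""
    A = rng.choice([-1.0, 1.0], size=(n, n))
    H = np.triu(A) + np.triu(A, 1).T          # symmetrize: iid above-diagonal entries
    return np.abs(np.linalg.eigvalsh(H)).min()

for n in (100, 200, 400):
    vals = [smallest_eig_magnitude(n) for _ in range(30)]
    # sqrt(n) * min_k |lambda_k(H)| should stay of order 1 as n grows
    print(n, np.sqrt(n) * np.median(vals))
```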
Invertibility of Wigner matrices

    P{ min_k |λ_k(H) − z| ≤ ε/√n } ≤ C ε^{1/9} + exp(−n^c).
Related results:

[Erdős-Schlein-Yau '10] For continuous distributions, for z in the bulk:
    P{ min_k |λ_k(H) − z| ≤ ε/√n } ≤ Cε.
[Nguyen '11] (independent, simultaneous): for every B > 0 there exists A > 0 such that
    P{ min_k |λ_k(H) − z| ≤ n^{−A} } ≤ n^{−B}.
[Tao-Vu '11]: Universality. If the first few (3 or 4) moments of the entries of H and G match, then
    P{ min_k |λ_k(H)| ≤ ε/√n } = P{ min_k |λ_k(G)| ≤ (ε ± n^{−c})/√n } ± Cn^{−c}.

Universality allows one to transfer the Erdős-Schlein-Yau result to discrete distributions, at the cost of polynomial errors O(n^{−c}) in the magnitude of λ_k and in probability.
Proof

    P{ min_k |λ_k(H) − z| ≤ ε/√n } ≤ C ε^{1/9} + exp(−n^c).

For simplicity, assume z = 0. Variational characterization:
    min_k |λ_k(H)| = inf_{x ∈ S^{n−1}} ‖Hx‖.

So we need, with high probability, a uniform lower bound
    inf_{x ∈ S^{n−1}} ‖Hx‖ ≳ 1/√n.
This is a geometric problem.
Proof. Step 1: Decomposition of the sphere

Problem: inf_{x ∈ S^{n−1}} ‖Hx‖ ≳ 1/√n ?

General architecture of the proof [Rudelson-V '08]:

1. Decompose S^{n−1} into compressible and incompressible vectors: S^{n−1} = Comp ∪ Incomp. A vector is compressible if 99% of its energy (ℓ_2 norm) is supported by 0.01n coordinates. Incompressible vectors are the rest of the sphere. Incompressible ≈ "delocalized" (see the sketch below).
2. Prove the lower bound (invertibility) for Comp and Incomp separately.
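A sketch of the compressibility test, interpreting "energy" as the squared ℓ_2 norm; the thresholds 0.99 and 0.01n are the ones quoted on the slide (the paper works with general parameters):

```python
import numpy as np

def is_compressible(x, frac_coords=0.01, frac_energy=0.99):
    """True if frac_energy of the squared l2 norm of x sits on its largest
    ceil(frac_coords * n) coordinates (the slide's 99% / 0.01n rule)."""
    x = np.asarray(x, dtype=float)
    k = max(1, int(np.ceil(frac_coords * x.size)))
    top = np.sort(x**2)[::-1][:k]                  # k largest squared coordinates
    return top.sum() >= frac_energy * (x**2).sum()

n = 1000
spike = np.zeros(n); spike[0] = 1.0                # localized vector on the sphere
flat = np.ones(n) / np.sqrt(n)                     # fully delocalized vector
print(is_compressible(spike), is_compressible(flat))   # True, False
```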
Proof. Step 2: Compressible vectors

Problem: inf_{x ∈ Comp} ‖Hx‖ ≳ 1/√n ?

Compressible vectors are simpler to control, as there are not too many of them. The metric entropy of Comp is small: there exists a δ-net of Comp of cardinality (C/δ)^{0.1n}. A union bound plus an approximation argument reduce the problem to a lower bound for a single vector x.
Proof. Step 2: Compressible vectors

Problem: inf_{x ∈ Comp} ‖Hx‖ ≳ 1/√n ?

Decompose H into n/2 × n/2 minors, and x accordingly:
    H = [ D   G ]        x = [ u ]
        [ Gᵀ  E ],           [ z ].

G has independent entries. Condition on D, E and write
    ‖Hx‖² ≥ ‖Du + Gz‖² = Σ_{i=1}^{n/2} (d_i + ⟨G_i, z⟩)²,
where G_i are the rows of G, and d_i are some fixed numbers. Thus ‖Hx‖² is a sum of n/2 independent random variables. A deviation inequality gives
    P{ ‖Hx‖² ≤ εn } ≤ (C/ε)^{n/2}.
Combining with a union bound, we get a (too) strong conclusion:
    inf_{x ∈ Comp} ‖Hx‖ ≳ √n   with probability 1 − c^n.
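A quick sanity check of the decomposition step (Gaussian entries assumed purely for the demo): since Hx = (Du + Gz, Gᵀu + Ez), dropping the second block can only decrease the norm.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
A = rng.standard_normal((n, n))
H = A + A.T                                   # a symmetric test matrix
m = n // 2
D, G = H[:m, :m], H[:m, m:]                   # top blocks of H = [[D, G], [G^T, E]]

x = rng.standard_normal(n); x /= np.linalg.norm(x)
u, z = x[:m], x[m:]

lhs = np.linalg.norm(H @ x) ** 2
rhs = np.linalg.norm(D @ u + G @ z) ** 2      # the retained block of Hx
print(lhs >= rhs, lhs, rhs)                   # True
```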
Intermission: delocalization of eigenvectors

As a by-product, we obtain a delocalization of the eigenvectors of H. Indeed, the argument above for H − λI instead of H gives
    inf_{x ∈ Comp} ‖Hx − λx‖ ≳ √n   with probability 1 − c^n.

One more ε-net argument yields uniformity over λ ∈ ℝ:
    inf_{x ∈ Comp, λ ∈ ℝ} ‖Hx − λx‖ ≳ √n   with probability 1 − c^n.

Therefore all eigenvectors of H are incompressible; they are not too close to sparse vectors. [Erdős, Schlein, Yau] proved a more difficult version of delocalization: all eigenvectors x satisfy ‖x‖_∞ / ‖x‖_2 ≤ log^C n / √n.
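A quick numerical look at delocalization; the Gaussian symmetric matrix is only an illustrative choice, and log n / √n is printed as a reference scale rather than the exact bound (C is unspecified):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
A = rng.standard_normal((n, n))
H = (A + A.T) / np.sqrt(2)              # symmetric; off-diagonal entries have unit variance

_, vecs = np.linalg.eigh(H)             # columns are unit eigenvectors
ratios = np.abs(vecs).max(axis=0)       # ||x||_inf / ||x||_2 for each eigenvector x
print("max over eigenvectors  :", ratios.max())
print("reference log(n)/sqrt(n):", np.log(n) / np.sqrt(n))
```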
Proof. Step 3: Incompressible vectors

Problem: inf_{x ∈ Incomp} ‖Hx‖ ≳ n^{−1/2} ?

Proving invertibility on incompressible vectors is more difficult: there are too many of them (no small ε-net). Alternative, geometric argument from [Rudelson-V '08]: denoting the columns of H by H_i, we have
    ‖Hx‖ ≥ dist(Hx, E_1)   where E_1 := span(H_i)_{i>1}
         = dist( Σ_{i=1}^n x_i H_i, E_1 ) = dist(x_1 H_1, E_1) = |x_1| · dist(H_1, E_1).
Condition on all columns but H_1; this fixes the subspace E_1.
Proof. Step 3: Incompressible vectors

Problem: inf_{x ∈ Incomp} ‖Hx‖ ≳ 1/√n ?

We have shown: ‖Hx‖ ≥ |x_1| · dist(H_1, E_1). The same can be done for any coordinate x_i. Since x ∈ Incomp, at least 0.1n coordinates satisfy |x_i| ≳ 1/√n. Therefore, the proof reduces to showing that
    dist(H_i, E_i) ≳ 1   with high probability.
Proof. Step 3: Incompressible vectors

We have reduced the invertibility problem to:

The Distance Problem. Estimate the distance between a random vector X and a random hyperplane E in ℝⁿ. Specifically, show that
    dist(X, E) ≳ 1   with high probability,
where X = a column of H, and E = span of the other columns.
For the Gaussian distribution (Ginibre H) the solution is trivial, since dist(X, E) is distributed as |N(0, 1)|. For general Ginibre matrices, a solution was given in [Tao-Vu], [Rudelson-V '08]. But here we have an extra difficulty: X and E are not independent.
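An illustration of the distance problem in the easy independent (Ginibre) case, with Gaussian entries assumed: the distance is the norm of the residual after projecting the first column onto the span of the others.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
H = rng.standard_normal((n, n))            # Ginibre: the columns are independent

X = H[:, 0]
E = H[:, 1:]                               # matrix whose columns span E
coeffs, *_ = np.linalg.lstsq(E, X, rcond=None)
dist = np.linalg.norm(X - E @ coeffs)      # distance from X to span(E)
print(dist)                                # distributed like |N(0,1)|: of order 1
```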
Proof. Step 4: Distance problem

The Distance Theorem. Let X = first column of a symmetric random matrix H, and E = span of the other columns. Then dist(X, E) ≳ 1 w.h.p. Precisely,
    P{ dist(X, E) ≤ ε } ≲ ε^{1/9} + exp(−n^c).

To prove this result, decompose
    H = [ h   Zᵀ ]
        [ Z   B  ],
where h is the (1,1) entry, Z is the rest of the first column, and B is the remaining (n−1) × (n−1) minor. Use linear algebra to express
    dist(X, E) = |⟨B⁻¹Z, Z⟩ − h| / √(1 + ‖B⁻¹Z‖²).
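A numerical check of this linear-algebra identity; the symmetric Gaussian matrix below is only a stand-in for H.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
A = rng.standard_normal((n, n))
H = A + A.T                                  # any symmetric matrix will do here

h, Z, B = H[0, 0], H[1:, 0], H[1:, 1:]       # H = [[h, Z^T], [Z, B]]

# Left-hand side: distance from the first column to the span of the others.
X, E = H[:, 0], H[:, 1:]
coeffs, *_ = np.linalg.lstsq(E, X, rcond=None)
lhs = np.linalg.norm(X - E @ coeffs)

# Right-hand side: |<B^{-1}Z, Z> - h| / sqrt(1 + ||B^{-1}Z||^2).
w = np.linalg.solve(B, Z)
rhs = abs(w @ Z - h) / np.sqrt(1 + w @ w)

print(lhs, rhs)                              # the two values coincide
```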
Proof. Step 5: Concentration of quadratic forms

    dist(X, E) = |⟨B⁻¹Z, Z⟩ − h| / √(1 + ‖B⁻¹Z‖²).

Here B is a symmetric random matrix (similar to H); Z is an independent random vector with iid coordinates. Thus
    E|⟨B⁻¹Z, Z⟩|² = E‖B⁻¹Z‖² = ‖B⁻¹‖²_HS.
Ignoring the 1 in the denominator, we have reduced the problem to showing that
    |⟨B⁻¹Z, Z⟩ − h| ≳ ‖B⁻¹‖_HS.
This is a problem on concentration of quadratic forms.
Proof. Step 5: Concentration of quadratic forms
Problem (Concentration of quadratic forms). Let B = symmetric random matrix, X = independent random vector with iid coordinates. Show that the distribution of the quadratic form ⟨B⁻¹X, X⟩ is spread. Specifically, show that for every u ∈ ℝ,
    P{ |⟨B⁻¹X, X⟩ − u| ≤ ε‖B⁻¹‖_HS } ≲ ε² + c^n.
We can only prove ≲ ε^{1/9} + exp(−n^c). The invertibility theorem follows from this result.
Proof. Step 6: Decoupling

Theorem (Concentration of quadratic forms). Let B = symmetric random matrix, X = independent random vector with iid coordinates. Then for every u ∈ ℝ one has
    P{ |⟨B⁻¹X, X⟩ − u| ≤ ε‖B⁻¹‖_HS } ≲ ε^{1/9} + exp(−n^c).

Proof. A decoupling argument replaces the quadratic form by the bilinear form ⟨B⁻¹Y, X⟩, where Y is an independent copy of X. Since, as we know, ‖B⁻¹‖_HS ∼ ‖B⁻¹Y‖ w.h.p., this reduces the problem to concentration of a linear form:
    P{ |⟨a, X⟩ − u| ≤ ε },   where a = B⁻¹Y / ‖B⁻¹Y‖.
Condition on B and Y. Now a becomes a fixed vector.
Proof. Step 7: Littlewood-Offord Problem

    P{ |⟨a, X⟩ − u| ≤ ε } ≤ ?

Here
    S := ⟨a, X⟩ = Σ_{i=1}^n a_i X_i,   where a = B⁻¹Y / ‖B⁻¹Y‖ is a fixed vector,
is a sum of independent random variables. We need to show that the distribution of S is spread. This is known as the Littlewood-Offord Problem.

1936–2010+: Littlewood-Offord, P. Erdős, Erdős-Moser, Komlos, Tao-Vu, Rudelson-V, ...

Littlewood-Offord type theorems: the spread of S depends on the amount of additive structure of the coefficient vector a. "The less structure in a, the more S is spread." Formalized by [Tao-Vu]; they measure structure in terms of generalized arithmetic progressions. [Rudelson-V '08] measure structure in terms of Diophantine approximation.
Proof. Step 7: Littlewood-Offord Problem

    P{ |S − u| ≤ ε } ≤ ?   where S = Σ_i a_i X_i.

The spread of S is captured by the Lévy concentration function:
    L(S, ε) = sup_{u ∈ ℝ} P{ |S − u| ≤ ε },   ε ≥ 0.

The additive structure of a is captured by the least common denominator (LCD):
    D(a) = inf{ θ > 0 : dist(θa, ℤⁿ) ≤ 10 √(log₊ θ) }.
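A small experiment relating structure in a to the spread of S = Σ a_i X_i. Random signs X_i and the particular test vectors are assumptions for the demo, and the empirical maximum below is only a proxy for L(S, ε):

```python
import numpy as np

rng = np.random.default_rng(6)

def levy_concentration(a, eps, trials=20000):
    """Empirical sup_u P{ |sum_i a_i X_i - u| <= eps } for random signs X_i."""
    X = rng.choice([-1.0, 1.0], size=(trials, a.size))
    S = X @ a
    grid = np.linspace(S.min(), S.max(), 400)
    return max(np.mean(np.abs(S - u) <= eps) for u in grid)

n = 64
structured = np.ones(n) / np.sqrt(n)                 # equal coefficients: highly structured
unstructured = rng.standard_normal(n)
unstructured /= np.linalg.norm(unstructured)         # generic direction: essentially no structure

eps = 0.01
print("structured  :", levy_concentration(structured, eps))    # ~ 1/sqrt(n): a large atom
print("unstructured:", levy_concentration(unstructured, eps))  # ~ eps: well spread
```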
Proof. Step 7: Littlewood-Offord Problem
Theorem of Littlewood-Offord type (Rudelson-V, see Friedland-Sodin). A sum of independent r.v.'s S = Σ_i a_i X_i satisfies
    L(S, ε) ≲ ε + 1/D(a),   ε ≥ 0.

If a is unstructured (D(a) ≫ 1), then S is well spread (L(S, ε) ≲ ε).

Back to our problem: we were working with
    a = B⁻¹Y / ‖B⁻¹Y‖_2,   B = symmetric random matrix,   Y = independent random vector.
We have thus reduced the problem to showing that a is unstructured. Want to show: the action of the random matrix B⁻¹ on a fixed vector Y destroys additive structure.
Proof. Step 8: Structure of the Inverse

Theorem (Structure of the Inverse). Let B be an n × n symmetric random matrix, y be a fixed vector, and
    a = B⁻¹y / ‖B⁻¹y‖_2.
Then, with high probability 1 − c^n, a is unstructured.

Conjecture: D(a) ≥ e^{cn}. What is proved: for every λ ∈ (0, 1), we have D̂(a) ≥ n^{c/λ}, where D̂(a) captures the most unstructured λn coefficients of a:
    D̂(a) = max_{|I| = λn} D(a_I).

Idea: if D̂(a) is large, then a has some unstructured part a_I, and the previous Littlewood-Offord type arguments still apply. If D̂(a) is small, then all subsets of λn coordinates are structured, i.e. a is highly structured. This improves the metric entropy estimates.
Proof. Step 9: Proof of the Structure Theorem

Theorem (Structure of the Inverse). Let B be an n × n symmetric random matrix, y be a fixed vector,
    a = B⁻¹y / ‖B⁻¹y‖_2,   D̂(a) = max_{|I| = λn} D(a_I).
Then, with high probability 1 − c^n, we have D̂(a) ≥ n^{c/λ}.

Proof. Fix a level D < n^{c/λ} and consider the level set
    S_D := {x ∈ S^{n−1} : D̂(x) ∼ D}.
Note that Ba is collinear with the fixed vector y; for simplicity, say Ba = y. We want to show that
    P{ ∃ x ∈ S_D : Bx = y } ≤ c^n.
This will be done by a covering argument.
Proof. Step 9: Proof of the Structure Theorem

Theorem (Structure of the Inverse). Let B be an n × n symmetric random matrix, y be a fixed vector,
    D̂(a) = max_{|I| = λn} D(a_I),   S_D := {x ∈ S^{n−1} : D̂(x) ∼ D}.
Then P{ ∃ x ∈ S_D : Bx = y } ≤ c^n.

Proof ctd. 1. Let us fix x ∈ S_D and estimate P{Bx = y}. Apply the decomposition argument (used for compressible vectors), but for [n] = I^c ∪ I, where I is the most unstructured set of λn coefficients of x:
    ‖Bx − y‖² ≥ ‖Du + Gz − y‖² = Σ_{i ∈ I^c} (d_i + ⟨G_i, z⟩)².
Hence Bx = y implies that d_i + ⟨G_i, z⟩ = 0 for all i ∈ I^c.
Proof. Step 9: Proof of the Structure Theorem

Theorem (Structure of the Inverse). Let B be an n × n symmetric random matrix, y be a fixed vector,
    D̂(a) = max_{|I| = λn} D(a_I),   S_D := {x ∈ S^{n−1} : D̂(x) ∼ D}.
Then P{ ∃ x ∈ S_D : Bx = y } ≤ c^n.

Proof ctd. But for each i, by the Littlewood-Offord type theorem,
    P{ d_i + ⟨G_i, z⟩ = 0 } ≲ 1/D(z) ∼ 1/D.
Hence by independence,
    P{ Bx = y } ≤ Π_{i ∈ I^c} P{ d_i + ⟨G_i, z⟩ = 0 } ≲ (1/D)^{n−λn}.
Proof. Step 9: Proof of the Structure Theorem

    D̂(a) = max_{|I| = λn} D(a_I),   S_D := {x ∈ S^{n−1} : D̂(x) ∼ D}.

Proof ctd. 2. "How many" x ∈ S_D are there? What is the metric entropy of S_D?

Start with the level set for the usual LCD, T_D := {x ∈ S^{n−1} : D(x) ∼ D}. The cardinality of a fine net of T_D is the same as the number of integer points in the ball of radius D in ℝⁿ, which is ∼ (D/√n)ⁿ.

Pass from T_D to S_D: decompose [n] into 1/λ intervals of λn coordinates each. Since D̂(x) ∼ D, all these restrictions satisfy D(x_I) ≲ D. Choose nets for each restriction, of size (D/√(λn))^{λn} (as above). Take the product ⇒ get a net of S_D of size
    ( (D/√(λn))^{λn} )^{1/λ} = (D/√(λn))ⁿ.
Proof. Step 9: Proof of the Structure Theorem
Theorem (Structure of the Inverse). Let B be an n × n symmetric random matrix, y be a fixed vector,
    D̂(a) = max_{|I| = λn} D(a_I),   S_D := {x ∈ S^{n−1} : D̂(x) ∼ D}.
Then P{ ∃ x ∈ S_D : Bx = y } ≤ c^n.

Proof ctd. 3. Take a union bound over all x ∈ S_D (actually, over a net):
    P{ ∃ x ∈ S_D : Bx = y } ≤ (D/√(λn))ⁿ · (1/D)^{n−λn} = (D^λ/√(λn))ⁿ.
This is ≤ c^n if D ≪ n^{1/(2λ)}, as we claimed.
References

Ginibre matrices:
- M. Rudelson, R. Vershynin, The Littlewood-Offord problem and invertibility of random matrices, Advances in Mathematics 218 (2008), 600–633.
- M. Rudelson, R. Vershynin, Smallest singular value of a random rectangular matrix, Communications on Pure and Applied Mathematics 62 (2009), 1707–1739.

Wigner matrices:
- R. Vershynin, Invertibility of symmetric random matrices, Random Structures and Algorithms, to appear.

Survey:
- M. Rudelson, R. Vershynin, Non-asymptotic theory of random matrices: extreme singular values, Proceedings of ICM 2010, Volume III, 1576–1602, Hindustan Book Agency, New Delhi, 2010.

Tutorial:
- R. Vershynin, Introduction to the non-asymptotic analysis of random matrices. In: Compressed Sensing: Theory and Applications, ed. Y. Eldar and G. Kutyniok, Cambridge University Press, 2012, pp. 210–268.