Invertibility of symmetric random matrices

Roman Vershynin, University of Michigan

Workshop on Random Matrices, Bonn, May 30, 2012

The Invertibility Problem for random matrices

For an n × n random matrix H with a given distribution:

1. What is the singularity probability P{H is singular}?

2. What is the typical value of the spectral norm of the inverse, kH^{-1}k?

Part 2 is equivalent to estimating the smallest singular value s_min(H) = 1/kH^{-1}k. Here s_min(H) is the smallest number s such that kHxk_2 ≥ s · kxk_2 for all x.
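A minimal numerical illustration of the identity s_min(H) = 1/kH^{-1}k, using a Gaussian matrix as a stand-in for H (any square invertible matrix would do; the size and seed are arbitrary demo choices):

```python
import numpy as np

# s_min(H) computed two ways: as the smallest singular value, and as 1 / ||H^{-1}||.
rng = np.random.default_rng(0)
H = rng.standard_normal((200, 200))
s_min = np.linalg.svd(H, compute_uv=False)[-1]        # smallest singular value
inv_norm = np.linalg.norm(np.linalg.inv(H), ord=2)    # spectral norm of H^{-1}
print(s_min, 1.0 / inv_norm)                          # the two numbers agree
```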

The Invertibility Problem for Random Matrices

The invertibility problem has been studied for several distributions of H:

- general Ginibre ensembles: all entries of H are iid with zero mean
- general Wigner ensembles: H is symmetric, the above-diagonal entries are iid with zero mean (this talk)
- general Wishart ensembles: H = XX^T, where X is a rectangular random matrix with iid zero-mean entries
- unitary perturbations: H = D + U, where D is fixed (deterministic) and U ∈ U(n) is random, uniformly distributed (Mark Rudelson's talk)
- etc.

Invertibility of Ginibre matrices

1940–2010+: Goldstine-von Neumann, Smale, Edelman, Szarek, Komlós, Kahn-Komlós-Szemerédi, Tao-Vu, Rudelson-V, Bourgain-Vu-Wood, ...

[Rudelson-V '08]: if H has subgaussian entries, then

1. P{H is singular} ≤ c^n, where c ∈ (0, 1) is a constant;

2. kH^{-1}k ∼ √n with high probability. Equivalently, s_min(H) ∼ 1/√n.

More precisely,

    P{ s_min(H) ≤ ε/√n } ≤ C ε + c^n,   ε > 0.

It is much simpler to see that kHk ∼ √n. Then the result above complies with the heuristic: "the average gap between the n singular values is ∼ √n/n ∼ 1/√n."
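A quick numerical sanity check of the 1/√n scaling for Gaussian (Ginibre) matrices; the sizes, the sample size, and the use of the median are arbitrary demo choices, not part of the theorem:

```python
import numpy as np

# Empirical check that s_min(H) ~ 1/sqrt(n) for n x n Gaussian (Ginibre) matrices.
rng = np.random.default_rng(0)
for n in (100, 400, 1600):
    smin = [np.linalg.svd(rng.standard_normal((n, n)), compute_uv=False)[-1]
            for _ in range(20)]
    print(n, np.median(smin) * np.sqrt(n))            # stays of order a constant
```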

Invertibility of Wigner matrices

2006–2011: Costello-Tao-Vu, Erdős-Schlein-Yau, Tao-Vu, V., Nguyen, ...

Theorem (V'11). Let H be a symmetric random matrix whose above-diagonal entries are iid subgaussian random variables with mean zero and unit variance. Then for every z ∈ R, the eigenvalues λ_k(H) satisfy

    P{ min_k |λ_k(H) − z| ≤ ε/√n } ≤ C ε^{1/9} + exp(−n^c),   ε ≥ 0.

In terms of the invertibility problem, this yields:

1. P{H is singular} ≤ exp(−n^c). Previously known: ≲ n^{-1/8} for the symmetric Bernoulli matrix [Costello-Tao-Vu'10].

2. kH^{-1}k ∼ √n with high probability. Equivalently, λ_min(H) ∼ 1/√n.
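An illustrative simulation of the quantity in the theorem for symmetric Bernoulli matrices; the choice z = 0, the size, and the sample count are arbitrary demo choices (the exponent 1/9 is of course not visible at this scale):

```python
import numpy as np

# Sample min_k |lambda_k(H) - z| for symmetric Bernoulli matrices and compare
# it to the 1/sqrt(n) scale appearing in the theorem.
rng = np.random.default_rng(1)
n, z = 500, 0.0
gaps = []
for _ in range(50):
    A = rng.choice([-1.0, 1.0], size=(n, n))
    H = np.triu(A) + np.triu(A, 1).T          # symmetric, iid above-diagonal entries
    gaps.append(np.min(np.abs(np.linalg.eigvalsh(H) - z)))
print(np.median(gaps) * np.sqrt(n))           # of order a constant
```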

Invertibility of Wigner matrices

    P{ min_k |λ_k(H) − z| ≤ ε/√n } ≤ C ε^{1/9} + exp(−n^c).

Related results:

[Erdős-Schlein-Yau'10] For continuous distributions, for z in the bulk:

    P{ min_k |λ_k(H) − z| ≤ ε/√n } ≤ C ε.

[Nguyen'11] (independent, simultaneous): ∀B > 0 ∃A > 0:

    P{ min_k |λ_k(H) − z| ≤ n^{-A} } ≤ n^{-B}.

[Tao-Vu'11]: Universality. If the first few (3 or 4) moments of the entries of H and G match, then

    P{ min_k |λ_k(H)| ≤ ε/√n } = P{ min_k |λ_k(G)| ≤ (ε ± n^{-c})/√n } ± C n^{-c}.

Universality allows one to transfer the Erdős-Schlein-Yau result to discrete distributions, at the cost of polynomial errors O(n^{-c}) in the magnitude of λ_k and in the probability.

Proof

    P{ min_k |λ_k(H) − z| ≤ ε/√n } ≤ C ε^{1/9} + exp(−n^c).

For simplicity, assume z = 0. Variational characterization:

    min_k |λ_k(H)| = inf_{x ∈ S^{n−1}} kHxk.

So we need, with high probability, a uniform lower bound

    inf_{x ∈ S^{n−1}} kHxk ≳ 1/√n.

This is a geometric problem.

Proof. Step 1: Decomposition of the sphere

Problem: inf_{x ∈ S^{n−1}} kHxk ≳ 1/√n ?

General architecture of the proof, [Rudelson-V '08]:

1. Decompose S^{n−1} into compressible and incompressible vectors: S^{n−1} = Comp ∪ Incomp. A vector is compressible if 99% of its energy (ℓ_2 norm) is supported on 0.01n coordinates; incompressible vectors are the rest of the sphere. Incompressible ≈ "delocalized". (A sketch of such a test is given below.)

2. Prove the lower bound (invertibility) for Comp and Incomp separately.
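A minimal sketch of the compressibility test described in item 1; the 99% / 0.01n thresholds follow the slide, while the exact definition in [Rudelson-V '08] uses two tunable parameters:

```python
import numpy as np

def is_compressible(x, energy=0.99, frac=0.01):
    """True if `energy` of the l2 norm of x is carried by the `frac * n` largest coordinates."""
    x = np.asarray(x, dtype=float)
    k = max(1, int(frac * x.size))
    top = np.sort(np.abs(x))[::-1][:k]                 # k largest entries in magnitude
    return np.linalg.norm(top) >= energy * np.linalg.norm(x)

n = 1000
print(is_compressible(np.r_[np.ones(5), np.zeros(n - 5)]))   # True: essentially sparse
print(is_compressible(np.ones(n)))                            # False: fully spread out
```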

Proof. Step 2: Compressible vectors

Problem: inf_{x ∈ Comp} kHxk ≳ 1/√n ?

Compressible vectors are simpler to control, as there are not too many of them. The metric entropy of Comp is small: there exists a δ-net of Comp of cardinality (C/δ)^{0.1n}. A union bound plus an approximation argument reduce the problem to a lower bound for a single vector x.

Proof. Step 2: Compressible vectors

Problem: inf_{x ∈ Comp} kHxk ≳ 1/√n ?

Decompose H into n/2 × n/2 minors:

    H = ( D    G )        x = ( u )
        ( G^T  E ),           ( z ).

G has independent entries. Condition on D, E and write

    kHxk² ≥ kDu + Gzk² = Σ_{i=1}^{n/2} (d_i + hG_i, zi)²,

where G_i are the rows of G, and d_i are some fixed numbers. Thus kHxk² is a sum of n/2 independent random variables. A deviation inequality gives

    P{ kHxk² ≤ εn } ≤ (Cε)^{n/2}.

Combining with a union bound, we get a (too) strong conclusion:

    inf_{x ∈ Comp} kHxk ≳ √n   with probability 1 − c^n.
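A quick illustration, for a single fixed unit vector x, that kHxk is of order √n with overwhelming probability, matching the "(too) strong" conclusion above; the sparse choice of x and the Gaussian entries are arbitrary for the demo:

```python
import numpy as np

# For one fixed (sparse, hence compressible) unit vector x, ||Hx|| ~ sqrt(n).
rng = np.random.default_rng(2)
n = 1000
x = np.zeros(n)
x[:10] = 1.0 / np.sqrt(10)                    # unit vector supported on 1% of coordinates
norms = []
for _ in range(50):
    A = rng.standard_normal((n, n))
    H = np.triu(A) + np.triu(A, 1).T          # symmetric Gaussian model
    norms.append(np.linalg.norm(H @ x))
print(min(norms) / np.sqrt(n))                # stays of order a constant
```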

Intermission: delocalization of eigenvectors

As a by-product, we obtain delocalization of the eigenvectors of H. Indeed, the argument above applied to H − λI instead of H gives

    inf_{x ∈ Comp} kHx − λxk ≳ √n   with probability 1 − c^n.

One more ε-net argument yields uniformity over λ ∈ R:

    inf_{x ∈ Comp, λ ∈ R} kHx − λxk ≳ √n   with probability 1 − c^n.

Therefore all eigenvectors of H are incompressible; they are not too close to sparse vectors. [Erdős, Schlein, Yau] proved a more difficult version of delocalization: all eigenvectors x satisfy kxk_∞ / kxk_2 ≤ log^C n / √n.
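A numerical look at the Erdős-Schlein-Yau form of delocalization; the comparison against log(n)/√n below is just a convenient yardstick, not the theorem's exact polylog factor:

```python
import numpy as np

# Worst l_inf / l_2 ratio over all unit eigenvectors of a symmetric Bernoulli matrix.
rng = np.random.default_rng(3)
n = 1000
A = rng.choice([-1.0, 1.0], size=(n, n))
H = np.triu(A) + np.triu(A, 1).T
_, vecs = np.linalg.eigh(H)                   # columns are unit eigenvectors
ratio = np.abs(vecs).max()                    # max_k ||v_k||_inf / ||v_k||_2
print(ratio, np.log(n) / np.sqrt(n))          # same order of magnitude
```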

Proof. Step 3: Incompressible vectors

Problem: inf_{x ∈ Incomp} kHxk ≳ n^{-1/2} ?

Proving invertibility on incompressible vectors is more difficult: there are too many of them (no small ε-net). Alternative, geometric argument from [Rudelson-V '08]: denoting the columns of H by H_i, we have

    kHxk ≥ dist(Hx, E_1)          (where E_1 := span(H_i)_{i>1})
         = dist(Σ_{i=1}^n x_i H_i, E_1) = dist(x_1 H_1, E_1) = |x_1| · dist(H_1, E_1).

Conditioning on all columns but H_1 fixes the subspace E_1.

Proof. Step 3: Incompressible vectors

Problem: inf_{x ∈ Incomp} kHxk ≳ 1/√n ?

We have shown: kHxk ≥ |x_1| · dist(H_1, E_1). The same can be done for any coordinate x_i. Since x ∈ Incomp, at least 0.1n coordinates satisfy |x_i| ≳ 1/√n. Therefore, the proof reduces to showing that

    dist(H_i, E_i) ≳ 1   with high probability.

Proof. Step 3: Incompressible vectors

We have reduced the invertibility problem to:

The Distance Problem. Estimate the distance between a random vector X and a random hyperplane E in R^n. Specifically, show that

    dist(X, E) ≳ 1   with high probability,

where X = a column of H, and E = span of the other columns.

For the Gaussian distribution (Ginibre H) the solution is trivial, since dist(X, E) is distributed as |N(0, 1)|. For general Ginibre matrices, a solution was given in [Tao-Vu], [Rudelson-V '08]. But here we have an extra difficulty: X and E are not independent.
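A small numerical illustration of the distance problem in the dependent (symmetric) case: the distance from the first column to the span of the others is computed via an orthogonal projection; the Gaussian entries, size, and sample count are arbitrary demo choices:

```python
import numpy as np

# dist(X, E): X = first column of H, E = span of the remaining columns.
rng = np.random.default_rng(4)
n = 400
dists = []
for _ in range(30):
    A = rng.standard_normal((n, n))
    H = np.triu(A) + np.triu(A, 1).T
    X, rest = H[:, 0], H[:, 1:]
    Q, _ = np.linalg.qr(rest)                 # orthonormal basis of E
    dists.append(np.linalg.norm(X - Q @ (Q.T @ X)))
print(np.median(dists))                       # of order 1
```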

Proof. Step 4: Distance problem

The Distance Theorem. Let X = first column of a symmetric random matrix H, and E = span of the other columns. Then dist(X, E) ≳ 1 w.h.p. Precisely,

    P{ dist(X, E) ≤ ε } ≲ ε^{1/9} + exp(−n^c).

To prove this result, decompose

    H = ( h   Z^T )
        ( Z   B   ),

where h is a scalar, Z ∈ R^{n−1}, and B is the (n−1) × (n−1) minor. Use linear algebra to express

    dist(X, E) = |hB^{-1}Z, Zi − h| / √(1 + kB^{-1}Zk²).
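A numerical sanity check of this linear-algebra identity on a small symmetric Gaussian matrix: the distance is computed both by projection onto the span of the other columns and by the formula in h, Z, B (the block names follow the decomposition above; size and seed are arbitrary):

```python
import numpy as np

# Verify dist(X, E) = |<B^{-1}Z, Z> - h| / sqrt(1 + ||B^{-1}Z||^2) numerically.
rng = np.random.default_rng(7)
n = 8
A = rng.standard_normal((n, n))
H = np.triu(A) + np.triu(A, 1).T
h, Z, B = H[0, 0], H[1:, 0], H[1:, 1:]
X, rest = H[:, 0], H[:, 1:]

Q, _ = np.linalg.qr(rest)                     # orthonormal basis of E = span(other columns)
dist_proj = np.linalg.norm(X - Q @ (Q.T @ X))

BiZ = np.linalg.solve(B, Z)                   # B^{-1} Z
dist_formula = abs(BiZ @ Z - h) / np.sqrt(1 + BiZ @ BiZ)
print(dist_proj, dist_formula)                # the two numbers agree
```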

Proof. Step 5: Concentration of quadratic forms

    dist(X, E) = |hB^{-1}Z, Zi − h| / √(1 + kB^{-1}Zk²).

Here B is a symmetric random matrix (similar to H); Z is an independent random vector with iid coordinates. Thus

    E |hB^{-1}Z, Zi|² = E kB^{-1}Zk² = kB^{-1}k²_HS.

Ignoring the 1 in the denominator, we have reduced the problem to showing that

    |hB^{-1}Z, Zi − h| ≳ kB^{-1}k_HS.

This is a problem on concentration of quadratic forms.

Proof. Step 5: Concentration of quadratic forms

Problem (Concentration of quadratic forms). Let B = symmetric random matrix, X = independent random vector with iid coordinates. Show that the distribution of the quadratic form hB^{-1}X, Xi is spread. Specifically, show that for every u ∈ R,

    P{ |hB^{-1}X, Xi − u| ≤ ε kB^{-1}k_HS } ≲ ε² + c^n.

We can only prove ≲ ε^{1/9} + exp(−n^c). The invertibility theorem follows from this result.

Proof. Step 6: Decoupling

Theorem (Concentration of quadratic forms). Let B = symmetric random matrix, X = independent random vector with iid coordinates. Then for every u ∈ R one has

    P{ |hB^{-1}X, Xi − u| ≤ ε kB^{-1}k_HS } ≲ ε^{1/9} + exp(−n^c).

Proof. A decoupling argument replaces the quadratic form by the bilinear form hB^{-1}Y, Xi, where Y is an independent copy of X. Since, as we know, kB^{-1}k_HS ∼ kB^{-1}Yk w.h.p., this reduces the problem to concentration of a linear form:

    P{ |ha, Xi − u| ≤ ε }   where a = B^{-1}Y / kB^{-1}Yk.

Condition on B and Y. Now a becomes a fixed vector.

Proof. Step 7: Littlewood-Offord Problem

    P{ |ha, Xi − u| ≤ ε } ≤ ?

Here

    S := ha, Xi = Σ_{i=1}^n a_i X_i,   where a = B^{-1}Y / kB^{-1}Yk is a fixed vector,

is a sum of independent random variables.

We need to show that the distribution of S is spread. This is known as the Littlewood-Offord Problem.

1936–2010+: Littlewood-Offord, P. Erdős, Erdős-Moser, Komlós, Tao-Vu, Rudelson-V, ...

Littlewood-Offord type theorems: the spread of S depends on the amount of additive structure of the coefficient vector a. "The less structure in a, the more S is spread." Formalized by [Tao-Vu], who measure structure in terms of generalized arithmetic progressions; [Rudelson-V '08] measure structure in terms of Diophantine approximation.

Proof. Step 7: Littlewood-Offord Problem

    P{ |S − u| ≤ ε } ≤ ?   where S = Σ_i a_i X_i.

The spread of S is captured by the Lévy concentration function:

    L(S, ε) = sup_{u ∈ R} P{ |S − u| ≤ ε },   ε ≥ 0.

The additive structure of a is captured by the least common denominator (LCD):

    D(a) = inf{ θ > 0 : dist(θa, Z^n) ≤ √(10 log_+ θ) }.
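A rough Monte Carlo sketch of the Lévy concentration function above, for S = Σ a_i X_i with iid ±1 signs X_i; the sample size is an arbitrary demo choice, and the sup over u is approximated by the densest sample window, so the estimate is only accurate down to resolution ~1/√trials:

```python
import numpy as np

def levy_concentration(a, eps, trials=50000, seed=0):
    """Monte Carlo estimate of L(S, eps) = sup_u P{|S - u| <= eps}, S = <a, X>, X_i = +-1 iid."""
    rng = np.random.default_rng(seed)
    S = np.sort(rng.choice([-1.0, 1.0], size=(trials, len(a))) @ np.asarray(a, dtype=float))
    # the sup over u is approximated by the densest window of width 2*eps among the samples
    return (np.searchsorted(S, S + 2 * eps) - np.arange(trials)).max() / trials

# Example: a = (1,1,1,1)/2 gives P{S = 0} = 6/16 = 0.375, visible as L(S, eps) for small eps.
print(levy_concentration(np.ones(4) / 2.0, 0.01))      # ~ 0.375
```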

Proof. Step 7: Littlewood-Offord Problem

Theorem of Littlewood-Offord type (Rudelson-V, see also Friedland-Sodin). A sum of independent random variables S = Σ_i a_i X_i satisfies

    L(S, ε) ≲ ε + 1/D(a),   ε ≥ 0.

If a is unstructured (D(a) ≫ 1), then S is well spread (L(S, ε) ≲ ε).

Back to our problem: we were working with

    a = B^{-1}Y / kB^{-1}Yk_2,   B = symmetric random matrix,   Y = independent random vector.

We have thus reduced the problem to showing that a is unstructured. Want to show: the action of the random matrix B^{-1} on a fixed vector Y destroys additive structure.
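As an illustration of the dichotomy in the theorem above: a coefficient vector with all coordinates equal (highly structured) gives a large atom in the distribution of S, while a generic unit vector spreads S out. The estimator repeats the Monte Carlo sketch from Step 7; the specific n, ε, and seeds are arbitrary demo choices:

```python
import numpy as np

def levy_concentration(a, eps, trials=50000, seed=0):
    """Monte Carlo estimate of L(S, eps) for S = <a, X> with iid +-1 signs X_i."""
    rng = np.random.default_rng(seed)
    S = np.sort(rng.choice([-1.0, 1.0], size=(trials, len(a))) @ np.asarray(a, dtype=float))
    return (np.searchsorted(S, S + 2 * eps) - np.arange(trials)).max() / trials

n, eps = 100, 0.01
structured = np.ones(n) / np.sqrt(n)              # all coordinates equal: highly structured
g = np.random.default_rng(1).standard_normal(n)
unstructured = g / np.linalg.norm(g)              # generic direction: essentially unstructured
print(levy_concentration(structured, eps))         # large: an atom of mass ~ 1/sqrt(n) at 0
print(levy_concentration(unstructured, eps))       # much smaller, of order eps
```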

Proof. Step 8: Structure of the Inverse

Theorem (Structure of the Inverse). Let B be an n × n symmetric random matrix, y be a fixed vector, and

    a = B^{-1}y / kB^{-1}yk_2.

Then, with high probability 1 − c^n, a is unstructured.

Conjecture: D(a) ≥ e^{cn}. What is proved: for every λ ∈ (0, 1), we have D̂(a) ≥ n^{c/λ}, where D̂(a) captures the most unstructured λn coefficients of a:

    D̂(a) = max_{|I| = λn} D(a_I).

Idea: If D̂(a) is large, then a has some unstructured part a_I, and the previous Littlewood-Offord type arguments still apply. If D̂(a) is small, then all subsets of λn coordinates are structured, i.e. a is highly structured. This improves the metric entropy estimates.

Proof. Step 9: Proof of the Structure Theorem

Theorem (Structure of the Inverse). Let B be an n × n symmetric random matrix, y be a fixed vector,

    a = B^{-1}y / kB^{-1}yk_2,   D̂(a) = max_{|I| = λn} D(a_I).

Then, with high probability 1 − c^n, we have D̂(a) ≥ n^{c/λ}.

Proof. Fix a level D < n^{c/λ} and consider the level set

    S_D := {x ∈ S^{n−1} : D̂(x) ∼ D}.

Note that Ba is collinear with the fixed vector y; for simplicity, assume Ba = y. We want to show that

    P{ ∃x ∈ S_D : Bx = y } ≤ c^n.

This will be done by a covering argument.

Proof. Step 9: Proof of the Structure Theorem

Theorem (Structure of the Inverse). Let B be an n × n symmetric random matrix, y be a fixed vector,

    D̂(a) = max_{|I| = λn} D(a_I),   S_D := {x ∈ S^{n−1} : D̂(x) ∼ D}.

Then P{ ∃x ∈ S_D : Bx = y } ≤ c^n.

Proof ctd. 1. Let us fix x ∈ S_D and estimate P{Bx = y}. Apply the decomposition argument (used for compressible vectors), but for [n] = I^c ∪ I, where I is the most unstructured set of λn coefficients of x:

    kBx − yk² ≥ kDu + Gz − yk² = Σ_{i ∈ I^c} (d_i + hG_i, zi)².

Hence Bx = y implies that d_i + hG_i, zi = 0 for all i ∈ I^c.

Proof. Step 9: Proof of the Structure Theorem

Theorem (Structure of the Inverse). Let B be an n × n symmetric random matrix, y be a fixed vector,

    D̂(a) = max_{|I| = λn} D(a_I),   S_D := {x ∈ S^{n−1} : D̂(x) ∼ D}.

Then P{ ∃x ∈ S_D : Bx = y } ≤ c^n.

Proof ctd. But for each i, by the Littlewood-Offord type theorem,

    P{ d_i + hG_i, zi = 0 } ≲ 1/D(z) ∼ 1/D.

Hence by independence,

    P{Bx = y} ≤ Π_{i ∈ I^c} P{ d_i + hG_i, zi = 0 } ≲ (1/D)^{n − λn}.

Proof. Step 9: Proof of the Structure Theorem

    D̂(a) = max_{|I| = λn} D(a_I),   S_D := {x ∈ S^{n−1} : D̂(x) ∼ D}.

Proof ctd. 2. "How many" x ∈ S_D are there? What is the metric entropy of S_D?

Start with the level set for the usual LCD, T_D := {x ∈ S^{n−1} : D(x) ∼ D}. The cardinality of a fine net of T_D is the same as the number of integer points in the ball of radius D in R^n, which is ∼ (D/√n)^n.

Pass from T_D to S_D: decompose [n] into 1/λ intervals of λn coordinates. Since D̂(x) ∼ D, all these restrictions satisfy D(x_I) ≲ D. Choose nets for each restriction, of size (D/√(λn))^{λn} (as above). Take the product ⇒ get a net of S_D of size

    ((D/√(λn))^{λn})^{1/λ} = (D/√(λn))^n.

Proof. Step 9: Proof of the Structure Theorem

Theorem (Structure of the Inverse). Let B be an n × n symmetric random matrix, y be a fixed vector,

    D̂(a) = max_{|I| = λn} D(a_I),   S_D := {x ∈ S^{n−1} : D̂(x) ∼ D}.

Then P{ ∃x ∈ S_D : Bx = y } ≤ c^n.

Proof ctd. 3. Take a union bound over all x ∈ S_D (actually, over a net):

    P{ ∃x ∈ S_D : Bx = y } ≤ (D/√(λn))^n · (1/D)^{n − λn} = (D^λ / √(λn))^n.

This is ≤ c^n if D ≪ n^{1/(2λ)}, as we claimed.
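A quick numeric spot-check of the algebra in the union bound above (purely a verification of the displayed identity; the particular values of D, λ, n are arbitrary):

```python
import numpy as np

# Check (D/sqrt(lam*n))^n * (1/D)^(n - lam*n) = (D^lam / sqrt(lam*n))^n for sample values.
D, lam, n = 3.0, 0.25, 40
lhs = (D / np.sqrt(lam * n))**n * (1.0 / D)**(n - lam * n)
rhs = (D**lam / np.sqrt(lam * n))**n
print(np.isclose(lhs, rhs))                   # True
```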

References

Ginibre matrices:
M. Rudelson, R. Vershynin, The Littlewood-Offord problem and invertibility of random matrices, Advances in Mathematics 218 (2008), 600–633.
M. Rudelson, R. Vershynin, Smallest singular value of a random rectangular matrix, Communications on Pure and Applied Mathematics 62 (2009), 1707–1739.

Wigner matrices:
R. Vershynin, Invertibility of symmetric random matrices, Random Structures and Algorithms, to appear.

Survey:
M. Rudelson, R. Vershynin, Non-asymptotic theory of random matrices: extreme singular values, Proceedings of ICM 2010, Volume III, 1576–1602, Hindustan Book Agency, New Delhi, 2010.

Tutorial:
R. Vershynin, Introduction to the non-asymptotic analysis of random matrices. In: Compressed Sensing, Theory and Applications, ed. Y. Eldar and G. Kutyniok, Cambridge University Press, 2012, pp. 210–268.