Invertibility of symmetric random matrices
Roman Vershynin, University of Michigan
Workshop on Random Matrices, Bonn, May 30, 2012
The Invertibility Problem for random matrices

For an n × n random matrix H with a given distribution:

1. What is the singularity probability P{H is singular}?
2. What is the typical value of the spectral norm of the inverse, ‖H⁻¹‖?

Part 2 is equivalent to estimating the smallest singular value s_min(H) = 1/‖H⁻¹‖. Here s_min(H) is the largest number s such that
    ‖Hx‖_2 ≥ s · ‖x‖_2   for all x.
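A minimal numerical sketch of the identity s_min(H) = 1/‖H⁻¹‖, assuming Gaussian entries purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
H = rng.standard_normal((n, n))                    # illustrative random matrix

s_min = np.linalg.svd(H, compute_uv=False).min()   # smallest singular value of H
inv_norm = np.linalg.norm(np.linalg.inv(H), 2)     # spectral norm of H^{-1}

print(s_min, 1.0 / inv_norm)                       # the two values agree up to rounding
```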
The Invertibility Problem for Random Matrices
The invertibility problem has been studied for several distributions of H:

- general Ginibre ensembles: all entries of H are iid, zero mean
- general Wigner ensembles: H is symmetric, above-diagonal entries are iid, zero mean (this talk)
- general Wishart ensembles: H = XXᵀ, where X is a rectangular random matrix with iid zero-mean entries
- unitary perturbations: H = D + U, where D is fixed (deterministic) and U ∈ U(n) is random, uniformly distributed (Mark Rudelson's talk)
- etc.
Invertibility of Ginibre matrices

1940–2010+: Goldstine-von Neumann, Smale, Edelman, Szarek, Komlos, Kahn-Komlos-Szemeredi, Tao-Vu, Rudelson-V, Bourgain-Vu-Wood, ...

[Rudelson-V '08]: if H has subgaussian entries, then
1. P{H is singular} ≤ c^n, where c ∈ (0, 1) is a constant;
2. ‖H⁻¹‖ ∼ √n with high probability. Equivalently, s_min(H) ∼ 1/√n.

More precisely,
    P{ s_min(H) ≤ ε/√n } ≤ Cε + c^n,   ε > 0.

Much simpler is to see that ‖H‖ ∼ √n. Then the result above complies with the heuristic: "the average gap between the n singular values is ∼ √n/n ∼ 1/√n."
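A Monte Carlo sketch of the 1/√n scaling, assuming Gaussian entries for convenience (the theorem covers general subgaussian entries):

```python
import numpy as np

rng = np.random.default_rng(1)

def median_smin(n, trials=50):
    """Median smallest singular value of an n x n iid Gaussian (Ginibre) matrix."""
    vals = [np.linalg.svd(rng.standard_normal((n, n)), compute_uv=False).min()
            for _ in range(trials)]
    return np.median(vals)

for n in (100, 200, 400):
    # sqrt(n) * s_min(H) should stay of order 1 as n grows
    print(n, np.sqrt(n) * median_smin(n))
```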
Invertibility of Wigner matrices

2006–2011: Costello-Tao-Vu, Erdős-Schlein-Yau, Tao-Vu, V., Nguyen, ...
Theorem (V '11). Let H be a symmetric random matrix whose above-diagonal entries are iid random variables with mean zero, unit variance, and subgaussian. Then for every z ∈ ℝ, the eigenvalues λ_k(H) satisfy:
    P{ min_k |λ_k(H) − z| ≤ ε/√n } ≤ C ε^{1/9} + exp(−n^c),   ε ≥ 0.
In terms of the invertibility problem, this yields:

1. P{H is singular} ≤ exp(−n^c). Previously known: ≲ n^{−1/8} for the symmetric Bernoulli matrix [Costello-Tao-Vu '10].
2. ‖H⁻¹‖ ∼ √n with high probability. Equivalently, min_k |λ_k(H)| ∼ 1/√n.
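The analogous experiment for the symmetric case; the ±1 (symmetric Bernoulli) entries below are one illustrative choice of a subgaussian distribution:

```python
import numpy as np

rng = np.random.default_rng(2)

def smallest_eig_magnitude(n):
    """min_k |lambda_k(H)| for a symmetric random sign matrix H."""
    A = rng.choice([-1.0, 1.0], size=(n, n))
    H = np.triu(A) + np.triu(A, 1).T          # symmetrize: iid above-diagonal entries
    return np.abs(np.linalg.eigvalsh(H)).min()

for n in (100, 200, 400):
    vals = [smallest_eig_magnitude(n) for _ in range(30)]
    # sqrt(n) * min_k |lambda_k(H)| should stay of order 1 as n grows
    print(n, np.sqrt(n) * np.median(vals))
```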
Invertibility of Wigner matrices

    P{ min_k |λ_k(H) − z| ≤ ε/√n } ≤ C ε^{1/9} + exp(−n^c).
Related results:

[Erdős-Schlein-Yau '10] For continuous distributions, for z in the bulk:
    P{ min_k |λ_k(H) − z| ≤ ε/√n } ≤ Cε.
[Nguyen '11] (independent, simultaneous): for every B > 0 there exists A > 0 such that
    P{ min_k |λ_k(H) − z| ≤ n^{−A} } ≤ n^{−B}.
[Tao-Vu '11]: Universality. If the first few (3 or 4) moments of the entries of H and G match, then
    P{ min_k |λ_k(H)| ≤ ε/√n } = P{ min_k |λ_k(G)| ≤ (ε ± n^{−c})/√n } ± Cn^{−c}.

Universality allows one to transfer the Erdős-Schlein-Yau result to discrete distributions, at the cost of polynomial errors O(n^{−c}) in the magnitude of λ_k and in probability.
Proof

    P{ min_k |λ_k(H) − z| ≤ ε/√n } ≤ C ε^{1/9} + exp(−n^c).

For simplicity, assume z = 0. Variational characterization:
    min_k |λ_k(H)| = inf_{x ∈ S^{n−1}} ‖Hx‖.

So we need, with high probability, a uniform lower bound
    inf_{x ∈ S^{n−1}} ‖Hx‖ ≳ 1/√n.
This is a geometric problem.
Proof. Step 1: Decomposition of the sphere

Problem: inf_{x ∈ S^{n−1}} ‖Hx‖ ≳ 1/√n ?

General architecture of the proof [Rudelson-V '08]:

1. Decompose S^{n−1} into compressible and incompressible vectors: S^{n−1} = Comp ∪ Incomp. A vector is compressible if 99% of its energy (ℓ_2 norm) is supported by 0.01n coordinates. Incompressible vectors are the rest of the sphere. Incompressible ≈ "delocalized" (see the sketch below).
2. Prove the lower bound (invertibility) for Comp and Incomp separately.
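A sketch of the compressibility test, interpreting "energy" as the squared ℓ_2 norm; the thresholds 0.99 and 0.01n are the ones quoted on the slide (the paper works with general parameters):

```python
import numpy as np

def is_compressible(x, frac_coords=0.01, frac_energy=0.99):
    """True if frac_energy of the squared l2 norm of x sits on its largest
    ceil(frac_coords * n) coordinates (the slide's 99% / 0.01n rule)."""
    x = np.asarray(x, dtype=float)
    k = max(1, int(np.ceil(frac_coords * x.size)))
    top = np.sort(x**2)[::-1][:k]                  # k largest squared coordinates
    return top.sum() >= frac_energy * (x**2).sum()

n = 1000
spike = np.zeros(n); spike[0] = 1.0                # localized vector on the sphere
flat = np.ones(n) / np.sqrt(n)                     # fully delocalized vector
print(is_compressible(spike), is_compressible(flat))   # True, False
```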
Proof. Step 2: Compressible vectors

Problem: inf_{x ∈ Comp} ‖Hx‖ ≳ 1/√n ?

Compressible vectors are simpler to control, as there are not too many of them. The metric entropy of Comp is small: there exists a δ-net of Comp of cardinality (C/δ)^{0.1n}. A union bound plus an approximation argument reduce the problem to a lower bound for a single vector x.
Proof. Step 2: Compressible vectors

Problem: inf_{x ∈ Comp} ‖Hx‖ ≳ 1/√n ?

Decompose H into n/2 × n/2 minors, and x accordingly:
    H = [ D   G ]        x = [ u ]
        [ Gᵀ  E ],           [ z ].

G has independent entries. Condition on D, E and write
    ‖Hx‖² ≥ ‖Du + Gz‖² = Σ_{i=1}^{n/2} (d_i + ⟨G_i, z⟩)²,
where G_i are the rows of G, and d_i are some fixed numbers. Thus ‖Hx‖² is a sum of n/2 independent random variables. A deviation inequality gives
    P{ ‖Hx‖² ≤ εn } ≤ (C/ε)^{n/2}.
Combining with a union bound, we get a (too) strong conclusion:
    inf_{x ∈ Comp} ‖Hx‖ ≳ √n   with probability 1 − c^n.
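A quick sanity check of the decomposition step (Gaussian entries assumed purely for the demo): since Hx = (Du + Gz, Gᵀu + Ez), dropping the second block can only decrease the norm.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
A = rng.standard_normal((n, n))
H = A + A.T                                   # a symmetric test matrix
m = n // 2
D, G = H[:m, :m], H[:m, m:]                   # top blocks of H = [[D, G], [G^T, E]]

x = rng.standard_normal(n); x /= np.linalg.norm(x)
u, z = x[:m], x[m:]

lhs = np.linalg.norm(H @ x) ** 2
rhs = np.linalg.norm(D @ u + G @ z) ** 2      # the retained block of Hx
print(lhs >= rhs, lhs, rhs)                   # True
```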
Intermission: delocalization of eigenvectors

As a by-product, we obtain a delocalization of the eigenvectors of H. Indeed, the argument above for H − λI instead of H gives
    inf_{x ∈ Comp} ‖Hx − λx‖ ≳ √n   with probability 1 − c^n.

One more ε-net argument yields uniformity over λ ∈ ℝ:
    inf_{x ∈ Comp, λ ∈ ℝ} ‖Hx − λx‖ ≳ √n   with probability 1 − c^n.

Therefore all eigenvectors of H are incompressible; they are not too close to sparse vectors. [Erdős, Schlein, Yau] proved a more difficult version of delocalization: all eigenvectors x satisfy ‖x‖_∞ / ‖x‖_2 ≤ log^C n / √n.
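A quick numerical look at delocalization; the Gaussian symmetric matrix is only an illustrative choice, and log n / √n is printed as a reference scale rather than the exact bound (C is unspecified):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
A = rng.standard_normal((n, n))
H = (A + A.T) / np.sqrt(2)              # symmetric; off-diagonal entries have unit variance

_, vecs = np.linalg.eigh(H)             # columns are unit eigenvectors
ratios = np.abs(vecs).max(axis=0)       # ||x||_inf / ||x||_2 for each eigenvector x
print("max over eigenvectors  :", ratios.max())
print("reference log(n)/sqrt(n):", np.log(n) / np.sqrt(n))
```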
Proof. Step 3: Incompressible vectors

Problem: inf_{x ∈ Incomp} ‖Hx‖ ≳ n^{−1/2} ?

Proving invertibility on incompressible vectors is more difficult: there are too many of them (no small ε-net). Alternative, geometric argument from [Rudelson-V '08]: denoting the columns of H by H_i, we have
    ‖Hx‖ ≥ dist(Hx, E_1)   where E_1 := span(H_i)_{i>1}
         = dist( Σ_{i=1}^n x_i H_i, E_1 ) = dist(x_1 H_1, E_1) = |x_1| · dist(H_1, E_1).
Condition on all columns but H_1; this fixes the subspace E_1.
Proof. Step 3: Incompressible vectors

Problem: inf_{x ∈ Incomp} ‖Hx‖ ≳ 1/√n ?

We have shown: ‖Hx‖ ≥ |x_1| · dist(H_1, E_1). The same can be done for any coordinate x_i. Since x ∈ Incomp, at least 0.1n coordinates satisfy |x_i| ≳ 1/√n. Therefore, the proof reduces to showing that
    dist(H_i, E_i) ≳ 1   with high probability.
Proof. Step 3: Incompressible vectors

We have reduced the invertibility problem to:

The Distance Problem. Estimate the distance between a random vector X and a random hyperplane E in ℝⁿ. Specifically, show that
    dist(X, E) ≳ 1   with high probability,
where X = a column of H, and E = span of the other columns.
For the Gaussian distribution (Ginibre H) the solution is trivial, since dist(X, E) is distributed as |N(0, 1)|. For general Ginibre matrices, a solution was given in [Tao-Vu], [Rudelson-V '08]. But here we have an extra difficulty: X and E are not independent.
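An illustration of the distance problem in the easy independent (Ginibre) case, with Gaussian entries assumed: the distance is the norm of the residual after projecting the first column onto the span of the others.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
H = rng.standard_normal((n, n))            # Ginibre: the columns are independent

X = H[:, 0]
E = H[:, 1:]                               # matrix whose columns span E
coeffs, *_ = np.linalg.lstsq(E, X, rcond=None)
dist = np.linalg.norm(X - E @ coeffs)      # distance from X to span(E)
print(dist)                                # distributed like |N(0,1)|: of order 1
```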
Proof. Step 4: Distance problem

The Distance Theorem. Let X = first column of a symmetric random matrix H, and E = span of the other columns. Then dist(X, E) ≳ 1 w.h.p. Precisely,
    P{ dist(X, E) ≤ ε } ≲ ε^{1/9} + exp(−n^c).

To prove this result, decompose
    H = [ h   Zᵀ ]
        [ Z   B  ],
where h is the (1,1) entry, Z is the rest of the first column, and B is the remaining (n−1) × (n−1) minor. Use linear algebra to express
    dist(X, E) = |⟨B⁻¹Z, Z⟩ − h| / √(1 + ‖B⁻¹Z‖²).
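A numerical check of this linear-algebra identity; the symmetric Gaussian matrix below is only a stand-in for H.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
A = rng.standard_normal((n, n))
H = A + A.T                                  # any symmetric matrix will do here

h, Z, B = H[0, 0], H[1:, 0], H[1:, 1:]       # H = [[h, Z^T], [Z, B]]

# Left-hand side: distance from the first column to the span of the others.
X, E = H[:, 0], H[:, 1:]
coeffs, *_ = np.linalg.lstsq(E, X, rcond=None)
lhs = np.linalg.norm(X - E @ coeffs)

# Right-hand side: |<B^{-1}Z, Z> - h| / sqrt(1 + ||B^{-1}Z||^2).
w = np.linalg.solve(B, Z)
rhs = abs(w @ Z - h) / np.sqrt(1 + w @ w)

print(lhs, rhs)                              # the two values coincide
```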
Proof. Step 5: Concentration of quadratic forms

    dist(X, E) = |⟨B⁻¹Z, Z⟩ − h| / √(1 + ‖B⁻¹Z‖²).

Here B is a symmetric random matrix (similar to H); Z is an independent random vector with iid coordinates. Thus
    E|⟨B⁻¹Z, Z⟩|² = E‖B⁻¹Z‖² = ‖B⁻¹‖²_HS.
Ignoring the 1 in the denominator, we have reduced the problem to showing that
    |⟨B⁻¹Z, Z⟩ − h| ≳ ‖B⁻¹‖_HS.
This is a problem on concentration of quadratic forms.
Proof. Step 5: Concentration of quadratic forms
Problem (Concentration of quadratic forms). Let B = symmetric random matrix, X = independent random vector with iid coordinates. Show that the distribution of the quadratic form ⟨B⁻¹X, X⟩ is spread. Specifically, show that for every u ∈ ℝ,
    P{ |⟨B⁻¹X, X⟩ − u| ≤ ε‖B⁻¹‖_HS } ≲ ε² + c^n.
We can only prove ≲ ε^{1/9} + exp(−n^c). The invertibility theorem follows from this result.
Proof. Step 6: Decoupling

Theorem (Concentration of quadratic forms). Let B = symmetric random matrix, X = independent random vector with iid coordinates. Then for every u ∈ ℝ one has
    P{ |⟨B⁻¹X, X⟩ − u| ≤ ε‖B⁻¹‖_HS } ≲ ε^{1/9} + exp(−n^c).

Proof. A decoupling argument replaces the quadratic form by the bilinear form ⟨B⁻¹Y, X⟩, where Y is an independent copy of X. Since, as we know, ‖B⁻¹‖_HS ∼ ‖B⁻¹Y‖ w.h.p., this reduces the problem to concentration of a linear form:
    P{ |⟨a, X⟩ − u| ≤ ε },   where a = B⁻¹Y / ‖B⁻¹Y‖.
Condition on B and Y. Now a becomes a fixed vector.
Proof. Step 7: Littlewood-Offord Problem

    P{ |⟨a, X⟩ − u| ≤ ε } ≤ ?

Here
    S := ⟨a, X⟩ = Σ_{i=1}^n a_i X_i,   where a = B⁻¹Y / ‖B⁻¹Y‖ is a fixed vector,
is a sum of independent random variables. We need to show that the distribution of S is spread. This is known as the Littlewood-Offord Problem.

1936–2010+: Littlewood-Offord, P. Erdős, Erdős-Moser, Komlos, Tao-Vu, Rudelson-V, ...

Littlewood-Offord type theorems: the spread of S depends on the amount of additive structure of the coefficient vector a. "The less structure in a, the more S is spread." Formalized by [Tao-Vu]; they measure structure in terms of generalized arithmetic progressions. [Rudelson-V '08] measure structure in terms of Diophantine approximation.
Proof. Step 7: Littlewood-Offord Problem

    P{ |S − u| ≤ ε } ≤ ?   where S = Σ_i a_i X_i.

The spread of S is captured by the Lévy concentration function:
    L(S, ε) = sup_{u ∈ ℝ} P{ |S − u| ≤ ε },   ε ≥ 0.

The additive structure of a is captured by the least common denominator (LCD):
    D(a) = inf{ θ > 0 : dist(θa, ℤⁿ) ≤ 10 √(log₊ θ) }.
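A small experiment relating structure in a to the spread of S = Σ a_i X_i. Random signs X_i and the particular test vectors are assumptions for the demo, and the empirical maximum below is only a proxy for L(S, ε):

```python
import numpy as np

rng = np.random.default_rng(6)

def levy_concentration(a, eps, trials=20000):
    """Empirical sup_u P{ |sum_i a_i X_i - u| <= eps } for random signs X_i."""
    X = rng.choice([-1.0, 1.0], size=(trials, a.size))
    S = X @ a
    grid = np.linspace(S.min(), S.max(), 400)
    return max(np.mean(np.abs(S - u) <= eps) for u in grid)

n = 64
structured = np.ones(n) / np.sqrt(n)                 # equal coefficients: highly structured
unstructured = rng.standard_normal(n)
unstructured /= np.linalg.norm(unstructured)         # generic direction: essentially no structure

eps = 0.01
print("structured  :", levy_concentration(structured, eps))    # ~ 1/sqrt(n): a large atom
print("unstructured:", levy_concentration(unstructured, eps))  # ~ eps: well spread
```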
Proof. Step 7: Littlewood-Offord Problem
Theorem of Littlewood-Offord type (Rudelson-V, see Friedland-Sodin). A sum of independent r.v.'s S = Σ_i a_i X_i satisfies
    L(S, ε) ≲ ε + 1/D(a),   ε ≥ 0.

If a is unstructured (D(a) ≫ 1), then S is well spread (L(S, ε) ≲ ε).

Back to our problem: we were working with
    a = B⁻¹Y / ‖B⁻¹Y‖_2,   B = symmetric random matrix,   Y = independent random vector.
We have thus reduced the problem to showing that a is unstructured. Want to show: the action of the random matrix B⁻¹ on a fixed vector Y destroys additive structure.
Proof. Step 8: Structure of the Inverse

Theorem (Structure of the Inverse). Let B be an n × n symmetric random matrix, y be a fixed vector, and
    a = B⁻¹y / ‖B⁻¹y‖_2.
Then, with high probability 1 − c^n, a is unstructured.

Conjecture: D(a) ≥ e^{cn}. What is proved: for every λ ∈ (0, 1), we have D̂(a) ≥ n^{c/λ}, where D̂(a) captures the most unstructured λn coefficients of a:
    D̂(a) = max_{|I| = λn} D(a_I).

Idea: if D̂(a) is large, then a has some unstructured part a_I, and the previous Littlewood-Offord type arguments still apply. If D̂(a) is small, then all subsets of λn coordinates are structured, i.e. a is highly structured. This improves the metric entropy estimates.
Proof. Step 9: Proof of the Structure Theorem

Theorem (Structure of the Inverse). Let B be an n × n symmetric random matrix, y be a fixed vector,
    a = B⁻¹y / ‖B⁻¹y‖_2,   D̂(a) = max_{|I| = λn} D(a_I).
Then, with high probability 1 − c^n, we have D̂(a) ≥ n^{c/λ}.

Proof. Fix a level D < n^{c/λ} and consider the level set
    S_D := {x ∈ S^{n−1} : D̂(x) ∼ D}.
Note that Ba is collinear with the fixed vector y; for simplicity, say Ba = y. We want to show that
    P{ ∃ x ∈ S_D : Bx = y } ≤ c^n.
This will be done by a covering argument.
Proof. Step 9: Proof of the Structure Theorem

Theorem (Structure of the Inverse). Let B be an n × n symmetric random matrix, y be a fixed vector,
    D̂(a) = max_{|I| = λn} D(a_I),   S_D := {x ∈ S^{n−1} : D̂(x) ∼ D}.
Then P{ ∃ x ∈ S_D : Bx = y } ≤ c^n.

Proof ctd. 1. Let us fix x ∈ S_D and estimate P{Bx = y}. Apply the decomposition argument (used for compressible vectors), but for [n] = I^c ∪ I, where I is the most unstructured set of λn coefficients of x:
    ‖Bx − y‖² ≥ ‖Du + Gz − y‖² = Σ_{i ∈ I^c} (d_i + ⟨G_i, z⟩)².
Hence Bx = y implies that d_i + ⟨G_i, z⟩ = 0 for all i ∈ I^c.
Proof. Step 9: Proof of the Structure Theorem

Theorem (Structure of the Inverse). Let B be an n × n symmetric random matrix, y be a fixed vector,
    D̂(a) = max_{|I| = λn} D(a_I),   S_D := {x ∈ S^{n−1} : D̂(x) ∼ D}.
Then P{ ∃ x ∈ S_D : Bx = y } ≤ c^n.

Proof ctd. But for each i, by the Littlewood-Offord type theorem,
    P{ d_i + ⟨G_i, z⟩ = 0 } ≲ 1/D(z) ∼ 1/D.
Hence by independence,
    P{ Bx = y } ≤ Π_{i ∈ I^c} P{ d_i + ⟨G_i, z⟩ = 0 } ≲ (1/D)^{n−λn}.
Proof. Step 9: Proof of the Structure Theorem

    D̂(a) = max_{|I| = λn} D(a_I),   S_D := {x ∈ S^{n−1} : D̂(x) ∼ D}.

Proof ctd. 2. "How many" x ∈ S_D are there? What is the metric entropy of S_D?

Start with the level set for the usual LCD, T_D := {x ∈ S^{n−1} : D(x) ∼ D}. The cardinality of a fine net of T_D is the same as the number of integer points in the ball of radius D in ℝⁿ, which is ∼ (D/√n)ⁿ.

Pass from T_D to S_D: decompose [n] into 1/λ intervals of λn coordinates each. Since D̂(x) ∼ D, all these restrictions satisfy D(x_I) ≲ D. Choose nets for each restriction, of size (D/√(λn))^{λn} (as above). Take the product ⇒ get a net of S_D of size
    ( (D/√(λn))^{λn} )^{1/λ} = (D/√(λn))ⁿ.
Proof. Step 9: Proof of the Structure Theorem
Theorem (Structure of the Inverse). Let B be an n × n symmetric random matrix, y be a fixed vector,
    D̂(a) = max_{|I| = λn} D(a_I),   S_D := {x ∈ S^{n−1} : D̂(x) ∼ D}.
Then P{ ∃ x ∈ S_D : Bx = y } ≤ c^n.

Proof ctd. 3. Take a union bound over all x ∈ S_D (actually, over a net):
    P{ ∃ x ∈ S_D : Bx = y } ≤ (D/√(λn))ⁿ · (1/D)^{n−λn} = (D^λ/√(λn))ⁿ.
This is ≤ c^n if D ≪ n^{1/(2λ)}, as we claimed.
References

Ginibre matrices:
- M. Rudelson, R. Vershynin, The Littlewood-Offord problem and invertibility of random matrices, Advances in Mathematics 218 (2008), 600–633.
- M. Rudelson, R. Vershynin, Smallest singular value of a random rectangular matrix, Communications on Pure and Applied Mathematics 62 (2009), 1707–1739.

Wigner matrices:
- R. Vershynin, Invertibility of symmetric random matrices, Random Structures and Algorithms, to appear.

Survey:
- M. Rudelson, R. Vershynin, Non-asymptotic theory of random matrices: extreme singular values, Proceedings of ICM 2010, Volume III, 1576–1602, Hindustan Book Agency, New Delhi, 2010.

Tutorial:
- R. Vershynin, Introduction to the non-asymptotic analysis of random matrices. In: Compressed Sensing: Theory and Applications, ed. Y. Eldar and G. Kutyniok, Cambridge University Press, 2012, pp. 210–268.