Regular Banach Spaces and Large Deviations of Random Sums

Arkadi Nemirovski∗

Working paper, version of December 13, 2004

1 Overview

A typical result on large deviations of sums with random terms states the following. Let $\xi_t$ be independent scalar random variables with zero means such that "$\xi_t$ has tails as light as those of a Gaussian $\mathcal{N}(0,4\sigma_t^2)$ random variable", specifically,
$$\mathbf{E}\left\{\exp\{\xi_t^2/\sigma_t^2\}\right\}\le O(1), \tag{1}$$
and let
$$S_N=\sum_{t=1}^N\xi_t.$$
Then
$$\mathrm{Prob}\left\{|S_N|>t\sqrt{\sigma_1^2+\dots+\sigma_N^2}\right\}\le O(1)\exp\{-O(1)t^2\} \tag{2}$$

(from now on, all $O(1)$'s are appropriate positive absolute constants). Our goal is to get similar results for the case when $\xi_t$ are independent random vectors with zero means in a finite-dimensional vector space $E$ equipped with a norm $\|\cdot\|$, with $S_N=\sum_{t=1}^N\xi_t$. In this setting the "light tail" condition (1) is stated as
$$\mathbf{E}\left\{\exp\{\|\xi_t\|^2/\sigma_t^2\}\right\}\le\exp\{1\}. \tag{3}$$

Note that the straightforward guess
$$\mathbf{E}\{\xi_t\}=0\ \forall t\ \&\ (3)\ \&\ \{\xi_t\}\text{ are independent}\ \Rightarrow\ \mathrm{Prob}\left\{\|S_N\|>t\sqrt{\sigma_1^2+\dots+\sigma_N^2}\right\}\le O(1)\exp\{-O(1)t^2\} \tag{4}$$
is not true, as is shown by the following example:

• $E=\mathbb{R}^n$, $\|x\|=\|x\|_1\equiv\sum_j|x_j|$,
• $(\xi_t)_j=\epsilon_t$ if $j=t \pmod n$ and $(\xi_t)_j=0$ otherwise, where $\epsilon_1,\epsilon_2,\dots$ are independent random variables taking the values $\pm1$ with probability 1/2,
• $\sigma_t=1$, $t\ge1$.

We have $\|S_n\|_1\equiv n=\sqrt{n}\sqrt{\sigma_1^2+\dots+\sigma_n^2}$. Therefore all we can hope for is a relation of the type
$$\mathbf{E}\{\xi_t\}=0\ \forall t\ \&\ (3)\ \&\ \{\xi_t\}\text{ are independent}\ \Rightarrow\ \mathrm{Prob}\left\{\|S_N\|>t\Theta\sqrt{\sigma_1^2+\dots+\sigma_N^2}\right\}\le O(1)\exp\{-O(1)t^2\}. \tag{5}$$
Here $\Theta$ is a factor depending on $(E,\|\cdot\|)$ which, as our example clearly shows, can be as large as $\sqrt{\dim E}$.

∗ Faculty of Industrial Engineering and Management, Technion – Israel Institute of Technology, Technion City, Haifa 32000, Israel, [email protected]

Our major goal is to demonstrate that (5) indeed holds true provided that $\Theta^2$ is an upper bound on the "constant of regularity" $\kappa(E,\|\cdot\|)$ of $(E,\|\cdot\|)$. The latter notion is defined as follows:

Definition 1.1 Let $(E,\|\cdot\|)$ be a Banach space, and let $\kappa\ge1$.

(i) The space $(E,\|\cdot\|)$ is called $\kappa$-smooth if the function $p(x)=\|x\|^2$ is continuously differentiable and
$$\forall x,y\in E:\ p(x+y)\le p(x)+Dp(x)[y]+\kappa p(y). \tag{6}$$

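(ii) The space $(E,\|\cdot\|)$ (and the norm $\|\cdot\|$ on $E$) is called $\kappa$-regular if two conditions hold. First, there exist $\kappa_+\in[1,\kappa]$ and a norm $\|\cdot\|_+$ on $E$ such that $(E,\|\cdot\|_+)$ is $\kappa_+$-smooth. Second, $\|\cdot\|_+$ is $\kappa/\kappa_+$-compatible with $\|\cdot\|$, that is,
$$\forall x\in E:\ \|x\|^2\le\|x\|_+^2\le\frac{\kappa}{\kappa_+}\|x\|^2. \tag{7}$$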

The constant $\kappa(E,\|\cdot\|)$ of regularity of $(E,\|\cdot\|)$ is the infimum of those $\kappa\ge1$ for which $(E,\|\cdot\|)$ is $\kappa$-regular.

In the sequel we

1. Provide a number of interesting examples of normed spaces with "nearly dimension-independent" constants of regularity. Specifically, we demonstrate in Section 2.1 the following for $p\ge2$. Consider the norm $|X|_p=\|\sigma(X)\|_p$ on the space of $m\times n$ matrices $X$, where $\sigma(X)$ is the vector of singular values of $X$. This norm is $\kappa$-regular with $\kappa=\min[p+1,(2\ln(\min[m,n])+1)\exp\{1\}]$.

2. Demonstrate (Section 3) that (5) indeed is true provided that $\Theta\ge\sqrt{\kappa(E,\|\cdot\|)}$. In particular,

(!!) Let $\xi_t$ be independent random $m\times n$ matrices with zero mean such that $\mathbf{E}\{\exp\{|\xi_t|_\infty^2\sigma_t^{-2}\}\}\le\exp\{1\}$, where $|X|_\infty=\|\sigma(X)\|_\infty$ is the usual matrix norm of $X$. Then
$$\mathrm{Prob}\left\{\Big|\sum_{t=1}^N\xi_t\Big|_\infty\ge t\sqrt{\ln(\min[m+1,n+1])}\sqrt{\sigma_1^2+\dots+\sigma_N^2}\right\}\le O(1)\exp\{-O(1)t^2\}.$$

Note also that every norm $\|\cdot\|$ on $\mathbb{R}^n$ is $n$-compatible with a properly chosen Euclidean norm (by John's theorem), and the latter is, of course, 1-smooth. Thus, every norm on $\mathbb{R}^n$ is $n$-regular. In other words, the $(\mathbb{R}^n,\|\cdot\|_1)$-example, where the factor $\Theta=\sqrt{n}$ in (5) is a must, is exactly the worst case. The major part of the results to follow was announced in [3].
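The $(\mathbb{R}^n,\|\cdot\|_1)$ example can be reproduced in a few lines. The sketch below uses Python with the standard library only; the function name `l1_counterexample` is ours, not the paper's. It builds $S_n$ for one draw of the signs $\epsilon_t$ and confirms that $\|S_n\|_1=n$ whatever the signs are. Consequently the ratio to $\sqrt{\sigma_1^2+\dots+\sigma_n^2}$ is exactly $\sqrt n$.

```python
import random

def l1_counterexample(n, seed=0):
    """Return ||S_n||_1 for the (R^n, l1) example (hypothetical helper name).

    xi_t has a single nonzero coordinate j = t (mod n), equal to a random sign
    eps_t; with sigma_t = 1, condition (3) holds with equality.
    """
    rng = random.Random(seed)
    s = [0] * n
    for t in range(1, n + 1):
        s[t % n] += rng.choice((-1, 1))  # 0-based index of coordinate j = t mod n
    return sum(abs(v) for v in s)

n = 100
norm = l1_counterexample(n)
ratio = norm / n ** 0.5  # ||S_n||_1 / sqrt(sigma_1^2 + ... + sigma_n^2)
print(norm, ratio)  # 100 10.0
```

Every coordinate of $S_n$ receives exactly one term $\pm1$, so no cancellation can occur in this norm. This is why a factor $\Theta$ of order $\sqrt{\dim E}$ in (5) cannot be avoided in general.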

2 Regular Banach spaces

2.1 Basic examples

Example 2.1 Let $2\le p\le\infty$. The space $(\mathbb{R}^n,\|\cdot\|_p)$ with $n\ge3$ is $\kappa_p(n)$-regular with
$$\kappa_p(n)=\min_{2\le\rho\le p}(\rho-1)n^{\frac2\rho-\frac2p}\le\min\left[p-1,\ (2\ln(n)-1)\exp\{1\}\right]. \tag{8}$$
[…] $\rho>2$, we have
$$Dp(x)[h]=2\Big(\sum_i|x_i|^\rho\Big)^{\frac2\rho-1}\sum_i|x_i|^{\rho-1}\mathrm{sign}(x_i)h_i,$$
$$\begin{aligned}
D^2p(x)[h,h]&=\underbrace{2(2-\rho)\Big(\sum_i|x_i|^\rho\Big)^{\frac2\rho-2}\Big(\sum_i|x_i|^{\rho-1}\mathrm{sign}(x_i)h_i\Big)^2}_{\le0}+2(\rho-1)\Big(\sum_i|x_i|^\rho\Big)^{\frac2\rho-1}\sum_i|x_i|^{\rho-2}h_i^2\\
&\le2(\rho-1)\underbrace{\Big(\sum_i|x_i|^\rho\Big)^{\frac2\rho-1}\Big(\sum_i(|x_i|^{\rho-2})^{\frac{\rho}{\rho-2}}\Big)^{\frac{\rho-2}{\rho}}}_{=1}\Big(\sum_i(|h_i|^2)^{\frac\rho2}\Big)^{\frac2\rho}\\
&=2(\rho-1)\|h\|_\rho^2=2(\rho-1)p(h),
\end{aligned}$$
as required in (9) when $\kappa_+=\rho-1$. In the case of $\rho=2$, relation (9) with $\kappa_+=\rho-1=1$ is evident. Now, when $\rho\in[2,p]$ and $x\in\mathbb{R}^n$, one has $\|x\|_\rho^2/\|x\|_p^2\in[1,n^{\frac2\rho-\frac2p}]$. Hence $(\mathbb{R}^n,\|\cdot\|_p)$ is $\kappa$-regular with $\kappa=(\rho-1)n^{\frac2\rho-\frac2p}$, and (8) follows.

Example 2.2 Let $2\le p\le\infty$. Consider the norm $|X|_p=\|\sigma(X)\|_p$ on the space $\mathbb{R}^{m\times n}$ of $m\times n$ real matrices, where $\sigma(X)$ is the vector of singular values of $X$. This norm is $\kappa_p(m,n)$-regular with
$$\kappa_p(m,n)=\min_{2\le\rho\le p}\max[2,\rho-1](\min(m,n))^{\frac2\rho-\frac2p}\le\min\left[\max[2,p-1],\ (2\ln(\min[m,n]+2)-1)\exp\{1\}\right].$$
[…] $\rho>2$.

A. We start with the following fact, which is important in its own right (for the proof, see Appendix):

Proposition 2.1 Let $\Delta$ be an open interval on the axis, and let $f$ be a $C^2$ function on $\Delta$ such that for certain $\theta_\pm,\mu_\pm\in\mathbb{R}$ one has
$$\forall(a<b,\ a,b\in\Delta):\ \theta_-\frac{f''(a)+f''(b)}{2}+\mu_-\le\frac{f'(b)-f'(a)}{b-a}\le\theta_+\frac{f''(a)+f''(b)}{2}+\mu_+. \tag{12}$$
Let, further, $\mathbf{X}_n(\Delta)$ be the set of all $n\times n$ symmetric matrices with eigenvalues belonging to $\Delta$. The function $F(X)=\mathrm{Tr}(f(X)):\mathbf{X}_n(\Delta)\to\mathbb{R}$ is $C^2$. Moreover, for every $X\in\mathbf{X}_n(\Delta)$ and every $H\in\mathbf{S}^n$ one has
$$\theta_-\mathrm{Tr}(Hf''(X)H)+\mu_-\mathrm{Tr}(H^2)\le D^2F(X)[H,H]\le\theta_+\mathrm{Tr}(Hf''(X)H)+\mu_+\mathrm{Tr}(H^2). \tag{13}$$

B. Let us apply Proposition 2.1 to $\Delta=\mathbb{R}$ and $f(t)=|t|^\rho$, with $\theta_-=\mu_-=0$, $\mu_+=0$ and $\theta_+=\max\left[\frac{2}{\rho-1},1\right]$. As is immediately seen, this choice satisfies (12). By the Proposition, the function $F(X)=|X|_\rho^\rho$ on $\mathbf{S}^n$ is twice continuously differentiable, and
$$\forall X,H:\ 0\le D^2F(X)[H,H]\le\theta_+\mathrm{Tr}(f''(X)H^2),\qquad\theta_+=\max\left[\frac{2}{\rho-1},1\right]. \tag{14}$$
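The claim that $f(t)=|t|^\rho$ with $\theta_+=\max[2/(\rho-1),1]$ and $\theta_-=\mu_\pm=0$ satisfies (12) can be sanity-checked numerically. The sketch below samples random pairs $a<b$ and reports the largest ratio of the middle term of (12) to its right-hand side. That ratio should not exceed 1. This is an illustration on random samples, not a proof, and the helper names are hypothetical.

```python
import random

def theta_plus(rho):
    """theta_+ = max[2/(rho - 1), 1], the choice made in step B."""
    return max(2.0 / (rho - 1.0), 1.0)

def fprime(t, rho):
    """f'(t) for f(t) = |t|**rho."""
    return rho * abs(t) ** (rho - 1) * (1.0 if t >= 0 else -1.0)

def fsecond(t, rho):
    """f''(t) for f(t) = |t|**rho, rho >= 2."""
    return rho * (rho - 1) * abs(t) ** (rho - 2)

def worst_ratio(rho, trials=20000, seed=1):
    """Max over random a < b of [(f'(b) - f'(a)) / (b - a)] / [theta_+ (f''(a) + f''(b)) / 2]."""
    rng = random.Random(seed)
    th = theta_plus(rho)
    worst = 0.0
    for _ in range(trials):
        a, b = sorted((rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)))
        if b - a < 1e-3:
            continue  # skip near-equal pairs to avoid cancellation in the difference quotient
        slope = (fprime(b, rho) - fprime(a, rho)) / (b - a)
        rhs = th * (fsecond(a, rho) + fsecond(b, rho)) / 2.0
        worst = max(worst, slope / rhs)
    return worst

for rho in (2.0, 2.5, 3.0, 5.0):
    print(rho, worst_ratio(rho))
```

For $2<\rho<3$ the ratio approaches 1 as $a\to0$, which is where the factor $2/(\rho-1)$ is needed. For $\rho\ge3$ it approaches 1 as $b-a\to0$, where $\theta_+=1$ is needed.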

It follows that the function $p(X)=|X|_\rho^2=(F(X))^{2/\rho}$ is continuously differentiable everywhere and twice continuously differentiable outside of the origin. For $X\ne0$ we have $Dp(X)[H]=\frac2\rho(F(X))^{\frac2\rho-1}DF(X)[H]$, whence
$$X\ne0\Rightarrow D^2p(X)[H,H]=\underbrace{\frac2\rho\Big(\frac2\rho-1\Big)(F(X))^{\frac2\rho-2}(DF(X)[H])^2}_{\le0}+\frac2\rho(F(X))^{\frac2\rho-1}D^2F(X)[H,H] \tag{15}$$
[…] $>0$, and let $L$ be a linear subspace in $E$. Then $(L,\alpha\|\cdot\|)$ is $\kappa$-regular.

Proof. (i): To prove (i), let $p_i(x^i)=\|x^i\|_i^2$.

A. Let $\rho\in[2,\infty)$ be such that $\rho\le p$, and let $r=\rho/2$. Our local goal is to prove

Lemma 2 The norm $\|\cdot\|$ on $E=E_1\times\dots\times E_m$ defined as $\|(x^1,\dots,x^m)\|=\|(\|x^1\|_1,\dots,\|x^m\|_m)\|_\rho$ is $\kappa_+$-smooth, with
$$\kappa_+=\kappa+\rho-2. \tag{20}$$

Proof. We have $p(x^1,\dots,x^m)\equiv\|(\|x^1\|_1,\dots,\|x^m\|_m)\|_\rho^2=\|(p_1(x^1),\dots,p_m(x^m))\|_r$. From this observation it immediately follows that $p(\cdot)$ is continuously differentiable. Indeed, $\rho\ge2$, whence $r\ge1$, so that the function $\|y\|_r$ is continuously differentiable everywhere on $\mathbb{R}^m_+$ except for the origin. The functions $p_i(x^i)$ are continuously differentiable by assumption. Consequently, $p(x)$ is continuously differentiable everywhere on $E=E_1\times\dots\times E_m$, except, perhaps, at the origin; the fact that $p'$ is continuous at the origin is evident. Invoking Proposition 2.2, in order to prove Lemma 2 it suffices to verify that
$$\|p'(x)-p'(y)\|_*\le2\kappa_+\|x-y\| \tag{21}$$

for all $x,y$. Since $p'$ is continuous, it suffices to prove this relation for a set of pairs $x,y$ that is dense in $E\times E$. For example, we may take the pairs in which all blocks $x^i\in E_i$ of $x$ are nonzero. With such $x$, the segment $[x,y]$ contains finitely many points $u$ such that at least one of the blocks $u^i$ is zero. These points split $[x,y]$ into finitely many consecutive segments, and it suffices to prove that $\|p'(x')-p'(y')\|_*\le2\kappa_+\|x'-y'\|$ when $x',y'$ are the endpoints of such a segment. Since $p'$ is continuous, proving the latter statement is the same as proving the similar statement for the case when $x',y'$ are interior points of the segment. The bottom line is as follows: in order to prove (21) for all pairs $x,y$, it suffices to prove it for those pairs $x,y$ for which no segment $[x^i,y^i]$ passes through the origin of the corresponding $E_i$.

Let $x,y$ be such that $[x^i,y^i]$ does not pass through the origin of $E_i$, $i=1,\dots,m$. As in the item "(i)⇒(iii)" of the proof of Proposition 2.2, for every $i$ there exists a sequence $\{p_i^t(\cdot)>0\}_{t=1}^\infty$ of $C^\infty$ convex functions on $E_i$ with two properties. First, the $p_i^t$ converge to $p_i(\cdot)$, along with their first order derivatives, uniformly on compact sets. Second,
$$|D^2p_i^t(u^i)[h^i,h^i]|\le2\kappa\|h^i\|_i^2\quad\forall(u^i,h^i\in E_i). \tag{22}$$
The functions $p^t(u)=\|(p_1^t(u^1),\dots,p_m^t(u^m))\|_r$ clearly are convex and $C^\infty$ (recall that $p_i^t(\cdot)>0$). They converge to $p(\cdot)$, along with their first order derivatives, uniformly on compact sets. It follows that
$$\langle p'(y)-p'(x),h\rangle=\lim_{t\to\infty}\int_0^1D^2p^t(x+s(y-x))[y-x,h]\,ds. \tag{23}$$
Set $F(y_1,\dots,y_m)=y_1^r+\dots+y_m^r$ for $y\ge0$; then $p^t(u)=F^{1/r}(p_1^t(u^1),\dots,p_m^t(u^m))$. Now let $u\in[x,y]$ and $v\in E$. Below, $F$ is evaluated at the point $(p_1^t(u^1),\dots,p_m^t(u^m))$. We have
$$\begin{aligned}
Dp^t(u)[v]&=\frac1rF^{\frac1r-1}\sum_ir(p_i^t(u^i))^{r-1}Dp_i^t(u^i)[v^i]\\
\Rightarrow D^2p^t(u)[v,v]&=\underbrace{\frac1r\Big(\frac1r-1\Big)F^{\frac1r-2}\Big(\sum_ir(p_i^t(u^i))^{r-1}Dp_i^t(u^i)[v^i]\Big)^2}_{\le0}\\
&\quad+F^{\frac1r-1}\sum_i\Big[(r-1)(p_i^t(u^i))^{r-2}(Dp_i^t(u^i)[v^i])^2+(p_i^t(u^i))^{r-1}D^2p_i^t(u^i)[v^i,v^i]\Big]\\
&\le F^{\frac1r-1}\sum_i\Big[(r-1)(p_i^t(u^i))^{r-2}(Dp_i^t(u^i)[v^i])^2+2\kappa(p_i^t(u^i))^{r-1}p_i(v^i)\Big].
\end{aligned}$$
Hence, recalling that $p^t$ is convex,
$$0\le D^2p^t(u)[v,v]\le F^{\frac1r-1}\sum_i\Big[(r-1)(p_i^t(u^i))^{r-2}(Dp_i^t(u^i)[v^i])^2+2\kappa(p_i^t(u^i))^{r-1}p_i(v^i)\Big]. \tag{24}$$

We take into account two facts. First, the $p_i(\cdot)$ are bounded away from zero on $[x,y]$. Second, the $p_i^t(\cdot)$ converge, along with their first order derivatives, to $p_i(\cdot)$ uniformly on compact sets as $t\to\infty$. Therefore the right hand side in bound (24) converges, as $t\to\infty$, uniformly in $u\in[x,y]$ and $v$ with $\|v\|\le1$, to
$$\Psi(u,v)=\Big(\sum_i\|u^i\|_i^\rho\Big)^{\frac2\rho-1}\sum_i\Big[(r-1)\|u^i\|_i^{\rho-4}(Dp_i(u^i)[v^i])^2+2\kappa\|u^i\|_i^{\rho-2}\|v^i\|_i^2\Big].$$
For evident reasons, $|Dp_i(u^i)[v^i]|\le2\|u^i\|_i\|v^i\|_i$, whence
$$\begin{aligned}
\Psi(u,v)&\le\Big(\sum_i\|u^i\|_i^\rho\Big)^{\frac2\rho-1}\sum_i\Big[4(r-1)\|u^i\|_i^{\rho-2}\|v^i\|_i^2+2\kappa\|u^i\|_i^{\rho-2}\|v^i\|_i^2\Big]\\
&=\underbrace{[2\rho+2\kappa-4]}_{2\kappa_+}\Big(\sum_i\|u^i\|_i^\rho\Big)^{\frac2\rho-1}\sum_i\|u^i\|_i^{\rho-2}\|v^i\|_i^2.
\end{aligned} \tag{25}$$

When $\rho>2$, the Hölder inequality gives
$$\sum_i\|u^i\|_i^{\rho-2}\|v^i\|_i^2\le\Big(\sum_i(\|u^i\|_i^{\rho-2})^{\frac{\rho}{\rho-2}}\Big)^{\frac{\rho-2}{\rho}}\Big(\sum_i(\|v^i\|_i^2)^{\frac\rho2}\Big)^{\frac2\rho}=\Big(\sum_i\|u^i\|_i^\rho\Big)^{\frac{\rho-2}{\rho}}\Big(\sum_i\|v^i\|_i^\rho\Big)^{\frac2\rho},$$
and (25) implies that $\Psi(u,v)\le2\kappa_+\|v\|^2$. This inequality clearly is valid for $\rho=2$ as well. Recalling the origin of $\Psi(\cdot,\cdot)$, we conclude that for every $\epsilon>0$ there exists $t_\epsilon$ such that
$$t\ge t_\epsilon,\ u\in[x,y],\ \|v\|\le1\ \Rightarrow\ 0\le D^2p^t(u)[v,v]\le2\kappa_+\|v\|^2+\epsilon.$$
Via the same reasoning as in the proof of item "(i)⇒(iii)" of Proposition 2.2, the resulting inequality implies that
$$t\ge t_\epsilon,\ u\in[x,y]\ \Rightarrow\ |D^2p^t(u)[v,w]|\le(2\kappa_++\epsilon)\|v\|\|w\|\quad\forall v,w.$$
In view of this bound and (23), we conclude that $\langle p'(y)-p'(x),h\rangle\le(2\kappa_++\epsilon)\|y-x\|\|h\|$ for all $h$. Hence $\|p'(y)-p'(x)\|_*\le(2\kappa_++\epsilon)\|y-x\|$. Since $\epsilon>0$ is arbitrary, we arrive at (21).
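Lemma 2 can be spot-checked numerically in the simplest case, where every $E_i$ is a Euclidean space ($\kappa=1$), so that $\kappa_+=\rho-1$. With one-dimensional blocks the norm is just $\|\cdot\|_\rho$ on $\mathbb{R}^m$, the case treated in Example 2.1. The sketch below evaluates the worst normalized violation of the smoothness inequality (6) over random pairs $x,y$. The lemma predicts a value $\le0$, up to rounding. Function names are illustrative, and random sampling is no substitute for the proof.

```python
import math
import random

def block_norms(x, k):
    """Euclidean norms of the consecutive blocks of size k of the vector x."""
    return [math.sqrt(sum(c * c for c in x[i:i + k])) for i in range(0, len(x), k)]

def mixed_norm(x, k, rho):
    """||x|| = || (||x^1||_2, ..., ||x^m||_2) ||_rho."""
    return sum(nb ** rho for nb in block_norms(x, k)) ** (1.0 / rho)

def grad_p(x, k, rho):
    """Gradient of p(x) = ||x||^2: block i equals 2 ||x||^(2-rho) ||x^i||_2^(rho-2) x^i."""
    nx = mixed_norm(x, k, rho)
    g = [0.0] * len(x)
    if nx == 0.0:
        return g
    for i, nb in zip(range(0, len(x), k), block_norms(x, k)):
        if nb > 0.0:
            w = 2.0 * nx ** (2.0 - rho) * nb ** (rho - 2.0)
            for j in range(i, min(i + k, len(x))):
                g[j] = w * x[j]
    return g

def smoothness_gap(k, m, rho, trials=2000, seed=0):
    """Max over random x, y of [p(x+y) - p(x) - <p'(x), y> - kappa_plus p(y)] / p(y),
    where kappa_plus = kappa + rho - 2 with kappa = 1 (Euclidean blocks)."""
    rng = random.Random(seed)
    kappa_plus = 1.0 + rho - 2.0

    def p(z):
        return mixed_norm(z, k, rho) ** 2

    worst = -math.inf
    for _ in range(trials):
        scale = rng.choice((1e-2, 1.0, 1e2))  # probe small, moderate and large steps y
        x = [rng.gauss(0.0, 1.0) for _ in range(k * m)]
        y = [scale * rng.gauss(0.0, 1.0) for _ in range(k * m)]
        g = grad_p(x, k, rho)
        lin = sum(gi * yi for gi, yi in zip(g, y))
        slack = p([a + b for a, b in zip(x, y)]) - p(x) - lin - kappa_plus * p(y)
        worst = max(worst, slack / p(y))
    return worst

for k, m, rho in [(1, 6, 2.0), (1, 6, 4.0), (3, 4, 3.0), (2, 5, 6.0)]:
    print(k, m, rho, smoothness_gap(k, m, rho))
```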

B. When $\rho\le p$, we have
$$\|(\|x^1\|_1,\dots,\|x^m\|_m)\|_p^2\le\|(\|x^1\|_1,\dots,\|x^m\|_m)\|_\rho^2\le m^{\frac2\rho-\frac2p}\|(\|x^1\|_1,\dots,\|x^m\|_m)\|_p^2.$$
Combined with Lemma 2, this implies that the norm in (i) is $[\rho+\kappa-2]m^{\frac2\rho-\frac2p}$-regular for every $\rho\in[2,p]$, and (i) follows.

(ii): To prove (ii), consider the norm $|(x^1,\dots,x^m)|=m^{1/2}\sqrt{\|x^1\|_1^2+\dots+\|x^m\|_m^2}$ on $E\times E\times\dots\times E$. As is immediately seen, this norm is $\kappa$-smooth. If, further, $\|(x^1,\dots,x^m)\|_\dagger=\sum_i\|x^i\|_i$, then
$$\|x\|_\dagger^2\le|x|^2\le m\|x\|_\dagger^2\quad\forall x\in E\times\dots\times E,$$
whence $\|\cdot\|_\dagger$ is $m\kappa$-regular. The norm in (ii) is nothing but the restriction of $\|\cdot\|_\dagger$ to the image of $E$ under the embedding $x\mapsto(x,\dots,x)$ of $E$ into $E\times\dots\times E$. It remains to use (iii).

(iii): Evident.

To proceed, we need the following fact:

Lemma 3 Let $(E,\|\cdot\|)$ be a finite-dimensional $\kappa$-regular space. Then there exists a $\kappa$-smooth norm $\|\cdot\|_+$ on $E$ such that
$$\forall(x\in E):\ \|x\|^2\le\|x\|_+^2\le2\|x\|^2. \tag{26}$$

Proof. By definition, there exist $\kappa_+\in[1,\kappa]$ and a norm $\pi(\cdot)$ on $E$ which is $\kappa_+$-smooth and such that $\forall(x\in E):\ \|x\|^2\le\pi^2(x)\le\mu\|x\|^2$, $\mu=\kappa/\kappa_+$, or, which is the same,
$$\forall\xi\in E^*:\ \pi_*^2(\xi)\ge\|\xi\|_*^2\ge\frac1\mu\pi_*^2(\xi), \tag{27}$$

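where $E^*$ is the space dual to $E$ and $\pi_*$, $\|\cdot\|_*$ are the norms on $E^*$ conjugate to $\pi$, $\|\cdot\|$, respectively. In the case of $\mu\le2$, let us take $\|\cdot\|_+\equiv\pi(\cdot)$. This gives a $\kappa_+$-smooth (and thus also $\kappa$-smooth) norm on $E$ satisfying (26). Now let $\mu>2$, so that $\gamma=1/(\mu-1)\in(0,1)$. Let us set $q_*(\xi)=\sqrt{\gamma\pi_*^2(\xi)+(1-\gamma)\|\xi\|_*^2}$, so that $q_*(\cdot)$ is a norm on $E^*$. We have
$$\forall\xi\in E^*:\ q_*^2(\xi)\ge\|\xi\|_*^2\ge\frac{1}{\gamma\mu+1-\gamma}q_*^2(\xi)=\frac12q_*^2(\xi). \tag{28}$$
Further, by Proposition 2.2 we have
$$\forall(\xi,\eta\in E^*,\ x\in\partial\pi_*^2(\xi)):\ \pi_*^2(\xi+\eta)\ge\pi_*^2(\xi)+\langle\eta,x\rangle+\frac{1}{\kappa_+}\pi_*^2(\eta).$$
Let $D(\xi)$ denote the subdifferential of $\|\cdot\|_*^2$ at the point $\xi$; then $\|\xi+\eta\|_*^2\ge\|\xi\|_*^2+\langle\eta,y\rangle$ for all $\xi,\eta$ and every $y\in D(\xi)$. Combining the two inequalities, we get
$$\begin{aligned}
\forall(\xi,\eta\in E^*,\ x\in\partial\pi_*^2(\xi),\ y\in D(\xi)):\ q_*^2(\xi+\eta)&\ge q_*^2(\xi)+\langle\eta,\gamma x+(1-\gamma)y\rangle+\frac{\gamma}{\kappa_+}\pi_*^2(\eta)\\
&\ge q_*^2(\xi)+\langle\eta,\gamma x+(1-\gamma)y\rangle+\frac{\gamma}{\kappa_+}q_*^2(\eta)
\end{aligned}$$
(note that $\pi_*(\cdot)\ge q_*(\cdot)$ by (27)). We have $\gamma\,\partial\pi_*^2(\xi)+(1-\gamma)D(\xi)=\partial q_*^2(\xi)$ and $\frac{\gamma}{\kappa_+}=\frac{1}{(\mu-1)\kappa_+}\ge\frac1\kappa$. Hence
$$\forall(\xi,\eta\in E^*,\ z\in\partial q_*^2(\xi)):\ q_*^2(\xi+\eta)\ge q_*^2(\xi)+\langle\eta,z\rangle+\frac1\kappa q_*^2(\eta).$$
By the same Proposition 2.2, it follows that the norm $\|\cdot\|_+\equiv q(\cdot)$ on $E$ whose conjugate is $q_*(\cdot)$ is $\kappa$-smooth. At the same time, (28) implies (26). Lemma 3 allows us to prove the following modification of Proposition 2.3.(i,ii):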

Proposition 2.4 (i) Let $p\in[2,\infty]$, and let $(E_i,\|\cdot\|_i)$, $i=1,\dots,m$, $m>2$, be finite-dimensional $\kappa$-regular spaces. Equip the space $E=E_1\times\dots\times E_m$ with the norm
$$\|(x^1,\dots,x^m)\|=\Big(\sum_{i=1}^m\|x^i\|_i^p\Big)^{1/p}$$
(the right hand side is $\max_i\|x^i\|_i$ when $p=\infty$). This space is $\kappa_{++}$-regular with
$$\kappa_{++}=2\min_{2\le\rho\le p}[\kappa+\rho-1]m^{\frac2\rho-\frac2p}\le2\min\left[\kappa+p-1,\ [\kappa+2\ln(m)-1]\exp\{1\}\right]. \tag{29}$$
(ii) Let $\|\cdot\|_i$, $i=1,\dots,m$, be $\kappa$-regular norms on a finite-dimensional space $E$. Then the norm
$$\|x\|=\sum_{i=1}^m\|x\|_i$$
is $2m\kappa$-regular on $E$.

Proof is readily given by Lemma 3 combined with the corresponding items of Proposition 2.3. For example, to prove (i), note that by Lemma 3 we can find $\kappa$-smooth norms $q_i(\cdot)$ on $E_i$ such that $q_i^2(x^i)\le\|x^i\|_i^2\le2q_i^2(x^i)$ for every $i$ and all $x^i\in E_i$. Applying Proposition 2.3.(i) to the spaces $(E_i,q_i(\cdot))$, we get that the norm $q(x^1,\dots,x^m)=\big(\sum_{i=1}^mq_i^p(x^i)\big)^{1/p}$ on $E_1\times\dots\times E_m$ is $\kappa_+$-regular with $\kappa_+$ given by (19). Now take into account the evident relation $q^2(x^1,\dots,x^m)\le\|(x^1,\dots,x^m)\|^2\le2q^2(x^1,\dots,x^m)$ and recall the definition of regularity. We conclude that $\|\cdot\|$ is $\kappa_{++}$-regular, as required.
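The constants in (8) and (29) are one-dimensional minimizations over $\rho$ and are easy to evaluate. The sketch below uses our own helper names. For finite $p$ it computes $\min_{2\le\rho\le p}(s+\rho)\,n^{2/\rho-2/p}$ on a grid, with $s=-1$ for $\kappa_p(n)$ in (8) and $s=\kappa-1$ for $\kappa_{++}/2$ in (29). It then compares the result with the closed-form bound $\min[s+p,\,(s+2\ln n)e]$, obtained from the two choices $\rho=p$ and $\rho=2\ln n$ (for the latter, $n^{2/\rho}=e$). Both (8) and (29) assume $n$, respectively $m$, is at least 3, so that $2\ln n\ge2$.

```python
import math

def rho_min(shift, n, p, steps=20000):
    """Grid value of min over rho in [2, p] of (shift + rho) * n**(2/rho - 2/p); finite p, n >= 3."""
    rhos = [2 + (p - 2) * i / steps for i in range(steps + 1)]
    if 2 <= 2 * math.log(n) <= p:
        rhos.append(2 * math.log(n))  # the choice behind the logarithmic bound
    return min((shift + r) * n ** (2 / r - 2 / p) for r in rhos)

def rho_bound(shift, n, p):
    """min[shift + p, (shift + 2 ln n) e]: take rho = p, or rho = 2 ln n where n**(2/rho) = e."""
    return min(shift + p, (shift + 2 * math.log(n)) * math.e)

# Example 2.1: kappa_p(n) = rho_min(-1, n, p).
# Relation (29): kappa_{++} = 2 * rho_min(kappa - 1, m, p).
for n, p in [(3, 2.5), (10, 4.0), (1000, 8.0), (10**6, 50.0)]:
    print(n, p, rho_min(-1, n, p), rho_bound(-1, n, p))
```

For $p=2$ the first helper returns exactly $1$ when $s=-1$, matching the 1-regularity of the Euclidean norm.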

3 Sums of random vectors in regular spaces

In this Section, we consider the following situation. We are given:
• a finite-dimensional $\kappa$-regular space $(E,\|\cdot\|)$,
• a Polish space $\Omega$ with a Borel probability measure $\mu$,
• a sequence $\mathcal{F}_0=\{\emptyset,\Omega\}\subset\mathcal{F}_1\subset\mathcal{F}_2\subset\dots$ of $\sigma$-sub-algebras of the Borel $\sigma$-algebra of $\Omega$.

We denote by $\mathbf{E}_t$, $t=1,2,\dots$, the conditional expectation w.r.t. $\mathcal{F}_t$, and by $\mathbf{E}\equiv\mathbf{E}_0$ the expectation w.r.t. $\mu$. Let $\{\xi_t\}_{t=1}^\infty$ be a martingale-difference with values in $E$: each $\xi_t$ is an $\mathcal{F}_t$-measurable random vector in $E$ such that $\mathbf{E}_{t-1}\{\xi_t\}\equiv0$, $t=1,2,\dots$. The question we are interested in is: what can we say about "typical norms" of the associated sums
$$S_n=\sum_{t=1}^n\xi_t?$$
In the sequel, we denote by $\|\cdot\|_+$ a $\kappa_+$-smooth norm on $E$ which is $(\kappa/\kappa_+)$-compatible with $\|\cdot\|$:
$$\|x\|^2\le\|x\|_+^2\le(\kappa/\kappa_+)\|x\|^2\quad\forall x\in E, \tag{30}$$
and we set $p(x)=\|x\|_+^2$.

3.1 Bounds on second moments of $\|S_n\|$

Our first observation is nearly tautological:

Proposition 3.1 Assume that the $E$-valued martingale-difference $\xi=\{\xi_t\}_{t=1}^\infty$ is square summable: $\mathbf{E}\{\|\xi_t\|^2\}\le\sigma_t^2<\infty$. Then
$$\mathbf{E}\{\|S_n\|^2\}\le\kappa\sum_{t=1}^n\sigma_t^2. \tag{31}$$

Proof. Since $\|\cdot\|_+$ is $\kappa_+$-smooth, we have
$$p(S_{t+1})\le p(S_t)+Dp(S_t)[\xi_{t+1}]+\kappa_+p(\xi_{t+1}).$$
We take expectations and use the fact that $\xi$ is a martingale-difference, so that $\mathbf{E}\{Dp(S_t)[\xi_{t+1}]\}=0$. This gives
$$\mathbf{E}\{p(S_{t+1})\}\le\mathbf{E}\{p(S_t)\}+\kappa_+\mathbf{E}\{p(\xi_{t+1})\}\le\mathbf{E}\{p(S_t)\}+\kappa\mathbf{E}\{\|\xi_{t+1}\|^2\}$$
(we have used the right inequality in (30)). From this recurrent inequality we get
$$\mathbf{E}\{\|S_n\|_+^2\}\le\kappa\sum_{t=1}^n\mathbf{E}\{\|\xi_t\|^2\}\le\kappa\sum_{t=1}^n\sigma_t^2.$$
By (30), the left hand side in this inequality is $\ge\mathbf{E}\{\|S_n\|^2\}$, and (31) follows.
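Proposition 3.1 is easy to probe by simulation. For illustration, we choose $E=\mathbb{R}^d$ with $\|\cdot\|_p$ and $p=4$. This norm is $(p-1)$-smooth, hence $3$-regular (take $\rho=p$ in Example 2.1). We take independent $\xi_t$ with i.i.d. $\pm d^{-1/p}$ coordinates, so that $\|\xi_t\|_p=1=\sigma_t$. The sketch estimates $\mathbf{E}\{\|S_n\|_p^2\}/(\kappa\sum_t\sigma_t^2)$, which (31) says is at most 1. Names and parameters are our own.

```python
import random

def second_moment_ratio(d=50, n=100, p=4.0, trials=2000, seed=0):
    """Monte Carlo estimate of E||S_n||_p^2 / (kappa * sum_t sigma_t^2) with kappa = p - 1.

    Each xi_t has independent coordinates +-d**(-1/p), so ||xi_t||_p = 1 = sigma_t.
    A coordinate of S_n is d**(-1/p) times a sum of n independent signs, which is
    sampled at once as 2 * (number of ones among n random bits) - n.
    """
    rng = random.Random(seed)
    kappa = p - 1.0
    total = 0.0
    for _ in range(trials):
        s_pp = 0.0  # accumulates ||S_n||_p ** p
        for _j in range(d):
            walk = 2 * bin(rng.getrandbits(n)).count("1") - n
            s_pp += abs(walk) ** p / d  # (d**(-1/p) * |walk|) ** p
        total += s_pp ** (2.0 / p)
    return total / trials / (kappa * n)

print(second_moment_ratio())
```

The estimate comes out well below 1, reflecting that (31) is a worst-case bound over all martingale-differences. The shortcut of sampling each coordinate's random walk in one call relies on independence, which is our modelling choice here.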

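3.2 Large deviations for $\|S_n\|$, I

Theorem 3.1 Let the $E$-valued martingale-difference $\xi=\{\xi_t\}_{t=1}^\infty$ and reals $\sigma_t>0$ be such that
$$\mathbf{E}_{t-1}\left\{\exp\{\|\xi_t\|^2\sigma_t^{-2}\}\right\}\le\exp\{1\},\quad t=1,2,\dots \tag{32}$$
Then for all $n\ge1$ and $\Omega\ge0$ one has
$$\mathrm{Prob}\left\{\Big\|\sum_{t=1}^n\xi_t\Big\|>15\Omega\sqrt\kappa\sqrt{\sum_{t=1}^n\sigma_t^2}\right\}\le3\exp\{-\Omega^2\}. \tag{33}$$

Proof. 1°. Let us set $\bar\sigma_t=\kappa^{1/2}\kappa_+^{-1/2}\sigma_t$ and $\eta_t=\xi_t\bar\sigma_t^{-1}$. Then
$$\|\eta_t\|_+^2\le(\kappa\kappa_+^{-1})\|\eta_t\|^2=(\kappa\kappa_+^{-1})\|\xi_t\|^2(\kappa^{-1}\kappa_+\sigma_t^{-2})=\|\xi_t\|^2\sigma_t^{-2},$$
whence
$$\mathbf{E}_{t-1}\{\exp\{\|\eta_t\|_+^2\}\}\le\exp\{1\},\quad\mathbf{E}_{t-1}\{\eta_t\}=0,\quad S_n=\sum_{t=1}^n\bar\sigma_t\eta_t. \tag{34}$$
Observe that by the moment inequality
$$0\le\tau\le1\Rightarrow\mathbf{E}_{t-1}\{\exp\{\tau\|\eta_t\|_+^2\}\}\le\exp\{\tau\}. \tag{35}$$
Further, since $\exp\{x\}\ge\frac{x^\ell}{\ell!}$ for all $x\ge0$, it follows from (35) that
$$\mathbf{E}_{t-1}\{\|\eta_t\|_+^{2\ell}\}\le e\,\ell!,\quad\ell=0,1,\dots \tag{36}$$
2°. Let
$$\omega_n=\Big(\sum_{t=1}^n\bar\sigma_t^2\Big)^{1/2}. \tag{37}$$
Let us prove by induction in $n\ge1$ that if
$$\epsilon\le\frac{1}{13+2\sqrt{\kappa_+}}, \tag{38}$$
then
$$(P_n):\quad0\le\tau\le\frac{1}{\omega_n^2}\ \Rightarrow\ \mathbf{E}\{\exp\{\epsilon^2\tau p(S_n)\}\}\le\exp\{\omega_n^2\tau\}.$$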

Base $n=1$: evident in view of (35).

Step $n\Rightarrow n+1$: Assume that $(P_n)$ holds true. Denote by $\|\cdot\|_\dagger$ the norm on $E^*$ conjugate to $\|\cdot\|_+$, and by $\langle\phi,x\rangle$ the value of a linear functional $\phi\in E^*$ at a vector $x\in E$. For brevity, set
$$a=\bar\sigma_{n+1}\langle p'(S_n),\eta_{n+1}\rangle,\qquad b=\kappa_+\bar\sigma_{n+1}^2\|\eta_{n+1}\|_+^2.$$
Using the $\kappa_+$-smoothness of $\|\cdot\|_+$, we have
$$\mathbf{E}\{\exp\{\epsilon^2\tau p(S_{n+1})\}\}\le\mathbf{E}\{\exp\{\epsilon^2\tau[p(S_n)+a+b]\}\}=\mathbf{E}\left\{\exp\{\epsilon^2\tau p(S_n)\}\,\mathbf{E}_n\{\exp\{\epsilon^2\tau[a+b]\}\}\right\}. \tag{39}$$
Now let $0\le\tau\le\omega_{n+1}^{-2}$. We have
$$\begin{aligned}
\mathbf{E}_n\{\exp\{\epsilon^2\tau[a+b]\}\}
&=1+\epsilon^2\tau\,\mathbf{E}_n\{a+b\}+\sum_{\ell=2}^\infty\frac{1}{\ell!}\mathbf{E}_n\{(\epsilon^2\tau[a+b])^\ell\}\\
&=1+\epsilon^2\tau\,\mathbf{E}_n\{b\}+\sum_{\ell=2}^\infty\frac{1}{\ell!}\mathbf{E}_n\{(\epsilon^2\tau[a+b])^\ell\}\qquad[\text{since }\mathbf{E}_n\{\eta_{n+1}\}=0]\\
&\le1+\epsilon^2\tau\,\mathbf{E}_n\{b\}+\sum_{\ell=2}^\infty\frac{1}{\ell!}\frac{(2\epsilon^2\tau)^\ell}{2}\mathbf{E}_n\{|a|^\ell+b^\ell\}\qquad[\text{since }(a+b)^\ell\le2^{\ell-1}(|a|^\ell+b^\ell)]\\
&\le\mathbf{E}_n\{\exp\{2\epsilon^2\tau b\}\}+\sum_{\ell=2}^\infty\frac{1}{\ell!}(2\epsilon^2\tau)^\ell\,\mathbf{E}_n\{|a|^\ell\}\\
&\le\mathbf{E}_n\{\exp\{2\kappa_+\epsilon^2\tau\bar\sigma_{n+1}^2\|\eta_{n+1}\|_+^2\}\}+\sum_{\ell=2}^\infty\frac{1}{\ell!}(2\epsilon^2\tau)^\ell\|p'(S_n)\|_\dagger^\ell\,\bar\sigma_{n+1}^\ell\,\mathbf{E}_n\{\|\eta_{n+1}\|_+^\ell\}\\
&\le\exp\{2\kappa_+\epsilon^2\tau\bar\sigma_{n+1}^2\}+\sum_{\ell=2}^\infty\frac{1}{\ell!}(2\epsilon^2\tau\bar\sigma_{n+1})^\ell(4p(S_n))^{\ell/2}\,\mathbf{E}_n\{\|\eta_{n+1}\|_+^\ell\}\\
&\qquad[\text{we have used (35) combined with }2\kappa_+\epsilon^2\tau\bar\sigma_{n+1}^2\le1,\text{ and }\|p'(u)\|_\dagger\le\sqrt{4p(u)}]\\
&\le\exp\{2\kappa_+\epsilon^2\tau\bar\sigma_{n+1}^2\}+\exp\{\tfrac12\}\sum_{\ell=2}^\infty\frac{1}{\ell!}(2\epsilon^2\tau\bar\sigma_{n+1})^\ell(4p(S_n))^{\ell/2}(\ell!)^{1/2}\qquad[\text{we have used (36)}]\\
&\le\exp\{2\kappa_+\epsilon^2\tau\bar\sigma_{n+1}^2\}+\exp\{\tfrac12\}\sum_{\ell=2}^\infty\left(\frac{[\theta_{n+1}p(S_n)]^\ell}{\ell!}\right)^{1/2}3^{-\ell/2},\qquad\theta_{n+1}=\left[7\epsilon^2\tau\bar\sigma_{n+1}\right]^2.
\end{aligned} \tag{40}$$

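Further,
$$\begin{aligned}
&\exp\{2\kappa_+\epsilon^2\tau\bar\sigma_{n+1}^2\}+\exp\{\tfrac12\}\sum_{\ell=2}^\infty\left(\frac{(\theta_{n+1}p(S_n))^\ell}{\ell!}\right)^{1/2}3^{-\ell/2}\\
&\le\exp\{2\kappa_+\epsilon^2\tau\bar\sigma_{n+1}^2\}+\exp\{\tfrac12\}\left(\sum_{\ell=2}^\infty\frac{(\theta_{n+1}p(S_n))^\ell}{\ell!}\right)^{1/2}\left(\sum_{\ell=2}^\infty3^{-\ell}\right)^{1/2}\\
&\le\exp\{2\kappa_+\epsilon^2\tau\bar\sigma_{n+1}^2\}+\left(\sum_{\ell=2}^\infty\frac{(\theta_{n+1}p(S_n))^\ell}{\ell!}\right)^{1/2}\\
&=\exp\{2\kappa_+\epsilon^2\tau\bar\sigma_{n+1}^2\}+\left(\exp\{\theta_{n+1}p(S_n)\}-1-\theta_{n+1}p(S_n)\right)^{1/2}.
\end{aligned} \tag{41}$$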

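3°. We need the following

Lemma 4 Let $\tau\ge0$ and $\theta=(7\epsilon^2\tau\bar\sigma_{n+1})^2$. Then
$$\exp\{2\kappa_+\epsilon^2\tau\bar\sigma_{n+1}^2\}+\left(\exp\{\theta p(S_n)\}-1-\theta p(S_n)\right)^{1/2}\le\exp\left\{2\kappa_+\epsilon^2\tau\bar\sigma_{n+1}^2+75\epsilon^4\tau^2\bar\sigma_{n+1}^2p(S_n)\right\}. \tag{42}$$

Proof. Observe, first, that for $x\ge0$ one has
$$\exp\{x\}-1-x\le\frac{x^2}{2}\exp\{x\}, \tag{43}$$
and, second, that for $x,y\ge0$ one has
$$\exp\{x\}y\le\exp\{x+y\}-1. \tag{44}$$
Therefore
$$\begin{aligned}
&\exp\{2\kappa_+\epsilon^2\tau\bar\sigma_{n+1}^2\}+\left(\exp\{\theta p(S_n)\}-1-\theta p(S_n)\right)^{1/2}\\
&\le\exp\{2\kappa_+\epsilon^2\tau\bar\sigma_{n+1}^2\}+\left[\frac{(\theta p(S_n))^2}{2}\exp\{\theta p(S_n)\}\right]^{1/2}\qquad[\text{by (43)}]\\
&=\exp\{2\kappa_+\epsilon^2\tau\bar\sigma_{n+1}^2\}+\exp\Big\{\underbrace{\tfrac12\theta p(S_n)}_{x}\Big\}\underbrace{\frac{\theta p(S_n)}{\sqrt2}}_{y}\\
&\le\exp\{2\kappa_+\epsilon^2\tau\bar\sigma_{n+1}^2\}+\exp\left\{\tfrac12\theta p(S_n)+\tfrac{1}{\sqrt2}\theta p(S_n)\right\}-1\qquad[\text{by (44)}]\\
&\le\exp\{2\kappa_+\epsilon^2\tau\bar\sigma_{n+1}^2\}+\exp\{2\kappa_+\epsilon^2\tau\bar\sigma_{n+1}^2\}\left[\exp\{\tfrac32\theta p(S_n)\}-1\right]\\
&=\exp\left\{2\kappa_+\epsilon^2\tau\bar\sigma_{n+1}^2+\tfrac32\theta p(S_n)\right\}.
\end{aligned}$$
Now (42) follows, since $\tfrac32\theta=73.5\,\epsilon^4\tau^2\bar\sigma_{n+1}^2\le75\epsilon^4\tau^2\bar\sigma_{n+1}^2$.

4°. Combining (39), (40), (41) and (42), we arrive at the relation
$$0\le\tau\le\omega_{n+1}^{-2}\ \Rightarrow\ \mathbf{E}\{\exp\{\epsilon^2\tau p(S_{n+1})\}\}\le\mathbf{E}\left\{\exp\left\{\left[\epsilon^2\tau+75\epsilon^4\tau^2\bar\sigma_{n+1}^2\right]p(S_n)+2\kappa_+\epsilon^2\tau\bar\sigma_{n+1}^2\right\}\right\}. \tag{45}$$
Now let $0\le\tau\le\omega_{n+1}^{-2}$ and let
$$\mu=\tau+75\epsilon^2\tau^2\bar\sigma_{n+1}^2.$$
Then, setting $\rho=\bar\sigma_{n+1}^2/\omega_n^2$, we have
$$\mu\le\frac{1}{\omega_{n+1}^2}+\frac{75\epsilon^2\bar\sigma_{n+1}^2}{\omega_{n+1}^4}=\frac{1}{\omega_n^2+\bar\sigma_{n+1}^2}+\frac{75\epsilon^2\bar\sigma_{n+1}^2}{(\omega_n^2+\bar\sigma_{n+1}^2)^2}=\frac{1}{\omega_n^2}\left[\frac{1}{1+\rho}+\frac{75\epsilon^2\rho}{(1+\rho)^2}\right]=\frac{1}{\omega_n^2}\,\frac{1+\rho(1+75\epsilon^2)}{1+2\rho+\rho^2}\le\frac{1}{\omega_n^2}$$
(since $75\epsilon^2\le1$). Consequently, by $(P_n)$ one has
$$\mathbf{E}\{\exp\{[\epsilon^2\tau+75\epsilon^4\tau^2\bar\sigma_{n+1}^2]p(S_n)\}\}=\mathbf{E}\{\exp\{\epsilon^2\mu p(S_n)\}\}\le\exp\{\mu\omega_n^2\},$$
and (45) implies that
$$0\le\tau\le\frac{1}{\omega_{n+1}^2}\ \Rightarrow\ \mathbf{E}\{\exp\{\epsilon^2\tau p(S_{n+1})\}\}\le\exp\{2\kappa_+\epsilon^2\tau\bar\sigma_{n+1}^2+\mu\omega_n^2\}=\exp\left\{\tau\left[2\kappa_+\epsilon^2\bar\sigma_{n+1}^2+[1+75\epsilon^2\tau\bar\sigma_{n+1}^2]\omega_n^2\right]\right\}=\exp\{\tau\omega_{n+1}^2\chi\}, \tag{46}$$
where
$$\chi=\frac{2\kappa_+\epsilon^2\rho}{1+\rho}+\frac{1+75\epsilon^2(\tau\omega_{n+1}^2)\frac{\rho}{1+\rho}}{1+\rho}.$$
Taking into account that $2\kappa_+\epsilon^2\le1/2$, $\tau\omega_{n+1}^2\le1$ and $75\epsilon^2\le1/2$, we get
$$\chi\le\frac{\frac12\rho}{1+\rho}+\frac{1+\frac12\rho}{1+\rho}\le1,$$
and (46) implies $(P_{n+1})$.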
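The induction above rests on a handful of elementary numerical facts:
• inequalities (43) and (44);
• the comparison $4\le7/\sqrt3$, used to pass to $\theta_{n+1}$ in (40);
• the bound $\exp\{1/2\}(\sum_{\ell\ge2}3^{-\ell})^{1/2}\le1$, used in (41);
• the inequalities $1/2+1/\sqrt2\le3/2$ and $\frac32\cdot49\le75$, used in Lemma 4;
• the consequences $2\kappa_+\epsilon^2\le1/2$ and $75\epsilon^2\le1/2$ of (38).

The sketch below spot-checks them in floating point. The grid checks of (43) and (44) are illustrations only; both inequalities are proved analytically.

```python
import math

def worst_slack_43_44(top=20.0, grid=400):
    """Largest value of LHS - RHS of (43) and of (44) on a grid in [0, top]."""
    pts = [top * i / grid for i in range(grid + 1)]
    w43 = max(math.expm1(x) - x - 0.5 * x * x * math.exp(x) for x in pts)
    w44 = max(math.exp(x) * y - math.expm1(x + y) for x in pts for y in pts)
    return w43, w44

w43, w44 = worst_slack_43_44()
print(w43, w44)  # 0.0 0.0: equality is attained only at x = 0 (resp. x = y = 0)

# Constants used in (40), (41) and Lemma 4.
assert 4 <= 7 / math.sqrt(3)
assert math.exp(0.5) * math.sqrt(sum(3.0 ** -l for l in range(2, 200))) <= 1
assert 0.5 + 1 / math.sqrt(2) <= 1.5 and 1.5 * 49 <= 75

# Consequences of (38): eps <= 1 / (13 + 2 sqrt(kappa_plus)).
for kappa_plus in (1.0, 2.0, 10.0, 1e4):
    eps = 1 / (13 + 2 * math.sqrt(kappa_plus))
    assert 2 * kappa_plus * eps ** 2 <= 0.5 and 75 * eps ** 2 <= 0.5
```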

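5°. Now we are ready to prove (33). Let us set $\epsilon=\frac{1}{15\sqrt{\kappa_+}}$, so that (38) holds true. Then for $\Omega>0$ one has
$$\begin{aligned}
\mathrm{Prob}\left\{\|S_n\|>15\Omega\sqrt\kappa\sqrt{\textstyle\sum_{t=1}^n\sigma_t^2}\right\}&\le\mathrm{Prob}\left\{\|S_n\|_+>15\Omega\sqrt\kappa\sqrt{\kappa_+/\kappa}\sqrt{\textstyle\sum_{t=1}^n\bar\sigma_t^2}\right\}\qquad[\text{since }\|\cdot\|_+\ge\|\cdot\|]\\
&=\mathrm{Prob}\left\{\|S_n\|_+>\Omega\epsilon^{-1}\omega_n\right\}\\
&\le\mathbf{E}\{\exp\{\epsilon^2\omega_n^{-2}\|S_n\|_+^2\}\}\exp\{-\Omega^2\}\\
&\le\exp\{1-\Omega^2\}
\end{aligned}$$
(we have used $(P_n)$ with $\tau=\omega_n^{-2}$). Since $\exp\{1\}<3$, (33) follows.

Refinements in the case of bounded $\xi_t$. In this case, the constants in Theorem 3.1 can be improved.

Theorem 3.2 Let $\xi=\{\xi_t\}_{t=1}^\infty$ be an $E$-valued martingale-difference and let $\sigma_t>0$ be reals such that $\|\xi_t\|\le\sigma_t$. Then
$$\mathrm{Prob}\left\{\|S_n\|>\Omega\sqrt\kappa\sqrt{\textstyle\sum_{t=1}^n\sigma_t^2}\right\}\le\exp\left\{1-\frac{\Omega^2}{4.4}\right\}. \tag{47}$$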

The proof is completely similar to the one of Theorem 3.1. Note that a result similar to Theorem 3.2 can be easily derived from the Talagrand Inequality. Here is this inequality, in the slightly extended form presented in [2]:

Theorem 3.3 Let $(E_t,\|\cdot\|_{E_t})$, $t=1,\dots,n$, be finite-dimensional normed spaces. Let $F$ be the direct product of $E_1,\dots,E_n$ equipped with the norm $\|(x_1,\dots,x_n)\|_F=\sqrt{\sum_{t=1}^n\|x_t\|_{E_t}^2}$. Let $\mu_t$ be Borel probability distributions on the unit balls of $E_t$, and let $\mu$ be the product of these distributions. Given a closed convex set $A\subset F$, let $\mathrm{dist}(x,A)=\min_{y\in A}\|x-y\|_F$. Then
$$\mathbf{E}_\mu\left\{\exp\{\tfrac14\mathrm{dist}^2(x,A)\}\right\}\le\frac{1}{\mu(A)}. \tag{48}$$

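The Talagrand Inequality immediately implies the following result:

Theorem 3.4 Let $(E,\|\cdot\|)$ be a $\kappa$-regular space, and let $\xi_1,\dots,\xi_n$ be independent random vectors in $E$ with zero means. Let $\sigma_t$ be reals such that $\|\xi_t\|\le\sigma_t$, $t=1,\dots,n$. Then
$$\Omega\ge2\sqrt{2\kappa}\ \Rightarrow\ \mathrm{Prob}\left\{\|S_n\|>\Omega\sqrt{\textstyle\sum_{t=1}^n\sigma_t^2}\right\}\le2\exp\left\{-\frac{\Omega^2}{64}\right\}. \tag{49}$$

Proof. Let $F$ be the direct product of $n$ copies of $E$ equipped with the norm $\|(x^1,\dots,x^n)\|_F=\sqrt{\sum_{t=1}^n\|x^t\|^2}$, and let $\zeta_t=(2\sigma_t)^{-1}\xi_t$. Let $Q$ be the set of all $x=(x^1,\dots,x^n)\in F$ such that $S(x)\equiv2\sum_{t=1}^n\sigma_tx^t\in E$ satisfies $\|S(x)\|\le1$. Note that $Q$ is a closed convex set in $F$, and that $\|S(x)\|\le r$ if and only if $x\in rQ$.

Let $\Theta=\sqrt{\sum_{t=1}^n\sigma_t^2}$. Our first observation is that $Q$ contains the $\|\cdot\|_F$-ball of radius $\rho=(2\Theta)^{-1}$ centered at the origin. Indeed, if $\|(x^1,\dots,x^n)\|_F\le\rho$, then $\|S(x)\|\le\sum_{t=1}^n2\sigma_t\|x^t\|\le2\Theta\sqrt{\sum_{t=1}^n\|x^t\|^2}\le2\Theta\rho=1$.

Next observe that if $\zeta=(\zeta_1,\dots,\zeta_n)$, then for every $\gamma>0$ one has
$$\mathrm{Prob}\{\|S(\zeta)\|>\gamma\}\equiv\mathrm{Prob}\{\zeta\notin\gamma Q\}\le\frac{\kappa\Theta^2}{\gamma^2}. \tag{50}$$
Indeed, we have $S(\zeta)=\sum_{t=1}^n2\sigma_t\zeta_t=\sum_{t=1}^n\xi_t$. By Proposition 3.1, $\mathbf{E}\{\|S(\zeta)\|^2\}\le\kappa\Theta^2$, and (50) follows from the Tschebyshev inequality.

Let us fix $\gamma>\sqrt\kappa\Theta$ and set $A=\gamma Q$. Note that $A$ is a closed convex set in $F$, symmetric w.r.t. the origin and containing the $\|\cdot\|_F$-ball of radius $\gamma\rho$ centered at the origin. Besides this,
$$\mathrm{Prob}\{\zeta\in A\}\ge1-\frac{\kappa\Theta^2}{\gamma^2}>0 \tag{51}$$
by (50). Observe that
$$s>1,\ x\in F\setminus(sA)\ \Rightarrow\ \mathrm{dist}(x,A)>(s-1)\gamma\rho. \tag{52}$$
Indeed, for $s,x$ from the premise of this implication, the set $B=x+(s-1)A$ does not intersect $A$. Since $A$ contains the $\|\cdot\|_F$-ball of radius $\gamma\rho$ centered at the origin, $B$ contains the $\|\cdot\|_F$-ball of radius $(s-1)\gamma\rho$ centered at $x$. Since $B\cap A=\emptyset$, the conclusion in (52) follows. We now apply (48) to the distribution of $\zeta$; this is legitimate since each $\zeta_t$ takes values in the ball of radius $1/2$, in particular in the unit ball of $E$. We get
$$\mathbf{E}\left\{\exp\{\tfrac14\mathrm{dist}^2(\zeta,A)\}\right\}\le\frac{1}{\mathrm{Prob}\{\zeta\in A\}}\le\frac{1}{1-\kappa\Theta^2\gamma^{-2}}$$
(we have used (51)). In view of (52), this bound implies
$$s>1\ \Rightarrow\ \mathrm{Prob}\{\zeta\notin sA=s\gamma Q\}\le\frac{1}{1-\kappa\Theta^2\gamma^{-2}}\exp\left\{-\frac{\gamma^2\rho^2(s-1)^2}{4}\right\}. \tag{53}$$
Since $\zeta\notin\alpha Q$ if and only if $\|\sum_{t=1}^n\xi_t\|>\alpha$, we arrive at
$$\forall(s>1,\ \gamma>\sqrt\kappa\Theta):\ \mathrm{Prob}\left\{\Big\|\sum_{t=1}^n\xi_t\Big\|>\gamma s\right\}\le\frac{1}{1-\kappa\Theta^2\gamma^{-2}}\exp\left\{-\frac{\gamma^2(s-1)^2}{16\Theta^2}\right\}$$
(we have substituted the value $\rho^2=1/(4\Theta^2)$). Given $\Omega\ge2\sqrt{2\kappa}$, we set $\gamma=\sqrt{2\kappa}\Theta$ and $s=\Omega/\sqrt{2\kappa}\ge2$. Then the prefactor equals 2, and $(s-1)^2\ge s^2/4=\Omega^2/(8\kappa)$, so the exponent is at least $\kappa(s-1)^2/8\ge\Omega^2/64$. We arrive at (49).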
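One can simulate the simplest bounded case to see how conservative these tail bounds are. For illustration, we take $E=\mathbb{R}^d$ with the Euclidean norm, which is 1-smooth and hence 1-regular. We take independent $\xi_t$ with i.i.d. $\pm d^{-1/2}$ coordinates, so that $\|\xi_t\|_2=1=\sigma_t$. The sketch compares the empirical frequency of $\{\|S_n\|>\Omega\sqrt{n}\}$ with the right-hand side $\exp\{1-\Omega^2/4.4\}$ of (47). The function name and parameters are ours.

```python
import math
import random

def tail_vs_bound(omegas=(1.5, 2.0, 3.0, 4.0), d=20, n=200, trials=4000, seed=0):
    """Empirical Prob{||S_n||_2 > Omega sqrt(n)} versus exp(1 - Omega**2 / 4.4).

    xi_t has i.i.d. coordinates +-1/sqrt(d), so ||xi_t||_2 = 1 = sigma_t and kappa = 1.
    """
    rng = random.Random(seed)
    norms = []
    for _ in range(trials):
        sq = 0.0
        for _j in range(d):
            walk = 2 * bin(rng.getrandbits(n)).count("1") - n  # sum of n random signs
            sq += walk * walk / d  # the coordinate of S_n is walk / sqrt(d)
        norms.append(math.sqrt(sq))
    rows = []
    for om in omegas:
        emp = sum(v > om * math.sqrt(n) for v in norms) / trials
        rows.append((om, emp, math.exp(1 - om ** 2 / 4.4)))
    return rows

for om, emp, bound in tail_vs_bound():
    print(f"Omega={om}: empirical {emp:.4f}  bound {bound:.4f}")
```

In this example $\|S_n\|^2/n$ concentrates near 1, so the empirical tails are far below the bound. Inequality (47) has to cover every martingale-difference with $\|\xi_t\|\le\sigma_t$, not just this one.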

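3.3 Large deviations for $\|S_n\|$, II

Theorem 3.5 Let $\alpha\in(0,2]$, and let the $E$-valued martingale-difference $\xi=\{\xi_t\}_{t=1}^\infty$ and reals $\sigma_t>0$ be such that
$$\mathbf{E}_{t-1}\left\{\exp\{\|\xi_t\|^\alpha\sigma_t^{-\alpha}\}\right\}\le\exp\{1\},\quad t=1,2,\dots \tag{54}$$
Then for all $n\ge1$ and $\Omega\ge0$ one has
$$\mathrm{Prob}\left\{\Big\|\sum_{t=1}^n\xi_t\Big\|>\Omega\sqrt\kappa\sqrt{\textstyle\sum_{t=1}^n\sigma_t^2}\right\}\le C_\alpha\exp\{-C_\alpha^{-1}\Omega^\alpha\}, \tag{55}$$
where $C_\alpha\ge2$ depends solely on $\alpha\in(0,2]$ and is continuous in $\alpha>0$. In particular, for an appropriately chosen $c_\alpha>0$, depending solely on $\alpha\in(0,2]$ and continuous in $\alpha$, one has for all $n\ge1$:
$$\mathbf{E}\left\{\exp\left\{\frac{\|\sum_{t=1}^n\xi_t\|^\alpha}{\left(c_\alpha\sqrt\kappa\sqrt{\sum_{t=1}^n\sigma_t^2}\right)^\alpha}\right\}\right\}\le\exp\{1\}. \tag{56}$$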

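Proof. 1°. Let $\rho\ge(2/\alpha)^{1/\alpha}$. Let us set
$$\eta_t=\chi_{\{\|\xi_t\|>\sigma_t\rho\}}\xi_t,\qquad\zeta_t=\eta_t-\underbrace{\mathbf{E}_{t-1}\{\eta_t\}}_{\delta_t},\qquad\omega_t=\xi_t-\zeta_t.$$
Observe that
$$\mathbf{E}_{t-1}\{\zeta_t\}=0, \tag{57}$$
whence, due to $\mathbf{E}_{t-1}\{\xi_t\}=0$, also
$$\mathbf{E}_{t-1}\{\omega_t\}=0. \tag{58}$$
2°. We have
$$\begin{aligned}
\|\delta_t\|&\le\mathbf{E}_{t-1}\left\{\chi_{\{\|\xi_t\|>\sigma_t\rho\}}\|\xi_t\|\right\}\\
&=\mathbf{E}_{t-1}\left\{\exp\{\|\xi_t\|^\alpha\sigma_t^{-\alpha}\}\,\sigma_t\exp\{-\|\xi_t\|^\alpha\sigma_t^{-\alpha}\}[\|\xi_t\|/\sigma_t]\chi_{\{\|\xi_t\|>\sigma_t\rho\}}\right\}\\
&\le\sigma_t\max_{z\ge\rho}\left[z\exp\{-z^\alpha\}\right]\mathbf{E}_{t-1}\left\{\exp\{\|\xi_t\|^\alpha\sigma_t^{-\alpha}\}\right\}\\
&\le\sigma_t\exp\{1\}\max_{z\ge\rho}\left[z\exp\{-z^\alpha\}\right]\\
&=\sigma_t\exp\{1\}\rho\exp\{-\rho^\alpha\}\qquad[\text{due to }\rho\ge\alpha^{-1/\alpha}].
\end{aligned} \tag{59}$$
Consequently,
$$\|\omega_t\|=\|\xi_t-[\eta_t-\delta_t]\|\le\|\xi_t-\eta_t\|+\|\delta_t\|\le\sigma_t\exp\{1\}\rho\exp\{-\rho^\alpha\}+\|\xi_t\|\chi_{\{\|\xi_t\|\le\sigma_t\rho\}}. \tag{60}$$
Setting $\hat\sigma_t=2\sigma_t\rho^{\frac{2-\alpha}{2}}$, we have from (60):
$$\begin{aligned}
\mathbf{E}_{t-1}\left\{\exp\{\|\omega_t\|^2\hat\sigma_t^{-2}\}\right\}
&\le\mathbf{E}_{t-1}\left\{\exp\left\{0.25\left[\sigma_t\exp\{1\}\rho\exp\{-\rho^\alpha\}+\|\xi_t\|\chi_{\{\|\xi_t\|\le\sigma_t\rho\}}\right]^2\sigma_t^{-2}\rho^{\alpha-2}\right\}\right\}\\
&\le\mathbf{E}_{t-1}\left\{\exp\left\{0.5\left[\exp\{2\}\exp\{-2\rho^\alpha\}\rho^\alpha+[\|\xi_t\|^2\sigma_t^{-2}]\rho^{\alpha-2}\chi_{\{\|\xi_t\|\le\sigma_t\rho\}}\right]\right\}\right\}\\
&\le\mathbf{E}_{t-1}\left\{\exp\left\{0.5\left[\exp\{2\}\exp\{-2\rho^\alpha\}\rho^\alpha+[\|\xi_t\|^\alpha\sigma_t^{-\alpha}]\chi_{\{\|\xi_t\|\le\sigma_t\rho\}}\right]\right\}\right\}\\
&\le\mathbf{E}_{t-1}\left\{\max\left[\exp\{\exp\{2\}\exp\{-2\rho^\alpha\}\rho^\alpha\},\ \exp\{\|\xi_t\|^\alpha\sigma_t^{-\alpha}\}\right]\right\}\\
&\le\exp\{\exp\{2\}\exp\{-2\rho^\alpha\}\rho^\alpha\}+\exp\{1\}\\
&\le\exp\{0.5\exp\{1\}\}+\exp\{1\}\le\exp\{2\}.
\end{aligned}$$
Since $\tilde\sigma_t^2=2\hat\sigma_t^2$ for
$$\tilde\sigma_t=2\sqrt2\rho^{\frac{2-\alpha}{2}}\sigma_t, \tag{61}$$
the moment inequality gives
$$\mathbf{E}_{t-1}\left\{\exp\{\|\omega_t\|^2\tilde\sigma_t^{-2}\}\right\}\le\exp\{1\}. \tag{62}$$
3°. We have $\|\mathbf{E}_{t-1}\{\eta_t\}\|^2\le\mathbf{E}_{t-1}\{\|\eta_t\|^2\}$, whence
$$\begin{aligned}
\mathbf{E}_{t-1}\{\|\zeta_t\|^2\}&\le\mathbf{E}_{t-1}\{4\|\eta_t\|^2\}=4\mathbf{E}_{t-1}\left\{\|\xi_t\|^2\chi_{\{\|\xi_t\|>\sigma_t\rho\}}\right\}\\
&=4\sigma_t^2\mathbf{E}_{t-1}\left\{\exp\{\|\xi_t\|^\alpha\sigma_t^{-\alpha}\}\left[\|\xi_t\|^2\sigma_t^{-2}\exp\{-\|\xi_t\|^\alpha\sigma_t^{-\alpha}\}\right]\chi_{\{\|\xi_t\|>\sigma_t\rho\}}\right\}\\
&\le4\sigma_t^2\mathbf{E}_{t-1}\left\{\exp\{\|\xi_t\|^\alpha\sigma_t^{-\alpha}\}\right\}\max_{z\ge\rho}\left[z^2\exp\{-z^\alpha\}\right]\\
&\le4\sigma_t^2\exp\{1\}\rho^2\exp\{-\rho^\alpha\}\qquad[\text{since }\rho\ge(2/\alpha)^{1/\alpha}].
\end{aligned}$$
Thus,
$$\mathbf{E}_{t-1}\{\|\zeta_t\|^2\sigma_t^{-2}\}\le4\exp\{1\}\rho^2\exp\{-\rho^\alpha\}.$$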
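Steps 2° and 3° use that $z\mapsto z^k\exp\{-z^\alpha\}$ is nonincreasing for $z\ge(k/\alpha)^{1/\alpha}$, since its derivative is $z^{k-1}e^{-z^\alpha}(k-\alpha z^\alpha)$. Hence the maxima over $z\ge\rho$ in (59) ($k=1$) and in 3° ($k=2$) are attained at $z=\rho$. The final line of 2° also uses $\exp\{0.5e\}+e\le e^2$. Here is a small grid check of these facts, as a sketch with our own helper name:

```python
import math

def tail_max(k, alpha, rho, span=50.0, steps=20000):
    """Grid maximum of z**k * exp(-z**alpha) over z in [rho, rho + span]."""
    zs = (rho + span * i / steps for i in range(steps + 1))
    return max(z ** k * math.exp(-z ** alpha) for z in zs)

for alpha in (0.5, 1.0, 1.5, 2.0):
    for k in (1, 2):
        rho = (k / alpha) ** (1 / alpha)  # threshold (k/alpha)^(1/alpha) from the derivative
        value_at_rho = rho ** k * math.exp(-rho ** alpha)
        # the grid maximum is attained at the left endpoint z = rho
        assert tail_max(k, alpha, rho) <= value_at_rho * (1 + 1e-12)

# Constant used at the end of step 2 (via x * exp(-2x) <= 1/(2e), giving the bound 0.5 * e).
assert math.exp(0.5 * math.e) + math.e <= math.exp(2)
```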

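Thus, $\zeta_t$ is an $\mathcal{F}_t$-measurable random vector (by its origin) such that
$$\mathbf{E}_{t-1}\{\zeta_t\}=0,\quad\mathbf{E}_{t-1}\{\|\zeta_t\|^2\sigma_t^{-2}\}\le4\exp\{1\}\rho^2\exp\{-\rho^\alpha\} \tag{63}$$
(see (57)). Besides this, $\omega_t$ is an $\mathcal{F}_t$-measurable random vector (by its origin) such that
$$\mathbf{E}_{t-1}\{\omega_t\}=0,\quad\mathbf{E}_{t-1}\{\exp\{\|\omega_t\|^2\tilde\sigma_t^{-2}\}\}\le\exp\{1\} \tag{64}$$
(see (58), (62)).

4°. Applying Theorem 3.1 to the random vectors $\omega_1,\dots,\omega_n$ and taking into account (64), we get
$$\mathrm{Prob}\left\{\Big\|\sum_{t=1}^n\omega_t\Big\|\ge\rho^{\alpha/2}\sqrt\kappa\sqrt{\textstyle\sum_{t=1}^n\tilde\sigma_t^2}\right\}\le C_1\exp\{-C_2\rho^\alpha\} \tag{65}$$
(from now on, $C_i$ are appropriate positive absolute constants). Recalling (61), this gives
$$\mathrm{Prob}\left\{\Big\|\sum_{t=1}^n\omega_t\Big\|\ge2\sqrt2\rho\sqrt\kappa\sqrt{\textstyle\sum_{t=1}^n\sigma_t^2}\right\}\le C_3\exp\{-C_4\rho^\alpha\}. \tag{66}$$
Further, by (63) and Proposition 3.1 as applied to the random vectors $\zeta_1,\dots,\zeta_n$, we have
$$\mathbf{E}\left\{\Big\|\sum_{t=1}^n\zeta_t\Big\|^2\right\}\le C_5\kappa\Big(\sum_{t=1}^n\sigma_t^2\Big)\rho^2\exp\{-\rho^\alpha\},$$
whence by the Tschebyshev inequality
$$\mathrm{Prob}\left\{\Big\|\sum_{t=1}^n\zeta_t\Big\|\ge\rho\sqrt\kappa\sqrt{\textstyle\sum_{t=1}^n\sigma_t^2}\right\}\le C_6\exp\{-C_7\rho^\alpha\}.$$
We combine this inequality with (66) and take into account that $\xi_t=\omega_t+\zeta_t$. We conclude that
$$\mathrm{Prob}\left\{\Big\|\sum_{t=1}^n\xi_t\Big\|\ge[1+2\sqrt2]\rho\sqrt\kappa\sqrt{\textstyle\sum_{t=1}^n\sigma_t^2}\right\}\le C_8\exp\{-C_9\rho^\alpha\}$$
whenever $\rho\ge(2/\alpha)^{1/\alpha}$, and (55) follows.

(56) is an immediate corollary of (55). Indeed, let us fix $n$, and let $D=\sqrt\kappa\sqrt{\sum_{t=1}^n\sigma_t^2}$. For $c>0$, consider the random variable $\theta_c=\|\sum_{t=1}^n\xi_t\|^\alpha/(cD)^\alpha$. By (55), for $t>0$ we have
$$\psi(t)\equiv\mathrm{Prob}\{\theta_c>t\}=\mathrm{Prob}\left\{\Big\|\sum_{t=1}^n\xi_t\Big\|>ct^{1/\alpha}D\right\}\le C_\alpha\exp\{-C_\alpha^{-1}c^\alpha t\}.$$
Setting $c=(2C_\alpha)^{1/\alpha}$, we get $\psi(t)\le C_\alpha\exp\{-2t\}$, whence
$$\mathbf{E}\{\exp\{\theta_c\}\}=-\int_0^\infty\exp\{t\}d\psi(t)=1+\int_0^\infty\exp\{t\}\psi(t)dt\le1+C_\alpha\int_0^\infty\exp\{-t\}dt=1+C_\alpha.$$
Thus, $\mathbf{E}\left\{\exp\left\{\frac{\|\sum_{t=1}^n\xi_t\|^\alpha}{2C_\alpha D^\alpha}\right\}\right\}\le1+C_\alpha$. By the Moment Inequality, (56) then holds true with $c_\alpha=(2C_\alpha\ln(1+C_\alpha))^{1/\alpha}$.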

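4 Refinements in Gaussian case

We are about to refine the above results for the case when $\xi=\{\xi_t\}_{t=1}^\infty$ is a sequence of independent Gaussian random vectors with zero mean in a finite-dimensional normed space $(E,\|\cdot\|)$.

4.1 The basic fact

We start with the following fact, which seems to be important in its own right. Let
$$\Phi(t)=\frac{1}{\sqrt{2\pi}}\int_t^\infty\exp\{-s^2/2\}ds,\qquad\phi(r):\ \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\phi(r)}\exp\{-s^2/2\}ds=r.$$

Proposition 4.1 Let $\eta\sim\mathcal{N}(0,I_k)$, and let $B$ be a closed convex set in $\mathbb{R}^k$ such that
$$\mathrm{Prob}\{\eta\in B\}\ge\theta>\frac12. \tag{67}$$
Then
$$0<\alpha<1\ \Rightarrow\ \mathrm{Prob}\{\alpha\eta\in B\}\ge1-\exp\left\{-\frac{\phi^2(\theta)}{2\alpha^2}\right\}. \tag{68}$$
Equivalently: for a closed and convex set $B$ and $\zeta\sim\mathcal{N}(0,\Sigma)$ one has
$$\mathrm{Prob}\{\zeta\notin B\}\le\delta<\frac12\ \Rightarrow\ \mathrm{Prob}\{\zeta\notin\gamma B\}\le\exp\left\{-\frac{\phi^2(1-\delta)\gamma^2}{2}\right\}\quad\forall\gamma>1. \tag{69}$$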

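Proof is based on the following fact [1]:

(!) For every $\gamma>0$, $\epsilon\ge0$ and every closed set $X\subset\mathbb{R}^k$ such that $\mathrm{Prob}\{\eta\in X\}\ge\gamma$ one has
$$\mathrm{Prob}\{\mathrm{dist}(\eta,X)>\epsilon\}\le\Phi(\phi(\gamma)+\epsilon),$$
where $\mathrm{dist}(a,X)=\min_{x\in X}\|a-x\|_2$.

Now let $\eta,\zeta$ be independent $\mathcal{N}(0,I_k)$ random vectors, and let $p(\alpha)=\mathrm{Prob}\{\alpha\eta\notin B\}$. The vector $\alpha\eta+\sqrt{1-\alpha^2}\zeta$ is $\mathcal{N}(0,I_k)$, so that
$$\mathrm{Prob}\{\mathrm{dist}(\alpha\eta+\sqrt{1-\alpha^2}\zeta,B)>t\}\le\Phi(\phi(\theta)+t) \tag{70}$$
by (!). On the other hand, let $\alpha\eta\notin B$, and let $e=e(\eta)$ be a unit vector such that $e^T[\alpha\eta]>\max_{x\in B}e^Tx$. If $\zeta$ is such that $\sqrt{1-\alpha^2}e^T\zeta>t$, then $\mathrm{dist}(\alpha\eta+\sqrt{1-\alpha^2}\zeta,B)>t$, whence
$$\alpha\eta\notin B\ \Rightarrow\ \mathrm{Prob}\left\{\zeta:\ \mathrm{dist}(\alpha\eta+\sqrt{1-\alpha^2}\zeta,B)>t\right\}\ge\Phi(t/\sqrt{1-\alpha^2}).$$
Hence, for all $t\ge0$ such that $\delta(t)\equiv\phi(\theta)+t-t/\sqrt{1-\alpha^2}\ge0$, one has
$$p(\alpha)\Phi(t/\sqrt{1-\alpha^2})\le\mathrm{Prob}\{\mathrm{dist}(\alpha\eta+\sqrt{1-\alpha^2}\zeta,B)>t\}\le\Phi(\phi(\theta)+t)$$
$$\Rightarrow\ p(\alpha)\le\frac{\Phi(\phi(\theta)+t)}{\Phi(t/\sqrt{1-\alpha^2})}=\frac{\int_{t/\sqrt{1-\alpha^2}}^\infty\exp\{-(s+\delta(t))^2/2\}ds}{\int_{t/\sqrt{1-\alpha^2}}^\infty\exp\{-s^2/2\}ds}=\frac{\int_{t/\sqrt{1-\alpha^2}}^\infty\exp\{-s^2/2-s\delta(t)-\delta^2(t)/2\}ds}{\int_{t/\sqrt{1-\alpha^2}}^\infty\exp\{-s^2/2\}ds}\le\exp\{-t\delta(t)/\sqrt{1-\alpha^2}-\delta^2(t)/2\}.$$
Setting $t=\frac{\phi(\theta)(1-\alpha^2)}{\alpha^2}$ in the resulting inequality, we get
$$p(\alpha)\le\exp\left\{-\frac{\phi^2(\theta)}{2\alpha^2}\right\}.$$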
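For half-spaces and symmetric slabs, both sides of (68) are available in closed form, which gives a cheap sanity check of Proposition 4.1. The sketch below uses Python's `statistics.NormalDist` for the standard normal c.d.f. and its inverse; the inverse is the quantile function $\phi$ of the text. Each helper returns $\mathrm{Prob}\{\alpha\eta\notin B\}-\exp\{-\phi^2(\theta)/(2\alpha^2)\}$, which should be nonpositive. The helper names are ours.

```python
import math
from statistics import NormalDist

N = NormalDist()  # standard normal; N.inv_cdf plays the role of phi in the text

def rhs_68(theta, alpha):
    """exp{-phi(theta)^2 / (2 alpha^2)}, the bound on Prob{alpha eta not in B} from (68)."""
    return math.exp(-N.inv_cdf(theta) ** 2 / (2 * alpha ** 2))

def halfspace_gap(theta, alpha):
    """B = {x: x_1 <= c} with Prob{eta in B} = theta."""
    c = N.inv_cdf(theta)
    return (1 - N.cdf(c / alpha)) - rhs_68(theta, alpha)

def slab_gap(theta, alpha):
    """B = {x: |x_1| <= c} with Prob{eta in B} = theta."""
    c = N.inv_cdf((1 + theta) / 2)
    return 2 * (1 - N.cdf(c / alpha)) - rhs_68(theta, alpha)

for theta in (0.6, 0.9):
    for alpha in (0.3, 0.7):
        print(theta, alpha, halfspace_gap(theta, alpha), slab_gap(theta, alpha))
```

Both examples depend only on the first coordinate, so the dimension $k$ plays no role. They are special cases and do not exercise the general convex-set argument above.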

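4.2 Gaussian version of Theorem 3.1

Theorem 4.1 Let $(E,\|\cdot\|)$ be $\kappa$-regular and let $\xi_1,\xi_2,\dots$ be independent Gaussian random vectors in $E$ with zero means. Setting $\delta_t^2=\mathbf{E}\{\|\xi_t\|^2\}$, one has
$$\Omega\ge3\ \Rightarrow\ \mathrm{Prob}\left\{\Big\|\sum_{t=1}^n\xi_t\Big\|>\Omega\sqrt\kappa\sqrt{\textstyle\sum_{t=1}^n\delta_t^2}\right\}\le\exp\left\{-\frac{\Omega^2}{12.1}\right\}. \tag{71}$$

Proof. Let $U$ be the unit ball of $\|\cdot\|$. By Proposition 3.1, we have
$$\mathbf{E}\{\|S_n\|^2\}\le\underbrace{\kappa\sum_{t=1}^n\delta_t^2}_{\omega_n^2},$$
whence by the Tschebyshev inequality for every $\beta>\sqrt2$ one has
$$\mathrm{Prob}\{\omega_n^{-1}S_n\notin\beta U\}\le\frac{1}{\beta^2}<\frac12.$$
Applying (69), which is legitimate since $S_n$ is Gaussian with zero mean, we arrive at
$$\mathrm{Prob}\{\omega_n^{-1}S_n\notin\gamma\beta U\}\le\exp\left\{-\frac{\phi^2(1-\beta^{-2})\gamma^2}{2}\right\}\quad\forall\gamma>1.$$
Equivalently, whenever $\Omega>\sqrt2$, one has
$$\mathrm{Prob}\left\{\|S_n\|>\Omega\sqrt\kappa\sqrt{\textstyle\sum_{t=1}^n\delta_t^2}\right\}\le\inf_{\sqrt2<\beta<\Omega}\exp\left\{-\frac{\phi^2(1-\beta^{-2})\Omega^2}{2\beta^2}\right\}.$$
[…] $\Omega\sqrt\kappa\sqrt{\sum_{t=1}^n\delta_t^2}\big\}\le3\exp\{-\Omega^2/225\}$ […] in the bound given by Theorem 4.1 is $\exp\{-\Omega^2/12.1\}$.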
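The constant 12.1 in (71) comes from optimizing, over $\beta$, the bound derived in the proof. For $\Omega\ge3$ every $\beta\in(\sqrt2,3)$ is admissible. The resulting exponent is $\Omega^2\cdot c(\beta)$ with $c(\beta)=\phi^2(1-\beta^{-2})/(2\beta^2)$. The following sketch maximizes $c(\beta)$ on a grid using only the standard library (`statistics.NormalDist` for $\phi$; names are ours). It checks that the maximum exceeds $1/12.1$.

```python
import math
from statistics import NormalDist

def best_exponent(beta_hi=3.0, steps=20000):
    """max over beta in (sqrt(2), beta_hi) of c(beta) = phi(1 - beta**-2)**2 / (2 beta**2)."""
    phi = NormalDist().inv_cdf
    lo = math.sqrt(2.0)
    best = 0.0
    for i in range(1, steps):
        beta = lo + (beta_hi - lo) * i / steps
        best = max(best, phi(1 - beta ** -2) ** 2 / (2 * beta ** 2))
    return best

c = best_exponent()
print(c, 1 / c)  # 1/c is the denominator D in the bound exp{-Omega^2 / D}
```

The maximizer lies just below $\beta=3$. This is consistent with the restriction $\Omega\ge3$ in the theorem, which makes such $\beta$ admissible via $\gamma=\Omega/\beta>1$.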

5 Extensions to semi-scalar case

In this section we extend the results for the Gaussian case to random sums of the form
$$S_n=\sum_{t=1}^n\xi_tf_t$$
with deterministic vectors $f_t\in E$ and independent random scalars $\xi_t$. The $\xi_t$ are symmetrically distributed on the axis and have "light tails". Multiplying $\xi_t$ by deterministic positive reals and dividing $f_t$ by the same reals does not affect the situation. Therefore, in the sequel we normalize $\xi_t$ by the condition
$$\xi_t\sim-\xi_t,\quad\mathbf{E}\{\exp\{4\xi_t^2\}\}\le\exp\{1\}. \tag{73}$$

5.1 Basic results

We start with results which can be viewed as modifications of Proposition 4.1.

Proposition 5.1 Let $\xi_t$ be independent and symmetrically distributed random reals such that
$$\mathbf{E}\{\xi_t^2\} \ge \sigma^2 > 0,\quad t = 1,\ldots,n, \tag{74}$$
and either (i) $|\xi_t| \le 1/2$, $t = 1,\ldots,n$, or (ii) $\mathbf{E}\{\exp\{4\xi_t^2\}\} \le \exp\{1\}$, $t = 1,\ldots,n$, and let $\xi = (\xi_1,\ldots,\xi_n)$. Let, further, $A$ be a closed convex set in $\mathbb{R}^n$, symmetric w.r.t. the origin, such that
$$\mathrm{Prob}\{\xi \in A\} \equiv \mu > \nu \equiv 1 - \frac{\sigma^4}{12\sigma^4+1}. \tag{75}$$
Then for every $\vartheta > 1$ one has, in the case of (i):
$$\mathrm{Prob}\{\xi \notin \vartheta A\} \le 2\exp\Big\{-\frac{(\vartheta-1)^2\sigma^2}{8}\Big\}, \tag{76}$$
and in the case of (ii):
$$\mathrm{Prob}\{\xi \notin \vartheta A\} \le O(1)\exp\Big\{-O(1)\frac{\sigma^2(\vartheta-1)^2}{\sigma(\vartheta-1)+\ln n}\Big\}, \tag{77}$$

with properly chosen positive absolute constants $O(1)$.

Proof. A. We start with the following

Lemma 5 Under the premise of (ii) (and thus under the premise of (i) as well), the set $A$ contains the $\|\cdot\|_2$-ball of radius
$$\rho = \frac{\sigma}{\sqrt{2}} \tag{78}$$
centered at the origin.
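Two elementary estimates used in the proof of Lemma 5 can be checked numerically. The first is the scalar inequality $t^4 \le \frac18[\exp\{4t^2\}-1]$. The second is the fourth-moment bound $\mathbf{E}\{\zeta^4\} \le 3\theta^2 + 1/4$. The sketch assumes a hypothetical instance with $\xi_i = \pm1/2$ with probability 1/2, so (i) and (ii) hold with $\sigma^2 = 1/4$. It computes $\mathbf{E}\{\zeta^4\}$ both from the expansion used in the proof and by exact enumeration of sign patterns. It also checks that at the threshold $1-\nu = \sigma^4/(12\sigma^4+1)$ of (75) one has exactly $\phi(\sigma^2) = \sigma^2/2$, which is the boundary case behind the contradiction.

```python
import itertools

import numpy as np


def scalar_inequality_holds(ts):
    """Check t^4 <= (exp(4 t^2) - 1) / 8 on a grid (true since e^x - 1 >= x^2 / 2 for x >= 0)."""
    return bool(np.all(ts**4 <= (np.exp(4 * ts**2) - 1) / 8 + 1e-15))


def fourth_moment(p, m2, m4):
    """E{(p^T xi)^4} for independent symmetric xi_i with E xi_i^2 = m2[i], E xi_i^4 = m4[i]."""
    a = p**2 * m2
    cross = 6 * (np.sum(a) ** 2 - np.sum(a**2)) / 2   # 6 * sum_{i<j} a_i a_j
    return float(cross + np.sum(p**4 * m4))


def phi(theta, one_minus_nu):
    """phi(theta) = theta - sqrt((1 - nu) * (3 theta^2 + 1/4)), cf. (80)."""
    return float(theta - np.sqrt(one_minus_nu * (3 * theta**2 + 0.25)))


rng = np.random.default_rng(1)
n = 8
p = rng.standard_normal(n)
p /= np.linalg.norm(p)                          # unit vector, as in the proof
m2, m4 = np.full(n, 0.25), np.full(n, 1 / 16)   # hypothetical xi_i = +-1/2
theta = float(np.sum(p**2 * m2))
# Exact E{(p^T xi)^4} by enumerating all 2^n equally likely sign patterns.
signs = np.array(list(itertools.product([-0.5, 0.5], repeat=n)))
brute = float(np.mean((signs @ p) ** 4))
print(brute, fourth_moment(p, m2, m4), 3 * theta**2 + 0.25)

sigma2 = 0.25                                   # sigma^2 for this instance
threshold = sigma2**2 / (12 * sigma2**2 + 1)    # value of 1 - nu at equality in (75)
print(phi(sigma2, threshold), sigma2 / 2)       # equal at the boundary case
```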

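Proof. Assume, contrary to what should be proved, that there exists $a \notin A$ with $\|a\|_2 = \rho$. Then there exists a vector $p$, $\|p\|_2 = 1$, such that $p^Tx < p^Ta \le \rho$ for all $x \in A$; since $A$ is symmetric w.r.t. the origin, we have
$$\max_{x\in A}|p^Tx| < \rho. \tag{79}$$
Consider the random variable $\zeta = |p^T\xi|$, and let $\theta = \sum_i p_i^2\mathbf{E}\{\xi_i^2\}$, so that $\theta \ge \sigma^2$. We have
$$\mathbf{E}\{\zeta^4\} = 6\sum_{i<j}p_i^2p_j^2\mathbf{E}\{\xi_i^2\}\mathbf{E}\{\xi_j^2\} + \sum_i p_i^4\mathbf{E}\{\xi_i^4\} \le \gamma(\theta) \equiv 3\theta^2 + \frac14$$
(the first sum does not exceed $3\theta^2$, and $p_i^4\mathbf{E}\{\xi_i^4\} \le \frac{p_i^2}{8}[\mathbf{E}\{\exp\{4\xi_i^2\}\} - 1] \le \frac{p_i^2}{4}$ due to $t^4 \le \frac18[\exp\{4t^2\}-1]$). Recalling that, by (79) and (75), $\mathrm{Prob}\{\zeta \le \rho\} \ge \mathrm{Prob}\{\xi \in A\} > \nu \equiv 1 - \frac{\sigma^4}{12\sigma^4+1}$, we get
$$\theta = \mathbf{E}\{\zeta^2\} \le \rho^2\,\mathrm{Prob}\{0\le\zeta\le\rho\} + \sqrt{\mathbf{E}\{\zeta^4\}}\sqrt{\mathrm{Prob}\{\zeta>\rho\}} < \rho^2 + \sqrt{(1-\nu)\gamma(\theta)},$$
whence
$$\rho^2 > \theta - \sqrt{(1-\nu)\gamma(\theta)} \equiv \phi(\theta). \tag{80}$$
Observe that $\phi'(\theta) \ge 0$ provided that $\sqrt{1-\nu} \le \frac{\sqrt{3\theta^2+1/4}}{\sqrt{3}\,\theta}$, which definitely is the case when $\nu > 2/3$, as guaranteed by the origin of $\nu$ (note that $\sigma \le 1$ due to $\mathbf{E}\{\exp\{4\xi_t^2\}\} \le \exp\{1\}$). Consequently, (80) combines with $\theta \ge \sigma^2$ to imply that
$$\rho^2 > \phi(\sigma^2) = \sigma^2 - \sqrt{1-\nu}\sqrt{3\sigma^4+1/4},$$
whence, by (75), $\rho^2 > \sigma^2/2$, which is a contradiction.

B. Recall that by the Talagrand Inequality, for a sequence of independent random real variables $\xi_i$, $i = 1,\ldots,n$, taking values in $[-1/2,1/2]$ and a closed set $A \subset \mathbb{R}^n$ one has
$$\mathbf{E}\Big\{\exp\Big\{\frac14\,\mathrm{dist}^2_{\|\cdot\|_2}(\xi,\mathrm{Conv}(A))\Big\}\Big\} \le \frac{1}{\mathrm{Prob}\{\xi\in A\}}. \tag{81}$$

C. W.l.o.g., let $A$ be a compact set; by Lemma 5, $A$ contains the $\|\cdot\|_2$-ball of radius $\rho$ centered at the origin. Let $\vartheta > 1$ and $x \notin \vartheta A$. Consider the norm $\|\cdot\|_A$ whose unit ball is $A$; since $x \notin \vartheta A$, the $\|\cdot\|_A$-ball $B$ of radius $\vartheta - 1$ centered at $x$ does not intersect $A$. Since the $\|\cdot\|_2$-ball of radius $\rho(\vartheta-1)$ centered at $x$ is contained in $B$, this ball does not intersect $A$ either. Thus,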

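$$x \notin \vartheta A \;\Rightarrow\; \mathrm{dist}_{\|\cdot\|_2}(x,A) \equiv \mathrm{dist}_{\|\cdot\|_2}(x,\mathrm{Conv}(A)) > \rho(\vartheta-1). \tag{82}$$

D. Assume that (i) is the case. Combining (81) with (82), we arrive at
$$\mathrm{Prob}\{\xi \notin \vartheta A\} \le \mu^{-1}\exp\{-\rho^2(\vartheta-1)^2/4\}$$
with $\rho$ given by (78), as required in (76) (note that $\mu > \nu > 1/2$).

E. Now assume that (ii) is the case. Let $L > 0$, let $\Xi$ be the event $\{\xi : \|\xi\|_\infty \le L/2\}$, and let $p$ be the probability of the event $\{\xi \notin \Xi\}$. We have
$$p \le \sum_{i=1}^n \mathrm{Prob}\{|\xi_i| > L/2\} \le \sum_{i=1}^n \mathbf{E}\{\exp\{4\xi_i^2 - L^2\}\} \le n\exp\{1 - L^2\}. \tag{83}$$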

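Applying (81) to the conditional distribution of $\xi$ given $\xi \in \Xi$ (which is again a distribution with independent coordinates, now taking values in $[-L/2,L/2]$), we get
$$\mathbf{E}\Big\{\exp\Big\{\frac{\mathrm{dist}^2_{\|\cdot\|_2}(\xi,A)}{4L^2}\Big\}\,\Big|\,\Xi\Big\} \le \frac{1}{\mathrm{Prob}\{\xi\in A\mid\Xi\}} \le \frac{1-p}{\mu-p}
\;\Rightarrow\;
\mathbf{E}\Big\{\exp\Big\{\frac{\mathrm{dist}^2_{\|\cdot\|_2}(\xi,A)}{4L^2}\Big\}\chi_{\xi\in\Xi}\Big\} \le \frac{(1-p)^2}{\mu-p},$$
whence, by (82),
$$\begin{aligned}
\mathrm{Prob}\{\xi\in\Xi\ \&\ \xi\notin\vartheta A\} &\le \mathrm{Prob}\Big\{\xi\in\Xi\ \&\ \frac{\mathrm{dist}^2_{\|\cdot\|_2}(\xi,A)}{4L^2} \ge \frac{\rho^2(\vartheta-1)^2}{4L^2}\Big\}\\
&\le \exp\Big\{-\frac{\rho^2(\vartheta-1)^2}{4L^2}\Big\}\,\mathbf{E}\Big\{\exp\Big\{\frac{\mathrm{dist}^2_{\|\cdot\|_2}(\xi,A)}{4L^2}\Big\}\chi_{\xi\in\Xi}\Big\}\\
&\le \exp\Big\{-\frac{\sigma^2(\vartheta-1)^2}{8L^2}\Big\}\frac{(1-p)^2}{\mu-p},
\end{aligned}$$
and therefore
$$\mathrm{Prob}\{\xi\notin\vartheta A\} \le p + \exp\Big\{-\frac{\sigma^2(\vartheta-1)^2}{8L^2}\Big\}\frac{(1-p)^2}{\mu-p}.$$
Assuming $\sigma^2(\vartheta-1)^2 \ge 4$, let $L^2 = \ln n + \sigma(\vartheta-1)$. With this $L$, the resulting bound combines with (83) to imply (77).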
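The bounds of Proposition 5.1 can be evaluated against an exactly computable example. The instance is a hypothetical choice. Take $\xi_i = \pm1/2$, so both (i) and (ii) hold with $\sigma^2 = 1/4$. Take $A = \{x : |\sum_i x_i| \le R\}$, a closed convex slab symmetric w.r.t. the origin. Then $\mathrm{Prob}\{\xi \notin \vartheta A\}$ is a binomial tail. The sketch checks premise (75) and then compares this tail with (76) and with the explicit item-E bound $p + \exp\{-\sigma^2(\vartheta-1)^2/(8L^2)\}(1-p)^2/(\mu-p)$ for $L^2 = \ln n + \sigma(\vartheta-1)$. With that $L$, (83) gives $p \le e\cdot\exp\{-\sigma(\vartheta-1)\}$.

```python
import math


def slab_tail(n, radius):
    """Exact Prob{|sum_i xi_i| > radius} for i.i.d. xi_i = +-1/2 (hypothetical instance)."""
    total = 0
    for k in range(n + 1):                 # k = number of +1/2 entries, so sum = k - n/2
        if abs(k - n / 2) > radius:
            total += math.comb(n, k)
    return total / 2**n


def bound_76(vartheta, sigma2):
    """Right-hand side of (76), case (i)."""
    return 2 * math.exp(-(vartheta - 1) ** 2 * sigma2 / 8)


def bound_item_e(n, vartheta, sigma, mu):
    """Explicit bound from item E with L^2 = ln n + sigma (vartheta - 1)."""
    L2 = math.log(n) + sigma * (vartheta - 1)
    p = min(1.0, n * math.exp(1 - L2))     # (83); equals e * exp(-sigma (vartheta - 1)) when < 1
    if mu <= p:
        return 1.0                         # the bound is vacuous in this regime
    return p + math.exp(-sigma**2 * (vartheta - 1) ** 2 / (8 * L2)) * (1 - p) ** 2 / (mu - p)


n, radius, sigma2 = 400, 25.0, 0.25
sigma = math.sqrt(sigma2)
nu = 1 - sigma2**2 / (12 * sigma2**2 + 1)
mu = 1 - slab_tail(n, radius)             # Prob{xi in A}
assert mu > nu, "premise (75) must hold for this instance"
for vartheta in (2.0, 10.0, 40.0):
    print(vartheta, slab_tail(n, vartheta * radius),
          bound_76(vartheta, sigma2), bound_item_e(n, vartheta, sigma, mu))
```

The bounds are loose for this instance. Their value is that they hold uniformly over all symmetric convex $A$ satisfying (75).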

Proposition 5.1 describes a family of probability distributions $P$ on $\mathbb{R}^n$ with the following common property: for every closed convex set $A$ symmetric w.r.t. the origin, the "probability mass" $P(\gamma A)$ of the $\gamma$-enlargement of $A$ rapidly approaches 1 as $\gamma$ grows, provided that $P(A)$ is not small. A shortcoming of this representation of the phenomenon is that what is and what is not small depends on the parameter $\sigma$: for small values of $\sigma$, "not small" actually means "close to 1" (by (75), $P(A)$ must exceed $1-\sigma^4/(12\sigma^4+1)$). We are about to present a modified version of Proposition 5.1 in which every fixed positive value of $P(A)$ "is not small".

Proposition 5.2 Let $\xi_t$ be independent and symmetrically distributed random reals, and let the distributions of $\xi_t$ possess densities $p_t(\cdot)$ such that
$$p_t(\cdot) \le \frac{1}{3\sqrt{3}\,\sigma} \tag{84}$$
and either (i) $|\xi_t| \le 1/2$, $t = 1,\ldots,n$, or (ii) $\mathbf{E}\{\exp\{4\xi_t^2\}\} \le \exp\{1\}$, $t = 1,\ldots,n$, and let $\xi = (\xi_1,\ldots,\xi_n)$. Let, further, $A$ be a closed convex set in $\mathbb{R}^n$, symmetric w.r.t. the origin, and let
$$\mu \equiv \mathrm{Prob}\{\xi \in A\} > 0. \tag{85}$$
Then for every $\vartheta > 1$ one has, in the case of (i):
$$\mathrm{Prob}\{\xi\notin\vartheta A\} \le \frac1\mu\exp\Big\{-\frac{\mu^6\sigma^6(\vartheta-1)^2}{256}\Big\}, \tag{86}$$
and in the case of (ii):
$$\mathrm{Prob}\{\xi\notin\vartheta A\} \le \mu^{-1}\exp\Big\{-O(1)\frac{\mu^6\sigma^6(\vartheta-1)^2}{\mu^3\sigma^3(\vartheta-1)+\ln(n/\mu)}\Big\} \tag{87}$$

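with a properly chosen positive absolute constant $O(1)$.

Proof. A. We start with the following

Lemma 6 Let $a \in \mathbb{R}^n$, $\|a\|_2 = 1$, and let $\zeta = a^T\xi$. Then
$$\rho\in(0,1/2] \;\Rightarrow\; \mathrm{Prob}\{|\zeta|\le\rho\} \le 2\rho^{1/3}\sigma^{-1}. \tag{88}$$

Proof. Observe, first, that
$$\sigma_t^2 \equiv \mathbf{E}\{\xi_t^2\} \ge \sigma^2. \tag{89}$$
Indeed, for every $\delta \le \frac{3\sqrt3\sigma}{2}$ we have, by (84), $\mathrm{Prob}\{|\xi_t| > \delta\} \ge 1 - \frac{2\delta}{3\sqrt3\sigma}$, whence $\sigma_t^2 \ge \delta^2\big(1 - \frac{2\delta}{3\sqrt3\sigma}\big)$. Maximizing over $\delta$ (the maximum is attained at $\delta = \sqrt3\sigma$), we arrive at (89).

Since the $p_t(\cdot)$ are even and $\mathbf{E}\{\exp\{4\xi_t^2\}\} \le \exp\{1\}$, the characteristic functions $\phi_t(y) = \mathbf{E}\{\exp\{\imath\xi_t y\}\}$ are real-valued, even and $C^\infty$; besides this, $\phi_t''(0) = -\sigma_t^2 \le -\sigma^2$ and $|\phi_t^{(4)}(y)| \le \mathbf{E}\{\xi_t^4\} \le \frac18[\mathbf{E}\{\exp\{4\xi_t^2\}\} - 1] \le \frac{e-1}{8}$. Consequently, for $|y| \le 5\sigma$ the remainder in the third-order Taylor expansion of $\phi_t(y)$ at the origin does not exceed $\frac{1}{24}\cdot\frac{e-1}{8}y^4 \le \sigma^2y^2/4$, whence
$$|y|\le5\sigma \;\Rightarrow\; \phi_t(y) \le 1 - \sigma_t^2y^2/2 + \sigma^2y^2/4 \le 1 - \sigma^2y^2/4 \le \exp\{-\sigma^2y^2/4\}. \tag{90}$$
Now let $\alpha = \max_t|a_t|$ and let $\phi(y) = \prod_{t=1}^n\phi_t(a_ty)$ be the characteristic function of $\zeta$. By (90), we have
$$|y|\le5\sigma\alpha^{-1} \;\Rightarrow\; \phi(y) \le \prod_{t=1}^n\exp\{-a_t^2\sigma^2y^2/4\} = \exp\{-\sigma^2\|a\|_2^2y^2/4\}. \tag{91}$$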
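Two numerical facts behind (89) and (90) can be confirmed on a grid. (a) The maximum of $\delta^2(1 - 2\delta/(3\sqrt3\sigma))$ over $0 \le \delta \le 3\sqrt3\sigma/2$ equals $\sigma^2$ and is attained at $\delta = \sqrt3\sigma$. (b) $\frac{1}{24}\cdot\frac{e-1}{8}y^4 \le \sigma^2y^2/4$ for $|y| \le 5\sigma$, which reduces to $25(e-1) \le 48$. The values of $\sigma$ used below are arbitrary illustrations.

```python
import numpy as np


def variance_lower_bound(sigma, n_grid=200_001):
    """Maximize delta^2 (1 - 2 delta / (3 sqrt(3) sigma)) over 0 <= delta <= 3 sqrt(3) sigma / 2."""
    c = 3 * np.sqrt(3) * sigma
    delta = np.linspace(0.0, c / 2, n_grid)
    values = delta**2 * (1 - 2 * delta / c)
    i = int(np.argmax(values))
    return float(values[i]), float(delta[i])


def taylor_remainder_ok(sigma, n_grid=10_001):
    """Check (1/24) * ((e - 1)/8) * y^4 <= sigma^2 y^2 / 4 for |y| <= 5 sigma."""
    y = np.linspace(-5 * sigma, 5 * sigma, n_grid)
    return bool(np.all((np.e - 1) / 192 * y**4 <= sigma**2 * y**2 / 4 + 1e-15))


sigma = 0.3
best_value, best_delta = variance_lower_bound(sigma)
print(best_value, sigma**2, best_delta, np.sqrt(3) * sigma)
print(taylor_remainder_ok(sigma), 25 * (np.e - 1) <= 48)
```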

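Now let $0 < \rho \le 1/2$. Consider the function $h(x) = \frac{1}{\sqrt{2\rho}}\chi_{|x|\le2\rho}$ along with the function $g = h*h$ ($*$ stands for convolution). The function $g$ clearly is nonnegative and $g(x) \ge 1$ when $|x| \le \rho$. Observe that the Fourier transform of $g$ is the function $\frac{2\sin^2(\rho y)}{\rho y^2} \in [0,1]$. Denoting by $p(\cdot)$ the density of $\zeta$, we have
$$\begin{aligned}
\mathrm{Prob}\{|\zeta|\le\rho\} &\le \int p(x)g(x)\,dx = \frac{1}{2\pi}\int\phi(y)\frac{2\sin^2(\rho y)}{\rho y^2}\,dy\\
&\le \frac{1}{2\pi}\int_{|y|\le5\sigma\alpha^{-1}}\phi(y)\frac{2\sin^2(\rho y)}{\rho y^2}\,dy + \frac1\pi\int_{5\sigma\alpha^{-1}}^{\infty}\frac{2}{\rho y^2}\,dy\\
&\le \frac{1}{2\pi}\int\exp\{-\sigma^2y^2/4\}\frac{2\sin^2(\rho y)}{\rho y^2}\,dy + \frac{2\alpha}{5\pi\rho\sigma}\\
&= \frac1\pi\int\exp\Big\{-\frac{\sigma^2z^2}{4\rho^2}\Big\}\frac{\sin^2(z)}{z^2}\,dz + \frac{2\alpha}{5\pi\rho\sigma}\\
&\le \frac1\pi\int\exp\Big\{-\frac{\sigma^2z^2}{4\rho^2}\Big\}\,dz + \frac{2\alpha}{5\pi\rho\sigma}
\le \frac{4\rho}{\sqrt{2\pi}\,\sigma} + \frac{2\alpha}{5\sigma\rho}.
\end{aligned}$$
Besides this, the uniform norm of the density of $\zeta$ clearly does not exceed the minimum, over $t$, of the uniform norms of the densities of $a_t\xi_t$, that is, by (84), it does not exceed $\frac{1}{3\sqrt3\sigma\alpha}$. We conclude that
$$\mathrm{Prob}\{|\zeta|\le\rho\} \le \min\Big[\frac{2\rho}{3\sqrt3\sigma\alpha},\ \frac{4\rho}{\sqrt{2\pi}\,\sigma} + \frac{2\alpha}{5\sigma\rho}\Big]. \tag{92}$$
In the case of $\alpha \ge \rho^{2/3}$, (92) yields
$$\mathrm{Prob}\{|\zeta|\le\rho\} \le \frac{2\rho^{1/3}}{3\sqrt3\,\sigma}.$$
Now let $\alpha < \rho^{2/3}$. Invoking (92) with $\rho^{1/3}$ in the role of $\rho$, we have
$$\mathrm{Prob}\{|\zeta|\le\rho\} \le \mathrm{Prob}\{|\zeta|\le\rho^{1/3}\} \le \frac1\sigma\Big[\frac{4\rho^{1/3}}{\sqrt{2\pi}} + \frac{2\alpha}{5\rho^{1/3}}\Big] \le \frac{\rho^{1/3}}{\sigma}\Big[\frac{4}{\sqrt{2\pi}} + \frac25\Big] \le 2\rho^{1/3}\sigma^{-1}.$$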

Thus, in all cases $\mathrm{Prob}\{|\zeta|\le\rho\} \le 2\rho^{1/3}\sigma^{-1}$, as claimed.

B. We now claim that under the premise of Proposition 5.2, $A$ contains the $\|\cdot\|_2$-ball of radius $\rho = (\mu\sigma/2)^3$ centered at the origin. Indeed, otherwise, same as in the proof of Lemma 5, we could find a vector $p$, $\|p\|_2 = 1$, and $\rho' < \rho$ such that $A$ is contained in the strip $\{x : |p^Tx| < \rho'\}$, that is, with $\zeta = p^T\xi$ one has $\mathrm{Prob}\{|\zeta| < \rho'\} \ge \mathrm{Prob}\{\xi \in A\} = \mu$. On the other hand, $\rho' < \rho \le 1/2$, where the latter inequality follows from the fact that $\sigma \le 1$ (indeed, $\sigma^2 \le \sigma_t^2$ by (89), while $\sigma_t^2 = \mathbf{E}\{\xi_t^2\} \le \mathbf{E}\{\frac14[\exp\{4\xi_t^2\} - 1]\} \le \frac{e-1}{4}$). Applying Lemma 6, we get $\mathrm{Prob}\{|\zeta| < \rho'\} \le 2(\rho')^{1/3}\sigma^{-1} < \mu$, which is a contradiction.

C. Now we can complete the proof in exactly the same way as in the case of Proposition 5.1. Specifically, same as in item C of the latter proof, relation (82) with $\rho$ given by B holds true. In the case of (i) this observation combines with the Talagrand Inequality (81) to yield the relation
$$\mathrm{Prob}\{\xi\notin\vartheta A\} \le \frac1\mu\exp\Big\{-\frac{(\vartheta-1)^2\rho^2}{4}\Big\} = \frac1\mu\exp\Big\{-\frac{\mu^6\sigma^6(\vartheta-1)^2}{256}\Big\},$$
as required in (86). In the case of (ii), the same reasoning as in item E of the proof of Proposition 5.1 results in (87).
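The case analysis after (92) and the constants in item B and in (86) can be checked numerically. Over a grid of $(\rho,\alpha)$, the sketch evaluates the bound the proof actually uses in each case. For $\alpha \ge \rho^{2/3}$ that is the first term of (92) at radius $\rho$. Otherwise it is the second term of (92) at radius $\rho^{1/3}$, via $\mathrm{Prob}\{|\zeta|\le\rho\}\le\mathrm{Prob}\{|\zeta|\le\rho^{1/3}\}$. The sketch then checks that this never exceeds $2\rho^{1/3}/\sigma$. It also checks that $\rho = (\mu\sigma/2)^3$ makes the right-hand side of (88) equal to $\mu$, and that $\rho^2/4 = \mu^6\sigma^6/256$. The grid ranges and the values of $\sigma$ and $\mu$ are arbitrary.

```python
import numpy as np


def bound_92_first(rho, alpha, sigma):
    """First term of (92): density bound 1/(3 sqrt(3) sigma alpha) times interval length 2 rho."""
    return 2 * rho / (3 * np.sqrt(3) * sigma * alpha)


def bound_92_second(rho, alpha, sigma):
    """Second term of (92): the Fourier-analytic bound."""
    return 4 * rho / (np.sqrt(2 * np.pi) * sigma) + 2 * alpha / (5 * sigma * rho)


def lemma6_bound(rho, alpha, sigma):
    """Bound used in the proof of Lemma 6 for Prob{|zeta| <= rho}."""
    if alpha >= rho ** (2 / 3):
        return bound_92_first(rho, alpha, sigma)
    return bound_92_second(rho ** (1 / 3), alpha, sigma)   # monotonicity in the radius


sigma = 0.4
worst_ratio = 0.0
for rho in np.linspace(1e-4, 0.5, 200):
    for alpha in np.linspace(1e-3, 1.0, 200):
        ratio = lemma6_bound(rho, alpha, sigma) / (2 * rho ** (1 / 3) / sigma)
        worst_ratio = max(worst_ratio, ratio)
print("max of bound / (2 rho^(1/3) / sigma):", worst_ratio)

mu = 0.3
rho_b = (mu * sigma / 2) ** 3                  # radius from item B
print(2 * rho_b ** (1 / 3) / sigma, mu)        # (88) at rho_b equals mu
print(rho_b**2 / 4, mu**6 * sigma**6 / 256)    # exponent constant in (86)
```

Analytically, the worst ratio is below $2/\sqrt{2\pi} + 1/5 < 1$. The grid only confirms this.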

5.2

Semi-scalar version of Theorem 3.1

Theorem 5.1 Let $f_t$ be deterministic vectors from a normed finite-dimensional space $(E,\|\cdot\|)$ and let $\xi_t$ be independent symmetrically distributed random scalars such that $\mathbf{E}\{\xi_t^2\} \ge \sigma^2 > 0$, $t = 1,\ldots,n$, and either (i) $|\xi_t| \le \frac12$, $t = 1,\ldots,n$, or (ii) $\mathbf{E}\{\exp\{4\xi_t^2\}\} \le \exp\{1\}$, $t = 1,\ldots,n$. Assume that
$$\mathbf{E}\Big\{\Big\|\sum_{t=1}^n\xi_tf_t\Big\|^2\Big\} < \Theta^2. \tag{93}$$
Then in the case of (i):
$$\mathrm{Prob}\Big\{\Big\|\sum_{t=1}^n\xi_tf_t\Big\| \ge \Omega\Theta\Big\} \le O(1)\exp\{-O(1)\sigma^6\Omega^2\} \tag{94}$$
with appropriate positive absolute constants $O(1)$; in the case of (ii):
$$\mathrm{Prob}\Big\{\Big\|\sum_{t=1}^n\xi_tf_t\Big\| \ge \Omega\Theta\Big\} \le O(1)\exp\Big\{-O(1)\frac{\sigma^6\Omega^2}{\ln n + \sigma^3\Omega}\Big\}. \tag{95}$$
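A Monte Carlo illustration of the setting of Theorem 5.1 follows. The choices are hypothetical and not from the text: $E = \mathbb{R}^d$ with the $\ell_1$ norm, random fixed vectors $f_t$, and $\xi_t = \pm1/2$, so (i) holds with $\sigma^2 = 1/4$. $\Theta^2$ is set slightly above the empirical second moment, so (93) holds for the empirical distribution. The constants $O(1)$ in (94) are unspecified, so the sketch can only display the decay of the tail frequencies in $\Omega$. It also checks the Tschebyshev step used in the proof.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, n_samples = 50, 20, 50_000
f = rng.standard_normal((n, d))                      # hypothetical fixed vectors f_t (rows)
xi = rng.choice([-0.5, 0.5], size=(n_samples, n))    # case (i): |xi_t| <= 1/2, sigma^2 = 1/4
norms = np.abs(xi @ f).sum(axis=1)                   # ||sum_t xi_t f_t||_1, one per sample
Theta = np.sqrt(1.01 * np.mean(norms**2))            # (93) holds for the empirical law

omegas = (1.0, 1.5, 2.0, 2.5)
tails = [float(np.mean(norms >= w * Theta)) for w in omegas]
for w, freq in zip(omegas, tails):
    print(f"Omega={w}: empirical Prob{{||S|| >= Omega*Theta}} = {freq:.2e}")

# Tschebyshev step of the proof: the frequency of ||S|| > r*Theta stays below 1/r^2.
sigma2 = 0.25
r = np.sqrt((12 * sigma2**2 + 1) / sigma2**2)
print(float(np.mean(norms > r * Theta)), 1 / r**2)
```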

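Proof. Let $\xi = (\xi_1,\ldots,\xi_n)$, $B = \{y\in E : \|y\| \le r\Theta\}$ and $A = \{s\in\mathbb{R}^n : \sum_{t=1}^n s_tf_t \in B\}$. Then $A$ is a closed convex set, symmetric w.r.t. the origin, such that $\mathrm{Prob}\{\xi\in A\} > 1 - r^{-2}$ by (93) and the Tschebyshev inequality. Setting $r^2 = \frac{12\sigma^4+1}{\sigma^4}$, we get $\mathrm{Prob}\{\xi\in A\} > \nu \equiv 1 - \frac{\sigma^4}{12\sigma^4+1}$. Applying Proposition 5.1, we get

• in the case of (i):
$$\mathrm{Prob}\Big\{\Big\|\sum_{t=1}^n\xi_tf_t\Big\| > \Omega\Theta\Big\} = \mathrm{Prob}\{\xi\notin\Omega r^{-1}A\} \le O(1)\exp\{-O(1)\Omega^2r^{-2}\sigma^2\} \le O(1)\exp\{-O(1)\sigma^6\Omega^2\};$$

• in the case of (ii):
$$\mathrm{Prob}\Big\{\Big\|\sum_{t=1}^n\xi_tf_t\Big\| > \Omega\Theta\Big\} = \mathrm{Prob}\{\xi\notin\Omega r^{-1}A\} \le O(1)\exp\Big\{-O(1)\frac{\Omega^2r^{-2}\sigma^2}{\ln n + \Omega r^{-1}\sigma}\Big\} \le O(1)\exp\Big\{-O(1)\frac{\sigma^6\Omega^2}{\ln n + \sigma^3\Omega}\Big\}.$$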
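The last inequality in each bullet uses only the size of $r$. A worked version of that step follows. It assumes, as in the proof, that $\sigma \le 1$. It also takes $\vartheta = \Omega/r \ge 2$, so that $\vartheta - 1 \ge \vartheta/2$. Smaller $\Omega$ are absorbed into the constants $O(1)$.

```latex
\begin{aligned}
r^{-2} &= \frac{\sigma^4}{12\sigma^4+1} \ge \frac{\sigma^4}{13},
\qquad
r^{-1} = \frac{\sigma^2}{\sqrt{12\sigma^4+1}} \le \sigma^2,\\
\text{(i):}\quad \sigma^2(\vartheta-1)^2 &\ge \frac{\sigma^2\Omega^2 r^{-2}}{4} \ge \frac{\sigma^6\Omega^2}{52},\\
\text{(ii):}\quad \frac{\sigma^2(\vartheta-1)^2}{\sigma(\vartheta-1)+\ln n}
&\ge \frac{\sigma^2\Omega^2 r^{-2}/4}{\sigma\Omega r^{-1}+\ln n}
\ge \frac{\sigma^6\Omega^2/52}{\sigma^3\Omega+\ln n}.
\end{aligned}
```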

References

[1] Borell, C., "The Brunn-Minkowski inequality in Gauss space", Inventiones Mathematicae 30(2) (1975), 207-216.

[2] Johnson, W.B., Schechtman, G., "Remarks on Talagrand's deviation inequality for Rademacher functions", Banach Archive 2/16/90, Springer Lecture Notes 1470 (1991), 72-77.

[3] Nemirovski, A., "On tractable approximations of randomly perturbed convex constraints", Proceedings of the 42nd IEEE Conference on Decision and Control, Maui, Hawaii, USA, December 2003, 2419-2422.

6

Appendix: Proof of Proposition 2.1

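Let $\{f_k(t)\}$ be a sequence of polynomials converging to $f$, along with the first and the second derivatives, uniformly on every compact subset of $\Delta$. For a polynomial $p(t) = \sum_{j=0}^N p_jt^j$ the function $P(X) = \mathrm{Tr}(\sum_j p_jX^j)$ is a polynomial on $\mathbf{S}^n$. Let now $X, H \in \mathbf{S}^n$, let $\lambda_s = \lambda_s(X)$ be the eigenvalues of $X$, let $X = U\mathrm{Diag}\{\lambda\}U^T$ be the eigenvalue decomposition of $X$, and let $\widehat H$ be such that $H = U\widehat HU^T$. We have
$$\begin{aligned}
&(a)\quad P(X) = \sum_{s=1}^n p(\lambda_s(X)),\\
&(b)\quad DP(X)[H] = \mathrm{Tr}\Big(\sum_{j=1}^N p_j\sum_{s=0}^{j-1}X^sHX^{j-s-1}\Big) = \mathrm{Tr}(p'(X)H) = \sum_{s=1}^n p'(\lambda_s(X))\widehat H_{ss}.
\end{aligned}\tag{96}$$
Further, let $\gamma$ be a closed contour in the complex plane encircling all the eigenvalues of $X$. Then
$$\begin{aligned}
DP(X)[H] &= \mathrm{Tr}(p'(X)H) = \frac{1}{2\pi\imath}\oint_\gamma p'(z)\,\mathrm{Tr}((zI-X)^{-1}H)\,dz\\
\Rightarrow\ D^2P(X)[H,H] &= \frac{1}{2\pi\imath}\oint_\gamma p'(z)\,\mathrm{Tr}((zI-X)^{-1}H(zI-X)^{-1}H)\,dz = \frac{1}{2\pi\imath}\oint_\gamma\sum_{s,t=1}^n\frac{\widehat H_{st}^2\,p'(z)}{(z-\lambda_s)(z-\lambda_t)}\,dz.
\end{aligned}$$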
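The contour representation $DP(X)[H] = \mathrm{Tr}(p'(X)H) = \frac{1}{2\pi\imath}\oint_\gamma p'(z)\,\mathrm{Tr}((zI-X)^{-1}H)\,dz$ can be checked numerically on a small instance. The sketch takes $\gamma$ to be a circle that strictly encloses the spectrum of $X$. It applies the trapezoidal rule, which converges geometrically for periodic analytic integrands like this one. The polynomial $p$, the matrix size and the random symmetric $X, H$ are arbitrary illustrations.

```python
import numpy as np
from numpy.polynomial import Polynomial


def contour_derivative(p, X, H, n_nodes=400):
    """(1/(2 pi i)) * contour integral of p'(z) Tr((zI - X)^{-1} H) over a circle around spec(X)."""
    lam = np.linalg.eigvalsh(X)
    center = 0.5 * (lam.min() + lam.max())
    radius = 0.5 * (lam.max() - lam.min()) + 1.0     # strictly encloses all eigenvalues
    dp = p.deriv()
    I = np.eye(X.shape[0])
    total = 0.0 + 0.0j
    for k in range(n_nodes):
        w = np.exp(2j * np.pi * k / n_nodes)
        z = center + radius * w
        dz = 2j * np.pi * radius * w / n_nodes         # z'(t) dt for z(t) = c + R e^{2 pi i t}
        total += dp(z) * np.trace(np.linalg.solve(z * I - X, H)) * dz
    return float((total / (2j * np.pi)).real)


def trace_derivative(p, X, H):
    """Tr(p'(X) H) = sum_s p'(lambda_s) H_hat[s, s], with H_hat = U^T H U."""
    lam, U = np.linalg.eigh(X)
    H_hat = U.T @ H @ U
    return float(np.sum(p.deriv()(lam) * np.diag(H_hat)))


rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5)); X = (A + A.T) / 2
B = rng.standard_normal((5, 5)); H = (B + B.T) / 2
p = Polynomial([1.0, -2.0, 0.5, 0.3, 0.1])     # p(t) = 1 - 2t + 0.5t^2 + 0.3t^3 + 0.1t^4
print(contour_derivative(p, X, H), trace_derivative(p, X, H))
```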

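Computing the residues, we get
$$D^2P(X)[H,H] = \sum_{s,t}\Gamma_{s,t}[p]\,\widehat H_{st}^2,\qquad
\Gamma_{s,t}[p] = \begin{cases}\dfrac{p'(\lambda_s)-p'(\lambda_t)}{\lambda_s-\lambda_t}, & \lambda_s \ne \lambda_t,\\[6pt] p''(\lambda_s), & \lambda_s = \lambda_t.\end{cases} \tag{97}$$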

Substituting $p = f_k$ into (96.a, b) and (97), we see that the sequence of polynomials $F_k(X) = \mathrm{Tr}(f_k(X))$ converges, along with the first and the second order derivatives, uniformly on compact subsets of $\mathcal{X}_n(\Delta)$; by (96.a), the limiting function is exactly $F(X)$. We conclude that $F(X)$ is $C^2$ on $\mathcal{X}_n(\Delta)$ and that the first and the second derivatives of this function are the limits, as $k\to\infty$, of the corresponding derivatives of $F_k(X)$, so that for $X = U\mathrm{Diag}\{\lambda\}U^T \in \mathcal{X}_n(\Delta)$ (where $U$ is orthogonal) and every $H = U\widehat HU^T \in \mathbf{S}^n$ we have
$$\begin{aligned}
DF(X)[H] &= \sum_s f'(\lambda_s)\widehat H_{ss} = \mathrm{Tr}(f'(X)H),\\
D^2F(X)[H,H] &= \sum_{s,t}\Gamma_{s,t}[f]\,\widehat H_{st}^2.
\end{aligned}\tag{98}$$
So far, we did not use (12). Invoking the right inequality in (12), we get
$$\begin{aligned}
D^2F(X)[H,H] &\le \sum_{s,t}\Big[\theta_+\frac{f''(\lambda_s)+f''(\lambda_t)}{2} + \mu_+\Big]\widehat H_{st}^2 = \theta_+\sum_s f''(\lambda_s)\sum_t\widehat H_{st}^2 + \mu_+\sum_{s,t}\widehat H_{st}^2\\
&= \theta_+\mathrm{Tr}(\mathrm{Diag}\{f''(\lambda_1),\ldots,f''(\lambda_n)\}\widehat H^2) + \mu_+\mathrm{Tr}(\widehat H^2) = \theta_+\mathrm{Tr}(f''(X)H^2) + \mu_+\mathrm{Tr}(H^2),
\end{aligned}$$
which is the right inequality in (13). The derivation of the left inequality in (13) is similar.
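Formula (98) can be checked against finite differences for a concrete smooth $f$. The sketch uses $f = \exp$ (so $f' = f'' = \exp$) as an arbitrary example. It builds $\Gamma_{s,t}[f]$ as in (97) and compares $\sum_{s,t}\Gamma_{s,t}[f]\widehat H_{st}^2$ with a central second difference of $F(X) = \mathrm{Tr}(f(X))$ along $H$. Random $X$ has distinct eigenvalues with probability 1. The equal-eigenvalue branch of (97) is handled anyway with a tolerance, since it is always used on the diagonal $s = t$.

```python
import numpy as np


def F(X, f=np.exp):
    """F(X) = Tr(f(X)) = sum_s f(lambda_s(X)) for symmetric X."""
    return float(np.sum(f(np.linalg.eigvalsh(X))))


def gamma_matrix(lam, df, d2f, tol=1e-12):
    """Gamma_{s,t}[f] from (97): divided difference of f', or f'' on (near-)ties."""
    L_s, L_t = np.meshgrid(lam, lam, indexing="ij")
    diff = L_s - L_t
    tie = np.abs(diff) <= tol
    safe = np.where(tie, 1.0, diff)              # avoid division by zero on ties
    return np.where(tie, d2f(L_s), (df(L_s) - df(L_t)) / safe)


def second_derivative_98(X, H, df=np.exp, d2f=np.exp):
    """D^2 F(X)[H, H] = sum_{s,t} Gamma_{s,t}[f] H_hat[s, t]^2, with H_hat = U^T H U."""
    lam, U = np.linalg.eigh(X)
    H_hat = U.T @ H @ U
    return float(np.sum(gamma_matrix(lam, df, d2f) * H_hat**2))


rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4)); X = (A + A.T) / 2
B = rng.standard_normal((4, 4)); H = (B + B.T) / 2
h = 1e-3
fd2 = (F(X + h * H) - 2 * F(X) + F(X - h * H)) / h**2
print(second_derivative_98(X, H), fd2)
```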