An extension of McDiarmid's inequality

Richard Combes*

arXiv:1511.05240v1 [cs.LG] 17 Nov 2015

Abstract: We derive an extension of McDiarmid's inequality for functions $f$ with bounded differences on a high probability set $\mathcal{Y}$ (instead of almost surely). The behavior of $f$ outside $\mathcal{Y}$ may be arbitrary. The proof is short and elementary, and relies on an extension argument similar to Kirszbraun's theorem [4].

Keywords: Concentration inequalities; McDiarmid's inequality; Bounded differences.

AMS MSC 2010: NA.

1 Introduction

Consider sets $\mathcal{X}_1, \dots, \mathcal{X}_n$ and their product $\mathcal{X} = \prod_{i=1}^n \mathcal{X}_i$. Consider independent random variables $X_1, \dots, X_n$ with $X_i \in \mathcal{X}_i$. Define the corresponding vector $X = (X_1, \dots, X_n) \in \mathcal{X}$, and a function $f : \mathcal{X} \to \mathbb{R}$ with expectation $\mu = \mathbb{E}[f(X)]$. For each vector $c \in (\mathbb{R}_+)^n$, define the sum of its components $\bar{c} = \sum_{i=1}^n c_i$ and the distance $d_c(x, y) = \sum_{i=1}^n c_i \mathbf{1}\{x_i \neq y_i\}$. We say that $f$ has $c$-bounded differences on $\mathcal{X}$ if and only if $|f(x) - f(y)| \le d_c(x, y)$ for all $(x, y) \in \mathcal{X}^2$. Equivalently, $f$ has $c$-bounded differences if and only if $|f(x) - f(y)| \le c_i$ for all $(x, y) \in \mathcal{X}^2$ such that $x_j = y_j$ for all $j \neq i$. McDiarmid's inequality states that if $f$ has $c$-bounded differences then $f(X)$ concentrates around its expected value.

Proposition 1.1 (McDiarmid, [7]). If $f$ has $c$-bounded differences on $\mathcal{X}$, then for all $\epsilon \ge 0$:

$$P[f(X) \ge \epsilon + \mu] \le \exp\left(-\frac{2\epsilon^2}{\sum_{i=1}^n c_i^2}\right).$$
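For intuition, the bound is straightforward to evaluate numerically. Below is a minimal Python sketch (the function name and example are ours, not from the paper):

```python
import math

def mcdiarmid_bound(eps, c):
    """Upper bound on P[f(X) >= eps + mu] for a c-bounded-differences f."""
    return math.exp(-2.0 * eps**2 / sum(ci**2 for ci in c))

# Example: the empirical mean of n variables in [0, 1] has c_i = 1/n,
# so sum c_i^2 = 1/n and the bound decays exponentially in n.
n = 100
print(mcdiarmid_bound(0.2, [1.0 / n] * n))  # exp(-2 * 0.2**2 * 100) = exp(-8)
```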

In the present work we consider the case where the bounded differences property only holds on a subset $\mathcal{Y} \subset \mathcal{X}$. Typically $\mathcal{Y}$ will be such that $X \in \mathcal{Y}$ with high probability, so that $f$ has "bounded differences with high probability". The behavior of $f$ outside of $\mathcal{Y}$ can be arbitrary. We define $p = 1 - P[X \in \mathcal{Y}]$, the probability of the complement of $\mathcal{Y}$, and $m = \mathbb{E}[f(X) \mid X \in \mathcal{Y}]$, the expectation of $f$ conditional on $X \in \mathcal{Y}$.

Assumption 1.2. There exists $c \in (\mathbb{R}_+)^n$ such that $|f(x) - f(y)| \le d_c(x, y)$ for all $(x, y) \in \mathcal{Y}^2$.

It should first be noted that, in general, under Assumption 1.2, $f(X)$ does not concentrate around its expected value $\mu$. Consider the following elementary counter-example: $\mathcal{X} = \{0, 1\}^n$, $X_i \sim \mathrm{Ber}(1/2)$, $\mathcal{Y} = \mathcal{X} \setminus \{(0, \dots, 0)\}$, and $f(X) = 2^n \mathbf{1}\{X \not\in \mathcal{Y}\}$. Then $f$ satisfies Assumption 1.2 with $c = (0, \dots, 0)$. We have $p = 2^{-n}$, $\mu = 1$, $m = 0$ and $P[f(X) = 0] = 1 - 2^{-n}$, so that concentration around $\mu$ does not occur. This simple example suggests that $f(X)$ should instead concentrate around its conditional expectation $m$, which is in fact correct. It would be tempting to upper bound $P[f(X) \ge \epsilon + m, X \not\in \mathcal{Y}]$ by $p$, and then attempt to upper bound $P[f(X) \ge \epsilon + m \mid X \in \mathcal{Y}]$ using McDiarmid's inequality. However, unless $\mathcal{Y}$ has a very specific structure (e.g. a product set), this argument fails since, in general, $(X_1, \dots, X_n)$ are not independent conditional on $X \in \mathcal{Y}$.

* Centrale-Supelec / L2S, France. E-mail: [email protected]
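As a sanity check, the counter-example above is easy to simulate; the minimal Python sketch below (ours, for illustration) shows an empirical mean near $\mu = 1$ while the realized values concentrate at $m = 0$:

```python
import random

n = 10
def f(x):
    # f(X) = 2^n on the single excluded point (0, ..., 0), and 0 on Y
    return float(2**n) if not any(x) else 0.0

samples = [f([random.randint(0, 1) for _ in range(n)]) for _ in range(200_000)]
print(sum(samples) / len(samples))                    # close to mu = 1
print(sum(s == 0.0 for s in samples) / len(samples))  # close to 1 - 2^{-n}
```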


2 Main Result

Theorem 2.1. Under Assumption 1.2, for all $\epsilon \ge 0$ we have:

$$P[f(X) - m \ge \epsilon] \le p + \exp\left(-\frac{2(\epsilon - p\bar{c})_+^2}{\sum_{i=1}^n c_i^2}\right),$$

where $a_+ = \max(a, 0)$. As a corollary:

$$P[|f(X) - m| \ge \epsilon] \le 2\left(p + \exp\left(-\frac{2(\epsilon - p\bar{c})_+^2}{\sum_{i=1}^n c_i^2}\right)\right).$$

Theorem 2.1 stated above is our main result, and some remarks are in order:

(i) In typical situations, $p$ will be exponentially small while $\bar{c}$ will be independent of $n$, so that $p = a^{-n}$ and $c_i = b/n$ for all $i$, for some $a, b > 0$. In that case one obtains the same exponent as in McDiarmid's inequality. Think for instance of the case where $f(X) = (1/n) \sum_{i=1}^n X_i$ for all $X \in \mathcal{Y}$.

(ii) If Assumption 1.2 holds, it also holds when $\mathcal{Y}$ is replaced by $\mathcal{Y}' \subset \mathcal{Y}$, at the cost of increasing $p$. If $p$ is controlled by another concentration inequality that holds for some family of sets, one can equalize the two terms $p$ and $\exp\left(-\frac{2(\epsilon - p\bar{c})_+^2}{\sum_{i=1}^n c_i^2}\right)$ to obtain a refined version of Theorem 2.1.

(iii) The behavior of $f$ outside of $\mathcal{Y}$ may be arbitrary; in particular, $f$ may even be unbounded, so that $\sup_{x \in \mathcal{X}} f(x) = +\infty$. It is also noted that if $f$ is bounded, the difference between the expectation $\mu$ and the conditional expectation $m$ can be controlled in a simple manner.

Fact 2.2. If $\sup_{x \in \mathcal{X}} |f(x)| \le F < \infty$, then $|\mu - m| \le 2pF$. Indeed, $\mu = (1 - p)m + p\,\mathbb{E}[f(X) \mid X \not\in \mathcal{Y}]$, so $|\mu - m| = p\,|\mathbb{E}[f(X) \mid X \not\in \mathcal{Y}] - m| \le 2pF$.

The proof of Theorem 2.1 is short and relies on an extension argument similar to Kirszbraun's theorem [4]. Since one may not apply McDiarmid's inequality to $f$ directly, we construct a "smoothed" version of $f$, denoted $\bar{f}$, such that: (i) $\bar{f}(x) = f(x)$ for all $x \in \mathcal{Y}$, and (ii) $\bar{f}$ has $c$-bounded differences on $\mathcal{X}$ (Lemma 2.3). Applying McDiarmid's inequality to $\bar{f}$ then yields the result.

Lemma 2.3. Define the function:

$$\bar{f}(x) = \inf_{y \in \mathcal{Y}} \{f(y) + d_c(x, y)\}.$$

Under Assumption 1.2 we have (i) $\bar{f}(x) = f(x)$ for all $x \in \mathcal{Y}$, and (ii) $|\bar{f}(x) - \bar{f}(y)| \le d_c(x, y)$ for all $(x, y) \in \mathcal{X}^2$.
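On a finite ground set, $\bar{f}$ can be computed by brute force. The following minimal Python sketch (our illustration; the helper names are ours) builds $\bar{f}$ and checks both properties of Lemma 2.3 exhaustively on a tiny example:

```python
import itertools

def d_c(x, y, c):
    # Weighted Hamming distance: d_c(x, y) = sum_i c_i * 1{x_i != y_i}
    return sum(ci for xi, yi, ci in zip(x, y, c) if xi != yi)

def f_bar(x, f, Y, c):
    # Smoothed extension of Lemma 2.3: inf over y in Y of f(y) + d_c(x, y)
    return min(f(y) + d_c(x, y, c) for y in Y)

n = 3
c = (1.0,) * n                       # f below has c-bounded differences on Y
X_all = list(itertools.product((0, 1), repeat=n))
Y = [x for x in X_all if any(x)]     # exclude the all-zeros point
f = lambda x: float(sum(x))

for x in X_all:
    if x in Y:                       # property (i): f_bar agrees with f on Y
        assert f_bar(x, f, Y, c) == f(x)
    for y in X_all:                  # property (ii): c-bounded differences on X
        assert abs(f_bar(x, f, Y, c) - f_bar(y, f, Y, c)) <= d_c(x, y, c)
```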

3 Related work

While McDiarmid's inequality sometimes gives loose concentration bounds, its strength lies in its broad applicability (see [8] for an extensive survey): the sets $\mathcal{X}_1, \dots, \mathcal{X}_n$ may be completely arbitrary, and, even when $f$ is involved, it is usually easy to check that the bounded differences assumption holds. Two notable application areas of McDiarmid's inequality are combinatorics and learning theory. Two representative results are the concentration of the chromatic number of Erdős-Rényi graphs [1], and the fact that stable algorithms have good generalization performance [2]. Namely, if the output of a learning algorithm does not vary too much when a training example is modified, then it performs well on an unseen, randomly selected example.

Motivated by the study of random graphs, [3, 9, 10, 11] have provided concentration inequalities for particular classes of functions $f$ (e.g. polynomials) which have bounded differences with high probability. Polynomials are of interest in combinatorics since the number of subgraphs such as triangles or cliques can be written as a polynomial in the entries of the adjacency matrix.

On the other hand, concentration inequalities for general functions whose differences are bounded with high probability were provided in [6], [5], [12]. These authors assume that there exist vectors $b$ and $c$, with $b_i \ge c_i$ for all $i$, such that the function $f$ has $c$-bounded differences on $\mathcal{Y}$ and $b$-bounded differences on $\mathcal{X}$. The provided concentration inequalities usually give a strong improvement over McDiarmid's inequality, but are not informative if $b$ is too large. Theorem 2.1 shows that this is an artefact, since all the required "information" about the behavior of $f$ outside of $\mathcal{Y}$ is contained in $p$. A toy example of this phenomenon is $\mathcal{X} = \{0, 1\}^n$, $X_i \sim \mathrm{Ber}(1/2)$, $\mathcal{Y} = \mathcal{X} \setminus \{(0, \dots, 0), (1, \dots, 1)\}$, $B \ge 0$ and

$$f(X) = \begin{cases} B & \text{if } X = (0, \dots, 0), \\ -B & \text{if } X = (1, \dots, 1), \\ \frac{1}{n} \sum_{i=1}^n \left(X_i - \frac{1}{2}\right) & \text{otherwise.} \end{cases}$$

Here $p = 2^{1-n}$, $c_i = 1/n$ and $m = 0$ by symmetry, so for all $B \ge 0$, Theorem 2.1 guarantees that $P[f(X) \ge \epsilon] \le 2^{1-n} + \exp(-2n(\epsilon - 2^{1-n})_+^2)$, while previously known inequalities become uninformative for $B$ arbitrarily large.
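This comparison can be checked numerically. Below is a minimal Monte Carlo sketch of the toy example (ours, for illustration), with a deliberately huge $B$:

```python
import math, random

n, B = 12, 1e9  # B is irrelevant to the bound of Theorem 2.1

def f(x):
    if not any(x):                 # X = (0, ..., 0)
        return B
    if all(xi == 1 for xi in x):   # X = (1, ..., 1)
        return -B
    return sum(xi - 0.5 for xi in x) / n

def bound(eps):
    # Theorem 2.1 with p = 2^{1-n}, c_bar = 1, sum c_i^2 = 1/n, m = 0
    return 2**(1 - n) + math.exp(-2 * n * max(eps - 2**(1 - n), 0.0)**2)

eps = 0.3
samples = [f([random.randint(0, 1) for _ in range(n)]) for _ in range(100_000)]
print(sum(s >= eps for s in samples) / len(samples), "<=", bound(eps))
```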

4 Proofs

We now state the proofs of our results, starting with Lemma 2.3.

4.1 Proof of Lemma 2.3

(i) Consider $x \in \mathcal{Y}$. We have $\bar{f}(x) = \inf_{y \in \mathcal{Y}} \{f(y) + d_c(x, y)\} \le f(x) + d_c(x, x) = f(x)$. Further, $f$ has $c$-bounded differences on $\mathcal{Y}$, so $f(x) \le f(y) + d_c(x, y)$ for all $y \in \mathcal{Y}$. Taking the infimum over $y \in \mathcal{Y}$ yields $f(x) \le \bar{f}(x)$. Hence $\bar{f}(x) = f(x)$ for all $x \in \mathcal{Y}$.

(ii) Consider $(x, x') \in \mathcal{X}^2$ and $y \in \mathcal{Y}$. By the triangle inequality for $d_c$:

$$\bar{f}(x) \le f(y) + d_c(x, y) \le f(y) + d_c(x', y) + d_c(x', x).$$

Taking the infimum over $y \in \mathcal{Y}$ on the right-hand side, we get $\bar{f}(x) \le \bar{f}(x') + d_c(x', x)$. By symmetry, $|\bar{f}(x) - \bar{f}(x')| \le d_c(x', x)$, so that $\bar{f}$ has $c$-bounded differences on $\mathcal{X}$ as announced.

4.2 Proof of Theorem 2.1

Theorem 2.1 may be proven as follows. First decompose according to whether the event $\{X \in \mathcal{Y}\}$ occurs:

$$P[f(X) - m \ge \epsilon] \le P[f(X) - m \ge \epsilon, X \in \mathcal{Y}] + P[X \not\in \mathcal{Y}].$$

Define the function $\bar{f}(x) = \inf_{y \in \mathcal{Y}} \{f(y) + d_c(x, y)\}$ and let $M = \mathbb{E}[\bar{f}(X)]$ denote its expectation. By Lemma 2.3, statement (i), $\bar{f}(x) = f(x)$ for all $x \in \mathcal{Y}$, so that:

$$P[f(X) - m \ge \epsilon, X \in \mathcal{Y}] \le P[\bar{f}(X) - m \ge \epsilon] = P[\bar{f}(X) - M \ge \epsilon + m - M].$$

We next upper bound the difference $M - m$. By definition, $d_c(x, y) \le \bar{c}$ for all $(x, y) \in \mathcal{X}^2$. Hence, for all $x \in \mathcal{X}$:

$$\bar{f}(x) = \inf_{y \in \mathcal{Y}} \{f(y) + d_c(x, y)\} \le \inf_{y \in \mathcal{Y}} \{f(y)\} + \bar{c} \le \mathbb{E}[f(X) \mid X \in \mathcal{Y}] + \bar{c} = m + \bar{c}.$$

Once again, $\bar{f}(x) = f(x)$ if $x \in \mathcal{Y}$, so that:

$$M = \mathbb{E}[\bar{f}(X)\mathbf{1}\{X \in \mathcal{Y}\}] + \mathbb{E}[\bar{f}(X)\mathbf{1}\{X \not\in \mathcal{Y}\}] \le \mathbb{E}[f(X)\mathbf{1}\{X \in \mathcal{Y}\}] + p(m + \bar{c}) = m + p\bar{c},$$

where the last equality uses $\mathbb{E}[f(X)\mathbf{1}\{X \in \mathcal{Y}\}] = (1 - p)m$.


So $M - m \le p\bar{c}$ and:

$$P[\bar{f}(X) - M \ge \epsilon + m - M] \le P[\bar{f}(X) - M \ge \epsilon - p\bar{c}].$$

By Lemma 2.3, statement (ii), $\bar{f}$ has $c$-bounded differences on $\mathcal{X}$. Applying McDiarmid's inequality to $\bar{f}$ gives:

$$P[\bar{f}(X) - M \ge \epsilon - p\bar{c}] \le \exp\left(-\frac{2(\epsilon - p\bar{c})_+^2}{\sum_{i=1}^n c_i^2}\right),$$

from which we deduce the first statement:

$$P[f(X) - m \ge \epsilon] \le P[X \not\in \mathcal{Y}] + P[\bar{f}(X) - M \ge \epsilon - p\bar{c}] \le p + \exp\left(-\frac{2(\epsilon - p\bar{c})_+^2}{\sum_{i=1}^n c_i^2}\right).$$

The corollary follows by symmetry: apply the same bound to $-f$, which also satisfies Assumption 1.2 with the same $c$, and use a union bound.
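As an illustration only (not part of the paper's argument), the key smoothing bound $M - m \le p\bar{c}$ can be verified by exhaustive enumeration on a tiny instance; all names in the Python sketch below are ours:

```python
import itertools

n = 3
c = (1.0,) * n
c_bar = sum(c)
X_all = list(itertools.product((0, 1), repeat=n))
Y = [x for x in X_all if any(x)]   # exclude (0, ..., 0)
p = 1 - len(Y) / len(X_all)        # uniform law: each point has mass 2^{-n}

def f(x):
    return float(sum(x)) if any(x) else 100.0  # arbitrary behavior outside Y

def d_c(x, y):
    return sum(ci for xi, yi, ci in zip(x, y, c) if xi != yi)

f_bar = {x: min(f(y) + d_c(x, y) for y in Y) for x in X_all}
M = sum(f_bar[x] for x in X_all) / len(X_all)  # E[f_bar(X)]
m = sum(f(y) for y in Y) / len(Y)              # E[f(X) | X in Y]
assert M - m <= p * c_bar + 1e-12
print(M - m, "<=", p * c_bar)
```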

References

[1] B. Bollobás, The chromatic number of random graphs, Combinatorica 8 (1988), no. 1, 49-55.

[2] O. Bousquet and A. Elisseeff, Algorithmic stability and generalization performance, Proc. of NIPS (2001).

[3] J. H. Kim and V. H. Vu, Concentration of multivariate polynomials and its applications, Combinatorica 20 (2000), 417-434.

[4] M. D. Kirszbraun, Über die zusammenziehende und lipschitzsche Transformationen, Fund. Math. 22 (1934), 77-108.

[5] A. Kontorovich, Concentration in unbounded metric spaces and algorithmic stability, Proc. of ICML (2014).

[6] S. Kutin, Extensions to McDiarmid's inequality when differences are bounded with high probability, Technical report (2002).

[7] C. McDiarmid, On the method of bounded differences, Surveys in Combinatorics 141 (1989), 148-188.

[8] C. McDiarmid, Concentration, Probabilistic Methods for Algorithmic Discrete Mathematics 16 (1998), 195-248.

[9] W. Schudy and M. Sviridenko, Concentration and moment inequalities for polynomials of independent random variables, Proc. of SODA (2012).

[10] V. H. Vu, On the concentration of multivariate polynomials with small expectation, Random Structures and Algorithms 16 (2000), 344-363.

[11] V. H. Vu, Concentration of non-Lipschitz functions and applications, Random Structures and Algorithms 20 (2002), 262-316.

[12] L. Warnke, On the method of typical bounded differences, Combinatorics, Probability and Computing (2015), 1-31.
