COARSE RICCI CURVATURE WITH APPLICATIONS TO MANIFOLD LEARNING
arXiv:1410.3351v1 [math.DG] 13 Oct 2014
ANTONIO G. ACHE AND MICAH W. WARREN
Abstract. We consider the framework used by Bakry and Emery in their work on logarithmic Sobolev inequalities to define a notion of coarse Ricci curvature on smooth metric measure spaces alternative to the notion proposed by Y. Ollivier. We discuss applications of our construction to the manifold learning problem, specifically to the statistical problem of estimating the Ricci curvature of a submanifold of RN from a point cloud assuming that the sample has smooth density with definite lower bounds. More generally we are able to approximate a 1-parameter family of Ricci curvatures that include the Bakry-Emery Ricci curvature.
Contents 1. Introduction 1.1. Background and Motivation 1.2. Carr´e Du Champ and Bochner Formula 1.3. Iterated Carr´e du Champ and Coarse Ricci Curvature 1.4. Statement of Results 1.5. Final Remarks 1.6. Organization of the Paper 1.7. Acknowledgements 2. Bias Error Estimates 2.1. Bias for Submanifold of Euclidean Space 2.2. Proof of Proposition 2.1 2.3. Bias for Smooth Metric Measure Space with a Density 3. Convergence of Coarse Ricci to Actual Ricci on Smooth Manifolds 4. Empirical Processes and Convergence 4.1. Estimators of the Carr´e Du Champ and the Iterated Carr´e Du Champ in the uniform case 4.2. Glivenko-Cantelli Classes 4.3. Ambient Covering Numbers 4.4. Sample Version of the Carr´e du Champ 4.5. Subexponential Decay and Almost Sure Convergence. 4.6. Proof of Theorem 1.12 5. Almost Sure Convergence in the non-Uniform Case
2 2 5 7 10 14 15 15 16 16 16 19 20 21 22 23 25 28 32 34 38
The first author was partially supported by a postdoctoral fellowship of the National Science Foundation, award No. DMS-1204742. The second author was partially supported by NSF Grant DMS-1161498. 1
2
ANTONIO G. ACHE AND MICAH W. WARREN
5.1. Proof of Theorem 1.18 References
43 45
1. Introduction In [BN08], Belkin and Niyogi show that the graph Laplacian of a point cloud of data samples taken from a submanifold in Euclidean space converges to the LaplaceBeltrami operator on the underlying manifold. Our goal in this paper is to demonstrate that this process can be continued to approximate Ricci curvature as well. This answers a question of Singer and Wu [SW12, pg. 1103] , allowing one to approximate the Hodge Laplacian on 1-forms. The Hodge Laplacian allows one to extract certain topological information, thus we expect our result to have applications to manifold learning. To this end, we define a notion of coarse Ricci curvature. Coarse Ricci curvature is a quantity defined on pairs of points rather than tangent vectors, thus it can be defined on any metric measure space. We define a family of coarse Ricci curvature operators, which depend on a scale parameter t. We show that when taken on a smooth manifold embedded in Euclidean space, these operators converge to the Ricci curvature as t → 0. The approach is grounded in the Bakry-Emery Γ2 calculus. In particular, we appeal to the Bochner formula (1.1)
Γ2 (f, f ) = Ric(∇f, ∇f ) + kD 2 f k2 .
Iterating the approximate Laplacian operators in [BN08] one can construct an approximate Γ2 operator, and test this operator on a set of “linear” functions. This defines a coarse Ricci curvature on any two points from a submanifold. This approach recovers the Ric∞ tensor and can be modified to recover the standard Ricci curvature as well, provided the volume density is smooth. We show two types of results. First, the coarse Ricci operators converge as t → 0 when taken on a fixed smooth submanifold. Secondly, there exists an explicit choice of scales tn → 0 such that the quantities converge almost surely when computed from a set of n points sampled from a smooth probability distribution on the manifold. 1.1. Background and Motivation. The motivation for the paper stems from both the theory of Ricci lower bounds on metric measure spaces and the theory of manifold learning. 1.1.1. Ricci Curvature Lower Bounds on Metric Measure Spaces. One of our main sources of intuition for understanding Ricci curvature is the problem of reinterpreting lower bounds on the Ricci curvature in such a way that it becomes stable under Gromov-Hausdorff limits and thus defining “weak” notion of lower bounds on Ricci curvature. This theory has undergone significant development over the last 10-15 years. A major achievement is the work of Lott-Villani [LV09] and Sturm [Stu06a, Stu06b], which defines Ricci curvature lower bounds on metric measure spaces. These
COARSE RICCI CURVATURE WITH APPLICATIONS TO MANIFOLD LEARNING
3
definitions work quite beautifully provided the metric space is also a length space, but fail to be useful on discrete spaces. The underlying calculus for this theory lies in optimal transport: Given a metric measure space (X, d, µ) one can consider the space B of all Borel probability measures with the 2-Wasserstein distance, which we will denote by W2 , and is given by Z W2 (µ1 , µ2 ) = inf (1.2) d(x, y)dγ(x, y), γ∈Π
X×X
where Π is the set of all probability measures in B whose marginals are the measures µ1 and µ2 , i.e. if Pi : X × X → X for i = 1, 2 are the projections onto the first and second factors respectively, then µi = (Pi )∗ γ (the push-forward of γ by Pi ). One of the crucial ideas in the work of Lott-Villani and Sturm is to use tools from optimal transport to define a notion of convexity with respect to 2-Wasserstein geodesics (i.e., geodesics with respect to the Wasserstein distance W2 ) such that lower bounds on the Ricci curvature are equivalent to the geodesic convexity of well-chosen functionals. Further, they show that this convexity property is stable under Gromov-Hausdorff limits. In particular, they consider the functional given by the entropy of a measure µ, given a background measure ν, defined by Z dµ dµ E(µ|ν) = (1.3) dν, log dν dν and define a space to have non-negative Ricci curvature if the entropy (1.3) is convex along Wasserstein geodesics. This convexity property along Wasserstein geodesics, or displacement convexity, used by Lott-Villani was in turn inspired by the work of Robert McCann in [McC97]. Before the work of Lott-Villani and Sturm, it was well known that lower bounds on sectional curvature are stable under Gromov-Hausdorff; For more on the theory of Alexandrov spaces, see for example [BBI01]. The work of Lott-Villani and Sturm provides a precise definition of lower bounds on Ricci Curvature on length spaces that a priori may be very rough. More recently, Aaron Naber in [Nab13] used stochastic analysis on manifolds to characterize twosided bounds on non-smooth spaces. Naber’s approach also relies heavily on the existence of Lipschitz geodesics. On the other hand, it is easy to show (for example, consider the space with two positively measured points unit distance apart) that the Wasserstein space W2 for discrete spaces does not admit any Lipschitz geodesics. To overcome the lack of Wasserstein geodesics on discrete space, Bonciocat and Sturm [BS09] introduce the notion of an“h-rough geodesic,” which behaves like a geodesic up to an error h. Then using optimal transport methods, they are able to define a lower bound on Ricci curvature, which depends on the scale h. The notion of coarse Ricci curvature was developed to characterize Ricci curvature, including lower bounds, on a more general class of spaces. The first definition of coarse Ricci curvature was proposed by Ollivier [Oll09] as a function on pairs of points in a metric measure space. Heuristically, two points should have positive coarse Ricci curvature if geodesics balls near the points are “closer” to each other than the points themselves. One possible precise definition of “closer” is given by (1.2). Comparing
4
ANTONIO G. ACHE AND MICAH W. WARREN
the distance between two points to the distance of normalized unit balls is useful geometric information. It is also natural to consider the heat kernel for some small positive time around each given point. In fact, the intuition becomes blatant in the face of [vRS05, Cor. 1.4 (x)], which states that the distance (1.2) decays at an exponential rate given by the Ricci lower bound, as mass spreads out from two points via Brownian motion. Even earlier, it was shown by by Jordan, Kinderlehrer and Otto [JKO98] that the gradient flow of the entropy function (1.3) on W2 -space agrees with the heat flow on L2 . On nice metric measure spaces, one expects this flow to converge to an invariant measure, the same one from which the entropy was originally defined. Thus in principle, the behavior of the entropy functional (1.3) and the generator of heat flow are fundamentally related. This deep relationship is explored in generality in the recent paper [AGS]. It is natural then to attempt to define Ricci curvature in terms of a Markov process. In fact, Ollivier’s idea was to compare the distance between two points to the distance between the point masses after one step in the Markov process. This is also essentially the idea in Lin-Lu-Yau [LLY11], where a lower bound on Ricci curvature of graphs is defined. In contrast, a Γ2 approach was used by Lin-Yau to define Ricci curvature lower bounds on graphs in [LY10]. Recently, Erbas and Maas [EM12] and Mielke [Mie13] have provided a very natural way to define Ricci curvature for arbitrary Markov chains. This involves creating a new Wasserstein space, called the discrete transportation metric, in which one can implement the idea of Lott-Sturm-Villani, i.e., relating lower bounds on Ricci curvature to geodesic convexity of certain entropy functional. Gigli and Maas [GM13] show that in the limit these transportation metrics converge to the Wasserstein space of the manifold in question as the mesh size goes to zero, at least on the torus. Thus this notion recovers the Ricci curvature in the limit. Any discussion of lower bounds on Ricci curvature would not be complete without a discussion of isoperimetric inequalities, most notably the log-Sobolev inequality. These inequalities, which hold for positive Ricci lower bounds, were generalized by Lott-Villani. In the coarse Ricci setting, log-Sobolev inequalities have been proved by Ollivier, Lin-Yau and Maas for their respective definitions of Ricci curvature lower bounds. We will formulate a log-Sobolev inequality, which holds at a scale t as well (see section 1.5). This statement requires a standard Bakry-Emery type condition, which is related, but not precisely, a statement about coarse Ricci curvature. For further introduction to concepts of coarse Ricci curvature see the survey of Ollivier [Oll10]. 1.1.2. The Manifold Learning Problem. Roughly speaking, the manifold learning problem deals with inferring or predicting geometric information from a manifold if one is only given a point cloud on the manifold, i.e., a sample of points drawn from the manifold at random according to a certain distribution, without any further information. It is clear then that the manifold learning problem is highly relevant to machine learning and to theory of pattern recognition. An example of an object related to the geometry of an embedded submanifold Σ of Euclidean space that one can “learn”
COARSE RICCI CURVATURE WITH APPLICATIONS TO MANIFOLD LEARNING
5
or estimate from a point cloud is the rough Laplacian or Laplace-Beltrami operator. Given an embedding F : Σd → RN consider its induced metric g. By the rough Laplacian of g we mean the operator defined on functions by ∆g f = g ij ∇i ∇j f where ∇ is the Levi-Civita connection of g. Belkin and Niyogi showed in [BN08] that given a uniformly distributed point cloud on Σ there is a 1-parameter family of operators Lt , which converge to the Laplace-Beltrami operator ∆g on the submanifold. More precisely, the construction of the operators Lt is based on an approximation of the heat kernel of ∆g , and in particular the parameter t can be interpreted as a choice of scale. In order to learn the rough Laplacian ∆g from a point cloud it is necessary to write a sample version of the operators Lt. Then, supposing we have n data points that are independent and identically distributed (abbreviated by i.i.d.) one can choose a scale tn in such a way that the operators Ltn converge almost surely to the rough Laplacian ∆g . This step follows essentially from applying a quantitative version of the law of large numbers. Thus one can almost surely learn spectral properties of a manifold. While in [BN08] it is assumed that the sample is uniform, it was proved by Coifman and Lafon in [CL06] that if one assumes more generally that the distribution of the data points has a smooth, strictly positive density in Σ, then it is possible to normalize the operators Lt in [BN08] to recover the rough Laplacian. More generally, the results in [CL06] and [SW12] show that it is possible to recover a whole family of operators that include the Fokker-Planck operator and the weighted Laplacian ∆ρ f = ∆f −h∇ρ, ∇f i associated to the smooth metric measure space (M, g, e−ρ dvol), where ρ is a smooth function. Since then, Singer and Wu have developed methods for learning the rough Laplacian of an embedded submanifold on 1-forms using Vector Diffusion Maps (VDM) (see for example [SW12]). The relationship of Ricci curvature to the Hodge Laplacian on 1-forms is given by the Weitzenb¨ock formula. In this paper we consider the problem of learning the Ricci curvature of an embedded submanifold Σ of RN at a point from a point cloud. The idea is to construct a notion of coarse Ricci curvature that will serve as a sample estimator of the actual Ricci curvature of the embedded submanifold Σ. In order to explain our results we provide more background in the next section. ´ 1.2. Carr´ e Du Champ and Bochner Formula. Bakry and Emery [BE85] introduced the notion of Carr´e du Champ which we now recall. Let Pt be a 1-parameter family of operators of the form Z (1.4) Pt f (x) = f (y)pt (x, dy), M
where f is a bounded measurable function defined on M and pt (x, dy) is a non-negative kernel. We assume that Pt satisfies the semi-group property, i.e. (1.5) (1.6)
Pt+s = Pt ◦ Ps . P0 = Id.
6
ANTONIO G. ACHE AND MICAH W. WARREN
In Rn , an example of Pt is the Brownian motion, defined by the density 2 1 − |x−y| 2t e (1.7) dy, t ≥ 0. pt (x, dy) = (2πt)n/2 If now Pt is a diffusion semi-group defined on (M, g), we let L be the infinitesimal generator of Pt , which is densely defined in L2 by (1.8)
Lf = lim t−1 (Pt f − f ), t→0
We consider a bilinear form which has been introduced in potential theory by J.P. Roth [Rot74] and by Kunita in probability theory [Kun69] and measures the failure of L from satisfying the Leibnitz rule. This bilinear form is defined as 1 (1.9) Γ(L, u, v) = (L(uv) − L(u)v − uL(v)) . 2 When L is the rough Laplacian with respect to the metric g, then (1.10)
Γ(∆g , u, v) = h∇u, ∇vi.
We will also consider the iterated Carr´e du Champ introduced by Bakry and Emery denoted by Γ2 and defined by 1 Γ2 (L, u, v) = (L(Γ(L, u, v)) − Γ(L, Lu, v) − Γ(L, u, Lv)) . (1.11) 2 Notation 1.1. When considering the operators (1.9) and (1.11) we will use the slightly cumbersome three-parameter notation, as the main results will be stated in terms of a family of operators {Lt }. Note that if we restrict our attention to the case L = ∆g the Bochner formula yields 1 1 1 (1.12) Γ2 (∆g , u, v) = ∆h∇u, ∇vi − h∇∆g u, ∇vi − h∇u, ∇∆g vi 2 2 2 (1.13) = Ric(∇u, ∇v) + hHessu , Hessv i.
We observe immediately that if ∇u = ei and Hessu = 0 that one can recover the Ricci tensor via (1.14)
Γ2 (∆g , u, u) = Ric(ei , ei ).
The fundamental observation of Bakry and Emery is that the properties of Ricci curvature lower bounds can be observed and exploited by using the bilinear form Γ2 . With this in mind, they define a curvature-dimension condition for an operator L on a space X as follows. If there exist measurable functions k : X → R and N : X → [1, ∞] such that for every f on a set of functions dense in L2 (X, dν) the inequality 1 Γ2 (L, f, f ) ≥ (Lf )2 + kΓ(L, f, f ) (1.15) N
COARSE RICCI CURVATURE WITH APPLICATIONS TO MANIFOLD LEARNING
7
holds, then the space X together with the operator L satisfies the CD(k, N) condition, where k stands for curvature and N for dimension. In particular, when considering a smooth metric measure space (M n , g, e−ρdvol) one has the natural diffusion operator ∆ρ u = ∆u − h∇ρ, ∇ui,
(1.16)
corresponding to the variation of the Dirichlet energy with respect to the measure e−ρ dvol. By studying the properties of ∆ρ , Bakry and Emery arrive at the following dimension and weight dependent definition of the Ricci tensor:
(1.17)
Ric + Hessρ Ric + Hessρ − N 1−n (dρ ⊗ dρ) RicN = Ric + Hessρ − ∞(dρ ⊗ dρ) −∞
if if if if
N = ∞, n < N < ∞, N = n, N < n,
and moreover, they showed the equivalence between the CD(k, N) condition (1.15) and the bound RicN ≥ k. The main goal of this paper is to show that one can recover both the Ric and Ric∞ for embedded smooth metric measure spaces, by approximating the operators ∆g and ∆ρ . We can recover the Ricci curvature of a submanifold of Rd from (1.14) via the following observation: For any tangent vector ξ at a point x, the function (1.18)
fξ (z) = hξ, zi
has vanishing Hessian at x when restricted to the submanifold, and has gradient (1.19) thus (1.14) yields that (1.20)
∇fξ (z)|z=x = ξ, Γ2 (∆g , fξ , fξ )(x) = Ric(ξ, ξ).
1.3. Iterated Carr´ e du Champ and Coarse Ricci Curvature. In this section we provide a definition of coarse Ricci curvature on general metric measures spaces, using a family of operators which are intended to approximate a Laplace operator on a space at scale t. As this definition holds on metric measure spaces constructed from sampling points from a manifold, we can define an empirical or sample version of the Ricci curvature at a given scale t. This last construction will have an application to the Manifold Learning Problem, namely it will serve to predict the Ricci curvature of an embedded submanifold of RN if one only has a point cloud on the manifold and the distribution of the sample has a smooth positive density. As noted above, to recover the Ricci curvature on a manifold, we need to evaluate Γ2 on functions with vanishing Hessian at a point. In this section, we do not assume that the space X is Euclidean or has a tangent space. On a general metric measure space, where a tangent space may not be available, the test functions should distinguish different directions. Since we do not have tangent vectors, we replace the notion of tangent vectors with the notion of direction pointing from one point towards another. The coarse Ricci curvature will then be defined on pairs of points. For submanifolds
8
ANTONIO G. ACHE AND MICAH W. WARREN
in Euclidean space, the obvious choice is the linear function whose gradient is the vector which points from a point x to a point y. To define the coarse Ricci curvature on a general space X we need to generalize this idea to some sort of locally almost linear function. We propose the following function, which serves our purposes. Given x, y ∈ X define
1 d2 (y, z) − d2 (x, z) − d2 (x, y) . 2 d(x, y) One can check that, in the Euclidean case, this is simply the linear function with unit gradient pointing from x to y and that, for x very near y on a Riemannian manifold, this is (up to high order) the coordinate function in normal coordinates at x determined by the normal coordinates of y. This leads us to the following definition of coarse Ricci curvature. (1.21)
fx,y (z) =
Definition 1.2. Given an operator L we define the coarse Ricci curvature for L as (1.22)
RicL (x, y) = Γ2 (L, fx,y , fx,y )(x).
To see this is consistent with the classical notions, note that this defines a coarse Ricci curvature on a Riemannian manifold as (1.23)
Ric△g (x, y)(x) = Γ2 (∆g , fx,y , fx,y )(x).
In Proposition 1.8 we will see how (1.23) recovers the Ricci curvature on the manifold. Remark 1.3. RicL (x, y) need not be symmetric. 1.3.1. Approximations of the Laplacian, Carr´e du Champ and its iterate. We now construct operators which can be thought of as approximations of the Laplacian on metric measure spaces. This construction is reminiscent of the approximation constructed by Belkin-Niyogi in [BN08] and more generally Coifman-Lafon in [CL06]. Consider a metric measure space (X, d, µ) with the Borel σ-algebra such that µ(X) < ∞. Given t > 0, let θt be given by Z d2 (x,y) θt (x) = e− 2t dµ(y). (1.24) X
We define a 1-parameter family of operators Lt as follows: given a function f on X define Z d2 (x,y) 2 Lt f (x) = (f (y) − f (x)) e− 2t dµ(y). (1.25) tθt (x) X With respect to this Lt one can define a Carr´e du Champ on appropriately integrable functions f, h by 1 Γ(Lt , f, h) = (Lt (f h) − (Lt f )h − f (Lt h)) , (1.26) 2 which simplifies to Z d2 (x,y) 1 (1.27) e− 2t (f (y) − f (x))(h(y) − h(x))dµ. Γ(Lt , f, h)(x) = tθt (x) X
COARSE RICCI CURVATURE WITH APPLICATIONS TO MANIFOLD LEARNING
9
In a similar fashion we define the iterated Carr´e du Champ of Lt to be 1 Γ2 (Lt , f, h) = (Lt (Γ(Lt , f, h)) − Γ(Lt , Lt f, h) − Γ(Lt , f, Lt h)) . (1.28) 2 Remark 1.4. This definition of Lt differs from Belkin-Niyogi operator in that we normalize by θt (x) instead of (2πt)d/2 for an assumed manifold dimension d. 1.3.2. Empirical Coarse Ricci Curvature at a given scale. We can also define empirical versions of Lt , Γ(Lt , ·, ·) and Γ2 (Lt , ·, ·). On a space which consists of n points {ξ1 , ..., ξn } sampled from a manifold, it is natural to consider the empirical measure defined by n 1X µn = (1.29) δξ n i=1 i where δξi is the atomic point measure at the point ξi (also called δ-mass). For any function f :→ R we will use the notation Z n 1X (1.30) f (ξj ). µn f = f (y)dµn(y) = n j=1 X
ˆ t ) to distinguish those Notation 1.5. We will use the “hat” notation (for example L operators, measures, or t-densities which have been constructed from a sample of finite points. ˆ t as To be more precise, we define the operator L Z d2 (x,y) ˆ t f (x) = 2 (1.31) L (f (y) − f (x)) e− 2t dµn (y), tθˆt (x) X where
θˆt (x) =
(1.32)
Z
−
e
d2 (x,y) 2t
X
and of course (1.33)
Z
−d
X
(f (y) − f (x)) e
2 (x,y) 2t
n
1 X − d(ξj ,x)2 2t e dµn (y) = , n j=1 n
1 X − d(ξj ,x)2 2t e dµn (y) = (f (ξj ) − f (x)). n j=1
ˆ t , f, h) which The sample version of Carr´e du Champ will be the bilinear form Γ(L from (1.27) takes the form n X d2 (ξj ,x) ˆ t , f, h)(x) = 1 1 e− 2t (f (ξj ) − f (x))(h(ξj ) − h(x)), Γ(L (1.34) tθˆt (x) n j=1
ˆ t by Γ2 (L ˆ t , f, h), and by We denote the iterated Carr´e du Champ corresponding to L this we mean ˆ t (Γ(L ˆ t )f, h) − Γ(L ˆt, L ˆ t f, h) − Γ(L ˆ t , f, L ˆ t h) . ˆ t , f, h) = 1 L (1.35) Γ2 (L 2
10
ANTONIO G. ACHE AND MICAH W. WARREN
Our notion of empirical coarse Ricci curvature is simply Definition 1.6. Let be a sample of points of X. The empirical Ricci curvature of (X, d, µ) at a scale t with respect to a sample {ξ1 , . . . , ξn } at (x, y) ∈ X × X, which ˆ will be denoted by Ric(x, y) is ˆ ˆ t , fx,y , fx,y ). (1.36) Ric(x, y) = Γ2 (L Remark 1.7. Observe that (1.36) does not make use of the measure µ. We will discuss the role played by the measure when we discuss applications to Manifold Learning in Section 1.4.2. We are now in position to state our main results. 1.4. Statement of Results. 1.4.1. Convergence of Coarse Ricci Curvature on Smooth Manifolds. We start by showing that Ric∆g has the following expected property of coarse Ricci curvature on smooth manifolds. Proposition 1.8. Suppose that M is a smooth Riemannian manifold. Let V ∈ Tx M with g(V, V ) = 1. Then (1.37)
Ric(V, V ) = lim Ric△g (x, expx (λV )) λ→0
The proof of Proposition 1.8 will be given in Section 3 and follows from a classical result of Synge in [Syn31], which provides an expansion for the square of the geodesic distance at a point in normal coordinates. 1.4.2. Applications to Manifold Learning. We now show how our notions of coarse Ricci curvature and empirical coarse Ricci curvature at a scale have applications to the Manifold Learning Problem. For the rest of subsection 1.4.2 we will consider a closed embedded submanifold Σ of RN , and the metric measure space will be (Σ, k · k, dvol), where • k · k is the distance function in the ambient space RN , • dvolΣ is the volume element corresponding to the metric g induced by the embedding of Σ into RN . In addition we will adopt the following conventions • All operators Lt , Γ(Lt , ·, ·) and Γ2 (Lt , ·, ·) will be taken with respect to the distance k · k and the measure dvolΣ . ˆ t , Γ(L ˆ t , ·, ·) and Γ2 (L ˆ t , ·, ·) are taken with respect to the • All sample versions L ambient distance k · k. The choice of the above metric measure space is consistent with the setting of manifold learning in which no assumption on the geometry of the submanifold Σ is made, in particular, we have no a priori knowledge of the geodesic distance and therefore we can only hope to use the chordal distance as a reasonable approximation for the geodesic distance. We will show that while our construction at a scale t involves only information from the ambient space, the limit as t tends to 0 will recover the
COARSE RICCI CURVATURE WITH APPLICATIONS TO MANIFOLD LEARNING
11
Ricci curvature of the submanifold with intrinsic geodesic distance. As pointed out by Belkin-Niyogi [BN08, Lemma 4.3], the chordal and intrinsic distance functions on a submanifold differ to fourth order near a point , so while much of the analysis is done on submanifolds, the intrinsic geometry will be recovered in the limit. We are able to show the following. Theorem 1.9. Let Σd ⊂ RN be a closed embedded submanifold, let g be the Riemannian metric induced by the embedding, and let (Σ, k · k, dvolΣ ) be the metric measure space defined with respect to the ambient distance. Then there exists a constant C1 depending on the geometry of Σ and the function f such that (1.38)
sup |Γ2 (∆g , f, f )(x) − Γ2 (Lt , f, f )(x)| < C1 (Σ, D 5 f )t1/2 . x∈Σ
Theorem 1.9 will follow from Corollary 2.2 which is proved in Section 2. Corollary 1.10. With the hypotheses of Theorem 1.9 we have (1.39)
Ric∆g (x, y) = lim Γ2 (Lt , fx,y , fx,y )(x). t→0
Theorem 1.11. Let Σd ⊂ RN be a closed embedded submanifold, and let g be the metric induced by the embedding. Let γ(s) be a unit speed geodesic in Σ such that γ(0) = x. There exists constants C2 , C3 depending on the geometry of Σ such that (1.40)
|Ric(γ ′ (0), γ ′ (0)) − RicLt (x, γ(s))| ≤ C2 t1/2 + C3 s
In principle, Corollary 1.10 combined with Proposition 1.8 give us a way to approximate the Ricci curvature. Now we address the problem of choosing a scale depending on the size of the data and the dimension of the submanifold Σ, such that the sequence of empirical Ricci curvatures corresponding to the size of the data converge almost surely to the actual Ricci curvature of Σ at a point. In order to simplify the presentation of our results, we start by stating the simplest possible case, which correspond to a uniformly distributed i.i.d. sample {ξ1 , . . . , ξn }. We will be able to relax this assumption to i.i.d. samples whose distributions have a smooth everywhere positive density. Theorem 1.12. Consider the metric measure space (Σ, k·k, dvolΣ ) where Σd ⊂ RN is a smooth closed embedded submanifold. Suppose that we have a uniformly distributed i.i.d. sample {ξ1 , . . . , ξn } of points from Σ. For σ > 0, let 1
tn = n− 3d+3+σ , .
(1.41) Then, (1.42)
a.s. ˆ sup Γ , f, f )(ξ) , f, f )(ξ) − Γ (L (L −→ 0. 2 tn 2 tn ξ∈Σ
The proof of Theorem 1.12, even in the uniform case, requires using ideas from the theory of empirical processes. We will provide the necessary background in Section 4. As pointed out in Section 1.1.2, since we are interested in recovering an object from its sample version, we are forced to consider a law of large numbers in order to
12
ANTONIO G. ACHE AND MICAH W. WARREN
obtain convergence in probability or almost surely. The problem is that the sample version of Γ2 (Lt , ·, ·) involves a high correlation between the data points, destroying independence and any hope of applying large number results directly. The idea then is to reduce the convergence of the sample version of Γ2 (Lt , ·, ·) to the application of a uniform law of large numbers to certain classes of functions. Theorem 1.12 is proved in Section 4. Corollary 1.13. Let Σd ⊂ RN be an embedded submanifold and consider the metric measure space (Σ, k · k, dvolΣ ). Suppose that we have an i.i.d. uniformly distributed sample ξ1 , . . . , ξn drawn from Σ. Let 1
tn = n− 3d+3+σ ,
(1.43) for any σ > 0. Then (1.44)
a.s. ˆ sup Γ2 (Ltn , fx,y , fx,y )(x) − Ric∆g (x, y) −→ 0. x∈Σ
In other words, there is a choice of scale depending on the size of the data for which the corresponding empirical coarse Ricci curvatures converge almost surely to the coarse Ricci curvature. Remark 1.14. The convergence is better if tn is chosen to go to zero slower than in (4.82). In particular, if one replaces d with an upper bound on d, then Theorem 1.12 and Corollary 1.13 still hold. We now state our results for non-uniformly distributed samples and show how we can recover more general objects than the Ricci Curvature, for example the BakryEmery tensor if we sample adequately. 1.4.3. Smooth Metric Measure Spaces and non-Uniformly Distributed Samples. Consider a smooth metric measure space (M, g, e−ρdvol) and let ∆ρ be the operator (1.45)
△ρ u = △g u − ∇ρ · ∇u.
In [CL06], the authors consider a family of operators Lαt which converge to △2(1−α)ρ . Note that a standard computation (cf [Vil09, Page 384]) gives 1 Γ2 (△2(1−α)ρ , f, f ) = ∆g k∇f k2g − h∇ρ, ∇∆g f ig + 2(1 − α)∇2g ρ(∇f, ∇f ) 2 We adapt [CL06] to our setting: Recall that Z d2 (x,y) (1.47) θt (x) = e− 2t dµ(y), (1.46)
X
and define, for α ∈ [0, 1] (1.48)
θt,α (x) =
Z
X
e−
kx−yk2 2t
1 dµ(y). [θt (y)]α
COARSE RICCI CURVATURE WITH APPLICATIONS TO MANIFOLD LEARNING
13
We can define the operator Z kx−yk2 2 1 1 α e− 2t (1.49) (f (y) − f (x)) dµ(y) Lt f (x) = t θt,α (x) [θt (y)]α
and again obtain bilinear forms Γ(Lαt , f, f ) and Γ2 (Ltα , f, f ). For the rest of the section we will consider the metric measure space (Σ, k · k, e−ρ dvolΣ ) where Σd ⊂ RN is an embedded submanifold, k · k is the ambient distance and ρ is a smooth function in Σ. We again take all the operators Lt , Γt (Lt , ·, ·) and Γ2 (Lt , ·, ·) and ˆ t , Γt (L ˆ t , ·, ·) and Γ2 (L ˆ t , ·, ·) with respect to the data of their sample counterparts L (Σ, k · k, e−ρ dvolΣ ).
Theorem 1.15. Let Σd ⊂ RN be an embedded submanifold and consider the smooth metric measure space (Σ, k · k, e−ρ dvolΣ ). Let f ∈ C 5 (Σ) such that kf kC 5 ≤ M. There exists C4 = C4 (Σ, M, ρ) such that
(1.50)
sup Γ2 (Lαt , f, f )(ξ) − Γ2 (△2(1−α)ρ , f, f )(ξ) ≤ C4 t1/2 . ξ∈Σ
Corollary 1.16. With the hypotheses of Theorem 1.15, if in addition we assume that kD 2 f k (ξ) = 0 and choose α = 1/2 we have for C5 = C5 (Σ, M, ρ) (1.51)
1/2 Ric∞ (∇f, ∇f )(ξ) − Γ2 (Lt , f, f )(ξ) ≤ C5 t1/2 .
Corollary 1.17. With the hypotheses of Theorem 1.15, if in addition we assume that kD 2 f k (ξ) = 0 and choose α = 1, we have for C6 = C6 (Σ, M, ρ) (1.52)
Ric(∇f, ∇f )(ξ) − Γ2 (L1t , f, f )(ξ) ≤ C6 t1/2 .
In particular, if the density is positive and smooth enough, we can still recover the Ricci tensor. Finally, adapting (1.31) to the modified operators we have Theorem 1.18. Let Σ ⊂ RN be a closed embedded submanifold of RN and consider the metric measure space (Σ, k · k, e−ρ dvolΣ ). Let ξ1 , . . . , ξn , . . . be an i.i.d. sample ˆ α ]2 be the sample version whose distribution has density precisely e−ρ dvol, and let [Γ t 1 , where σ is a positive number, we have of [Γαt ]2 . Choosing tn = n−γ with γ = 4d+4+σ ˆ α a.s. α sup [Γ2 (Ltn , f, f )(ξ) − Γ2 (Ltn , f, f )(ξ) −→ 0. (1.53) ξ∈Σ
The proof of Theorem 1.18 will be given in Section (5).
Corollary 1.19. Let Σd ⊂ RN be an embedded submanifold and consider the metric measure space (Σ, k · k, e−ρ dvolΣ ). Suppose that we have an i.i.d. sample {ξ1 , . . . , ξn } whose distribution has density e−ρ dvolΣ . Letting (1.54)
1
tn = n− 4d+4+σ ,
14
ANTONIO G. ACHE AND MICAH W. WARREN
for σ > 0, it follows that Γ2 (Lαtn , fx,y , fx,y )(x) converges almost surely to the coarse Ricci curvature (1.55)
Ric∆2(1−α) (x, y).
1.5. Final Remarks. At this point we can make comparisons to other definitions of coarse Ricci curvature. Lin and Yau [LY10], following Chung and Yau [CY96], consider curvature dimension lower bounds of the form (1.15), in the particular case where L is the graph Laplacian using the standard distance function on graphs. This class of metric spaces is quite restricted - in their setting they are able to show that every locally finite graph satisfies a CD(2, −1) condition. The approach by Lin, Lu and Yau in [LLY11] follows the ideas of Ollivier [Oll09], using an optimal transport distance like (1.2), using distance instead of squared distance. With this metric, they compare mass distributions after short diffusion times, and take the limit as the diffusion time approaches zero. They are able to show a Bonnet-Myers type theorem which holds for graphs with positive Ricci curvature. While the BonnetMyers result holds in the classical Riemannian setting, it fails for general notions of Ricci curvature, for example, when the Ricci curvature is derived from the standard OrnsteinUhlenbeck process. Ollivier’s definition [Oll09, Definition 3] is very general, and also uses the W1 Wasserstein metric. Ollivier uses ε-geodesics to obtain local-toglobal results. Note that the notion of ε-geodesics is stronger than h-rough geodesics seen in [BS09]. For emphasis, we summarize that one of the main advantages of our approach is that is uses only integration, and does not require any minimization. In particular, we do not use optimal transport or any notion of geodesics. Further, our approach lends itself to making explicit bounds on the convergence given points randomly selected according to a distribution. From the theory of Gross and Bakry-Emery, one can show that whenever a heat group together with the invariant measure satisfies standard properties (self adjointness, ergodicity, Leibnitz and product rules) a Bakry-Emery condition of the form (1.56)
Γ2 (f, f ) ≥ KΓ(f, f )
implies a log-Sobolev inequality of the form Z Z 1 Γ(f, f ) f log f dµ ≤ (1.57) dµ 2K f R for all f > 0f with f dµ = 1. In particular, these conditions do hold for any of the approximate operators Lt . Thus the log-Sobolev inequality appropriate to our situation takes the following formulation. We omit the proof. Theorem 1.20. Suppose that Σ is an embedded submanifold of Euclidean space and dµ is a measure on Σ. Suppose that (1.58)
Γ2 (Lt , f, f ) ≥ KΓ(Lt , f, f )
COARSE RICCI CURVATURE WITH APPLICATIONS TO MANIFOLD LEARNING
15
R for all f ∈ C 4 (Σ). Then for all f > 0 with f dµ = 1, we have Z Z Z 1 − kx−yk2 1 1 1 (1.59) e 2t (f (y) − f (x))2 dµ(y)dµ(x). f log f dµ ≤ 2K t Σ Σ θt (x) f (x) Σ Note that the condition (1.58) is a priori stronger than a coarse Ricci lower bound of the form (1.60)
RicLt (x, y) = Γ2 (Lt , fx,y , fx,y ) ≥ K
as the latter only tests the condition (1.58) on a subset of functions and at certain points. While the conditions (1.58) (1.60) are equivalent in the infinitesimal case, it is unclear to us if the weaker condition (1.60) implies any meaningful isoperimetric inequalities in the general case. Our results show that one can give a definition of coarse Ricci curvature at a scale on general metric measure spaces that converges to the actual Ricci curvature on smooth Riemannian manifolds. Moreover, our definition of empirical coarse Ricci curvature at a scale can be thought of as an extension of Ricci Curvature to a class of discrete metric spaces namely those obtained from sampling points from a smooth closed embedded submanifold of RN . Note however, that in order to obtain convergence of the empirical coarse Ricci curvature at a scale to the actual Ricci Curvature we need to assume that there is a manifold which fits the distribution of the data. Recently, Fefferman-MitterNarayanan in [FMN13] have developed an algorithm for testing the hypothesis that there exists a manifold which fits the distribution of a sample, however, a problem that remains open is how to estimate the dimension of a submanifold from a sample of points. In another vein, there is much current interest in a converse problem : The development of algorithms for generating point clouds on manifolds or even on surfaces. Recently, there has been progress in this direction by Palais-Palais-Karcher in [KPP14], specifically on methods for generating point clouds on implicit surfaces using Monte Carlo simulation and the Cauchy-Crofton formula. 1.6. Organization of the Paper. In section 2 we prove Theorems 1.9 and 1.15. In Section 3 we show that on smooth manifolds our definition of coarse Ricci curvature at a scale converges to the actual Ricci curvature. In Section 4 we provide the necessary background on the theory of empirical processes for proving Theorems 1.12 and its corollaries. Finally, in Section 5 we prove Theorem 1.18. 1.7. Acknowledgements. The authors would like to thank Amit Singer, Hau-Tieng Wu and Charles Fefferman for constant encouragement. The first author would like to express gratitude to Adolfo Quiroz for very useful conversations on the topic of empirical processes, and to Richard Palais for bringing his work to his attention. The second author would like to thank Jan Maas for useful conversation, and Matthew Kahle for stoking his interest in the topic.
16
ANTONIO G. ACHE AND MICAH W. WARREN
2. Bias Error Estimates 2.1. Bias for Submanifold of Euclidean Space. In this section we prove Theorem 1.9. The theorem will follow from Proposition 2.1 and Corollary 2.2 below. For simplicity we will assume that (Σ, dvolΣ ) has unit volume. Recall the definitions (1.24), (1.25), (1.26), (1.27) and (1.28). Proposition 2.1. Suppose that Σd is a closed, embedded, unit volume submanifold of RN . For any x in Σ and for any functions f, h in C 5 (Σ) we have (2.1) (2πt)d/2 = 1 + tG1 (x) + t3/2 R1 (x), θt (x) (2.2) Γt (f, h)(x) = h∇f (x), ∇h(x)i + t1/2 G2 (x, J 2 (f )(x), J 2 (h)(x))
(2.3)
+ tG3 (x, J 3 (f )(x), J 3 (h)(x)) + t3/2 R2 (x, J 4 (f )(x), J 4 (h)(x)),
(2.4) Lt f (x) = ∆g f (x) + t1/2 G4 (x, J 3 (f )(x)) + tG5 (x, J 4 f (x)) + t3/2 R3 (x, J 5 f (x)), where each Gi is a locally defined function, which is smooth in its arguments, and J k (u) is a locally defined k-jet of the function u. Also, each Ri is a locally defined function of x which is bounded in terms of its arguments. Corollary 2.2. We have the following expansions (2.5) (2.6)
Lt (Γt (f, f ))(x) = ∆g k∇f (x)k2g + t1/2 R4 (x, J 5 (f )(x)),
Γt (Lt f, f )(x) = h∇∆g f (x), ∇f (x)i + t1/2 R(x, J 5 (f )(x)),
2.2. Proof of Proposition 2.1. Our first goal is to fix a local structure which we will use to define the quantities Gi and Ri and J k (u) that appear in Proposition 2.1. Choose a point x ∈ Σ, and an identification of tangent plane Tx Σ with Rn . Locally we may make a smooth choice of ordered orthonormal frame nearby points Σ so that at each point y there now is a fixed identification of the tangent plane. At each nearby point y ∈ Σ, we can represent Σ as the graph of a function Uy over the tangent plane Ty Σ. Each Uy will satisfy (2.7)
Uy (0) = 0,
(2.8)
DUy (0) = 0.
By our choice of identification, the functions Uy are well defined and for y near x and z ∈ Ty Σ near 0, the function (y, z) 7→ Uy (z) has the same regularity as Σ. Fixing a point x, consider a function f on Σ. The function f is locally well defined as a function over the tangent plane, i.e. (2.9)
f (y) = f (y, Ux(y)) for y ∈ Tx Σ.
With the above identification we obtain coordinates on the tangent plane at x, and we may take derivatives of f in this new coordinate system to define the m-jet of f
COARSE RICCI CURVATURE WITH APPLICATIONS TO MANIFOLD LEARNING
17
at the point x by J m f (x) = (f (x), Df (x), ..D mf (x)) .
(2.10)
More concretely, all derivatives in (2.10) are taken with respect to the variable y in (2.9). Since Σ is compact, there exists τ0 > 0 such that for every y ∈ Σ we have (1) The function Uy is defined and smooth on Bτ0 (0) ⊂ Ty Σ, (2) BRN ,τ0 (y) ∩Σ is contained in the graph of Uy over the ball Bτ0 (0) ⊂ Ty Σ where BRN ,τ0 (y) is the ball in RN centered at y with respect to the ambient distance.
We will use the following notation: given y ∈ Σ and τ0 > 0 as above, we let Σy,τ = Σ ∩ {(z, Uy (z)), y ∈ Bτ (0) ⊂ Ty Σ} ,
(2.11)
in other words, Σy,τ0 is the part of Σ contained in the graph of Uy on Bτ0 (0) ⊂ Ty Σ. Observe that with this notation, the statement in (2) above simply says that BRN ,τ0 (y) ∩ Σ ⊂ Σy,τ0 .
(2.12)
Observe that for any f ∈ L∞ (Σ) we have Z Z 2 − kx−yk f (y)e 2t dµ(y) = (2.13) Σ
f (y)e−
kx−yk2 2t
dµ(y)
Σx,τ0
(2.14)
+
Z
f (y)e−
kx−yk2 2t
dµ(y),
Σ\Σx,τ0
and by (2) (2.15)
Z
−
f (y)e
kx−yk2 2t
Σ\Σx,τ0 (x)
dµ(y) ≤ kf k
L∞ (Σ)
τ2
− 2t0
e
,
Note also, that for any polynomial p(z), there is a constant C such that on Z τ02 2 (2.16) e−kzk /2 p(z)dz ≤ C(p)e− 2t . Rd \Bτ /√t 0
The volume form over Tx Σ will be q (2.17) µx (z)dz = det (δij + hDi Ux (z), Dj Ux (z)i).
If in (1.24) we choose our distance to be the ambient distance k · k in RN and the measure µ to be the volume measure in Σ, the density θt (x) takes the form Z kx−yk2 θt (x) = e− 2t dµ(y). (2.18) Σ
In the following, we will use Tk f (x)(y) to denote the k-th order term in the Taylor expansion of f at x, in the variable y.
18
ANTONIO G. ACHE AND MICAH W. WARREN
We now prove (2.1). Observe that Z kx−zk2 e− 2t dµ(z) θt (x) − Σ\Σx,τ0
=
Z
e−(kzk
2
+kUx (z)k2 )/2t
µx (z)dz
Bτ0
=t
Z
= tn/2
Z
n/2
2
e−kwk
Bτ
− tn/2
√ 0/ t 2
e−kwk
ZR
d
√ 2 /2 −kUx ( tw)k /2t
e
√ 2 /2 −kUx ( tw)k /2t
e
2
e−kwk
Rd \Bτ
√ 0/ t
√ µx ( tw)dw
√ µx ( tw)dw
√ 2 /2 −kUx ( tw)k /2t
e
√ µx ( tw)dw
Now considering (2.17) (2.8) we have √ (2.19) µx ( tw) = 1 + tT2 µx (0)(w) + t3/2 R2 µx (0, w) √ i h i h √ √ 2 2 kUx ( tw)k2 2t (2.20) e− = 1 + tT4 e−kUx ( t·)k /2t (0)(w) + t3/2 R4 e−kUx ( t·)k /2t (0, w).
Expanding, collecting lower order terms, integrating and absorbing the exponentially decaying terms into t3/2 R1 (x, z) using (2.15) and (2.16) yields (2.1). Next we prove (2.4). First compute Z kx−yk2 (f (y) − f (x)) e− 2t dµ(y). Σ
Z
Bτ0
−
(f (y) − f (x)) e
Now,
(2.21)
kx−yk2 2t
d/2
µx (y)dy = t
Z
Bτ
e− 0/
√
t
kzk2 2
√ √ f ( tz) − f (0) µx ( tz)dz
√ √ f ( tz) − f (0) = tT1 f (z) + tT2 f (z) + t3/2 T3 f (z) + t2 T4 f (z) + t5/2 R4 (z)
and also recall (2.19). (2.20). Note that if A is a symmetric d × d matrix we have the identity Z (2.22) z T Azdz = (2π)d/2 tr(A), Rd
where tr denotes Trace. From this it follows that Z kzk2 (2.23) e− 2 T2 f (0)(z)dz = (2π)d/2 tr(T2 f (0)). Rd
Again, expanding, collecting lower order terms, integrating odd and even terms, and absorbing the exponentially decaying terms via (2.15) and (2.16) yields Z kx−yk2 −d/2 (f (y) − f (x)) e− 2t dµ(y) = t (2π)d/2 ∆g f + t3/2 G4 (J 3 f (x)) t Σ
+ t2 G5 (J 4 f (x)) + t5/2 R3 (J 5 f (x)).
COARSE RICCI CURVATURE WITH APPLICATIONS TO MANIFOLD LEARNING
19
Combining with (2.1) yields (2.4). A very similar calculation yields (2.2). 2.2.1. Proof of Corollary 2.2. Directly from (2.2) (2.24) Lt (Γt (f, f ))(x) = Lt k∇f k2 (x) + t1/2 G2 (x, J 2 (f )(x)) + tG3 (x, J 3 (f )(x)) (2.25)
+ t3/2 Lt R2 (x, J 4 (f )(x))
This last term can be bounded directly by the definition of Lt : Z 2 − kx−yk 3/2 3/2 1 2 (R(y) − R(x)) e 2t dµ(y) t |Lt R(x)| = t t θt (x) Z 1/2 2 kx−yk2 − ≤ t 2 kRkL∞ e 2t dµ(y) θt (x) = 4t1/2 kRkL∞ .
The first three terms are differentiable, so can be dealt with directly by (2.4), giving an expression involving J 5 (f )(x). The estimate (2.6) follows from a similar argument. The result follows by combining the above lemmata for the first term, and then directly bounding the second term. 2.3. Bias for Smooth Metric Measure Space with a Density. The bias estimate Theorem 1.9 for a metric measure space with density will follow from the following proposition whose proof is very similar to that of Proposition 2.1. Recall definitions (1.48) etc. Proposition 2.3. Let f ∈ C 5 . We have the following expansions (2.26)
Lαt f (x) =∆g f (x) + (1 − α)h∇f (x), ∇ρ(x)ig
+ t1/2 G1 (x, J 3 (f )) + t3/2 R1 (x, J 5 (f ))
(2.27) Γαt (f, h)(x) = h∇f, ∇hig + t1/2 G2 (x, J 2 (f ), J 2 (h)) + t3/2 R2 (x, J 4 (f ), J 4(h)). Proof. Following the proof of Proposition 2.1 we have the following expansions (2.28) θt (x) = (2πt)d/2 e−ρ(x) 1 + tG1 (x, ρ) + t3/2 R1 (x, ρ) , (2.29)
θt,α (x) = (2πt)(1−α)d/2 e(α−1)ρ(x) (1 + tG2 (x, ρ)R2 (x, ρ)) ,
as t → 0. Also, taking coordinates on the tangent plane of Σ at the point x and identifying x with 0 we have the expansion (2.30)
dµx (z) = e(α−1)ρ(0) 1 + (1 − α)hDρ(0), zi + O(kzk2 ) dz, α θt (z)
which holds in a small neighborhood of 0. The rest of the proposition follows from a straightforward computation.
20
ANTONIO G. ACHE AND MICAH W. WARREN
Corollary 2.4. We have the following expansions Lαt (Γαt (f, h))(x) = ∆h∇f, ∇hig (x) + (1 − α)h∇ρ, ∇h∇f, ∇hig ig (x) + t1/2 R3 (x, J 5 (f ), J 5 (h), J 5 (ρ)),
Γαt (Lαt f, h)(x) = h∇∆g f, ∇hig + (1 − α)h∇h∇ρ, ∇f ig , ∇hig + t1/2 R4 (x, J 5 (f ), J 5 (h), J 5 (ρ)).
as t → 0. From Corollary 2.4 we obtain Theorem 1.15. 3. Convergence of Coarse Ricci to Actual Ricci on Smooth Manifolds We start by proving Proposition 1.8 Proof of Proposition 1.8. Take normal coordinates and without loss of generality assume that V = ∂1 . Then in these coordinates 1 d2 (λe1 , z) − kzk2 − λ2 fx,expx (λV ) (z) = (3.1) = he1 , zi + G(λ, z) 2 λ where G(λ, z) is a smooth function which vanishes to fourth order, by a classical result of Synge [Syn31, Section 5]. Applying the Bochner formula and taking the limit gives the result. We now prove Theorem 1.11: Proof of Theorem 1.11 . Our goal is to show that |Ric(γ ′ (0), γ ′ (0)) − RicLt (x, γ(s))| ≤ C1 t1/2 + C2 s.
First, note that letting
fs = fx,γ(s) (·) = and we have (3.2)
γ(s) − x ,· |γ(s) − x|
f0 = hγ ′ (0), ·i
|RicLt (x, γ(s)) − Ric(γ ′ (0), γ ′ (0))| = |Γ2 (Lt , fs , fs )(x) − Ric(γ ′ (0), γ ′ (0))| Γ2 (Lt , fs , fs )(x) − Γ2 (△g , fs , fs )(x) (3.3) = +Γ2 (△g , fs , fs )(x) − Γ2 (△g , f0 , f0 )(x) +Γ2 (△g , f0 , f0 )(x) − Ric(γ ′ (0), γ ′(0)) (3.4)
≤ R(J 5 fs )t1/2 + C7 (Σ)s.
Here we have used the following the facts:
COARSE RICCI CURVATURE WITH APPLICATIONS TO MANIFOLD LEARNING
21
First, |Γ2 (Lt , fs , fs )(x) − Γ2 (△g , fs , fs )(x)| ≤ R(x, J 5 fs )t1/2
by Corollary 2.2. Second, a straightforward computation yields that for any two functions f0 , fs (3.5) |Γ2 (△g , fs , fs )(x) − Γ2 (△g , f0 , f0 )(x)| ≤ (kf0 kC 2 + kfs kC 2 ) kfs − f0 kC 2
+ kf0 kC 3 kfs − f0 kC 1 + kfs kC 1 kfs − f0 kC 3 .
(3.6)
Since the functions f0 , fs are ambient linear functions restricted to a submanifold, the higher derivatives are well-controlled. The derivatives of the difference are controlled as follows
γ(s) − x ′
− γ (0), · kfs − f0 kC 3 =
3 kγ(s) − xk C
γ(s) − x
′
≤
kγ(s) − xk − γ (0) kzkC 3 where kzkC 3 is the norm of the derivatives of the coordinate functions, which is also controlled by the geometry of Σ. Certainly, for any unit speed curve with bounded curvature we have
γ(s) − x ′
kγ(s) − xk − γ (0) ≤ κs.
The curvature of any geodesic inside Σ is controlled by the geometry of Σ. Finally, because ∇f0 = γ ′ (0) we have by the Bochner formula (3.7)
Γ2 (△g , f0 , f0 )(x) − Ric(γ ′ (0), γ ′(0)) = D 2 f0 .
Using the tangent plane as coordinates at a point, it is easy to compute that the Hessian of any coordinate function vanishes at the origin. The vector γ ′ (0) is in the tangent space, so we conclude that (3.7) vanishes. 4. Empirical Processes and Convergence The goal of this section is to prove Theorem 1.12. This will be done using tools from the theory of empirical processes in order to establish uniform laws of large numbers in a sense that we will explain in Sections 4.2 through 4.6. For a standard reference in the theory of empirical processes, see [vdVW96]. See also [SW13] for further applications of the theory of empirical processes to the recovery of diffusion operators from a sample.
22
ANTONIO G. ACHE AND MICAH W. WARREN
4.1. Estimators of the Carr´ e Du Champ and the Iterated Carr´ e Du Champ in the uniform case. Let us assume that the measure µ is the volume measure dvolΣ . Recall that our formal definition of the Carr´e du Champ of Lt with respect to the uniform distribution is given by Z 2 1 1 − kx−yk e 2t (f (y) − f (x)) (h(y) − h(x)) dµ(y) . (4.1) Γ(Lt , f, h) = t θt (x) Σ
It is clear from (4.1) that a sample estimator of the Carr´e Du Champ at a point x is given by ! X kx−ξj k2 1 1 1 ˆ t , f, f )(x) = (4.2) Γ(L e− 2t (f (ξj ) − f (x)) (h(ξj ) − h(x)) , t θˆt (x) n j=1
and recall that we defined the t-Laplace operator by Z kx−yk2 2 1 e− 2t (f (y) − f (x)) dµ(y), Lt f (x) = (4.3) t θt (x) and its sample version is
n
X kx−ξj k2 ˆ t f (x) = 2 1 1 e− 2t (f (ξj ) − f (x)) . L ˆ t θt (x) n j=1
(4.4)
Recall that the iterated Carr´e du Champ is 1 Γ2 (Lt , f, h) = (Lt Γ(Lt , f, h) − Γ(Lt , Lt f, f ) − Γ(Lt , f, Lt h)) . (4.5) 2 For simplicity, we will evaluate [Γ2 ]t at a pair (f, f ) instead of (f, h) and by symmetry it is clear that we obtain 1 Γ2 (Lt , f, f ) = (Lt (Γt (f, f )) − 2Γt (Lt f, f )) . (4.6) 2 Combining the sample versions of Γt and Lt we obtain a sample version for Γ2 (Lt , f, f ) (4.7) (4.8)
n kx−ξj k2 kξj −ξk k2 1 1 1 XX ˆ e− 2t − 2t (f (ξj ) − f (ξk ))2 Γ2 (Lt , f, f )(x) = 2 2 t n j=1 k=1 θˆt (ξk )θˆt (x) n n 1 X X 1 − kx−ξj k2 − kx−ξk k2 2t 2t (f (ξk ) − f (x))2 − 2 2 e 2 ˆ t n j=1 θt (x) k=1
(4.9) (4.10)
n n kx−ξj k2 kξj −ξk k2 2 XX 1 − 2 2 e− 2t − 2t (f (ξk ) − f (ξj ))(f (ξj ) − f (x)) t n j=1 k=1 θˆt (x)θˆt (ξj )
+
n n 2 X X 1 − kx−ξj k2 − kξj −ξk k2 2t 2t e (f (ξk ) − f (x))(f (ξj ) − f (x)). t2 n2 j=1 k=1 θˆt2 (x)
In principle, the convergence analysis for (4.7)-(4.10) can be done using the following standard result in large deviation theory
COARSE RICCI CURVATURE WITH APPLICATIONS TO MANIFOLD LEARNING
23
Lemma 4.1 (Hoeffding’s Lemma). Let ξ1 , . . . , ξn be i.i.d. random variables on the probability space (Σ, B, µ) where B is the Borel σ-algebra of Σ, and let f : Σ → [−K, K] be a Borel measurable function with K > 0. Then for the corresponding empirical measure µn and any ε > 0 we have (4.11)
ε2 n
Pr {|µn f − µf | ≥ ε} ≤ 2e− 2K2 .
Observe, however, that (4.7)-(4.10) is a non-linear expression which will involve non-trivial interactions between the data points ξ1 , . . . , ξn . This non-trivial interaction between the points ξ1 , . . . , ξn will produce a loss of independence and we will not be able to apply Hoeffding’s Lemma directly to (4.7)-(4.10). In order to address this difficulty we will establish several uniform laws of large numbers which will provide us with a large deviation estimate for (4.7)-(4.10). Remark 4.2. We will not use directly the expression (4.7)-(4.10), instead we will write (4.7)-(4.10) schematically in the form 1 ˆ ˆ ˆ ˆ ˆ (4.12) Lt Γ(f, f ) (x) − 2Γt (Lt f, f )(x) , Γ2 (Lt (f, f )(x) = 2 which is clearly equivalent to (4.7)-(4.10).
4.2. Glivenko-Cantelli Classes. A Glivenko-Cantelli class of functions is essentially a class of functions for which a uniform law of large numbers is satisfied. Definition 4.3. Let µ be a fixed probability distribution defined on Σ. A class F of functions of the form f : Σ → R is Glivenko-Cantelli if
(a) f ∈ L1 (dµ) for any f ∈ F , (b) For any i.i.d. sample ξ1 , . . . , ξn drawn from Σ whose distribution is µ we have uniform convergence in probability in the sense that for any ε > 0 ∗ (4.13) lim Pr sup |µn f − µf | > ε = 0. n→∞
f ∈F
Remark 4.4. Note that in general we have to consider outer probabilities Pr∗ instead of Pr because the class F may not be countable and the supremum supf ∈F |µn f − µf | may not be measurable. On the other hand, if the class F is separable in L∞ (Σ), then we can replace Pr∗ by Pr. While all of the classes we will encounter in this paper will be separable in L∞ (Σ), we use Pr∗ when we deal with a general class. Let F be a class of functions defined on Σ and totally bounded in L∞ (Σ). Given δ > 0 we let N (F , δ) be the L∞ δ-covering number of F , i.e., (4.14)
N (F , δ) = inf{m : F is covered by m balls of radius δ in the L∞ norm}.
Lemma 4.5. Let F be an equicontinuous class of functions in L∞ (Σ) and satisfies sup{kf kL∞ (Σ) } ≤ M < ∞ for some M > 0. Then for any distribution µ which is f ∈F
24
ANTONIO G. ACHE AND MICAH W. WARREN
absolutely continuous with respect to dvolΣ the class F is µ-Glivenko-Cantelli. Moreover, if ξ1 , . . . , ξn is an i.i.d. sample drawn from Σ with distribution µ we have ε ε2 n ∗ e− 8M 2 . Pr sup |µn f − µf | ≥ ε ≤ 2N F , (4.15) 4 f ∈F Proof. By equicontinuity of F , it follows from the Arzel`a-Ascoli theorem that F is precompact in the L∞ (Σ) norm and hence totally bounded in L∞ (Σ). In particular for every δ > 0, the number N (F , δ) is finite. Let G be a finite class such that the union of all balls with center in G and radius δ covers F and |G| = N (F , δ). For any f ∈ F there exists φ ∈ G such that kf − φkL∞ (Σ) < δ and we obtain |µn f − µf | ≤ 2δ + |µn φ − µφ|,
(4.16) and clearly
sup |µn f − µf | ≤ 2δ + max |µn φ − µφ|.
(4.17)
f ∈F
φ∈G
Fixing ε > 0 and choosing δ = ε/4 we observe that ∗ (4.18) Pr sup |µn f − µf | ≥ ε ≤ Pr max |µn φ − µφ| ≥ ε/2 , φ∈G
f ∈F
and by Hoeffding’s inequality we have ε ε2 n ε ≤ 2N F , e− 8M 2 , (4.19) Pr max |µn φ − µφ| ≥ φ∈G 2 4 which implies the lemma. We will frequently use the following classes of functions 2 t −1/2 − kξ−ζk 2t Ff,h = φt (ξ, ζ) = t e (4.20) (f (ξ) − f (ζ))(h(ξ) − h(ζ)) : ξ ∈ Σ , kξ−ζk2 t 1/2 − 2t G = ψt (ξ, ζ) = t e (4.21) :ξ∈Σ , t where f, h in (4.20) are fixed functions. We will also use Fft to denote the class Ff,f . Observe that for the classes in (4.20) and (4.21) one has 2 1/2 t (4.22) t kf kLipkhkLip , MFf,h = sup {kφkL∞ (Σ) } ≤ t e φ∈Ff,h
(4.23)
MG t = sup {kψkL∞ (Σ) } = t1/2 . ψ∈Gt
COARSE RICCI CURVATURE WITH APPLICATIONS TO MANIFOLD LEARNING
25
4.3. Ambient Covering Numbers. In this subsection we show that the computat tion of covering numbers of the classes of functions Ff,h and G t introduced in (4.20), (4.21) reduces to the computation of covering numbers of submanifolds of RN . For this purpose we will need the notion of ambient covering number of Σ defined by [ (4.24) A(Σ, δ) = inf{m : ∃A with |A| = m and Σ ⊂ BRN ,δ (a)}. a∈A
where the ball BRN ,δ (a) is taken with respect to the ambient distance k · k.
˜ ζ ∈ RN we have Lemma 4.6. For any ξ, ξ, ˜ 2 1/2 − kζ−ξk2 kζ−ξk 1/2 − ˜ t e 2t − t e 2t ≤ e−1/2 kξ − ξk. (4.25) In particular
N (G t , ε) ≤ A(Σ, e1/2 ε).
(4.26)
Proof. Observe that the function ψt (ξ, ζ) = t1/2 e− Dξ ψt (ξ, ζ) = t1/2
(4.27) and therefore (4.28)
sup kDξ ψt (ξ, ζ)k ≤
ξ,ζ∈RN
kζ−ξk2 2t
satisfies
(ζ − ξ) − kξ−ζk2 e 2t , t
√
o n 1 2 2 sup ρe−ρ = e− 2 , ρ>0
and therefore we have the Lipschitz estimate ˜ ζ)| ≤ e−1/2 kξ − ξk. ˜ (4.29) |ψt (ξ, ζ) − ψt (ξ,
Corollary 4.7. Fix a function 0 6= h ∈ L∞ (Σ) with khkL∞ (Σ) ≤ C and consider the class of functions (4.30)
Hht = {ψt (ζ, ·)h(·) : ζ ∈ Σ} .
Then for every ε > 0 we have (4.31)
N (Hht , ε)
e1/2 ≤ A Σ, ε . C
Proof. We use Lemma 4.6 to obtain the estimate
′
(4.32) ≤ Ckψt (ζ, ·) − ψt (ζ ′ , ·)kL∞ (Σ)
ψt (ζ, ·)h(·) − ψt (ζ , ·)h(·) L∞ (Σ)
(4.33)
from which the corollary follows.
≤ Ce−1/2 kζ − ζ ′ k,
For Lemma 4.8 we will use the following notation:
26
ANTONIO G. ACHE AND MICAH W. WARREN
• For a Lipschitz function f defined on the ambient space RN we will write kf kLip to denote the Lipschitz norm of f with respect to the ambient distance k · k, i.e., |f (x) − f (y)| kf kLip = inf (4.34) . kx − yk x,y∈RN ,x6=y • For a function f ∈ C k (RN ) we will use kf kC k to denote the following kf kC k = kf k
(4.35)
(4.36)
L∞ (Σ)
+
k X j=1
kDj f kL∞ (Σ) .
In particular, when in (4.35) we write kDj f kC k we mean kDj f kL∞ (Σ) = sup kDj f (x)k, x∈Σ
i.e., the norm kDj f (x)k is the norm in the ambient space RN . ′
Lemma 4.8. For any φ_t(ξ, ·), φ_t(ξ′, ·) ∈ F^t_{f,h} we have
(4.37) sup_{ζ∈R^N} |φ_t(ξ, ζ) − φ_t(ξ′, ζ)| ≤ C(f, h)‖ξ − ξ′‖,
where
(4.38) C(f, h) = C_0( ‖f‖_Lip‖h‖_Lip + ‖f‖_{C^1}‖h‖_Lip + ‖h‖_{C^1}‖f‖_Lip )
and C_0 is a universal constant. Thus
(4.39) N(F^t_{f,h}, δ) ≤ A(Σ, δ/C(f, h)).

Proof. Let φ_t(ξ, ζ) ∈ F^t_{f,h}; then we have
(4.40) D_ξ φ_t(ξ, ζ) = t^{−1/2} ((ζ − ξ)/t) e^{−‖ξ−ζ‖²/2t} (f(ξ) − f(ζ))(h(ξ) − h(ζ))
(4.41) + t^{−1/2} e^{−‖ξ−ζ‖²/2t} D_ξ f(ξ)(h(ξ) − h(ζ))
(4.42) + t^{−1/2} e^{−‖ξ−ζ‖²/2t} D_ξ h(ξ)(f(ξ) − f(ζ)),
and then
(4.43) ‖D_ξ φ_t(ξ, ζ)‖ ≤ ‖f‖_Lip‖h‖_Lip (‖ξ − ζ‖³/t^{3/2}) e^{−‖ξ−ζ‖²/2t}
(4.44) + (‖ξ − ζ‖/t^{1/2}) e^{−‖ξ−ζ‖²/2t} ‖f‖_{C^1}‖h‖_Lip
(4.45) + (‖ξ − ζ‖/t^{1/2}) e^{−‖ξ−ζ‖²/2t} ‖h‖_{C^1}‖f‖_Lip
(4.46) ≤ C_0( ‖f‖_Lip‖h‖_Lip + ‖f‖_{C^1}‖h‖_Lip + ‖f‖_Lip‖h‖_{C^1} ),
where
(4.47) C_0 = max{ sup_{ρ>0} ρ³e^{−ρ²/2}, sup_{ρ>0} ρe^{−ρ²/2} }.
It follows that for any ξ, ξ′ ∈ Σ we have the Lipschitz estimate
(4.48) |φ_t(ξ, ζ) − φ_t(ξ′, ζ)| ≤ C(f, h)‖ξ − ξ′‖.
The following can be obtained in a similar fashion.

Lemma 4.9. Let f ∈ C^1 and let υ_t be given by
υ_t(ξ, ζ) = e^{−‖ξ−ζ‖²/2t}(f(ξ) − f(ζ)).
We then have the estimate
(4.49) |υ_t(ξ, ζ) − υ_t(ξ′, ζ)| ≤ ( (2/e)‖f‖_Lip + ‖f‖_{C^1} )‖ξ − ξ′‖.

Proof. This follows from
(4.50) ‖D_ξ υ_t(ξ, ζ)‖ ≤ (‖ξ − ζ‖/t) e^{−‖ξ−ζ‖²/2t}|f(ξ) − f(ζ)| + e^{−‖ξ−ζ‖²/2t}‖D_ξ f(ξ)‖
(4.51) ≤ (‖ξ − ζ‖²/t) e^{−‖ξ−ζ‖²/2t}‖f‖_Lip + sup_{x∈Σ}‖Df(x)‖
and
(4.52) sup_{ρ>0} ρ²e^{−ρ²/2} = 2/e.
Lemmas 4.6 and 4.8 say that we can relate covering numbers of the classes F^t_{f,h}, G^t to ambient covering numbers of the submanifold Σ. In order to estimate A(Σ, δ) we need to introduce the notion of reach of an embedded submanifold of R^N. For every ε > 0 we can consider the ε-neighborhood of Σ,
(4.53) Σ_ε = {x ∈ R^N : d(x, Σ) < ε},
where d(·, Σ) measures the distance from points in R^N to Σ with respect to the ambient norm ‖·‖. If Σ is a smooth, embedded submanifold of R^N, for ε > 0 sufficiently small we can define a smooth map ϕ : Σ_ε → Σ such that
(1) ϕ is smooth,
(2) ϕ(x) is the unique point in Σ such that ‖ϕ(x) − x‖ = d(x, Σ) for all x ∈ Σ_ε,
(3) x − ϕ(x) ∈ (T_{ϕ(x)}Σ)^⊥,
(4) ϕ(y + z) ≡ ϕ(y) for all y ∈ Σ and z ∈ (T_yΣ)^⊥ with ‖z‖ < ε,
(5) for any vector V ∈ R^N, D_Vϕ(x) = p_{ϕ(x)}(V), where p_{ϕ(x)}(V) is the orthogonal projection of V onto T_{ϕ(x)}Σ.
See for example [Sim96, Theorem 1]. The map ϕ is called the nearest point projection onto Σ.
Definition 4.10. Let Σ be an embedded submanifold of R^N. The reach of Σ is the number
(4.54) τ = sup{ ε > 0 : the nearest point projection ϕ : Σ_ε → Σ is defined }.
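As a concrete example (ours, not from the paper): for the unit circle Σ = S¹ ⊂ R², the nearest point projection is ϕ(x) = x/‖x‖, which is defined on Σ_ε for every ε < 1, so the reach is τ = 1. A minimal sketch:

import numpy as np

def nearest_point_projection_circle(x):
    # Nearest point projection onto the unit circle S^1 in R^2.
    # Defined for every x != 0, i.e., on Sigma_eps for any eps < 1 (reach tau = 1).
    x = np.asarray(x, dtype=float)
    n = np.linalg.norm(x)
    if n == 0.0:
        raise ValueError("projection undefined at the center of the circle")
    return x / n

if __name__ == "__main__":
    x = np.array([0.3, -0.2])            # a point in Sigma_eps with eps < 1
    p = nearest_point_projection_circle(x)
    print(p, np.linalg.norm(x - p))      # p lies on S^1 and |x - p| = d(x, S^1)
    tangent = np.array([-p[1], p[0]])
    print(abs(np.dot(x - p, tangent)) < 1e-12)  # x - p is normal to the circle at p, as in (3)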
We have the following result.

Theorem 4.11 ([FMN13, Corollary 6]). Suppose that Σ is a d-dimensional embedded submanifold of R^N with volume V and reach τ > 0, and let U : R_+ → R_+ be the function
(4.55) U(r) = V( 1/τ^d + r^d );
then for any ε > 0 there is an ε-net of Σ with respect to the ambient distance ‖·‖ of no more than C_d U(ε^{−1}) points, where C_d is a dimensional constant. In particular,
(4.56) A(Σ, ε) ≤ C_d U(ε^{−1}).
Corollary 4.12. We have the following bounds:
(a) (4.57) N(F^t_{f,h}, ε) ≤ C_d V( 1/τ^d + (C(f,h)/ε)^d ),
(b) (4.58) N(G^t, ε) ≤ C_d V( 1/τ^d + (1/(e^{1/2}ε))^d ),
(c) (4.59) N(H^t_h, ε) ≤ C_d V( 1/τ^d + (C/(e^{1/2}ε))^d ),
where C is an upper bound for ‖h‖_{L^∞(Σ)}.

4.4. Sample Version of the Carré du Champ. In this section we are still assuming that the distribution of the sample ξ_1, ..., ξ_n in Σ is uniform, i.e., dμ = dvol_Σ. In this case we know that lim_{t→0} t^{−d/2}θ_t = (2π)^{d/2} uniformly in L^∞(Σ), and therefore there exists t_0 > 0 such that for 0 < t < t_0 we have
2(2π)^{d/2} ≥ t^{−d/2}θ_t ≥ (1/2)(2π)^{d/2}.
If we let
(4.60) λ_0 = (2π)^{d/2}/4,
we have for 0 < t < t_0 the inequality
(4.61) θ_t(x) ≥ 2t^{d/2}λ_0,
which will be a convenient normalization for us in the sequel. The main lemma in this section is the following.

Lemma 4.13. Suppose F is a class of functions, totally bounded in L^∞(Σ), of the form f(x, ·) for x ∈ Σ, for a fixed f ∈ L^∞(Σ × Σ), and let M = ‖f‖_{L^∞(Σ×Σ)}. Suppose also that 0 < t < t_0 and that ε is small enough so that
(4.62) εt^{d+1/2}λ_0 < M.
Then
(4.63) Pr*{ sup_{x∈Σ} t^{−1/2}| μ_n f(x,·)/θ̂_t(x) − μ f(x,·)/θ_t(x) | ≥ ε }
(4.64) ≤ 2N( G^t, εt^{d+1/2}λ_0²/(4M) ) exp( −ε²nt^{2d}λ_0^4/(8M²) )
(4.65) + 2N( F, ελ_0 t^{(d+1)/2}/4 ) exp( −ε²λ_0²t^{d+1}n/(8M²) ).
Before proving Lemma 4.13 we will prove the following elementary lemma.

Lemma 4.14. Let ξ, ζ be positive random variables. For any ε > 0 we have
Pr{ |1/ξ − 1/ζ| ≥ ε } ≤ Pr{ |ζ − ξ| ≥ εξ²/(1 + εξ) }.

Proof. Assume 0 < ζ < ξ. Then
1/ζ − 1/ξ = (ξ − ζ)/(ξζ) = (ξ − ζ)/(ξζ − ξ² + ξ²) = (ξ − ζ)/(ξ(ζ − ξ) + ξ²) = |ζ − ξ|/(ξ² − ξ|ζ − ξ|),
and from |ζ − ξ|/(ξ² − ξ|ζ − ξ|) ≥ ε we obtain (1 + εξ)|ζ − ξ| ≥ εξ², i.e.,
|ζ − ξ| ≥ εξ²/(1 + εξ).
For the case 0 < ξ < ζ we have
1/ξ − 1/ζ = (ζ − ξ)/(ξζ) = |ξ − ζ|/(ξ|ξ − ζ| + ξ²),
and |ξ − ζ|/(ξ|ξ − ζ| + ξ²) ≥ ε implies
(1 + εξ)|ξ − ζ| ≥ (1 − εξ)|ξ − ζ| ≥ εξ²,
so again |ξ − ζ| ≥ εξ²/(1 + εξ).
Proof of Lemma 4.13. It is easy to prove from Lemma 4.14 and (4.61) that for any δ > 0 we have
(4.66) Pr{ sup_{x∈Σ} |1/θ̂_t(x) − 1/θ_t(x)| ≥ δ }
(4.67) ≤ Pr{ sup_{x∈Σ} |θ̂_t(x) − θ_t(x)| ≥ 4δt^dλ_0²/(1 + 2δt^{d/2}λ_0) }.
Let us write
t^{−1/2}( μ_nf(x,·)/θ̂_t(x) − μf(x,·)/θ_t(x) ) = t^{−1/2}μ_nf(x,·)( 1/θ̂_t(x) − 1/θ_t(x) ) + t^{−1/2}(1/θ_t(x))( μ_nf(x,·) − μf(x,·) ).
Thus
Pr{ t^{−1/2} sup_{x∈Σ} |μ_nf(x,·)/θ̂_t(x) − μf(x,·)/θ_t(x)| ≥ ε }
≤ Pr{ sup_{x∈Σ} |1/θ̂_t(x) − 1/θ_t(x)| ≥ (ε/2)(t^{1/2}/M) } + Pr{ sup_{x∈Σ} |μ_nf(x,·) − μf(x,·)| ≥ ελ_0t^{(d+1)/2} }.
Analyzing the first term and using (4.62) and (4.66)-(4.67) leads us to the inequality
Pr{ sup_{x∈Σ} |1/θ̂_t(x) − 1/θ_t(x)| ≥ (ε/2)(t^{1/2}/M) }
≤ Pr{ sup_{x∈Σ} |θ_t(x) − θ̂_t(x)| ≥ 2εt^{d+1/2}λ_0²/(M + εt^{d+1/2}λ_0) }
≤ Pr{ sup_{x∈Σ} |θ_t(x) − θ̂_t(x)| ≥ εt^{d+1/2}λ_0²/M }
≤ 2N( G^t, εt^{d+1/2}λ_0²/(4M) ) e^{−ε²t^{2d}λ_0^4 n/(8M²)},
where we have applied Lemma 4.5 to the class G^t (in particular we have used (4.23)). Finally,
(4.68) Pr{ sup_{x∈Σ} |μ_nf(x,·) − μf(x,·)| ≥ ελ_0t^{(d+1)/2} } ≤ 2N( F, ελ_0t^{(d+1)/2}/4 ) exp( −ε²λ_0²t^{d+1}n/(8M²) ).

In view of Lemma 4.13, we introduce the following notation.

Definition 4.15. Given a class of functions F as in the statement of Lemma 4.13 and positive numbers t, ε, M, we define, for compactness of notation, the function
(4.69) Q_t(F, ε, M, n) = 2N( G^t, εt^{d+1/2}λ_0²/(4M) ) exp( −ε²nt^{2d}λ_0^4/(8M²) )
(4.70) + 2N( F, ελ_0t^{(d+1)/2}/4 ) exp( −ε²λ_0²t^{d+1}n/(8M²) ).
As a corollary we obtain the rate of convergence in probability of the sample Carré du Champ to its expected value.

Corollary 4.16. Letting K = (2/e)‖f‖_Lip‖h‖_Lip we have
(4.71) Pr{ sup_{x∈Σ} |Γ̂_t(f, h)(x) − Γ_t(f, h)(x)| ≥ ε } ≤ Q_t( F^t_{f,h}, ε, Kt^{1/2}, n ).

Proof. Recall that
(4.72) Γ̂_t(L_t, f, h)(x) = (1/(tθ̂_t(x))) (1/n) Σ_{j=1}^{n} e^{−‖x−ξ_j‖²/2t}(f(ξ_j) − f(x))(h(ξ_j) − h(x))
(4.73) = (t^{−1/2}/θ̂_t(x)) (1/n) Σ_{j=1}^{n} φ_t(x, ξ_j) = t^{−1/2} μ_n(φ_t(x, ·))/θ̂_t(x),
where φ_t is given by the definition of the class F^t_{f,h} in (4.20), and therefore we can apply Lemma 4.13 to the class F^t_{f,h} and use the bound
(4.74) sup_{ψ∈F^t_{f,h}} ‖ψ‖_{L^∞(Σ)} ≤ t^{1/2}K,
which follows from (4.22).
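The estimator in (4.72)-(4.73) is straightforward to implement. The following is a minimal numerical sketch (ours; the uniform sample on the unit circle, the coordinate function f, and the scale t are illustrative choices) of θ̂_t and of Γ̂_t(f,f)(x), which by Proposition 2.1 should be close to |∇_Σ f(x)|² for small t and large n.

import numpy as np

def theta_hat(x, sample, t):
    # \hat\theta_t(x) = (1/n) sum_j exp(-|x - xi_j|^2 / (2t)).
    return np.mean(np.exp(-np.sum((sample - x) ** 2, axis=1) / (2 * t)))

def gamma_hat(x, sample, t, f, h):
    # Sample Carre du Champ (4.72)-(4.73); f and h must accept an (n, N) array of points.
    w = np.exp(-np.sum((sample - x) ** 2, axis=1) / (2 * t))
    return np.mean(w * (f(sample) - f(x)) * (h(sample) - h(x))) / (t * theta_hat(x, sample, t))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, t = 20000, 0.01
    ang = rng.uniform(0, 2 * np.pi, n)                   # uniform sample on the unit circle
    sample = np.column_stack([np.cos(ang), np.sin(ang)])
    f = lambda P: P[..., 0]                               # restriction of the coordinate x_1
    x = np.array([np.cos(1.0), np.sin(1.0)])
    # For small t this should be close to |grad_Sigma f|^2(x) = sin(1.0)^2 ~ 0.708.
    print(gamma_hat(x, sample, t, f, f))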
If we now consider the t-Laplacian L_t and its sample version L̂_t, we see that the deviation of L̂_t from L_t on a function h ∈ L^∞(Σ) with ‖h‖_{L^∞(Σ)} ≤ M simplifies to
(4.75) L̂_th(x) − L_th(x) = (2/(θ̂_t(x)tn)) Σ_{j=1}^{n} e^{−‖ξ_j−x‖²/2t}(h(ξ_j) − h(x))
(4.76) − (2/(tθ_t(x))) ∫_Σ e^{−‖ξ−x‖²/2t}(h(ξ) − h(x)) dμ(ξ)
(4.77) = (2/(θ̂_t(x)tn)) Σ_{j=1}^{n} e^{−‖ξ_j−x‖²/2t}h(ξ_j) − (2/(tθ_t(x))) ∫_Σ e^{−‖ξ−x‖²/2t}h(ξ) dμ(ξ)
(4.78) = (2/t^{3/2}) ( μ_n[ψ_t(x,·)h(·)]/θ̂_t(x) − μ[ψ_t(x,·)h(·)]/θ_t(x) ).
Observe that
(4.79) Pr{ sup_{x∈Σ} |L̂_th(x) − L_th(x)| ≥ ε } ≤ Pr{ t^{−1/2} sup_{η∈H^t_h} | μ_nη/θ̂_t − μη/θ_t | ≥ tε }.
We have obtained the following.

Corollary 4.17. Fix a function h ∈ L^∞(Σ). If we set M = ‖h‖_{L^∞}, we have
(4.80) Pr{ sup_{x∈Σ} |L̂_th(x) − L_th(x)| ≥ ε } ≤ Q_t( H^t_h, εt, t^{1/2}M, n ).

Proof. The proof follows from combining Lemma 4.13 with (4.79) and the fact that
(4.81) sup_{η∈H^t_h} ‖η‖_{L^∞(Σ)} ≤ t^{1/2}‖h‖_{L^∞(Σ)} = t^{1/2}M.
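Similarly, the sample t-Laplacian L̂_t appearing in (4.75) and Corollary 4.17 can be sketched as follows (our illustration; the circle sample, the function h and the parameters are assumptions). With the normalization used here, L̂_th(x) is a noisy approximation of the Laplace-Beltrami operator Δh(x) for small t.

import numpy as np

def laplacian_hat(x, sample, t, h):
    # Sample t-Laplacian as in (4.75): (2 / (t * \hat\theta_t(x))) * mean_j w_j (h(xi_j) - h(x)),
    # where w_j = exp(-|x - xi_j|^2 / (2t)); h must accept an (n, N) array of points.
    w = np.exp(-np.sum((sample - x) ** 2, axis=1) / (2 * t))
    return 2.0 * np.mean(w * (h(sample) - h(x))) / (t * np.mean(w))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, t = 200000, 0.02
    ang = rng.uniform(0, 2 * np.pi, n)
    sample = np.column_stack([np.cos(ang), np.sin(ang)])
    h = lambda P: P[..., 0]
    x = np.array([np.cos(1.0), np.sin(1.0)])
    # Noisy estimate of the Laplace-Beltrami operator on the circle: Delta h(x) = -cos(1.0) ~ -0.54.
    print(laplacian_hat(x, sample, t, h))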
4.5. Subexponential Decay and Almost Sure Convergence. The goal of this subsection is to demonstrate that the decay rate of the quantities Q_t(F, ε, M, n) implies almost sure convergence. We illustrate the Borel-Cantelli type proof in this subsection for the purpose of introducing some notation, which will be used in later sections.

Theorem 4.18. Consider the metric measure space (Σ, ‖·‖, dvol_Σ), where Σ^d ⊂ R^N is a smooth closed embedded submanifold, and suppose that we have a uniformly distributed i.i.d. sample {ξ_1, ..., ξ_n} of points from Σ. For σ > 0, let
(4.82) t_n = n^{−1/(2d+σ)}.
Then for fixed f, h ∈ Lip(Σ) we have
sup_{ξ∈Σ} |Γ̂(L_{t_n}, f, h)(ξ) − Γ(L_{t_n}, f, h)(ξ)| → 0 almost surely.
Proof. For fixed n and ε, we have
Pr{ sup_{x∈Σ} |Γ̂(L_{t_n}, f, h)(x) − Γ(L_{t_n}, f, h)(x)| ≥ ε } ≤ Q_{t_n}( F^{t_n}_{f,h}, ε, Kt_n^{1/2}, n ).
Now, plugging in the expression for t_n, we observe a bound of the form
(4.83) Q_{t_n}( F^{t_n}_{f,h}, ε, Kt_n^{1/2}, n ) ≤ p( 1/ε, n^{1/(2(2d+σ))} ) exp( −c_1ε²n^{σ/(2d+σ)} ),
where p is a fixed polynomial and c_1 > 0 is a constant. Thus, with ε fixed, we have
Σ_{n=1}^{∞} Q_{t_n}( F^{t_n}_{f,h}, ε, Kt_n^{1/2}, n ) < ∞.
Applying the Borel-Cantelli Lemma gives the almost sure convergence.
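The scale selection (4.82) is easy to emulate on synthetic data. The sketch below is our own illustration (the circle, the coordinate function, and the values of n are assumptions, and the reference value sin²θ is only the t → 0 limit of Γ_t(f,f)): it recomputes the sup error of Γ̂_{t_n}(f,f) over a grid of base points as n grows along the schedule t_n = n^{−1/(2d+σ)}; the error should typically decrease, in line with Theorem 4.18.

import numpy as np

def gamma_hat(x, sample, t, f):
    # Sample Carre du Champ \hat\Gamma_t(f, f)(x) as in (4.72); f acts on arrays of points.
    w = np.exp(-np.sum((sample - x) ** 2, axis=1) / (2 * t))
    return np.mean(w * (f(sample) - f(x)) ** 2) / (t * np.mean(w))

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    d, sigma = 1, 1.0                               # Sigma is the unit circle, so d = 1
    f = lambda P: P[..., 0]                         # |grad_Sigma f|^2 at angle a equals sin(a)^2
    angles = np.linspace(0, 2 * np.pi, 40, endpoint=False)
    X = np.column_stack([np.cos(angles), np.sin(angles)])
    for n in (2000, 8000, 32000, 128000):
        t_n = n ** (-1.0 / (2 * d + sigma))          # the scale (4.82)
        ang = rng.uniform(0, 2 * np.pi, n)
        sample = np.column_stack([np.cos(ang), np.sin(ang)])
        err = max(abs(gamma_hat(x, sample, t_n, f) - np.sin(a) ** 2)
                  for x, a in zip(X, angles))
        print(n, round(t_n, 4), err)                 # the sup error typically decreases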
Now we see that, for almost sure convergence, one requires a bound of the form (4.83). For this reason we introduce notation for use in the sequel: consider a function Q : R_+ × R_+ × N → R_+. We say that Q(t, ε, n) ∈ O_BC(β) if
Σ_{n=1}^{∞} Q( n^{−1/(β+σ)}, ε, n ) < ∞
for all σ, ε > 0. Clearly the definition gives O_BC(β) ⊂ O_BC(β′) for β′ > β. We also observe that O_BC(β) + O_BC(β′) ⊂ O_BC(max{β, β′}), and that if for a class of functions F and a positive number K we set Q(t, ε, n) = Q_t(F, ε, K, n), then
Q_t(F, ε, K, n) ∈ O_BC(β) implies Q_t(F, εt^α, Kt^δ, n) ∈ O_BC( β + 2α − min{2δ, 1} ).

We record the following, which is clear from the definitions and Corollary 4.12.

Corollary 4.19. For the classes of functions defined by (4.20), (4.21) and (4.30), and for ε, K > 0 fixed, we have Q_t(F, ε, K, n) ∈ O_BC(2d).
4.6. Proof of Theorem 1.12. Recall that
(4.84) Γ_2(L_t, f, f) = (1/2)( L_t(Γ_t(f, f)) − 2Γ_t(L_tf, f) ),
and that from Remark 4.2
(4.85) Γ̂_2(L_t, f, f) = (1/2)( L̂_t(Γ̂_t(f, f)) − 2Γ̂_t(L̂_tf, f) ),
so we will start by estimating the difference L̂_t(Γ̂_t(f, f)) − L_t(Γ_t(f, f)), which we write as
(4.86) L̂_t(Γ̂_t(f, f))(x) − L_t(Γ_t(f, f))(x) = L̂_t( Γ̂_t(f, f) − Γ_t(f, f) )(x)
(4.87) + (L̂_t − L_t)(Γ_t(f, f))(x)
(4.88) = A_1(x) + A_2(x),
and observe that
(4.89) |A_1| = | L̂_t( Γ̂_t(f, f) − Γ_t(f, f) ) | ≤ (4/t) sup_{ξ∈Σ} | Γ̂_t(f, f)(ξ) − Γ_t(f, f)(ξ) |.
In order to estimate A_2 = (L̂_t − L_t)(Γ_t(f, f)) we note from Proposition 2.1 that Γ_t(f, f) converges to |∇f|² in L^∞(Σ) as t → 0, so we can choose t_1 with 0 < t < t_1 < t_0 such that
(4.90) ‖Γ_t(f, f)‖_{L^∞(Σ)} ≤ 2‖f‖²_Lip,
and the idea will be to use Corollary 4.17. We now estimate the difference
(4.91) Γ̂_t(L̂_tf, f)(x) − Γ_t(L_tf, f)(x) = Γ̂_t(L̂_tf − L_tf, f)(x) + (Γ̂_t − Γ_t)(L_tf, f)(x)
(4.92) = A_3(x) + A_4(x),
and we note that
(4.93) |A_3| = |Γ̂_t(L̂_tf − L_tf, f)(x)|
(4.94) ≤ sup_{ξ∈Σ}|L̂_tf(ξ) − L_tf(ξ)| (2/(tnθ̂_t(x))) Σ_{j=1}^{n} e^{−‖ξ_j−x‖²/2t}|f(ξ_j) − f(x)|
(4.95) ≤ 2 (e^{−1/2}‖f‖_Lip/(t^{1/2}θ̂_t)) sup_{ξ∈Σ}|L̂_tf(ξ) − L_tf(ξ)|,
where we have used the fact that for functions u, v
(4.96) |Γ̂_t(u, v)(x)| ≤ sup_{ξ∈Σ}|u(ξ)| (2/(tnθ̂_t(x))) Σ_{j=1}^{n} e^{−‖x−ξ_j‖²/2t}|v(x) − v(ξ_j)|
(4.97) ≤ sup_{ξ∈Σ}|u(ξ)| (2‖v‖_Lip/(tθ̂_t(x))) sup_{ρ>0} ρe^{−ρ²/2t},
and
(4.98) sup_{ρ>0} ρe^{−ρ²/2t} = t^{1/2}e^{−1/2} ≤ t^{1/2}.
In view of Lemma 4.13, we will write the bound (4.95) on |A_3| as
(4.99) |A_3| ≤ 2 (e^{−1/2}‖f‖_Lip/(t^{1/2}θ_t)) sup_{ξ∈Σ}|L̂_tf(ξ) − L_tf(ξ)|
(4.100) + 2e^{−1/2}‖f‖_Lip t^{−1/2} |1/θ̂_t − 1/θ_t| sup_{ξ∈Σ}|L̂_tf(ξ) − L_tf(ξ)|.
In order to estimate |A_4|, i.e.,
(4.101) |A_4| = |(Γ̂_t − Γ_t)(L_tf, f)|,
we consider the class of functions F^t_{L_tf,f} introduced in (4.20), so that we can use Corollary 4.16; however, this requires a Lipschitz estimate on the function L_tf. Letting υ_t(x, ξ) be as in Lemma 4.9, i.e.,
(4.102) υ_t(x, ξ) = e^{−‖x−ξ‖²/2t}(f(ξ) − f(x)),
we have
(4.103) L_tf(x) − L_tf(x′) = ((θ_t(x′) − θ_t(x))/(tθ_t(x)θ_t(x′))) ∫_Σ υ_t(x, ξ) dμ + (1/(tθ_t(x′))) ∫_Σ (υ_t(x, ξ) − υ_t(x′, ξ)) dμ,
and
(4.104) ((θ_t(x′) − θ_t(x))/(tθ_t(x)θ_t(x′))) ∫_Σ υ_t(x, ξ) dμ
(4.105) = (t^{−1/2}/θ_t(x′)) ( ∫_Σ (ψ_t(ξ, x′) − ψ_t(ξ, x)) dμ ) L_tf(x),
and again we can choose t_0 in the discussion surrounding (4.61) so that
(4.106) | ((θ_t(x′) − θ_t(x))/(tθ_t(x)θ_t(x′))) ∫_Σ υ_t(x, ξ) dμ | ≤ 4 (vol(Σ)/t^{(d+1)/2}) ‖f‖_{C^2(Σ)} ‖x − x′‖.
From Lemma 4.9,
(4.107) |υ_t(x, ξ) − υ_t(x′, ξ)| ≤ ( (2/e)‖f‖_Lip + ‖f‖_{C^1} ) ‖x − x′‖,
and therefore we can adjust t_0 in (4.61) so that for 0 < t < t_0
(4.108) (1/(tθ_t(x′))) ∫_Σ |υ_t(x, ξ) − υ_t(x′, ξ)| dμ
(4.109) ≤ (2vol(Σ)/t^{(d+2)/2}) ( (2/e)‖f‖_Lip + ‖f‖_{C^1} ) ‖x − x′‖.
We have shown that
(4.110) ‖L_tf‖_Lip ≤ (2vol(Σ)/t^{(d+1)/2}) ( 2‖f‖_{C^2} + t^{−1/2}(2/e)‖f‖_Lip + t^{−1/2}‖f‖_{C^1} ).
Let A*_i = sup_{x∈Σ}|A_i(x)| for i = 1, 2, 3, 4. We now have from (4.86) and (4.91)
(4.111) Pr{ |[Γ̂_2]_t(f, f) − [Γ_2]_t(f, f)| ≥ ε }
(4.112) ≤ Pr{A*_1 ≥ ε/4} + Pr{A*_2 ≥ ε/4} + Pr{A*_3 ≥ ε/4} + Pr{A*_4 ≥ ε/4}
(4.113) = P_1 + P_2 + P_3 + P_4.
From (4.89) we have
(4.114) P_1 ≤ Pr{ sup_{ξ∈Σ} |Γ̂_t(f, f)(ξ) − Γ_t(f, f)(ξ)| ≥ tε/16 } ≤ Q_t( F^t_f, εt/16, Kt^{1/2}, n )
(4.115) ∈ O_BC( 2(d+1) ).
For A_2 = (L̂_t − L_t)(Γ_t(f, f)) we apply Corollary 4.17 and the estimate (4.90) to the class H^t_{Γ_t(f,f)} and obtain
(4.116) P_2 ≤ Q_t( H^t_{Γ_t(f,f)}, εt, 2‖f‖²_Lip, n ) ∈ O_BC(2d+3).
Using (4.100) we have
(4.117) P_3 ≤ Pr{ 2 (e^{−1/2}‖f‖_Lip/(t^{1/2}θ_t)) sup_{ξ∈Σ}|L̂_tf(ξ) − L_tf(ξ)| ≥ ε/8 }
(4.118) + Pr{ 2e^{−1/2}‖f‖_Lip t^{−1/2} |1/θ̂_t − 1/θ_t| sup_{ξ∈Σ}|L̂_tf(ξ) − L_tf(ξ)| ≥ ε/8 },
and choosing t > 0 small enough we obtain from (4.61) and Corollary 4.17 the estimate
(4.119) Pr{ 2 (e^{−1/2}‖f‖_Lip/(t^{1/2}θ_t)) sup_{ξ∈Σ}|L̂_tf(ξ) − L_tf(ξ)| ≥ ε/8 }
(4.120) ≤ Pr{ sup_{ξ∈Σ}|L̂_tf(ξ) − L_tf(ξ)| ≥ (ε/8) λ_0 t^{(d+2)/2} e^{1/2}/‖f‖_Lip }
(4.121) ≤ Q_t( H^t_f, (e^{1/2}ελ_0 t^{(d+2)/2}/(8‖f‖_Lip)) t, t^{1/2}‖f‖_{L^∞}, n ) ∈ O_BC(3d+3).
On the other hand, we will make use of the estimate
(4.122) sup_{ξ∈Σ}|L̂_tf(ξ) − L_tf(ξ)| ≤ (4/t)‖f‖_{L^∞},
valid for any t > 0, to obtain
(4.123) Pr{ 2e^{−1/2}‖f‖_Lip t^{−1/2} sup_{ξ∈Σ}|L̂_tf(ξ) − L_tf(ξ)| |1/θ̂_t − 1/θ_t| ≥ ε/8 }
(4.124) ≤ Pr{ sup_{x∈Σ} |1/θ̂_t(x) − 1/θ_t(x)| ≥ εt^{3/2}e^{1/2}/(64‖f‖_Lip‖f‖_{L^∞}) },
and from (4.66)-(4.67) and Lemma 4.1 we observe that
(4.125) Pr{ 2e^{−1/2}‖f‖_Lip t^{−1/2} sup_{ξ∈Σ}|L̂_tf(ξ) − L_tf(ξ)| |1/θ̂_t − 1/θ_t| ≥ ε/8 } ∈ O_BC(d+3).
We then conclude that
(4.126) P_3 ∈ O_BC(3d+3).
Finally, we consider the class F^t_{L_tf,f} and let
(4.127) M_t = (2/e)(2vol(Σ)/t^{(d+1)/2}) ( 2‖f‖_{C^2} + t^{−1/2}(2/e)‖f‖_Lip + t^{−1/2}‖f‖_{C^1} ) ‖f‖_Lip,
so that in view of (4.110) and (4.22) we have
(4.128) sup_{l∈F^t_{L_tf,f}} ‖l‖_{L^∞(Σ)} ≤ t^{1/2}M_t,
and then
(4.129) P_4 ≤ Q_t( F^t_{L_tf,f}, ε, t^{1/2}M_t, n ) ∈ O_BC(3d+1).
Using again the decomposition into the A_i we have
(4.130) Pr{ |Γ̂_2(L_t, f, f) − Γ_2(L_t, f, f)| ≥ ε } ≤ P_1 + P_2 + P_3 + P_4,
and from (4.115), (4.116), (4.126) and (4.129) we obtain
(4.131) Pr{ |Γ̂_2(L_t, f, f) − Γ_2(L_t, f, f)| ≥ ε } ∈ O_BC(3d+3),
and it follows from the Borel-Cantelli argument given in Section 4.5 that for any sequence of the form t_n = n^{−γ}, where γ = 1/(3d+4+σ) and σ is any positive number, we have
(4.132) sup_{ξ∈Σ} |Γ̂_2(L_{t_n}, f, f)(ξ) − Γ_2(L_{t_n}, f, f)(ξ)| → 0 almost surely.
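To summarize the object whose convergence was just established, here is a minimal sketch (ours) of the sample iterated Carré du Champ Γ̂_2(L_t, f, f) = ½(L̂_tΓ̂_t(f,f) − 2Γ̂_t(L̂_tf, f)) assembled from the sample operators of this section. The circle data, the function f and the fixed scale t are illustrative assumptions; Theorem 1.12 uses the schedule t_n = n^{−1/(3d+4+σ)}, which at practical sample sizes is much coarser, so the printed value is only a rough, biased estimate of the t → 0 limit Ric(∇f, ∇f) + ‖D²f‖² (equal to cos²θ on the unit circle, where Ric = 0).

import numpy as np

def _w(x, S, t):
    # Gaussian weights exp(-|x - xi_j|^2 / (2t)) against the sample S.
    return np.exp(-np.sum((S - x) ** 2, axis=1) / (2 * t))

def gamma_hat(x, S, t, u_S, u_x, v_S, v_x):
    # \hat\Gamma_t(u, v)(x), with u, v given by their values on S and at x.
    w = _w(x, S, t)
    return np.mean(w * (u_S - u_x) * (v_S - v_x)) / (t * np.mean(w))

def laplacian_hat(x, S, t, h_S, h_x):
    # \hat L_t h(x) = (2 / (t \hat\theta_t(x))) mean_j w_j (h(xi_j) - h(x)).
    w = _w(x, S, t)
    return 2.0 * np.mean(w * (h_S - h_x)) / (t * np.mean(w))

def gamma2_hat(x, S, t, f):
    # \hat\Gamma_2(L_t, f, f)(x) = 0.5 * ( \hat L_t \hat\Gamma_t(f,f) - 2 \hat\Gamma_t(\hat L_t f, f) )(x), cf. (4.85).
    f_S, f_x = f(S), f(x)
    gam_S = np.array([gamma_hat(S[j], S, t, f_S, f_S[j], f_S, f_S[j]) for j in range(len(S))])
    gam_x = gamma_hat(x, S, t, f_S, f_x, f_S, f_x)
    Lf_S = np.array([laplacian_hat(S[j], S, t, f_S, f_S[j]) for j in range(len(S))])
    Lf_x = laplacian_hat(x, S, t, f_S, f_x)
    return 0.5 * (laplacian_hat(x, S, t, gam_S, gam_x) - 2.0 * gamma_hat(x, S, t, Lf_S, Lf_x, f_S, f_x))

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    n, t = 4000, 0.05                      # fixed illustrative scale, not the schedule of Theorem 1.12
    ang = rng.uniform(0, 2 * np.pi, n)
    S = np.column_stack([np.cos(ang), np.sin(ang)])
    f = lambda P: P[..., 0]
    x = np.array([np.cos(1.0), np.sin(1.0)])
    # Rough estimate versus the t -> 0 limit cos(1.0)^2 ~ 0.29.
    print(gamma2_hat(x, S, t, f), np.cos(1.0) ** 2)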
5. Almost Sure Convergence in the non-Uniform Case

In this section we prove Theorem 1.18, which concerns the case when the random variables in the i.i.d. sample ξ_1, ..., ξ_n, ... are not distributed according to dμ = dvol_Σ, but instead according to a density of the form dμ = e^{−ρ}dvol, where ρ is a C^∞ function. Since we already explained the use of the theory of empirical processes in Section 4 for proving the convergence of the sample version of the iterated Carré du Champ in the uniform case, in this section we will simply assume familiarity with all the bounds derived in Section 4, and we will write all large deviation estimates in terms of the notation for subexponential decay introduced in Section 4.5. In particular, we will not write down any bounds on the covering numbers of the totally bounded classes in L^∞(Σ) that appear in the proof. We start by considering the following simple lemma. Recall the definition of θ_{t,α}(x) in (1.48), noting that θ_t^α(x) = [θ_t(x)]^α ≠ θ_{t,α}(x). Let
(5.1) λ_* = (1/4) inf_{x∈Σ} lim_{t↘0} t^{−d/2}θ_t(x),
(5.2) λ* = sup_{x∈Σ} lim_{t↘0} t^{−d/2}θ_t(x).
Lemma 5.1.
Pr{ sup_{x∈Σ} |θ̂_t^α(x) − θ_t^α(x)| ≥ ε } ≤ Pr{ sup_{x∈Σ} |θ̂_t(x) − θ_t(x)| ≥ ε/(α(λ_*t^{d/2})^{α−1}) } + Pr{ sup_{x∈Σ} |θ̂_t(x) − θ_t(x)| ≥ λ_*t^{d/2} }.
Proof. The uniform lower bound in (4.61) depends now on the function ρ, so with the quantities defined above we can choose t_0 > 0 such that for 0 < t < t_0 we have
(5.3) λ_* ≤ θ_t(x)/t^{d/2} ≤ λ*.
By the mean value theorem,
(5.4) |θ̂_t^α(x) − θ_t^α(x)| = α( τθ_t(x) + (1 − τ)θ̂_t(x) )^{α−1} |θ̂_t(x) − θ_t(x)|
for some 0 ≤ τ ≤ 1. Thus the event
{ sup_{x∈Σ} |θ̂_t^α(x) − θ_t^α(x)| ≥ δ }
is contained in the event (recall α ∈ [0, 1])
{ sup_{x∈Σ} |θ̂_t(x) − θ_t(x)| ≥ δ/(α(λ_*t^{d/2})^{α−1}) } ∪ { sup_{x∈Σ} |θ̂_t(x) − θ_t(x)| ≥ λ_*t^{d/2} },
and therefore
Pr{ sup_{x∈Σ} |θ̂_t^α(x) − θ_t^α(x)| ≥ δ } ≤ Pr{ sup_{x∈Σ} |θ̂_t(x) − θ_t(x)| ≥ δ/(α(λ_*t^{d/2})^{α−1}) } + Pr{ sup_{x∈Σ} |θ̂_t(x) − θ_t(x)| ≥ λ_*t^{d/2} }.
Lemma 5.2. For fixed ε > 0 we have the estimate
Pr{ sup_{x∈Σ} |θ̂_{t,α}(x) − θ_{t,α}(x)| ≥ ε } ≤ Pr{ sup_{x∈Σ} |θ̂_t(x) − θ_t(x)| ≥ ε(λ_*t^{d/2})^{1+α}/(2α(1 + ελ*t^{d/2})) } + Pr{ sup_{x∈Σ} |θ̂_t(x) − θ_t(x)| ≥ λ_*t^{d/2} }.
Proof. Letting σ_t(x, ξ) = e^{−‖x−ξ‖²/2t}, we have
(5.5) θ̂_{t,α}(x) − θ_{t,α}(x) = μ_n[σ_t(x,·)/θ̂_t^α] − μ[σ_t(x,·)/θ_t^α]
(5.6) = ( μ_n[σ_t(x,·)/θ̂_t^α] − μ_n[σ_t(x,·)/θ_t^α] ) + ( μ_n[σ_t(x,·)/θ_t^α] − μ[σ_t(x,·)/θ_t^α] ),
so that
(5.7) Pr{ sup_{x∈Σ} |θ̂_{t,α}(x) − θ_{t,α}(x)| ≥ ε } ≤ Pr{ sup_{x∈Σ} |μ_n[σ_t(x,·)/θ̂_t^α] − μ_n[σ_t(x,·)/θ_t^α]| ≥ ε/2 }
(5.8) + Pr{ sup_{x∈Σ} |μ_n[σ_t(x,·)/θ_t^α] − μ[σ_t(x,·)/θ_t^α]| ≥ ε/2 }.
Now,
(5.9) |μ_n[σ_t(x,·)/θ̂_t^α] − μ_n[σ_t(x,·)/θ_t^α]| ≤ sup_{x∈Σ} |1/θ̂_t^α(x) − 1/θ_t^α(x)| μ_n(σ_t(x,·)).
From Lemma 5.1 we obtain
Pr{ sup_{x∈Σ} |μ_n[σ_t(x,·)/θ̂_t^α] − μ_n[σ_t(x,·)/θ_t^α]| ≥ ε/2 } ≤ Pr{ sup_{x∈Σ} |1/θ̂_t^α(x) − 1/θ_t^α(x)| ≥ ε/2 }
≤ Pr{ sup_{x∈Σ} |θ̂_t^α(x) − θ_t^α(x)| ≥ (ε/2)(θ_t^α)²/(1 + (ε/2)θ_t^α) }
≤ Pr{ sup_{x∈Σ} |θ̂_t(x) − θ_t(x)| ≥ (ε/2) (λ_*t^{d/2})^{2α}/(1 + ελ*t^{d/2}) · 1/(α(λ_*t^{d/2})^{α−1}) } + Pr{ sup_{x∈Σ} |θ̂_t(x) − θ_t(x)| ≥ λ_*t^{d/2} },
and the result follows.
For the rest of the section we will use the following notation:
• We will write a ≲ b to indicate that a ≤ Cb for some positive constant C depending on d, α and ρ.
• We will write a ≈ b to indicate that there exist positive constants C_1, C_2, with C_i = C_i(α, d, ρ) for i = 1, 2, such that C_1a ≤ b ≤ C_2a.
If we now replace ε > 0 in Lemma 5.2 by t^βε, we easily obtain:

Corollary 5.3. For β ∈ R we have the estimates
(5.10) Pr{ sup_{x∈Σ} |θ̂_{t,α}(x) − θ_{t,α}(x)| ≥ t^βε } ∈ O_BC( 2β + (1+α)d ),
(5.11) Pr{ sup_{x∈Σ} |1/θ_t^α(x) − 1/θ̂_t^α(x)| ≥ t^βε } ∈ O_BC( 2β + d(1+α) ),
(5.12) Pr{ sup_{x∈Σ} |1/θ_{t,α}(x) − 1/θ̂_{t,α}(x)| ≥ t^βε } ∈ O_BC( 2β + d(3−α) ).

Proof. The corollary follows from combining Lemma 5.2 with the estimate
(5.13) θ_{t,α}(x) ≈ t^{(1−α)d/2} as t → 0.
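In the non-uniform case the kernels are renormalized by a power of the estimated density. The exact definition of θ_{t,α} is (1.48), which is not reproduced in this section; consistently with (5.5)-(5.6), its sample version is θ̂_{t,α}(x) = μ_n[σ_t(x,·)/θ̂_t^α]. The following sketch is our own illustration (the density e^{−ρ} with ρ = cos θ on the circle, the parameters, and the behaviour noted in the comments are assumptions based on the standard α-normalization): for α = 1 the renormalized quantity should be nearly insensitive to the sampling density, while the raw θ̂_t is not.

import numpy as np

def theta_hat(x, S, t):
    # \hat\theta_t(x) = (1/n) sum_j exp(-|x - xi_j|^2 / (2t)).
    return np.mean(np.exp(-np.sum((S - x) ** 2, axis=1) / (2 * t)))

def theta_hat_alpha(x, S, t, alpha, theta_S):
    # Sample version of theta_{t,alpha}, consistent with (5.5)-(5.6):
    # mu_n[ sigma_t(x, .) / \hat\theta_t(.)^alpha ], with theta_S = (\hat\theta_t(xi_j))_j precomputed.
    w = np.exp(-np.sum((S - x) ** 2, axis=1) / (2 * t))
    return np.mean(w / theta_S ** alpha)

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    n, t, alpha = 5000, 0.02, 1.0
    # Non-uniform sample on the unit circle, density proportional to e^{-rho} with rho = cos(theta),
    # drawn by rejection sampling.
    prop = rng.uniform(0, 2 * np.pi, 6 * n)
    keep = rng.uniform(0, 1, 6 * n) < np.exp(-np.cos(prop)) / np.e
    ang = prop[keep][:n]
    S = np.column_stack([np.cos(ang), np.sin(ang)])
    theta_S = np.array([theta_hat(p, S, t) for p in S])
    x_lo = np.array([1.0, 0.0])       # theta = 0: low sampling density (proportional to e^{-1})
    x_hi = np.array([-1.0, 0.0])      # theta = pi: high sampling density (proportional to e^{+1})
    print(theta_hat(x_lo, S, t), theta_hat(x_hi, S, t))             # differ by roughly e^2
    print(theta_hat_alpha(x_lo, S, t, alpha, theta_S),
          theta_hat_alpha(x_hi, S, t, alpha, theta_S))              # for alpha = 1, nearly equal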
For the Carré du Champ we have the following.

Proposition 5.4. Let ε > 0, let u, v ≠ 0 be Lipschitz functions and let Φ_{t,u,v} be the function defined by
Φ_{t,u,v}(x, ξ) = e^{−‖x−ξ‖²/2t}(u(x) − u(ξ))(v(x) − v(ξ)).
We have:
(1) The class of functions F^t_{u,v,α} = { Φ_{t,u,v}(x,·)/θ_t^α(·) : x ∈ Σ } is totally bounded in L^∞ and its covering numbers have polynomial growth, i.e., for some polynomial p(y, z) in two real variables we have the bound
N(F^t_{u,v,α}, ε) ≤ p(ε^{−1}, t^{−1}).
(2) There exist constants M_1, M_2, M_3, with M_i = M_i(α, ρ, d) for i = 1, 2, 3, such that
Pr{ sup_{x∈Σ} |Γ̂_t^α(u, v)(x) − Γ_t^α(u, v)(x)| ≥ ε } ≲ Q_t( F^t_{u,v,α}, t^{d/2}ε/(‖u‖_Lip‖v‖_Lip), M_1, n )
+ Pr{ sup_{x∈Σ} |1/θ_t^α(x) − 1/θ̂_t^α(x)| ≥ εM_2t^{d(1−α)/2}/(‖u‖_Lip‖v‖_Lip) }
+ Pr{ sup_{x∈Σ} |1/θ_{t,α}(x) − 1/θ̂_{t,α}(x)| ≥ t^{d/2}M_3ε/(‖u‖_Lip‖v‖_Lip) }.
(3) In particular, for ε > 0 fixed and β ∈ R we obtain
Pr{ sup_{x∈Σ} |Γ̂_t^α(u, v)(x) − Γ_t^α(u, v)(x)| ≥ t^βε } ∈ O_BC(2β + 3d + 1).
Proof. Part (1) follows from arguments similar to those of Section 4.3. For part (2) we write
(5.14) Γ̂_t^α(u, v)(x) − Γ_t^α(u, v)(x) = (1/(tθ_{t,α}(x))) μ_n[Φ_{t,u,v}(x,·)/θ_t^α] − (1/(tθ_{t,α}(x))) μ[Φ_{t,u,v}(x,·)/θ_t^α]
(5.15) + (1/(tθ_{t,α}(x))) μ_n[Φ_{t,u,v}(x,·)/θ̂_t^α] − (1/(tθ_{t,α}(x))) μ_n[Φ_{t,u,v}(x,·)/θ_t^α]
(5.16) + (1/(tθ̂_{t,α}(x))) μ_n[Φ_{t,u,v}(x,·)/θ̂_t^α] − (1/(tθ_{t,α}(x))) μ_n[Φ_{t,u,v}(x,·)/θ̂_t^α]
(5.17) = A_1(x) + A_2(x) + A_3(x).
Let A*_i be defined by A*_i = sup_{x∈Σ}|A_i(x)| for i = 1, 2, 3. In order to bound A*_1 we observe that
(5.18) ‖Φ_{t,u,v}/(tθ_t^α)‖_{L^∞} ≲ ‖u‖_Lip‖v‖_Lip t^{−dα/2};
it follows that, for some constant M_1 = M_1(α, ρ, d), and since θ_{t,α} ≈ t^{(1−α)d/2},
(5.19) Pr{ A*_1 ≥ ε/3 }
(5.20) = Pr{ sup_{x∈Σ} (1/(tθ_{t,α}(x))) |μ_n[Φ_{t,u,v}(x,·)/θ_t^α] − μ[Φ_{t,u,v}(x,·)/θ_t^α]| ≥ ε/3 }
(5.21) ≤ Q_t( F^t_{u,v,α}, t^{d/2}ε, M_1, n ).
For A_2 we note that
(5.22) |A_2(x)| = | (1/(tθ_{t,α}(x))) μ_n[Φ_{t,u,v}(x,·)/θ̂_t^α] − (1/(tθ_{t,α}(x))) μ_n[Φ_{t,u,v}(x,·)/θ_t^α] |
(5.23) ≤ ‖Φ_{t,u,v}/t‖_{L^∞} ‖1/θ_{t,α}‖_{L^∞} sup_{x∈Σ} |1/θ_t^α(x) − 1/θ̂_t^α(x)|,
and since ‖Φ_{t,u,v}/t‖_{L^∞} ≲ ‖u‖_Lip‖v‖_Lip and ‖θ_{t,α}‖_{L^∞} ≈ t^{d(1−α)/2}, we have from Corollary 5.3 a bound of the form
(5.24) Pr{ A*_2 ≥ ε/3 } ≤ Pr{ sup_{x∈Σ} |1/θ_t^α(x) − 1/θ̂_t^α(x)| ≥ εM_2t^{d(1−α)/2}/(‖u‖_Lip‖v‖_Lip) },
for some constant M_2 = M_2(α, ρ, d). Finally, for bounding A_3, we can easily obtain uniform bounds (in particular independent of x) on the probability of the events
(5.25) { θ̂_t(x) ≥ λ_*t^{d/2} },
(5.26) { θ̂_t(x) < λ_*t^{d/2} },
and from (5.18) we obtain a bound of the form
(5.27) Pr{ A*_3 ≥ ε/3 } ≲ Pr{ sup_{x∈Σ} |1/θ_{t,α}(x) − 1/θ̂_{t,α}(x)| ≥ t^{d/2}M_3ε/(‖u‖_Lip‖v‖_Lip) },
for some M_3 = M_3(α, ρ, d), as needed. Part (3) is a straightforward consequence of part (2).

For the operator L_t^α a similar argument yields the following.

Proposition 5.5. Let ε > 0, let u ≠ 0 be a function in L^∞(Σ) and let Λ_{t,u} be the function defined by
Λ_{t,u}(x, ξ) = e^{−‖x−ξ‖²/2t} u(ξ).
We have:
(1) The class of functions G^t_{u,α} = { Λ_{t,u}(x,·)/θ_t^α(·) : x ∈ Σ } is totally bounded in L^∞ and its covering numbers have polynomial growth.
(2) There exist constants M_1, M_2, M_3, with M_i = M_i(α, ρ, d) for i = 1, 2, 3, such that
Pr{ sup_{x∈Σ} |L̂_t^αu(x) − L_t^αu(x)| ≥ ε } ≲ Q_t( G^t_{u,α}, t^{d/2+1}ε/‖u‖_{L^∞}, M_1, n )
+ Pr{ sup_{x∈Σ} |1/θ_t^α(x) − 1/θ̂_t^α(x)| ≥ t^{1+d(1−α)/2}M_2ε/‖u‖_{L^∞} }
+ Pr{ sup_{x∈Σ} |1/θ_{t,α}(x) − 1/θ̂_{t,α}(x)| ≥ t^{1+d/2}M_3ε/‖u‖_{L^∞} }.
(3) In particular, for ε > 0 fixed and β ∈ R we obtain
Pr{ sup_{x∈Σ} |L̂_t^αu(x) − L_t^αu(x)| ≥ t^βε } ∈ O_BC(2β + 3d + 3).
We now prove Theorem 1.18.
5.1. Proof of Theorem 1.18. Consider the difference
(5.28) 2( Γ_2(L_t^α, f, f)(x) − Γ̂_2(L_t^α, f, f)(x) )
(5.29) = ( L_t^α(Γ_t^α(f, f)) − L̂_t^α(Γ_t^α(f, f)) )(x) + ( L̂_t^α(Γ_t^α(f, f)) − L̂_t^α(Γ̂_t^α(f, f)) )(x)
(5.30) − 2( Γ_t^α(L_t^αf, f) − Γ̂_t^α(L_t^αf, f) )(x) − 2( Γ̂_t^α(L_t^αf, f) − Γ̂_t^α(L̂_t^αf, f) )(x)
(5.31) = E_1(x) + E_2(x) + E_3(x) + E_4(x).
We can immediately bound E_2 and E_4 in the following way. For E_2 we clearly have
(5.32) |E_2| ≤ (2/t) sup_{x∈Σ} |Γ_t^α(f, f)(x) − Γ̂_t^α(f, f)(x)|,
and for E_4
(5.33) |E_4| ≤ 2t^{−1/2}C_0 ‖f‖_Lip sup_{x∈Σ}|L̂_t^αf(x) − L_t^αf(x)| (1/inf_{x∈Σ}θ̂_{t,α}(x)) ‖1/θ̂_t^α‖_{L^∞},
where C_0 = sup_{ρ>0} ρe^{−ρ²}. For E_1 and E_3 we need L^∞ estimates on Γ_t^α and Lipschitz estimates on L_t^α. An L^∞ bound for Γ_t^α for sufficiently small t is
(5.34) ‖Γ_t^α(f, f)‖_{L^∞} ≤ 2‖f‖²_Lip,
which again follows from the uniform convergence of Γ_t^α(f, f) to |∇f|² as t → 0. In order to derive a Lipschitz estimate for L_t^α we use the notation of Lemma 4.9 and let
(5.35) υ_t(x, ξ) = e^{−‖x−ξ‖²/2t}(f(ξ) − f(x)),
so that
(5.36) L_t^αf(x) − L_t^αf(x′) = ((θ_{t,α}(x′) − θ_{t,α}(x))/(tθ_{t,α}(x)θ_{t,α}(x′))) ∫_Σ (υ_t(x, ξ)/θ_t^α(ξ)) dμ(ξ) + (1/(tθ_{t,α}(x′))) ∫_Σ ((υ_t(x, ξ) − υ_t(x′, ξ))/θ_t^α(ξ)) dμ(ξ)
(5.37) = ((θ_{t,α}(x′) − θ_{t,α}(x))/θ_{t,α}(x′)) L_t^αf(x)
(5.38) + (1/(tθ_{t,α}(x′))) ∫_Σ ((υ_t(x, ξ) − υ_t(x′, ξ))/θ_t^α(ξ)) dμ(ξ),
and for small t we have
(5.39) ‖L_t^αf‖_{L^∞} ≲ ‖Δf − 2(1−α)⟨Df, Dρ⟩‖_{L^∞},
(5.40) ‖(θ_{t,α}(x) − θ_{t,α}(x′))/θ_{t,α}(x′)‖_{L^∞} ≲ t^{−(d+1)/2} vol(Σ) ‖x − x′‖,
(5.41) (1/(tθ_{t,α}(x′))) ∫_Σ (|υ_t(x, ξ) − υ_t(x′, ξ)|/θ_t^α(ξ)) dμ(ξ)
(5.42) ≲ vol(Σ) t^{−(d+2)/2} ( (2/e)‖f‖_Lip + ‖f‖_{C^1} ) ‖x − x′‖,
where the inequality in (5.41)-(5.42) follows from Lemma 4.9. We conclude that
(5.43) ‖L_t^αf‖_Lip ≲ t^{−(d+2)/2} C(f, ρ, Σ),
where C(f, ρ, Σ) depends on the C² norm of f, the Lipschitz norm of ρ and the volume of Σ.

As in the uniform case, we let E*_i = sup_{x∈Σ}|E_i(x)| and consider the estimate
(5.44) Pr{ sup_{ξ∈Σ} |Γ_2(L_t^α, f, f)(ξ) − Γ̂_2(L_t^α, f, f)(ξ)| ≥ ε }
(5.45) ≤ Pr{E*_1 ≥ ε/4} + Pr{E*_2 ≥ ε/4} + Pr{E*_3 ≥ ε/4} + Pr{E*_4 ≥ ε/4}
(5.46) = P_1 + P_2 + P_3 + P_4.
From Proposition 5.5 and (5.34) it follows immediately that
(5.47) P_1 = O_BC(3d+3)
as t → 0. From (5.32) and Proposition 5.4 we obtain
(5.48) P_2 = O_BC(3d+3)
as t → 0. From (5.43) and Proposition 5.4 we have
(5.49) P_3 = O_BC(4d+3)
as t → 0. Finally, in order to bound P_4 we consider the bound for E_4 in (5.33) and observe that, from the convergence of θ̂_{t,α} and θ̂_t to θ_{t,α} and θ_t respectively, we have in probability a bound of the form
(5.50) (1/inf_{x∈Σ}θ̂_{t,α}(x)) ‖1/θ̂_t^α‖_{L^∞} ≈ t^{−d/2},
so that from Proposition 5.5 we see that
(5.51) P_4 = O_BC(4d+4)
as t → 0. Since P_4 is the slowest-decaying term in (5.46), we obtain
(5.52) Pr{ sup_{ξ∈Σ} |Γ_2(L_t^α, f, f)(ξ) − Γ̂_2(L_t^α, f, f)(ξ)| ≥ ε } ∈ O_BC(4d+4).
With the choice of scale t_n = n^{−γ} with 0 < γ < 1/(4d+4), we see from the Borel-Cantelli lemma that
(5.53)-(5.54) sup_{ξ∈Σ} |Γ_2(L_{t_n}^α, f, f)(ξ) − Γ̂_2(L_{t_n}^α, f, f)(ξ)| → 0 almost surely.
References

[AGS] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré, Bakry-Émery curvature-dimension condition and Riemannian Ricci curvature bound.
[BBI01] Dmitri Burago, Yuri Burago, and Sergei Ivanov, A course in metric geometry, Graduate Studies in Mathematics, vol. 33, American Mathematical Society, Providence, RI, 2001. MR 1835418
[BE85] D. Bakry and Michel Émery, Diffusions hypercontractives, Séminaire de probabilités, XIX, 1983/84, Lecture Notes in Math., vol. 1123, Springer, Berlin, 1985, pp. 177-206. MR 889476
[BN08] Mikhail Belkin and Partha Niyogi, Towards a theoretical foundation for Laplacian-based manifold methods, J. Comput. System Sci. 74 (2008), no. 8, 1289-1308. MR 2460286
[BS09] Anca-Iuliana Bonciocat and Karl-Theodor Sturm, Mass transportation and rough curvature bounds for discrete spaces, J. Funct. Anal. 256 (2009), no. 9, 2944-2966. MR 2502429
[CL06] Ronald R. Coifman and Stéphane Lafon, Diffusion maps, Appl. Comput. Harmon. Anal. 21 (2006), no. 1, 5-30. MR 2238665
[CY96] F. R. K. Chung and S.-T. Yau, Logarithmic Harnack inequalities, Math. Res. Lett. 3 (1996), no. 6, 793-812. MR 1426537
[EM12] Matthias Erbar and Jan Maas, Ricci curvature of finite Markov chains via convexity of the entropy, Arch. Ration. Mech. Anal. 206 (2012), no. 3, 997-1038. MR 2989449
[FMN13] Charles Fefferman, Sanjoy Mitter, and Hariharan Narayanan, Testing the manifold hypothesis, arXiv:1310.0425, 2013.
[GM13] Nicola Gigli and Jan Maas, Gromov-Hausdorff convergence of discrete transportation metrics, SIAM J. Math. Anal. 45 (2013), no. 2, 879-899. MR 3045651
[JKO98] Richard Jordan, David Kinderlehrer, and Felix Otto, The variational formulation of the Fokker-Planck equation, SIAM J. Math. Anal. 29 (1998), no. 1, 1-17. MR 1617171
[KPP14] Andreas Karcher, Richard Palais, and Bob Palais, Point clouds: distributing points uniformly on a surface, preprint, 2014.
[Kun69] Hiroshi Kunita, Absolute continuity of Markov processes and generators, Nagoya Math. J. 36 (1969), 1-26. MR 0250387
[LLY11] Yong Lin, Linyuan Lu, and Shing-Tung Yau, Ricci curvature of graphs, Tohoku Math. J. (2) 63 (2011), no. 4, 605-627. MR 2872958
[LV09] John Lott and Cédric Villani, Ricci curvature for metric-measure spaces via optimal transport, Ann. of Math. (2) 169 (2009), no. 3, 903-991. MR 2480619
[LY10] Yong Lin and Shing-Tung Yau, Ricci curvature and eigenvalue estimate on locally finite graphs, Math. Res. Lett. 17 (2010), no. 2, 343-356. MR 2644381
[McC97] Robert J. McCann, A convexity principle for interacting gases, Adv. Math. 128 (1997), no. 1, 153-179. MR 1451422
[Mie13] Alexander Mielke, Geodesic convexity of the relative entropy in reversible Markov chains, Calc. Var. Partial Differential Equations 48 (2013), no. 1-2, 1-31. MR 3090532
[Nab13] Aaron Naber, Characterization of bounded Ricci curvature on smooth and nonsmooth settings, arXiv:1306.6512, 2013.
[Oll09] Yann Ollivier, Ricci curvature of Markov chains on metric spaces, J. Funct. Anal. 256 (2009), no. 3, 810-864. MR 2484937
[Oll10] Yann Ollivier, A survey of Ricci curvature for metric spaces and Markov chains, Probabilistic approach to geometry, Adv. Stud. Pure Math., vol. 57, Math. Soc. Japan, Tokyo, 2010, pp. 343-381. MR 2648269
[Rot74] Jean-Pierre Roth, Opérateurs carré du champ et formule de Lévy-Kinchine sur les espaces localement compacts, C. R. Acad. Sci. Paris Sér. A 278 (1974), 1103-1106. MR 0350047
[Sim96] Leon Simon, Theorems on regularity and singularity of energy minimizing maps, Lectures in Mathematics ETH Zürich, Birkhäuser Verlag, Basel, 1996. Based on lecture notes by Norbert Hungerbühler. MR 1399562
[Stu06a] Karl-Theodor Sturm, On the geometry of metric measure spaces. I, Acta Math. 196 (2006), no. 1, 65-131. MR 2237206
[Stu06b] Karl-Theodor Sturm, On the geometry of metric measure spaces. II, Acta Math. 196 (2006), no. 1, 133-177. MR 2237207
[SW12] A. Singer and H.-T. Wu, Vector diffusion maps and the connection Laplacian, Comm. Pure Appl. Math. 65 (2012), no. 8, 1067-1144. MR 2928092
[SW13] Amit Singer and Hau-Tieng Wu, Spectral convergence of the connection Laplacian from random samples, arXiv:1306.1587, 2013.
[Syn31] J. L. Synge, A characteristic function in Riemannian space and its application to the solution of geodesic triangles, Proc. London Math. Soc. S2-32 (1931), no. 1, 241. MR 1575991
[vdVW96] Aad W. van der Vaart and Jon A. Wellner, Weak convergence and empirical processes, Springer Series in Statistics, Springer-Verlag, New York, 1996. With applications to statistics. MR 1385671
[Vil09] Cédric Villani, Optimal transport, Grundlehren der Mathematischen Wissenschaften, vol. 338, Springer-Verlag, Berlin, 2009. Old and new. MR 2459454
[vRS05] Max-K. von Renesse and Karl-Theodor Sturm, Transport inequalities, gradient estimates, entropy, and Ricci curvature, Comm. Pure Appl. Math. 58 (2005), no. 7, 923-940. MR 2142879

Mathematics Department, Princeton University, Fine Hall, Washington Road, Princeton, New Jersey 08544-1000, USA
E-mail address: [email protected]

Department of Mathematics, University of Oregon, Eugene, OR 97403
E-mail address: [email protected]