Local Embeddings of Metric Spaces

Ittai Abraham∗
Yair Bartal†
Ofer Neiman‡
May 30, 2007
Abstract

In many application areas, complex data sets are often represented by some metric space, and metric embedding is used to provide a more structured representation of the data. In many of these applications much greater emphasis is put on preserving the local structure of the original space than on maintaining its complete structure. This is also the case in some networking applications, where "small world" phenomena in communication patterns have been observed. Practical studies of embedding have indeed been concerned with finding embeddings with this property. In this paper we initiate the study of local embeddings of metric spaces and provide embeddings with distortion depending solely on the local structure of the space.
1 Introduction
The field of metric space embedding studies embeddings that "faithfully" preserve distances of the source space in the host space. There are many ways to formally measure the "faithfulness" of an embedding. In this paper we suggest a new and quite natural paradigm of local distortion embeddings: embeddings that preserve the local structure of the space, so that distances of close neighbors are preserved better than those of distant neighbors.

Metric embedding has emerged as a powerful tool in several application areas. Typically, an embedding takes a "complex" metric space and maps it into a "simpler" one. For example, embeddings of metric spaces into trees and ultrametrics have found a large number of algorithmic applications (e.g. [20]). In many fields that use high dimensional data (e.g. computer vision, computational biology, machine learning, networking, statistics, and mathematical psychology), embeddings are used to map complex data sets into simpler and more compact representations [13]. In distributed network settings, embedding has been used to map Internet latencies into a simpler metric structure. Often, the embedding can then be distributed as a labeling scheme in a distributed system [26, 16].

In many important applications of embedding, preserving the distances of nearby points is much more important than preserving all distances. Indeed, it is sometimes the case in distance estimation that determining the distance of nearby objects can be done easily, while far away objects may just be labeled as "far" and only a rough estimate of the distance between them will be given. Thus large distances may already incorporate an inherently larger error factor. In such scenarios it is natural to seek local embeddings that maintain only distances of close by neighbors. Indeed both [13] and [27] study low dimensional embeddings that maintain distances only to the k nearest neighbors.

The revolution of large scale Social Networking in the Internet has increased the interest in new areas of research that emerged from issues on the border of Sociology and Network theory.

∗ School of Engineering and Computer Science, Hebrew University, Israel. Email: [email protected].
† School of Engineering and Computer Science, Hebrew University, Israel and Center of the Mathematics of Information, Caltech, CA, USA. Email: [email protected]. Supported in part by a grant from the Israeli Science Foundation (195/02).
‡ School of Engineering and Computer Science, Hebrew University, Israel. Email: [email protected]. Supported in part by a grant from the Israeli Science Foundation (195/02).

One aspect studied by
Kleinberg [21] is the algorithmic side of the "small world" phenomenon: how messages are greedily routed in networks that arise from a social and geographical structure. In this model the network is assumed to have a local property: the probability of choosing a close neighbor as an associate is larger than that of choosing a far away neighbor. Specifically, the probability of choosing a neighbor is inversely proportional to its distance from the source. Liben-Nowell et al. [23] consider a related model where the probability of choosing the k-th nearest neighbor is proportional to $\frac{1}{k^{\alpha}}$ for some parameter α > 1; they validate this model experimentally. A person would have more interaction with his close associates than with far away ones. In the context of using metric space embedding in "small world" networks it is natural to require that the embedding of a close neighbor would be better than that of a far away neighbor.

Kleinberg, Slivkins, and Wexler [22] study network embedding as a means to provide distance estimation of Internet latencies without the need to measure all distances. They note a discrepancy between theory and practice: while known theoretical embedding results guarantee very weak bounds, practical network coordinates perform quite well. In order to overcome this gap, the authors suggest studying embeddings with slack, where the distortion bounds are provided only for distant neighbors but not for close by ones. Strong results have been obtained in this model and its generalizations [2, 1]. However, for certain applications one might claim that preserving only distances to far-away neighbors defeats the purpose. For example, an Internet application that is induced by a social structure might interact mostly amongst local neighbors. Our study of local embeddings can be viewed as addressing the same question of [22] when indeed preserving local distances is more important than preserving far away distances.

In the context of data compression, our results can be viewed as a new type of dimension reduction technique. Typically, dimension reduction causes a uniform error over all points. A high dimensional data set X in L2 can be faithfully mapped into O(log |X|) dimensions. Our techniques allow mapping metric spaces into constant dimensional Euclidean space in a way that faithfully preserves distances between all nearby points, i.e. the local structure of the space.

In large scale systems it is often the case that one wants to maintain a compact data structure known as a Distance Oracle [26]. More demanding tasks are name independent compact routing schemes, where the name of the node is independent of its location [6], and mobile user schemes, which are competitive distributed protocols for routing when the target may be mobile. In all these settings it is clearly desirable to obtain improved results for close by neighbors.
1.1 Local embeddings
We now formally define new notions of local distortion. Given a metric space (X, d), let B(u, r) = {v | d(u, v) ≤ r}. For any point x let $<_x$ be an order relation on the points of X \ {x} such that for any u, v ∈ X \ {x}, if d(u, x) < d(v, x) then $u <_x v$. For any k ∈ ℕ let $N_k(x)$ be the set of the first k elements of X \ {x} according to $<_x$; i.e., $N_k(x)$ is the set of the k nearest neighbors of x. Let $r_k(x)$ be the minimal radius such that $N_k(x) \subseteq B(x, r_k(x))$.

Definition 1. Let $(X, d_X)$ be a metric space on n points, $(Y, d_Y)$ a target metric space and k ∈ ℕ, and let f : X → Y be an embedding.

• f is non-expansive if for any u, v ∈ X, $d_Y(f(u), f(v)) \le d_X(u, v)$.

• f is an embedding with k-local distortion α if f is non-expansive and for any u, v ∈ X such that $v \in N_k(u)$,
$$d_Y(f(u), f(v)) \ge \frac{d_X(u, v)}{\alpha}.$$

• f is an embedding with strong k-local distortion α if f is non-expansive and for any u, v ∈ X,
$$d_Y(f(u), f(v)) \ge \frac{\min\{d_X(u, v),\ r_k(u)\}}{\alpha}.$$
• f is an embedding with (strong) scaling local distortion α, for a non-decreasing function α : ℕ → ℝ⁺, if f has (strong) k-local distortion α(k) for all k ∈ ℕ simultaneously.
• Given a distribution D on maps f : X → Y, we say that D has probabilistic (strong) {k, scaling}-local distortion if the appropriate lower bound holds and the appropriate upper bound holds in expectation over D.

We also study a related notion of proximity distortion.

Definition 2. Let $(X, d_X)$ be a metric space on n points with $\min_{x,y\in X}\{d(x, y)\} \ge 1$, let $(Y, d_Y)$ be a target metric space, let t ≥ 1, and let f : X → Y be an embedding.

• f is an embedding with t-proximity distortion α if for any u, v ∈ X such that d(u, v) ≤ t,
$$d_X(u, v) \ge d_Y(f(u), f(v)) \ge \frac{d_X(u, v)}{\alpha}.$$

• f is an embedding with scaling proximity distortion α, for a non-decreasing function α : ℝ⁺ → ℝ⁺, if it has t-proximity distortion α(t) for all t simultaneously.
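To make Definition 1 concrete, the following is a minimal sketch (the function name and the brute-force strategy are ours, purely for illustration; it assumes f separates every point from its k nearest neighbors) that computes the best k-local distortion of a given non-expansive map:

```python
def k_local_distortion(points, d_X, d_Y, f, k):
    """Smallest alpha such that a non-expansive map f has k-local distortion
    alpha (Definition 1): the worst ratio d_X(u, v) / d_Y(f(u), f(v)) over
    all u and all v among the k nearest neighbors N_k(u)."""
    alpha = 1.0
    for u in points:
        # N_k(u): the k nearest neighbors of u, ties broken arbitrarily
        nearest = sorted((v for v in points if v != u), key=lambda v: d_X(u, v))[:k]
        for v in nearest:
            alpha = max(alpha, d_X(u, v) / d_Y(f(u), f(v)))
    return alpha
```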
1.2 Overview of results
We begin by providing some basic results in this model. All of our scaling results are strong. Theorem 1 shows that any metric space can be embedded into a single tree (in fact, an ultrametric) with scaling local distortion k. Using a variant of the embedding of [2], Theorem 2 shows that strong k-local embeddings with distortion O(log k) are possible for any fixed k using O(log n) dimensions. In Theorem 3 we give an embedding with scaling local distortion Õ(log k) using a variation of Bourgain's embedding method. Another aspect of k-local embeddings is that the dimension can be bounded in terms of k (for non-strong local embeddings). In this introductory section we demonstrate this phenomenon for the special case of k = 1: Theorem 4 shows that 1-local embeddings into $\ell_p$ with distortion $\sqrt[p]{3}$ require only 3 dimensions.

After the basic results we study embeddings into ultrametrics and into distributions of ultrametrics. Theorem 5 shows strong k-local distortion O(log k) into a distribution of ultrametrics. Its scaling counterpart, Theorem 6, obtains scaling local distortion Õ(log k) and worst case distortion O(log n). The proof of Theorem 6 is unique: all known embeddings into ultrametrics are non-contracting, whereas we provide an embedding into a distribution of non-expanding ultrametrics; this requires subtle modifications of known ultrametric construction algorithms.

While Theorem 2 provides an embedding into $L_p$ with k-local distortion, its dimension is a function of the size of the data set. In Theorem 8 we significantly improve this result and provide a novel form of dimension reduction resulting in embeddings that require only O(log k) dimensions. Our result requires that the metric space obeys a very weak form of growth bound (formally defined later). Using a subtle argument that requires an iterative application of the Lovász local lemma, we prove the existence of embeddings with k-local distortion O(log k) in dimension O(log k). Our result shows that the k-local structure of the space can be embedded in its natural dimension, which is independent of the size of the original space.

Using embeddings based on partitions, Theorem 9 provides better scaling local distortion with improved dimension, as well as embeddings with improved distortion, for decomposable (including doubling) metrics. We also provide stronger guarantees for Metric Ramsey Partitions which depend on the local neighborhood of a node, which we later use for applications to proximity problems.

Another natural property one may desire is to have embeddings whose distortion depends on the distance between points and not on the cardinality of the closer neighbors. For example, in a social network it may be desirable to obtain good distortion to all neighbors at distance t away, as a function of t. In the context of "small world" networks, [21] studied a distribution that depends on the distance with exactly this type of local behavior. In Section 9 we study embeddings with proximity distortion, in which the distortion bound of a pair x, y is a function of d(x, y). Theorem 11 is our main result using this notion: we show embeddings into $L_p$ with scaling proximity distortion Õ(log t) for decomposable metrics (these include in particular doubling and planar metrics).
In Section 10 we discuss some applications of our local embeddings. We show that in systems using a "small world" distribution, our local embeddings provide constant average distortion. We also discuss the application of our probabilistic embedding into ultrametrics to online problems with local structure of the request sequence. Finally, we discuss how our techniques can be used to provide better distance oracles and proximity ranking data structures. For example, we provide distance oracles with linear storage and scaling local stretch Õ(log k) (that is, the stretch for the k-th nearest neighbor) under weak growth bound assumptions.
2 Preliminaries
Some of our results apply to a restricted family of metric spaces with bounded growth rate. We use the following:

Definition 3 (Growth bound). Let (X, d) be a metric space and χ ≥ 1 a fixed real constant.

• X has a χ growth bound if $|B(u, 2r)| \le 2^{\chi}|B(u, r)|$ for all u, r > 0.

• X has a χ weak growth bound if $|B(u, \log|B(u, r)| \cdot r)| \le |B(u, r)|^{\chi}$ for all u, r > 0 such that |B(u, r)| > 1.

• X has a χ very weak growth bound if $|B(u, 2r)| \le |B(u, r)|^{\chi}$ for all u, r > 0 such that |B(u, r)| > 1.

Note that the very weak growth bound is an extremely weak property that even constant-degree expanders satisfy.

In many of our scaling results we shall use the following family Ξ of functions: a function ϑ : ℝ⁺ → ℝ⁺ is in Ξ if it is a monotone non-decreasing function satisfying
$$\int_1^{\infty} \frac{dx}{\vartheta(x)} = 1. \qquad (1)$$
For example, if we define $\log^{(0)} n = n$, and for any i > 0 define recursively $\log^{(i)} n = \log(\log^{(i-1)} n)$, then for any constants θ > 0, t ∈ ℕ we can take the function
$$\vartheta(n) = \hat{c}\prod_{j=0}^{t-1}\log^{(j)}(n) \cdot \left(\log^{(t)}(n)\right)^{1+\theta},$$
for a sufficiently small constant ĉ > 0, and it will satisfy the conditions.
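As a quick sanity check (a sketch; we truncate the lower end of the integral, since near n = 1 the iterated logarithms vanish and the normalizing constant ĉ simply absorbs the finite value), the case t = 1 gives $\vartheta(n) = \hat{c}\, n(\log n)^{1+\theta}$, and the substitution $u = \log x$ shows the required integral is finite:
$$\int_2^{\infty} \frac{dx}{x(\log x)^{1+\theta}} = \int_{\log 2}^{\infty} \frac{du}{u^{1+\theta}} = \frac{1}{\theta(\log 2)^{\theta}} < \infty,$$
so choosing ĉ small enough makes the integral in (1) equal to 1.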
3 Local Probabilistic Partitions
Several of our results use probabilistic partitions [7]. In this section we review some definitions and results concerning these tools, extending the notions of [2].

Definition 4. The local growth rate of x ∈ X at radius r > 0 for a given scale γ > 0 is defined as
$$\rho(x, r, \gamma) = |B(x, r\gamma)| / |B(x, r/\gamma)|.$$
Given a subspace Z ⊆ X, the minimum local growth rate of Z at radius r > 0 and scale γ > 0 is defined as
$$\rho(Z, r, \gamma) = \min_{x \in Z}\rho(x, r, \gamma).$$
The minimum local growth rate at radius r > 0 and scale γ > 0 is defined as $\bar\rho(x, r, \gamma) = \rho(B(x, r), r, \gamma)$.

Claim 1. Let x, y ∈ X, let γ > 0 and let r be such that $2(1 + 1/\gamma)r < d(x, y) \le (\gamma - 2 - 1/\gamma)r$; then $\max\{\bar\rho(x, r, \gamma),\ \bar\rho(y, r, \gamma)\} \ge 2$.
Definition 5 (Ball Partition). Let (X, d) be a finite metric space. A ball partition P of X is a collection of disjoint clusters $C(P) = \{C_1, C_2, \ldots, C_t\}$ such that $X = \bigcup_j C_j$ and there exist $c_1, c_2, \ldots, c_t \in X$ and $r_1, r_2, \ldots, r_t \in \mathbb{R}^+$ such that for all i ∈ [t]: $C_i = B(c_i, r_i) \setminus \bigcup_{j=1}^{i-1} C_j$. The sets $C_i$ are called clusters. For x ∈ X we denote by P(x) the cluster containing x. Define υ : P → X by $\upsilon(C_i) = c_i$. Given ∆ > 0, a ball partition is ∆-bounded if for all 1 ≤ j ≤ t, diam($C_j$) ≤ ∆. A function f defined on X is called uniform with respect to P if for any x, y ∈ X such that P(x) = P(y) we have f(x) = f(y).

Definition 6 (Probabilistic Partition). A probabilistic partition P̂ consists of a probability distribution over a set of ball partitions 𝒫. P̂ is called ∆-bounded if every P ∈ 𝒫 is ∆-bounded. Given a collection of functions $\eta = \{\eta_P : X \to [0, 1] \mid P \in \mathcal{P}\}$, a ∆-bounded probabilistic partition P̂ is called η-padded if for every x ∈ X:
$$\Pr[B(x, \eta_P(x)\Delta) \subseteq P(x)] \ge 1/2.$$

Definition 7 (Uniformly Padded Local PP). Given ∆ > 0, let P̂ be a ∆-bounded probabilistic partition of (X, d). Given a collection of functions $\eta = \{\eta_P : X \to [0, 1] \mid P \in \mathcal{P}\}$ such that each $\eta_P$ is a uniform function with respect to P, we say that P̂ is an η-uniformly padded local probabilistic partition if the event $B(x, \eta_P(x)\Delta) \subseteq P(x)$ occurs with probability 1/2 and is independent of all random choices outside of B(x, 2∆). Formally, for all C ⊆ X \ B(x, 2∆) and all functions r : C → ℝ⁺,
$$\Pr\Big[B(x, \eta_P(x)\Delta) \subseteq P(x) \,\Big|\, \bigwedge_{i \mid c_i \in C} r_i = r(c_i)\Big] \ge 1/2.$$
Definition 8. Let (X, d) be a finite metric space and let τ ∈ (0, 1]. We say that X admits a τ-decomposition if for every 0 < ∆ ≤ diam(X) there exists a ∆-bounded probabilistic partition P̂ of X such that P̂ is τ-padded. It is known that λ-doubling metrics admit an $\Omega(\log^{-1}\lambda)$-decomposition and $K_r$-minor excluded graphs admit an $\Omega(r^{-2})$-decomposition.

The following lemma with local properties is proven in [3].

Lemma 2 (Uniform Padding Lemma). Let (X, d) be a finite metric space. Let 0 < ∆ ≤ diam(X). Let Γ ≥ 64. There exists a ∆-bounded probabilistic partition P̂ of (X, d) and collections of uniform functions $\{\xi_P : X \to \{0, 1\} \mid P \in \mathcal{P}\}$ and $\eta = \{\eta_P : X \to (0, 1] \mid P \in \mathcal{P}\}$ such that the probabilistic partition P̂ is an η-uniformly padded local probabilistic partition, and the following conditions hold for any P ∈ 𝒫 and any x ∈ X:

1. If $\xi_P(x) = 1$ then: $2^{-7}/\ln\rho(x, 8\Delta, \Gamma) \le \eta_P(x) \le 2^{-7}$.

2. If $\xi_P(x) = 0$ then: $\eta_P(x) = 2^{-7}$ and $\bar\rho(x, 8\Delta, \Gamma) < 2$.

Furthermore, if X admits a τ-decomposition then the probabilistic partition P̂ is an η-uniformly padded probabilistic partition, and in addition to conditions 1 and 2 the following condition holds for any P ∈ 𝒫 and any x ∈ X:

3. $\eta_P(x) \ge \tau/2$.
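For intuition, here is a minimal sketch of sampling a ∆-bounded ball partition in the sense of Definition 5 (a standard CKR/FRT-style sampler with names of our own choosing, not the specific construction behind Lemma 2):

```python
import random

def random_ball_partition(points, d, Delta, rng=None):
    """Sample a Delta-bounded ball partition: visit centers in random order
    and carve C_i = B(c_i, r_i) \\ (C_1 + ... + C_{i-1}) with a random radius
    r_i < Delta/2, so that diam(C_i) <= Delta."""
    rng = rng or random.Random(0)
    centers = list(points)
    rng.shuffle(centers)
    assigned, partition = set(), []
    for c in centers:
        r = rng.uniform(Delta / 4, Delta / 2)   # random radius, diam <= 2r <= Delta
        cluster = {v for v in points if v not in assigned and d(c, v) <= r}
        if cluster:
            partition.append((c, r, cluster))
            assigned |= cluster
    return partition
```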
4 Basic Results

4.1 Embedding into an ultrametric with scaling local distortion
The following theorem is a strengthening of the known embeddings of metrics into ultrametrics [7, 12, 19]:

Theorem 1. For any finite metric space (X, d) on n points there exists an embedding into an ultrametric T with strong scaling local distortion k.
Proof. Let G = (V, E, w) be some graph with weights w on the edges whose shortest-path metric is the metric of X, and let M = (V, E′, w) be its minimum spanning tree. We now define a recursive process for constructing the ultrametric T given M: let e ∈ E′ be the heaviest edge in M. Create the root of T with label w(e); the root has 2 children, which are the ultrametrics created recursively from the spanning trees M₁ and M₂, the 2 connected components of M \ {e}. Note that the removal of e divides M into 2 disjoint trees M₁, M₂, and each is indeed a minimum spanning tree on its vertices. From the choice of e, any e′ ∈ M \ {e} satisfies w(e′) ≤ w(e), hence T is indeed an ultrametric.

It remains to show that the k-local distortion is at most k, for all k > 0 simultaneously. Let u, v ∈ X be such that $v \in B(u, r_k(u))$, and assume they were first separated in the i-th step, meaning that if $M^{(i)}$ is the MST of a connected component containing u and v after i − 1 recursive applications of the process, then their distance in T is fixed to be $\ell = w(e_i)$ (where $e_i$ is the heaviest edge in $M^{(i)}$). Removing $e_i$ from $M^{(i)}$ divides it into $M_1^{(i)}$ and $M_2^{(i)}$. A known property of minimum spanning trees is that the heaviest edge in any cycle of G cannot be in M, hence it can be inferred that there is no edge shorter than ℓ connecting $M_1^{(i)}$ to $M_2^{(i)}$, and therefore $d_T(u, v) = \ell \le d_X(u, v)$.

Assume towards a contradiction that $d_X(u, v) > k \cdot d_T(u, v) = k \cdot \ell$; for each i = 0, ..., k let $B_i = B_X(u, i \cdot \ell)$ and consider the shells $S_i = B_{i+1} \setminus B_i$. Since there are at most k vertices in $B(u, d_X(u, v))$ and $d_X(u, v) > k \cdot \ell$, there are at most k − 1 points other than u in $B(u, k \cdot \ell)$. Hence there exists i ∈ [k] such that $S_i$ is empty. Therefore, to connect u and v by a path in $M^{(i)}$ we must use an edge longer than ℓ in $M^{(i)}$, a contradiction to the choice of $e_i$.

To get strong scaling local distortion, one simply replaces $d_X(u, v)$ with $r_{k'}(u)$, for some k′ ≤ k, in the lower bound analysis.
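The recursion in this proof is easy to state as code. Below is a minimal sketch (the function name, the (weight, u, v) edge representation, and the nested-tuple output are our own conventions; it assumes the MST edges are given, with distinct weights for simplicity, and that `vertices` is a Python set). The ultrametric distance between two leaves is the label at their least common ancestor.

```python
def build_ultrametric(vertices, mst_edges):
    """Recursive construction from the proof of Theorem 1: remove the heaviest
    MST edge, label the root by its weight, and recurse on the two components.
    `mst_edges` is a list of (w, u, v) tuples forming an MST on `vertices`.
    Returns a leaf vertex or a nested tuple (label, left, right)."""
    if len(vertices) == 1:
        return next(iter(vertices))
    heaviest = max(mst_edges)                       # (w, u, v) with maximal w
    rest = [e for e in mst_edges if e is not heaviest]
    # Connected component of the heaviest edge's endpoint u after its removal.
    comp, frontier = set(), {heaviest[1]}
    while frontier:
        comp |= frontier
        frontier = {z for (_, a, b) in rest for z in (a, b)
                    if (a in comp) != (b in comp)} - comp
    left = build_ultrametric(comp, [e for e in rest if e[1] in comp])
    right = build_ultrametric(vertices - comp,
                              [e for e in rest if e[1] not in comp])
    return (heaviest[0], left, right)
```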
4.2 Embedding into ℓ_p with k-local distortion
Theorem 2. For any finite metric space X on n points there exists an embedding into $\ell_p$ with strong k-local distortion $O\left(\frac{\log k}{p}\right)$ and dimension $O(2^p \log n)$.

The embedding is a variant of the partition-based embedding of [10] (similar techniques appear in [2]). A simpler proof based on Bourgain's embedding [14] and Matoušek's improvements for $\ell_p$ [24] gives a slightly larger dimension of $O(2^p \log n \log k)$. The most basic result is obtained simply by taking the first Θ(log k) coordinates of Bourgain's embedding. We defer the proof to Appendix A. We give stronger results in Section 6, where the dimension is only O(log k) given a very weak growth bound.
4.3 Embedding into ℓ_p with scaling local distortion
Our embedding is based on Bourgain's embedding [14] and Matoušek's improvements for $\ell_p$ [24]. We use a function ϑ ∈ Ξ to scale each coordinate as described below.

Theorem 3. For any finite metric space (X, d) on n points and ϑ ∈ Ξ there exists an embedding into $\ell_p$ with strong scaling local distortion
$$O\left(\left(\frac{\log k}{p}\right)^{1-\frac{1}{p}}\vartheta(\log k)^{\frac{1}{p}}\right),$$
worst case distortion O((1/p) log n) and dimension $O(2^p \log^2 n)$.

Proof. Let $s = 2^p$, $t = \log_s n$, T = {i | 1 ≤ i ≤ t}, q = O(s log n) and Q = {j | 1 ≤ j ≤ q}. Choose random subsets $A_{ij}$ for i ∈ T, j ∈ Q, such that each point is included in $A_{ij}$ independently with probability $s^{-i}$. We
now define the embedding $\phi : X \to \ell_p^{t \cdot q}$ by defining for each i ∈ T, j ∈ Q a function $\phi_{i,j} : X \to \mathbb{R}^+$ by $\phi_{i,j}(u) = d(u, A_{ij})/\vartheta(i)^{1/p}$, and
$$\phi(u) = \bigoplus_{i=1}^{t}\bigoplus_{j=1}^{q}\phi_{i,j}(u).$$
For any k, let u, v ∈ X be such that $v \in B(u, r_k(u))$. Let $r_{s^i} = \max\{r_{s^i}(u), r_{s^i}(v)\}$ and set $\delta_i = r_{s^i} - r_{s^{i-1}}$. Let $t_0$ be the smallest such that $r_{s^{t_0}} + r_{s^{t_0-1}} \ge d(u, v)/4$. If $r_{s^{t_0}} \ge d(u, v)/2$, set $r_{s^{t_0}} = d(u, v)/2$. Notice that $t_0 \le \lceil\log_s k\rceil$ and that $r_{s^{t_0}} \ge d(u, v)/8$, hence $\sum_{i=1}^{t_0}\delta_i = r_{s^{t_0}} \ge d(u, v)/8$. As $v \in B(u, r_k(u))$ we have for all $1 \le i \le t_0$ that $s^i \le k$. By standard arguments it can be shown that with constant probability, for any such pair u, v and scale i ∈ T there exists a subset J = J(u, v, i) ⊆ Q such that $|J| \ge \frac{\log n}{16}$ and for any j ∈ J: $|d(u, A_{ij}) - d(v, A_{ij})| \ge \delta_i$.
$$\|\phi(u) - \phi(v)\|_p^p = \sum_{j=1}^{q}\sum_{i=1}^{t}\left|\frac{d(u, A_{ij}) - d(v, A_{ij})}{\vartheta(i)^{1/p}}\right|^p \le q \cdot d(u, v)^p\sum_{i=1}^{t}\frac{1}{\vartheta(i)} = O(q) \cdot d(u, v)^p$$
$$\begin{aligned}
\|\phi(u) - \phi(v)\|_p^p = \sum_{j=1}^{q}\sum_{i=1}^{t}\left|\frac{d(u, A_{ij}) - d(v, A_{ij})}{\vartheta(i)^{1/p}}\right|^p
&\ge \frac{\log n}{16\,\vartheta(\log_s k)}\sum_{i=1}^{t_0}\delta_i^p \\
&\ge \frac{\log n}{16\,(t_0)^{p-1}\,\vartheta(\log_s k)}\left(\sum_{i=1}^{t_0}\delta_i\right)^p \\
&\ge \frac{\log n}{16\,(t_0)^{p-1}\,\vartheta(\log_s k)}\,(d(u, v)/8)^p,
\end{aligned}$$
and after appropriate scaling we get the claimed local distortion.

To see the worst case distortion of $O(\log_s n)$, let $\bar\vartheta(i) = \min\{\vartheta(i), \log_s n\}$ and use $\bar\vartheta^{-1/p}$ as the scaling factor in the embedding. Since $\bar\vartheta \le \vartheta$ the lower bound holds; moreover, after the scaling $\|\phi(u) - \phi(v)\|_p \ge \frac{d(u, v)}{\log_s n}$ for all u, v as required. The upper bound holds as well: let $i_0$ be the largest integer such that $\vartheta(i_0) \le \log_s n$; then
$$\sum_{i=1}^{t}\frac{1}{\bar\vartheta(i)} \le \sum_{i=1}^{i_0}\frac{1}{\vartheta(i)} + \sum_{i=i_0+1}^{t}\frac{1}{\log_s n} \le O(1) + 1.$$
We showed that the embedding has scaling local distortion; the strong version follows similarly.

Note that for any ε > 0 there exists ĉ such that $\vartheta(k) = \hat{c} \cdot k(\log k)^{1+\epsilon}$ is in Ξ, hence

Corollary 3. For any finite metric space (X, d) on n points and any constant ε > 0 there exists an embedding into $\ell_p$ with scaling local distortion
$$O\left(\frac{\log k}{p}\,(\log\log k)^{\frac{1+\epsilon}{p}}\right).$$
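The randomized construction behind Theorem 3 is straightforward to prototype. The following is a minimal sketch (function and parameter names are ours; the concrete choice of ϑ, with an additive shift to avoid log 1 = 0, is illustrative rather than the paper's exact normalization):

```python
import math
import random

def scaling_local_embedding(points, d, p, theta=1.0, seed=0):
    """Sample a map in the spirit of Theorem 3: coordinates
    phi_{i,j}(u) = d(u, A_ij) / vartheta(i)**(1/p), where A_ij contains each
    point independently with probability s**(-i), s = 2**p, over
    t = log_s n scales and q = O(s log n) sets per scale."""
    rng = random.Random(seed)
    n, s = len(points), 2 ** p
    t = max(1, math.ceil(math.log(n, s)))
    q = max(1, math.ceil(s * math.log(n)))

    def vartheta(i):
        # an illustrative member of Xi (up to its normalizing constant)
        return (i + 2) * math.log(i + 2) ** (1 + theta)

    # Sample all random subsets up front so the map is a fixed function.
    A = {(i, j): [v for v in points if rng.random() < s ** (-i)]
         for i in range(1, t + 1) for j in range(q)}

    def phi(u):
        # d(u, empty set) is taken as 0, so empty samples contribute nothing
        return [min((d(u, a) for a in A[i, j]), default=0.0) / vartheta(i) ** (1 / p)
                for i in range(1, t + 1) for j in range(q)]
    return phi
```

After the appropriate global rescaling from the proof, the resulting map becomes non-expansive while retaining the lower bound up to the stated distortion.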
4.4 Lower dimension for 1-local distortion
Theorem 4. For any finite metric space X there exists an embedding into $\ell_p^3$ with 1-local distortion $\sqrt[p]{3}$, and an embedding into the line which is an isometry on nearest neighbors.
Proof. Let G = (V, E) be an unweighted graph with vertices corresponding to the points of X, and a pair (u, v) ∈ E iff $v \in N_1(u)$. Since the outgoing degree of each node is one, each connected component in G has at most one cycle. Fix some component H, let $r_H$ be an arbitrary node of H, and if there is a cycle, let $w_H$ be the farthest point on the cycle from $r_H$ (breaking ties arbitrarily). Define 2 sets A₁, A₂ as follows: for any connected component H in G, insert into A₁ all the vertices at even distance from $r_H$, and into A₂ all the vertices at odd distance from $r_H$. Define the embedding into ℝ³ as f(u) = (d(u, A₁), d(u, A₂), g(u)), where g(u) is $d(u, N_1(u))$ if $u = w_H$ and 0 otherwise. It can be checked that f is non-expansive and that the distortion of nearest neighbors is at most $\sqrt[p]{3}$.

We note that if the requirement that the embedding is contractive is removed, then it is easy to see that 1 dimension suffices.
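For concreteness, here is a minimal sketch of this construction (helper names are ours; nearest neighbors are assumed unique for simplicity):

```python
from collections import deque

def one_local_embedding(points, d):
    """Theorem 4's construction: f(u) = (d(u, A1), d(u, A2), g(u)), where A1/A2
    split each component of the nearest-neighbor graph by parity of hop distance
    from an arbitrary root, and g is nonzero only at one cycle vertex w_H."""
    nn = {u: min((v for v in points if v != u), key=lambda v: d(u, v)) for u in points}
    adj = {u: set() for u in points}
    for u, v in nn.items():
        adj[u].add(v)
        adj[v].add(u)
    A1, A2, g, seen = set(), set(), {u: 0.0 for u in points}, set()
    for r in points:                        # BFS each component from root r_H = r
        if r in seen:
            continue
        hops, queue = {r: 0}, deque([r])
        while queue:
            u = queue.popleft()
            (A1 if hops[u] % 2 == 0 else A2).add(u)
            for w in adj[u]:
                if w not in hops:
                    hops[w] = hops[u] + 1
                    queue.append(w)
        seen.update(hops)
        cycle = {u for u in hops if on_cycle(u, nn)}  # the component's unique cycle
        if cycle:
            w_H = max(cycle, key=lambda u: hops[u])   # farthest cycle point from r_H
            g[w_H] = d(w_H, nn[w_H])
    def f(u):
        return (min(d(u, a) for a in A1) if A1 else 0.0,
                min(d(u, a) for a in A2) if A2 else 0.0,
                g[u])
    return f

def on_cycle(u, nn):
    """u lies on its component's cycle iff iterating v -> nn[v] returns to u."""
    v = nn[u]
    for _ in range(len(nn)):
        if v == u:
            return True
        v = nn[v]
    return False
```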
5 Probabilistic Local Embedding into Ultrametrics
Probabilistic embedding of metrics into ultrametrics [7] has many applications in online and approximation algorithms. The basic theorem states that every metric space probabilistically embeds into a distribution of ultrametrics with O(log n) distortion [8, 18, 9]. Here we extend this result to local embeddings.
5.1 Probabilistic embedding into trees with k-local distortion
Theorem 5. For any finite metric space (X, d) on n points there exists a probabilistic embedding into a distribution of ultrametrics with strong k-local distortion O(log k). We defer the proof to Appendix B.
5.2 Probabilistic embedding into trees with scaling local distortion
Theorem 6. For any finite metric space (X, d) on n points and ϑ ∈ Ξ there exists a probabilistic embedding into a distribution of ultrametrics with strong scaling local distortion O(ϑ(log k)) and worst case distortion O(log n).

We recall the following lemma, implicitly proved in [15, 18] (a similar lemma appears in [2]).

Lemma 4. Given a finite metric space (X, d) and 0 < ∆ ≤ diam(X), there exists a ∆-bounded probabilistic ball partition P̂ of X such that for any x ∈ X and any 0 < η ≤ 1/8:
$$\Pr[B(x, \eta\Delta) \subseteq P(x)] \ge 1 - \eta\log\frac{|B(x, \Delta)|}{|B(x, \Delta/8)|}.$$

Let $\Delta_0 = \mathrm{diam}(X)$ and for any integer i > 0 let $\Delta_i = \Delta_0 2^{-i}$. For all i > 0 create a $\Delta_i$-bounded probabilistic ball partition $\hat{P}_i$ as in Lemma 4, and define for each $P_i \in \mathcal{P}_i$ and any cluster $C \in P_i$:
$$b(C) = \log(|B(\upsilon(C), 2\Delta_i)|).$$
Fix some collection of ball partitions $P = \{P_i \in \mathcal{P}_i \mid i > 0\}$, and let the label of a cluster $C \in P_i$ be $\alpha(C) = \frac{\Delta_i}{\vartheta(b(C))}$.
Claim 5. For all i > 0, if $C \in P_i$, $D \in P_{i-1}$ and $C \cap D \ne \emptyset$ then $\alpha(D) \le 2\alpha(C)$.

Proof. Since $d(\upsilon(C), \upsilon(D)) \le \Delta_{i-1}/2 + \Delta_i/2 \le \Delta_{i-1}$ and $2\Delta_i = \Delta_{i-1}$, we get $B(\upsilon(C), 2\Delta_i) \subseteq B(\upsilon(D), 2\Delta_{i-1})$, which suggests that b(C) ≤ b(D), hence
$$\alpha(C) = \frac{\Delta_i}{\vartheta(b(C))} = \frac{\Delta_{i-1}}{2\vartheta(b(C))} \ge \alpha(D)/2.$$

Note that a cluster $C \in P_i$ may have a label smaller than a cluster $D \in P_j$ for j > i with $C \cap D \ne \emptyset$, hence creating a laminar family from the partition in the usual manner will not maintain the weak monotonicity property of labels. To overcome this hurdle we recursively define a sequence of hierarchical partitions $Q^{(1)}, \ldots, Q^{(\log\Delta_0)}$ where for each i, $Q^{(i)}$ is a sequence of i partitions for scales $\Delta_1$ to $\Delta_i$. Initially $Q^{(1)} = \{Q^{(1)}_1 = P_1\}$. Given a hierarchical partition $Q^{(i-1)} = \{Q^{(i-1)}_1, \ldots, Q^{(i-1)}_{i-1}\}$ and $P_i$, we define a hierarchical partition $Q^{(i)} = \{Q^{(i)}_1, \ldots, Q^{(i)}_i\}$ in the following manner.

1. "Beam up" phase: For any $C \in P_i$ and j < i let $R_j(C) = \{D \in Q^{(i-1)}_j \mid D \cap C \ne \emptyset \wedge \alpha(D) < \alpha(C)\}$, and let $s_j(C) = C \cap \bigcup_{D \in R_j(C)} D$. Intuitively, we want to "beam up" each $s_j(C)$ to be a cluster in $Q_j$. Formally, for any j < i, let $Q^{(i)}_j = \{D \setminus \bigcup_{\{C \mid D \in R_j(C)\}} C \mid D \in Q^{(i-1)}_j\} \cup \{s_j(C) \mid C \in P_i\}$. The labels are naturally maintained: each cluster $D \setminus \bigcup_{\{C \mid D \in R_j(C)\}} C$ gets label α(D) and each cluster $s_j(C)$ gets label α(C).

2. "Laminarization" phase: Let $Q^{(i)}_i = \{C \cap D \mid C \in P_i, D \in Q^{(i)}_{i-1}\}$. Each cluster C ∩ D gets label α(C).

For each set of partitions P ∈ P denote $Q = Q^{(\log\Delta_0)} = \{Q_1, \ldots, Q_{\log\Delta_0}\}$. Note that the "laminarization" phase guarantees that Q is indeed hierarchical. Construct a labelled tree T from Q and its labels in the natural manner. Note that T indeed represents an ultrametric, since the "beam up" phase guarantees that if $C \in Q_i$, $D \in Q_{i-1}$ are such that C ⊆ D then α(D) ≥ α(C).

Claim 6. For any pair (x, y) such that $y \in B(x, r_k(x))$: $d_T(x, y) \ge \Omega\left(\frac{d(x, y)}{\vartheta(\log k)}\right)$.

Proof. Let i be the smallest integer such that $3\Delta_i \le d(x, y)$; it is immediate that x, y are separated in the partition $P_i$. Let $v = \upsilon(P_i(x))$; then
$$B(v, 2\Delta_i) \subseteq B(x, 3\Delta_i) \subseteq B(x, d(x, y)) \subseteq B(x, r_k(x)),$$
which implies that $b(P_i(x)) \le \log k$, hence $\alpha(P_i(x)) \ge \frac{\Delta_i}{\vartheta(\log k)} \ge \Omega\left(\frac{d(x, y)}{\vartheta(\log k)}\right)$. Now we claim that $\alpha(Q_i(x)) \ge \alpha(P_i(x))$, which holds since if some cluster replaced the part of $P_i(x)$ that contained x, it must have had a larger label than $\alpha(P_i(x))$, and its radius is smaller than the radius of $P_i(x)$, so it cannot contain y. This implies that
$$d_T(x, y) \ge \Omega\left(\frac{d(x, y)}{\vartheta(\log k)}\right).$$
Claim 7. For any pair (x, y): $E[d_T(x, y)] \le O(d(x, y))$.

Proof. Fix some partition P ∈ P. For every x, y ∈ X, let $\alpha_i(x, y) = \max\{\alpha(P_i(x)), \alpha(P_i(y))\}$. Define the events
$$C_i(x, y) = \{P_i(x) \ne P_i(y)\}, \qquad M_i(x, y) = C_i(x, y) \wedge \bigwedge_{j > i}\big(\alpha_i(x, y) \ge \alpha_j(x, y)\big).$$
Notice that if for some scale i event $C_i(x, y)$ holds but $M_i(x, y)$ does not, then consider the scale j > i maximizing $\alpha_j(x, y)$ (w.l.o.g. $\alpha_j(x, y) = \alpha(P_j(x))$). Then some part of $P_i(x)$ that contains x will be replaced
by some part of $P_j(x)$ containing x, and the event that $Q_i(x) \ne Q_i(y)$ will depend only on the event $P_j(x) \ne P_j(y)$. So when considering the sum over all scales we need not take into account scale i, since we already accounted for scale j. Let $b_i = b_i(x) = \log|B(x, \Delta_i)|$; $b_i$ is monotonically decreasing with i. Note that we can bound $C_i(x, y)$ by the probability that either B(x, d(x, y)) or B(y, d(x, y)) is cut (using Lemma 4), hence if $\alpha_i(x, y) = \alpha(P_i(x))$ we will take B(x, d(x, y)) and vice versa. This implies that we can assume w.l.o.g. that $\alpha_i(x, y) = \alpha(P_i(x))$. For all i > 0, since $d(x, \upsilon(P_i(x))) \le \Delta_i$ it follows that $b(P_i(x)) \ge b_i$. Let ℓ be the largest integer such that $\Delta_\ell > 8d(x, y)$.
$$\begin{aligned}
E_T[d_T(x, y)] &\le \sum_{i>0}\Pr[M_i(x, y)]\,E[\alpha_{i-1}(x, y) \mid M_i(x, y)] \\
&\le \sum_{i>0}\Pr[C_i(x, y)] \cdot 2E[\alpha_i(x, y) \mid M_i(x, y)] \\
&\le 4\sum_{i=1}^{\ell}\frac{d(x, y)\cdot(b_i - b_{i+3})}{\Delta_i}\,E\left[\frac{\Delta_i}{\vartheta(b(P_i(x)))}\,\middle|\,M_i(x, y)\right] + 2\sum_{i>\ell}\Delta_i \\
&\le 4d(x, y)\sum_{i=1}^{\ell}\;\sum_{j=b_{i+3}+1}^{b_i} 1/\vartheta(b_i) + 2\Delta_\ell \\
&\le 4d(x, y)\sum_{i=1}^{\ell}\;\sum_{j=b_{i+3}+1}^{b_i} 1/\vartheta(j) + 32d(x, y) \\
&\le 2^6\,d(x, y)\sum_{j>0} 1/\vartheta(j) = O(d(x, y)).
\end{aligned}$$
The second inequality follows from Claim 5, and the third inequality follows from Lemma 4. This concludes the proof of scaling local distortion O(ϑ(log k)).

Now we show that the worst case distortion can be bounded by O(log n). Let $\bar\vartheta(k) = \min\{\vartheta(k), \log n\}$, and use $\bar\vartheta$ instead of ϑ in the embedding. Notice that we only increased the labels, hence the lower bound remains true; furthermore, for any pair x, y we have $d_T(x, y) \ge \frac{d(x, y)}{\log n}$. It remains to show that the upper bound remains constant. Let t be the largest integer such that $\vartheta(b_t) > \log n$. Then it is enough to bound
$$\sum_{i=1}^{t}\frac{b_i - b_{i+3}}{\bar\vartheta(b_i)} \le \frac{3(b_1 - b_{t+3})}{\log n} \le \frac{6\log n}{\log n} = O(1),$$
and
$$\sum_{i=t+1}^{\ell}\frac{b_i - b_{i+3}}{\bar\vartheta(b_i)} \le \sum_{i=t+1}^{\ell}\;\sum_{j=b_{i+3}+1}^{b_i} 1/\vartheta(j) \le 3\sum_{j>0} 1/\vartheta(j) = O(1).$$

We showed that the embedding has scaling local distortion; the strong version follows similarly.
5.3 Lower bound for spanning trees
An important variant of embedding into trees occurs in a graph setting, when we seek an embedding into a spanning tree of the graph. Probabilistic embedding into spanning trees has been studied in [5, 17]. In [4], embeddings into a single spanning tree and into a distribution over spanning trees with constant average distortion are shown. However, local embedding into a single spanning tree can incur distortion n − 1 even for k = 1 (take the cycle graph: finding a spanning tree is done by removing some edge, which will incur the distortion for an adversarial choice of nearest neighbors). Probabilistic embedding into a distribution of spanning trees cannot overcome the Ω(log n) lower bound even for k = 1.
Theorem 7. There exists a metric space (X, d) derived from a graph G such that any embedding into a distribution of spanning trees of G will incur 1-local distortion of Ω(log n).
6 Embedding into ℓ_p with k-local Distortion and Low Dimension
To achieve low dimension for k-local embeddings into $\ell_p$ we use local probabilistic partitions, a method of embedding based on uniform probabilistic partitions [2], and the Lovász Local Lemma. We defer the proof to Appendix C.

Theorem 8. For any finite metric space (X, d) on n points with a χ very weak growth bound there exists an embedding into $\ell_p$ with k-local distortion O(log k)¹ and dimension O(log k).
7 Partition Based Embeddings into ℓ_p
For decomposable metrics we improve the scaling local distortion. Using a partition based embedding [2] we get the following:

Theorem 9. For any metric space X on n points admitting a τ-padded decomposition, for any p ≥ 1 and ϑ ∈ Ξ there exists an embedding into $\ell_p$ with strong scaling local distortion $O\left(\tau^{-1+1/p}\,\vartheta(\log k)^{1/p}\right)$, worst case distortion $O(\tau^{-1+1/p}(\log n)^{1/p})$ and dimension O(log n log Φ), where Φ denotes the aspect ratio of X.² We defer the proof to Appendix D.

In a similar manner we can show the following:

Theorem 10. For any finite metric space (X, d) on n points and any ϑ ∈ Ξ there exists an embedding into $\ell_p$ with strong scaling local distortion $O\left(\frac{\vartheta(\log k)}{p}\right)$, worst case distortion $O\left(\frac{\log n}{p}\right)$ and dimension $O(2^p \log n)$.
8 Local Ramsey Partitions
In this section we extend the results of [25] to give Ramsey partitions with improved local guarantees. These are later used to get improved local approximations for distance oracles and for approximate ranking. The following lemma follows directly from the uniform padding lemma of [2]; a different proof also appears in [25]:

Lemma 8. For any metric space (X, d) and ∆ > 0 there exists a ∆-bounded probabilistic partition P̂ of X such that for all η ∈ (0, 1/C] and x ∈ X:
$$\Pr[B(x, \eta\Delta) \subseteq P(x)] \ge \rho(x, 4\Delta, \Gamma)^{-C\eta},$$
where Γ = 16, C = 64.

Definition 9. Let (X, d) be a metric space and ϑ ∈ Ξ. Let P be a hierarchical partition of X, and let t be a parameter.

• A point x ∈ X is completely locally padded with parameter t if $B(x, 2^i/t_i) \subseteq P_i(x)$ for $t_i = \min\{t, \vartheta(\log|B(x, 2^i)|)\}$, for all i.

• A point x ∈ X is k-locally padded with parameter t if $B(x, 2^i/t) \subseteq P_i(x)$ for all i > 0 such that $2^i \le r_k(x)$.
¹ In fact we show this bound for all u, v ∈ X such that $v \in B(u, r_k(u))$.
² The dimension can be bounded by O(log² n) using a more involved argument.
The following lemma extends a similar lemma of [25] by giving better padding parameters depending on the locality.

Lemma 9. For any finite metric space (X, d) and parameter t > 1, there exists a distribution on ultrametrics such that any point x ∈ X is completely locally padded with parameter t with probability $n^{\Omega(-1/t)}$.

We also have the following k-local variation.

Lemma 10. For any finite metric space (X, d), k ∈ ℕ and parameter t > 1, there exists a distribution on ultrametrics such that any point x ∈ X is k-locally padded with parameter t with probability $k^{\Omega(-1/t)}$.

Proof of Lemma 9. Create $2^i$-bounded probabilistic partitions as in Lemma 8, independently for each scale i > 0. Let i(t) be the largest integer such that $\vartheta(\log|B(x, 2^{i(t)})|) \le t$.
$$\begin{aligned}
\Pr[\forall i,\ B(x, 2^i/t_i) \subseteq P(x)] &\ge \prod_{i>0}\left(\frac{|B(x, 2^i)|}{|B(x, 2^i/8)|}\right)^{-16/t_i} \\
&\ge \prod_{i=1}^{i(t)}\left(\frac{|B(x, 2^i)|}{|B(x, 2^i/8)|}\right)^{-16/\vartheta(\log|B(x, 2^i)|)} \cdot \prod_{i>i(t)}\left(\frac{|B(x, 2^i)|}{|B(x, 2^i/8)|}\right)^{-16/t} \\
&\ge 2^{\sum_{i=1}^{i(t)}\log\frac{|B(x, 2^i)|}{|B(x, 2^i/8)|}\cdot\left(-16/\vartheta(\log|B(x, 2^i)|)\right)} \cdot n^{-48/t}.
\end{aligned}$$
We now bound the exponent:
$$\begin{aligned}
\sum_{i=1}^{i(t)}\log\frac{|B(x, 2^i)|}{|B(x, 2^i/8)|}\cdot\left(16/\vartheta(\log|B(x, 2^i)|)\right)
&= \sum_{i=1}^{i(t)}\;\sum_{j=\log|B(x, 2^i/8)|+1}^{\log|B(x, 2^i)|} 16/\vartheta(\log|B(x, 2^i)|) \\
&\le \sum_{i=1}^{i(t)}\;\sum_{j=\log|B(x, 2^i/8)|+1}^{\log|B(x, 2^i)|} 16/\vartheta(j) \\
&\le 3\sum_{j>0} 16/\vartheta(j) = O(1).
\end{aligned}$$
This gives probability $n^{\Omega(-1/t)}$ as required.

Proof of Lemma 10. Similarly to the previous lemma, create $2^i$-bounded probabilistic partitions as in Lemma 8, independently for each scale i > 0. For each x ∈ X let $i(x) = \max\{i \mid 2^i \le r_k(x)\}$. Then
$$\Pr[\forall i \in [1, i(x)],\ B(x, 2^i/t) \subseteq P(x)] \ge \prod_{i=1}^{i(x)}\left(\frac{|B(x, 2^i)|}{|B(x, 2^i/8)|}\right)^{-16/t} \ge k^{-48/t}.$$

9 Embedding with Proximity Distortion
Embeddings with local distortion provide bounds on the distortion of a pair x, y as a function of how many neighbors are closer. It is natural to ask if one can provide distortion bounds that are simply a function of the distance between x and y. We call such embeddings embeddings with proximity distortion; see Definition 2. For decomposable metrics we have the following result. We defer the proof to Appendix E.

Theorem 11. For any finite metric (X, d) on n points that admits a τ-padded decomposition and ϑ ∈ Ξ there exists an embedding into $\ell_p$ with scaling proximity distortion $O(\tau^{-1}\vartheta(\log t))$ and dimension O(log n).

We also consider growth bounded metrics. For such metrics the local distortion results can be translated into proximity distortion. Recall that a metric (X, d) is said to be χ-growth bounded if for all x ∈ X, r > 0: $|B(x, 2r)| \le 2^{\chi}|B(x, r)|$.

Claim 11. Let (X, d) be a χ-growth bounded metric; then there exists an embedding into $\ell_p$ with scaling proximity distortion O(ϑ(χ log t)).

Proof. By the definition of the growth bound, if x, y ∈ X are such that d(x, y) ≤ t then $|B(x, t)| \le t^{\chi}$, hence $y \in B(x, r_{t^{\chi}}(x))$, and hence there exists an embedding where the distortion of x, y is bounded by $O(\vartheta(\log(t^{\chi}))) = O(\vartheta(\chi\log t))$.

All the other results translate into proximity distortion for growth-bounded metrics in a similar manner.
10 Applications

10.1 Small world model
Given a metric space (X, d) and a distribution Π on X², such that local pairs are given higher probability, using our embedding techniques yields constant average distortion with respect to Π. For example, for α > 0, our embedding gives constant average distortion for any of Kleinberg's "small world" distributions
$$\Pi(x, y \mid x) = \frac{k^{-(1+\alpha)}}{\sum_{i=1}^{n} i^{-(1+\alpha)}},$$
where y is the k-th nearest neighbor of x.
Lemma 12. Let (X, d) be a metric space, and Π a probability distribution satisfying that given x, the conditional probability Π(x, y | x) to choose y with $d(x, y) = r_k(x)$ is bounded by $\frac{1}{\vartheta(k)\cdot\vartheta(\log k)}$. Then there exists an embedding f into $\ell_p$ or a distribution over ultrametrics with $\mathrm{avgdist}^{(\Pi)}(f) = E_{(x,y)\sim\Pi}\left[\frac{d(x, y)}{\|f(x) - f(y)\|_p}\right] = O(1)$.

Proof. Let f : X → $\ell_p$ be a scaling local embedding with distortion c · ϑ(log k).
$$\begin{aligned}
\mathrm{avgdist}^{(\Pi)}(f) &= \sum_{x\in X}\Pi(x, \cdot)\sum_{y\in X}\Pi(x, y \mid x)\,\frac{d(x, y)}{\|f(x) - f(y)\|_p} \\
&\le \sum_{x\in X}\Pi(x, \cdot)\sum_{k=1}^{n}\sum_{y\in X}\Pi(x, y \mid x \wedge d(x, y) = r_k(x))\,c\,\vartheta(\log k) \\
&\le c\sum_{k=1}^{n}\frac{\vartheta(\log k)}{\vartheta(k)\cdot\vartheta(\log k)} = c\sum_{k=1}^{n} 1/\vartheta(k) = O(1).
\end{aligned}$$

Lemma 13. Let (X, d) be a metric space, and Π a probability distribution satisfying that given x, the conditional probability Π(x, y | x) to choose y with $d(x, y) = r_k(x)$ is $\frac{Z}{d(x, y)\cdot\vartheta(k)\cdot\vartheta(\log k)}$, where Z is a scaling factor. Then there exists an embedding f into $\ell_p$ or a distribution over ultrametrics with $\mathrm{distavg}^{(\Pi)}(f) = \frac{E_{(x,y)\sim\Pi}[d(x, y)]}{E_{(x,y)\sim\Pi}[\|f(x) - f(y)\|_p]} = O(1)$.
Proof. Let f : X → $\ell_p$ be a scaling local embedding with distortion O(ϑ(log k)).
$$\begin{aligned}
\mathrm{distavg}^{(\Pi)}(f) &= \frac{\sum_{x,y\in X}\Pi(x, y)\cdot d(x, y)}{\sum_{x,y\in X}\Pi(x, y)\cdot\|f(x) - f(y)\|_p} \\
&= \frac{\sum_{x\in X}\Pi(x, \cdot)\sum_{k=1}^{n}\Pi(x, y \mid x \wedge d(x, y) = r_k(x))\,d(x, y)}{\sum_{x\in X}\Pi(x, \cdot)\sum_{k=1}^{n}\Pi(x, y \mid x \wedge d(x, y) = r_k(x))\,d(x, y)/(c\,\vartheta(\log k))} \\
&\le \frac{Z\sum_{k=1}^{n}\frac{1}{\vartheta(k)\cdot\vartheta(\log k)}}{Z\sum_{k=1}^{n}\frac{1}{c\,\vartheta(k)}} \le O(1).
\end{aligned}$$
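Returning to the "small world" example above (a sketch with illustrative constants): Kleinberg's distribution assigns the k-th nearest neighbor probability at most $k^{-(1+\alpha)}$, so taking $\vartheta(k) = \hat{c}\,k(\log k)^{1+\theta} \in \Xi$, the hypothesis of Lemma 12 amounts to
$$k^{-(1+\alpha)} \le \frac{1}{\vartheta(k)\cdot\vartheta(\log k)} \iff \hat{c}^2\,k\,(\log k)^{2+\theta}(\log\log k)^{1+\theta} \le k^{1+\alpha},$$
which holds for all k ≥ 2 once ĉ is small enough, since $k^{\alpha}$ dominates the polylogarithmic factors; this recovers the constant average distortion claimed at the start of this subsection.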
10.2 Online problems
Consider any online problem defined on a metric space which has a poly-logarithmic competitive ratio algorithm based on probabilistic embedding into a distribution of ultrametrics, e.g. the metrical task system or file allocation. Obtaining a poly-logarithmic competitive ratio is desirable, but one may wish, in addition, to obtain better results if the demand sequence happens to have a local nature. Instead of using the standard embedding of [7, 18] we can use the embedding given in Theorem 6. This provides the following local strengthening of the standard competitive ratio bound: if the request sequence is such that the objective function contains only distances between pairs u, v such that v is the k-th nearest neighbor of u, then the competitive ratio improves as a function of k; that is, the O(log n) overhead due to the embedding is replaced by an overhead of only ϑ(log k).
10.3 Local distance oracles
In [25, 11] it is shown how Ramsey partitions can be used to obtain efficient proximity data structures. Using the same data structure as [25] we can get local variations on distance oracles under a weak growth bound assumption.

Claim 14. Let (X, d) be a metric space with a χ weak growth bound, fix some t > 1 and x ∈ X, and let T be an ultrametric in which x is completely locally padded with parameter t. Then for all y ∈ X, if k = k(y) ∈ ℕ is such that $d(x, y) = r_k(x)$, then $d(x, y) \le d_T(x, y) \le O(d(x, y)\,\vartheta(\log k))$.

If the metric space does not have the weak growth bound, we can use our embedding into Euclidean space as a distance oracle.

Theorem 12. For any finite metric space there exist the following types of distance oracles:

1. For a fixed k: O(log k) stretch for any $y \in B(x, r_k(x))$, O(log n) query time, and O(n log n) memory.
2. Scaling: O(ϑ(log k)) stretch for all $y \in B(x, r_k(x))$ for all k ∈ ℕ, O(log n) query time, O(n log n) memory.
3. Scaling: for all x, y such that $y \in B(x, r_k(x))$: k stretch, O(1) query time, O(n) memory.

If the metric space has a χ weak growth bound then for any t > 1 there exists a distance oracle:

4. For a fixed k: O(t) stretch for all $y \in B(x, r_k(x))$³, O(1) query time, $O(n \cdot k^{\chi/t})$ memory.
5. Scaling: min{O(χ · ϑ(log k)), O(t)} stretch for all $y \in B(x, r_k(x))$ for all k ∈ ℕ, O(1) query time, $O(n^{1+1/t})$ memory.

Here (1) follows from Theorem 2, (2) follows using Theorem 10, (3) follows simply from Theorem 1, (4) follows using Lemma 10, and (5) follows using Lemma 9 and Claim 14.

³ If $y \notin B(x, r_k(x))$, then the query will return a value of at least $r_k(x)/O(t)$.
10.4 Approximate ranking
The ranking problem is defined as follows: given a metric space (X, d) on n points, find for any x ∈ X a permutation $\pi^{(x)}$ of X such that for all y, z ∈ X: if $y = \pi^{(x)}(i)$, $z = \pi^{(x)}(j)$ and i < j then d(x, y) ≤ d(x, z). A k-approximate ranking relaxes the last condition to d(x, y) ≤ k · d(x, z). It is shown in [25, 11] that for any k > 1 there exists a data structure with O(k)-approximate ranking which can be preprocessed in $O(kn^{2+1/k}\log n)$ time, uses $O(kn^{1+1/k})$ space, and supports queries such as $\pi^{(x)}(i)$, or finding i ∈ [n] such that $\pi^{(x)}(i) = y$, in O(1) time. We show a variation on this result in which the approximation factor scales in a relative manner with the locality of the query points. Our data structure is based on local Ramsey partitions, or on embedding the points with local distortion into an ultrametric or a distribution over ultrametrics. We provide a theorem similar to Theorem 12 for approximate ranking. The complete details appear in the full version of the paper.

Theorem 13. For any finite metric space (X, d) there exist data structures for approximate ranking, such that for any x ∈ X:

1. For a fixed k: O(log k) stretch for any $y, z \in B(x, r_k(x))$, O(log n) query time, and O(n log n) memory.
2. Scaling: O(ϑ(log k)) stretch for any $y, z \in B(x, r_k(x))$ and all k ∈ [n], O(log n) query time with O(n log n) memory.
3. Scaling: for all x, y such that $y, z \in B(x, r_k(x))$: k stretch, O(1) query time, O(n) memory.

If (X, d) has a χ weak growth bound, there exist data structures for approximate ranking, such that for any t > 1 and x ∈ X:

4. For a fixed k: O(t) stretch for any $y \in B(x, r_k(x))$ and any z ∈ X, O(1) query time, and $O(tn \cdot k^{\chi/t})$ memory.
5. Scaling: O(min{χ · ϑ(log k), t}) stretch for any $y \in B(x, r_k(x))$, any z ∈ X and all k ∈ [n], O(1) query time with $O(tn^{1+1/t})$ memory.

Notice that the approximations for weak growth bounded spaces are better in the sense that they work for all z ∈ X, and the query time is constant. The results (1) and (2) for general spaces are obtained using Theorem 5 and Theorem 6 respectively, by sampling O(log n) ultrametrics and answering queries by returning the maximal distance over all the trees. (3) follows from Theorem 1. The results (4) and (5) follow from the distance oracle results.
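The query scheme just described is simple to sketch in code (names are ours; `sample_ultrametric` stands for a sampler from the distributions of Theorem 5 or 6, which we treat as given):

```python
import math

def build_local_oracle(points, d, sample_ultrametric, m=None):
    """Sample m = O(log n) ultrametrics and estimate d(x, y) by the maximum
    tree distance. Each sampled tree is non-expanding, so the estimate never
    exceeds d(x, y), while the per-tree local lower bound keeps the estimate
    at least d(x, y) divided by the local distortion."""
    m = m or max(1, math.ceil(math.log2(len(points))))
    trees = [sample_ultrametric(points, d) for _ in range(m)]  # each: (x, y) -> d_T(x, y)

    def query(x, y):
        return max(t(x, y) for t in trees)
    return query
```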
11 Open Problems
Most of the results in this paper are either tight or nearly tight. The tightness of our k-local results follows from known metric embedding lower bounds. There are several obvious questions. There is a small gap between our scaling local distortion upper bounds (such as in Theorems 3 and 6) of O(ϑ(log k)) and the obvious lower bound of Ω(log k). Is the very weak growth bound assumption in Theorem 8 necessary? Are the weak growth bound assumptions of Theorem 12 necessary? Is there a k-local analogue of the Johnson-Lindenstrauss lemma: does every finite metric space in $\ell_2$ have a k-local embedding into $\ell_2^d$ with (1 + ε) distortion, where $d = O(\log k/\epsilon^2)$?
References

[1] Ittai Abraham, Yair Bartal, Hubert T.-H. Chan, Kedar Dhamdhere, Anupam Gupta, Jon M. Kleinberg, Ofer Neiman, and Aleksandrs Slivkins. Metric embeddings with relaxed guarantees. In FOCS, pages 83–100. IEEE Computer Society, 2005.
[2] Ittai Abraham, Yair Bartal, and Ofer Neiman. Advances in metric embedding theory. In STOC ’06: Proceedings of the thirty-eighth annual ACM symposium on Theory of computing, pages 271–286, New York, NY, USA, 2006. ACM Press. [3] Ittai Abraham, Yair Bartal, and Ofer Neiman. Embedding metric spaces in their intrinsic dimension, 2007. Manuscript. [4] Ittai Abraham, Yair Bartal, and Ofer Neiman. Embedding metrics into ultrametrics and graphs into spanning trees with constant average distortion. In SODA ’07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, 2007. [5] Noga Alon, Richard M. Karp, David Peleg, and Douglas West. A graph-theoretic game and its application to the k-server problem. SIAM J. Comput., 24(1):78–100, 1995. [6] Baruch Awerbuch and David Peleg. Sparse partitions. In Proceedings of the 31st IEEE Symposium on Foundations of Computer Science (FOCS), pages 503–513, 1990. [7] Y. Bartal. Probabilistic approximation of metric spaces and its algorithmic applications. In 37th Annual Symposium on Foundations of Computer Science (Burlington, VT, 1996), pages 184–193. IEEE Comput. Soc. Press, Los Alamitos, CA, 1996. [8] Y. Bartal. On approximating arbitrary metrics by tree metrics. In Proceedings of the 30th Annual ACM Symposium on Theory of Computing, pages 183–193, 1998. [9] Y. Bartal. Graph decomposition lemmas and their role in metric embedding methods. In 12th Annual European Symposium on Algorithms, pages 89–97, 2004. [10] Y. Bartal. On embedding finite metric spaces in low-dimensional normed spaces, 2005. Tech. Report. [11] Y. Bartal. Ramsey decompositions and their applications, 2007. Manuscript. [12] Y. Bartal, N. Linial, M. Mendel, and A. Naor. On metric ramsey-type phenomena. Annals Math, 162(2):643–709, 2005. [13] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput., 15(6):1373–1396, 2003. [14] J. Bourgain. On Lipschitz embedding of finite metric spaces in Hilbert space. Israel J. Math., 52(1-2):46–52, 1985. [15] Gruia Calinescu, Howard J. Karloff, and Yuval Rabani. Approximation algorithms for the 0-extension problem. In Symposium on Discrete Algorithms, pages 8–16, 2001. [16] Manuel Costa, Miguel Castro, Antony I. T. Rowstron, and Peter B. Key. Pic: Practical internet coordinates for distance estimation. In 24th International Conference on Distributed Computing Systems, pages 178–187, 2004. [17] Michael Elkin, Yuval Emek, Daniel A. Spielman, and Shang-Hua Teng. Lower-stretch spanning trees. In STOC ’05: Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, pages 494–503, New York, NY, USA, 2005. ACM Press. [18] Jittat Fakcharoenphol, Satish Rao, and Kunal Talwar. A tight bound on approximating arbitrary metrics by tree metrics. In STOC ’03: Proceedings of the thirty-fifth annual ACM symposium on Theory of computing, pages 448–455. ACM Press, 2003. [19] Sariel Har-Peled and Manor Mendel. Fast construction of nets in low-dimensional metrics and their applications. SIAM J. Comput, 35(5):1148–1184, 2006. [20] P. Indyk. Algorithmic applications of low-distortion geometric embeddings. In Proceedings of the 42nd Annual Symposium on Foundations of Computer Science, pages 10–33, 2001. [21] Jon Kleinberg. The small-world phenomenon: an algorithm perspective. In STOC ’00: Proceedings of the thirtysecond annual ACM symposium on Theory of computing, pages 163–170, New York, NY, USA, 2000. ACM Press. [22] Jon M. Kleinberg, Aleksandrs Slivkins, and Tom Wexler. 
Triangulation and embedding using small sets of beacons. In FOCS, pages 444–453, 2004. [23] D. Liben-Nowell, J. Novak, R. Kumar, P. Raghavan, and A. Tomkins. Geographic routing in social networks. Proceedings of the National Academy of Science, 102:11623–11628, August 2005.
[24] J. Matoušek. Note on bi-Lipschitz embeddings into low-dimensional Euclidean spaces. Comment. Math. Univ. Carolinae, 31:589–600, 1990. [25] Manor Mendel and Assaf Naor. Ramsey partitions and proximity data structures. In FOCS '06: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06), pages 109–118, Washington, DC, USA, 2006. IEEE Computer Society. [26] David Peleg. Distributed Computing: A Locality-Sensitive Approach. SIAM Monographs on Discrete Mathematics and Applications, 2000. [27] Lin Xiao, Jun Sun, and Stephen Boyd. A duality view of spectral methods for dimensionality reduction. In ICML '06: Proceedings of the 23rd international conference on Machine learning, pages 1041–1048, New York, NY, USA, 2006. ACM Press.
A Proof of Theorem 2
Let $s = 2^p$, $t = \log_s k$, T = {i | 1 ≤ i ≤ t}, q = O(s log n) and Q = {j | 1 ≤ j ≤ q}. Choose random subsets $A_{ij}$ independently for every i ∈ T, j ∈ Q, such that each point is included in $A_{ij}$ with probability $s^{-i}$. We now define the embedding $\phi : X \to \ell_p^{t \cdot q}$ by defining for each i ∈ T, j ∈ Q a function $\phi_{i,j} : X \to \mathbb{R}^+$ by $\phi_{i,j}(u) = d(u, A_{ij})$, and
$$\phi(u) = \bigoplus_{i=1}^{t}\bigoplus_{j=1}^{q}\phi_{i,j}(u).$$
Let u, v ∈ X be such that $v \in B(u, r_k(u))$. Let $r_{s^i} = \max\{r_{s^i}(u), r_{s^i}(v)\}$ for all i ∈ T, and set $\delta_i = r_{s^i} - r_{s^{i-1}}$. Since $v \in B(u, r_k(u))$ we have $\sum_{i=1}^{t}\delta_i \ge d_X(u, v)$. By the standard proof of Bourgain, with improvements for high p by Matoušek, we have that for any i ∈ T, j ∈ Q the following event holds with probability $\frac{1}{8s}$: $|d(u, A_{ij}) - d(v, A_{ij})| \ge \delta_i$. Hence by a Chernoff bound there is constant probability that for any such pair u, v and scale i ∈ T there exists a subset $J^{(i)}_{u,v} \subseteq Q$ such that $|J^{(i)}_{u,v}| \ge \frac{|Q|}{16s}$ and for any $j \in J^{(i)}_{u,v}$: $|d(u, A_{ij}) - d(v, A_{ij})| \ge \delta_i$.
$$\|\phi(u) - \phi(v)\|_p^p = \sum_{j=1}^{q}\sum_{i=1}^{t}|d(u, A_{ij}) - d(v, A_{ij})|^p \le q \cdot t \cdot d(u, v)^p$$
$$\begin{aligned}
\|\phi(u) - \phi(v)\|_p^p \ge \sum_{i=1}^{t}\sum_{j\in J^{(i)}_{u,v}}|d(u, A_{ij}) - d(v, A_{ij})|^p
&\ge \frac{\log n}{16}\sum_{i=1}^{t}\delta_i^p \\
&\ge \frac{\log n}{16\,t^{p-1}}\left(\sum_{i=1}^{t}|\delta_i|\right)^p \\
&\ge \frac{\log n}{16\,t^{p-1}}\,(d(u, v))^p;
\end{aligned}$$
after appropriate scaling (by $(t \cdot q)^{-1}$), we get distortion $O(t) = O\left(\frac{\log k}{p}\right)$.

B Proof of Theorem 5
Let $\Delta_0 = \mathrm{diam}(X)$. For each i > 0 define $\Delta_i = \frac{\Delta_0}{2^i}$, and create the $\Delta_i$-bounded probabilistic ball partition $\hat{Q}_i$ given by Lemma 4. For each i > 0, $Q_i \in \mathcal{Q}_i$ and cluster $C \in Q_i$ such that $|B(\upsilon(C), 2\Delta_i)| > k$, we shall ignore the cluster C and define δ(C) = 0; otherwise let δ(C) = 1. We now create a hierarchical collection of clusters P, i.e., for each i > 0 the collection $P_i$ is a refinement of $P_{i-1}$. For any i ∈ I fix some $Q_i \in \mathcal{Q}_i$, and let $Q^{(1)}_i = \{C \in Q_i \mid \delta(C) = 1\}$, $Q^{(0)}_i = \{C \in Q_i \mid \delta(C) = 0\}$ and $C_i = \bigcup_{C \in Q^{(0)}_i} C$. We conduct the following iterative process: for i > 0 define
$$P_i = \{C \mid \exists D \in Q^{(1)}_i,\ \exists D' \in P_{i-1},\ C = D \cap D' \vee C = D \cap C_{i-1}\}.$$
Observation 15. Fix some i > 0, and let $C \in P_i$. Then either:

1. there exists $D \in P_{i-1}$ such that C ⊆ D,
2. or for all j < i no $D \in P_j$ contains C.

Note that the clusters of P form a disjoint set of laminar families, which naturally induce a forest, with label $\Delta_i$ for a cluster in $P_i$. Note that a cluster satisfying the second condition of Observation 15 will be a root in the forest. Now create an ultrametric T from the partition in the following manner: denote by T₁, T₂, ..., T_ℓ the trees in the forest, with roots labelled w.l.o.g. a₁ ≤ a₂ ≤ ... ≤ a_ℓ. For i = 2, ..., ℓ replace $T_i$ by a tree with a new root labelled $a_i$, which has 2 children, $T_{i-1}$ and $T_i$. It can be checked that T's labels are weakly monotone, hence T represents an ultrametric.

Claim 16. For any pair (x, y), $d_T(x, y) \ge \min\{d(x, y)/12,\ r_k(x)/12\}$.

Proof. If $y \in B(x, r_k(x))$, let i be the smallest integer such that $3\Delta_i \le d(x, y)$; it is immediate that $Q_i(x) \ne Q_i(y)$. Let $v = \upsilon(P_i(x))$; then we have
$$B(v, 2\Delta_i) \subseteq B(x, 3\Delta_i) \subseteq B(x, d(x, y)) \subseteq B(x, r_k(x)),$$
which implies that $\delta(Q_i(x)) = 1$. Therefore there exists $P_i(x)$ labelled $\Delta_i$, and $y \notin P_i(x)$; since T represents an ultrametric,
$$d_T(x, y) \ge \Delta_i \ge \frac{d(x, y)}{12}.$$
In the case $y \notin B(x, r_k(x))$, let i be the smallest integer such that $3\Delta_i \le r_k(x)$; we obtain in the same way $d_T(x, y) \ge r_k(x)/12$.

Claim 17. For any pair (x, y), $E[d_T(x, y)] \le O(\log k)\,d(x, y)$.

Proof. Let ℓ be the largest integer such that $8d(x, y) \le \Delta_\ell$. Let $C_0(x, y)$ be the empty event and recursively define the event
$$C_i(x, y) = \Big(Q_i(x) \ne Q_i(y)\Big) \wedge \Big(\delta(Q_i(x)) + \delta(Q_i(y)) \ge 1\Big) \wedge \bigwedge_{j<i}\neg C_j(x, y);$$
note that if $C_i(x, y)$ holds then $d_T(x, y) = \Delta_i$. Using Lemma 4:
$$\begin{aligned}
E[d_T(x, y)] &= \sum_{i>0}\Pr[C_i(x, y)]\,\Delta_{i-1} \\
&\le \sum_{i>0}\Pr[Q_i(x) \ne Q_i(y)]\,\big(\delta(Q_i(x)) + \delta(Q_i(y))\big)\,2\Delta_i \\
&\le 2\sum_{i>0}\Pr[B(x, d(x, y)) \nsubseteq Q_i(x)]\,\Delta_i\cdot\delta(Q_i(x)) + 2\sum_{i>0}\Pr[B(y, d(x, y)) \nsubseteq Q_i(y)]\,\Delta_i\cdot\delta(Q_i(y)) \\
&\le 4\sum_{i=1}^{\ell}\frac{d(x, y)\,\Delta_i}{\Delta_i}\left(\log\frac{|B(x, \Delta_i)|}{|B(x, \Delta_i/8)|}\cdot\delta(Q_i(x)) + \log\frac{|B(y, \Delta_i)|}{|B(y, \Delta_i/8)|}\cdot\delta(Q_i(y))\right) + \sum_{i\ge\ell+1}\Delta_i \\
&\le 4d(x, y)\sum_{i=1}^{\ell}\left(\log\frac{|B(x, \Delta_i)|}{|B(x, \Delta_i/8)|}\cdot\delta(Q_i(x)) + \log\frac{|B(y, \Delta_i)|}{|B(y, \Delta_i/8)|}\cdot\delta(Q_i(y))\right) + \Delta_\ell \\
&\le O(\log k)\,d(x, y).
\end{aligned}$$
The last inequality holds since the sum is telescopic, and since $\delta(Q_i(x)) = 0$ whenever $|B(x, \Delta_i)| > k$, it telescopes to O(log k), which gives the desired result.
Theorem 5 follows by normalizing the labels of T by a factor of Θ(log k).
C Proof of Theorem 8
Let $D = c_0\ln k$, for some constant $c_0$ to be defined later. We will define an embedding $f : X \to \ell_p^D$ with k-local distortion O(log k). We define f by defining for each 1 ≤ t ≤ D a function $f^{(t)} : X \to \mathbb{R}^+$, and letting $f = D^{-1/p}\bigoplus_{1\le t\le D} f^{(t)}$. Fix t, 1 ≤ t ≤ D; in what follows we define $f^{(t)}$. Let $\Delta_0 = \mathrm{diam}(X)$, let $I = \{i \in \mathbb{N} \mid i \in [1, \log_8\Delta_0]\}$ and for every i ∈ I let $\Delta_i = \Delta_0 8^{-i}$. We construct for all i ∈ I a uniformly $\Delta_i$-bounded $\eta_i$-padded probabilistic partition $\hat{P}_i$ as in Lemma 2 with parameter $\Gamma = 8^3$. Let $\xi_i$ be as defined in the lemma. Denote by Ω the probability space of all possible embeddings f. Now for every i ∈ I fix a partition $P_i \in \mathcal{P}_i$. We define the embedding by defining the coordinates for each x ∈ X. Let ℓ(x) ∈ I be the minimal index such that $|B(\upsilon(P_{\ell(x)}(x)), 9\Gamma\Delta_{\ell(x)})| \le k^{\chi^7}$. We now define $\bar\xi$ in the following manner:
$$\bar\xi_{P,i}(x) = \begin{cases} 0 & i < \ell(x) \\ \xi_{P,i}(x) & \text{otherwise.} \end{cases}$$
Define for x ∈ X, 0 < i ∈ I, $\phi^{(t)}_i : X \to \mathbb{R}^+$ by $\phi^{(t)}_i(x) = \bar\xi_{P,i}(x)\,\eta_{P,i}(x)^{-1}$. Lemma 2 and the definition of ℓ(x) ensure that $\bar\xi_i$ and $\eta_i$ are uniform functions with respect to $\hat{P}_i$, so we have:
Claim 18. For any x, y ∈ X and i ∈ I, if $P_i(x) = P_i(y)$ then $\phi^{(t)}_i(x) = \phi^{(t)}_i(y)$.

Claim 19. For any x ∈ X, t ∈ [D],
$$\sum_{i\in I}\phi^{(t)}_i(x) \le 2^{10}\chi^7\log k.$$

Proof.
$$\begin{aligned}
\sum_{i\in I}\phi^{(t)}_i(x) &= \sum_{i\in I \mid \bar\xi_{P,i}(x)=1}\eta_{P,i}(x)^{-1} \\
&\le \sum_{i\ge\ell(x)} 2^7\log\frac{|B(x, 8\Gamma\Delta_i)|}{|B(x, 8\Delta_i/\Gamma)|} \\
&= 2^7\sum_{i\ge\ell(x)}\sum_{h=-3}^{2}\log\frac{|B(x, 8\Delta_{i+h})|}{|B(x, \Delta_{i+h})|} \\
&\le 2^{10}\cdot\chi^7\log k.
\end{aligned}$$
The last inequality holds since for every i ≥ ℓ(x) we have $|B(x, 8\Delta_{i-3})| \le |B(\upsilon(P_{\ell(x)}(x)), 9\Gamma\Delta_{\ell(x)})| \le k^{\chi^7}$, so the sum telescopes to $\chi^7\log k$.

For each 0 < i ∈ I we define a function $f^{(t)}_i : X \to \mathbb{R}^+$ and for x ∈ X let $f^{(t)}(x) = \sum_{i\in I} f^{(t)}_i(x)$. Let $\{\sigma^{(t)}_i(C) \mid C \in P_i,\ 0 < i \in I\}$ be i.i.d. symmetric {0, 1}-valued Bernoulli random variables. The embedding is defined as follows: for each x ∈ X:

• For each 0 < i ∈ I, let $f^{(t)}_i(x) = \sigma^{(t)}_i(P_i(x)) \cdot \min\{\phi^{(t)}_i(x) \cdot d(x, X \setminus P_i(x)),\ \Delta_i\}$.

The following claim was proved in [2]:

Claim 20. For any x, y ∈ X and 0 < i ∈ I: $|f^{(t)}_i(x) - f^{(t)}_i(y)| \le \min\{\phi^{(t)}_i(x) \cdot d(x, y),\ \Delta_i\}$.
Lemma 21. There exists a universal constant $C_1 = C_1(\chi) > 0$ such that for any x, y ∈ X: $|f^{(t)}(x) - f^{(t)}(y)| \le C_1\log k \cdot d(x, y)$.

Proof. From Claims 19 and 20 we get
$$|f^{(t)}(x) - f^{(t)}(y)| \le \sum_{i\in I}|f^{(t)}_i(x) - f^{(t)}_i(y)| \le \sum_{i\in I}\phi^{(t)}_i(x)\cdot d(x, y) \le 2^{10}\chi^7\log k \cdot d(x, y).$$
C.1 Lower bound analysis
Let $N_i = \{(x, y) \mid x, y \in X \wedge y \in B(x, r_k(x)) \wedge d(x, y) \in [\Delta_{i-2}, \Delta_{i-3})\}$ and let $N = \bigcup_{i>0} N_i$; the sets $N_i$ form a partition of the pairs in N. For all i > 0, $(x, y) \in N_i$ and t ∈ [D] let $Z_{(i,x,y,t)}$ be the event that
$$\left(|f^{(t)}_i(x) - f^{(t)}_i(y)| \ge \Delta_i \ \wedge\ \Big|\sum_{j<i} f^{(t)}_j(x) - f^{(t)}_j(y)\Big| \le \frac{\Delta_i}{2}\right) \ \vee\ \Big|\sum_{j\le i} f^{(t)}_j(x) - f^{(t)}_j(y)\Big| > \frac{\Delta_i}{2}.$$
For any embedding f define a function $g_i : N_i \to 2^{[D]}$ as follows: $g_i(x, y) = \{t \in [D] \mid Z_{(i,x,y,t)}\}$, and let $Z_{(i,x,y)}$ be the event that $|g_i(x, y)| \ge D/16$. Notice that for both events $Z_{(i,x,y)}$, $Z_{(i,x,y,t)}$ we could have omitted the index i, since it can be inferred uniquely from d(x, y); we write this index explicitly for ease of presentation. Let $Z = \bigcap_{(x,y)\in N} Z_{(i,x,y)}$; then

Lemma 22. Pr[Z] > 0.

Before showing the proof of this lemma, let us see that it is sufficient for showing the lower bound.

Claim 23. Fix $(x, y) \in N_i$ and $t \in g_i(x, y)$; then $|f^{(t)}(x) - f^{(t)}(y)| > d(x, y)/2^{11}$.

Proof. By Claim 20, and since the $\Delta_j$ form a geometric series, $|\sum_{j>i} f^{(t)}_j(x) - f^{(t)}_j(y)| \le \sum_{j>i}\Delta_j \le \Delta_i/4$. Since $Z_{(i,x,y,t)}$ holds, $|\sum_{j\le i} f^{(t)}_j(x) - f^{(t)}_j(y)| \ge \Delta_i/2$, so $|\sum_{j>0} f^{(t)}_j(x) - f^{(t)}_j(y)| \ge \Delta_i/4 \ge d(x, y)/2^{11}$.

Lemma 24. There exists a universal constant $C_2$ and an embedding f ∈ Ω such that for any $(x, y) \in N$: $|f(x) - f(y)| \ge C_2\,d(x, y)$.

Proof. Using Lemma 22, let f ∈ Ω be such that event Z took place. Consider any $(x, y) \in N_i$ and $t \in g_i(x, y)$. Using Claim 23 and the fact that $|g_i(x, y)| \ge D/16$:
$$\|f(x) - f(y)\|_p^p = \frac{1}{D}\sum_{t\in[D]}|f^{(t)}(x) - f^{(t)}(y)|^p \ge \frac{1}{D}\sum_{t\in g_i(x,y)}|f^{(t)}(x) - f^{(t)}(y)|^p \ge \frac{1}{16}\left(\frac{d(x, y)}{2^{11}}\right)^p.$$
Proof of Lemma 22. We shall make use of the following variation of the Local Lemma, due to Erdős and Lovász.

Lemma 25 (Local Lemma). Let A₁, A₂, ..., A_n be events in some probability space. Let G(V, E) be a directed graph on n vertices with out-degree at most d, each vertex corresponding to an event. Let c : V → [m] be a rating function of events, such that if $(A_i, A_j) \in E$ then $c(A_i) \le c(A_j)$. Assume that for any i = 1, ..., n,
$$\Pr\Big[A_i \,\Big|\, \bigwedge_{j\in Q}\neg A_j\Big] \le p$$
for all $Q \subseteq \{j : (A_i, A_j) \notin E \wedge c(A_i) \ge c(A_j)\}$. If $ep(d + 1) \le 1$, then
$$\Pr\Big[\bigwedge_{i=1}^{n}\neg A_i\Big] > 0.$$
Define a graph $G = (V, E)$ where $V = \{Z_{(i,x,y)} \mid i > 0 \wedge (x, y) \in N_i\}$, define the rating function as $c(Z_{(i,x,y)}) = i$, and let $(Z_{(i,x,y)}, Z_{(i',x',y')}) \in E \Leftrightarrow d(x, x') \le 16 r_k(x) \wedge c(Z_{(i,x,y)}) \le c(Z_{(i',x',y')})$. Note that the definition is not symmetric, so $G$ is a directed graph, and that the rating matches the requirements of Lemma 25.

Claim 26. The out-degree of $G$ is bounded by $k^{\chi^5}$.

Proof. Fix any vertex $Z_{(i,x,y)} \in V$. By the very weak growth bound condition, $|B(x, 16 r_k(x))| \le |B(x, r_k(x))|^{\chi^4} = k^{\chi^4}$, and for each $x' \in B(x, 16 r_k(x))$ there are at most $k$ possible values of $y'$ such that $(x', y') \in N$; hence the out-degree is bounded by $k^{\chi^5}$.

Claim 27. Let $(x, y) \in N_i$. If $(x', y') \in N_i$ is such that $(Z_{(i,x,y)}, Z_{(i,x',y')}) \notin E$, then $d(\{x, y\}, \{x', y'\}) > 2\Delta_i$.

Proof. By the definition of $N_i$, both $d(x, y), d(x', y') < \Delta_{i-3}$. By the definition of $G$, $d(x, x') > 16 r_k(x)$. As $y \in B(x, r_k(x))$ we have $d(x, y) \le r_k(x)$, hence $d(x', y) \ge d(x, x') - d(x, y) \ge 15 d(x, y)$, and $d(y', y) \ge d(x', y) - d(y', x') \ge 15 d(x, y) - \Delta_{i-3} \ge 15\Delta_{i-2} - 8\Delta_{i-2} > 2\Delta_i$. The proof for $x$ is similar and easier.

We now show a claim about the event $Z_{(i,x,y,t)}$ which lies at the heart of the application of the Local Lemma.

Claim 28. For all $(x, y) \in N_i$:
$$\Pr\Big[ \neg Z_{(i,x,y,t)} \;\Big|\; \bigwedge_{(i',x',y') \in Q} Z_{(i',x',y')} \Big] \le 7/8$$
for all $Q \subseteq \{(i', x', y') : (Z_{(i,x,y)}, Z_{(i',x',y')}) \notin E \wedge c(Z_{(i,x,y)}) \ge c(Z_{(i',x',y')})\}$.

Proof. Fix $(x, y) \in N_i$, $t \in [D]$. For $i' < i$ and any $(x', y') \in N_{i'}$, the events $Z_{(i',x',y')}$ depend only on the first $i'$ scales of the probabilistic partition, so the padding in scale $i$ and the choice of $\sigma_i$ are independent of these events. Otherwise $i' = i$. For any $(x', y') \in N_i$ such that $(Z_{(i,x,y)}, Z_{(i,x',y')}) \notin E$, by Claim 27, $d(\{x, y\}, \{x', y'\}) > 2\Delta_i$. This implies that $x, y$ and $x', y'$ fall into different clusters in scale $i$, hence the choice of $\sigma_i$ is independent for each. By the locality of our partition, the padding in scale $i$ for $x, y$ is independent of the padding for $x', y'$.
Even though the event $Z_{(i,x,y)}$ depends on scales $j < i$, we will show that there is probability at least $1/8$ to succeed no matter what partitions were created in scales $j < i$.

By Claim 1, $\max\{\bar{\rho}(x, \Delta_{i-1}, \Gamma), \bar{\rho}(y, \Delta_{i-1}, \Gamma)\} \ge 2$. We claim that $|B(\upsilon(P_i(y)), 9\Gamma\Delta_i)| \le k^{\chi^7}$. As $d(x, \upsilon(P_i(y))) \le d(x, y) + \Delta_i \le 2 d(x, y)$, by the very weak growth bound assumption,
$$|B(\upsilon(P_i(y)), 9\Gamma\Delta_i)| \le |B(x, 72\Delta_{i-2} + 2 d(x, y))| \le |B(x, 74\, d(x, y))| \le |B(x, 2^7 r_k(x))| \le |B(x, r_k(x))|^{\chi^7} = k^{\chi^7}.$$
The same argument holds for $|B(\upsilon(P_i(x)), 9\Gamma\Delta_i)|$. This implies that $\bar{\xi}_i(y) = \xi_i(y)$ and $\bar{\xi}_i(x) = \xi_i(x)$. Assume w.l.o.g. that $\bar{\rho}(x, \Delta_{i-1}, \Gamma) \ge 2$, which implies that $\xi_i(x) = 1$, hence $\phi_i^{(t)}(x) = \eta_{P,i}(x)^{-1}$. If it is the case that
$$\Big| \sum_{j<i} f_j^{(t)}(x) - f_j^{(t)}(y) \Big| \le \frac{\Delta_i}{2},$$
then it is enough that the following hold:
• $B(x, \eta_{P,i}(x)\Delta_i) \subseteq P_i(x)$;
• $\sigma_i^{(t)}(P_i(x)) = 1$;
• $\sigma_i^{(t)}(P_i(y)) = 0$.
By Lemma 2, the definition of $\sigma$, and the fact that $P_i(x) \ne P_i(y)$, the probability of each of these events is independently at least $1/2$. If all these events occur then $|f_i^{(t)}(x) - f_i^{(t)}(y)| \ge \min\{\eta_{P,i}(x)^{-1} \cdot d(x, X \setminus P_i(x)), \Delta_i\} \ge \Delta_i$, so the first branch of $Z_{(i,x,y,t)}$ holds. If on the other hand
$$\Big| \sum_{j<i} f_j^{(t)}(x) - f_j^{(t)}(y) \Big| > \frac{\Delta_i}{2},$$
then it is enough that $\sigma_i^{(t)}(P_i(x)) = \sigma_i^{(t)}(P_i(y)) = 0$, which occurs with probability at least $1/4$; in that case $f_i^{(t)}(x) = f_i^{(t)}(y) = 0$ and the second branch of $Z_{(i,x,y,t)}$ holds. In either case $Z_{(i,x,y,t)}$ holds with probability at least $1/8$, proving the claim. To conclude the proof of Lemma 22, note that the coordinates $t \in [D]$ are independent, so by Claim 28 and a Chernoff bound, $\Pr[\neg Z_{(i,x,y)} \mid \bigwedge_{(i',x',y') \in Q} Z_{(i',x',y')}] \le e^{-\Omega(D)} = k^{-\Omega(c_0)}$; by Claim 26 the out-degree of $G$ is at most $k^{\chi^5}$, so taking $c_0$ a sufficiently large constant satisfies $ep(d+1) \le 1$, and Lemma 25 yields $\Pr[Z] > 0$.
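The parameter bookkeeping in this last step can be sanity-checked numerically. The sketch below is an illustration under assumed constants: the Chernoff exponent `c` and the constant `c0` in $D = c_0 \ln k$ are not fixed by the text; the per-coordinate success probability $1/8$ comes from Claim 28, and the out-degree bound $d = k^{\chi^5}$ from Claim 26.

```python
import math

def lll_condition_holds(k, chi, c0, c=1.0 / 64):
    """Numerical check of the symmetric Local Lemma condition e*p*(d+1) <= 1
    used to conclude Lemma 22 (a sketch; c and c0 are assumed values)."""
    D = c0 * math.log(k)
    # Chernoff: Pr[fewer than D/16 of the D coordinates succeed] <= exp(-c*D)
    p = math.exp(-c * D)              # = k ** (-c * c0)
    d = k ** (chi ** 5)               # out-degree bound of Claim 26
    return math.e * p * (d + 1) <= 1

# any c0 > (chi**5)/c makes the condition hold for large k; for example:
print(lll_condition_holds(k=10**6, chi=1.1, c0=200))   # True
```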
D Proof of Theorem 9

Let $\Delta_0 = \mathrm{diam}(X)$. Let $D = \Theta(\log n)$, $D' = \lceil \log \Delta_0 \rceil$. We will define an embedding $f : X \to \ell_p^{D \cdot D'}$ by defining for each $1 \le t \le D$ an embedding $f^{(t)} : X \to \ell_p^{D'}$ and letting $f = D^{-1/p} \bigoplus_{1 \le t \le D} f^{(t)}$. Fix $t$, $1 \le t \le D$. In what follows we define $f^{(t)}$. For all $1 \le i \le D'$ let $f_i^{(t)} : X \to \mathbb{R}_+$ and define $f^{(t)} = \bigoplus_{1 \le i \le D'} f_i^{(t)}$. Let $I = \{i \in \mathbb{N} \mid i \in [1, \log_8 \Delta_0]\}$ and for every $i \in I$ let $\Delta_i = \Delta_0 8^{-i}$. We construct for all $i \in I$ a uniformly $\Delta_i$-bounded $\eta_i$-padded probabilistic partition $\hat{P}_i$ as in Lemma 2 with parameter $\Gamma = 64$, and let $\xi_i$ be as defined in the lemma. For every $i \in I$ fix some $P_i \in \hat{P}_i$.

In the usual embedding-via-partitions scheme we obtain a lower bound for every pair $x, y \in X$ from only one "critical" scale (which is approximately $d(x, y)$). Here we use the same idea, but since the cluster in the critical scale may contain too many points, we get the contribution from two scales below the critical one, which is guaranteed to be small enough. For this reason we define a new function $\bar{\xi}$ as follows, for each $i \in I$, $P \in H$:
$$\bar{\xi}_{P,i}(x) = \begin{cases} 1 & \frac{|B(\upsilon(P_i(x)), \Gamma^3 \Delta_i)|}{|B(\upsilon(P_i(x)), 8\Delta_i)|} \ge 2 \\ \xi_{P,i}(x) & \text{otherwise} \end{cases}$$
It can be seen that the function $\bar{\xi}$ is uniform as well. We define the embedding by defining the coordinates for each $x \in X$. Define for $x \in X$, $0 < i \in I$, $\phi_i^{(t)} : X \to \mathbb{R}_+$, by
$$\phi_i^{(t)}(x) = \frac{\bar{\xi}_{P,i}(x)}{\left( \eta_{P,i}(x) \cdot \vartheta(\log |B(\upsilon(P_i(x)), 9\Gamma\Delta_i)|) \right)^{1/p}}.$$
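As a quick illustration of this scaling, the sketch below evaluates $\phi_i^{(t)}$ directly from the formula. The helpers (`partitions`, `centers`, `eta`, `xi_bar`, `ball_size`) are hypothetical stand-ins for the objects of Lemma 2, and the concrete $\vartheta$ is chosen only because its reciprocal sum converges, as the proofs below require.

```python
import math

def theta(j):
    # an illustrative choice of theta with a convergent reciprocal sum
    # (an assumption; the text only requires sum_j 1/theta(j) = O(1))
    return max(j, 1.0) * math.log(j + 2) ** 2

def phi(t, i, x, partitions, centers, eta, xi_bar, ball_size, Gamma, Delta, p):
    """Sketch of the Theorem 9 scaling factor:
        phi_i^(t)(x) = xi_bar_{P,i}(x)
          / (eta_{P,i}(x) * theta(log |B(v(P_i(x)), 9*Gamma*Delta_i)|)) ** (1/p)
    The arguments are hypothetical stand-ins; t is carried only because a
    fresh partition is drawn for each coordinate t."""
    v = centers[partitions[i](x)]               # cluster center v(P_i(x))
    b = ball_size(v, 9 * Gamma * Delta[i])      # |B(v, 9*Gamma*Delta_i)|
    return xi_bar(i, x) / (eta(i, x) * theta(math.log(b))) ** (1.0 / p)
```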
Claim 29. For any $x \in X$, $1 \le t \le D$ we have
$$\sum_{i \in I \,\mid\, \bar{\xi}_{P,i}(x)=1 \wedge \xi_{P,i}(x)=0} \frac{1}{\eta_{P,i}(x) \cdot \vartheta(\log |B(\upsilon(P_i(x)), 9\Gamma\Delta_i)|)} \le O(1).$$
Proof. Let $I' = \{i \in I \mid \bar{\xi}_{P,i}(x) = 1 \wedge \xi_{P,i}(x) = 0\}$. First note that if $\xi_{P,i}(x) = 0$ then by the properties of Lemma 2, $\eta_{P,i}(x) = 2^{-7}$. If $i, j \in I$ are such that $i \le j - 6$, then since $d(\upsilon(P_i(x)), \upsilon(P_j(x))) \le \Delta_i/2 + \Delta_j/2 \le \Delta_i$ it follows that $B(\upsilon(P_j(x)), \Gamma^3 \Delta_j) \subseteq B(\upsilon(P_j(x)), \Delta_i) \subseteq B(\upsilon(P_i(x)), 8\Delta_i)$. It follows that, when considering the indices $i \in I'$ within a fixed residue class modulo 6, $|B(\upsilon(P_i(x)), 8\Delta_i)|$ increases by a factor of 2 for each such $i \in I'$ (relative to the previous one). Hence
$$\sum_{i \in I'} \frac{1}{\eta_{P,i}(x) \cdot \vartheta(\log |B(\upsilon(P_i(x)), 9\Gamma\Delta_i)|)} \le 2^7 \sum_{i \in I'} \frac{1}{\vartheta(\log |B(\upsilon(P_i(x)), 9\Gamma\Delta_i)|)} \le 2^7 \sum_{h=0}^{5} \sum_{\substack{i \in I' \\ i \equiv h \ (\mathrm{mod}\ 6)}} \frac{1}{\vartheta(\log |B(\upsilon(P_i(x)), 8\Delta_i)|)} \le 2^7 \sum_{h=0}^{5} \sum_{i > 0} \frac{1}{\vartheta(i)} \le O(1).$$
Claim 30. For any $x \in X$, $1 \le t \le D$ we have
$$\sum_{i \in I} \phi_i^{(t)}(x)^p \le O(1).$$
Proof. Fix some $x \in X$. For any $i \in I$ let $v_i = \upsilon(P_i(x))$, and let $b_i = \log |B(x, \Delta_i)|$. Notice that since $d(v_i, x) \le \Delta_i$ it follows that $\log |B(v_i, 9\Gamma\Delta_i)| \ge \log |B(x, \Gamma\Delta_{i-1})| = b_{i-4}$. Using the minimality of $v_i$ and Claim 29 we have:
$$\sum_{i \in I} \phi_i^{(t)}(x)^p = \sum_{i \in I} \frac{\bar{\xi}_{P,i}(x)}{\eta_{P,i}(x) \cdot \vartheta(\log |B(v_i, 9\Gamma\Delta_i)|)} \le \sum_{i \in I \,\mid\, \xi_{P,i}(x)=1} \frac{1}{\eta_{P,i}(x) \cdot \vartheta(\log |B(v_i, 9\Gamma\Delta_i)|)} + \sum_{i \in I \,\mid\, \bar{\xi}_{P,i}(x)=1 \wedge \xi_{P,i}(x)=0} \frac{1}{\eta_{P,i}(x) \cdot \vartheta(\log |B(v_i, 9\Gamma\Delta_i)|)}$$
$$\le 2^7 \sum_{i \in I \,\mid\, \xi_{P,i}(x)=1} \frac{b_{i-4} - b_{i+2}}{\vartheta(b_{i-4})} + O(1) \le 2^7 \sum_{i \in I} \sum_{j=b_{i+2}}^{b_{i-4}-1} \frac{1}{\vartheta(j)} + O(1) \le 6 \cdot 2^7 \sum_{j=1}^{\infty} \frac{1}{\vartheta(j)} + O(1) = O(1).$$
Let $\{\sigma_i^{(t)}(C) \mid C \in P_i, 0 < i \in I\}$ be i.i.d. symmetric $\{0,1\}$-valued Bernoulli random variables. The embedding is defined as follows: for each $x \in X$:

• For each $0 < i \in I$, let $f_i^{(t)}(x) = \sigma_i^{(t)}(P_i(x)) \cdot \phi_i^{(t)}(x) \cdot d(x, X \setminus P_i(x))$.

The following claim was proved in [2].

Claim 31. For any $x, y \in X$ and $0 < i \in I$: $|f_i^{(t)}(x) - f_i^{(t)}(y)| \le \phi_i^{(t)}(x) \cdot d(x, y)$.

Lemma 32. There exists a universal constant $C_1 > 0$ such that for any $x, y \in X$: $\|f(x) - f(y)\|_p \le C_1 \cdot d(x, y)$.

Proof. From Claims 30 and 31 we get
$$\|f(x) - f(y)\|_p^p = D^{-1} \sum_{t=1}^{D} \sum_{i=1}^{D'} |f_i^{(t)}(x) - f_i^{(t)}(y)|^p \le D^{-1} \sum_{t=1}^{D} \sum_{i=1}^{D'} \phi_i^{(t)}(x)^p \cdot d(x, y)^p \le D^{-1} \cdot d(x, y)^p \sum_{t=1}^{D} O(1) = O(d(x, y)^p).$$
Lemma 33. There exists a universal constant $C_2 > 0$ such that for any $x, y \in X$ where $y \in B(x, r_k(x))$, with probability at least $1/8$: $\|f^{(t)}(x) - f^{(t)}(y)\|_p^p \ge \tau^{p-1} \vartheta(\log k)^{-1} \cdot (C_2 \cdot d(x, y))^p$.

Proof. Let $0 < \ell \in I$ be such that $\Delta_{\ell-4} \le d(x, y) < \Delta_{\ell-5}$.
Claim 34. $\bar{\xi}_{P,\ell}(x) + \bar{\xi}_{P,\ell}(y) \ge 1$.

Proof. The proof is a simple variation of Claim 1; it is enough to observe that $B(\upsilon(P_\ell(x)), 8\Delta_\ell) \subseteq B(\upsilon(P_\ell(y)), \Gamma^3 \Delta_\ell)$ (and vice versa), and that $B(\upsilon(P_\ell(x)), 8\Delta_\ell) \cap B(\upsilon(P_\ell(y)), 8\Delta_\ell) = \emptyset$.

Case 1: $\bar{\xi}_{P,\ell}(x) = 1$. First note that since $d(\upsilon(P_\ell(x)), x) \le \Delta_\ell$ and $10\Gamma\Delta_\ell < d(x, y)$, we have $|B(\upsilon(P_\ell(x)), 9\Gamma\Delta_\ell)| \le |B(x, d(x, y))| \le k$. As $P_\ell$ is $\eta_\ell$-padded, with probability $1/2$, $d(x, X \setminus P_\ell(x)) \ge \eta_{P,\ell}(x)\Delta_\ell$. Since $P_\ell(x) \ne P_\ell(y)$, with probability $1/4$ we have $\sigma_\ell(P_\ell(x)) = 1$ and $\sigma_\ell(P_\ell(y)) = 0$, hence
$$|f_\ell^{(t)}(x) - f_\ell^{(t)}(y)|^p \ge \frac{(\eta_{P,\ell}(x)\Delta_\ell)^p}{\eta_{P,\ell}(x) \cdot \vartheta(\log |B(\upsilon(P_\ell(x)), 9\Gamma\Delta_\ell)|)} \ge \frac{8^{-5p}\, \eta_{P,\ell}(x)^{p-1}\, d(x, y)^p}{\vartheta(\log k)}.$$

Case 2: $\bar{\xi}_{P,\ell}(y) = 1$. Then we claim that $|B(\upsilon(P_\ell(y)), 9\Gamma\Delta_\ell)| \le 2k$: otherwise, since $d(\upsilon(P_\ell(x)), \upsilon(P_\ell(y))) \le \Delta_\ell/2 + \Delta_\ell/2 + \Delta_{\ell-5} \le 2\Delta_{\ell-5}$, it would follow that $|B(\upsilon(P_\ell(x)), \Gamma^3 \Delta_\ell)| \ge |B(\upsilon(P_\ell(y)), 9\Gamma\Delta_\ell)| > 2k$; however $|B(\upsilon(P_\ell(x)), 8\Delta_\ell)| \le |B(x, d(x, y))| \le k$, therefore $\bar{\xi}_{P,\ell}(x) = 1$ and we would be in Case 1. As $P_\ell$ is $\eta_\ell$-padded, with probability $1/2$, $d(y, X \setminus P_\ell(y)) \ge \eta_{P,\ell}(y)\Delta_\ell$. Since $P_\ell(x) \ne P_\ell(y)$, with probability $1/4$ we have $\sigma_\ell(P_\ell(y)) = 1$ and $\sigma_\ell(P_\ell(x)) = 0$, hence
$$|f_\ell^{(t)}(x) - f_\ell^{(t)}(y)|^p \ge \frac{(\eta_{P,\ell}(y)\Delta_\ell)^p}{\eta_{P,\ell}(y) \cdot \vartheta(\log |B(\upsilon(P_\ell(y)), 9\Gamma\Delta_\ell)|)} \ge \frac{8^{-5p}\, \eta_{P,\ell}(y)^{p-1}\, d(x, y)^p}{\vartheta(\log 2k)}.$$

Since for all $\ell \in I$ and $x \in X$: $\eta_{P,\ell}(x) \ge \tau/2$, we conclude that with probability $1/8$,
$$\|f^{(t)}(x) - f^{(t)}(y)\|_p^p \ge |f_\ell^{(t)}(x) - f_\ell^{(t)}(y)|^p \ge 2^{-17p}\, \tau^{p-1}\, \vartheta(\log k)^{-1} \cdot d(x, y)^p.$$
Lemma 35. There exists a universal constant $C > 0$ such that w.h.p. for any pair $x, y \in X$ such that $y \in B(x, r_k(x))$:
$$\frac{C \cdot \tau^{1-1/p}}{\vartheta(\log k)^{1/p}} \cdot d(x, y) \le \|f(x) - f(y)\|_p \le d(x, y).$$

Proof. By definition
$$\|f(x) - f(y)\|_p^p = D^{-1} \sum_{1 \le t \le D} \|f^{(t)}(x) - f^{(t)}(y)\|_p^p.$$
Lemma 32 implies that after scaling by $C_1$, $\|f(x) - f(y)\|_p^p \le d(x, y)^p$. Using Lemma 33 and applying Chernoff bounds we get w.h.p. for any such pair $x, y \in X$: $\|f(x) - f(y)\|_p^p \ge C \cdot \tau^{p-1} \cdot \vartheta(\log k)^{-1} \cdot d(x, y)^p$.
To achieve worst case distortion O(log n) follow the same argument as in sub-section 5.2.
E Proof of Theorem 11

Let $D = \Theta(\log n)$. We will define an embedding $g : X \to \ell_p^D$ with scaling proximity distortion $O(\tau^{-1} \vartheta(\log k))$. We define $g$ by defining for each $1 \le t \le D$ a function $g^{(t)} : X \to \mathbb{R}_+$ and letting $g = D^{-1/p} \bigoplus_{1 \le t \le D} g^{(t)}$. Fix $t$, $1 \le t \le D$. In what follows we define $g^{(t)}$. Let $I = [\log_8 \mathrm{diam}(X)]$. For every $i \in I$ construct an $8^i$-bounded uniformly $\tau$-padded probabilistic partition $\hat{P}_i$ as guaranteed by Definition 8. For all $i \in I$ fix a partition $P_i \in \hat{P}_i$. Let $\{\sigma_i^{(t)}(C) \mid C \in P_i, 0 < i \in I\}$ be i.i.d. symmetric $\{0,1\}$-valued Bernoulli random variables. The embedding is defined as follows: for each $x \in X$, $0 < i \in I$ let $g_i^{(t)} : X \to \mathbb{R}_+$ be given by
$$g_i^{(t)}(x) = \sigma_i^{(t)}(P_i(x)) \cdot \frac{d(x, X \setminus P_i(x))}{\vartheta(i)},$$
and let $g^{(t)}(x) = \sum_{i \in I} g_i^{(t)}(x)$.
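A minimal sketch of one coordinate of $g$, in the same hypothetical style as the earlier sketches (`partitions`, `sigma`, and `theta` are stand-ins, not objects defined by the text). Note the structural difference from the Theorem 9 construction: there is no truncation at $\Delta_i$; the damping comes solely from the $1/\vartheta(i)$ weights.

```python
def g_t(x, t, X, d, partitions, sigma, theta):
    """Sketch of one coordinate of the Theorem 11 embedding:
        g^(t)(x) = sum_i sigma_i^(t)(P_i(x)) * d(x, X \\ P_i(x)) / theta(i)."""
    total = 0.0
    for i in partitions:
        C = partitions[i](x)
        # d(x, X \ P_i(x)); if a cluster covers all of X we let the term
        # vanish -- a simplification, relevant only above the top scale of I
        dist_out = min((d(x, z) for z in X if partitions[i](z) != C),
                       default=0.0)
        total += sigma[(t, i, C)] * dist_out / theta(i)
    return total
```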
Lemma 36. For any $x, y \in X$: $|g^{(t)}(x) - g^{(t)}(y)| \le d(x, y)$.

Proof. For all $i \in I$, if $P_i(x) = P_i(y)$ then by the triangle inequality $|g_i^{(t)}(x) - g_i^{(t)}(y)| \le \frac{d(x,y)}{\vartheta(i)}$; otherwise $g_i^{(t)}(x) - g_i^{(t)}(y) \le g_i^{(t)}(x) \le \frac{d(x,y)}{\vartheta(i)}$, since in that case $y \notin P_i(x)$ and hence $d(x, X \setminus P_i(x)) \le d(x, y)$. So we get in any case that
$$\sum_{0 < i \in I} \big( g_i^{(t)}(x) - g_i^{(t)}(y) \big) \le d(x, y) \sum_{0 < i \in I} \frac{1}{\vartheta(i)} \le d(x, y).$$