Diffusion Maps, Spectral Clustering and Reaction Coordinates of Dynamical Systems
Boaz Nadler Department of Mathematics Yale University
1
Analysis of Data Sets in High Dimensions In today’s world - massive amounts of data are generated, measured, collected and stored. Applications: CS, Engineering, Physics, Chemistry, Biology, Statistics, Finance, . . . BUT THE QUESTION IS: what to do with this data ? How to analyze it ? How to subsample it? ? How can we extract useful information from it ?
2
Talk Overview 1. Examples in different applications and generic data analysis tasks. 2. Diffusion Maps - a set of tools for (high-dimensional) data analysis and organization. 3. Mathematical theory 4. Application of diffusion maps to some examples. 5. Summary and future work
3
Statistics/CS - Classification/Clustering Classification: Given N points {xi } ∈ Rp , that are labelled as red/blue, find f : X → {red,blue} with good generalization performance on new (unseen) data. 1
0.5
0
−0.5
−1
−1.5
−1
−0.5
0
0.5
1
1.5
Question: How would you classify the two green points ? 4
Clustering Clustering: Data is not labelled, but we think/know it can be decomposed into a few clusters. 4
3
2
1
0
−1
−2 −3
−2
−1
0
5
1
2
3
Stochastic systems in Physics/Chemistry/Biology Consider a stochastic differential equation (SDE) in p À 1 dimensions ˙ x˙ = −∇U (x) + w x(t) ∈ Rp - configuration at time t U (x) : potential w(t) : p-dimensional Brownian motion Examples: macromolecules and proteins in water, interacting particle systems, chemical reactions, low dimensional projections of deterministic systems Show 7 Lennard-Jones clips
6
Simulation Problems Let {xi }N i=1 denote N configurations sampled from many simulations of the SDE. Questions: - Can one infer the geometry of this dataset: location of potential wells, most likely transitions between them, mean first passage times? - Design of efficient fast simulations based on a few long simulation runs. - Separation of ”slow” variables from ”fast” variables. - Design dynamical approximations in the slow variables.
7
The geometry of finite data sets p Let {xi }N i=1 be N points in R .
How do we know if the points (approximately) lie on a curve ? How do we interpolate this curve ?
1 0 −1 5
5
0 −5 −5
0
Suppose the points lie on a low dimensional manifold. How do we approximate the Laplace-Beltrami operator on the manifold ? 8
Problems in High Dimensions Many algorithms are based on all distances kxi − xj k In high dimensions, while small distances are meaningful, large distances are (almost) meaningless.
1
1
Yet typically high dimensional data has low intrinsic dimension. dimensional reduction: How can we embed data in a low dimensional space (and find the intrinsic dimensionality) ? 9
Analysis of Data Sets in High Dimensions Clustering Image Analysis Documents
Computer Science Machine Learning
Financial Data Dimensional Reduction
Physics Chemistry
Diffusion Maps
Statistics
Biology
Dynamical systems
Classification, Regression
Metastable states Slow variables
Mathematics
10
Geometry of Datasets
Classification Problem Revisited 1
0.5
0
−0.5
−1
−1.5
−1
−0.5
0
0.5
1
1.5
How to classify the green points ? Suppose we had to classify many additional points, which were all given to us at once ?
11
Classification Problem Revisited
1
0.5
0
−0.5
−1
−1.5
−1
−0.5
0
0.5
1
1.5
Intuitively: Upper point classified as red, lower green point classified as blue. Partially Labelled Data (Zummer,Nyogi,Lafon,others,...) 12
Classification Problem Revisited ε = 0.0006 1
1
0.5
0.5
0
0
−0.5
−0.5
−1
−1 −1
0
1
−1
0
1
Diffusion of colors according to local connectivity Local Connectivity between nearest neighbors is what matters Want to integrate local geometric information to a global picture Can we define a connectivity metric ?
13
Diffusion Maps p Let {xi }N denote a set of N points in R . i=1
View collection of data as a graph with N vertices and with connection strength between xi and xj given by kε (xi , xj ), where ¶ µ kx − yk2 kε (x, y) = exp − 2ε Construct a Markov chain random walk based on these weights: Mi,j = Pr{x(t + ε) = xi |x(t) = xj } = where pε (xj ) =
X
kε (xi , xj ) = D−1 L pε (xj )
kε (xi , xj ) = Djj
i
Claim: first few eigenvalues and eigenvectors of this matrix {λj , ψj } contain useful geometric information. 14
Diffusion Maps Method is more general. Kernel does not have to be Gaussian, for example: d(x, y)/(1 + d(x, y)). In fact, all we need are the following ingredients to measure global connectivity between points: 1) Assign a similarity measure between points, 2) Define a graph with the points as vertices and with these similarities as weights. 3) Perform a random walk on this graph, study its characteristics.
15
Diffusion Maps / Normalized Graph Laplacian The diffusion map at time t is defined as the non-linear embedding ¡ t ¢ t t x → Ψt (x) = λ1 ψ1 (x), λ2 ψ2 (x), ..., λk ψk (x) This construction is known in graph theory as the normalized graph Laplacian.
16
Data Analysis Applications / Motivation: - N-cuts: The second eigenvector gives an approximation to the normalized cut problem (Shi & Malik 97’). - Spectral Clustering - based on the first few eigenvectors (Weiss 99’). - Low Dimensional Data Representation: Ψ0 (x) = (ψ0 (x), ψ1 (x), . . . , ψk (x)) gives a low dimensional representation of the data (Coifman & Lafon 04’, Belkin & Niyogi 02’, others). - Kernel methods for classification and regression. (Sch¨olkopf, others) - Find metastable states in stochastic systems (Huisinga, Schutte)
17
Example: Clustering 2 0 −2 −2
0
2
−2
0
2
−2
0
2
2 0 −2
2 1 0 −1 −2
ψ1 depends only on x and not on y. 18
Diffusion Maps and the Geometry of the Data Questions concerning the diffusion maps: - Why do they work ? - Assume data randomly sampled from a density p(x) = e−U (x) . What is the connection between the eigenvectors and eigenvalues of M and the potential U ? - Can this tool be used for stochastic dynamical systems ? mathematical analysis of this algorithm?
19
Spectral Analysis of the Markov Chain The matrix Mf is a Markov matrix, adjoint to a symmetric one: Ms =
pε (x) kε (x, y) Mf (x, y) = p pε (y) pε (x)pε (y)
with the same eigenvalues. For ε large enough all points are (numerically) connected and therefore Mf has an eigenvalue λ0 = 1 with multiplicity 1, and an additional set of N − 1 non-increasing eigenvalues λj < 1. Both the right eigenvectors {φj }j≥0 and the left eigenvectors {ψj }j≥0 are a basis of RN .
20
Diffusion Distances The first right eigenvector φ0 (x) is the steady state distribution, lim Pr{y after t steps|x} = φ0 (y)
t→∞
Definition: Diffusion distance at time t, Dt2 (x0 , x1 )
= kp(y, t|x0 ) − p(y, t|x1 )k2w X = (p(y, t|x0 ) − p(y, t|x1 ))2 w(y) y
with weight w(x) = 1/φ0 (x). The diffusion distance is a measure of the dynamical proximity of the two points x0 , x1 , based on the random walk on the graph.
21
Diffusion Distances Spectral theorem: X 2 2 2 Dt (x0 , x1 ) = λ2t (ψ (x ) − ψ (x )) = kΨ (x ) − Ψ (x )k j 0 j 1 t 0 t 1 j j
Diffusion map converts the diffusion distance into Euclidian distance.
22
Diffusion Distances Since spectrum of M is in [0,1], only very few eigenfunctions are needed to describe the long time diffusion distance. 1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
64
M
←M
← M2
→
0.2
0.1
0
0
10
20
30
40
50
60
70
80
90
100
Long term diffusion has low rank captured by the first few eigenvectors of M - success of diffusion maps in dimensional reduction. 23
Dimensional Reduction by Diffusion Maps Consider a k-dimensional reduction of the evolution density pˆ(y, t|x) = φ0 (y) +
k X
αj (x)vj (y)
j=1
Optimality Theorem: The optimal k-dimensional approximation of p(x, t|y) that minimizes the mean squared norm of the approximation error min Ey {kpe (x, t|y) − pˆe (x, t|y)k2L2 (w) } where averaging is over all initial points y sampled according to the equilibrium density φ0 (y), is given by vj (x) = φj (x) and αj (y) = ψj (y)e−λj t . Question: What are the eigenfunctions that we computed ? 24
Asymptotics of the Diffusion Map To answer this question, we consider the asymptotic limit of the diffusion map as number of samples N → ∞ and in the limit ε → 0. Statistical Model: data is randomly sampled from smooth measure µ(x)dx = e−U (x) dx As N → ∞, the discrete in time and in space Markov chain converges to a Markov jump process in continuous space with isotropic transition probability density Pr{x(t + ε) = x|x(t) = y} = Mf (x, y) = where
Z pε (y) =
kε (x, y)p(x)dµ(x)
in a space with non-uniform density µ(x). 25
kε (x, y) pε (y)
Asymptotics of the Diffusion Map The forward operator Tf corresponding to this jump process is: Z ψ(x, ε) = Tf ψ = Mf (x, y)ψ(y, 0)dµ(y) Similarly
Z φ(x, ε) = Tb φ =
Mf (y, x)φ(y, 0)dµ(y)
is the backward operator. The right and left eigenvectors and eigenvalues of the matrix Mf for the finite dataset can be viewed as discrete approximations of the eigenfunctions and eigenvalues of the operators Tf , Tb .
26
The Operators Tf and Tb - Probabilistic Interpretation Let ψ(y, 0) be an initial probability density at time t = 0. Then Z (Tf φ)(x, ε) = Mf (x|y)φ(y, 0)dµ(y) is the time evolution of this density to time t = ε. Similarly, let ψ(x) be a function, then Z (Tb ψ)(y) = Mf (x|y)ψ(x)dµ(x) is the mean value of this function at time t = ε, given an initial condition δ(x − y). Operators Tf and Tb are adjoint under the measure µ, and share the same eigenvalues.
27
The limiting diffusion process As N → ∞, we can also take ε → 0 since there are enough points in a ball of radius O(ε1/2 ) near each x for local density estimation pε (x). As ε → 0 the jump process converges to a diffusion process. Define the infinitesimal generator (propagator) Tf − I Hf = lim ε→0 ε It can be shown that in the limit ε → 0, the forward time evolution of the probability density is given by the PDE ∂φ = Hf φ = ∆φ − φ (∇U · ∇U − ∆U ) ∂t Hf is a Schr¨odinger type operator of the form (∆ − V )φ, with a potential of the special form V = k∇U k2 − ∆U 28
The backward infinitesimal generator The backward time evolution is ∂ψ − = Hb ψ = ∆ψ − 2∇ψ · ∇U ∂t This result shows that the eigenvalues of Tf , Tb correspond to those of a Fokker-Planck operator with a potential 2U (x). Intimate connection between FP and Schr¨odinger operators, well known in physics (Bernstein & Brown, PRL, 84’): The trasformation p = e−U/2 ψ to the forward FPE ∂p = ∇ · (∇p + p∇U ) ∂t gives ∂ψ = ∆ψ − ψ ∂t
µ
∇U · ∇U ∆U − 4 2 29
¶
Asymptotics of the Diffusion Map ε>0
finite N × N
R.W. discrete in space
N 0
operator
R.W. continuous in space
N →∞
Tf
discrete in time
ε→0
infinitesimal
diffusion process
N →∞
generator Hf
continuous in time & space
30
Intermediate Summary Given N data points, we defined a graph and a random walk on it. In the limit N → ∞, ε → 0, we obtain a diffusion process with an isotropic kernel but in a space with non-uniform density. If data is sampled from µ(x) = e−U (x) , then the discrete eigenvectors are finite approximations to the eigenfunctions of a Fokker-Planck equation with a potential 2U (x). The first few eigenvectors correspond to the long time asymptotics of a stochastic Langevin process with a potential 2U (x). These have been studied extensively both by physicists and mathematicians. (Matkowski & Schuss (1980), many others)
31
Example - Clustering Suppose U (x) has two symmetric wells (e.g., two clusters), then so does 2U (x). Let φL and φR denote the zeroth forward eigenfunctions in each of the two wells, µ ¶ 00 2 2U (xL )(x − xL ) φL (x) ≈ exp − 2 Then, the first (non-trivial) backward eigenfunction is φL (x) − φR (x) ψ1 (x) = φL (x) + φR (x) This function almost constant in each of the wells with sharp transition between them → good coordinate for clustering. (Unsupervised) cluster definition according to sign(ψ1 ) achieves optimal Bayes error in the limit N → ∞, ε → 0. 32
Example - Clustering 3500 data points
Potential U(x,0)
1.5 5 1 4 3
0
2
−0.5
1
−1 −1.5 −2
0
2 x
4
6
0
Backward Eigenvector φ1 vs. x
2
4
Three Eigenvectors φ1,φ2,φ3
1.5 2
1
1 0.5 φ3
y
0.5
0
−1
−0.5
−2 2
−1 −1.5 −2
0
0 2
−2 0
2
4
φ2
6
33
−4
0 −2
φ1
Stochastic Dynamical Systems The mathematical analysis we presented gives a better understanding of this algorithm. Additional Questions: 1) Can we use this tool to analyze data from high dimensional stochastic dynamical systems ? 2) Can we design efficient simulations based on these results ?
34
Stochastic Systems Consider a stochastic differential equation in n À 1 dimensions ˙ x˙ = −∇U (x) + w where x(t) ∈ Rn - configuration at time t U (x) : potential w(t) : n-dimensional Brownian motion
35
Time Evolution Let p(x, t|y, t0 ) denote the forward transition probability density (t > t0 ). It satisfies the Fokker-Planck (FP) equation ∂p = ∇ · [∇p + p∇U ] ∂t The steady state follows the Boltzmann distribution p(x) = e−U (x) /Z,
Z − normalization factor
36
Time Evolution Approach to steady state governed by the eigenfunctions with the smallest eigenvalues of the FP operator. p(x, t) =
X
aj e−λj t φj (x)
j
When n À 1 the FP equation cannot be solved numerically or analytically. However, simulations of the SDE are ”easily performed” Given a lot of simulated data {xi }N i=1 ,N À 1, possibly from many different trajectories, can we approximate the first few eigenfunctions φj (x) ?
37
Weighted Diffusion Maps Original kernel gives eigenfunctions corresponding to 2U (x). If instead we define a new weighted kernel kε (x, y) k˜ε (x, y) = p pε (x)pε (y) and consider the normalized graph Laplacian (diffusion map) based on this kernel, then as N → ∞, ε → 0, ∂p ˜ f p = ∆p − ∇p · ∇U =H ∂t This equation is the same as the backward FP equation of the SDE ˙ x˙ = −∇U + w
38
Weighted Diffusion Maps Summary: Let xi be N configurations from the original SDE. As N → ∞, and ε → 0, the eigenvalues and eigenfunctions of the weighted graph Laplacian from this finite dataset converge to the eigenvalues and eigenfunctions of the corresponding backward Fokker-Planck equation. In particular, the first few eigenfunctions give a parametrization of the slow variables, since after a long enough time, the contribution from all other eigenfunctions with smaller eigenvalues is exponentially small. This (asymptotic) analysis suggests the use of the diffusion maps to empirically find the slow variables from large datasets of simulated data.
39
Analytical Examples Example #1: 1-D harmonic potential / Orenstein-Uhlenbeck process: U (x) = x2 /2τ In this case analytical calculations are possible: Z 2 2 1 1 √ pε (x) = e−(x−y) /2ε e−U (y) dy = p e−x /2(τ +ε) 2πε 2π(τ + ε) Eigenvalues and eigenfunctions of Tf are λk = (τ /(τ + ε))k ,
2
φk = pk (x)e−x
/2(τ +ε)
where pk is a polynomial of degree k. As ε → 0, (λk − 1)/ε → k, while pk → Hk , the Hermite polynomials (eigenfunctions of Schrodinger’s equation with harmonic potential).
40
Analytical Examples Example #2: 2-D harmonic potential: U (x) = x21 /2τ1 + x22 /2τ2 , with τ1 À τ2 . In this case e−U (x) = e−U1 (x1 ) e−U2 (x2 ) and similarly k(x, y) = k1 (x1 , y1 )k2 (x2 , y2 ). Therefore, general eigenvalue/eigenfunction structure is λi,j = λi1 λj2
φi,j (x) = φ1,i (x1 )φ2,j (x2 )
Since τ1 À τ2 , first few non-trivial eigenfunctions are φ1,0 , φ2,0 which give a parametrization of the slow variable x1 . The same analysis extends to the general n-dimensional harmonic potential.
41
Numerical Example - 2D potential Consider U (x) = x2 /2τ1 + y 2 /2τ2 , with τ1 = 1, τ2 = 1/25. φ1 (x, y) ≈ x, φ2 (x, y) ≈ x2 + c as predicted by theory. 2−D Diffusion map φ ,φ for ε=0.25
3500 data points
1 2
1 2 0 1 −1 0 −2 −1 −3
−2 −2
0
−4 −4
2
−2
φ1 vs. x1
0
2
4
2
4
φ2 vs. x1
3
1
2
0
1 −1 0 −2 −1 −3
−2 −3 −4
−2
0
2
−4 −4
4
42
−2
0
Semi-Analytical Example 3.5
3
2.5
2
1.5
Example #3: 2-D double well potential.
1
0.5
0 −1
U (x, y) = x4 /8 − x3 + 2x2 + y 2 /2τ2 3500 data points
0
1
2
3
4
5
τ2 = 1/10 ¿ 1 Potential U(x,0)
1.5 5 1 4 3
0
2
−0.5
1
−1 −1.5 −2
0
2 x
4
6
0
Backward Eigenvector φ1 vs. x
2
4
Three Eigenvectors φ1,φ2,φ3
1.5 2
1
1 0.5 φ3
y
0.5
0
−1
−0.5
−2 2
−1 −1.5 −2
0
0 2
−2 0
2
4
φ2
6
43
−4
0 −2
φ
1
Double Well Potential For a double well potential, approximations to the first and second forward eigenfunctions are φ0 ≈ φL + φR
φ1 ≈ φL − φR
where φL , φR are the eigenfunctions of the two single wells. Therefore, the second backward eigenfunction is ψb,1
φL − φR ≈ φL + φR
and has ”roughly” a shape of an arctan function. It gives a parametrization of the most likely path between the two wells, e.g. the slow variable.
44
Double Well Potential with 2 connecting paths µ U (x, y) =
4
2
x y 3 2 − x + 2x + + 6 exp(−2(x − 2)2 − 10y 2 ) 8 2τ2 x4/8 − x3 + 2 x2 +...+ 6 exp(−2 ((x−2)2+5 y2))
10 8 6 4 2 0 1
5 4
0.5 3 0
2 1
−0.5 y
0 −1
45
−1
x
¶
Double Well Potential with 2 connecting paths 2
1.5 0.1
1
0
1
−0.1 0.1
0 0.5
−1 −2 −5
0 0
5
10
0
0
5
10
−0.1 −0.05
0
0.05
N=120,000 points subsampled to ∼2000, with δ-covering, δ = 0.05 Note that λ1 = 1 − 0.00001 Diffusion map clearly shows the two connecting paths between the two wells.
46
3-well potential 2
U (x, y) = 3e−x
−(y−1/3)2
−3e−x
2
−(y−5/3)2
2
−5e−(x−1)
−y 2
2
−5e−(x+1)
−y 2
2 deep wells at (-1,0), (1,0) which are reactant and product, and a shallow one at (0,5/3) [Schulten,others] 2
2
2
2
3 exp(−x − (y−1/3) )−...−5 exp(−(x+1) − y )
0
−1
−2
−3
−4 3
3 2
2 1
1 0
0 −1
−1 −2
−2 −3
y
47
−3
x
3-well potential (kT = 1) 4 3
0.1
2 1
0
0 −1
−0.1 0.05
−2
0.2
0 −3 −4
−2
0
2
−0.05
4
0
2-D diffusion map clearly shows the 3 wells and the most common paths between them.
48
7 Lennard-Jones Spheres Consider a system of 7 spheres in 2-D interacting with each other with a potential ¸ ·³ ´ ³ ´ 1 6 σ σ 2− ULJ (r) = 4α r r 7
6
5
4
3
2
1
0
−1 0.5
1
1.5
2
49
2.5
3
7 LJ spheres Dynamics of the system is according to x˙ = −∇U (x) +
√
˙ 2w
where U is the sum of all pairwise interactions X U (x1 , . . . , x7 ) = ULJ (kxi − xj k) i≤j
Each configuration is in 14 dimensions. Questions: What are the meta-stable states (clustering). How do you move from one to the other ? What is the mean transition time between these states ?
50
7 LJ spheres In general, solving a Fokker-Planck equation in 14 dimensions is not feasible by conventional methods: Number of grid points ≈ (1/h)d Monte-Carlo/Brownian Dynamics simulations are easily done. Can get millions of data-points. Some meta-stable states are easily observed.
51
LJ Spheres C
C
0
3
2
1.5 1
1
0.5 0
0
−0.5 −1
−1
−1.5 −2
−2
−2
−1
0
1
2
0
1
2
C1
3
4
5
C2
1.5
1.5
1
1
0.5
0.5
0
0
−0.5
−0.5
−1
−1
−1.5
−1.5 0
1
2
3
4
0
52
1
2
3
4
Agent Model p (= 5000) agents buying or selling stocks, denoted xi . Each xi ∈ [−1, 1], 1 means sell, -1 means buy. x(t + dt) = x(t) − γx(t)dt + I(t) where I(t) - market information + influence from g other investors. If other investors buy agent is inclined to buy as well. Buy and sell rates are ν ± = ν0± + gR± with jump sizes ε± .
53
Simulation Results If g is ”small”, approach to very stable equilibrium distribution. As g gets larger, there is a metastable equilibrium and a collapse after some random time. 0.09
0.08
0.07
0.06
0.05
0.04
0.03
0.02
0.01
0 −1
−0.8
−0.6
−0.4
−0.2
0
54
0.2
0.4
0.6
0.8
1
Diffusion Map on Agent Model 1
0.8
0.8
0.6
0.6
0.4
0.4
0.2
0.2
φ2
1
0
0
0.2
0.4
φ1
0.6
0.8
0
1
0
2
4
6
8
10
The right plot shows the eigenvalues which quickly decrease to zero, asserting that only a few (slow) coordinates are needed to describe the long time evolution of the system. The left plot shows a representation of the data-points in the first two diffusion coordinates, with points color-coded by density. The top of the parabola is the (high density) metastable equilibrium, while the left branch is the collapse. 55
Efficient Simulation Suppose aim is to compute mean time till collapse. The straightforward scheme is to simulate many runs initialized at the metastable state and run them till they collapse. Inefficient: Spend most of the time near the equilibrium. Each run is very long and need many of them to get good accuracy. In the exponential regime of high barriers need O(1/ε2 ) simulations to get relative accuracy of O(ε). Diffusion maps give us a low dimensional representation of the system. What we would like is to initialize simulations at different places along this low dimensional representation and by means of multiple appropriately initialized short simulations: find the empirical dynamics on these reaction coordinates 56
Simulation Scheme
Xinitial
Xfinal(∆ t)
Lifting
Restriction
φinitial
φfinal(∆ t)
57
Restriction and Lifting RESTRICTION - a many-one mapping from a high-dimensional description (such as a collection of particles in Monte Carlo simulations) to a low-dimensional description - the first eigenvector of the diffusion map in our case. LIFTING - a one-many mapping from low- to high-dimensional descriptions. We do the short(!) step-by-step detailed simulation in the high-dimensional description. We do the macroscopic tasks in the low-dimensional description. Basic assumption is smoothness of the dynamics (φ(t + dt) is ”close” to φ(t)). 58
Restriction Scheme The eigenfunctions ψj satisfy the relation Z Mb (x, y)ψj (y)dµ(y) = λj ψj (x) Γ
for all x ∈ Γ = {xi }N i=1 For new data point x use the same formula, known as Nystrom extension Z 1 ψˆj = Mb (x, y)ψj (y)dµ(y) λj Γ Only first few eigenfunctions with λj close to 1 can be numerically extended.
59
Agent Model 1
140 120
0.8
100 0.6
80
0.4
60 40
0.2 0 −0.5
20 0
0.5
1
0 0.2
1.5
0.4
0.6
0.8
Left: Out of sample extension of φ1 and φ2 . Right: Empirical transition probabilities of φ1 for ∆t = 0.75. rightmost distribution is from initial value of φ = 0.9. An estimate of the mean exit time can be computed from these transition probabilities. 60
1
Laplace Beltrami Suppose data is on a manifold. Want to compute the eigenfunctions of the Laplace Beltrami (Heat) operator on the manifold. If data is uniformly sampled, U = 0 and the normalized graph Laplacian gives a discrete approximation. What if data is non-uniformly sampled ? A different normalization of the kernel, k(x, y) kLB (x, y) = pε (x)pε (y) gives in the asymptotic limit the eigenfunctions of the Laplace-Beltrami operator on the manifold.
61
Example: Points on a Curve Interpolation
Embedding 2 1
1
1
0
0
0
−1 5
−1
−1 5
5
0 −5 −5
0
−2 −2
5
0 0
2
−5 −5
0
example courtesy of Stephane Lafon
62
Summary and Future Work • Looking at properties of random walks on large datasets is a good idea (not ours !). • Mathematical modeling and analysis gives a new insight into this algorithm: identifies it as a discrete approximation of a diffusion process. • A family of diffusion maps with different kernels k(x, y) kα (x, y) = α pε (x)pα ε (y) and different infinitesimal generators, Hb φ = ∆φ − 2(1 − α)∇φ · ∇U α = 0, 1/2, 1 are interesting cases. 63
Summary and Future Work • Diffusion Maps define a connectivity metric. • Local information is ”integrated” into a global description. • Many applications: dimensional reduction, clustering, geometry of finite datasets, analysis of dynamical systems. Collaborators: Ronald Coifman, Stephane Lafon, Mauro Maggioni, Ann Lee (Yale). Yannis Kevrekidis, Ligang Chen (Princeton). http://pantheon.yale.edu/∼bn29
64