Jensen-Bregman Voronoi diagrams and centroidal tessellations

Frank Nielsen
www.informationgeometry.org

École Polytechnique, LIX, France
Sony Computer Science Laboratories, FRL, Japan
(joint work with Richard Nock)

International Workshop on Voronoi Diagrams (ISVD 2010) June 28, 2010 (Mon.)

Last updated, June 23rd 2010


Voronoi diagrams
Fundamental combinatorial structure for proximity location:

Dual Delaunay triangulation → Extend Voronoi diagrams to Jensen-Bregman divergences.


Bregman divergences
$B_F(p, q) = F(p) - F(q) - \langle p - q, \nabla F(q)\rangle$

[Figure: geometric interpretation of $B_F(p, q)$ as the vertical distance at $p$ between the graph of $F$ and the hyperplane $H_q$ tangent to $F$ at $q$.]

Kullback-Leibler ($F(x) = x \log x$): $\mathrm{KL}(p, q) = \sum_{i=1}^d p^{(i)} \log \frac{p^{(i)}}{q^{(i)}}$
Squared Euclidean $L_2^2$ ($F(x) = x^2$): $L_2^2(p, q) = \sum_{i=1}^d (p^{(i)} - q^{(i)})^2 = \|p - q\|^2$

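As an illustration (not part of the talk), a minimal NumPy sketch of the Bregman divergence and these two instances, assuming separable generators:

```python
import numpy as np

def bregman(F, gradF, p, q):
    """B_F(p, q) = F(p) - F(q) - <p - q, grad F(q)>."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return F(p) - F(q) - np.dot(p - q, gradF(q))

# F(x) = sum_i x_i log x_i  (negative Shannon entropy)
F_shannon = lambda x: np.sum(x * np.log(x))
grad_shannon = lambda x: np.log(x) + 1.0

# F(x) = sum_i x_i^2  (squared Euclidean generator)
F_sq = lambda x: np.sum(x ** 2)
grad_sq = lambda x: 2.0 * x

p, q = np.array([0.2, 0.3, 0.5]), np.array([0.4, 0.4, 0.2])
print(bregman(F_shannon, grad_shannon, p, q))  # equals KL(p, q) since p and q are normalized
print(bregman(F_sq, grad_sq, p, q))            # equals ||p - q||^2
```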

Symmetrizing Bregman divergences
Jeffreys-Bregman divergences:
$S_F(p; q) = \frac{B_F(p, q) + B_F(q, p)}{2} = \frac{1}{2}\langle p - q, \nabla F(p) - \nabla F(q)\rangle$
Jensen-Bregman divergences (diversity index):
$J_F(p; q) = \frac{B_F\!\left(p, \frac{p+q}{2}\right) + B_F\!\left(q, \frac{p+q}{2}\right)}{2} = \frac{F(p) + F(q)}{2} - F\!\left(\frac{p+q}{2}\right) = BR_F(p, q)$

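A matching sketch (again illustrative) of the Jensen-Bregman (Burbea-Rao) divergence, which only needs the generator $F$ and no gradient:

```python
import numpy as np

def jensen_bregman(F, p, q):
    """J_F(p, q) = (F(p) + F(q)) / 2 - F((p + q) / 2)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return 0.5 * (F(p) + F(q)) - F(0.5 * (p + q))

F_sq = lambda x: np.sum(x ** 2)                 # squared Euclidean generator
p, q = np.array([0.2, 0.3, 0.5]), np.array([0.4, 0.4, 0.2])
print(jensen_bregman(F_sq, p, q))               # equals ||p - q||^2 / 4 (see the Mahalanobis slide)
```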

Jensen-Bregman divergences: Burbea-Rao divergences
Based on Jensen's inequality for a convex function $F$:
$d(x, p) = \frac{F(x) + F(p)}{2} - F\!\left(\frac{x+p}{2}\right) = BR_F(x, p) \ge 0,$
with equality iff $x = p$ for a strictly convex function $F(\cdot)$.
Coordinate-wise, for a separable generator: $BR_F(p, q) = \sum_{i=1}^d BR_F(p^{(i)}, q^{(i)}).$

Includes the special case of the Jensen-Shannon divergence:
$JS(p, q) = H\!\left(\frac{p+q}{2}\right) - \frac{H(p) + H(q)}{2}$

with $F(x) = -H(x)$, the negative Shannon entropy, $H(x) = -x \log x$.
→ Generators are convex and entropies are concave (negative generators).
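A short usage check, assuming the `jensen_bregman` helper from the sketch above, that the generator form agrees with the entropy form of the Jensen-Shannon divergence:

```python
import numpy as np

H = lambda x: -np.sum(x * np.log(x))            # Shannon entropy
F = lambda x: -H(x)                             # convex generator F = -H

p, q = np.array([0.2, 0.3, 0.5]), np.array([0.4, 0.4, 0.2])
js_generator = jensen_bregman(F, p, q)                        # J_F(p, q) with F = -H
js_entropies = H(0.5 * (p + q)) - 0.5 * (H(p) + H(q))         # H((p+q)/2) - (H(p)+H(q))/2
print(np.isclose(js_generator, js_entropies))                 # True
```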

Visualizing Burbea-Rao divergences

[Figure: the graph of $F$ with the chord from $(p, F(p))$ to $(q, F(q))$. $BR_F(p, q)$ is the vertical gap between the chord midpoint $\left(\frac{p+q}{2}, \frac{F(p)+F(q)}{2}\right)$ and the graph point $\left(\frac{p+q}{2}, F\!\left(\frac{p+q}{2}\right)\right)$.]

Burbea-Rao divergences: Squared Mahalanobis

For the quadratic generator $F(x) = \langle Qx, x\rangle$ (with $Q$ positive definite):
$BR_F(p, q) = \frac{F(p) + F(q)}{2} - F\!\left(\frac{p+q}{2}\right) = \frac{2\langle Qp, p\rangle + 2\langle Qq, q\rangle - \langle Q(p+q), p+q\rangle}{4} = \frac{1}{4}\left(\langle Qp, p\rangle + \langle Qq, q\rangle - 2\langle Qp, q\rangle\right) = \frac{1}{4}\langle Q(p-q), p-q\rangle = \frac{1}{4}\|p - q\|_Q^2.$

(Not a metric: the square root of the Jensen-Shannon divergence is a metric, but the square roots of Burbea-Rao divergences are not metrics in general.) For $Q = I$, we get the squared Euclidean distance.

→ Ordinary Voronoi diagrams are a special case of Jensen-Bregman Voronoi diagrams.

Skew Burbea-Rao divergences
$BR_F^{(\alpha)}: \mathcal{X} \times \mathcal{X} \to \mathbb{R}^+$
$BR_F^{(\alpha)}(p, q) = \alpha F(p) + (1-\alpha) F(q) - F(\alpha p + (1-\alpha) q)$
$BR_F^{(\alpha)}(p, q) = BR_F^{(1-\alpha)}(q, p)$

Skew symmetrization of Bregman divergences:
$\alpha B_F(p, \alpha p + (1-\alpha)q) + (1-\alpha) B_F(q, \alpha p + (1-\alpha)q) = BR_F^{(\alpha)}(p, q)$
= skew Jensen-Bregman divergences.

Bregman as asymptotic skewed Burbea-Rao
$B_F(p, q) = \lim_{\alpha \to 0} \frac{1}{\alpha} BR_F^{(\alpha)}(p, q), \qquad B_F(q, p) = \lim_{\alpha \to 1} \frac{1}{1-\alpha} BR_F^{(\alpha)}(p, q)$
Proof: first-order Taylor expansion for $\alpha \simeq 1$:
$F(\alpha p + (1-\alpha)q) = F(p + (1-\alpha)(q - p)) \simeq_{\alpha \to 1} F(p) + (1-\alpha)\langle q - p, \nabla F(p)\rangle,$
so that
$BR_F^{(\alpha)}(p, q) = \alpha F(p) + (1-\alpha)F(q) - F(\alpha p + (1-\alpha)q) \simeq_{\alpha \to 1} (1-\alpha)\big(F(q) - F(p) - \langle q - p, \nabla F(p)\rangle\big) = (1-\alpha)\, B_F(q, p).$
For $0 < \alpha < 1$, swap the arguments by setting $\alpha \to 1-\alpha$: $BR_F^{(\alpha)}(p, q) = BR_F^{(1-\alpha)}(q, p)$.
→ Extend to arbitrary $\alpha \in \mathbb{R}$ by dividing by $\alpha(1-\alpha)$:
$BR'^{(\alpha)}_F(p, q) = \frac{1}{\alpha(1-\alpha)} BR_F^{(\alpha)}(p, q) \quad \text{or} \quad BR'^{(\alpha')}_F(p, q) = \frac{4}{1-\alpha'^2}\, BR_F^{\left(\frac{1-\alpha'}{2}\right)}(p, q) \quad \left(\text{with } \alpha = \frac{1-\alpha'}{2}\right).$
→ Bregman Voronoi diagrams are special cases of (scaled) skew Jensen-Bregman Voronoi diagrams.
Bregman Voronoi Diagrams, Discrete and Computational Geometry (Springer), 10.1007/s00454-010-9256-1, 2010.
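A small numerical sanity check of these limits for the Shannon generator (a sketch, not from the slides):

```python
import numpy as np

F = lambda x: np.sum(x * np.log(x))
gradF = lambda x: np.log(x) + 1.0
bregman = lambda p, q: F(p) - F(q) - np.dot(p - q, gradF(q))
skew_br = lambda a, p, q: a * F(p) + (1 - a) * F(q) - F(a * p + (1 - a) * q)

p, q = np.array([0.2, 0.3, 0.5]), np.array([0.4, 0.4, 0.2])
for a in [1e-2, 1e-4, 1e-6]:
    print(skew_br(a, p, q) / a, "->", bregman(p, q))           # converges to B_F(p, q)
for a in [1 - 1e-2, 1 - 1e-4, 1 - 1e-6]:
    print(skew_br(a, p, q) / (1 - a), "->", bregman(q, p))     # converges to B_F(q, p)
```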

Non-homogeneous distances
Jensen-Bregman divergences are usually not homogeneous ($D(kp, kq) = k^\lambda D(p, q)$), except for:
Burg entropy ($\lambda = 0$), $F(x) = -\log x$: $J_F(p, q) = \log\frac{p+q}{2\sqrt{pq}}$ (the logarithm of the ratio of the arithmetic mean over the geometric mean)
Shannon entropy ($\lambda = 1$), $F(x) = x\log x$: $J_F(p, q) = \frac{1}{2}\left(p\log\frac{2p}{p+q} + q\log\frac{2q}{p+q}\right)$
Squared entropy ($\lambda = 2$), $F(x) = x^2$: $J_F(p, q) = \frac{1}{4}(p - q)^2$
For homogeneous distances, $D(p, q) = q^\lambda D\!\left(\frac{p}{q}, 1\right)$.

Voronoi diagram as a minimization diagram
Sites (generators) and cells: $V(p_i) = \{p \mid J_F(p, p_i) < J_F(p, p_j)\ \forall j \ne i\}$.
Anchored distance function:
$D_i(x) = J_F(x, p_i) = \frac{F(p_i) + F(x)}{2} - F\!\left(\frac{p_i + x}{2}\right) \equiv D_i'(x) = \frac{1}{2}F(p_i) - F\!\left(\frac{p_i + x}{2}\right).$
[Figures: lower envelope of the Shannon $D_i(x)$; lower envelope of the Shannon $D_i'(x)$.]
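A brute-force sketch (not the talk's implementation) that rasterizes such a minimization diagram for the Shannon generator, labelling each grid point with its nearest site under $J_F$:

```python
import numpy as np

# Separable negative Shannon entropy, summed over the last axis so the helper
# broadcasts over a whole grid of query points at once.
F = lambda x: np.sum(x * np.log(x), axis=-1)

def jb_shannon(x, p):
    """Anchored distance D(x) = J_F(x, p) = (F(x) + F(p)) / 2 - F((x + p) / 2)."""
    return 0.5 * (F(x) + F(p)) - F(0.5 * (x + p))

# A few 2D sites with positive coordinates (the domain of x log x).
sites = np.array([[0.2, 0.7], [0.5, 0.3], [0.8, 0.8]])

# Rasterize the minimization diagram: label each grid point by argmin_i D_i(x).
u = np.linspace(0.05, 1.0, 200)
X = np.stack(np.meshgrid(u, u), axis=-1)                   # (200, 200, 2) grid of points
D = np.stack([jb_shannon(X, s) for s in sites], axis=-1)   # (200, 200, 3) anchored distances
labels = np.argmin(D, axis=-1)                             # Jensen-Shannon Voronoi cells
```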

3D Jensen-Burg envelope

→ Vertical projection yields 2D Jensen-Burg Voronoi diagram.

Jensen-Bregman Voronoi
Jensen-Bregman divergences are not necessarily convex. For example, consider $F(x) = x^3$ on $\mathbb{R}^+$ (with $F''(x) = 6x$). We have $D_p''(x) = 3\left(x - \frac{p + x}{4}\right)$, which is non-negative only for $x \ge \frac{p}{3}$.
However, the lower envelope has a special structure: the minimization diagram $\min_i D_i(x)$ is equivalent to the minimization diagram of the functions
$D_i(x) \equiv D_i'(x) = \frac{1}{2}F(p_i) - F\!\left(\frac{p_i + x}{2}\right).$
($D'$ does not denote a derivative; simply $D_i(x) = D_i'(x) + \frac{F(x)}{2}$.)

Jensen-Bregman bisector
Bisector of $(p, q)$ (after removing the common $\frac{F(x)}{2}$ terms):
$\frac{F(p)}{2} - F\!\left(\frac{x+p}{2}\right) = \frac{F(q)}{2} - F\!\left(\frac{x+q}{2}\right),$
that is,
$\underbrace{F\!\left(\frac{x+q}{2}\right)}_{\text{convex}} \underbrace{-\, F\!\left(\frac{x+p}{2}\right)}_{\text{concave}} + \underbrace{\frac{F(p)}{2} - \frac{F(q)}{2}}_{\text{constant}} = 0.$
Interpreted as the zero set of the sum of a convex function $F\!\left(\frac{x+q}{2}\right)$ and a concave function $-F\!\left(\frac{x+p}{2}\right) + \frac{F(p) - F(q)}{2}$.
In 2D, the iso-distance curves intersect in at most two points (proof by contradiction; requires strict convexity).

Jensen-Bregman Voronoi
In 2D, bisectors are pseudo-lines (iso-distance contours are pseudo-circles).
The complexity of the Jensen-Bregman Voronoi diagram is linear (planar graph).

Jensen-Shannon Voronoi diagram

2D Jensen-Burg Voronoi


Centroidal Voronoi Tessellations
CVT using Lloyd or L-BFGS algorithms (limited-memory Broyden-Fletcher-Goldfarb-Shanno).


Applications of CVTs: stippling
Also called pointillism in computer graphics (non-photorealistic rendering).


Jensen-Bregman centroids
Consider a finite point set $\{p_1, \dots, p_n\}$ with weights $w_i$:
$c^* = \arg\min_c \sum_{i=1}^n w_i\, J_F(p_i, c) = \arg\min_c L(c)$
The minimization can be decomposed as $L(c) = L_{\text{convex}}(c) + L_{\text{concave}}(c)$. (Under mild assumptions, any function can be decomposed as a sum of a convex plus a concave function.) For the Jensen-Bregman centroid, this decomposition is given explicitly (up to a constant, for normalized weights $\sum_i w_i = 1$) by:
$L_{\text{convex}}(c) = \frac{F(c)}{2}, \qquad L_{\text{concave}}(c) = -\sum_{i=1}^n w_i\, F\!\left(\frac{p_i + c}{2}\right).$

Jensen-Bregman centroids (and barycenters)
Use the framework of CCCP (ConCave-Convex Procedure):
$\nabla F(x) = \sum_{i=1}^n w_i \nabla F\!\left(\frac{x + p_i}{2}\right) \quad \Longleftrightarrow \quad x = \nabla F^{-1}\!\left(\sum_{i=1}^n w_i \nabla F\!\left(\frac{x + p_i}{2}\right)\right)$
Start from an arbitrary $c_0$ and iterate:
$c_{t+1} = \nabla F^{-1}\!\left(\sum_{i=1}^n w_i \nabla F\!\left(\frac{c_t + p_i}{2}\right)\right).$
Guaranteed convergence to a (local) minimum. (Gradient-descent methods require fixing learning rates.)
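A minimal sketch of this CCCP fixed-point iteration for the Shannon generator $F(x) = \sum_i x^{(i)}\log x^{(i)}$, where $\nabla F(x) = \log x + 1$ and $\nabla F^{-1}(y) = \exp(y - 1)$ (illustrative code, not from the paper):

```python
import numpy as np

gradF = lambda x: np.log(x) + 1.0          # F(x) = sum x log x (negative Shannon entropy)
gradF_inv = lambda y: np.exp(y - 1.0)      # inverse gradient

def jensen_bregman_centroid(points, weights, iters=100):
    """CCCP fixed point: c <- gradF^{-1}(sum_i w_i gradF((c + p_i) / 2))."""
    points = np.asarray(points, float)
    weights = np.asarray(weights, float)
    weights = weights / weights.sum()                     # normalize the weights
    c = np.average(points, axis=0, weights=weights)       # initialize at the arithmetic mean
    for _ in range(iters):
        c = gradF_inv(np.sum(weights[:, None] * gradF(0.5 * (c + points)), axis=0))
    return c

pts = np.array([[0.2, 0.7], [0.5, 0.3], [0.8, 0.8]])
print(jensen_bregman_centroid(pts, np.ones(len(pts))))
```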

Jensen-Bregman centroids
Consider a dense region $\mathcal{X}$ with an underlying density function $\rho(\cdot)$:
$\arg\min_{c \in \mathcal{X}} \int_{p \in \mathcal{X}} \rho(p)\, BR_F(p, c)\, \mathrm{d}p$
Integral CCCP: starting from an initial $c_0$, update as follows:
$c_{t+1} = \nabla F^{-1}\!\left(\frac{\int_{p \in \mathcal{X}} \rho(p)\, \nabla F\!\left(\frac{p + c_t}{2}\right) \mathrm{d}p}{\int_{p \in \mathcal{X}} \rho(p)\, \mathrm{d}p}\right)$
→ In general, difficult to compute in closed form (except for the squared Mahalanobis distance). Approximate by discretizing.
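A discretized Lloyd-style sketch along these lines, assuming a uniform density and reusing the hypothetical `jb_shannon` and `jensen_bregman_centroid` helpers from the earlier sketches:

```python
import numpy as np

def jb_cvt(sites, grid, iters=20):
    """Discretized Lloyd-style loop for a Jensen-Bregman centroidal Voronoi
    tessellation under a uniform density (illustrative sketch)."""
    sites = np.array(sites, float)
    for _ in range(iters):
        # Assignment step: label each grid sample with its closest site under J_F.
        D = np.stack([jb_shannon(grid, s) for s in sites], axis=-1)
        labels = np.argmin(D, axis=-1)
        # Relocation step: move each site to the Jensen-Bregman centroid of its cell.
        for i in range(len(sites)):
            cell = grid[labels == i]
            if len(cell):
                sites[i] = jensen_bregman_centroid(cell, np.ones(len(cell)))
    return sites

# Example: cvt_sites = jb_cvt(sites, X.reshape(-1, 2)) with the sites and grid X from above.
```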

Jensen-Burg centroidal Voronoi tessellation
We discretize the cells.


Jensen-Shannon centroidal Voronoi tessellation
We discretize the cells.


Application to statistical Voronoi diagrams
For probability densities, the Bhattacharyya similarity coefficient (and the associated non-metric symmetric distance):
$C(p, q) = \int \sqrt{p(x)\, q(x)}\, \mathrm{d}x, \qquad 0 < C(p, q) \le 1, \qquad B(p, q) = -\ln C(p, q).$
(The coefficient is always strictly positive.)
Hellinger metric:
$H(p, q) = \sqrt{\frac{1}{2}\int\left(\sqrt{p(x)} - \sqrt{q(x)}\right)^2 \mathrm{d}x}, \qquad 0 \le H(p, q) \le 1,$
$H(p, q) = \sqrt{\frac{1}{2}\left(\int p(x)\,\mathrm{d}x + \int q(x)\,\mathrm{d}x - 2\int \sqrt{p(x)\, q(x)}\,\mathrm{d}x\right)} = \sqrt{1 - C(p, q)}.$
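For discrete distributions these quantities are straightforward to compute; a small sketch checking the identity $H = \sqrt{1 - C}$:

```python
import numpy as np

def bhattacharyya_coefficient(p, q):
    """C(p, q) = sum_x sqrt(p(x) q(x)) for discrete distributions."""
    return np.sum(np.sqrt(np.asarray(p) * np.asarray(q)))

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])

C = bhattacharyya_coefficient(p, q)
B = -np.log(C)                                                  # Bhattacharyya distance
H = np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))       # Hellinger distance
print(np.isclose(H, np.sqrt(1.0 - C)))                          # True: H = sqrt(1 - C)
```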

Chernoff coefficients / α-divergences
Skew Bhattacharyya divergences based on Chernoff α-coefficients:
$B_\alpha(p, q) = -\ln \int_x p^\alpha(x)\, q^{1-\alpha}(x)\, \mathrm{d}x = -\ln C_\alpha(p, q) = -\ln \int_x q(x)\left(\frac{p(x)}{q(x)}\right)^\alpha \mathrm{d}x = -\ln E_q[L^\alpha(x)]$
Amari α-divergence:
$D_\alpha(p\|q) = \begin{cases} \frac{4}{1-\alpha^2}\left(1 - \int p(x)^{\frac{1-\alpha}{2}}\, q(x)^{\frac{1+\alpha}{2}}\, \mathrm{d}x\right), & \alpha \ne \pm 1,\\[4pt] \int p(x)\log\frac{p(x)}{q(x)}\,\mathrm{d}x = \mathrm{KL}(p, q), & \alpha = -1,\\[4pt] \int q(x)\log\frac{q(x)}{p(x)}\,\mathrm{d}x = \mathrm{KL}(q, p), & \alpha = 1,\end{cases}$
$D_\alpha(p\|q) = D_{-\alpha}(q\|p)$
Remapping $\alpha' = \frac{1-\alpha}{2}$ (i.e., $\alpha = 1 - 2\alpha'$) yields the Chernoff $\alpha'$-divergences.
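A quick numerical check, for discrete distributions, that the $\alpha \to \pm 1$ limits recover the two Kullback-Leibler divergences (illustrative sketch):

```python
import numpy as np

def alpha_divergence(p, q, alpha):
    """Amari alpha-divergence for discrete distributions (alpha != +/-1)."""
    return 4.0 / (1.0 - alpha ** 2) * (1.0 - np.sum(p ** ((1 - alpha) / 2) * q ** ((1 + alpha) / 2)))

kl = lambda a, b: np.sum(a * np.log(a / b))

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])
print(alpha_divergence(p, q, -0.999), kl(p, q))   # alpha -> -1 recovers KL(p, q)
print(alpha_divergence(p, q, +0.999), kl(q, p))   # alpha -> +1 recovers KL(q, p)
```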

Exponential families in statistics
[Figure: taxonomy of probability measures. Parametric families split into exponential families (uni-, bi-, and multi-parameter; univariate and multivariate: e.g., Bernoulli, Binomial, Multinomial, Beta, Gamma, Dirichlet, Weibull, Poisson, Exponential, Rayleigh, Gaussian) and non-exponential families (e.g., Uniform, Cauchy, Lévy skew α-stable); non-parametric measures form a separate branch. Exponential families have finite moments of all orders.]

Exponential families in statistics
Gaussian, Poisson, Bernoulli/multinomial, Gamma/Beta, etc.:
$p(x; \lambda) = p_F(x; \theta) = \exp\left(\langle t(x), \theta\rangle - F(\theta) + k(x)\right).$
Example: the Poisson distribution
$p(x; \lambda) = \frac{\lambda^x}{x!} \exp(-\lambda),$
with sufficient statistic $t(x) = x$, natural parameter $\theta = \log\lambda$, log-normalizer $F(\theta) = \exp\theta$, and carrier measure $k(x) = -\log x!$ (with respect to the counting measure).

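A short sketch verifying this Poisson decomposition numerically (the rate $\lambda = 3.5$ is an arbitrary choice for illustration):

```python
import numpy as np
from math import exp, factorial, log

def poisson_pmf(x, lam):
    return lam ** x * exp(-lam) / factorial(x)

def poisson_exp_family(x, lam):
    """Canonical form exp(<t(x), theta> - F(theta) + k(x)) of the Poisson family."""
    theta = log(lam)                # natural parameter
    F = exp(theta)                  # log-normalizer F(theta) = exp(theta) = lambda
    t = x                           # sufficient statistic
    k = -log(factorial(x))          # carrier measure w.r.t. the counting measure
    return exp(t * theta - F + k)

lam = 3.5
print(all(np.isclose(poisson_pmf(x, lam), poisson_exp_family(x, lam)) for x in range(10)))  # True
```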

Gaussians as an exponential family
$p(x; \lambda) = p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\sqrt{\det\Sigma}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$
$\theta = \left(\Sigma^{-1}\mu, \frac{1}{2}\Sigma^{-1}\right) \in \Theta = \mathbb{R}^d \times K_{d \times d}$, with $K_{d \times d}$ the cone of positive definite matrices,
$F(\theta) = \frac{1}{4}\mathrm{tr}(\theta_2^{-1}\theta_1\theta_1^T) - \frac{1}{2}\log\det\theta_2 + \frac{d}{2}\log\pi,$
$t(x) = (x, -xx^T), \qquad k(x) = 0.$
Inner product: composite, the sum of a dot product and a matrix trace: $\langle\theta, \theta'\rangle = \theta_1^T\theta_1' + \mathrm{tr}(\theta_2^T\theta_2')$.
The coordinate transformation $\tau: \Lambda \to \Theta$ is given for $\lambda = (\mu, \Sigma)$ by
$\tau(\lambda) = \left(\lambda_2^{-1}\lambda_1, \frac{1}{2}\lambda_2^{-1}\right), \qquad \tau^{-1}(\theta) = \left(\frac{1}{2}\theta_2^{-1}\theta_1, \frac{1}{2}\theta_2^{-1}\right).$

Bhattacharyya/Chernoff of exponential families
Equivalence with skew Burbea-Rao divergences:
$B_\alpha(p_F(x; \theta_p), p_F(x; \theta_q)) = BR_F^{(\alpha)}(\theta_p, \theta_q) = \alpha F(\theta_p) + (1-\alpha)F(\theta_q) - F(\alpha\theta_p + (1-\alpha)\theta_q)$
Proof: consider the Chernoff coefficient $C_\alpha(p, q)$ of members $p = p_F(x; \theta_p)$ and $q = p_F(x; \theta_q)$ of the same exponential family $E_F$:
$C_\alpha(p, q) = \int p^\alpha(x)\, q^{1-\alpha}(x)\, \mathrm{d}x = \int p_F^\alpha(x; \theta_p)\, p_F^{1-\alpha}(x; \theta_q)\, \mathrm{d}x$
$= \int \exp\big(\alpha(\langle x, \theta_p\rangle - F(\theta_p))\big)\, \exp\big((1-\alpha)(\langle x, \theta_q\rangle - F(\theta_q))\big)\, \mathrm{d}x$
$= \exp\big(F(\alpha\theta_p + (1-\alpha)\theta_q) - (\alpha F(\theta_p) + (1-\alpha)F(\theta_q))\big) \underbrace{\int p_F(x; \alpha\theta_p + (1-\alpha)\theta_q)\, \mathrm{d}x}_{=1}$
$= \exp\left(-BR_F^{(\alpha)}(\theta_p, \theta_q)\right) > 0.$
The coefficient is always strictly positive. For $\theta_p = \theta_q$, $C_\alpha(\theta_p, \theta_q) = \exp(-0) = 1$ and $B_\alpha(\theta_p, \theta_q) = 0$.
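A numerical check of this identity on the Poisson family, where $F(\theta) = \exp\theta$ (the support truncation at 60 terms is an arbitrary choice for the illustration):

```python
import numpy as np
from math import exp, factorial, log

def poisson_pmf(x, lam):
    return lam ** x * exp(-lam) / factorial(x)

lam_p, lam_q, alpha = 2.0, 5.0, 0.3
theta_p, theta_q = log(lam_p), log(lam_q)
F = lambda theta: exp(theta)                   # log-normalizer of the Poisson family

# Chernoff coefficient by direct summation over a (truncated) support.
C_direct = sum(poisson_pmf(x, lam_p) ** alpha * poisson_pmf(x, lam_q) ** (1 - alpha)
               for x in range(60))

# Closed form: C_alpha = exp(-BR_F^(alpha)(theta_p, theta_q)).
skew_br = alpha * F(theta_p) + (1 - alpha) * F(theta_q) - F(alpha * theta_p + (1 - alpha) * theta_q)
print(np.isclose(C_direct, exp(-skew_br)))     # True
```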

Summary of paper & results
Extending the Jensen-Shannon divergence to an arbitrary convex (information) function (= negative entropy): Jensen-Bregman divergences (= Jensen or Burbea-Rao divergences)
The 2D Voronoi diagram has linear complexity (via the bisector structure). Jensen-Bregman Voronoi diagrams extend Bregman Voronoi diagrams.
Jensen-Bregman centroids (solved iteratively using CCCP)
Jensen-Bregman centroidal Voronoi tessellations (CVTs, by discretization)
Bhattacharyya distance of members of the same exponential family = Jensen-Bregman divergence on the natural parameters
Statistical Voronoi diagrams, extending Bregman Voronoi diagrams (since skewed Jensen-Bregman divergences asymptotically yield Bregman divergences)


References "Bhattacharyya clustering with applications to mixture simplifications," ICPR 2010. arXiv, 2010. http://arxiv.org/abs/1004.5049 "Sided and symmetrized Bregman centroids," IEEE Transactions on Information Theory, vol. 55, no. 6, pp. 2048-2059, June 2009. "Bregman Voronoi diagrams," Discrete & Computational Geometry, 2010. "On the convexity of some divergence measures based on entropy functions," IEEE Transactions on Information Theory, vol. 28, no. 3, pp. 489-495, 1982. "Statistical exponential families: A digest with flash cards," 2009, arXiv.org:0911.4863 A. Yuille and A. Rangarajan, "The concave-convex procedure," Neural Computation, vol. 15, no. 4, pp. 915-936, 2003. J. Zhang, "Divergence Function, Duality, and Convex Analysis," Neural Computation, vol. 6, 159-195, 2004. c 2010, Frank Nielsen — p. 31/32

Thank you www.informationgeometry.org www.informationgeometry.org/JensenBregman/

blog.informationgeometry.org

http://www.twitter.com/FrnkNlsn

Acknowledgements: We gratefully acknowledge financial support from DIGITEO (grant GAS 2008-16D) and the French National Research Agency (ANR, grant GAIA 07-BLAN-0328-01).