Heat Kernel Based Community Detection - Purdue Math

Heat Kernel Based Community Detection

Kyle Kloster, Purdue University

Joint with David F. Gleich (Purdue), supported by NSF CAREER 1149756-CCF

Local Community Detection

Given seed(s) S in G, find a community that contains S.

“Community”?

A set with high internal, low external connectivity.

Low-conductance sets are communities

$$\mathrm{conductance}(T) = \frac{\#\,\text{edges leaving } T}{\#\,\text{edge endpoints in } T}$$

= “chance a random step exits T”

conductance(comm) = 39/381 ≈ 0.102
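The ratio above can be computed directly from an adjacency list. The following is a minimal sketch, assuming an undirected graph stored as a dict of neighbor lists; the function name `conductance` and the toy graph are illustrative and not from the talk. It follows the slide's definition literally (many references instead divide by the smaller of the endpoint counts of T and of its complement, which agrees with this whenever T holds at most half of all edge endpoints).

```python
def conductance(adj, T):
    """Conductance of node set T, using the slide's definition:
    (# edges leaving T) / (# edge endpoints in T)."""
    T = set(T)
    # Every edge incident to a node u in T contributes one endpoint in T.
    endpoints_in_T = sum(len(adj[u]) for u in T)
    # An edge leaves T when exactly one of its endpoints lies in T.
    leaving = sum(1 for u in T for v in adj[u] if v not in T)
    return leaving / endpoints_in_T


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
phi = conductance(adj, {0, 1, 2})  # 1 leaving edge over 7 endpoints
```

Here the left triangle has one edge leaving it and 2 + 2 + 3 = 7 edge endpoints, so a random step from a uniformly chosen endpoint exits the set with probability 1/7.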

How do we find these?

Graph diffusions find low-conductance sets

A diffusion propagates “rank” from a seed across a graph.

[Figure: graph around the seed, nodes colored by diffusion value from high to low.]

[Figure: the region of high diffusion value = local community / low-conductance set.]

Okay… how does this work?

Graph Diffusion

A diffusion models how a mass (green dye, money, popularity) spreads from a seed across a network.

[Figure: mass “diffuses” from the seed through stages p0, p1, p2, p3, …]



Once mass reaches a node, it propagates to the neighbors, with some decay.

“decay”: dye dilutes, money is taxed, popularity fades
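The stages p0, p1, p2, p3 can be sketched in a few lines. This is only an illustration under stated assumptions: the spreading step is taken to be a uniform random-walk step (each node splits its mass evenly among its neighbors), the decay is left to per-stage weights applied afterwards, and the helper name `spread` and the toy graph are hypothetical.

```python
def spread(adj, p):
    """One diffusion stage: every node splits its mass evenly among its neighbors."""
    nxt = {v: 0.0 for v in adj}
    for u, mass in p.items():
        for v in adj[u]:
            nxt[v] += mass / len(adj[u])
    return nxt


# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

p0 = {v: 0.0 for v in adj}
p0[0] = 1.0  # all mass starts on the seed, node 0

stages = [p0]
for _ in range(3):
    stages.append(spread(adj, stages[-1]))  # p1, p2, p3
```

Each stage conserves the total mass in this sketch; the dilution, tax, or fading in the analogy would enter as a stage-dependent weight rather than as a change to the step itself.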


Diffusion score

“Diffusion score” of a node = weighted sum of the mass at that node during different stages:

$$c_0 p_0 + c_1 p_1 + c_2 p_2 + c_3 p_3 + \cdots$$

Diffusion score vector = f:

$$f = \sum_{k=0}^{\infty} c_k P^k s$$

$P$ = random-walk transition matrix
$s$ = normalized seed vector
$c_k$ = weight on stage k
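A truncated version of this sum is easy to write down. The sketch below assumes the infinite series is cut off after `len(weights)` terms, that the normalized seed vector puts mass 1/|S| on each seed, and that P sends each node's mass evenly to its neighbors; the names `walk_step` and `diffusion_score` and the example weights are illustrative, not from the talk.

```python
def walk_step(adj, p):
    """Apply the random-walk transition matrix P once to the mass vector p."""
    nxt = {v: 0.0 for v in adj}
    for u in adj:
        for v in adj[u]:
            nxt[v] += p[u] / len(adj[u])
    return nxt


def diffusion_score(adj, seeds, weights):
    """Truncated f = sum_k c_k P^k s, with c_k = weights[k]."""
    # Normalized seed vector s (assumption: uniform over the seeds).
    p = {v: 0.0 for v in adj}
    for v in seeds:
        p[v] = 1.0 / len(seeds)
    f = {v: 0.0 for v in adj}
    for c in weights:
        for v in adj:
            f[v] += c * p[v]   # add c_k * (P^k s) at node v
        p = walk_step(adj, p)  # advance to the next stage
    return f


# Hypothetical toy graph and weights c_0, c_1, c_2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
f = diffusion_score(adj, seeds=[0], weights=[1.0, 0.5, 0.25])
```

Because each stage carries total mass 1 in this sketch, the entries of `f` sum to the sum of the weights used.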



Heat Kernel vs. PageRank Diffusions

Heat Kernel uses t^k/k! at stage k. Our work is new analysis for this diffusion.

$$\frac{t^0}{0!} p_0 + \frac{t^1}{1!} p_1 + \frac{t^2}{2!} p_2 + \frac{t^3}{3!} p_3 + \cdots$$

PageRank uses α^k at stage k. Standard, widely used diffusion we use for comparison.

$$\alpha^0 p_0 + \alpha^1 p_1 + \alpha^2 p_2 + \alpha^3 p_3 + \cdots$$
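One way to compare the two weight sequences is through the ratio of consecutive weights and their total mass. These are standard identities, given here as a worked aside rather than taken from the slides:

$$\frac{t^{k+1}/(k+1)!}{t^k/k!} = \frac{t}{k+1}, \qquad \frac{\alpha^{k+1}}{\alpha^k} = \alpha$$

$$\sum_{k=0}^{\infty} \frac{t^k}{k!} = e^t, \qquad \sum_{k=0}^{\infty} \alpha^k = \frac{1}{1-\alpha} \quad (0 \le \alpha < 1)$$

The heat kernel ratio drops below 1 once k + 1 > t and keeps shrinking, while the PageRank ratio stays fixed at α, so the heat kernel weights eventually fall off faster than any geometric sequence.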

Heat Kernel vs. PageRank Behavior

HK emphasizes earlier stages of diffusion.

[Figure: weight (log scale, from 10^0 down to 10^-5) versus walk length (0 to 100), for PR weights α^k with α = 0.85 and α = 0.99, and HK weights t^k/k! with t = 1, t = 5, and t = 15.]

→ HK involves shorter walks from the seed,

→ so HK looks at smaller sets than PR.

Heat Kernel vs. PageRank Theory

PR
  good conductance: Local Cheeger Inequality: “PR finds set of near-optimal conductance”
  fast algorithm: “PPR-push” is O(1/(ε(1-α))) in theory, fast in practice [Andersen Chung Lang 06]

HK
  good conductance: Local Cheeger Inequality [Chung 07]
  fast algorithm: Our work!
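For readers who have not seen a push procedure, the following is a minimal sketch of the idea behind “PPR-push”, not the implementation from [Andersen Chung Lang 06] or from this talk. It assumes an undirected graph with no isolated nodes, uses a non-lazy walk, and scales the result by (1-α) so that the total mass is 1; the name `ppr_push` and the default parameters are illustrative. Each push moves a (1-α) share of a node's residual into the solution and spreads the rest to its neighbors, and a node is pushed only while its residual is at least ε times its degree.

```python
from collections import deque


def ppr_push(adj, seed, alpha=0.85, eps=1e-4):
    """Approximate personalized PageRank from `seed` by local pushes."""
    x = {}            # mass already settled at each node
    r = {seed: 1.0}   # residual mass still waiting to be pushed
    queue = deque([seed])
    while queue:
        v = queue.popleft()
        rv = r.get(v, 0.0)
        deg = len(adj[v])
        if rv < eps * deg:
            continue  # residual too small to be worth pushing
        # Settle a (1 - alpha) share at v, spread the rest to the neighbors.
        x[v] = x.get(v, 0.0) + (1.0 - alpha) * rv
        r[v] = 0.0
        share = alpha * rv / deg
        for u in adj[v]:
            old = r.get(u, 0.0)
            r[u] = old + share
            # Enqueue u only at the moment it crosses its threshold.
            if old < eps * len(adj[u]) <= r[u]:
                queue.append(u)
    return x, r


# Hypothetical toy graph: two triangles joined by the edge 2-3.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
x, r = ppr_push(adj, seed=0)
```

Every push settles at least (1-α)·ε·deg(v) of mass while doing deg(v) work, and the total mass is 1, which is where a work bound of the form 1/(ε(1-α)) comes from in this sketch.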

Our work on Heat Kernel: theory

THEOREM Our algorithm for a relative ε-accuracy in a degree-weighted norm has

runtime