Heat Kernel Based Community Detection
Kyle Kloster, Purdue University
Joint with David F. Gleich (Purdue), supported by NSF CAREER 1149756-CCF
Local Community Detection
Given seed(s) S in G, find a community that contains S.
“Community”?
High internal, low external connectivity!
Low-conductance sets are communities
$$ \mathrm{conductance}(T) = \frac{\#\ \text{edges leaving } T}{\#\ \text{edge endpoints in } T} $$
= “chance a random step exits T”
conductance(comm) = 39/381 ≈ 0.102
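As a rough illustration of the definition above, here is a minimal Python sketch that counts the two quantities for an undirected graph stored as an adjacency dictionary. The toy graph `adj`, the set `{0, 1, 2}`, and the function name `conductance` are made up for this example. The denominator follows the slide's definition (edge endpoints in T), which agrees with the usual definition when T is the smaller side of the cut.

```python
def conductance(adj, T):
    """Conductance of node set T in an undirected graph.

    adj maps each node to a list of its neighbors (each undirected edge
    appears in both endpoint lists). Follows the slide's definition:
    (# edges leaving T) / (# edge endpoints in T).
    """
    T = set(T)
    # Each edge with exactly one endpoint in T is counted once, from the inside.
    cut = sum(1 for u in T for v in adj[u] if v not in T)
    # Sum of degrees in T = number of edge endpoints that land in T.
    volume = sum(len(adj[u]) for u in T)
    if volume == 0:
        raise ValueError("T has no edge endpoints")
    return cut / volume


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
print(conductance(adj, {0, 1, 2}))  # 1 cut edge over 7 edge endpoints
```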
How do we find these?
Graph diffusions find low-conductance sets
A diffusion propagates “rank” from a seed across a graph.
[Figure: graph around the seed, nodes colored by diffusion value from high to low]
[Figure annotation: highlighted region = local community / low-conductance set]
Okay… how does this work?
Graph Diffusion
A diffusion models how a mass (green dye, money, popularity) spreads from a seed across a network.
[Figure: stages of the diffusion starting at the seed: p0, p1, p2, p3, …]
Once mass reaches a node, it propagates to the neighbors, with some decay.
“decay”: dye dilutes, money is taxed, popularity fades
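One way to picture the stages p0, p1, p2, … is a random-walk step in which every node splits its current mass equally among its neighbors. The sketch below does only that spreading step; it assumes decay is applied separately as a per-stage weight rather than inside the step. The path graph and the function names `diffuse_step` and `diffusion_stages` are hypothetical.

```python
def diffuse_step(adj, p):
    """One random-walk step: each node sends its mass equally to its neighbors."""
    nxt = {u: 0.0 for u in adj}
    for u, mass in p.items():
        if mass == 0.0:
            continue
        share = mass / len(adj[u])
        for v in adj[u]:
            nxt[v] += share
    return nxt


def diffusion_stages(adj, seed, num_stages):
    """Return [p0, p1, ..., p_num_stages], with all mass on the seed at p0."""
    p = {u: 0.0 for u in adj}
    p[seed] = 1.0
    stages = [p]
    for _ in range(num_stages):
        p = diffuse_step(adj, p)
        stages.append(p)
    return stages


# Hypothetical path graph 0 - 1 - 2 - 3, seeded at node 0.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
stages = diffusion_stages(adj, seed=0, num_stages=3)
for k, p in enumerate(stages):
    print(f"p{k}:", [round(p[u], 3) for u in sorted(adj)])
```

Without decay the total mass stays at 1 in every stage; it only moves further from the seed.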
Diffusion score
“Diffusion score” of a node = weighted sum of the mass at that node during different stages:
$$ c_0 p_0 + c_1 p_1 + c_2 p_2 + c_3 p_3 + \cdots $$
Diffusion score vector = f
$$ f = \sum_{k=0}^{\infty} c_k P^k s $$
P = random-walk transition matrix
s = normalized seed vector
c_k = weight on stage k
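The weighted sum can be sketched directly by cutting the infinite series off after finitely many stages. This is a naive illustration, not the fast local algorithm the talk is about. The function name `diffusion_score`, the single-node seed, and the example weights are assumptions made for this sketch.

```python
def diffusion_score(adj, seed, weights):
    """Truncated diffusion f = sum_k weights[k] * P^k s.

    adj: node -> list of neighbors (undirected, no isolated nodes).
    seed: a single seed node, so s is the indicator vector of that node.
    weights: finite list [c0, c1, ..., cN]; the infinite sum is cut off at N.
    """
    p = {u: 0.0 for u in adj}
    p[seed] = 1.0                      # p0 = s
    f = {u: 0.0 for u in adj}
    for k, c in enumerate(weights):
        if k > 0:
            # p_k = P p_{k-1}: every node splits its mass among its neighbors.
            nxt = {u: 0.0 for u in adj}
            for u, mass in p.items():
                for v in adj[u]:
                    nxt[v] += mass / len(adj[u])
            p = nxt
        for u in adj:
            f[u] += c * p[u]
    return f


# Hypothetical path graph 0 - 1 - 2 - 3 and made-up weights c0, c1, c2.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
f = diffusion_score(adj, seed=0, weights=[0.5, 0.25, 0.25])
print(f)
```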
Heat Kernel vs. PageRank Diffusions
Heat Kernel uses t^k/k! at stage k.
Our work is new analysis for this diffusion.
$$ \frac{t^0}{0!} p_0 + \frac{t^1}{1!} p_1 + \frac{t^2}{2!} p_2 + \frac{t^3}{3!} p_3 + \cdots $$
PageRank uses α^k at stage k.
Standard, widely-used diffusion we use for comparison.
$$ \alpha^0 p_0 + \alpha^1 p_1 + \alpha^2 p_2 + \alpha^3 p_3 + \cdots $$
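To make the two weight sequences concrete, the sketch below lists the first few weights of each. The scaling factors e^{-t} and (1 - α) do not appear on the slide; they are the usual normalizations, assumed here so that each sequence sums to 1. The function names are hypothetical.

```python
import math


def hk_weights(t, n):
    """First n heat kernel weights e^{-t} t^k / k!, for k = 0..n-1."""
    w, out = math.exp(-t), []
    for k in range(n):
        out.append(w)
        w *= t / (k + 1)   # t^{k+1}/(k+1)! from t^k/k!
    return out


def pr_weights(alpha, n):
    """First n PageRank weights (1 - alpha) alpha^k, for k = 0..n-1."""
    return [(1 - alpha) * alpha**k for k in range(n)]


print("HK, t=5:      ", [round(w, 4) for w in hk_weights(t=5, n=6)])
print("PR, alpha=.85:", [round(w, 4) for w in pr_weights(alpha=0.85, n=6)])
```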
Heat Kernel vs. PageRank Behavior
HK emphasizes earlier stages of diffusion.
[Figure: weight (log scale, 10^0 down to 10^-5) vs. walk length (0 to 100), for PR weights α^k with α = 0.85 and α = 0.99, and HK weights t^k/k! with t = 1, 5, 15]
→ HK involves shorter walks from the seed,
→ so HK looks at smaller sets than PR.
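One hedged way to quantify "shorter walks" is to count how many leading stages are needed before less than 10^-5 of the total weight remains (10^-5 is the bottom of the weight axis in the plot). The sketch assumes weights normalized to sum to 1, i.e. scaled by e^{-t} for HK and by (1 - α) for PR, which is a convention not stated on the slide. All function names are hypothetical.

```python
import math


def terms_needed(first_weight, ratio, tol=1e-5):
    """Number of leading stages after which less than tol of the weight remains.

    first_weight: normalized weight c0 (the weights are assumed to sum to 1).
    ratio(k): returns c_{k+1} / c_k.
    tol should stay well above floating-point rounding error.
    """
    total, w, k = 0.0, first_weight, 0
    while 1.0 - total >= tol:
        total += w
        w *= ratio(k)
        k += 1
    return k


def hk_terms(t, tol=1e-5):
    # Heat kernel: c_k = e^{-t} t^k / k!, so c_{k+1} / c_k = t / (k + 1).
    return terms_needed(math.exp(-t), lambda k: t / (k + 1), tol)


def pr_terms(alpha, tol=1e-5):
    # PageRank: c_k = (1 - alpha) alpha^k, so c_{k+1} / c_k = alpha.
    return terms_needed(1 - alpha, lambda k: alpha, tol)


for t in (1, 5, 15):
    print("HK, t =", t, "->", hk_terms(t), "stages")
for alpha in (0.85, 0.99):
    print("PR, alpha =", alpha, "->", pr_terms(alpha), "stages")
```

Under these assumptions every HK setting from the plot needs fewer stages than either PR setting, because t^k/k! eventually shrinks faster than any geometric sequence.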
Heat Kernel vs. PageRank Theory
PR
  good conductance: Local Cheeger Inequality: “PR finds set of near-optimal conductance”
  fast algorithm: “PPR-push” is O(1/(ε(1-α))) in theory, fast in practice [Andersen Chung Lang 06]
HK
  good conductance: Local Cheeger Inequality [Chung 07]
  fast algorithm: Our work!
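For readers unfamiliar with the push idea behind “PPR-push”, here is a simplified sketch. It is written in the spirit of the procedure in [Andersen Chung Lang 06] but is not their exact formulation: it assumes a non-lazy random walk, a single seed node, and the convention used in this talk where α is the weight on continuing the walk, so the PageRank vector is (1 - α) Σ α^k P^k s. The function name `ppr_push` and the toy graph are hypothetical.

```python
from collections import deque


def ppr_push(adj, seed, alpha, eps):
    """Approximate x = (1 - alpha) * sum_k alpha^k P^k s by local pushes.

    Keeps a solution p and a residual r, and stops once
    r[u] < eps * deg(u) for every node u.
    """
    p, r = {}, {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        deg_u = len(adj[u])
        mass = r.get(u, 0.0)
        if mass < eps * deg_u:
            continue                      # nothing worth pushing at u
        # Push: keep a (1 - alpha) share at u, spread the rest to neighbors.
        p[u] = p.get(u, 0.0) + (1 - alpha) * mass
        r[u] = 0.0
        for v in adj[u]:
            before = r.get(v, 0.0)
            r[v] = before + alpha * mass / deg_u
            thresh = eps * len(adj[v])
            # Enqueue v only at the moment it crosses its threshold.
            if before < thresh <= r[v]:
                queue.append(v)
    return p, r


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
p, r = ppr_push(adj, seed=0, alpha=0.85, eps=1e-4)
print({u: round(val, 4) for u, val in sorted(p.items())})
```

Each push moves at least (1 - α) ε deg(u) of the unit mass into p at a cost of deg(u) work, which is where a bound of the form O(1/(ε(1-α))) comes from; the work does not depend on the size of the graph.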
Our work on Heat Kernel: theory
THEOREM Our algorithm for a relative ε-accuracy in a degree-weighted norm has runtime