Introduction • Graph-based clustering approaches are widely employed • Simple, easy to understand, good results [Shi-Malik 1997, Ng et al. 2001, Chan et al. 1993]
• Graph data are widely available • Most previous research focuses on static analysis of graphs • Graph partitioning seeks groupings through static optimization, cutting edges between clusters • Stochastic modeling maximizes the likelihood of a generative model on the graph • Our work presents a novel dynamic analysis of graph data • Inspired by the Matthew effect, a general phenomenon in nature and societies
• Stronger connections become stronger • Social circles expand and become smoother
Motivation • Relationships among people in a society change over time • People are typically involved in many social events • E.g., meeting new friends, attending conferences like ECML here • The more often we meet at conferences, the more familiar we become
• People connect with each other through existing connections, e.g., by meeting friends’ friends
• Several observations • Two people with many common friends have a good chance of getting to know each other • Two good friends are likely to attend the same social events, so they get to know each other better • Social Diffusion Process • An analogue of how social relationships evolve
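The first observation has a simple graph reading: with a symmetric 0/1 adjacency matrix $A$ and zero diagonal, the number of common friends of $i$ and $j$ is $(A^2)_{ij} = \sum_k A_{ik} A_{kj}$. A minimal Python sketch over a hypothetical friendship dict (one set of friends per person); the function name and toy data are illustrative only.

```python
from itertools import combinations


def common_friend_counts(friends):
    """Count common friends for every pair of people.

    friends: dict mapping each person to the set of their friends
    (assumed symmetric: if b in friends[a] then a in friends[b]).
    Returns {(a, b): number of common friends} with a < b.
    """
    counts = {}
    for a, b in combinations(sorted(friends), 2):
        counts[(a, b)] = len(friends[a] & friends[b])
    return counts


# Hypothetical toy network: C and D are not friends but share two friends.
friends = {
    "A": {"C", "D"},
    "B": {"C", "D"},
    "C": {"A", "B"},
    "D": {"A", "B"},
}
counts = common_friend_counts(friends)
print(counts[("C", "D")])  # 2: C and D share friends A and B
```

Pairs with a high count, such as C and D here, are the ones the observation predicts are most likely to meet next.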
Motivation case study: Facebook • We see the events of our friends’ friends • More common friends means a greater chance of learning about an event
Social Diffusion Process • Two friends set up a date. They meet. • Two friends set up a date. One brings along a friend. The three of them meet. • Two friends set up a date. Both friends bring along a friend each. The four of them meet.
• More processes exist, but these are the most fundamental ones, and they are the only ones we consider in this work.
Social Diffusion Process • Two friends set up a date. They meet. • Two friends (A, B) set up a date. One (B) brings along a friend (C). The three of them meet. • A meets C
• Two friends (A, B) set up a date. Both friends bring along a friend [A brings C, B brings D]. The four of them meet. • A meets D • B meets C
• Most importantly, C meets D
• Diffusion: two people meet on their friends’ initiative
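The three fundamental processes can be enumerated mechanically. This sketch lists every pair that meets at one date and labels why they meet: the organizing pair, a guest with the friend who brought them, two people sharing a common friend, or two guests who are only friends’ friends. The function name `meetings` and the label strings are hypothetical, chosen for illustration.

```python
from itertools import combinations


def meetings(a, b, a_brings=None, b_brings=None):
    """Return {frozenset(pair): reason} for every pair meeting at one date.

    a and b set up the date; a_brings / b_brings is the friend each of them
    brings along (None if nobody is brought).
    """
    # Record who brought each guest.
    inviter = {}
    if a_brings is not None:
        inviter[a_brings] = a
    if b_brings is not None:
        inviter[b_brings] = b
    attendees = [a, b] + list(inviter)
    reasons = {}
    for x, y in combinations(attendees, 2):
        pair = frozenset((x, y))
        if pair == frozenset((a, b)):
            reasons[pair] = "date"
        elif inviter.get(x) == y or inviter.get(y) == x:
            reasons[pair] = "brought along"
        elif x in inviter and y in inviter:
            reasons[pair] = "friend's friend"  # two guests: C meets D
        else:
            reasons[pair] = "common friend"  # e.g. A meets D through B
    return reasons


r = meetings("A", "B", a_brings="C", b_brings="D")
print(r[frozenset(("C", "D"))])  # friend's friend
print(r[frozenset(("A", "D"))])  # common friend
```

With only one guest (`meetings("A", "B", b_brings="C")`) the sole diffusion meeting is A with C through their common friend B, matching the second process.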
Social Diffusion Process • Two friends set up a date. They meet.
• Two friends (A,B) set up a date. One (B) brings along a friend (C).
The three of them meet. • A meets C (two people meet through a common friend)
[Diagram with nodes A, B, C]
• Two friends (A,B) set up a date. Both friends bring along a friend
[A brings C, B brings D]. The four of them meet. • A meets D (two people meet through a common friend) • B meets C (two people meet through a common friend)
• Most importantly, C meets D (two people meet through a friend’s friend)
[Diagram with nodes A, B, C, D]
Social Diffusion Process • Two friends set up a date. They meet. • Two friends set up a date. One brings along a friend. They meet. • Two friends set up a date. Both bring along a friend. They meet.
Social Diffusion Process • Suppose we want to take a date to the royal wedding of William and Kate: whom do we take? • We would bring important friends
• Observations • We choose different levels of friends to attend different events
• The bring-friend action should have a threshold
Social Diffusion Process • Uniform distribution • Diffusion constant (set to 1 in the algorithm)
Social Diffusion Process Model • Define the thresholded graph adjacency matrix as • Proportional constant (set to 1 in the algorithm) • Random walk probability: $P_k(i) = A^{t}_{ki} / d_k$
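The random walk probability can be sketched in a few lines. The slide’s exact definition of the thresholded matrix is not recoverable here, so this sketch reads $A^{t}$ as the thresholded adjacency matrix and assumes a hypothetical rule: weights below a threshold `tau` are set to zero, and $d_k = \sum_i A^{t}_{ki}$ is the row sum, so that $P_k(\cdot)$ is a distribution over the friends person $k$ might bring along. Function names and data are illustrative.

```python
def threshold_adjacency(W, tau):
    """HYPOTHETICAL thresholding rule: drop connections weaker than tau."""
    return [[w if w >= tau else 0.0 for w in row] for row in W]


def random_walk_probabilities(A):
    """P[k][i] = A[k][i] / d_k, with d_k the row sum (degree) of node k.

    Rows with zero degree (isolated nodes) are left as all zeros.
    """
    P = []
    for row in A:
        d = sum(row)
        P.append([a / d if d > 0 else 0.0 for a in row])
    return P


# Hypothetical weighted friendship graph with one weak tie (0 -- 3).
W = [
    [0.0, 3.0, 1.0, 0.2],
    [3.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 2.0],
    [0.2, 0.0, 2.0, 0.0],
]
A = threshold_adjacency(W, tau=0.5)
P = random_walk_probabilities(A)
print(P[0])  # [0.0, 0.75, 0.25, 0.0]
```

The weak tie between nodes 0 and 3 falls below `tau`, so person 0 never brings person 3, which is one way to model the observation that the bring-friend action has a threshold.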
Social Diffusion Process Algorithm • The only model parameter
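To make the algorithm slide concrete, here is a toy Monte Carlo simulation of the three fundamental processes. It is not the authors’ update rule (the slide’s equations are not recoverable here); the sampling rules are assumptions chosen to mirror the slides: a pair sets up a date with probability proportional to its connection strength, each side brings at most one friend whose tie is at least `tau`, chosen by a random-walk step, and every pair of attendees that meets gains 1 in weight (diffusion constant 1). Treating `tau` as the single model parameter is likewise an assumption.

```python
import random


def simulate_social_diffusion(W, tau, steps, seed=0):
    """Toy simulation of the three fundamental social diffusion processes.

    HYPOTHETICAL rules, not the paper's update equations:
    - a pair (a, b) sets up a date with probability proportional to W[a][b];
    - a and b each bring along at most one friend whose tie weight is >= tau,
      chosen with probability proportional to that weight (random-walk step);
    - every pair of attendees meets, adding 1 to their weight.
    Returns a new symmetric weight matrix; the input is not modified.
    """
    rng = random.Random(seed)
    n = len(W)
    W = [list(row) for row in W]  # work on a copy
    for _ in range(steps):
        pairs = [(x, y) for x in range(n) for y in range(x + 1, n) if W[x][y] > 0]
        if not pairs:
            break  # no connections left to reinforce
        a, b = rng.choices(pairs, weights=[W[x][y] for x, y in pairs])[0]
        attendees = {a, b}
        for host, partner in ((a, b), (b, a)):
            friends = [i for i in range(n)
                       if i not in (host, partner) and W[host][i] > 0 and W[host][i] >= tau]
            if friends:
                guest = rng.choices(friends, weights=[W[host][i] for i in friends])[0]
                attendees.add(guest)
        people = sorted(attendees)
        for idx, x in enumerate(people):
            for y in people[idx + 1:]:
                W[x][y] += 1.0  # diffusion constant set to 1
                W[y][x] += 1.0
    return W


W0 = [
    [0.0, 2.0, 0.0, 0.0],
    [2.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
W1 = simulate_social_diffusion(W0, tau=0.5, steps=50, seed=1)
for row in W1:
    print(row)
```

Because dates are sampled in proportion to the current weight, strong ties are reinforced more often, which is the Matthew-effect dynamic described in the introduction. Weights grow without bound in this toy, so a practical version would need some normalization.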
Social Diffusion Process: a simple case
Applications • Clustering • Clusters can be read off once the graph splits into disconnected components • Preprocessing for other machine learning tasks • Our algorithm takes a graph as input and outputs a better graph • Can be used as preprocessing for clustering, semi-supervised learning, etc.
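Reading off clusters once the evolved graph has split into disconnected components amounts to a connected-components search. A minimal sketch using iterative depth-first search on a dense weight matrix; `eps` is a hypothetical cutoff below which a weight is treated as no edge.

```python
def connected_components(W, eps=1e-12):
    """Label nodes by connected component of the graph with weights W.

    Edges with weight <= eps are treated as absent. Returns a list of
    component labels 0, 1, 2, ... in order of first discovery.
    """
    n = len(W)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already assigned to a component
        labels[start] = current
        stack = [start]
        while stack:
            k = stack.pop()
            for i in range(n):
                if W[k][i] > eps and labels[i] == -1:
                    labels[i] = current
                    stack.append(i)
        current += 1
    return labels


# Hypothetical evolved graph that has split into two groups.
W = [
    [0.0, 5.0, 4.0, 0.0, 0.0],
    [5.0, 0.0, 6.0, 0.0, 0.0],
    [4.0, 6.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 3.0],
    [0.0, 0.0, 0.0, 3.0, 0.0],
]
print(connected_components(W))  # [0, 0, 0, 1, 1]
```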
Experimental Results • We empirically show that our algorithm converges
• Clustering • Semi-supervised learning • MicroRNA data analysis
Experimental Results Convergence analysis
Experimental Results: Clustering
24 UCI Data Sets
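The slides do not spell out how clustering accuracy is computed. A common definition, assumed here, maps predicted clusters to ground-truth classes with the best one-to-one assignment and reports the fraction of points whose mapped label is correct. For a handful of clusters a brute-force search over permutations suffices; the Hungarian algorithm is the usual choice for larger problems.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_labels):
    """Best-mapping clustering accuracy (brute force; fine for few clusters).

    Assumes the number of clusters does not exceed the number of classes.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    if len(clusters) > len(classes):
        raise ValueError("more clusters than classes")
    best = 0
    # Try every assignment of clusters to distinct classes.
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        correct = sum(1 for t, c in zip(true_labels, cluster_labels) if mapping[c] == t)
        best = max(best, correct)
    return best / len(true_labels)


print(clustering_accuracy(["x", "x", "y", "y", "y"], [1, 1, 0, 0, 1]))  # 0.8
```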
Experimental Results: Semi-supervised Learning
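The slides use the output graph as input to semi-supervised learning but do not name the learner. As an assumed stand-in, a minimal label-propagation sketch (each unlabeled node repeatedly takes the weighted average of its neighbors’ class scores while labeled nodes stay clamped) shows how an improved graph would be consumed. The function and parameter names are hypothetical.

```python
def label_propagation(W, seeds, num_classes, iters=100):
    """Propagate labels over a symmetric weighted graph.

    W: weight matrix (list of lists).
    seeds: dict {node: class index} for labeled nodes; these stay clamped.
    Returns the predicted class index for every node (isolated unlabeled
    nodes keep all-zero scores and default to class 0).
    """
    n = len(W)
    # F[i][c]: current score of node i for class c
    F = [[0.0] * num_classes for _ in range(n)]
    for i, c in seeds.items():
        F[i][c] = 1.0
    for _ in range(iters):
        new_F = []
        for i in range(n):
            d = sum(W[i])
            if i in seeds or d == 0:
                new_F.append(F[i])  # clamped seed or isolated node
                continue
            new_F.append([sum(W[i][j] * F[j][c] for j in range(n)) / d
                          for c in range(num_classes)])
        F = new_F
    return [max(range(num_classes), key=lambda c: F[i][c]) for i in range(n)]


# Hypothetical graph: two triangles joined by one weak edge (2 -- 3).
W = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]
print(label_propagation(W, {0: 0, 5: 1}, num_classes=2))  # [0, 0, 0, 1, 1, 1]
```

A graph whose within-group ties are stronger and whose cross-group ties are weaker makes this averaging separate the classes more cleanly, which is the sense in which a better graph helps downstream learning.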
Experimental Results: microRNA function analysis
Experimental Results: microRNA function analysis let-7 microRNA family
Experimental Results: microRNA function analysis
mir-200 microRNA family
Experimental Results: microRNA function analysis • The corresponding genes
Experimental Results: microRNA function analysis • Observations • 6 microRNA groups are identified • The let-7 and mir-200 families have been reported by other researchers [Hu 2009, Abbott 2005]
Conclusions • A novel social diffusion process model is presented • Dynamic graph evolution • Analogue of the Matthew effect • Simple, intuitive, interpretable • Directly corresponds to graph language
• Extensive experiments on 24 UCI data sets • Better clustering accuracy • Better semi-supervised learning performance • Unsupervised graph-data exploration • Almost no parameters • Easy to visualize • Meaningful results