2010 International Conference on Pattern Recognition
A Supergraph-based Generative Model

Lin Han, Richard C. Wilson, Edwin R. Hancock
Department of Computer Science, University of York, YO10 5DD, UK
[email protected]

Abstract

This paper describes a method for constructing a generative model for sets of graphs. The method is posed in terms of learning a supergraph from which the samples can be obtained by edit operations. We construct a probability distribution for the occurrence of nodes and edges over the supergraph. We use the EM algorithm to learn both the structure of the supergraph and the correspondences between the nodes of the sample graphs and those of the supergraph, which are treated as missing data. In the experimental evaluation of the method, we a) demonstrate that our supergraph learning method can lead to an optimal or suboptimal supergraph, and b) show that our proposed generative model gives good graph classification results.
1. Introduction

Relational graphs provide a convenient means of representing structural patterns. Examples include the arrangement of shape primitives or feature points in images, molecules and social networks. When abstracted in this way, complex data can be compared or matched using graph-matching techniques. Although matching problems such as subgraph isomorphism or inexact graph matching present a computational bottleneck, there are a number of effective algorithms based on probabilistic [12], optimization [11] or graph-spectral [2] techniques that can give reliable results in polynomial time. However, despite considerable progress in the problems of representing and matching data using graph structures, the issue of how to capture variability in such representations has received relatively little attention. For vectorial patterns, on the other hand, there is a wealth of literature on how to construct statistical generative models that can deal with quite complex data, including that arising from the analysis of variability in shape. The main reason for the lack of progress is the difficulty in developing representations that capture
variations in graph structure. Such variation can manifest itself as variation in a) edge connectivity, b) node composition, or c) node and edge attributes. This trichotomy provides a natural framework for analyzing the state of the art in the literature. Most of the literature can be viewed as modeling variations in node or edge attributes. In fact, most of the work on Bayes nets in the graphical-models literature falls into this category [9]. There are also some well documented studies in the structural pattern recognition literature that fall into this category, including the work of Christmas et al. [12] and Bagdanov et al. [1], who both use Gaussian models to capture variations in edge attributes. The problems of modeling variations in node and edge composition are more challenging, since they focus on modeling the structure of the graph rather than its attributes. For the restricted class of trees, Torsello and Hancock [4] use a description-length criterion to recover the node composition of trees from samples with unknown correspondences. Torsello and Dowe have recently made some progress in extending this method to graphs [3], using importance sampling techniques to overcome some of the computational bottlenecks. The problem of learning edge structure is probably the most challenging of those listed above, although it is finessed in the case of trees, where the operations of node and edge insertion or deletion are equivalent. Broadly speaking, there are two approaches to characterizing variations in edge structure for graphs. The first of these is graph-spectral, while the second is probabilistic. In the case of graph spectra, many of the ideas developed in the generative modeling of shape using principal components analysis can be translated relatively directly to graphs using simple vectorization procedures based on the correspondences conveyed by the ordering of Laplacian eigenvectors [6]. Although these methods are simple and effective, they are limited by the stability of the Laplacian spectrum under perturbations in graph structure. The probabilistic approach is potentially more robust, but requires accurate correspondence information to be inferred from the available graph structure. If this is to hand, then a representation of edge structure can be learned. To date, the most effective algorithm falling into this category exploits a part-based representation [8]. Our aim in this paper is more ambitious. We aim to capture variations in edge structure by learning a probabilistic model of the adjacency matrix when correspondences are not available and must be inferred from the data. We follow Torsello and Hancock [4] and work with a supergraph representation from which each sample graph can be obtained by edit operations. The supergraph is represented by a matrix in which each element reflects the probability of connection between a pair of nodes. To furnish the required learning framework, we extend the work of Luo and Hancock [5] and develop an EM algorithm in which the node correspondences and the supergraph edge-probability matrix are treated as missing data. This novel technique is applied to a large database of object views, and used to learn class prototypes that can be used for the purposes of recognition.
2. Probabilistic Framework

Given a sample of graphs, our aim in this paper is to learn a generative model that can be used to describe the distribution of the sample graphs and characterize the structural variations present in the set. Here we pose the problem as that of learning a supergraph from which each observed sample graph can be obtained by simple edit operations. We pose the problem as one of maximizing the expected log-likelihood function for the observed sample graphs, with the supergraph and the correspondences between the nodes of the sample graphs and those of the supergraph treated as missing data. To commence our development, we require the a posteriori probabilities of the sample graphs given the structure of the supergraph and the node correspondences. To compute these probabilities we use the method outlined in [5]. Suppose we are considering the match of the sample graph G_D = (V_D, E_D), where V_D represents the node-set and E_D the edge-set, against a supergraph G_M = (V_M, E_M) which has node-set V_M and edge-set E_M. Further, suppose that the elements of the adjacency matrix D of the sample graph and the elements of the adjacency matrix M of the supergraph are respectively

$$D_{ab} = \begin{cases} 1 & \text{if } (a,b) \in E_D \\ 0 & \text{otherwise} \end{cases}, \qquad M_{\alpha\beta} = \begin{cases} 1 & \text{if } (\alpha,\beta) \in E_M \\ 0 & \text{otherwise} \end{cases}$$
We represent the correspondence matches between the nodes of the sample graph and the nodes of the supergraph using the assignment matrix S which has elements
$$s_{a\alpha} = \begin{cases} 1 & \text{if } f(a) = \alpha \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

where f(a) = α means that node a ∈ V_D is matched to node α ∈ V_M. With these ingredients, the a posteriori matching probability of the graphs G_D and G_M is [5]

$$P(G_D \mid G_M, S) = \prod_{a \in V_D} \sum_{\alpha \in V_M} K_a \exp\Big[\mu \sum_{b \in V_D} \sum_{\beta \in V_M} D_{ab} M_{\alpha\beta} s_{b\beta}\Big] \tag{2}$$

where

$$\mu = \ln \frac{1 - P_e}{P_e} \quad \text{and} \quad K_a = P_e^{|V_D| \times |V_M|} B_a \tag{3}$$
In the above, P_e is the error rate for node correspondence and B_a is the probability of observing node a in graph G_D. |V_D| and |V_M| are the numbers of nodes in the graphs G_D and G_M.
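As an illustration of how the matching probability of equation (2) can be evaluated, the following is a minimal numpy sketch. The function name, the default error rate P_e and the uniform choice of the observation probabilities B_a are our own assumptions for the example, not prescriptions from the paper.

```python
import numpy as np

def match_probability(D, M, S, P_e=0.1, B=None):
    """Sketch of Eq. (2): a posteriori probability of a sample graph with
    adjacency matrix D given the supergraph adjacency matrix M and the
    assignment matrix S (shape |V_D| x |V_M|)."""
    nD, nM = D.shape[0], M.shape[0]
    if B is None:
        B = np.ones(nD)                      # assumed: uniform observation probabilities B_a
    mu = np.log((1.0 - P_e) / P_e)           # Eq. (3)
    K = (P_e ** (nD * nM)) * B               # Eq. (3): K_a = P_e^(|V_D||V_M|) B_a
    # (D @ S @ M.T)[a, alpha] = sum over b, beta of D_ab * M_alphabeta * s_bbeta
    inner = D @ S @ M.T
    # product over nodes a of the sum over supergraph nodes alpha
    return float(np.prod(np.sum(K[:, None] * np.exp(mu * inner), axis=1)))
```

In practice the factor K_a is vanishingly small for graphs of any size, so a real implementation would work with log-probabilities; the direct form above simply mirrors the equation.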
3. Learning the Supergraph

Let the graphs in the sample set be {G_{D_i}} = {G_{D_1}, ..., G_{D_i}, ..., G_{D_N}} and the supergraph be G_M. The assignment matrices {S^i} represent the correspondences between the nodes of the sample graphs and those of the supergraph. Under the assumption that the graphs in {G_{D_i}} are independent samples from the distribution, the likelihood of the sample graphs can be written as follows using the a posteriori probabilities:

$$P(\{G_{D_i}\} \mid G_M, \{S^i\}) = \prod_{i \in N} \prod_{a \in V_{D_i}} \sum_{\alpha \in V_M} K^i_a \exp\Big[\mu \sum_{b \in V_{D_i}} \sum_{\beta \in V_M} D^i_{ab} M_{\alpha\beta} s^i_{b\beta}\Big] \tag{4}$$
We aim to locate the supergraph that maximizes this likelihood function. To deal with the missing node-correspondence matrices and the unknown structure of the supergraph, we use the expectation-maximization (EM) algorithm to locate the solution.

Weighted log-likelihood function: According to [5], the expected log-likelihood function for observing a sample graph G_D, i.e. for it to have been generated by the supergraph G_M, is

$$\Lambda(S^{(n+1)} \mid S^{(n)}) = \sum_{a \in V_D} \sum_{\alpha \in V_M} Q^{(n)}_{a\alpha} \Big\{ \ln K_a + \mu \sum_{b \in V_D} \sum_{\beta \in V_M} D_{ab} M_{\alpha\beta} s^{(n+1)}_{b\beta} \Big\} \tag{5}$$

where Q^{(n)} is a matrix with elements Q^{(n)}_{aα} that are equal to the a posteriori probability of node a in G_D being matched to node α in G_M at iteration n of the EM algorithm. To develop the expected log-likelihood function for our supergraph model, since we do not know the supergraph adjacency matrix M, we work with its expectation value P. From the set of sample graphs we have
$$\Lambda(\{S^i\}^{(n+1)} \mid \{S^i\}^{(n)}) = \sum_{i \in N} \sum_{a \in V_{D_i}} \sum_{\alpha \in V_M} Q^{i,(n)}_{a\alpha} \Big\{ \ln K^i_a + \mu \sum_{b \in V_{D_i}} \sum_{\beta \in V_M} D^i_{ab} P^{(n)}_{\alpha\beta} s^{i,(n+1)}_{b\beta} \Big\} \tag{6}$$
where P^{(n)}_{αβ} = E[M_{αβ}] = P(M_{αβ} = 1 | {G_{D_i}}, {S^i}^{(n)}). Posed in this way, the estimation strategy is only computationally tractable using Monte Carlo sampling. The alternative is to assume a simple distribution for the supergraph edges. For instance, if we assume that the sample-graph edges arise as independent samples from those of the supergraph under a Bernoulli distribution, then the likelihood becomes
$$P(\{G_{D_i}\} \mid G_M, \{S^i\}) = \prod_{i \in N} \prod_{a,b \in V_{D_i}} \sum_{\alpha,\beta \in V_M} s^i_{a\alpha} s^i_{b\beta} D^i_{ab} P_{\alpha\beta} (1 - P_{\alpha\beta})^{1 - s^i_{a\alpha} s^i_{b\beta} D^i_{ab}} \tag{7}$$
The trial success probability P_{αβ} for the Bernoulli distribution is equal to the expected number of successes, and so

$$P_{\alpha\beta} = \frac{1}{N} \sum_{i \in N} \sum_{a,b \in V_{D_i}} s^i_{a\alpha} s^i_{b\beta} D^i_{ab} \tag{8}$$
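Under the Bernoulli assumption, the update of equation (8) is a simple matrix computation, since the double sum over a and b is exactly (S^T D S)_{αβ}. The sketch below is our own illustration of this step; the argument names are assumptions.

```python
import numpy as np

def update_edge_probabilities(D_list, S_list, n_super):
    """Sketch of Eq. (8): re-estimate the supergraph edge probabilities P
    as the average, over the N sample graphs, of S^T D S."""
    P = np.zeros((n_super, n_super))
    for D, S in zip(D_list, S_list):
        P += S.T @ D @ S       # (alpha, beta) entry: sum_{a,b} s_a_alpha s_b_beta D_ab
    return P / len(D_list)
```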
To maximize the weighted log-likelihood function (6), we confine our attention to the second term inside the curly braces, which determines the update direction. The quantity of interest can be written as a summation of traces of matrix products, that is

$$\hat{\Lambda}(\{S^i\}^{(n+1)} \mid \{S^i\}^{(n)}) = \sum_{i \in N} \mathrm{Tr}[(D^i)^T Q^{i,(n)} P^{(n)} (S^{i,(n+1)})^T] \tag{9}$$

Maximization: The maximization step involves recovering the elements of {S^i}^{(n+1)} that satisfy the condition

$$S^{i,(n+1)} = \arg\max_{\hat{S}} \mathrm{Tr}[(D^i)^T Q^{i,(n)} P^{(n)} \hat{S}^T] \tag{10}$$
Scott and Longuet-Higgins demonstrate that S^{i,(n+1)} can be recovered by performing the singular value decomposition (D^i)^T Q^{i,(n)} P^{(n)} = V Δ U^T, where V and U are orthogonal matrices and Δ is a diagonal matrix. From the factorization, we construct the matrix E by making the diagonal elements of Δ unity, and compute the matrix R = V E U^T. The elements of R are used to update the assignment indicators. If the element R_{aα} is the maximum value in both its containing row and column, then the corresponding assignment indicator is set to unity; otherwise, it is set to zero. In other words,

$$S^{i,(n+1)}_{a\alpha} = \begin{cases} 1 & \text{if } R_{a\alpha} = \max_b R_{b\alpha} = \max_\beta R_{a\beta} \\ 0 & \text{otherwise} \end{cases} \tag{11}$$
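A sketch of this maximization step, assuming numpy's SVD convention (which returns U, Δ, V^T rather than the V Δ U^T of the text), might look as follows; the row-and-column maximum test mirrors equation (11).

```python
import numpy as np

def maximization_step(D, Q, P):
    """Sketch of Eqs. (10)-(11): recover a binary assignment matrix S
    from the SVD of (D^T Q P), following Scott and Longuet-Higgins."""
    U, _, Vt = np.linalg.svd(D.T @ Q @ P, full_matrices=False)
    R = U @ Vt                                # singular values replaced by unity
    S = np.zeros_like(R)
    for a in range(R.shape[0]):
        alpha = int(np.argmax(R[a]))          # maximum of row a
        if R[a, alpha] >= R[:, alpha].max():  # also the maximum of column alpha
            S[a, alpha] = 1.0
    return S
```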
Expectation: In the expectation step of the EM algorithm, we compute the a posteriori probability of the nodes of the sample graphs being matched to those of the supergraph. Applying Bayes' rule, the a posteriori probabilities of the nodes of graph G_{D_i} at iteration n+1 are given by

$$Q^{i,(n+1)}_{a\alpha} = \frac{\exp\Big[\sum_{b \in V_{D_i}} \sum_{\beta \in V_M} D^i_{ab} P^{(n)}_{\alpha\beta} s^{i,(n)}_{b\beta}\Big] \pi^{i,(n)}_{\alpha}}{\sum_{\alpha' \in V_M} \exp\Big[\sum_{b \in V_{D_i}} \sum_{\beta \in V_M} D^i_{ab} P^{(n)}_{\alpha'\beta} s^{i,(n)}_{b\beta}\Big] \pi^{i,(n)}_{\alpha'}} \tag{12}$$

In the above equation, π^{i,(n)}_{α'} = ⟨Q^{i,(n)}_{aα'}⟩_a, where ⟨·⟩_a means the average over a.
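Equation (12) is a normalization over the supergraph nodes, and its inner double sum is again a matrix product, (D S P^T)_{aα}. A minimal sketch under these observations follows; the argument names are our own, and the exponent is taken exactly as printed in equation (12).

```python
import numpy as np

def expectation_step(D, P, S, pi):
    """Sketch of Eq. (12): update the a posteriori correspondence
    probabilities Q for one sample graph, given the current edge
    probabilities P, assignments S and node priors pi."""
    inner = D @ S @ P.T                       # (a, alpha): sum_{b,beta} D_ab P_alphabeta s_bbeta
    num = np.exp(inner) * pi[None, :]
    Q = num / num.sum(axis=1, keepdims=True)  # normalize over the supergraph nodes alpha
    return Q                                  # priors for the next iteration: pi = Q.mean(axis=0)
```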
4. Experiments

Figure 1. Four objects in the dataset.

In this section, we report experimental results for our generative model on real-world data. The dataset used consists of images of 4 objects, with 20 different views of each object. Example images of the objects are shown in Figure 1. We extract feature keypoints in the images using the SIFT detector [7] and construct the sample graphs by Delaunay triangulation of the detected points. To initialize the structure of the supergraph, we match pairs of graphs from the same object using the discrete relaxation algorithm outlined in [10]. The algorithm outputs the matched node correspondences and labels the unmatchable nodes with high accuracy. We merge the common structures of pairs of graphs, and the common structures over all the sample graphs are concatenated (merged) to form a supergraph. The initial supergraph constructed in this way preserves well the structural variations present in the set of sample graphs.

The first part of our experimental investigation aims to validate the supergraph learning method. We iterate the two steps of the EM algorithm 50 times, and observe both how the structure of the supergraph and how the likelihood function change with iteration number. We recover the structure of the supergraph at iteration n by setting

$$M^{(n)}_{\alpha\beta} = \begin{cases} 1 & \text{if } P^{(n)}_{\alpha\beta} > 0 \\ 0 & \text{otherwise} \end{cases} \tag{13}$$
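For concreteness, graph construction of the kind described above can be sketched with scipy's Delaunay triangulation; the keypoint coordinates are assumed to come from any feature detector, SIFT in the paper's case.

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_adjacency(points):
    """Sketch: adjacency matrix of the Delaunay triangulation of an
    (n, 2) array of keypoint coordinates."""
    tri = Delaunay(points)
    n = len(points)
    A = np.zeros((n, n), dtype=int)
    for simplex in tri.simplices:             # each simplex is a triangle (i, j, k)
        for i in range(3):
            a, b = simplex[i], simplex[(i + 1) % 3]
            A[a, b] = A[b, a] = 1
    return A
```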
Table 1. Comparison of the classification results. The bold values are the average classification rates from 10-fold cross validation, followed by their standard errors.

                        classification rate
  initial supergraph    0.663 ± 0.038
  median graph          0.575 ± 0.020
  learned supergraph    0.838 ± 0.053
Figure 2. (a) Variation of the von Neumann entropy with iteration number; (b) variation of the likelihood with iteration number.

We measure the variation of the supergraph structure using the von Neumann entropy, Entropy = −Σ_i (λ_i/2) log(λ_i/2), where the λ_i are the eigenvalues of the normalized Laplacian matrix of the supergraph, defined as L̂ = T^{−1/2}(T − M)T^{−1/2}, where T is the degree matrix of the supergraph and M is its adjacency matrix. The von Neumann entropy can be used as an indicator of the structural complexity of the supergraph. From Figure 2(a), it is clear that the von Neumann entropy of the supergraph decreases as the iteration number increases. This indicates that the supergraph structure both condenses and simplifies as the number of iterations increases. Figure 2(b) shows that the product of the a posteriori probabilities of the sample graphs, i.e. the likelihood, increases and gradually converges as the number of iterations increases. In other words, our algorithm behaves in a stable manner, both increasing the likelihood of the sample graphs and simplifying the supergraph structure.

Secondly, we evaluate the effectiveness of our generative model, learned using the EM algorithm, for classifying graphs. We learn a supergraph for each object class from a set of samples and use equation (2) to compute the a posteriori probabilities for each graph from a separate test set. The class label of a test graph is determined by the class of the supergraph which gives the maximum a posteriori probability. The classification rate is the fraction of correctly identified objects, computed using 10-fold cross validation. For comparison, we have also investigated the results obtained using two alternative constructions of the supergraph. The first of these is the initial structure concatenated from the results of discrete relaxation. The second is the median sample graph, i.e. the sample graph with the largest a posteriori probability from the supergraph. Table 1 shows the classification results obtained with the three different supergraph constructions. Among the three, the learned supergraph achieves an average classification rate of 83.8%, which is much higher than that of the initial supergraph (66.3%) and that of the median graph (57.5%).
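The entropy measure used here can be computed directly from the eigenvalues of the normalized Laplacian; the following is a small sketch under the usual convention that 0 log 0 = 0.

```python
import numpy as np

def von_neumann_entropy(M):
    """Sketch: von Neumann entropy -sum_i (lambda_i/2) log(lambda_i/2)
    of the normalized Laplacian L = T^{-1/2} (T - M) T^{-1/2} of a
    graph with adjacency matrix M."""
    deg = M.sum(axis=1).astype(float)
    inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    L = inv_sqrt @ (np.diag(deg) - M) @ inv_sqrt
    lam = np.linalg.eigvalsh(L) / 2.0
    lam = lam[lam > 1e-12]                    # discard zeros: 0 log 0 -> 0
    return float(-np.sum(lam * np.log(lam)))
```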
5. Conclusion

In this paper, we have shown how a supergraph, or generative model of graph structure, can be learned using a novel variant of the EM algorithm. In our experiments, we demonstrate that our supergraph learning method can locate a supergraph structure that is optimal or suboptimal, and we also show that the learned supergraph is effective for classification.

Acknowledgements. This work was supported by the EU FET project SIMBAD (213250). Edwin Hancock was supported by a Royal Society Wolfson Research Merit Award.
References

[1] A. D. Bagdanov and M. Worring. First order Gaussian graphs for efficient structure classification. Pattern Recognition, 36:1311–1324, 2003.
[2] A. Shokoufandeh, S. Dickinson, K. Siddiqi, and S. Zucker. Indexing using a spectral encoding of topological structure. CVPR, pages 491–497, 1999.
[3] A. Torsello. An importance sampling approach to learning structural representations of shape. CVPR, pages 1–7, 2008.
[4] A. Torsello and E. R. Hancock. Learning shape-classes using a mixture of tree-unions. PAMI, 28(6):954–967, 2006.
[5] B. Luo and E. R. Hancock. Structural graph matching using the EM algorithm and singular value decomposition. PAMI, 23(10):1120–1136, 2001.
[6] B. Luo and E. R. Hancock. A spectral approach to learning structural variations in graphs. Pattern Recognition, 39:1188–1198, 2006.
[7] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, 2004.
[8] D. White and R. C. Wilson. Parts based generative models for graphs. ICPR, pages 1–4, 2008.
[9] N. Friedman and D. Koller. Being Bayesian about network structure. Machine Learning, 50(1-2):95–125, 2003.
[10] R. C. Wilson and E. R. Hancock. Structural matching by discrete relaxation. PAMI, 19(6):634–648, 1997.
[11] S. Gold and A. Rangarajan. A graduated assignment algorithm for graph matching. PAMI, 18(4):377–388, 1996.
[12] W. J. Christmas, J. Kittler, and M. Petrou. Structural matching in computer vision using probabilistic relaxation. PAMI, 17(8):749–764, 1995.