
NONPARAMETRIC CLUSTERING USING QUANTUM MECHANICS

Nikolaos Nasios and Adrian G. Bors
Department of Computer Science, University of York, York YO10 5DD, UK.
{nn,adrian.bors}@cs.york.ac.uk

ABSTRACT

This paper introduces a new nonparametric estimation approach that can be used for data that are not necessarily Gaussian distributed. The proposed approach employs the Schrödinger partial differential equation. We assume that each data sample is associated with a quantum-physics particle that has a radial field around its value. We consider a statistical estimation approach for finding the size of the influence field around each data sample. By implementing the Schrödinger equation we obtain a potential field that is assimilated with the data density. The regions of minima in the potential are determined by calculating the local Hessian on the potential hypersurface. The quantum clustering approach is applied to the blind separation of signals and to segmenting SAR images of terrain based on surface normal orientation.

1. INTRODUCTION

While parametric methods focus on finding appropriate parameter estimates that describe a predefined density function, nonparametric methods emphasize achieving a good estimate of the density function without any underlying model assumption [1]. A nonparametric technique is unsupervised and can model any probability density function [2]. Nonparametric methods can be classified as histogram-based or kernel-based [2, 3, 4, 5]. Histogram-based approaches require a large data set to guarantee convergence. Kernel methods usually result in smooth, continuous and differentiable density estimates [1]. The main limitation of nonparametric methods is their computational complexity, due to the data size and dimensionality. Nonetheless, this is compensated by minimizing the risk of misinterpreting the data due to incorrect model specification, as may happen in parametric methods.
In our approach we assimilate data samples with particles, each manifesting an energy field that decreases when departing from its location, according to a scaling parameter. The energy field is modelled by a Gaussian kernel function. The conservation of energy for a particle in quantum mechanics is represented by the Schrödinger partial differential equation [6]. The energy of quantum particles and their orbits can be found by solving this equation. In this study we consider the reverse problem. We assume the eigenfunction (ground state) is given as a sum of Gaussians, each centred at a data point and depending on a scale parameter [3, 4]. In [3] the maxima of such a function are evaluated as cluster centers. The corresponding Schrödinger potential can be calculated for the given eigenfunction [7, 8]. The resulting potential landscape provides minima modes that correspond to data clusters. The number of minima modes depends on the scaling parameter used in defining the eigenfunction. Choosing an appropriate σ is an ongoing problem in nonparametric clustering [2, 5, 8]. In this paper we propose a new approach for estimating the scale parameter of the eigenfunction. For a random set of data, the

average Euclidean distance to a set of its neighbours is calculated. A histogram is formed with these average distances to local sets. The resulting histogram is modelled as a Gamma distribution and the scale parameter is taken as the mean of this Gamma distribution. In [7, 8] Horn and Gottlieb calculated the local minima in the potential landscape provided by the Schrödinger equation using gradient descent after appropriate thresholding. Besides depending on a threshold, such a gradient descent can easily fail to find all the modes. In our approach the local Hessian of the quantum potential is used to find the minima associated with the various modes of the energy function. The set of data samples is split into regions according to the signs of the local Hessian eigenvalues [10]. The proposed approach is applied to blind detection of modulated signals [11] and to segmenting vector fields; the vector fields represent surface normals estimated from SAR images of terrain [12]. The quantum clustering algorithm is presented in Section 2. The estimation of the scale parameter is described in Section 3, while the algorithm for identifying cluster modes is detailed in Section 4. Experimental results are provided in Section 5 and the conclusions of this study are drawn in Section 6.

2. QUANTUM CLUSTERING

One of the major problems in data modeling is that of defining clusters. Clusters can be represented using a certain model that depends on a set of parameters or by using a nonparametric approach [2, 3, 5]. In this paper we propose an unsupervised nonparametric algorithm that relies on quantum mechanics principles [6]. Each data sample is associated with a particle that is part of a quantum mechanical system and has a specific field defined around its location. The state of a quantum mechanical system is completely specified by a function ψ(X, t) that depends on the coordinates X of that particle at time t.
According to the first postulate of quantum mechanics, the probability that a particle lies in a volume element dX, located at X, at time t, is given by |ψ(X, t)|² dX [6]. Such a field can be assimilated with a Parzen window [2]. The activation field at a location X, calculated from N data samples, is given by:

ψ(X) = Σ_{i=1}^{N} exp[ −(X − X_i)² / (2σ²) ]    (1)

where X_i, i = 1, …, N, are the data samples and σ represents the scale parameter. Maxima of this function have been considered as cluster centers in a nonparametric approach in [3]. The fifth postulate of quantum mechanics states that a quantum system evolves according to the Schrödinger differential equation [6]. The time-independent Schrödinger equation is:

H ψ(X) ≡ ( −(σ²/2) ∇² + V(X) ) ψ(X) = E ψ(X)    (2)
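A minimal numerical sketch of the field (1), assuming a NumPy vectorised layout; the function name `psi_field` and the array shapes are ours, not the paper's:

```python
import numpy as np

def psi_field(X, samples, sigma):
    # Eq. (1): sum of Gaussian kernels, one centred at each data sample X_i
    X = np.atleast_2d(np.asarray(X, dtype=float))              # queries, (m, d)
    samples = np.atleast_2d(np.asarray(samples, dtype=float))  # data, (N, d)
    sq_dist = ((X[:, None, :] - samples[None, :, :]) ** 2).sum(axis=2)  # (m, N)
    return np.exp(-sq_dist / (2.0 * sigma ** 2)).sum(axis=1)
```

Note that ψ is an unnormalised Parzen estimate: dividing by N (and the kernel normalisation constant) would turn it into a density, but the minima of the potential below do not depend on that constant.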

where H is the Hamiltonian operator, E is the energy, ψ(X) corresponds to the state of the given quantum system, V(X) is the Schrödinger potential and ∇² is the Laplacian. Conventionally, the potential V(X) is given and the equation is solved to find the solutions ψ(X); such solutions can describe the electron orbits in atoms. In our case we consider the inverse problem: we assume known the locations of the data samples and their state as given by equation (1), which is taken as a solution of (2) subject to the calculation of constants, and we want to calculate the resulting potential V(X) created by the quantum system assimilated with the given data. For a single data point X_1 (N = 1), the solution of (2) for the system given by (1) is:

V(X) = (X − X_1)² / (2σ²)    (3)

which corresponds to the harmonic potential in quantum mechanics [6]. Its eigenvalue E = d/2, where d is the dimension of the given data space, is the smallest possible eigenvalue of H. We assume that the potential is always positive, V(X) > 0. After replacing ψ(X) from (1) into (2), we solve for the Schrödinger potential of the given set of data samples as [7, 8]:

V(X) = E − d/2 + 1/(2σ² ψ(X)) Σ_{i=1}^{N} (X − X_i)² exp[ −(X − X_i)² / (2σ²) ]    (4)

In the following we address the problem of using (4) for data clustering. Each cluster corresponds to a local minimum in the potential defined by equation (4). However, in order to find the number of local minima, we should be able to use an appropriate scale σ.

3. ESTIMATION OF THE SCALE PARAMETER

It can be observed that the potential V(X) from (4) depends on the scale σ. The selection of an appropriate value for σ is very important for the calculation of the number of clusters. By varying σ we can obtain different numbers of minima in the potential function [7]. Specifically, as σ decreases, more minima appear in the representation of V(X) than for higher σ values.
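The potential (4) can be evaluated from the same kernel sums as ψ. The sketch below drops the constant offset E − d/2, which shifts the landscape but does not move its minima; `quantum_potential` is our name, not the paper's:

```python
import numpy as np

def quantum_potential(X, samples, sigma):
    # Eq. (4) without the constant E - d/2: the minima locations are unchanged
    X = np.atleast_2d(np.asarray(X, dtype=float))
    samples = np.atleast_2d(np.asarray(samples, dtype=float))
    sq_dist = ((X[:, None, :] - samples[None, :, :]) ** 2).sum(axis=2)  # (m, N)
    kernels = np.exp(-sq_dist / (2.0 * sigma ** 2))
    psi = kernels.sum(axis=1)
    return (sq_dist * kernels).sum(axis=1) / (2.0 * sigma ** 2 * psi)
```

As a sanity check, for a single sample this reduces to the harmonic potential (3) up to the dropped constant.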
Nonetheless, the old minima still occur at the same places and become deeper, while the added minima lie higher on the potential surface and correspond to fewer data samples. In [7], σ was initialized to arbitrary values. In our study we propose a statistical approach for the estimation of the scale σ. Because σ describes the local data spread, it is reasonable to assume that it depends on the distribution of distances between neighbouring points. Let us first estimate the local variance:

y_i = (1/K) Σ_{k=1}^{K} ||X_(k) − X_i||²    (5)

for i = 1, …, N, where K < N is the cardinality of the data set that defines the chosen neighbourhood, and || · || denotes the Euclidean distance between a data sample X_i and a sample from its neighbourhood, ranked according to:

||X_(k) − X_i||² < ||X_(j) − X_i||²    (6)

for k = 1, …, K and j = K + 1, …, N, where X_k, X_j ≠ X_i. The estimate from (5) provides a measure of the local variance. A histogram is formed from the estimates provided by (5) for randomly selected data samples. The local data scale can be represented using a Gamma distribution, whose parameters are empirically estimated from the distribution formed by the data calculated according to (5). The Gamma
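Under our reading of (5)-(6), with X_(k) the K nearest neighbours of X_i, the local variances can be computed as follows (`local_variances` is an illustrative name; the brute-force distance matrix is for clarity, not efficiency):

```python
import numpy as np

def local_variances(data, K):
    # Eq. (5): mean squared distance from each sample to its K nearest neighbours
    data = np.asarray(data, dtype=float)                             # (N, d)
    sq_dist = ((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)  # (N, N)
    nearest = np.sort(sq_dist, axis=1)[:, 1:K + 1]  # column 0 is the self-distance
    return nearest.mean(axis=1)
```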

distribution depends on two parameters and is given by [9]:

p(y) = y^{α−1} e^{−y/β} / (β^α Γ(α))    (7)

where α > 0 is the shape parameter, which allows the function to take a variety of shapes, β > 0 is the Gamma scale parameter, which stretches or compresses the density function when greater or smaller than one, respectively, and Γ(·) represents the Gamma function:

Γ(t) = ∫_0^∞ r^{t−1} e^{−r} dr    (8)

The parameters α and β are estimated from the empirical distribution of local variances (5). The method-of-moments estimators for the Gamma distribution first calculate the sample mean and standard deviation of the distribution, denoted ȳ and s, respectively. The parameters are then estimated as [9]:

α̂ = (ȳ/s)² ;  β̂ = s²/ȳ    (9)

After inferring the probability density associated with the distances to neighbourhoods y_i (5) for the given data set, we take the estimate σ̂ as the mean of the Gamma distribution:

σ̂ = α̂ β̂    (10)

where α̂ and β̂ are estimated in (9).

4. FINDING LOCAL MINIMA IN THE QUANTUM POTENTIAL OF DATA

After finding an appropriate σ̂ we implement the Schrödinger equation (4) and obtain the quantum potential V(X). The quantum potential can be viewed as a landscape in the (d + 1)th dimension. Let us assume a regular orthogonal lattice z defined between the extreme data entries along each dimension k = 1, …, d. A potential V(z) is calculated on this lattice. The distance between two consecutive lattice knots is constant along each direction and taken equal to ||z_{i,j} − z_{i,j−1}|| = σ̂²/2, where σ̂ is estimated in (10). Data clusters correspond to minima in the resulting potential landscape. Local minima, maxima and saddle points can be found according to the local Hessian calculated on the quantum potential. The Hessian entries are given by:

H[V(z)] = ( ∂²V(z) / ∂x ∂y ),  x, y = 1, …, d    (11)

The evaluation of V(z) on a regular lattice as described above facilitates the calculation of the local Hessian by considering simple differences of the potential values at neighbouring lattice points along each data dimension. The eigendecomposition of the Hessian matrix provides:

H · T = T · Λ    (12)

where · denotes matrix multiplication, T is a matrix whose columns are the eigenvectors, and Λ is a diagonal matrix containing the eigenvalues λ_i, i = 1, …, d.
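The moment estimators (9) and the scale (10) are straightforward; note that the product α̂β̂ collapses algebraically to the sample mean of the local variances, so (10) is simply ȳ. `estimate_sigma` is our name for this sketch:

```python
import numpy as np

def estimate_sigma(y):
    # Eqs. (9)-(10): method-of-moments Gamma fit, then sigma_hat = alpha_hat * beta_hat
    y = np.asarray(y, dtype=float)
    ybar, s = y.mean(), y.std(ddof=1)   # sample mean and standard deviation
    alpha_hat = (ybar / s) ** 2
    beta_hat = s ** 2 / ybar
    return alpha_hat * beta_hat         # algebraically equal to ybar
```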
The signs of the eigenvalues of the Hessian matrix are used to identify the saddle points [10]. A saddle point occurs where there is simultaneously at least one local maximum and one local minimum along two mutually perpendicular directions. Following the eigendecomposition, we can identify local minima, maxima and saddle points according to the signs of the eigenvalues at all the lattice knots:

λ_i(z) > 0 ∀ i = 1, …, d : local minimum    (13)
λ_i(z) < 0 ∀ i = 1, …, d : local maximum    (14)
∃ λ_j(z) > 0 ∧ ∃ λ_i(z) < 0, i ≠ j : saddle point    (15)

A common-sense assumption is that local minima are surrounded by saddle points [5]. In the following it is assumed that every compact area of lattice points whose potential fulfils the local-minimum condition (all eigenvalues positive), surrounded by local maxima and saddle points, corresponds to a cluster. Consequently, the quantum potential landscape is split into several regions and the number of clusters can be calculated. The data points are assigned to clusters through their nearest knot on the lattice.
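For a 2-D lattice, the classification (13)-(15) can be sketched with finite-difference second derivatives; `classify_lattice` and its labels (+1 minimum, −1 maximum, 0 saddle or degenerate) are our conventions, not the paper's:

```python
import numpy as np

def classify_lattice(V):
    # Eqs. (13)-(15): label lattice knots by the signs of the Hessian eigenvalues
    Vxx = np.gradient(np.gradient(V, axis=0), axis=0)
    Vyy = np.gradient(np.gradient(V, axis=1), axis=1)
    Vxy = np.gradient(np.gradient(V, axis=0), axis=1)
    labels = np.zeros(V.shape, dtype=int)
    for i in range(V.shape[0]):
        for j in range(V.shape[1]):
            H = np.array([[Vxx[i, j], Vxy[i, j]],
                          [Vxy[i, j], Vyy[i, j]]])
            eig = np.linalg.eigvalsh(H)
            if np.all(eig > 0):
                labels[i, j] = 1        # local minimum, eq. (13)
            elif np.all(eig < 0):
                labels[i, j] = -1       # local maximum, eq. (14)
            # mixed signs (eq. 15) or zero eigenvalues stay labelled 0
    return labels
```

Connected components of the +1 regions then give the candidate clusters.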

and V(X) from (4). The results for the scale σ̂ estimated according to the algorithm from Section 3 (σ̂ = 0.6327 for 4-QAM, σ̂ = 0.27 for 8-PSK) are in bold. The number of clusters has been identified correctly as 4 for 4-QAM and 8 for 8-PSK, respectively. As can be observed from these tables, the quantum potential V(X) provides better estimates than the function ψ(X) for the blind separation of signals.

√2·σ                     0.5    0.6    0.6327   0.7    0.8
Misclassif.      ψ(X)    4.37   4.58   6.56     6.04   10.0
Error (%)        V(X)    4.79   5.10   6.15     8.13   9.69

5. EXPERIMENTAL RESULTS

Table 1. Misclassification error when varying σ for 4-QAM.

√2·σ                     0.2    0.25   0.27   0.3    0.35
Misclassif.      ψ(X)    2.29   1.98   2.71   3.02   9.17
Error (%)        V(X)    1.46   2.50   2.71   6.67

The quantum clustering algorithm is applied for blind detection of modulated signals and for topographical segmentation of SAR images of terrain. We consider two cases of modulated signals: quadrature amplitude modulated (QAM) signals and phase-shift-keying (PSK) modulated signals [11]. The perturbation channel equations for 8-PSK signals, assuming interference, are:

Table 2. Misclassification error when varying σ for 8-PSK.

x_I(t) = I(t) + 0.2 I(t−1) − 0.2 Q(t) − 0.04 Q(t−1) + N
x_Q(t) = Q(t) + 0.2 Q(t−1) + 0.2 I(t) + 0.04 I(t−1) + N


Fig. 1. Signal constellations: (a) 4-QAM; (b) 8-PSK.

Neighbourhoods of K = N/4 are assumed; Gamma distributions of the distances from data to their K-neighbourhoods are formed and their mean is used to estimate the parameter σ̂ as in (10). The quantum potential calculated on a regular 2D lattice after applying (4) to the data sets from Figure 1 is shown in Figures 2a and 2b for 4-QAM and 8-PSK, respectively. From Figure 2 it can be observed that 4 minima are found in the case of 4-QAM and 8 minima in the case of 8-PSK. The potential surface is split into regions according to the signs of the local Hessian eigenvalues (13)-(15). In Figure 1 the data points corresponding to regions associated with minima are marked with "+", while all the others are marked with "·". Tables 1 and 2 present numerical results for 4-QAM and 8-PSK, respectively, when assuming various values for σ. The misclassification error is evaluated by generating random data according to (x_I(t), x_Q(t)), considering the same noise distribution as that used in the training, and classifying the data samples according to the segmentation given by the potential function modes. The misclassification error is considered for both ψ(X), provided in (1),

Fig. 2. Quantum potential surfaces: (a) 4-QAM; (b) 8-PSK.


where x_I(t) and x_Q(t) make up the in-phase and in-quadrature signal components at time t on the communication line, and I(t) and Q(t) correspond to the signal symbols (there are eight signal symbols in 8-PSK, equidistantly located on a circle). The noise, denoted by N, is Gaussian and corresponds to an SNR of 22 dB. We have generated 960 signals, assuming equal probabilities for all inter-symbol combinations. For 4-QAM signals we assume only additive noise, with an SNR of 8 dB. The resulting signal constellations are displayed in Figures 1a and 1b.
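The 8-PSK data generation above can be sketched as follows; the unit-power symbol assumption and the resulting noise standard deviation for SNR = 22 dB are ours, as is the random seed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Eight 8-PSK symbols, equidistantly located on the unit circle
phases = 2.0 * np.pi * np.arange(8) / 8.0
idx = rng.integers(0, 8, size=961)          # one extra symbol for the t-1 terms
I, Q = np.cos(phases[idx]), np.sin(phases[idx])

# Additive Gaussian noise at SNR = 22 dB (unit symbol power assumed)
noise_std = np.sqrt(10.0 ** (-22.0 / 10.0))
n_I = noise_std * rng.standard_normal(960)
n_Q = noise_std * rng.standard_normal(960)

# Perturbation channel equations for 8-PSK with inter-symbol interference
xI = I[1:] + 0.2 * I[:-1] - 0.2 * Q[1:] - 0.04 * Q[:-1] + n_I
xQ = Q[1:] + 0.2 * Q[:-1] + 0.2 * I[1:] + 0.04 * I[:-1] + n_Q
```

The pairs (xI, xQ) form the 960-sample constellation that is fed to the clustering algorithm.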


In another application we consider a Synthetic Aperture Radar (SAR) image representing terrain information, shown in Figure 3a. We want to identify various topographic regions in this image by clustering the local surface orientation. In [12] it was shown how the surface normals are estimated and smoothed from this SAR image. The resulting surface normals are depicted as needle maps in Figure 3b and as data samples in Figure 4b. As can be observed from Figure 4b, the surface normals do not have a Gaussian distribution. The x and y coordinates of 1518 local estimates of surface normals are used in the quantum clustering algorithm. The histogram of local neighbourhood distances for K = N/4, fitted to a Gamma distribution, is shown in Figure 4a. The data classified as local minima according to the local Hessian are marked with "+", while those corresponding to saddle points and local maxima are represented by "·" in Figure 4b. A total of 12 clusters, each corresponding to a compact minima region in the quantum potential, are identified, but some of them are afterwards discarded because they contain too few data samples. The surface obtained by representing ψ(z) from (1) is shown in Figure 5b, while the quantum potential surface V(z) from (4) is displayed in Figure 5a. The same estimate of σ̂, shown in Figure 4a as the mean of the Gamma distribution, has been used for both functions. The vector field of surface normals is segmented based on the vector similarity in each compact area corresponding to a local minimum in the potential landscape. The segmentation is done by identifying each surface normal vector with the closest lattice knot that corresponds to a local minimum:

V_m = { X_j | m = arg min_{k=1,…,M} ||X_j − z_k||, λ_i(z_k) > 0, i = 1, 2 }    (16)

where V_m represents a segmented topographic area, X_j is a surface normal vector, and z_k is a lattice point associated with an area of local minimum k, according to the eigenvalues of the Hessian λ_i(z_k). The segmentation into topographic regions of the SAR image from Figure 3a, when using the quantum potential, is shown in Figures 6a and 6c, while that obtained from ψ(z) is displayed in Figures 6b and 6d, for 4 and 7 clusters, respectively. In the segmented image we can identify several compact topographical areas. The segmentation fits well to the actual terrain features. It can be observed from Figure 6 that the segmentation of the potential modelled by ψ(z) provides more compact and fewer topographic regions than when employing V(z) for this vector field. Both functions can be used in a hierarchical cluster representation.
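A simplified sketch of the assignment rule (16): here each local-minimum knot acts as its own cluster representative, whereas in the paper knots belonging to one compact minima region share a label; `assign_to_minima` and its inputs are illustrative names:

```python
import numpy as np

def assign_to_minima(vectors, minima_knots):
    # Eq. (16), simplified: each vector X_j goes to the nearest lattice knot z_k
    # whose Hessian eigenvalues are all positive (a local-minimum knot)
    vectors = np.atleast_2d(np.asarray(vectors, dtype=float))          # (n, d)
    minima_knots = np.atleast_2d(np.asarray(minima_knots, dtype=float))  # (M, d)
    sq_dist = ((vectors[:, None, :] - minima_knots[None, :, :]) ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)
```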

(a) 4 clusters from V (X).

(b) 4 clusters from ψ(X).

(c) 7 clusters from V (X).

(d) 7 clusters from ψ(X).

Fig. 6. Topographical segmentation of the SAR image.

Fig. 3. Synthetic Aperture Radar image representing terrain: (a) SAR image; (b) surface normals.

7. REFERENCES

Fig. 4. Data processed from the surface normal vector field: (a) distribution of y_i, with the mean of the fitted Gamma distribution marked; (b) data clustering.

(a) Generated from V(z); (b) generated from ψ(z).

Fig. 5. Potential produced by the vectorial data.

6. CONCLUSIONS

This paper proposes a new methodology for nonparametric segmentation. A quantum clustering algorithm that employs the Schrödinger partial differential equation for calculating the potential at a given location is proposed. The local scale parameter is estimated as the mean of the Gamma distribution modeling the average distances from data to their local neighbourhoods. Clusters correspond to local minima in the quantum potential hypersurface. The resulting quantum potential landscape is segmented according to the signs of the eigenvalues of the local Hessian. The proposed algorithm is applied to blind detection of modulated signals and to segmenting vector fields of surface normals.

[1] R. O. Duda, P. E. Hart, D. G. Stork, Pattern Classification, Wiley, 2000.
[2] E. Parzen, "On estimation of a probability density function and mode," Ann. Math. Stat., vol. 33, pp. 1065-1076, 1962.
[3] S. J. Roberts, "Parametric and non-parametric unsupervised cluster analysis," Pattern Recognition, vol. 30, no. 2, pp. 261-272, 1997.
[4] A. Elgammal, R. Duraiswami, L. Davis, "Efficient kernel density estimation using the Fast Gauss Transform with applications to color modeling and tracking," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 11, pp. 1499-1504, 2003.
[5] D. Comaniciu, V. Ramesh, A. del Bue, "Multivariate saddle point detection for statistical clustering," Proc. European Conf. on Computer Vision (ECCV), Copenhagen, Denmark, 2002, vol. 3, pp. 561-576.
[6] S. Gasiorowicz, Quantum Physics, Wiley, 1996.
[7] D. Horn, A. Gottlieb, "The method of quantum clustering," Advances in Neural Information Processing Systems (NIPS) 14, 2001, pp. 769-776.
[8] D. Horn, A. Gottlieb, "Algorithm for data clustering in pattern recognition problems based on quantum mechanics," Physical Review Letters, vol. 88, no. 1, art. no. 018702, pp. 1-4, Jan. 2002.
[9] A. Gelman, J. B. Carlin, H. S. Stern, D. B. Rubin, Bayesian Data Analysis, Chapman & Hall, 1995.
[10] R. M. Haralick, L. T. Watson, T. J. Laffey, "The topographic primal sketch," International Journal of Robotics Research, vol. 2, no. 1, pp. 50-72, 1983.
[11] N. Nasios, A. G. Bors, "Variational expectation-maximization training for Gaussian networks," Proc. IEEE Workshop on Neural Networks for Signal Processing, Toulouse, France, 2003, pp. 339-348.
[12] A. G. Bors, E. R. Hancock, R. C. Wilson, "Terrain analysis using radar shape-from-shading," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 8, pp. 974-992, 2003.