Similarity Measure for Vector Field Learning

Hongyu Li and I-Fan Shen

Department of Computer Science and Engineering, Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai, China
{hongyuli, yfshen}@fudan.edu.cn

Abstract. Vector data, which carry direction and magnitude information in addition to position information, differ from common point data, which carry position information only. General similarity measures for point data, such as Euclidean distance, are therefore not suitable for vector data, and a new measure is needed to estimate the similarity between vectors. The similarity measure defined in this paper combines Euclidean distance with angle and magnitude differences. Based on this measure, we construct a vector field space on which a modified locally linear embedding (LLE) algorithm is used for vector field learning. Our experimental results show that the proposed similarity measure works better than traditional Euclidean distance.

1 Introduction

With the increasing precision of computational simulation and Computational Fluid Dynamics (CFD), very large vector data sets have become readily available. As a result, the problem of vector field learning has become a significant concern in the exploratory study of visualizing vector data [1]. The main task of vector field learning is to extract significant features contained implicitly in vector fields. Once such features are found with learning methods such as principal component analysis (PCA) [2] or locally linear embedding (LLE) [3], we can easily cluster or classify vector data [4, 5, 6, 7], and segment [8] or simplify [9, 10] vector fields on the feature space.

Vector data differ from common point data in that they include direction and magnitude information in addition to position information. Methods for dealing with vector data should therefore differ from those for common point data. The traditional way to compute the similarity between point data is based on Euclidean distance, which is clearly inappropriate for vector data, so it is necessary to define a new similarity measure to study the affinity between vector data. The measure proposed in this paper combines the direction and magnitude information with Euclidean distance in the form of a linear combination, which corresponds to a vector field space (VF space). On this space, significant features of vector data can easily be extracted using a modified LLE algorithm.

The remainder of the paper is organized as follows. Section 2 introduces a similarity measure for vector data. In Section 3, a VF space is constructed for vector field learning. Experimental results are presented in Section 4. Finally, Section 5 ends with some conclusions.


2 Similarity Measure of Vector Data

For common point data (for example, Fig. 1(a)), neighborhood selection is in general simply based on Euclidean distance. This measure may be inappropriate for vector data (for example, Fig. 1(b)), since such data carry other important information besides the spatial information.

[Figure 1: (a) discrete point data; (b) discrete vector data]

Fig. 1. Discrete point and vector data. Vector data include direction and magnitude information in addition to position information.

2.1 Euclidean Distance

Given n points {p_i} with spatial coordinates {x_i, y_i}, the similarity between p_i and p_j is traditionally defined in terms of Euclidean distance:

    s(p_i, p_j) = e^{-d(p_i, p_j) / σ_1^2}    (1)

which is essentially a Gauss kernel. Here d(p_i, p_j) represents the Euclidean distance between p_i and p_j, and the scaling parameter σ_1 is a positive real constant. Under this measure, nearby points with smaller Euclidean distance are assigned higher similarity, so the neighborhood of each point is completely determined by Euclidean distance. A problem arises in some cases, however: when data points are sampled from several different but close manifolds, points with small mutual distance may come from different manifolds. That is, the data points in the local neighborhood of a given point may actually belong to different sets, and if such a neighborhood is used for learning, the results can be poor. For example, the data points (p_1, ..., p_7) in Fig. 2, which contain only spatial coordinates, are sampled from two 1-d manifolds. Although the spatial distance between p_1 and p_7 is smaller than that between p_1 and p_2, p_1 and p_2 actually belong to the same class and have higher similarity. Clearly, determining the similarity measure purely by Euclidean distance is not beneficial for extracting the features of manifolds.
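For concreteness, measure (1) can be computed in a few lines. The following is a minimal Python sketch (our own illustration, not code from the paper); the function name and the default σ_1 = 1 are assumptions made for the example:

```python
import numpy as np

def euclidean_similarity(p_i, p_j, sigma1=1.0):
    """Measure (1): a Gauss kernel of the Euclidean distance.

    p_i, p_j are 2-d positions; sigma1 is the positive scaling constant.
    (Illustrative sketch; sigma1 = 1 is an assumed default.)
    """
    d = np.linalg.norm(np.asarray(p_i, float) - np.asarray(p_j, float))
    return np.exp(-d / sigma1**2)
```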


Fig. 2. Angular difference between vectors, used to compute the similarity

2.2 Angular Difference

If vector information at each point is provided, as in Fig. 2, the point most similar to p_1 is clearly p_2, not p_7, since the vectors at p_1 and p_2 have closer directions. Therefore, angle information should also be included in the similarity measure. Given a set of points with vector information, the similarity between any two points p_i and p_j can be rewritten in the form of a linear weighting:

    s(p_i, p_j) = α e^{-d(p_i, p_j)} + β e^{-(1 - a(p_i, p_j))}    (2)

where α and β are positive real constants with α + β = 1, and a(p_i, p_j) = v_i · v_j / (|v_i||v_j|) is the cosine of the angle between the vectors at p_i and p_j; here v_i is the vector at point p_i and |·| denotes the norm of a vector. The smaller the angle between the vectors at two points, the larger the similarity between the two data points. In Fig. 2, the points (p_1, ..., p_7) should be divided into two sets: (p_1, ..., p_4) belong to set 1 and (p_5, ..., p_7) to set 2. According to measure (2), there is clearly higher intrinsic similarity among the points of either set, and lower similarity between points of different sets.

2.3 Magnitude Difference

Besides the Euclidean and angular distances, magnitude is an important factor influencing the similarity between data, so it should also be incorporated in the measure. Formula (2) can thus be extended as follows:

    s(p_i, p_j) = α e^{-d(p_i, p_j)} + β e^{-(1 - a(p_i, p_j))} + γ e^{-m(p_i, p_j)}    (3)

where γ is a positive real constant, α + β + γ = 1, and m(p_i, p_j) = ||v_i| - |v_j|| represents the difference between the magnitudes of the vectors at p_i and p_j.
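To illustrate, measure (3) follows directly from its three terms. The sketch below is ours, not from the paper; the equal weights α = β = γ = 1/3 are purely illustrative defaults, and the vectors are assumed nonzero so that the cosine term is well defined:

```python
import numpy as np

def vector_similarity(p_i, v_i, p_j, v_j, alpha=1/3, beta=1/3, gamma=1/3):
    """Measure (3): weighted sum of distance, angle and magnitude terms.

    Assumes alpha + beta + gamma = 1 and nonzero vectors v_i, v_j.
    (Illustrative sketch; the equal default weights are our assumption.)
    """
    p_i, v_i = np.asarray(p_i, float), np.asarray(v_i, float)
    p_j, v_j = np.asarray(p_j, float), np.asarray(v_j, float)
    d = np.linalg.norm(p_i - p_j)                                # d(p_i, p_j)
    a = v_i @ v_j / (np.linalg.norm(v_i) * np.linalg.norm(v_j))  # a(p_i, p_j), cosine of the angle
    m = abs(np.linalg.norm(v_i) - np.linalg.norm(v_j))           # m(p_i, p_j)
    return alpha * np.exp(-d) + beta * np.exp(-(1.0 - a)) + gamma * np.exp(-m)
```

When i = j, all three exponents vanish and the similarity attains its maximum α + β + γ = 1, which matches the analysis in Section 2.4 below.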


2.4 Analysis

In this paper we use the similarity measure (3). The measure has the property that the similarity attains its maximum of 1 when i = j. Furthermore, as α (β or γ) increases gradually from 0 to 1, the role that Euclidean distance (angle or magnitude) plays in the measure grows steadily, until the measure depends entirely on that component when the parameter reaches 1. Note that the similarity measure (3) is specially designed for vector data: it is built on a vector field space, which differs from the traditional Euclidean space on which Euclidean distance is defined.

3 Vector Field Learning

3.1 Construction of Vector Field Space

From n points p_i = {x_i, y_i} with vector information v_i = {v_i^x, v_i^y}, we can construct a vector field space τ, where τ_i = (α x_i, α y_i, β κ_i^x, β κ_i^y, γ m_i). Here m_i = |v_i| represents the magnitude of the vector, κ_i^x = v_i^x / m_i and κ_i^y = v_i^y / m_i denote the cosines of the angles between v_i and the coordinate axes, and the weights α, β and γ are the same as those in (3). Note that if the measure (3) is used in a learning method, the learning process must work on a vector field space τ.

3.2 Nonlinear Learning

Traditional linear learning methods such as PCA and MDS perform poorly on data with a nonlinear distribution. In recent years, locally linear embedding (LLE) has become popular because of its simplicity and its better performance on nonlinear manifolds. The original LLE method, however, is not suitable for vector data, so we modify its first step: the criterion for determining the neighborhood of a data point is not Euclidean distance but the similarity measure (3), and the points with the largest mutual similarity are considered nearest neighbors. In addition, we replace Euclidean space with the vector field space to extract features of vector data. The modified LLE algorithm for vector fields is discussed in detail in [11]; a rough sketch follows below.
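As an illustration of Sections 3.1 and 3.2 (our own sketch, reusing vector_similarity from the sketch above), the VF-space construction and the similarity-based neighborhood selection of the modified first LLE step might look as follows; the remaining LLE steps (solving for reconstruction weights and computing the embedding) are unchanged and omitted here:

```python
import numpy as np

def vf_space(P, V, alpha, beta, gamma):
    """Build tau_i = (alpha*x_i, alpha*y_i, beta*kx_i, beta*ky_i, gamma*m_i)
    from positions P (n x 2) and nonzero vectors V (n x 2)."""
    m = np.linalg.norm(V, axis=1)         # magnitudes m_i = |v_i|
    K = V / m[:, None]                    # direction cosines (kx_i, ky_i)
    return np.hstack([alpha * P, beta * K, gamma * m[:, None]])

def neighbors_by_similarity(P, V, k, alpha=1/3, beta=1/3, gamma=1/3):
    """Modified first LLE step: for each point, select the k points
    with the largest similarity under measure (3)."""
    n = len(P)
    S = np.array([[vector_similarity(P[i], V[i], P[j], V[j], alpha, beta, gamma)
                   for j in range(n)] for i in range(n)])
    np.fill_diagonal(S, -np.inf)          # a point is not its own neighbor
    return np.argsort(-S, axis=1)[:, :k]  # indices of the k most similar points
```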

4 Experimental Results

The point data shown in Fig. 1(a) are actually sampled from a 1-d manifold. However, a curve reconstructed from such a data set does not always have a shape similar to the original manifold if only the position information of the points is provided; Fig. 3(a) shows one of the possible curves reconstructed from the point data. If the tangent vector at each point is provided, as in Fig. 1(b), the geometrical structure of the data set is unique and definite, as in Fig. 3(b). When the proposed similarity measure (3) is applied to the vector data set (Fig. 1(b)) to estimate the neighborhood of each vector, the significant feature of the vector data can be effectively found using the modified LLE algorithm, and the original manifold curve can be appropriately unfolded in the feature space.


[Figure 3: (a) one of the possible curves reconstructed from point data; (b) a unique curve determined by vector data]

Fig. 3. Curves reconstructed from point data and from vector data. Clearly the curve obtained from vector data is unique and definite; many different curves, however, can be reconstructed from the point data.


[Figure 4: (a) the result obtained with the original LLE on Euclidean space; (b) the result obtained with the modified LLE on a vector field space]

Fig. 4. Comparison of the results of unfolding the vector data shown in Fig. 1(b). (a) Only spatial position information is utilized, and the similarity measure is based entirely on Euclidean distance. (b) Vector information at each point is also incorporated into the similarity measure.

Fig. 4(b) shows the extracted feature, along which the vector data on the 1-d manifold are arranged in good order. However, if we use Euclidean distance as the measure and execute the original LLE algorithm on Euclidean space, the result becomes disordered: Fig. 4(a) shows the result based on Euclidean distance, where the vector data appear orderless.

5 Conclusions

Vector data distributed on a low-dimensional manifold provide more information about the geometric structure of the data than common point data do. To determine


the neighborhood of each vector datum appropriately, a novel measure different from Euclidean distance is proposed to estimate the similarity between vectors. In addition, we design a vector field space that incorporates vector information, on which the modified LLE algorithm can effectively extract significant features of vector data. Furthermore, the proposed similarity measure is also applicable to the analysis of optical flow in computer vision and to surface reconstruction in computer graphics.

Acknowledgments

This research work is supported by the National Natural Science Foundation of China under Grant No. 60473104.

References

1. Scheuermann, G., Hamann, B., Joy, K.I., Kollmann, W.: Visualizing local vector field topology. SPIE Journal of Electronic Imaging 9 (2000) 356–367
2. Tipping, M.E., Bishop, C.M.: Mixtures of probabilistic principal component analyzers. Neural Computation 11 (1999) 443–482
3. Roweis, S., Saul, L.: Nonlinear dimensionality reduction by locally linear embedding. Science 290 (2000) 2323–2326
4. Li, H., Chen, W., Shen, I.F.: Supervised learning for classification. In: FSKD (2). Volume 3614 of Lecture Notes in Computer Science. (2005) 49–57
5. Chen, J.L., Bai, Z., Hamann, B., Ligocki, T.J.: A normalized-cut algorithm for hierarchical vector field data segmentation. In: Proc. of Visualization and Data Analysis 2003. (2003)
6. Garcke, H., Preusser, T., Rumpf, M., Telea, A., Weikard, U., Wijk, J.J.V.: A continuous clustering method for vector fields. In Ertl, T., Hamann, B., Varshney, A., eds.: Proc. of IEEE Visualization 2000. (2000) 351–358
7. Heckel, B., Uva, A.E., Hamann, B.: Clustering-based generation of hierarchical surface models. In Wittenbrink, C., Varshney, A., eds.: Proc. of IEEE Visualization 1998 (Hot Topics). (1998) 50–55
8. Li, H., Chen, W., Shen, I.F.: Segmentation of discrete vector fields. IEEE Transactions on Visualization and Computer Graphics (2006) (to appear)
9. Tricoche, X., Scheuermann, G., Hagen, H.: A topology simplification method for 2D vector fields. In Ertl, T., Hamann, B., Varshney, A., eds.: Proc. of IEEE Visualization 2000. (2000) 359–366
10. Telea, A.C., Wijk, J.J.V.: Simplified representation of vector fields. In Ebert, D., Gross, M., Hamann, B., eds.: Proc. of IEEE Visualization 1999. (1999) 35–42
11. Li, H., Shen, I.F.: Manifold learning of vector fields. In: ISNN 2006. Lecture Notes in Computer Science (2006) (to appear)