Manifold Learning of Vector Fields

Hongyu Li and I-Fan Shen

Department of Computer Science and Engineering, Fudan University, Shanghai, China
{hongyuli, yfshen}@fudan.edu.cn

Abstract. In this paper, vector field learning is proposed as a new application of manifold learning to vector fields. We also provide a learning framework to extract significant features from vector data. Vector data, which contains position, direction, and magnitude information, differs from common point data, which contains only position information. The locally linear embedding (LLE) algorithm is extended to deal with vector data. The learning ability of the extended version has been tested on synthetic data sets, and the experimental results demonstrate that the method is helpful and promising. The manifold features of vector data obtained by learning methods can be used in subsequent tasks such as classification, clustering, visualization, and segmentation of vectors.
1 Introduction

Whether in optical flow analysis in computer vision, surface reconstruction in computer graphics, or flow simplification in computational fluid dynamics (CFD), vector data is always a primary research object. Unlike common point data, vector data includes direction and magnitude information in addition to position information. Precisely because much more information can be obtained from it, vector data is more worth studying and analyzing than common point data.

In recent years, many efficient and feasible methods have been proposed to deal with vector data. Telea and van Wijk [1] developed a bottom-up approach for clustering vector data. Heckel et al. [2] considered a top-down approach to segmenting vector fields using principal component analysis. Garcke et al. [3] proposed a continuous clustering method for vector fields based on diffusion. A modified normalized-cut algorithm was successfully extended to hierarchical vector field segmentation by Chen et al. [4]. Li et al. [5] proposed a novel approach to partitioning 2D discrete vector fields based on the discrete Hodge decomposition. The basic principle of all these methods is essentially to learn, or mine, significant information from a set of vector data, which we call vector field learning (VFL). The problem of vector field learning has become a significant concern in the exploratory study of vector data [1, 4, 6, 7].

In this paper, we explain the concept of vector field learning in detail and propose a basic learning framework. In particular, we extend the classic nonlinear manifold learning method, locally linear embedding (LLE) [8], to a modified version for vector data.

The remainder of the paper is organized as follows. Section 2 introduces the concept of vector field learning and presents the extended version of LLE for vector data. In Section 3, the manifold features of vector data are analyzed in detail on a synthetic data set.
Fig. 1. Three sets of discrete vectors sampled from different circles. The direction of each vector represents the tangent direction at that point on the circumference.
Some experimental results are presented in Section 4, and Section 5 gives some conclusions.
2 Manifold Learning of Vector Fields

In differential geometry, a vector field can be defined as a map f : R^n → R^n that assigns to each point x in R^n a vector f(x). For practical applications, discrete vector fields are more commonly considered. For instance, the 2-D discrete vector field shown in Fig. 1 corresponds to the map f(x, y) = (−y, x). The discrete points in the figure are sampled from the boundaries of three concentric circles, and the vector at each point is the tangent vector at that position. A discrete vector field is essentially a set of vectors.

Vector data differs from common point data in its representational ability: given a set of common points, the geometrical meaning they represent is usually indefinite; if a vector is additionally provided at each point, the implicit meaning of the set usually becomes completely definite. As in Fig. 1, if we consider only the 2-D Euclidean coordinates of the points, it is difficult to tell that they were obtained from the boundaries of three circles. But once a vector is given at each point, the meaning represented by the data set becomes clear and definite. Vector fields therefore provide more useful information to study. Vector field learning aims to extract the significant information contained implicitly in vector data. In this paper, we use the classical manifold learning method, locally linear embedding (LLE), for feature extraction from vector fields. In addition, we need a similarity measure [7] to judge the affinity among vectors, and a vector field space in which to extend the LLE algorithm.

2.1 Extended Locally Linear Embedding

Traditional linear learning methods such as PCA and MDS [9, 10] usually perform poorly on data with a nonlinear distribution. In recent years, locally linear embedding (LLE) has gradually become popular because of its simplicity and its better performance on nonlinear manifolds.
Fig. 2. Two features extracted from the vector field in Fig. 1. (a): the first feature; (b): the second feature.
The original LLE method, however, is unable to deal with vector data, so we modify its first step. In the modified version, the neighborhood of a vector is no longer determined by Euclidean distance but by the similarity measure proposed in [7]. Given n points x_i with vectors v_i, the similarity between x_i and x_j is

s(x_i, x_j) = \alpha e^{-d(x_i, x_j)} + \beta e^{1 - a(v_i, v_j)} + \gamma e^{-m(v_i, v_j)},    (1)
where d(x_i, x_j) is the Euclidean distance between the two points, a(v_i, v_j) is the angular distance between the two vectors, m(v_i, v_j) is the difference between the magnitudes of the two vectors, and \alpha + \beta + \gamma = 1. For the exact definitions of these terms, please refer to [7]. The vectors with the largest mutual similarity are taken as nearest neighbors.

In addition, we replace Euclidean space with a vector field space in which to extract the features of vector data. In general, for a vector field constructed on a space R^n, the corresponding vector field space has 2n + 1 dimensions: the n-dimensional Euclidean coordinates, the n cosines of the angles between the vectors and the coordinate axes, and the 1-dimensional vector magnitude.

The modified LLE algorithm is briefly summarized as follows (a code sketch is given after the list):

1. Given N points x_i with vectors v_i, construct a vector field space \tau;
2. Find the k most similar neighbors of each vector datum according to the similarity measure (1);
3. Compute the weights w_{ij} in the vector field space \tau by minimizing the reconstruction error

\varepsilon_1(W) = \sum_i \Big| \tau_i - \sum_j w_{ij} \tau_{ij} \Big|^2,

where \tau_{ij} is the j-th most similar neighbor of \tau_i. The minimization is performed subject to the constraint \sum_j w_{ij} = 1 for all i;
4. Find coordinates \xi_i in a low-dimensional feature space by minimizing a second error function,

\varepsilon_2(\Xi) = \sum_i \Big| \xi_i - \sum_j w_{ij} \xi_{ij} \Big|^2.
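The sketch below illustrates these steps in Python/NumPy. It is a minimal sketch, not the authors' implementation: the exact definitions of d, a, and m are given in [7], so the forms used here (Euclidean distance, one minus the cosine of the angle, absolute magnitude difference) and the weights α = 0.4, β = 0.4, γ = 0.2 are illustrative assumptions.

```python
import numpy as np

def vector_field_space(X, V, eps=1e-12):
    """Step 1: lift points X (N, n) and vectors V (N, n) into the
    (2n+1)-D vector field space: coordinates, direction cosines, magnitude."""
    mag = np.linalg.norm(V, axis=1, keepdims=True)
    cosines = V / (mag + eps)                    # cos of angle with each axis
    return np.hstack([X, cosines, mag])

def similarity(xi, xj, vi, vj, alpha=0.4, beta=0.4, gamma=0.2, eps=1e-12):
    """Similarity measure of Eq. (1). The forms of d, a, m are plausible
    stand-ins; the exact definitions appear in [7]."""
    d = np.linalg.norm(xi - xj)                           # positional distance
    cos = vi @ vj / (np.linalg.norm(vi) * np.linalg.norm(vj) + eps)
    a = 1.0 - cos                                         # angular distance
    m = abs(np.linalg.norm(vi) - np.linalg.norm(vj))      # magnitude difference
    return alpha * np.exp(-d) + beta * np.exp(1.0 - a) + gamma * np.exp(-m)

def modified_lle(X, V, k=8, n_components=2, reg=1e-3):
    """Steps 1-4: extended LLE with neighbors chosen by Eq. (1) and
    reconstruction performed in the vector field space tau."""
    N = X.shape[0]
    tau = vector_field_space(X, V)
    # Step 2: k most similar neighbors of every vector datum.
    S = np.array([[similarity(X[i], X[j], V[i], V[j]) for j in range(N)]
                  for i in range(N)])
    np.fill_diagonal(S, -np.inf)                 # a datum is not its own neighbor
    nbrs = np.argsort(-S, axis=1)[:, :k]         # most similar first
    # Step 3: reconstruction weights in tau with the sum-to-one constraint.
    W = np.zeros((N, N))
    for i in range(N):
        Z = tau[nbrs[i]] - tau[i]                # neighbors centered on tau_i
        G = Z @ Z.T                              # local Gram matrix
        G += reg * np.trace(G) * np.eye(k)       # regularize for stability
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()              # enforce sum_j w_ij = 1
    # Step 4: bottom eigenvectors of M = (I - W)^T (I - W) give the embedding.
    I = np.eye(N)
    M = (I - W).T @ (I - W)
    _, vecs = np.linalg.eigh(M)                  # ascending eigenvalues
    return vecs[:, 1:n_components + 1]           # drop the constant eigenvector
```

The local Gram system in step 3 and the eigenproblem in step 4 follow standard LLE practice; only the neighborhood rule and the space \tau differ from the original algorithm.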
3 Manifold Features of Vector Data

For a 2-D vector field, we extend the 2-D space to a 5-D vector field space in which significant features of the vector data are easier to discover. Applying the extended LLE to the vector field shown in Fig. 1, we can mine some meaningful information and test the effectiveness and learning ability of our method. The learning results are shown in Fig. 2, where the left and right panels show the first and second extracted features, respectively. As the figure shows, the first feature takes only three values, each of which corresponds to the set of vectors sampled from the same circle. In essence, the value of this feature is closely related to the radius of the circle: when the radius changes, the corresponding feature value varies accordingly. The second feature reflects the periodicity of the data: it essentially encodes the variation of the central angle.
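For concreteness, a field like that of Fig. 1 can be generated and passed through the sketch above. The radii, sample counts, and random seed below are arbitrary illustrative choices, not the exact parameters behind the figure.

```python
import numpy as np

# Tangent field f(x, y) = (-y, x) sampled on three concentric circles,
# mimicking Fig. 1 (radii and sample counts are illustrative guesses).
rng = np.random.default_rng(0)
points, vectors = [], []
for r in (0.4, 0.7, 1.0):
    theta = np.sort(rng.uniform(0.0, 2.0 * np.pi, 20))
    x, y = r * np.cos(theta), r * np.sin(theta)
    points.append(np.column_stack([x, y]))
    vectors.append(np.column_stack([-y, x]))     # tangent vector, magnitude r
X, V = np.vstack(points), np.vstack(vectors)

features = modified_lle(X, V, k=8, n_components=2)
# Expected behavior per the text: feature 1 is nearly constant on each
# circle (it tracks the radius); feature 2 varies with the central angle.
```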
Fig. 3. The two dominant features shown together in one panel. The vector data appears in an orderly fashion in this feature space.
If both features are shown together in one panel, we obtain the result in Fig. 3, where the three data sets are distributed linearly and in order in the feature space. This result is very helpful for subsequent vector clustering and other applications.
4 Experimental Results

Another example is provided in Fig. 4. The vector data consists of two sets randomly sampled from two circular loops.
Fig. 4. Manifold learning of a vector field. (a): Original vector field. (b): Feature space spanned by the first two dominant features of the vector data. (c): The first feature extracted by our method. (d): The second feature extracted by our method.
In the 2-D Euclidean space, since the data lies on nonlinear manifolds, as in Fig. 4(a), it is difficult to find an appropriate measure for clustering the vectors. But if a feature space in which the data has a linear distribution can be discovered, clustering becomes simple and feasible. When the extended LLE algorithm is applied to this data set, two significant features are extracted, shown in Figs. 4(c) and 4(d): the first feature again corresponds to a variable related to the radius of the circle, and the second reflects the periodic variation of the data. These features span the feature space shown in Fig. 4(b), where the vector data scatters in two linear regions. In such a space, we can easily find a decision boundary separating the two sets of data. Although the vector data in Fig. 4 is affected by noise that slightly perturbs the direction and magnitude of the vectors, the learning results are clearly not degraded, which shows that the method is robust to a small amount of noise. Moreover, the figure shows that the method does not depend on the regularity of sampling: even with irregularly sampled data, it still works well and extracts significant features.
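As an illustration of that last point, a very simple clustering step suffices once the features are extracted. The snippet below runs a naive 2-means in the learned feature space; it is a hedged sketch (the paper does not prescribe a particular clustering method), reusing the features array returned by the earlier sketch.

```python
import numpy as np

def two_means(F, iters=50, seed=0):
    """Naive 2-means in the learned feature space (illustrative only)."""
    rng = np.random.default_rng(seed)
    centers = F[rng.choice(len(F), size=2, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(F[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)            # nearest center per datum
        for c in (0, 1):
            if np.any(assign == c):
                centers[c] = F[assign == c].mean(axis=0)
    return assign

labels = two_means(features)   # 'features' as returned by modified_lle
```

Because the embedded data falls into two well-separated linear regions, even this crude partition recovers the two loops; any standard classifier would do as well.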
5 Conclusions

In this paper, we propose the concept of vector field learning and provide a learning framework to extract significant features from vector data. Unlike common point data, vector data contains direction and magnitude information in addition to position information. We extend the locally linear embedding (LLE) algorithm to make it suitable for vector data. The learning ability of the extended method has been tested on synthetic data sets, and the experimental results demonstrate the merits of the approach. The manifold features of vector data extracted by learning methods can be used for classification, clustering, visualization, and segmentation of vectors, which are the topics of our future work.
Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant No. 60473104.
References

1. Telea, A.C., van Wijk, J.J.: Simplified representation of vector fields. In: Ebert, D., Gross, M., Hamann, B. (eds.): Proc. of IEEE Visualization 1999. (1999) 35–42
2. Heckel, B., Weber, G., Hamann, B., Joy, K.I.: Construction of vector field hierarchies. In: Proc. of IEEE Visualization 1999. (1999) 19–25
3. Garcke, H., Preusser, T., Rumpf, M., Telea, A., Weikard, U., van Wijk, J.J.: A phase field model for continuous clustering on vector fields. IEEE Trans. Visualization and Computer Graphics (2001) 230–241
4. Chen, J.L., Bai, Z., Hamann, B., Ligocki, T.J.: A normalized-cut algorithm for hierarchical vector field data segmentation. In: Proc. of Visualization and Data Analysis 2003. (2003)
5. Li, H., Chen, W., Shen, I.F.: Segmentation of discrete vector fields. IEEE Transactions on Visualization and Computer Graphics (2006) (to appear)
6. Li, H., Chen, W., Shen, I.F.: Supervised learning for classification. In: FSKD (2). Volume 3614 of Lecture Notes in Computer Science. (2005) 49–57
7. Li, H., Shen, I.F.: Similarity measure for vector field learning. In: ISNN 2006. Lecture Notes in Computer Science (2006) (to appear)
8. Roweis, S., Saul, L.: Nonlinear dimensionality reduction by locally linear embedding. Science 290 (2000) 2323–2326
9. Borg, I., Groenen, P.: Modern Multidimensional Scaling. Springer-Verlag (1997)
10. Tipping, M.E., Bishop, C.: Mixtures of probabilistic principal component analyzers. Neural Computation 11 (1999) 443–482