On detecting all saddle points in 2D images - Semantic Scholar

Report 5 Downloads 26 Views
Pattern Recognition Letters 25 (2004) 1665–1672 www.elsevier.com/locate/patrec

On detecting all saddle points in 2D images A. Kuijper

*

IT University of Copenhagen, Rued Langgaards Vej 7, DK-2300 Copenhagen S, Denmark Received 16 October 2003; received in revised form 15 April 2004 Available online 7 August 2004

Abstract Although spatial critical points (saddle points and extrema––minima and maxima) are mathematically well-defined, it is non-trivial to detect them on an arbitrary discrete grid. Discretising a continuous method as well as a straightforward discrete neighbourhood based method do not guarantee to return all critical points. Although not all image analysis tasks require the right amount of critical point in mutual relations, it is obviously an advantage to know that all critical points are found. Furthermore, some methods do require the right amount of saddle points in relation to the extrema. The Euler number is an invariant stating explicitly the relation of the number of types of critical points. Using this, one is sure to find the right number of critical points. It is defined on a discrete lattice, so one only has to use the right grid. This appears to be a hexagonal one where each point has six neighbours. An easy way is given to use the hexagonal based critical point detection in a rectangular grid, which is commonly used in computer vision and image analysis tasks. Ó 2004 Elsevier B.V. All rights reserved. Keywords: Hexagonal lattice; Critical points; Scale space; Image analysis; Image structure

1. Introduction Mathematically, spatial critical points are defined by the zeros of the gradient. Regarding a discrete grid on which the image is embedded, these zeros will in general not coincide with the grid points, but lie somewhere in between them. In this *

Tel.: +45 72 18 50 68; fax: +45 72 18 50 01. E-mail address: [email protected] URL: http://www.itu.dk/people/arjan.

paper I investigate the influence of the structure of the grid with respect to detecting all critical points. For extrema the structure may seem of minor importance, since they are classified as the point with the highest (maximum) or lowest (minimum) intensity with respect to its neighbours. However, selecting a different definition of neighbourhood (8 instead of 4, say), can cause a change in the number of found extrema. In the same way the number of saddle points can vary.

0167-8655/$ - see front matter Ó 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.patrec.2004.06.017

1666

A. Kuijper / Pattern Recognition Letters 25 (2004) 1665–1672

Therefore, an invariant stating explicitly the relation between the number of extrema and saddle points in an image is needed to be sure that all critical points are found in an objective manner. This invariant is given by the so-called Euler number. Secondly, a grid and neighbourhood are needed that guarantees that is Euler number holds. This appears to be non-trivial and even non-true for the commonly used grids and neighbourhoods in computer vision and image analysis, the 4 and 8 neighbourhoods in the rectangular grid. Instead of these, a hexagonal grid with its embedded 6 neighbourhood does satisfy the Euler number restriction. This grid in commonly known in (and by) nature, e.g., by bees, stapling oranges at the groceryÕs or particle alignment (see any random issue of Nature). Despite this proven fact, a rectangular grid is closer to intuition with its two directions in 2D, as well as to computational and programming easiness. In most image analysis tasks finding the exact number of critical points is not an important issue. However, in some it is. In the recently published scale space approach––in which the image is blurred with a Gaussian at continuously varying scales and viewed in this extra dimensional space––the critical points at each scale play an important role, since they are connected at all the scales, forming critical curves. These curves exhibit nice mathematical properties: They consist of branches of a specific type of critical points, while at a connection point exactly two branches, one containing saddle points and one containing extrema, meet. It is therefore extremely important that all critical points are found. Missing some leads to gaps in curves or even incorrect curves. More details on this approach are given in Section 2. In Section 3 the relation between discrete and continuous methods for finding critical points is discussed, including the Euler number. The topic of selecting the right grid and neighbourhood is presented in Section 4, while Section 5 deals with the question how to use the right hexagonal grid for finding all critical points when a rectangular grid is given. In Section 6 an example is given and Section 7 summarizes the results.

2. Gaussian scale space Generally, there is no a priori reason to treat the image on pixel scale, since it is only a result of the acquisition of the image by an aperture with a certain scale or sampling range. Moreover, the assumption that the image can be regarded as a continuous function of its spatial variables is only mathematically justified, if this discrete image is convolved with a so-called test function. This highly mathematical result is obtained within the ‘‘Theory of Distributions’’, started by Schwartz (1950–1951). One of the most simplest and mostly used test functions is the Gaussian filter with zero mean and variable variance (Florack, 1997). The latter is very important: Since there is no a priori reason to fix the variance, all possible variances (i.e. scales) are to be taken. Therefore the n-dimensional image LðxÞ : Rn ! Rþ 0 is extended to an (n + 1)-dimensional scale space image Lðx; tÞ : Rn  Rþ ! Rþ 0 by Z jxyj2 1 Lðx; tÞ ¼ ð1Þ pffiffiffiffiffiffiffin e 4t LðyÞ dy: 4pt Rn As a result, the scale space image satisfies the well-known heat equation, otL(x;t) = DL(x;t), where ot denotes the partial derivative with respect to t, and D the Laplacean, defined by ox1x1 + ox2x2 + + oxnxn. Commonly, instead of t, the variance r, with pffiffiffiffi r ¼ 2t, is considered. From Eq. (1) it is clear that when regarding ‘‘scale r’’, scale and spatial location have the same dimensions. It thus makes sense to speak of ‘‘a scale of three pixels’’. This corresponds to regarding the image blurred with r = 3 and thus t = 4.5. For more details on (Gaussian) scale space and its properties, the reader is referred to the ample literature on this subject, e.g., by Florack (1997), Koenderink (1984), Lindeberg (1994), and Weickert (1998) and references therein. 2.1. Critical curves When tracing the critical points while the scale increases, one finds that they always annihilate (or sometimes are created) in pairs: One saddle point and one extremum. These are mathemati-

A. Kuijper / Pattern Recognition Letters 25 (2004) 1665–1672

cally proven by Damon (1998) to be the only generic events. An example is given in Fig. 1. When the scale is considered as third dimension, the critical points form curves in scale space, as shown in Fig. 2. For more details on the behaviour of these curves, see Kuijper and Florack (2002). Although this behaviour is of interest from a mathematical point of view, it is also the basis of deriving a unique rooted ordered binary tree from the Gaussian scale space image that represents the image in a topological sense––since it is based on the topological entities, the critical points. It basically boils down to defining regions to each extremum, subdividing the image in regions that satisfy a hierarchical ordering. For more details, see Kuijper et al. (2003) and Kuijper and Florack (2003).

1667

Fig. 2. The spatial critical points in scale space form curves containing branches with saddle points and branches with extrema. At the connection points of two branches, a saddle branch and an extremum branch meet.

When implementing this structure, one should be aware that it is essential to extract all (or: The right amount of) critical points at each scale. As may be clear from Fig. 2, if the number of saddle points is n, then the number of extrema must be n + 1. The Section 3 deals with this issue.

3. Critical points: Continuous versus discrete

Fig. 1. Artificial image with markers on the critical points. As the scale increases from left to right, top to bottom, the image gets more and more blurred, while pairs of a saddle point and an extremum move to each other and disappear.

Many methods, based on continuous functions, can be applied to find the locations of critical points with sub-pixel precision, For instance, the method of determining the zero crossings of the gradients, in which for each pixel the result is ‘‘positive’’ or ‘‘negative’’ for each partial derivative. Or the method of calculating winding numbers (see e.g., Staal et al. (1999) and Kalitzin et al. (1998)), based on a local neighbourhood around each pixel. The change of orientation of the gradient vector field on the boundary of that neighbourhood determines the type of each point. These approaches, however, only yield evidence of a pixel being close to (i.e. in the neighbourhood of) a critical point. If all critical point are found and uniquely assigned to a pixel, all is well. However, it may occur that two neighbouring pixels give alike output. One (then) may try any sub-pixel

1668

A. Kuijper / Pattern Recognition Letters 25 (2004) 1665–1672

precision algorithm, but still need to assign the critical point to one grid point. Assigning extrema to grid points is almost trivial: The grid point is an extremum if its value is larger––or, equivalently, smaller––than all its neighbours. For saddle points the situation is more complicated. Keyword in this observation is neighbourhood, to which we will come back.

This Euler number, ‘‘popping up everywhere’’ (Koenderink, 1990) gives a powerful requirement with respect to assigning the critical points of an image to its grid points. Assuming that two adjacent pixels have different values 1, it is at fore hand clear what the number of saddle points must be. Furthermore, the chosen grid (triangulation) should be such, that it does not violate the Euler number.

3.1. The Euler number An important topological entity is the Euler number. In two dimensions in binary images it simply equals the number of objects minus the number of holes, in three dimensions it is more complicated, see Lee et al. (1991, 1993). So for simplicity we restrict ourselves to two dimensions. The Euler number of an image is related to the winding number (Henle, 1979; Lee et al., 1991) mentioned before. It can be expressed in term of the critical points: Let v(M) denote the Euler number of a manifold M (say the image), g denote the genus of the manifold (roughly the number of holes, see e.g., Koenderink, 1990; Milnor, 1969), m+ the number of maxima, m the number of minima, and s the number of saddles (assume there are no catastrophes/singularities), then mþ  s þ m ¼ 2ð1  gÞ ¼ vðMÞ: So calculating the Euler number is just counting the critical points. Another nice property is commonly used in e.g., combinatorial and computational topology (Biasotti et al., 2000a,b; Fomenko and Kunii, 1997; Henle, 1979; Kergosien, 1991; Shinagawa and Kunii, 1991; Shinagawa et al., 1991). A triangulation of an object, or image, is obtained by taking a number of landmarks (e.g., points) and overlay the object with triangles that connect three landmarks, a method that is also commonly used in spline representations of objects. Obviously, this can be generalized to a set of vertices that are connected by edges thus forming cells (called faces). Then the Euler number is also present here: It equals the number of faces (F) minus the number of edges (E), plus the number of vertices (V): vðMÞ ¼ F  E þ V :

4. Neighbourhoods Having noticed this, it is necessary to determine what ‘‘a neighbourhood’’ means. Golay (1969) pointed out that only three types of planar, symmetric, isotropic point grids exist, namely square, hexagonal and triangular ones. The triangular grid can be obtained from the hexagonal one, and is least suited for neighbourhood investigations. Therefore, the other two remain. 4.1. Rectangular grids Using a rectangular grid, as is rather common in image analysis, a first idea would be the socalled 4-neighbour connectivity: For the moment, think of a pixel as a square, connected to its neighbours left, right, up and down, see Fig. 3a, where the grey blocks denote the neighbours. An alternative is the so-called 8-neighbour connectivity, in which also the corners are taken into account (Fig. 3b). Here immediately problems rise, as shown by the following example of (Deutsch, 1972). Assuming that in Fig. 3a the centre block is background, we have 4 faces. Each has 4 vertices, so in total there are 16 vertices (landmarks), since the four faces are disjoint. Furthermore there are 16 edges, leading to an Euler number of 4  16 + 16 = 4, whereas the number of objects minus the number of holes yields 4  1 = 3: A problem. For the 8neighbourhood a similar result holds: In the same 1 Images with adjacent pixels with equal value are nongeneric. An infinitesimal perturbation of the image, e.g. by adding extremely small noise, will remove identical neighbouring values.

A. Kuijper / Pattern Recognition Letters 25 (2004) 1665–1672

(a)

(b)

1669

(c)

Fig. 3. Different types of connectivity based on neighbours (shaded region) of the centre block: (a) 4-neighbour connectivity; (b) 8neighbour connectivity; (c) 6-neighbour connectivity.

Fig. 3a there are still 4 faces and 16 edges, but now there are 12 vertices. This yields an Euler number of 4  16 + 12 = 0, but now the part in the middle is not a hole: It is connected to the background, so the number of objects minus the number of holes is 1  0 = 1. The problem is solved by assuming 4neighbour connectivity for the objects, and 8neighbourhood for the background (then counting objects yield 4  0 = 4) or the other way round (obtaining a hole in the middle and 1  1 = 0). This duality between 4- and 8-neighbour connectivity makes it non-trivial to implement it in grey value images instead of binary images.

ally, a preference has been shown for them in the work due to their consistent connectivity.

4.2. Hexagonal grids

5.1. Horn

The problems mentioned in the previous section are avoid when using an hexagonal grid. An example of it is shown in Fig. 3c, where the grey blocks represent the neighbourhood of the block in the middle. Now there is no problem in connectivity: Assuming again that the centre block is background, we have 24 vertices, 30 edges, and 6 faces so the Euler number V  E + F = 24  30 + 6 = 0, equalling one object minus one hole (the genus is one). Consequently, from a topological point of view, hexagonal grids (or: Lattices) are to be preferred. Middleton and Sivaswamy (2001) state that they have been shown to have better efficiency and less aliasing. They exploit the oblique effect in human vision. They can be obtained by staggering the individual square pixels like a brick wall. Gener-

Horn (1986) suggests two neighbours on the same diagonal, to be obtained by moving the row above a particular cell a half cell to the left and the row below a half cell to the right (or the other way round). An example is shown in Fig. 4b, where the eight neighbours in the rectangular grid are grey. In this example the neighbours top-left and bottom-right are removed to obtain 6-neighbourhood connectivity.

5. Construction of a hexagonal lattice The idea of the brick wall is shown in Fig. 4a, where each brick has six neighbours. Obviously, it takes ‘‘the golden way in between’’ the 4- and 8-neighbourhood connectivity by simply translating rows. It remains to select how to translate, and which pair of neighbouring pixels to select (or deselect) in the original rectangular grid.

5.2. Blom To locate the critical points uniquely Blom (1992) used a translation, which is derived from a usual 2-D grid by shifting each even row a half pixel to the right and leaving odd rows unattached, or of course any similar translation. An example of

1670

A. Kuijper / Pattern Recognition Letters 25 (2004) 1665–1672

(a)

(c)

(b)

Fig. 4. A hexagonal lattice. Staggering the individual square pixels like a brick wall (a), and the effect on the 8-neighbours due to the translations of Horn (b) and Blom (c).

this translation is shown in Fig. 4c, where the eight neighbours in the rectangular grid are again grey. In this example the neighbours top-right and bottom-right are removed to obtain the 6-neighbourhood connectivity. So this translation is in fact only half of HornÕs translation. The advantage of BlomÕs approach is found in its relatively simple algorithmic implementation. 5.3. Types of hexagonal grid points On this grid each point has 6 neighbours. For each of these neighbours the sign of the difference with respect to the point itself is determined. To determine the class of the point, the number of sign changes walking clockwise along these six neighbours is counted. This leads to 4 typical cases: Zero sign changes: The point is an extremum, a maximum iff all signs are negative, a minimum iff all are positive (see Fig. 5a).

+

+

+

-

+ +

+

(a)

Two sign changes: The point is a regular point (see e.g., Fig. 5b). Four sign changes: The point is a saddle point (see e.g., Fig. 5c). Six sign changes: The point is a degenerated saddle point, a so-called money-saddle (see Fig. 5d). The monkey saddle is non-generic, that is: It is generically not to be found. A monkey saddle can analytically be modelled by the function x3  3xy2, which is not stable under a small perturbation: Adding x or y transforms the monkey saddle in two ordinary saddles. In the remaining we use BlomÕs translation, which, we emphasize, holds the Euler number and thus finds all saddle points. A close look at the description of the lattice shows that it can be implemented as the 6 neighbourhood of a point by taking into account the 8 neighbours and disregarding the upper right and lower right neighbour

+

-

+

+ +

+

(b)

-

+

+ +

+

-

(c)

-

-

+ +

-

(d)

Fig. 5. The four typical neighbourhoods in a hexagonal lattice. (a) An extremum; (b) a regular point; (c) a saddle point; (d) a degenerated saddle point, a so-called monkey saddle.

A. Kuijper / Pattern Recognition Letters 25 (2004) 1665–1672

in the even rows, and the upper left and lower left neighbour in the odd rows. Or vice versa.

6. Example In Fig. 6a we show an artificial image containing 5 extrema and––thus––4 saddle points. The 6-neighbourhood connectivity result is shown in Fig. 6b. The black pixels are of type 0, i.e., extrema, the grey are of type 2, regular, and the white pixels are of type 4, saddle points. A result as desired. Fig. 6c shows the result of the 4-neighbourhood connectivity. One saddle (a white colored pixel of type 4) missing in the middle of the image. Obviously since the 4-neighbourhood connectivity is too coarse to detect the saddles. The 8-neighbourhood connectivity is shown in Fig. 6d. Now the missing saddle is detected, but not unambiguously. It is assigned to two neighbouring pixels. This becomes clear from a more detailed investigation around this pixel. The rectangular grid can be represented as Fig. 7a. Using a 4-neighbourhood connectivity on the two pixels in the middle of the centre row, it becomes evident that both the ‘‘3’’ and the ‘‘4’’ show two sign changes. Starting on top, the ‘‘3’’ has pattern  + + +, the ‘‘4’’ has pattern +   +. They are thus qualified are regular points. Using an 8-neighbourhood connectivity, however, one finds four sign changes for both pixels: The ‘‘3’’ has pattern  + + + +  + +, the ‘‘4’’ +   +  + + +. And thus they are qualified as saddle points.

5

2

6

7

4

3

4

8

5

6

2

4

(a)

1671

5 4

2 3

5

6 4

6

7 8

2

9 4

(b)

Fig. 7. Saddle detection: (a) values in the rectangular grid; (b) values in the corresponding hexagonal grid.

The corresponding 6-neighbourhood connectivity is shown in Fig. 7b. Now the ‘‘3’’ has pattern (starting top left) + + + + + , thus two sign changes and consequently a regular point. The ‘‘4’’ has pattern   +  + + and is thus a saddle. Its neighbour, the ‘‘8’’ is again a regular point. Obviously, the Figs. 1 and 2 also used this hexagonal critical point detection algorithm. Application on more real world images (medical Magnetic Resonance Images) of this method can be found in (Kuijper et al., 2003; Kuijper and Florack, 2002, 2003).

7. Conclusions Concluding we find that a hexagonal grid is highly preferred in finding saddle points in two dimensions, since firstly it respects the Euler number, guaranteeing that all saddles are found, and secondly it assigns the saddles unambiguously to pixels. Quoting Horn (1986): ‘‘On a hexagonal tessellation life is easier’’.

Fig. 6. (a) Synthetic image. Type of the pixels (black is extremum, grey is regular, white is saddle) according to: (b) detection at a 6neighbourhood; (c) detection at a 4-neighbourhood; (d) detection at an 8-neighbourhood.

1672

A. Kuijper / Pattern Recognition Letters 25 (2004) 1665–1672

The hexagonal critical point detection can easily be transferred to a rectangular grid by artificially translating each even row of the image a half pixel. A disadvantage is that this procedure is only usable in two dimensions. In three dimensions things get very complicated. Mathematically, this problem relates to the optimal tiling principle, stated in 1611 by Kepler: He believed that the face-centred cubic lattice was the most efficient of all arrangements. Only recently, this problem, known as KeplerÕs Sphere Packing Conjecture has been proven to be right (Devlin, 1996). Implementation of this structure is possible, but complicated. However, attempts for similar tessellations using truncated octahedral and rhombic dodecahedral tillings are made (Miller, 1999). References Biasotti, S., Falcidieno, B., Spagnuolo, M., 2000a. Extended reeb graphs for surface understanding and description. In: Borgefors, G., Nystro¨m, I., Sanniti di Baja, G. (Eds.), Discrete Geometry for Computer Imagery 2000, Lecture Notes in Computer Science, 1953, pp. 185–197. Biasotti, S., Falcidieno, B., Spagnuolo, M., 2000b. Shape abstraction using computational topology techniques. In: Proc. 7th Workshop GEO-7, series on geometric modeling: Fundamentals and applications, Parma, Italy, pp. 155–165. Blom, J., 1992. Topological and geometrical aspects of image structure. Ph.D. thesis, Utrecht University. Damon, J., 1998. Generic structure of two-dimensional images under Gaussian blurring. SIAM J. Appl. Math. 59 (1), 97– 138. Deutsch, E.S., 1972. Thinning algorithms on rectangular, hexagonal, and triangular arrays. Comm. ACM 15 (9), 827–837. Devlin, K., 1996. Mathematics: The science of patterns. W.H. Freeman, article on KeplerÕs Sphere Packing Conjecture at http://www.maa.org/devlin/devlin_9_98.html. Florack, L.M.J., 1997. Image structureComputational Imaging and Vision Series, vol. 10. Kluwer Academic Publishers, Dordrecht, The Netherlands. Fomenko, A.T., Kunii, T.L., 1997. Topological Modeling for Visualization. Springer-Verlag, Tokyo. Golay, M.J.E., 1969. Hexagonal parallel pattern transformations. IEEE Trans. Pattern Anal. Machine Intell. 18, 740– 773. Henle, M., 1979. A Combinatorial Introduction to Topology. A Series of Books in Mathematical Sciences. W.H. Freeman and Company, San Francisco. Horn, B.K.P., 1986. Robot Vision. MIT Press, Cambridge, MA.

Kalitzin, S.N., Romeny, B.M.T.H., Salden, A.H., Nacken, M.A., Viergever, M.A., 1998. Topological numbers and singularities in scalar images. Scale-space evolution properties. J. Math. Imag. Vision 9 (3), 253–296, November. Kergosien, Y.L., 1991. Generic sign systems in medical imaging. IEEE Comput. Graphics Appl. 11, 46–65, September. Koenderink, J.J., 1984. The structure of images. Biological Cybernet. 50, 363–370. Koenderink, J.J., 1990. Solid Shape. MIT Press, Cambridge, Massachusetts. Kuijper, A., Florack, L., 2003. The hierarchical structure of images. IEEE Trans. Image Process. 12 (9), 1067–1079. Kuijper, A., Florack, L.M.J., 2002. The relevance of nongeneric events in scale space. In: Proc. 7th European Conference on Computer Vision, Copenhagen, Denmark, 28–31 May 2002, pp. 190–204. Kuijper, A., Florack, L.M.J., Viergever, M.A., 2003. Scale space hierarchy. J. Math. Imag. Vision 18 (2), 169–189, April. Lee, C.-N., Poston, T., Rosenfeld, A., 1991. Winding and Euler numbers for 2D and 3D digital images. Computer Vision, Graphics Image Process.: Graphical Models Image Process. 53 (11), 522–537. Lee, C.-N., Poston, T., Rosenfeld, A., 1993. Holes and genus of 2D and 3D digital images. Computer Vision, Graphics Image Process.: Graphical Models Image Process. 55 (11), 20–47. Lindeberg, T., 1994. Scale-space theory in computer vision. The Kluwer International Series in Engineering and Computer Science. Kluwer Academic Publishers. Middleton, L., Sivaswamy, J., 2001. Edge detection in a hexagonal-image processing framework. Image Vision Comput. 19 (14), 1071–1081. Miller, E.G., 1999. Alternative tilings for improved surface area estimates by local counting algorithms. Comput. Vision and Image Understanding 74 (3), 193–211. Milnor, J., 1969. Morse Theory, third ed.Annals of Mathematics Studies, Vol. 51 Princeton University Press, Princeton. Nielsen, M., Johansen, P., Olsen, O.F., Weickert, J. (Eds.), 1999. Scale-Space Theories in Computer Vision, vol. 1682, Lecture Notes in Computer Science, Springer-Verlag, Berlin, Heidelberg. Schwartz, L., 1950–1951. The´orie des Distributions. Actualite´s scientifiques et industrielles; 1091,1122, vols. I, II. Publications de lÕInstitut de Mathe´matique de lÕUniversite´ de Strasbourg, Paris. Shinagawa, Y., Kunii, T.L., 1991. Constructing a Reeb graph automatically from cross sections. IEEE Computer Graphics Appl. 11, 44–51, November. Shinagawa, Y., Kunii, T.L., Kergosien, Y.L., 1991. Surface coding based on Morse theory. IEEE Computer Graphics Appl. 11, 66–78, September. Staal, J., Kalitzin, S., ter Haar Romeny, B.M., Viergever, M.A., 1999. Detection of critical structures in scale space. In: Nielsen et al., 1999, pp. 105–116. Weickert, J.A., 1998. Anisotropic Diffusion in Image Processing. Teubner, Stuttgart.