LIGHT: Local Invariant Generalized Hough Transform

Jose A.R. Artolazabal and John Illingworth
Centre for Vision Speech and Signal Processing, University of Surrey, UK
[email protected], [email protected]

Abstract

In this paper, we present a novel method for 2D shape extraction based on the Hough Transform. The method is applicable under similarity transformations while maintaining the dimensionality of the problem at that of the original GHT. This is possible thanks to the use of a set of Fourier-based descriptors which remain invariant under translation, scale and rotation. In contrast with other invariants used in the same context, the descriptors we present here are local (only small segments of a given shape around a point are needed to compute them), and therefore our method is especially tolerant to noise and occlusion. Experimental results are provided to justify the parameters chosen to implement our approach, as well as to demonstrate its performance.

1. Introduction

The Hough Transform (HT) [4] and its variants constitute a set of commonly used and fairly successful methods for object recognition via shape detection. Originally used for the extraction of analytical shapes, such as straight lines, circles or ellipses, it was soon extended (Generalized Hough Transform, GHT) to deal with more general non-analytical shapes [2]. The GHT has proved to be an extremely robust technique in the presence of noise and occlusion. Such robustness, unfortunately, is only possible at the expense of a high computational load. This limitation is particularly apparent if, as is usually the case in pattern recognition applications, similarity or even affine invariance is necessary. It is well known that the GHT can accommodate these situations by merely increasing the number of parameters describing the target model. However, the point spread function (psf), which represents the trace of a locus in the accumulator space, has a size which grows exponentially with the number of free parameters considered. As a result, arbitrary shape extraction under similarity or affine transformations by means of the GHT leads irremediably to 4D or 6D accumulator spaces, and to O(n^4) or O(n^6) complexity algorithms, which makes the idea impractical.

Following the original GHT, a number of extensions exist intended to reduce its computational cost [3, 6, 7, 8]. These techniques, however, do not reduce the high dimensionality inherent to the GHT, but rather optimize the accumulation process and the subsequent search for maxima in the accumulator space. There have also been efforts to deal more directly with the computational cost by reducing the high dimensionality of the problem while still allowing for general templates and invariant shape extraction. In this context, [1] suggests the use of an invariant characterization of the shapes under analysis as a means to ensure the resulting psf remains in a 2D space representing the shape's location. Thus, even though the use of such an invariant characterization as part of the evidence gathering process may involve some extra computational load, keeping the dimensionality of the process independent of the shape's complexity and the transformations it might have undergone provides a solid framework for alleviating the computational load associated with HT-like approaches.

The idea above has motivated several such invariant characterizations of shapes to be investigated and incorporated into the GHT. In [1], sets of points from a given shape defining angles and parallel lines are proposed as geometric arrangements whose properties do not vary under similarity and affine transformations respectively. A similar approach is presented in [5], while the concept of curvature is utilized in [10] instead. In this paper, we present a set of invariants to similarity transformations based on Fourier descriptors and show how they can be integrated in the GHT (section 2). Unlike those consisting of arbitrarily far away points within a shape, our invariants are local, since only small segments around the point under study are needed, which increases the robustness of the method to occlusion. Moreover, Fourier descriptors are well known and easy to compute. Experimental results are described in section 3.

2. LIGHT

In this section, we introduce our local invariant generalized Hough transform. First, some elliptic Fourier descriptors are presented; then we explain how they can be applied, as invariant local features, to reduce ambiguity in the evidence gathering process of the GHT and limit its computational cost at the same time.

2.1. Elliptic Fourier descriptors

Consider a curve under analysis. We can extract, around any given point on it, a small symmetric segment. Then, by concatenating this small segment with its inverted version, a closed curve c(t) with twice the length of the original segment is constructed. For the sake of simplicity, we can go one step further and assume c(t) periodic, where the parameter t corresponds to the arc-length along the curve and the period T coincides with the length of a full cycle over the closed curve. At this point, we need a way to characterize c(t) in an invariant fashion. As only small segments around the point considered are used, such a characterization is local which, as we hinted before and will demonstrate later, is a particularly advantageous property. In our approach, we accomplish this by means of a set of elliptic Fourier descriptors. In the 2D image plane, c(t) can be described in terms of the well known Fourier series as:

c(t) = \begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = \frac{1}{2}\begin{pmatrix} a_{x0} \\ a_{y0} \end{pmatrix} + \sum_{k=1}^{\infty} \begin{pmatrix} a_{xk} & b_{xk} \\ a_{yk} & b_{yk} \end{pmatrix} \begin{pmatrix} \cos(kt) \\ \sin(kt) \end{pmatrix}    (1)

Each of the terms of the sum above represents an elliptic phasor in 2D, and the coefficients can be calculated as:

a_{xk} = \frac{2}{T}\int_0^T x(t)\cos(kt)\,dt , \quad b_{xk} = \frac{2}{T}\int_0^T x(t)\sin(kt)\,dt
a_{yk} = \frac{2}{T}\int_0^T y(t)\cos(kt)\,dt , \quad b_{yk} = \frac{2}{T}\int_0^T y(t)\sin(kt)\,dt    (2)

It is then straightforward to prove that coefficients of the form:

C_k = \frac{|A_k|}{|A_1|} + \frac{|B_k|}{|B_1|} = \frac{\sqrt{a_{xk}^2 + a_{yk}^2}}{\sqrt{a_{x1}^2 + a_{y1}^2}} + \frac{\sqrt{b_{xk}^2 + b_{yk}^2}}{\sqrt{b_{x1}^2 + b_{y1}^2}}    (3)

are invariant to scale and rotation. In our method, these elliptic descriptors provide a local invariant representation for every point in the contour of the shape under study. There are, however, some practical considerations to the equations above, resulting from the discrete nature of our data.

First, for implementation purposes, rather than the formulation in equations 1 and 2, the discrete Fourier transform (DFT) equivalents were used. Accordingly, the number of coefficients like those in equation 3 that can be obtained is limited by the number of samples from c(t) available, and therefore by the length of the segments considered. Ideally, we would want small segments so that the descriptors are more local. However, as detailed later, there exists a tradeoff between how local and how accurate the descriptors are. The second consideration arises from the samples of c(t) not being equispaced, as two connected points along a line are 1 pixel apart while two points along a diagonal are √2 pixels apart. Thus, an interpolation scheme that guarantees a set of uniformly spaced samples is needed prior to applying the DFT [9].
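To make the construction concrete, the following Python sketch mirrors the steps of this subsection: close the segment, resample it uniformly by arc length, take the discrete counterparts of the sums in equation (2), and form the ratios of equation (3). It is a minimal illustration under our own assumptions (plain linear interpolation standing in for the NUFFT resampling of [9], and descriptors reported from the second harmonic upwards, since C_1 is constant); the paper does not pin down these implementation details.

```python
import numpy as np

def elliptic_descriptors(segment, k_max=3):
    """C_k descriptors (equation 3) for a small open contour segment.

    segment: (m, 2) array of (x, y) pixel coordinates centred on the
    point of interest. Returns k_max invariant values.
    """
    # Close the curve by appending the reversed segment (section 2.1),
    # doubling its length.
    closed = np.vstack([segment, segment[::-1]])

    # Resample uniformly by arc length: axis-aligned pixel steps are
    # 1 pixel long, diagonal steps sqrt(2), so the raw samples are not
    # equispaced. Linear interpolation is an assumption; the paper
    # points to the NUFFT of [9] instead.
    step = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    t = np.concatenate(([0.0], np.cumsum(step)))
    u = np.linspace(0.0, t[-1], num=len(closed), endpoint=False)
    x = np.interp(u, t, closed[:, 0])
    y = np.interp(u, t, closed[:, 1])

    # Discrete counterpart of equation (2) for harmonics 1 .. k_max+1.
    N = len(x)
    ang = 2.0 * np.pi * np.outer(np.arange(1, k_max + 2), np.arange(N)) / N
    axk, bxk = (2.0 / N) * np.cos(ang) @ x, (2.0 / N) * np.sin(ang) @ x
    ayk, byk = (2.0 / N) * np.cos(ang) @ y, (2.0 / N) * np.sin(ang) @ y

    # Equation (3): rotation leaves the magnitudes |A_k|, |B_k| unchanged,
    # and dividing by the first harmonic cancels scale. C_1 is trivially 2,
    # so we report C_2 .. C_{k_max+1} (our choice).
    A = np.hypot(axk, ayk)
    B = np.hypot(bxk, byk)
    return A[1:] / A[0] + B[1:] / B[0]
```

As a sanity check, calling this function on a segment and on a rotated, uniformly scaled copy of it should return nearly identical vectors, up to the discretization error studied in section 3.1.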

2.2. A local similarity invariant GHT

As four parameters (translation (2), rotation and scale) define a similarity transformation, an O(n^4) algorithm (where n is the number of grid samples for each of the four parameters) is needed to apply the GHT directly in this situation. We describe here how, using the invariants introduced in section 2.1, it is possible to confine the complexity of the algorithm to O(r·c), where r and c are the number of rows and columns in the target image. In our discussion, we assume edge images with tangent information available.

Regarding the generation of the R-Table, a vector v of k elliptic descriptors, as the ones discussed above, is obtained for each edge point in the model and used as an index. Accordingly, our table will contain as many entries as there are points in the model. Each entry i, apart from the vector of descriptors v_i, will include information on the absolute tangent at that point φ_i, as well as on the length d_i and direction α_i, relative to the absolute tangent, of the line between the point and the reference point of the model. As for the voting process, the algorithm consists of repeating, for every edge point p_j with tangent φ_j in the input image (a sketch follows the list):

1. Obtain a vector of k Fourier descriptors (v_j). It is important that the segments around the point used for this purpose have the same length as those used when generating the R-Table. Let us call such length l.
2. For every entry i in the R-Table, calculate the error vector (e_ij = v_j − v_i) and the associated weight w_ij.
3. For every entry i, calculate the location of the reference point by means of φ_j, α_i and d_i, and increase the 2D accumulator at that location by w_ij.

It is important to notice that the resulting 2D accumulator will only solve for the location of the shape. However, the values of the location parameters, as well as φ_i and d_i, can easily be exploited to solve for the rotation and scale, while the complexity of the algorithm remains O(r·c). The reader is referred to [1] for an extended discussion on this idea. As only points in the template for which the invariant Fourier descriptors coincide with those of a given point are considered, the amount of false evidence is reduced enormously. In practice, a certain deviation in the vectors, represented by w_ij, is accounted for. Section 3 describes how these weights, as well as the parameter l, are optimally tuned for our experiments via analysis of the available data. At this point, it is worth remarking that, even if severe occlusion occurs, as long as a small segment of size l around a point remains uncorrupted, evidence will be gathered for the corresponding point in the template. This behaviour, a virtue of the local nature of our descriptors, does not apply to invariants defined as arrangements of arbitrarily distant points [1][5].
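As a hedged sketch of how the R-Table and the three voting steps above might be realised, the Python fragment below builds the table and accumulates location votes. The data layout, the nearest-pixel rounding, and the treatment of scale (assumed close to 1 when casting the location vote, with rotation and scale recovered afterwards as in [1]) are our illustrative choices, not the authors' implementation.

```python
import numpy as np

def build_r_table(points, tangents, ref, descriptor_fn):
    """One entry (v_i, phi_i, d_i, alpha_i) per model edge point.

    points: (n, 2) model edge coordinates; tangents: absolute tangent
    angles phi_i; ref: the model reference point; descriptor_fn returns
    the k invariant Fourier descriptors around a point (segment length l).
    """
    table = []
    for p, phi in zip(points, tangents):
        delta = ref - p
        d = np.linalg.norm(delta)                     # length d_i
        alpha = np.arctan2(delta[1], delta[0]) - phi  # direction relative to tangent
        table.append((descriptor_fn(p), phi, d, alpha))  # phi_i kept for pose recovery
    return table

def vote(points, tangents, table, descriptor_fn, weight_fn, shape):
    """2D accumulator over reference-point locations (steps 1-3)."""
    acc = np.zeros(shape)
    for p, phi_j in zip(points, tangents):
        v_j = descriptor_fn(p)                        # step 1: same length l as the table
        for v_i, phi_i, d_i, alpha_i in table:
            w = weight_fn(v_j - v_i)                  # step 2: weight from e_ij
            beta = phi_j + alpha_i                    # step 3: absolute direction to ref
            col = int(round(p[0] + d_i * np.cos(beta)))   # scale ~ 1 assumed here;
            row = int(round(p[1] + d_i * np.sin(beta)))   # see [1] for full pose recovery
            if 0 <= row < shape[0] and 0 <= col < shape[1]:
                acc[row, col] += w
    return acc
```

The peak of the returned accumulator gives the candidate location; the tangent differences φ_j − φ_i of the entries that voted for it then yield the rotation, and the corresponding distance ratios the scale.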

3. Experiments and discussion

In this section, we present some experiments designed to evaluate our algorithm. First, we use a database of 1100 synthetic fish images, showing how our system can be optimized for that type of data and how it performs on it. Due to copyright issues, only the binary images and the corresponding contours resulting from edge extraction on the originals are shown in this work. Details on this database can be found in [11]. Then, one more experiment on real data is discussed.

3.1. Optimizing the system on the database

As we noted in section 2, there are two parameters in our algorithm that must be tuned for optimal performance: l and the weights w_ij. The former is related to k via the theory of the DFT. In our case, keeping in mind that our segments present even symmetry around their centres, the number of samples in each of them must be at least twice the number of coefficients we intend to extract, k. For our experiments, we selected a value of k = 3. Thus, setting aside for simplicity the fact that our samples are not equally spaced, our segments would ideally be 6 pixels long. However, we found in practice that such short segments provide extremely noisy coefficients, and therefore lead to poor performance. In order to investigate how short our segments can be without compromising accuracy, we compared the descriptors of all the points in all the original images in the database with those of their rotated (φ = 1°...360°) and scaled (up to 20%) versions for different values of the parameter l. Figure 1 (left) shows the results of all these experiments, where the MSE for the 3 descriptors is represented as a function of l. As we expected, higher order coefficients are more prone to noise, and longer segments involve less error in the descriptors. From observation of these results, a value of l = 140 was chosen as a good compromise between length and accuracy. This value, although far from the one theory suggests, still allows for a local description of the points, since the number of boundary pixels in the images of the database ranges from 400 to 1600.

[Figure 1. Optimizing l and w_ij]

Then, for that particular working point, we need a means to set the weights w_ij. With k = 3, our experiments provided us with a 3D probability function of the error in the coefficients for l = 140. Given the difficulty of plotting 3D data, Figure 1 (right) portrays this probability function for a particular value of the third component of the error vector coinciding with its mean in that direction. We can directly relate the weight associated with an error vector to the corresponding probability. Thus, a certain w_ij will indicate how likely it is, given the error vector e_ij = v_j − v_i (see section 2), that the point p_j in the image corresponds to the point in the ith entry of the R-Table. In this way, the uncertainty associated with the calculation of our elliptic Fourier coefficients is incorporated into the evidence gathering process. A sketch of this weighting scheme follows.
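The weights in our experiments come from the empirically measured probability function above. As an illustration only, the sketch below substitutes a Gaussian fitted to a sample of observed error vectors for the raw 3D histogram; the function name and the Gaussian assumption are ours, not the paper's.

```python
import numpy as np

def fit_weight_fn(error_samples):
    """Turn observed descriptor errors into a weight function w_ij.

    error_samples: (m, 3) array of e_ij = v_j - v_i vectors collected
    from rotated/scaled training pairs (k = 3). The fitted Gaussian is
    an assumed stand-in for the measured probability function of
    Figure 1 (right).
    """
    mu = error_samples.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(error_samples, rowvar=False))

    def weight(e):
        # Unnormalised likelihood of the error vector: close descriptor
        # matches vote strongly, implausible ones contribute almost nothing.
        d = np.asarray(e) - mu
        return float(np.exp(-0.5 * d @ cov_inv @ d))

    return weight
```

The returned closure plugs directly into the vote() sketch of section 2.2 as weight_fn.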

3.2. Results on the database

Having tuned our system for its best possible performance, several experiments were conducted on the database. We have argued throughout this paper that the tolerance of our method to severe occlusion is one of its main strengths. To back this assertion, we tested the algorithm in such situations, with promising results. Figure 2 shows 5 of the shapes in the database (top right), 2 of which correspond to one particular model (see top left) that has been rotated and scaled. Despite around 50% of the contours being occluded, the targets are successfully picked out. The bottom left picture shows the edge image on which the extracted shapes have been superimposed. The resulting 2D accumulator, indicating the location of the templates on the image with two clearly visible peaks, is depicted in the bottom right corner.

Such severe occlusion is not likely to be accommodated by algorithms in which the invariants for one particular point are based on other points in distant parts of the same contour. In our experiments, however, all points bounded by 140-pixel-long segments are used to gather evidence, hence the improved performance even when less than half the points in the model are visible.

[Figure 2. Some results]

3.3. Results on real images

We also conducted some experiments on real data. In order for the parameters obtained above to be appropriate here as well, the images used had a size similar to those in the database. Figure 3 shows a hand selected as a model (top left), and an image in which the same hand appears rotated and partially hidden (top right). Again, despite less than half the contour being visible, the system extracts the model. The bottom of the figure shows the detected result overlaid on the edge image (left) as well as the accumulator (right). Note how, besides being robust to clutter and occlusion, the system performs well for non-rigid shapes. This tolerance to slight deformations is inherent to the Fourier transform, and is naturally incorporated into our system via our invariant descriptors.

[Figure 3. Some results on real images]

4. Conclusion

We have presented a new method for the extraction of arbitrary shapes that accounts for similarity transformations of the template. Our approach is based on the GHT, and therefore inherits its robustness, while at the same time dealing with the issue of its high computational cost in an elegant and effective manner. This is accomplished by defining a set of elliptic Fourier descriptors that are invariant to translation, scale and rotation. These descriptors allow for weak deformation of the template, and are local to the point considered, which enables our method to succeed in situations of severe occlusion where others are likely to fail. Results were presented to support our discourse.

References

[1] A. S. Aguado, E. Montiel, and M. S. Nixon. Invariant characterisation of the Hough transform for pose estimation of arbitrary shapes. Pattern Recognition, 35(5):1083–1097, 2002.
[2] D. H. Ballard. Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognition, 13(2):111–122, 1981.
[3] C. Galamhos, J. Matas, and J. Kittler. Progressive probabilistic Hough transform for line detection. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1, pages –560, 1999.
[4] P. Hough. Method and means for recognizing complex patterns. U.S. Patent 3,069,654, 1962.
[5] A. Kimura and T. Watanabe. An extension of the generalized Hough transform to realize affine-invariant two-dimensional (2D) shape detection. In 16th International Conference on Pattern Recognition (ICPR), volume 1, page 10065, 2002.
[6] N. Kiryati, Y. Eldar, and A. Bruckstein. A probabilistic Hough transform. Pattern Recognition, 24(4):303–316, 1991.
[7] P. Kultanen, L. Xu, and E. Oja. A new curve detection method: randomized Hough transform. Pattern Recognition Letters, 11:331–338, 1990.
[8] H. Li, M. A. Lavin, and R. J. Le Master. Fast Hough transform: A hierarchical approach. Computer Vision, Graphics, and Image Processing, 36(2-3):139–161, 1986.
[9] Q. H. Liu and N. Nguyen. An accurate algorithm for nonuniform fast Fourier transforms (NUFFT). IEEE Microwave and Guided Wave Letters, 8(1), 1998.
[10] S. D. Ma and X. Chen. Hough transform using slope and curvature as local properties to detect arbitrary 2D shapes. In Proc. 9th International Conference on Pattern Recognition, pages 511–513, 1988.
[11] F. Mokhtarian, S. Abbasi, and J. Kittler. Robust and efficient shape indexing through curvature scale space. In Sixth British Machine Vision Conference, BMVC'96, pages 53–62, 1996.