A Similarity-Based Aspect-Graph Approach to 3D Object Recognition

Christopher M. Cyr ([email protected]), Brown University

Benjamin B. Kimia ([email protected]), Brown University

Abstract. This paper describes a view-based method for recognizing 3D objects from 2D images. We employ an aspect-graph structure, where the aspects are not based on the singularities of the visual mapping but are instead formed using a notion of similarity between views. Specifically, the viewing sphere is endowed with a metric of dis-similarity for each pair of views, and the problem of aspect generation is viewed as a "segmentation" of the viewing sphere into homogeneous regions. The viewing sphere is sampled at regular (5 degree) intervals, and an iterative procedure is used to combine views, using the metric, into aspects with a prototype representing each aspect, in a "region-growing" regime which stands in contrast to the usual "edge detection" style of computing the aspect graph. The aspect growth is constrained so that two aspects of an object remain distinct under the given similarity metric. Once the database of 3D objects is organized as a set of aspects and prototypes for these aspects for each object, unknown views of database objects are compared with the prototypes and the results are ordered by similarity. We use two similarity metrics for shape, one based on curve matching and the other based on matching shock graphs, which for a database of 64 objects and unknown views of objects in the database give (90.3%, 74.2%, 59.7%) and (95.2%, 69.0%, 57.5%), respectively, for the top three matches; identification based on the top three matches is 98% and 100%, respectively. Indexing unknown views of objects not in the database also produces intuitive matches. We also develop a hierarchical indexing scheme whose goal is to prune unlikely objects at an early stage to improve the efficiency of indexing, resulting in savings of 35% at the top level and of 55% at the next level, cumulatively.

1. Introduction

3D object recognition is the task of identifying a three-dimensional object among a set of known models or categories from some form of input data (e.g., images, range data). This is a difficult task due to the large number of dimensions of variability arising from object articulation, occlusion, viewpoint variation, illumination changes, etc. Besl and Jain [6] motivate the use of an intermediate-level representation to reduce the dimensionality of the task, one into which model data and input data can be transformed with equal success. The type of representation used is critical in the selection and performance of optimal object recognition strategies. While there is a great deal of variety in the intermediate representations used in existing object recognition approaches, most can be viewed as focusing either on attributes of the

object's 3D geometry or on attributes of its 2D projection, classically referred to as the distinction between object-centered and view-centered representations. We briefly discuss these below as they are relevant to the development of our approach, but without emphasizing the associated matching strategies, which are covered in several surveys [6, 8].

1.1. Representations Relying on 3D Geometry
In this approach, the 3D geometry of the object is explicitly stored in a database, either by specifying it completely or by describing aspects of it by capturing certain features, invariants, parts, ridges, and surface patches. For example, the 3D object shape has been reduced to a set of features in the form of critical points, corners, line segments, etc. [11, 50, 9, 4, 20, 32, 22]. While no geometric invariants exist in general to relate 3D and 2D models, a restriction of the class of models, either by enforcing a generic attribute such as symmetry or by selecting particular models, has been shown to lead to geometric invariants [49, 48, 10]. The part structure has also been made explicit in the 3D representation, e.g., in the GEON-based representation of Biederman [7]. Ridges on the surface have also been used as a critical feature set [19]. Partial surface descriptors have been used by Fan et al. [16], Wong et al. [51], and Flynn and Jain [17], where the segmentation of a surface into patches is represented by a graph. Object surfaces have been fully represented in parametric form, e.g., by using superquadrics [3], implicit polynomials [23], or skeletons [29]. Drawbacks of the volumetric representation include limitations in generating models automatically, the difficulty of recovering the 3D geometry from single views, the reliability of the resulting representation, and the complexity of the matching process.

1.2. View-Based Object Recognition

The goal of view-based approaches is to represent a 3D object with a set of 2D views, resulting in a significant reduction in dimensionality by comparing 2D images rather than comparing 3D objects. Efficiency mandates that the complete set of views, which are redundant to some degree, be somehow reduced to a minimal set. Appearance-based methods focus on changes in the intensity distribution in each view and use principal component analysis to capture the main directions of variation, as briefly described below. In contrast, aspect-graph methods focus on changes in the projected geometry and group views bounded by transitions of the geometry on the viewing sphere into aspects, as described in detail in Section 1.3.

Appearance-based Methods: Appearance-based methods rely on the similarity of the projected intensity image among neighboring views. These methods typically use a form of Principal Component Analysis (PCA) on the

image, which is considered as a high-dimensional vector, to determine the principal directions of variation so that only a subset of the information need be retained, as advocated by the eigenmodels [46, 24]. Nayar et al. [30] used PCA on input color images of known objects, using only color and texture information. Specifically, they construct a vector for each normalized color channel (red, green, and blue) of images generated of known models at 7.5-degree increments. They perform PCA on these vectors to determine the principal components and project each vector into the "eigenspace", which is of significantly lower dimension. Each object thus forms its own manifold in the eigenspace, and this is done for each object in a collection of known objects. To perform recognition, PCA is performed on the three color channels of the image of an unknown object and the result is projected into the eigenspace of known objects. The object is recognized if all three of its vectors are close to the manifold formed by a single object.

Appearance-based methods can be sensitive to changes not embedded in the training set, including lighting changes, object rotation, deformations, viewing changes, and occlusion. Shokoufandeh et al. [41] built in some robustness to these variations by including a multi-scale structure using a wavelet transform. In addition, the database in appearance-based methods cannot be updated dynamically, as incremental changes require a recomputation of the principal dimensions.

1.3. The Aspect-Graph Representation

The aspect-graph representation is a viewer-centered representation of a three-dimensional object. The underlying theory was introduced by Koenderink and van Doorn [26, 27], who observed that while for most views a small change in the vantage point of an object results in a small change in the shape of the projection of that object, for certain views the change in projected object shape is dramatic. These "unstable" views represent a singularity in the visual mapping, or a transition. They suggested that a derivation of such transition boundaries is a good representation of the object. The stable views, also called general views, are what define an aspect. The aspect-graph representation of an object is a structured graph of the set of aspects of that object, where the edges of the graph are the transitions between two neighboring stable views, and a change between aspects is called a visual event. The computation of aspect-graphs in applications was made possible by assuming that the objects belong to a limited class of shapes; for example, algorithms were developed for generating the aspect-graphs of polyhedra [43, 40], solids of revolution [14, 28], piecewise smooth objects [42], and algebraic surfaces [31]. A survey of early results can be found in Bowyer and Dyer [8]. Previous work on generating the aspect-graph of an object has focused on

two major issues: (i) how to derive transitions and formally define aspects for a large class of shapes, and (ii) how to handle the problem of scale.

Enlarging the class of shapes: Methods for generating the aspect-graph of polyhedra were developed in [43, 40]. Sripradisvarakul and Jain [42] developed a methodology to build an aspect-graph representation of CAD models for curved opaque objects that are at least smooth. To build their aspect-graph, they compute the aspects and stable views by using their knowledge of the potential visual events that can be generated from a smooth, curved object. By computing the boundary viewpoints from shape descriptors they partition the viewpoint space and form their graph. Kriegman and Ponce [28] also discuss the approach of computing the aspect-graph of solids of revolution generated from an algebraic curve. The singularities which give rise to visual events can then be determined by a system of equations, using an orthographic projection on transparent objects. The boundary curves on the viewing sphere which give rise to the visual events are computed, and views are grouped into aspects using cylindrical algebraic decomposition and marching techniques. Ray tracing is used to remove hidden branches from the graph and to merge neighboring aspects with equivalent views. These steps are directly equivalent to finding the roots of a system of polynomial equations, which are solved with the numerical methods of continuation and symbolic elimination theory. Petitjean et al. [31] also developed a method of creating an aspect-graph of algebraic surfaces by using singularity theory and the catalog of visual events. This algorithm was implemented and tested on various smooth objects and resulted in intuitive aspects, with roughly 10-25 aspects per object, depending on the complexity of its shape. Despite these generalizations to fairly interesting classes of shapes, the problem of generating the aspect-graph of free-form shapes of arbitrary complexity remains unsolved.

Handling scale: A second problem in the computation of aspect-graphs is the large number of aspects arising from complex shapes, e.g., with node complexity $O(n^6)$ for an $n$-faced polyhedron [28, 43, 18], while for a smooth algebraic surface of degree $d$ it is $O(d^{12})$ and $O(d^{18})$ under orthogonal and perspective projection, respectively; for a piecewise smooth surface consisting of algebraic patches that is homeomorphic to a polyhedron, a similarly high-order polynomial bound holds under orthogonal projection [40]. While these estimates are upper bounds, practical experience suggests that the number of aspects is indeed large. This is partially because small-scale changes that may never be visually significant create additional nodes or modify existing nodes in the aspect-graph; observe how a bumpy surface patch may create numerous aspects which might never be observable when viewed through a finite-size aperture [15, 40]. Eggert et al. [15] addressed this issue by introducing the notion of scale into their aspect-graph representation. By relaxing the measure



 

  



 









used to group views into an aspect, they reduce the number of views required to represent an object to a computationally feasible number. Specifically, by using a sampling-rate parameter specifying singular views, the number of aspects can be balanced between inefficiently large and overly simple. Using this measure, features can be merged, reducing the set of features which determine singular views. Shimshoni and Ponce [40] present an algorithm for computing the finite-resolution aspects of a polyhedron. They partition the view space into non-critical regions bounded by the transition curves using a plane-sweep algorithm, such that adjacent regions with identical finite-resolution views are merged. Bellaire [5] presents a method of building a hierarchical representation of an aspect-graph using edge-face relationships and combining aspects judged to be similar. Two views belong to the same aspect if the same vertices are visible in both, the same edges are visible in both, partially occluded edges can be located in both, and T-junctions divide the same edges in both views. If these criteria are met, the views are judged similar and merged. There may be aspects which are redundant due to object symmetry. In addition, incorporating a more intelligent ordering of aspects, such as an ordered tree, further reduces the size of the data needed to fully represent a 3D object.

Characteristic Views: Ikeuchi and Kanade [22] describe an object recognition system similar to the aspect-graph, grouping "views" which have similar features. The features they use to decide on similarity are based on acquired photometric stereo images, such as face movement, face relationships, face shape, edge relationships, an extended Gaussian image (a histogram of surface orientations), and surface characteristic distribution (such as planar, cylindrical, elliptic, or hyperbolic). They use these features to group similar data and form an interpretation tree for use in recognition. The interpretation tree is formulated in two parts: the first classifies the unknown view into the correct aspect, and the second determines the actual position of the unknown view within that aspect. The success of this approach, like that of other feature-based systems, is dominated by the performance of the chosen features. Unstable features chosen between views can cause significant problems.

Several other approaches are relevant. Burns and Riseman represent polyhedral models by a network of 2D descriptions for expected views. Dickinson et al. [13] use a probabilistic model to relate characteristic views of volumetric primitives describing parts of objects. Weinshall and Werman [47] provide a formal analysis of view likelihood and view stability which can be used to determine characteristic views. Seales and Dyer [34] use the features extracted from the occluding contours of polyhedral objects to determine viewpoint. The features they use are contour T-junctions and the arrangements of the sections of the contour. These features give rise to regions where the viewpoint of the object gives rise to the same occluding features (the T-junctions). This is analogous to

the aspect-graph representation in that there are regions of views which are judged equal using some measure, in this case the set of features shared between views.

1.4. Psychological Foundation

There has been a fair amount of psychological research on the role that aspects (or, more generally, a single view representing an aspect) play in object recognition since Koenderink and van Doorn [26] first presented them. Bajcsy and Solina [2] discuss Rosch's hypothesis [33] regarding the process of storing an object in memory, which states that objects are formed into a set of categories, where basic categories have the highest level of abstraction, i.e., a generalized outline form. As an example, furniture is more general than a chair, which is more general than an armchair. Objects are recognized by comparing the unknown object to these levels of categories with an allowable amount of deformation. This hypothesis is similar to the aspect-graph representation of shape, in that each shape is represented by a prototypical shape (or view). Bajcsy describes an object recognition procedure built around Rosch's hypothesis: the unknown model is recognized by comparing it against the known objects' basic categories (prototypes), and the prototype which requires the least deformation is chosen as the recognized object. Stone [44] explored whether people recognize objects using certain views (the view-specificity hypothesis) or by a series of rotational views (the motion-specificity hypothesis). By conducting experiments on whether subjects could recognize objects as successfully using forward rotation as backward rotation, it was found that information from motion plays a part in object recognition in addition to view-specific recognition, such as prototypical-view storage. Tarr and Kriegman [45] performed research on the definition of a view in the context of human perception. They performed five experiments using objects with known aspect graphs and tested the recognition capabilities of people using these objects. They found that people are most sensitive to viewpoint changes of the objects at the transitions predicted by the aspect-graph structure. This lends credence to the theory of using these views as the basis for object recognition.

1.5. Overview of Our Approach
The goal of the aspect-graph representation is to partition the viewing space into a minimal set of views that can be distinguished as a group, determined by view transitions corresponding to visual events, e.g., as a new part comes into view. Since traditional methods based on the singularities of the visual mapping are not currently applicable to complex free-form objects, and in practice often result in numerous aspects, we adopt an approach based on grouping views into aspects using a notion of similarity between views.


Figure 1. This database of 64 objects is constructed for 3D object recognition experiments. Images with the black background are 3D VRML models whose views are generated by projection. Images with the red background are real objects (toys) whose views are generated using a turntable and digital camera. The database features both biological shapes (animals) and man-made objects. Views were taken at 15 degree increments.

The process of constructing an aspect graph can be viewed as a segmentation of the viewing sphere. One can then abstractly view the singularity-based aspect generation approach as performing "edge detection" on the viewing sphere by analyzing differences between projections of the 3D object. In contrast, the aspect generation method using the similarity of adjacent views can be viewed as a "region-growing" segmentation approach, which has two distinct advantages. First, the salience of a singularity in the visual mapping is related not only to its own significance but also to the lack of such events in its neighboring views. Second, the grouping of similar views can be done exclusively in the domain of 2D images, without requiring or restricting 3D representations of shape.

The paper is organized as follows. Section 2 discusses the notation and how aspects, and characteristic views for each, are generated based on the similarity metric in a manner that keeps each aspect distinct under this metric. Section 3 defines the concept of an aspect-separable grouping in a database, which ensures that recognition by rank-ordered similarity-based indexing in the collection of characteristic views correctly identifies the object. Section 4 addresses the need for early pruning of distant characteristic views by hierarchically discarding aspects at several scales. Section 5 describes the results of experiments with a database of 64 synthetic and real 3D objects, with a discussion of the effect of using different metrics. This work was previously introduced in a conference paper [12].

Figure 2. Projections of the object are taken at regular intervals, and this set of 2D views is used to represent the object.


Figure 3. Two objects ("kangaroo" and "bull") from the database are depicted on the left. On the right we show a range of views from the ground plane in five-degree increments. Circled views represent prototypical views. Aspect boundaries are the vertical green lines.

2. Similarity-Based Aspect-Graph

2.1. Notations and Definitions
The process of generating views, aspects, characteristic views, and shape changes between views is illustrated in several figures before the notation is formally described. Figures 2, 3, and 4, respectively, highlight the generation of views, show an example of aspects and the characteristic views for each aspect for two objects, and depict how the shape changes as the viewing angle changes. The shape generally maintains a level of consistency and gradual change until a significant change takes place, e.g., as a leg becomes unoccluded. By measuring this similarity between adjacent views, aspects can be generated. Formally, we require that the viewing sphere be endowed with a metric indicating the "distance" between two views, which measures the dissimilarity between the projected views of the object. We use two such metrics in this paper, one based on curve matching and the other based on shock matching; however, other metrics can also be used.




Figure 4. An example of how the projected shape of a 3D object changes as the viewing angle changes, and how significant shape changes can be used to determine the borders of aspects. The result of our approach is shown at the bottom, where each aspect is covered by a distinct bar and the characteristic view for that aspect is circled.

Figure 5. The aspect-graph representation of a 3D object by 2D views. The object being considered (a rhino model in this case) is represented initially by all views along the ground-view, but once the aspect-graph of the model is derived, views judged similar by some metric are grouped and represented by a prototypical view. Each small circle represents a view, views in the same aspect are shown in the same color, and the characteristic view of each aspect is filled in.

Let there be $N$ objects $\{O_i,\ i = 1, \ldots, N\}$ which comprise an object database, Figure 1. Each object is composed of $M$ views sampling the viewing sphere, $\{V_{i,j},\ j = 1, \ldots, M\}$, where $V_{i,j}$ denotes the $j^{\rm th}$ view of object $O_i$. Therefore, the full database consists of $N \times M$ views. For simplicity we take these views along the ground-view in this paper. The goal of our approach is to minimize the set of views required to represent each object $O_i$. The aspect-graph is a graph where each of the nodes is a prototypical or characteristic view representing one or more neighboring views which have been grouped. Thus, instead of representing each object $O_i$ with the full set of $M$ views, a reduced set of characteristic views is used, Figure 5. We represent the $k^{\rm th}$ aspect of object $O_i$ as $A_{i,k}$, a collection of views ranging from $V_{i,c_k - l_k}$ to $V_{i,c_k + r_k}$ and represented by the characteristic view $V_{i,c_k}$, where $(l_k, r_k)$ are the left and right aspect radii.

We represent the dis-similarity of two views as $d(V_{i,j}, V_{m,l})$, which is the distance between the $j^{\rm th}$ view of object $O_i$ and the $l^{\rm th}$ view of object $O_m$. The dis-similarity measure is required to be a metric, in our case a curve-based or a shock-based segmented-shape dis-similarity metric. The metric can also include non-shape properties, but the properties of a metric cannot be violated in incorporating additional features, as these properties are crucial to the approach.
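To make the notation concrete, the following minimal sketch stores the $N \times M$ sampled views' pairwise distances and evaluates $d(V_{i,j}, V_{m,l})$. The container and its names are ours, not the paper's; any true metric (curve-based or shock-based) may supply the numbers, and only symmetry and the triangle inequality are relied upon later.

    import numpy as np

    class ViewDatabase:
        """Hypothetical store for N objects with M views each
        (M = 72 for 5-degree sampling along the ground-view)."""

        def __init__(self, n_objects, n_views=72):
            self.N, self.M = n_objects, n_views
            # dist[(i, m)] is an M-by-M array of pairwise view distances.
            self.dist = {(i, m): np.zeros((n_views, n_views))
                         for i in range(n_objects) for m in range(n_objects)}

        def d(self, i, j, m, l):
            """Distance between view j of object i and view l of object m."""
            return self.dist[(i, m)][j % self.M, l % self.M]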

2.2. Generation of Aspects and Characteristic Views

We impose several criteria which are required in order to maintain successful object recognition while forming aspects represented by characteristic views. First, we require views to change monotonically within each aspect starting from the aspect's characteristic view.

CRITERION 1. (Local Monotonicity) For each characteristic view $V_{i,c_k}$, there exist integers $l_k, r_k \ge 0$ such that

    $d(V_{i,c_k}, V_{i,j}) \le d(V_{i,c_k}, V_{i,j+1})$ for $c_k \le j < c_k + r_k$, and
    $d(V_{i,c_k}, V_{i,j}) \le d(V_{i,c_k}, V_{i,j-1})$ for $c_k - l_k < j \le c_k$.    (1)

This criterion is a natural one to expect: the dis-similarity of two views increases as their relative viewing angle increases, at least for some range of angles. The criterion relies on a sufficient sampling rate, i.e., that the monotonicity condition still holds for the views between two sampled views. In the absence of a view "sampling theorem" we state an assumption:

ASSUMPTION 1. (Sampling sufficiency): The view-sampling rate is sufficiently high that unsampled views between sampled views also satisfy the monotonicity criterion.

Given that we have sampled our views every five degrees, we have not noted the appearance of "surprise" views. A violation of this assumption may lead to recognition errors. A further formalization of a "sampling theorem" clearly depends on the shape of the objects and is a challenging task. Observe that as the difference between the viewing directions increases, the dis-similarity is eventually reduced, thus limiting the size of an aspect.
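As a concrete sketch of Criterion 1, the monotonic basin around a candidate characteristic view can be grown directly from one row of an object's view-distance matrix. The function name and the handling of the 5% noise tolerance (used for the experiments reported in Tables I and II) are our assumptions:

    def monotonic_radii(D, c, tol=0.05):
        """Largest (l, r) such that d(V_c, V_j) grows monotonically away from c.

        D is an M-by-M distance matrix for one object (indices wrap around
        the ground-view circle); dips below tol (5%) are discounted as noise.
        """
        M = len(D)
        out = []
        for step in (-1, +1):                  # left basin, then right basin
            extent, prev = 0, 0.0
            for k in range(1, M // 2 + 1):     # at most half the circle
                cur = D[c][(c + step * k) % M]
                if cur < prev * (1.0 - tol):   # significant decrease: stop
                    break
                prev = max(prev, cur)
                extent = k
            out.append(extent)
        return tuple(out)                      # (l, r)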

Figure 6. Criterion 2 constrains the upper limits of the boundary views $V_{i,c-l}$ and $V_{i,c+r}$ of an aspect for each view; $l$ and $r$ are the aspect radii, i.e., the views most distant from the characteristic-view candidate which are below the value of the global minimum outside the monotonic region. Typically several values are possible, but only the set satisfying Criterion 1 is used. The above graph is generated from the ape model.

An effective dis-similarity metric is one with a large basin of monotonically distant views. While the monotonicity condition limits the size of an aspect centered around a candidate characteristic view, the metric of dis-similarity should also be able to differentiate between views that should be considered in the same aspect and all other views. The global minimum dis-similarity outside the local monotonic region places a second constraint on the size of each aspect. This value is used to restrict the largest aspect to be the local area satisfying,

CRITERION 2. (Object-Specific Distinctiveness of Aspect Views): For each aspect $A_{i,k}$ with prototypical view $V_{i,c_k}$, we must have

    $d(V_{i,c_k}, V_{i,j}) < d(V_{i,c_k}, V_{i,m})$ for any $V_{i,j} \in A_{i,k}$ and $V_{i,m} \notin A_{i,k}$.    (2)

That is, the distance between each view $V_{i,j}$ in an aspect $A_{i,k}$ and the characteristic view of that aspect is smaller than the distance between any non-aspect view $V_{i,m}$ and the characteristic view $V_{i,c_k}$, Figure 6.

This constraint leads to a few possible values for aspect boundaries. These values depend on the characteristic view in relation to all other views. Together with the monotonicity criterion, we derive a pair of unique "upper" aspect boundaries.
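A sketch of how Criterion 2 then clips these radii (again with assumed names): every view kept inside the aspect must remain closer to the candidate than the global minimum distance attained outside the monotonic basin.

    def aspect_bounds(D, c, l, r):
        """Clip the monotonic basin (l, r) of view c by Criterion 2."""
        M = len(D)
        inside = {(c + k) % M for k in range(-l, r + 1)}
        outside = [D[c][j] for j in range(M) if j not in inside]
        if not outside:                # basin covers the whole circle
            return l, r
        floor = min(outside)           # global minimum outside the basin
        while l > 0 and D[c][(c - l) % M] >= floor:
            l -= 1                     # shrink left boundary
        while r > 0 and D[c][(c + r) % M] >= floor:
            r -= 1                     # shrink right boundary
        return l, r

Feeding the output of `monotonic_radii` above into `aspect_bounds` yields, for each candidate view, the pair of upper limits referred to in Tables I and II below.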

Table I. Curve-based distance between each pair of views, at 15° increments, of the fish object. For each row, costs in bold specify views that are located within the aspect boundaries of the view represented in that row. In our experiments we include a noise threshold of 5% of the cost, which operates by discounting increases or decreases in cost of less than 5%. Thus, each candidate characteristic view has a pair of upper limits for the size of the aspect it can represent.

        0   15   30   45   60   75   90  105  120  135  150  165  180  195  210  225  240  255  270  285  300  315  330  345
  0  0.00 0.38 0.34 0.48 0.81 0.83 0.71 0.59 0.51 0.46 0.24 0.19 0.39 0.29 0.45 0.65 0.65 0.54 0.59 0.54 0.47 0.42 0.30 0.24
 15  0.38 0.00 0.25 0.30 0.43 0.47 0.44 0.37 0.34 0.33 0.34 0.41 0.24 0.27 0.29 0.60 0.62 0.70 0.70 0.70 0.50 0.43 0.26 0.54
 30  0.34 0.25 0.00 0.16 0.29 0.28 0.33 0.25 0.18 0.17 0.32 0.41 0.38 0.42 0.26 0.33 0.58 0.54 0.61 0.61 0.39 0.38 0.49 0.48
 45  0.48 0.30 0.16 0.00 0.19 0.22 0.19 0.15 0.12 0.24 0.42 0.51 0.47 0.50 0.32 0.43 0.37 0.24 0.52 0.27 0.40 0.41 0.35 0.53
 60  0.81 0.43 0.29 0.19 0.00 0.11 0.11 0.14 0.18 0.32 0.51 0.53 0.57 0.51 0.51 0.33 0.28 0.23 0.15 0.18 0.39 0.40 0.45 0.59
 75  0.83 0.47 0.28 0.22 0.11 0.00 0.09 0.17 0.20 0.38 0.62 0.57 0.62 0.49 0.46 0.30 0.23 0.17 0.20 0.23 0.32 0.29 0.47 0.57
 90  0.71 0.44 0.33 0.19 0.11 0.09 0.00 0.18 0.22 0.47 0.48 0.61 0.68 0.51 0.42 0.31 0.18 0.15 0.45 0.24 0.45 0.36 0.49 0.56
105  0.59 0.37 0.25 0.15 0.14 0.17 0.18 0.00 0.13 0.37 0.52 0.53 0.68 0.49 0.45 0.26 0.37 0.27 0.40 0.43 0.42 0.33 0.52 0.59
120  0.51 0.34 0.18 0.12 0.18 0.20 0.22 0.13 0.00 0.17 0.41 0.44 0.47 0.31 0.30 0.19 0.58 0.65 0.61 0.29 0.61 0.35 0.48 0.51
135  0.46 0.33 0.17 0.24 0.32 0.38 0.47 0.37 0.17 0.00 0.29 0.38 0.43 0.24 0.28 0.48 0.58 0.57 0.65 0.58 0.41 0.39 0.26 0.29
150  0.24 0.34 0.32 0.42 0.51 0.62 0.48 0.52 0.41 0.29 0.00 0.33 0.32 0.20 0.30 0.61 0.67 0.73 0.68 0.74 0.35 0.32 0.34 0.40
165  0.19 0.41 0.41 0.51 0.53 0.57 0.61 0.53 0.44 0.38 0.33 0.00 0.17 0.23 0.36 0.68 0.73 0.94 1.00 1.00 0.50 0.41 0.40 0.16
180  0.39 0.24 0.38 0.47 0.57 0.62 0.68 0.68 0.47 0.43 0.32 0.17 0.00 0.17 0.36 0.42 0.50 0.59 0.58 0.51 0.56 0.36 0.27 0.22
195  0.29 0.27 0.42 0.50 0.51 0.49 0.51 0.49 0.31 0.24 0.20 0.23 0.17 0.00 0.20 0.34 0.45 0.50 0.54 0.56 0.40 0.32 0.41 0.31
210  0.45 0.29 0.26 0.32 0.51 0.46 0.42 0.45 0.30 0.28 0.30 0.36 0.36 0.20 0.00 0.25 0.35 0.49 0.37 0.31 0.26 0.22 0.34 0.37
225  0.65 0.60 0.33 0.43 0.33 0.30 0.31 0.26 0.19 0.48 0.61 0.68 0.42 0.34 0.25 0.00 0.21 0.21 0.19 0.21 0.20 0.26 0.43 0.37
240  0.65 0.62 0.58 0.37 0.28 0.23 0.18 0.37 0.58 0.58 0.67 0.73 0.50 0.45 0.35 0.21 0.00 0.15 0.17 0.16 0.17 0.20 0.43 0.48
255  0.54 0.70 0.54 0.24 0.23 0.17 0.15 0.27 0.65 0.57 0.73 0.94 0.59 0.50 0.49 0.21 0.15 0.00 0.11 0.16 0.17 0.22 0.48 0.59
270  0.59 0.70 0.61 0.52 0.15 0.20 0.45 0.40 0.61 0.65 0.68 1.00 0.58 0.54 0.37 0.19 0.17 0.11 0.00 0.12 0.20 0.30 0.56 0.57
285  0.54 0.70 0.61 0.27 0.18 0.23 0.24 0.43 0.29 0.58 0.74 1.00 0.51 0.56 0.31 0.21 0.14 0.14 0.12 0.00 0.20 0.33 0.53 0.55
300  0.47 0.50 0.39 0.40 0.39 0.32 0.45 0.42 0.61 0.41 0.35 0.50 0.56 0.40 0.26 0.20 0.17 0.17 0.20 0.20 0.00 0.14 0.46 0.43
315  0.42 0.43 0.38 0.41 0.40 0.29 0.36 0.33 0.35 0.39 0.32 0.41 0.36 0.32 0.22 0.26 0.20 0.22 0.30 0.33 0.14 0.00 0.42 0.35
330  0.30 0.26 0.49 0.35 0.45 0.47 0.49 0.52 0.48 0.26 0.34 0.40 0.27 0.41 0.34 0.43 0.43 0.48 0.56 0.53 0.46 0.42 0.00 0.42
345  0.24 0.54 0.48 0.53 0.59 0.57 0.56 0.59 0.51 0.29 0.40 0.16 0.22 0.31 0.37 0.37 0.48 0.59 0.57 0.55 0.43 0.35 0.42 0.00

Tables I and II show examples of the pairwise distances between views, with boldface numbers indicating the computed boundaries of an aspect in each row, with the central view being the characteristic view, using curve matching and shock matching, respectively, Section 5.1.

Table II. Shock-based distance between each pair of views, at 15° increments, of the fish object. Costs in bold specify views that are located within the aspect boundaries. In our experiments we include a noise threshold of 5% of the cost, which operates by discounting increases or decreases in cost of less than 5%. Note that, compared to the curve-based distance results, the upper limits of the aspect size are larger.

        0   15   30   45   60   75   90  105  120  135  150  165  180  195  210  225  240  255  270  285  300  315  330  345
  0  0.00 0.55 0.83 0.81 0.89 0.89 0.94 0.93 0.87 0.81 0.46 0.37 0.38 0.59 0.77 0.96 1.00 1.00 0.89 0.93 0.79 0.69 0.49 0.35
 15  0.55 0.00 0.51 0.75 0.86 0.84 0.85 0.82 0.80 0.65 0.56 0.52 0.50 0.50 0.74 0.84 0.92 0.93 0.84 0.84 0.91 0.52 0.54 0.57
 30  0.83 0.51 0.00 0.60 0.70 0.71 0.72 0.69 0.59 0.53 0.71 0.76 0.70 0.63 0.78 0.72 0.82 0.87 0.72 0.80 0.86 0.72 0.72 0.71
 45  0.81 0.75 0.60 0.00 0.45 0.50 0.53 0.60 0.54 0.53 0.74 0.81 0.81 0.78 0.77 0.68 0.80 0.90 0.78 0.81 0.89 0.77 0.76 0.78
 60  0.89 0.86 0.70 0.45 0.00 0.47 0.44 0.49 0.55 0.68 0.82 0.85 0.82 0.86 0.80 0.78 0.82 0.87 0.89 0.82 0.91 0.79 0.89 0.84
 75  0.89 0.84 0.71 0.50 0.47 0.00 0.42 0.45 0.58 0.71 0.72 0.81 0.79 0.84 0.74 0.77 0.83 0.90 0.80 0.81 0.88 0.78 0.90 0.79
 90  0.94 0.85 0.72 0.53 0.44 0.42 0.00 0.45 0.55 0.67 0.80 0.85 0.87 0.85 0.70 0.75 0.81 0.82 0.80 0.76 0.90 0.80 0.92 0.84
105  0.93 0.82 0.69 0.60 0.49 0.45 0.45 0.00 0.56 0.67 0.75 0.84 0.87 0.81 0.73 0.80 0.87 0.86 0.85 0.85 0.93 0.86 0.91 0.87
120  0.87 0.80 0.59 0.54 0.55 0.58 0.55 0.56 0.00 0.50 0.71 0.83 0.82 0.84 0.76 0.74 0.79 0.84 0.82 0.78 0.81 0.75 0.94 0.83
135  0.81 0.65 0.53 0.53 0.68 0.71 0.67 0.67 0.50 0.00 0.55 0.74 0.75 0.74 0.73 0.74 0.83 0.90 0.83 0.84 0.84 0.68 0.78 0.75
150  0.46 0.56 0.71 0.74 0.82 0.72 0.80 0.75 0.71 0.55 0.00 0.36 0.42 0.57 0.73 0.85 0.87 0.89 0.77 0.86 0.80 0.65 0.46 0.43
165  0.37 0.52 0.76 0.81 0.85 0.81 0.85 0.84 0.83 0.74 0.36 0.00 0.35 0.53 0.83 0.90 0.94 0.95 0.87 0.91 0.90 0.72 0.49 0.42
180  0.38 0.50 0.70 0.81 0.82 0.79 0.87 0.87 0.82 0.75 0.42 0.35 0.00 0.47 0.77 0.92 0.96 0.97 0.85 0.88 0.84 0.63 0.43 0.41
195  0.59 0.50 0.63 0.78 0.86 0.84 0.85 0.81 0.84 0.74 0.57 0.53 0.47 0.00 0.65 0.71 0.80 0.79 0.74 0.70 0.78 0.66 0.62 0.52
210  0.77 0.74 0.78 0.77 0.80 0.74 0.70 0.73 0.76 0.73 0.73 0.83 0.77 0.65 0.00 0.48 0.65 0.64 0.76 0.65 0.82 0.60 0.78 0.81
225  0.96 0.84 0.72 0.68 0.78 0.77 0.75 0.80 0.74 0.74 0.85 0.90 0.92 0.71 0.48 0.00 0.51 0.52 0.57 0.57 0.61 0.65 0.83 0.84
240  1.00 0.92 0.82 0.80 0.82 0.83 0.81 0.87 0.79 0.83 0.87 0.94 0.96 0.80 0.65 0.51 0.00 0.46 0.52 0.53 0.57 0.68 0.91 0.94
255  1.00 0.93 0.87 0.90 0.87 0.90 0.82 0.86 0.84 0.90 0.89 0.95 0.97 0.79 0.64 0.52 0.46 0.00 0.49 0.39 0.53 0.71 0.96 0.93
270  0.89 0.84 0.72 0.78 0.89 0.80 0.80 0.85 0.82 0.83 0.77 0.87 0.85 0.74 0.76 0.57 0.52 0.49 0.00 0.36 0.50 0.64 0.88 0.82
285  0.93 0.84 0.80 0.81 0.82 0.81 0.76 0.85 0.78 0.84 0.86 0.91 0.88 0.70 0.65 0.57 0.53 0.39 0.36 0.00 0.44 0.62 0.88 0.89
300  0.79 0.91 0.86 0.89 0.91 0.88 0.90 0.93 0.81 0.84 0.80 0.90 0.84 0.78 0.82 0.61 0.57 0.53 0.50 0.44 0.00 0.40 0.83 0.81
315  0.69 0.52 0.72 0.77 0.79 0.78 0.80 0.86 0.75 0.68 0.65 0.72 0.63 0.66 0.60 0.65 0.68 0.71 0.64 0.62 0.40 0.00 0.64 0.68
330  0.49 0.54 0.72 0.76 0.89 0.90 0.92 0.91 0.94 0.78 0.46 0.49 0.43 0.62 0.78 0.83 0.91 0.96 0.88 0.88 0.83 0.64 0.00 0.40
345  0.35 0.57 0.71 0.78 0.84 0.79 0.84 0.87 0.83 0.75 0.43 0.42 0.41 0.52 0.81 0.84 0.94 0.93 0.82 0.89 0.81 0.68 0.40 0.00

With these "upper bounds" for the possible aspects of each view, we use an iterative scheme, much like the "seeded region-growing" method used for segmenting images, to group views into aspects. Each view is initially considered a distinct aspect and its own characteristic view. Next, the two characteristic views with the lowest global distance are chosen as the initial candidates to be merged. In general, two aspects can be merged as long as the upper bounds are not violated for each view in the merged aspects. In each iteration, the two most similar aspects which can be merged are grouped together. The characteristic view for the new aspect is the view which

minimizes the distance to all other views in the aspect. The similarity between two aspects is defined as the similarity of their respective characteristic views. It is possible that for a given object there may be several non-neighboring view classes with similar characteristic views. This will restrict the degree of grouping that can be performed with these view classes. However, this is only likely to occur for highly regular "featureless" shapes and will not be a problem in databases containing complex free-form shapes.

Table III. The aspect-combination algorithm used to merge the views of an object into aspects.

1. Consider every view to be an aspect and its own characteristic view. An object with M views will initially contain M aspects composed of one view each.
2. Compute the distance between each pair of neighboring characteristic views.
3. Select the pair of characteristic views with the minimal distance:
   a) If they are within each other's groupable boundaries, combine the two aspects into a single aspect.
   b) The view with the minimal distance to all other views in the new group/aspect becomes the new characteristic view for that aspect.
4. Recompute the distances between the neighboring characteristic views.
5. Repeat steps 3 and 4 with the new characteristic views of the formed aspects.
6. The process ends when there are no aspects that can be grouped without violating the aspect boundaries.
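A runnable sketch of the merge loop of Table III follows; the data structures are ours, and the `can_merge` predicate is assumed to test the per-view groupable boundaries derived from Criteria 1 and 2.

    def characteristic_view(D, views):
        """The view minimizing the total distance to the other views."""
        return min(views, key=lambda c: sum(D[c][j] for j in views))

    def grow_aspects(D, can_merge):
        """Greedy region-growing over the viewing circle (steps 1-6 above)."""
        M = len(D)
        aspects = [[j] for j in range(M)]      # step 1: one view per aspect
        protos = list(range(M))
        while len(aspects) > 1:
            # steps 2-3: try neighboring pairs in order of increasing distance
            order = sorted(range(len(aspects)),
                           key=lambda a: D[protos[a]][protos[(a + 1) % len(protos)]])
            for a in order:
                b = (a + 1) % len(aspects)
                merged = aspects[a] + aspects[b]
                if can_merge(D, merged):       # step 3a: bounds respected
                    aspects[a] = merged
                    protos[a] = characteristic_view(D, merged)   # step 3b
                    del aspects[b]
                    del protos[b]
                    break
            else:
                break                          # step 6: nothing mergeable
        return aspects, protos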

This process is repeated iteratively until all views which can be grouped without crossing the "upper boundaries" are merged. The end result is a set of aspects and an associated characteristic view for each, with the number of aspects depending on the complexity of the object as well as on the sensitivity of the shape dis-similarity metric. The result of the grouping does not depend on the order in which the view classes are processed, since each object is handled independently and the order of merging is determined by the shape distances. Figure 7 shows the characteristic views of various objects in the database, located using two shape metrics, curve matching and shock matching. The differences that are due to the different metrics, such as the number of characteristic views per object, are discussed in detail in Section 5.1. These characteristic views are then used in the recognition framework described in Section 3. The traditional aspect graph creates a full graph structure with stable views as nodes and edges representing the transitions between stable views. Currently, in our approach nodes represent views we judge stable. However, the edges are not used in the recognition phase, only in the merging phase (described in Section 4). We feel that these edges and transitions will be of use when the process is moved to the full viewing sphere.

Figure 7. An example of prototypical views chosen using curve matching (left) and shock matching (right) on various models included in our database. Observe that the shock-based metric consistently produces fewer characteristic views than curve matching. Fewer characteristic views with the same recognition capability implies that shock matching provides more efficient characteristic view generation for use in recognition.


Figure 8. An illustration of the construction of our database: the prototypical views (shown as solid circles) of each object are used to represent the object.

3. 3D Recognition by Matching 2D Characteristic Views

The database of object views used in recognition is then the combined pool of characteristic views resulting from the aspect-graph generation phase¹, Figure 8. The constraint that the "upper bounds" for each aspect are not violated in aspect generation ensures that the distance between each view in an aspect and its characteristic view is guaranteed to be less than the distance between this view and any other aspect's characteristic view. This is a vital element which, for an aspect-separable database, leads to successful recognition.

¹ Since the focus of this research is on the performance of an aspect-based recognition scheme, we assume that errors due to segmentation and acquisition are not present.
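As a small sketch (with our structures), the recognition database pools every object's characteristic views together with the aspect radius formalized in Definition 1 below:

    def build_prototype_pool(objects):
        """objects: list of (D, aspects, protos) triples, one per object.

        Returns a flat pool of (object id, characteristic view, aspect
        radius); recognition ranks pool entries by their distance to the
        query, in ascending order.
        """
        pool = []
        for obj_id, (D, aspects, protos) in enumerate(objects):
            for views, c in zip(aspects, protos):
                radius = max(D[c][j] for j in views)   # Definition 1 below
                pool.append((obj_id, c, radius))
        return pool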

DEFINITION 1. (Aspect Radius): The radius $r_{i,k}$ of an aspect $A_{i,k}$ is the maximum distance of the views in the aspect from its characteristic view $V_{i,c_k}$, i.e.,

    $r_{i,k} = \max_{V_{i,j} \in A_{i,k}} d(V_{i,j}, V_{i,c_k})$.

DEFINITION 2. (Aspect-Separable): Consider a database of objects $O_i$ with associated aspects $A_{i,k}$ and characteristic views $V_{i,c_k}$ satisfying Criteria 1 and 2. The grouping of object views into aspects is aspect-separable if for any pair of distinct characteristic views $V_{i,c_k}$ and $V_{m,c_n}$ we have

    $d(V_{i,c_k}, V_{m,c_n}) > 2 \max(r_{i,k}, r_{m,n})$    (3)

for all pairs of distinct aspects $A_{i,k}$ and $A_{m,n}$.

THEOREM 1. Given an unknown view $V_{i,j}$, selected from an aspect-separable grouping of a database, the object identity $i$ and aspect $k$ are correctly identified by the characteristic view which has the minimum shape-metric value to the unknown view, i.e.,

    $(i, k) = \arg\min_{m,n} d(V_{i,j}, V_{m,c_n})$.

Proof. Without loss of generality, let $V_{i,j^*}$ be the most distant view from $V_{i,c_k}$ in aspect $A_{i,k}$, such that $d(V_{i,j^*}, V_{i,c_k}) = r_{i,k}$. For all characteristic views $V_{m,c_n}$ distinct from $V_{i,c_k}$, by the triangle inequality and monotonicity,

    $d(V_{i,j}, V_{m,c_n}) \ge d(V_{i,c_k}, V_{m,c_n}) - d(V_{i,j}, V_{i,c_k}) \ge d(V_{i,c_k}, V_{m,c_n}) - r_{i,k}$.    (4)

Since, by hypothesis,

    $d(V_{i,c_k}, V_{m,c_n}) > 2 \max(r_{i,k}, r_{m,n}) \ge 2 r_{i,k}$,    (5)

we have, by combining Inequalities (4) and (5),

    $d(V_{i,j}, V_{m,c_n}) > 2 r_{i,k} - r_{i,k} = r_{i,k}$.

Finally, since $d(V_{i,j}, V_{i,c_k}) \le r_{i,k}$, we have

    $d(V_{i,j}, V_{m,c_n}) > d(V_{i,j}, V_{i,c_k})$,

and the unknown view is closest to the characteristic view of its own aspect.
Figure 9. In an aspect-separable grouping, recognition will be successful if the distance between an unknown view (red square, object 1) and the characteristic view of the aspect it belongs to (match represented by the blue line) is less than the distance to all other characteristic views (all other lines).

It is obvious that the success of recognition depends on the distinctness of the views produced from the objects in the database, which allows for an aspect-separable grouping. This is the case in all recognition systems: if the set of objects is very similar in nature, the recognition performance of a view-based approach will suffer. Figure 9 illustrates the process of recognition: the characteristic views of each object are used to recognize a novel view, and the success of matching is contingent upon the unknown view having the lowest cost between itself and the characteristic view of its aspect. While the process we have described here is restricted to the ground-level of the viewing sphere, it is conceptually not difficult to extend it to views from the entire sphere. Instead of a right and left boundary at each view, there would be a closed 2D boundary curve surrounding each viewpoint. Again a region-growing approach would be used to iteratively merge the views with the lowest dis-similarity within each other's boundaries, the main difference being that this would be a true 2D region-growing algorithm. We have performed some preliminary experiments with very encouraging results.

4. Hierarchical Recognition

The task of matching each unknown view to all characteristic views can be time consuming relative to the on-line nature of matching the query. Similarly, the task of matching each pair of views for an object can be time consuming relative to the off-line nature of database construction. The fact that we have used a metric in the recognition approach also allows for significant pruning of unlikely options using the triangle inequality. This is done by re-posing the current task of finding the nearest characteristic views to the unknown view as the task of pruning unlikely aspects in a pre-processing step. This pre-processing is based on two ideas. First, the notions of aspect radius for each characteristic view in Definition 1 and aspect separability ensure that if for some view $V$ and some characteristic view $V_{m,c_n}$ we have

    $d(V, V_{m,c_n}) > r_{m,n}$,    (6)

it is certain that $V$ is not a view in that aspect. Moreover, if this holds for all prototypical views $V_{m,c_n}$ of an object $O_m$, the object $O_m$ can be ruled out altogether. Second, we now extend the process of merging aspects, which was performed during the database-construction phase, to continue beyond the point where it had stopped. This process had been stopped via Theorem 1 to guarantee the result of the "nearest neighbor match". Since the task has now been changed to "prune unlikely matches", the grouping process can continue by disregarding the upper boundaries defined by Criterion 2, in order to obtain aspects of coarser granularity. The grouping continues until the number of aspects of each object reaches a fixed number.² This coarse-scale representation of aspects, with an associated characteristic view and radius for each, allows the unlikely aspects to be pruned quickly without evaluating a distance to all characteristic views. Specifically, the unknown view is only compared to the characteristic views in the coarse-scale representation. If Equation (6) is satisfied, that aspect, and thus the associated finer-scale aspects, are all removed from further consideration. This can be done hierarchically for compounded savings.

² Alternatively, grouping can continue until the radius reaches a pre-determined number.
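A sketch of this pruning step, under the structures assumed in the earlier sketches; `query_dist` and the `children` field (the finer-scale prototypes covered by each coarse aspect) are our assumptions:

    def prune_coarse(query_dist, coarse_pool):
        """Discard coarse aspects that cannot contain the query (Equation 6).

        coarse_pool: list of (obj_id, proto_view, radius, children), where
        query_dist(obj_id, view) evaluates d(query, V_{obj,view}).
        """
        survivors = []
        for obj_id, proto, radius, children in coarse_pool:
            if query_dist(obj_id, proto) <= radius:   # Eq. (6) not satisfied
                survivors.extend(children)            # keep its finer aspects
        return survivors

Applying the same test again at the eight-prototype level before the final rank-ordering yields the compounded savings reported in Table V.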

5. Experimental Results

To test the success of the object recognition process described here, we generated unknown views of objects in our database by taking both projections of the synthetic data and pictures of the real data at random viewing angles. We constructed two databases of prototypes, one using curve matching and one with shock matching.

Figure 10 match costs (top three matches per query; the query and match images are shown in the figure):

Query   Curve-Based Results        Shock-Based Results
        Match 1  Match 2  Match 3  Match 1  Match 2  Match 3
  1      0.202    0.316    0.335    0.207    0.215    0.225
  2      0.258    0.479    0.711    0.073    0.153    0.185
  3      0.355    0.378    0.413    0.075    0.189    0.202
  4      0.197    0.199    0.418    0.221    0.358    0.410
  5      0.339    0.385    0.394    0.180    0.247    0.250
  6      0.187    0.213    0.353    0.342    0.351    0.411
  7      0.109    0.253    0.283    0.153    0.261    0.265
  8      0.098    0.163    0.217    0.153    0.199    0.205
  9      0.159    0.570    0.614    0.267    0.377    0.413
 10      0.221    0.245    0.347    0.093    0.099    0.101

Figure 10. The result of matching using the curve-based and shock-based metrics. Unknown views of unknown objects are matched against stored prototypes and ordered by increasing distance. The top three matches are shown. Note that while the absolute values of costs under the two kinds of metrics cannot be directly compared, shock matching is shown to be more effective in rank-ordering the results. This might seem counterintuitive at first glance, since the top three matches of the curve-based results generally appear to correspond better to the query. However, this is mainly due to the greater capability of shock matching to represent a larger aspect with a single prototype. For example, for the buffalo (first example) the top three matches of curve-based matching are all incorporated into the first match of the shock-based results.

Table V. Cumulative savings due to hierarchical pruning of prototypes.

Number of Prototypes   Average Cumulative Savings   Maximum Cumulative Savings   Minimum Cumulative Savings
4                      35%                          n/a                          n/a
8                      55%                          n/a                          n/a

The curve matching metric is obtained by finding the least action path in deforming one curve to another [37], while the shock matching metric is computed by finding the least action path in deforming the shape represented by its skeleton, or shock graph [38]. Both are true metrics, which is crucial for the present work. The use and comparison of multiple metrics reveals differences in the underlying sensitivity of the shape metric: the shape metric defines the boundaries used in grouping, so a different shape metric might provide different prototypes. Figure 10 shows example results of matching unknown views against the shock-based and curve-based databases.

Table IV. Results of matching unknowns using curve-based and shock-based dis-similarity metrics. Notice that using the top three matches results in correct identification almost always.

Object Correctly Detected   Best Match 1   Best Match 2   Best Match 3   In Top 2 Matches   In Top 3 Matches
Curve-based Similarity      90.3%          74.2%          59.7%          96.7%              98.3%
Shock-based Similarity      95.2%          69.0%          57.5%          97.6%              100.0%

Table IV shows the results of matching a set of unknown views against our database using the curve-based and shock-based dis-similarity metrics. Using curve-based matching, the top three matches lead to correct identification rates of (90.3%, 74.2%, 59.7%), while shock-based matching leads to (95.2%, 69.0%, 57.5%). The cumulative recognition rates, i.e., the fractions indicating correct identification from one of the top two or top three matches, are (96.7%, 98.3%) for curve-based matching and (97.6%, 100%) for shock-based matching, respectively. The recognition rate for an initial database of 18 objects, presented in [12], was not reduced when the database was extended to the current 64 objects, despite the fact that recognition rates are generally expected to drop as the size of the database grows substantially. We plan to construct much larger databases and investigate recognition rates as the size of the database grows. We expect that the differences between the curve-based and shock-based metrics would become more significant with larger databases, due to the superior discrimination capability of shock-based matching [36, 39].

Table V summarizes the savings, in terms of the number of objects removed from further processing, for a two-stage hierarchical process that groups aspects until each object has first eight and then four prototypes. Indexing into the coarsest (4 aspects) representation removes about a third of the aspects from further consideration. Indexing into the medium-level representation (8 aspects) removes another twenty percent of the original total.

5.1. Curve Matching vs. Shock Matching in the Context of Aspect-Graph Construction
We have used two metrics [38, 35, 25, 37] to make comparisons between shapes to determine dis-similarity, one based on curve matching and the other based on shock matching. The framework of the aspect-graph generation and the process of recognition is independent of the shape metric used, so this is an opportunity to compare these two different metrics. The measures used for shock matching and curve matching are metrics by construction, since they arise from the optimal paths of deformation. A key difference that arises from using the two metrics is in the resulting selection of prototypes described in Section 2. This is due to the nature of each metric and the relative weighting of shape differences. The curve matching metric is more sensitive to a change in the outline layout of a shape, even when the structure of the shape does not change; this is illustrated in Figure 4, where the leg in the middle of the projections under the red bar "moves" from the left to the right as the view changes. This presents a fairly large change in the curve, because numerous points on the curve are being altered, and so the cost maintains a fairly high value. In contrast, the comparison of the same series of views using the shock matching metric produces a lower cost, since the underlying structure of the shape does not change across the views. These differences cause an increase in the number of prototypes generated using curve matching, as the upper bounds on aspect boundaries described above are not as large as those generated by shock matching. Figure 11 shows an example of the differences in the aspect boundaries formed by the two metrics. The difference in costs is also evident in Tables I and II, where the views within the boundaries are in bold. The larger and more stable boundaries of the shock metric allow a greater number of views to be grouped than with curve matching, as is evident from Figure 7.

6. Conclusions

We have presented a method to generate an aspect-graph representation of complex shapes using the dis-similarity between neighboring views to generate aspects and to select prototypes for each.


Figure 11. An illustration of the differences in the aspect boundaries generated using curve-based matching (a) and shock-based matching (b). The graph in each case is the shape distance from a central view to other views, plotted against relative viewing angle. The resulting aspect boundaries are the red and blue lines. The aspect boundaries using the shock metric form larger regions, as evidenced by the last three graphs, where the left and right boundaries hold their positions across several viewpoints, while the aspect upper boundaries located using the curve matching metric are smaller and do not hold their positions across as many viewpoints. This leads to a better clustering of shock-based views and a reduction in the number of shock-based prototypes.

The set of prototypical views obtained from each 3D object is then cast into a hierarchy and constitutes our database. In the course of recognition, the unknown view is compared against the set of prototypes hierarchically. At the coarse levels, those prototypes whose distance to the query is larger than the radius associated with each prototype are pruned. At the finest scale, the comparison uses rank-ordering among the remaining prototypes. The result is that the top three choices in our experiments always picked the correct object prototypes from our database, whose identity can then be further verified by examining specific views within the selected aspect. There is an issue of sampling inherent in any method that operates by partitioning the viewing sphere. Specifically, the frequency of view acquisition (5 degrees in this paper) is a parameter which, in conjunction with the shape complexity of the database objects, determines the overall recognition accuracy. We have chosen 5 degrees as our sampling rate as a minimal increment which allows for timely experiments. The time to create each object's set of aspects based on 5-degree increments is listed in Table VI.

Table VI. Performance of the tasks performed in object recognition. Improvements in shock-matching speed since these experiments were first run indicate two orders of magnitude improvement [39].

Action                  Curve Matching   Shock Matching
Database Construction   6 hours          11 hours
Recognition             20 minutes       45 minutes
Pose Estimation         3 minutes        5 minutes

More examples are available online at http://www.lems.brown.edu/vision/researchAreas/3DRecog/overview.html.

Acknowledgments: Benjamin Kimia acknowledges the support of NSF grant IRI-9700497.

References

1. C. Arcelli, L. Cordella, and G. S. di Baja, editors. Proceedings of the International Workshop on Visual Form, Capri, Italy, May 2001. Springer.
2. R. Bajcsy and F. Solina. Three dimensional object representation revisited. In ICCV87, pages 231-240, 1987.
3. A. Barr. Superquadrics and angle-preserving transformations. IEEE CGA, 1(1):11-23, January 1981.
4. R. Basri and S. Ullman. The alignment of objects with smooth surfaces. In ICCV88, pages 482-488, 1988.
5. G. Bellaire. Feature-based computation of hierarchical aspect-graphs. Machine GRAPHICS and VISION, 2(2):105-122, 1993.
6. P. Besl and R. Jain. Three-dimensional object recognition. ACM Computing Surveys, 17(1):75-145, March 1985.
7. I. Biederman. Recognition by components. Psych. Review, 94:115-147, 1987.
8. K. Bowyer and C. Dyer. Aspect graphs: an introduction and survey of recent results. IJIST, 2:315-328, 1990.
9. J. Burns and L. Kitchen. Recognition in 2D images of 3D objects from large model bases using prediction hierarchies. In IJCAI87, pages 763-766, 1987.
10. S. Carlsson and D. Weinshall. Dual computation of projective shape and camera positions from multiple images. IJCV, 27(3):227-241, May 1998.
11. C. Chien and J. Aggarwal. Model construction and shape recognition from occluding contours. PAMI, 11(4):372-389, April 1989.
12. C. M. Cyr and B. B. Kimia. 3D object recognition using shape similarity-based aspect graph. In ICCV2001 [21], pages 254-261.
13. S. Dickinson, A. Pentland, and A. Rosenfeld. 3D shape recovery using distributed aspect matching. PAMI, 14(2):174-198, February 1992.
14. D. Eggert and K. Bowyer. Computing the perspective projection aspect graph of solids of revolution. PAMI, 15(2):109-128, February 1993.
15. D. Eggert, K. Bowyer, C. Dyer, H. Christensen, and D. Goldgof. The scale space aspect graph. PAMI, 15(11):1114-1130, November 1993.
16. T. Fan, G. Medioni, and R. Nevatia. Recognizing 3D objects using surface descriptions. PAMI, 11(11):1140-1157, November 1989.
17. P. Flynn and A. Jain. BONSAI: 3D object recognition using constrained search. PAMI, 13(10):1066-1075, October 1991.
18. Z. Gigus, J. Canny, and R. Seidel. Efficiently computing and representing aspect graphs of polyhedral objects. PAMI, 13(6):542-551, June 1991.
19. P. L. Halliman, G. G. Gordon, A. L. Yuille, P. Giblin, and D. Mumford. Two- and Three-Dimensional Patterns of the Face. A. K. Peters, 1999.
20. D. Huttenlocher and S. Ullman. Recognizing solid objects by alignment with an image. IJCV, 5(2):195-212, November 1990.
21. Eighth International Conference on Computer Vision, Vancouver, Canada, July 9-12, 2001. IEEE Computer Society Press.
22. K. Ikeuchi and T. Kanade. Automatic generation of object recognition programs. PIEEE, 76(8):1016-1035, August 1988.
23. D. Keren, D. B. Cooper, and J. Subrahmonia. Describing complicated objects by implicit polynomials. PAMI, 16(1):38-53, 1994.
24. M. Kirby and L. Sirovich. Application of the Karhunen-Loeve procedure for the characterization of human faces. PAMI, 1:103-108, 1990.
25. P. Klein, T. Sebastian, and B. Kimia. Shape matching using edit-distance: an implementation. In Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 781-790, Washington, D.C., January 7-9, 2001.
26. J. J. Koenderink and A. J. van Doorn. The singularities of the visual mapping. Biol. Cyber., 24:51-59, 1976.
27. J. J. Koenderink and A. J. van Doorn. The internal representation of solid shape with respect to vision. Biol. Cyber., 32:211-216, 1979.
28. D. Kriegman and J. Ponce. Computing exact aspect graphs of curved objects: Solids of revolution. IJCV, 5(2):119-135, November 1990.
29. F. Leymarie and B. B. Kimia. The shock scaffold for representing 3D shapes. In Arcelli et al. [1], pages 216-228.
30. S. Nayar, S. Nene, and H. Murase. Real-time 100 object recognition system. In Proceedings 1996 IEEE International Conference on Robotics and Automation, pages 2321-2325, 1996.
31. S. Petitjean, J. Ponce, and D. Kriegman. Computing exact aspect graphs of curved objects: algebraic surfaces. IJCV, 9(3):231-255, 1992.
32. A. Pope and D. Lowe. Learning object recognition models from images. In ICCV93, pages 296-301, 1993.
33. E. Rosch. Principles of categorization. In Cognition and Categorization, Erlbaum, Hillsdale, NJ, 1978.
34. W. Seales and C. Dyer. Viewpoint from occluding contour. CVGIP, 55(2):198-211, March 1992.
35. T. Sebastian, P. Klein, and B. Kimia. On aligning curves. PAMI, 25(1):116-125, January 2003.
36. T. B. Sebastian, J. J. Crisco, P. N. Klein, and B. B. Kimia. Constructing 2D curve atlases. In Proceedings of Mathematical Methods in Biomedical Image Analysis, pages 70-77, 2000.
37. T. B. Sebastian and B. B. Kimia. Curves vs skeletons in object recognition. In Proceedings of the IEEE International Conference on Image Processing, pages 22-25, Thessaloniki, Greece, October 2001. IEEE Computer Society Press.
38. T. B. Sebastian, P. N. Klein, and B. B. Kimia. Alignment-based recognition of shape outlines. In Arcelli et al. [1], pages 606-618.
39. T. B. Sebastian, P. N. Klein, and B. B. Kimia. Recognition of shapes by editing shock graphs. In ICCV2001 [21], pages 755-762.
40. T. B. Sebastian, P. N. Klein, and B. B. Kimia. Shock-based indexing into large shape databases. In Seventh European Conference on Computer Vision, pages Part III: 731-746, Copenhagen, Denmark, May 28-31, 2002. Springer Verlag.
41. I. Shimshoni and J. Ponce. Finite-resolution aspect graphs of polyhedral objects. PAMI, 19(4):315-327, April 1997.
42. A. Shokoufandeh, I. Marsic, and S. Dickinson. View-based object recognition using saliency maps. IVC, 17(5/6):445-460, April 1999.
43. T. Sripradisvarakul and R. Jain. Generating aspect graphs for curved objects. In 3DWS89, pages 109-115, 1989.
44. J. Stewman and K. Bowyer. Creating the perspective projection aspect graph of polyhedral objects. Volume 9, pages 494-500, 1988.
45. J. Stone. Object recognition: view-specificity and motion-specificity. Vision Research, 39:4032-4044, 1999.
46. M. Tarr and D. Kriegman. What defines a view? Vision Research, 41:1981-2004, 2001.
47. M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 1990.
48. D. Weinshall and M. Werman. On view likelihood and stability. PAMI, 19(2):97-108, February 1997.
49. D. Weinshall. Model-based invariants for 3-D vision. The International Journal of Computer Vision, 10(1):27-42, 1993.
50. I. Weiss and M. Ray. Model-based recognition of 3D objects from single images. PAMI, 23(2):116-128, February 2001.
51. D. Wilkes and J. Tsotsos. Active object recognition. In CVPR92, pages 136-141, 1993.
52. A. Wong, S. Lu, and M. Rioux. Recognition and shape synthesis of 3D object images based on attributed hypergraphs. PAMI, 11(3):279-290, March 1989.