
OBJECT RECOGNITION USING THREE-DIMENSIONAL INFORMATION

Masaki Oshima and Yoshiaki Shirai

Electrotechnical Laboratory
1-1-4 Umezono, Sakura-mura, Niihari-gun
Ibaraki, 305 Japan

ABSTRACT

This paper describes an approach to the recognition of stacked objects with planar and curved surfaces. The range data of a scene are obtained by a range finder. The system works in two phases. In a learning phase, a scene containing a single object is described in terms of properties of regions and relations between them. This description is stored as an object model. In a recognition phase, an unknown scene is described in the same way as in the learning phase. The description is then matched to the object models so that stacked objects are recognized one by one. Efficient matching is achieved by a combination of data-driven and model-driven search processes. Experimental results for blocks and machine parts are shown.

I INTRODUCTION

This paper describes a method for object recognition using three-dimensional data of a scene. Our aim is to deal with scenes which have the following features:

(1) Objects are placed in any 3-D position with any orientation.
(2) Objects may be stacked in a scene.
(3) Objects have planar and/or smoothly curved surfaces.

Recognition of real objects has been studied with the aim of realizing automatic inspection, assembly, and so on. For this purpose, the recognition method should be flexible and efficient enough to analyze scenes with stacked objects and recognize them. It is well known that even recognition of a simple object is not easy if the object is allowed to rotate in 3-D space. The shape identification methods using a monocular gray picture [1, 2] assumed many constraints on input scenes. Recognition of scenes with stacked objects was studied for simple scenes [3]. If range data are available, more flexible recognition can be achieved because the 3-D shapes of objects are obtained directly.

In order to analyze a scene with multiple objects occluding one another, it is often necessary to make a scene description which includes useful information for recognition. There have been several studies on scene description using range data [4-12]. Some methods assumed planarity of objects [4, 10, 12]. Sugihara [9] proposed a method for trihedral objects. Agin [5], and Nevatia and Binford [6] described scenes including curved objects with generalized cylinders. Although this method is advantageous in describing bodies which possess elongation, considerable difficulties may arise for other kinds of objects. We have developed a method which describes a scene with planar and smoothly curved surfaces [8]. The method is applicable to general scenes with real objects.

Recognition of objects usually requires a pattern matching process: matching a part of a scene description to a part of an object model. The control strategy for matching is important in processing a complex scene. Blind search is inefficient in circumstances where objects are stacked and their positions and orientations are not known a priori. Our system first tries to get reliable and useful features of a scene and to match them to models, so that the result may guide further processing. Thus the system realizes flexible and efficient recognition.


II OUTLINE

The system works in two phases, learning and recognition, as shown in Fig. 1. In the learning phase, known objects are shown to the system one by one. The system makes a description of a scene in terms of properties of regions (surfaces) and their relations. The description is stored as a model of an object. If one view is not enough to build a model of an object, several typical views are shown. In the recognition phase, the system makes a description of an unknown scene in the same way as in the learning phase. The system selects, among the unknown regions, those which seem to be most reliable and useful for recognition. A part of a scene consisting of these regions is called a kernel (see Fig. 2). Then a model is selected which includes regions corresponding to the kernel. Once a candidate model is chosen, regions neighboring the kernel are searched for by a model-driven matching process. When this process terminates, the system decides whether the candidate model is really found or not. This process is repeated until all regions in a scene are processed. Further details of description and matching are given in the following sections.

Our range finder employs a vertical slit projector and a TV camera to pick up the reflected light. By rotating the projector from left to right, many points in the field of view are obtained. The three-dimensional coordinates of the points are calculated by triangulation (Fig. 3(a)). The processing [8] proceeds as follows:

(1) Group the points into small surface elements and, assuming each element to be a plane, get the equations of the surface elements (Fig. 3(b)).
(2) Merge the surface elements together into regions (elementary regions, Fig. 3(c)).
(3) Classify the elementary regions into planar and curved ones (Fig. 3(d)).
(4) Try to extend the curved regions by merging adjacent curved regions to produce larger regions (global regions), and fit quadratic surfaces to them (Fig. 3(e)).
(5) Describe the scene in terms of properties of regions and relations between regions (Fig. 3(f)).
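The triangulation step can be illustrated with a small sketch. The paper gives no calibration model, so the geometry below is an assumption: a pinhole camera at the origin looking along +z (focal length f in pixels, principal point (cx, cy)), and a slit projector rotating about a vertical axis through (baseline, 0, 0) so that, at rotation angle theta, the light plane has normal (cos theta, 0, sin theta). Each 3-D point is then the intersection of the viewing ray through the lit pixel with that light plane. The function name and all parameters are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def triangulate_slit_point(u: float, v: float, theta: float,
                           f: float, cx: float, cy: float,
                           baseline: float) -> Vec3:
    """Intersect the camera ray through pixel (u, v) with the slit-light plane.

    Assumed (hypothetical) geometry, not taken from the paper:
    - pinhole camera at the origin looking along +z;
    - projector rotating about a vertical axis through (baseline, 0, 0);
    - at angle theta the light plane has normal (cos theta, 0, sin theta).
    """
    # Direction of the viewing ray through the pixel (z component fixed to 1).
    r = ((u - cx) / f, (v - cy) / f, 1.0)
    n = (math.cos(theta), 0.0, math.sin(theta))
    denom = n[0] * r[0] + n[1] * r[1] + n[2] * r[2]
    if abs(denom) < 1e-12:
        raise ValueError("viewing ray is parallel to the light plane")
    # The plane contains the axis point B = (baseline, 0, 0), so
    # n . (t * r - B) = 0 gives t = (n . B) / (n . r).
    t = (n[0] * baseline) / denom
    return (t * r[0], t * r[1], t * r[2])
```

For example, with theta = pi/4 the light plane is x + z = baseline, and the pixel at the principal point maps to a point on the optical axis at depth z = baseline. In a real system theta would come from the projector's rotation encoder and f, cx, cy from camera calibration.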

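Steps (1) to (3) can be sketched as follows. This is a minimal illustration under our own assumptions, not the procedure of [8]: each surface element is fitted with a plane z = a*x + b*y + c by least squares (which presumes the element is not seen edge-on), adjacent elements are treated as mergeable when their normals differ by less than a hypothetical angle threshold, and a region is called planar when a single plane fits all its points with an RMS residual below a hypothetical tolerance `tol`. All names and thresholds are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Coeffs = Tuple[float, float, float]


def _solve3(m: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [list(row) + [b] for row, b in zip(m, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("points are degenerate (e.g. collinear)")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        tail = sum(a[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (a[r][3] - tail) / a[r][r]
    return x


def fit_plane(points: Sequence[Point]) -> Tuple[Coeffs, float]:
    """Step (1): least-squares plane z = a*x + b*y + c and its RMS residual."""
    n = float(len(points))
    sxx = sum(x * x for x, _, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    sz = sum(z for _, _, z in points)
    # Normal equations of the linear least-squares problem.
    a, b, c = _solve3([[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]],
                      [sxz, syz, sz])
    sq = sum((z - (a * x + b * y + c)) ** 2 for x, y, z in points)
    return (a, b, c), math.sqrt(sq / n)


def unit_normal(coeffs: Coeffs) -> Point:
    """Unit normal of z = a*x + b*y + c, oriented toward a camera looking along +z."""
    a, b, _ = coeffs
    norm = math.sqrt(a * a + b * b + 1.0)
    return (a / norm, b / norm, -1.0 / norm)


def similar_orientation(n1: Point, n2: Point, max_angle_deg: float = 10.0) -> bool:
    """Step (2) sketch: adjacent elements may be merged if their normals nearly agree."""
    dot = max(-1.0, min(1.0, sum(p * q for p, q in zip(n1, n2))))
    return math.degrees(math.acos(dot)) <= max_angle_deg


def classify_region(points: Sequence[Point], tol: float = 0.5) -> str:
    """Step (3) sketch: planar if one plane explains the region within tol."""
    _, rms = fit_plane(points)
    return "planar" if rms <= tol else "curved"
```

With this design the same plane fit serves both the per-element equations of step (1) and the planarity test of step (3); `tol` is in the units of the range data, so it would have to be tuned to the sensor noise.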

Figure 4 shows an example of a slit image of a scene to be processed, which includes a dodecahedron, an icosahedron, and a cylinder. In Fig. 5, the elementary regions of the scene are shown.

Fig. 5. Elementary regions

Properties of regions and relations between regions are used for the description. The properties of a region consist of:
(a) type of the surface fitted to the region (planar or curved, and the type of quadratic surface for a curved region),
(b) equation of the surface in 3-D space,
(c) 2-D properties of the region (area, perimeter, compactness (4π·area/perimeter²), mean radius, standard deviation of radii, minimum radius, maximum radius, etc.),
(d) three-dimensional centroid of the region,
(e) number of adjacent regions.

The relations between regions consist of:
(a) adjacency,
(b) type of intersection (convex or concave),
(c) angle between regions (for non-planar regions, planes are fitted to them),
(d) relative positions of the centroids.

These properties and relations are used in the matching process described in the following section.

IV MATCHING

A. Selecting a kernel and assuming the most probable model

The system first selects the most promising region. The criterion is based on the type of the region, the area of the region, and the number of adjacent regions, in the following manner. We consider planar regions to be more reliable and useful than curved ones because the relative directions of the planar surfaces of an object are consistent under rotation or partial occlusion. Regions with larger area are also better because the properties of larger regions are less sensitive to noise. Regions with many neighbors are also more useful because many relations can be used in matching. For each region S_i, the following function is calculated. The region S* which maximizes the function is selected as a part of the kernel:

(1)

A(S_i) and N_a(S_i) are the area of region S_i and the number of its adjacent regions, respectively. λ1, λ2, and λ3 are weights.

The model to be selected must have a region that corresponds to S*. Suppose the region S* has no neighboring regions. Then the kernel contains only one region. Let M_ij denote the i-th region of the j-th model. The dissimilarity between S* and M_ij is evaluated by the differences of their properties defined in section 3. Let p_l denote the l-th property of a region and D denote an operator which evaluates the difference between property values. The system calculates the following function:

(2)

The system selects a model which minimizes f_2 if the value is small enough. If the value is not small, the assumption fails.

If the region S* has neighboring regions, the evaluation function should contain the relations between regions. Among the neighboring regions, a region S' is picked out which maximizes Eq. (1). Now the kernel consists of S* and S'. Let M' denote a neighbor of M_ij, and q_m denote the m-th relation between regions (the relations are defined in section 3). The system calculates the following function:

(3)

The system selects a model which minimizes the function of Eq. (3) if the value is small enough.

B. Verification of assumption

The system verifies an assumed model by matching the regions around the kernel to those of the model. Since some regions of the model may not be seen in the scene, regions in the scene are picked up one by one and matched to those of the model object. Starting from a kernel, the system tries to establish a correspondence between scene regions and model regions and to find all the regions of the object in the scene. At each step, the system selects a new region S_n among those which are adjacent to known regions (that is, the regions which have been selected and matched in the earlier steps). Again, Eq. (1) is used for the selection of a new region.

Whenever S_n is selected from the scene, the corresponding region is searched for in the model. The search is based on a dissimilarity function which evaluates the dissimilarity of properties and that of relations between S_n and a supposed model region M_ui (see Fig. 6). The function compares the properties p_l(S_n) of the region with those of the region in the model, p_l(M_ui). The function also compares the relations q_m(S_n, S_i) between S_n and the known regions S_i of the same object with those, q_m(M_ui, M_i), of the corresponding regions M_ui and M_i in the model. Suppose the number of already known regions is K. The dissimilarity of regions S_n and M_ui is defined as follows:

where P_l and T_m denote weights. Note that the areas are used as weights so that the more reliable regions contribute more to f_3.

The region with minimum f_3 is chosen. If the value is small enough, the match is considered to be acceptable. Generally, if a surface of an object is not fully seen, the region may not pass the test and may remain unknown. However, if a plane is partially occluded by other surfaces, the type and the direction of the region are not affected by the occlusion. Such a planar region is therefore determined to match a model region if the dissimilarity of the corresponding relative directions is sufficiently small, the 3-D position correspondence is good enough, and the area of the region is less than that of the model region.

When the matching process for a model terminates, the system checks whether the assumption is acceptable or not. If a large enough portion of the model is found, the system concludes that it has found an object identical to the assumed model. The position and orientation of the object are calculated from the correspondence of the regions.

V EXPERIMENTAL RESULTS

Experiments are made to recognize two kinds of scenes: one with blocks and one with machine parts. In the first experiment, ten kinds of objects with planar and/or quadratic curved surfaces (shown in Fig. 7) are used as object models. Figure 8 illustrates an example of a description of a model. The result of recognition (which corresponds to Fig. 4) is shown in Fig. 9. The first two letters in a region indicate the model object, and the letter in parentheses indicates the corresponding region in the model.


In the second experiment, machine parts (pulley, liner, piston, and conrod of a car) are used as object models. Figure 10 shows an example of an input image. Figure 11 shows the elementary regions. In Fig. 12, the result of recognition is shown. Experiments on several similar scenes have so far been satisfactory. The processing time for a typical scene such as that shown in Fig. 10 is about 3 min. for description and 1 min. for matching.


VI CONCLUSION

We have proposed a system to recognize stacked objects using range data. The system describes a scene in terms of planes and smoothly curved surfaces. Models of objects are built in the system by showing them one by one. Objects are recognized by matching the description of an input scene to those of models. The matching program picks up the regions which are most reliable and useful for recognition and matches them to those of a model object. Once a candidate model is chosen, the rest of the scene regions are searched for under the guidance of the model. Thus the system has realized flexible and efficient recognition. The results of the experiments show that this scheme is promising.

ACKNOWLEDGEMENTS

We would like to thank the members of the computer vision section at the Electrotechnical Laboratory for their helpful discussions.

REFERENCES

[1] Barrow, H. and Popplestone, R. "Relational descriptions in picture processing." Machine Intelligence 6 (1971) 377-396.
[2] Yachida, M. and Tsuji, S. "A versatile machine vision system for complex industrial parts." Trans. IEEE C-26:9 (1977) 882-894.
[3] Tsuji, S. and Nakamura, A. "Recognition of an object in a stack of industrial parts." In Proc. IJCAI-75. 1975, 811-818.
[4] Shirai, Y. "Recognition of polyhedrons with a range finder." Pattern Recognition 4 (1972) 243-250.
[5] Agin, G. "Representation and description of curved objects." AIM-173, Stanford University, 1972.
[6] Nevatia, R. and Binford, T. O. "Structured descriptions of complex objects." In Proc. IJCAI-73. 1973, 641-647.
[7] Popplestone, R. J. et al. "Forming models of plane-and-cylinder faceted bodies from light stripes." In Proc. IJCAI-75. 1975, 664-668.
[8] Oshima, M. and Shirai, Y. "A scene description method using three-dimensional information." Pattern Recognition 11 (1979) 9-17.
[9] Sugihara, K. "Range-data analysis guided by a junction dictionary." Artificial Intelligence 12 (1979) 41-69.
[10] Duda, R. O. et al. "Use of range and reflectance data to find planar surface regions." Trans. IEEE PAMI-1:3 (1979) 259-271.
[11] Gennery, D. B. "Object detection and measurement using stereo vision." In Proc. IJCAI-79. 1979, 320-327.
[12] Milgram, D. L. and Bjorklund, C. M. "Range image processing: planar surface extraction." In Proc. IJCPR-80. 1980, 912-919.
