ROBUST RECOGNITION OF BUILDINGS IN COMPRESSED LARGE AERIAL SCENES

Robert Azencott, François Durbin and José Paumard

École Normale Supérieure, DIAM-CMLA, F-94230 Cachan
CEA, Centre de Bruyères-le-Châtel, BP 12, F-91680 Bruyères-le-Châtel
E-mail: [email protected]

(This work has also been supported by CISI, 3 rue Le Corbusier, SILIC 232, F-94528 Rungis Cedex.)

ABSTRACT
This paper shows how it is possible to recognize and localize objects in compressed images. The compression method we choose is based on the extraction of the quincunx multiscale edges. The edges of the object and of the scene are both computed, and then matched using the Censored Hausdorff Distance, obtained by a double truncation of the classical Hausdorff distance. The localization is based on a coarse-to-fine method. Robustness to noise and to possible occlusions of the objects is shown. The algorithm is fast on a workstation, and we have implemented it on a massively parallel computer, demonstrating real-time feasibility.

1. INTRODUCTION

The images we work on are large aerial images compressed with multiscale edge methods. We want to recognize and localize buildings extracted from these images, placed on a black background and compressed with the same algorithm. The scenes (size 3500x3500) are 15 times larger than the images of the "buildings" (256x256). The localization process has to be fast, as accurate as possible and robust, and decompression of the image of the scene must be avoided. We begin with a few remarks on quincunx multiresolution analysis, then give the definition of the Censored Hausdorff Distance (CHD) and describe the multiresolution research. We also explain how to deal with occlusions, and give the performance of the algorithm on a sequential and on a massively parallel SIMD computer.
2. QUINCUNX MULTISCALE EDGES

It has been proven by S. Mallat et al. [2] that the multiscale edges (ME) of an image can be used to compress it efficiently. Let I_j be an image of size N^2 pixels, given at an integer resolution j. We call I_{j+1/2} the smoothing of I_j by the quincunx wavelet transform (QWT) [1], and J_{j+1/2} the details available in I_j but missing in I_{j+1/2}, also computed by the QWT. One can prove that I_{j+1/2} and J_{j+1/2} are a complete representation of the image I_j. Both I_{j+1/2} and J_{j+1/2} are of size N^2/2 pixels. By repeating this process on I_{j+1/2}, we obtain smoother and smoother images that are used to compute the multiscale edges. To extract the multiscale edges of an image I_0, we first smooth I_0 to obtain the images I_{1/2}, I_1, I_{3/2}, I_2, I_{5/2} and I_3. Then we apply the Shen-Castan edge detector [3] to these images to obtain the quincunx multiscale edges (Figure 2).
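As an illustration of this pyramid, here is a minimal Python sketch of the smoothing-and-edge-extraction chain. It does not implement the actual quincunx wavelet filters of [1] nor the Shen-Castan detector of [3]: the low-pass kernel and the gradient-magnitude edge test are simple stand-ins, and all function names are our own.

```python
import numpy as np
from scipy import ndimage

def quincunx_pyramid(img, levels=6):
    """Build the sequence of smoothed images I_{1/2}, I_1, ..., I_3.

    Stand-in for the quincunx wavelet transform of [1]: each half-step
    smooths with a small low-pass kernel; a real QWT would also keep only
    the quincunx (checkerboard) sub-lattice, halving the pixel count.
    """
    # 4-neighbour low-pass kernel, a common choice on the quincunx lattice.
    kernel = np.array([[0, 1, 0],
                       [1, 4, 1],
                       [0, 1, 0]], dtype=float) / 8.0
    pyramid = []
    current = img.astype(float)
    for _ in range(levels):
        current = ndimage.convolve(current, kernel, mode="nearest")
        pyramid.append(current)
    return pyramid

def edge_points(img, threshold=20.0):
    """Binary edge map via gradient-magnitude thresholding.

    Placeholder for the Shen-Castan detector [3] used in the paper.
    Returns an (n, 2) array of (row, col) edge coordinates.
    """
    gy, gx = np.gradient(img)
    magnitude = np.hypot(gx, gy)
    return np.argwhere(magnitude > threshold)

def multiscale_edges(img):
    """Quincunx multiscale edges: one edge point set per smoothed level."""
    return [edge_points(level) for level in quincunx_pyramid(img)]
```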
3. THE CENSORED HAUSDORFF DISTANCE

The Hausdorff distance (H) has already been used by D. Huttenlocher [4] to compare binary images. If A and B are two sets of points and d a distance over the plane, H(A, B) is defined by

    H(A, B) = max ( h(A, B), h(B, A) ),   with   h(A, B) = max_{a ∈ A} min_{b ∈ B} d(a, b).

In the following, we call h(A, B) the measure of inclusion of A in B, and h(B, A) the measure of the reciprocal inclusion of B in A. We showed in [5] that, due to a serious lack of robustness, this distance is not usable in its standard form. We thus introduced a new definition:
    D_p(a, B) = Q_p { d(a, b) | b ∈ B }                              (4)

    h_{p,q}(A, B) = Q_{|B|-q} { D_p(a, B) | a ∈ A }                  (5)

    H_{p,q}(A, B) = max ( h_{p,q}(A, B), h_{p,q}(B, A) )             (6)

where Q_p(A) is the p-th smallest element of A and |A| is the cardinal of A. The values of p and q are usually fixed percentages of |A| and |B|: p ≈ 1% |A| and q ≈ 10% |B|. With such a definition, an object A stays close to a noisy image of itself, while two different objects A and B stay away from each other. We call this distance the censored Hausdorff distance (CHD), noted H_c.
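A brute-force sketch of H_c, written directly from equations (4)-(6), may make the two truncations concrete. The quantile indices are clipped to valid ranks and the percentage defaults simply follow the values quoted above; for large edge maps one would replace the pairwise distance matrix by a distance transform.

```python
import numpy as np

def directed_chd(A, B, p_frac=0.01, q_frac=0.10):
    """Censored inclusion measure h_{p,q}(A, B) of equations (4)-(5).

    A, B: (n, 2) arrays of edge point coordinates.
    D_p(a, B) is the p-th smallest distance from a to B (eq. 4), so a few
    missing points of B are ignored; the outer quantile (eq. 5) then censors
    the worst-matched points of A (noise, occlusions).
    """
    diff = A[:, None, :].astype(float) - B[None, :, :].astype(float)
    dist = np.sqrt((diff ** 2).sum(axis=2))           # |A| x |B| distances

    p = int(np.clip(p_frac * len(A), 1, len(B)))      # paper suggests p ~ 1% |A|
    q = int(np.clip(q_frac * len(B), 1, len(B) - 1))  # and q ~ 10% |B|
    D_p = np.sort(dist, axis=1)[:, p - 1]             # p-th smallest distance per a
    rank = int(np.clip(len(B) - q, 1, len(A)))        # Q_{|B|-q} of eq. (5)
    return np.sort(D_p)[rank - 1]

def censored_hausdorff(A, B):
    """Symmetric censored Hausdorff distance H_c (equation (6))."""
    return max(directed_chd(A, B), directed_chd(B, A))
```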
4. MULTISCALE RESEARCH

Let S be the image of a scene and B the image of an object we want to localize in S. Let B(M) be the image B placed over S at the point M, and let K_S(M) be the image S masked by the envelope of B centered on M. h_c(B(M), S) measures the inclusion of B placed over S at the point M, and h_c(K_S(M), B) measures the reciprocal inclusion of S masked by the envelope of B centered on M. These two inclusion measures give us a good evaluation of the plausibility of the presence of B in S at point M. The term matching level will now refer to the pair (h_c(B(M), S), h_c(K_S(M), B)). Let λ_j be a fixed threshold at resolution j. If the matching level at M is above λ_j, then M is rejected as a candidate for the localization of B in S.

4.1. Multiview representation of an object

In order to deal with possible occlusions, we choose to represent a single object by a family of images, each representing this object under different occlusions. Images of this object seen from different camera positions can also be added to this family. Each of these images is then represented by a pyramid of edge images, as many as we want at each resolution. Figure 1 shows an example of the representation of an object; it shows how many edge images are needed to represent these four images, resolution by resolution. For example, two such images are needed to represent image c at resolution 3/2, and four to represent image a at resolution 1/2. Every edge image is thus associated with a specific image, representing a certain aspect of the object.

Figure 1: Representation of an object with 4 images (a, b, c and d): 2 from the "north" camera position and 2 from the "west".
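The matching level itself can be sketched as follows, reusing directed_chd from the previous sketch. The rectangular envelope, the function names and the scalar acceptance test are assumptions made for the illustration; the paper keeps the pair of measures and thresholds it per resolution.

```python
import numpy as np

def matching_level(scene_edges, object_edges, M, envelope, threshold):
    """Matching level of object B placed at scene point M (Section 4).

    scene_edges: (n, 2) edge coordinates of the scene S;
    object_edges: (m, 2) edge coordinates of B, relative to its origin;
    M: candidate position (row, col); envelope: (height, width) of B.
    Returns the pair (h_c(B(M), S), h_c(K_S(M), B)) and an accept/reject flag.
    """
    B_M = object_edges + np.asarray(M)                   # edges of B placed at M
    h, w = envelope
    inside = ((scene_edges[:, 0] >= M[0]) & (scene_edges[:, 0] < M[0] + h) &
              (scene_edges[:, 1] >= M[1]) & (scene_edges[:, 1] < M[1] + w))
    K_S = scene_edges[inside]                            # scene edges under the envelope of B
    if len(K_S) == 0 or len(B_M) == 0:
        return (np.inf, np.inf), False

    level = (directed_chd(B_M, scene_edges),             # inclusion of B(M) in S
             directed_chd(K_S, B_M))                     # reciprocal inclusion
    return level, max(level) <= threshold                # reject M if above lambda_j
```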
4.2. Comparison of multiscale edges

The multiscale research is then handled as follows:

Step 1: initialization at the crudest resolution 3. The matching level field at resolution 3 is computed first. Sets of points Γ_λ with identical matching level λ below λ_3 are then constructed. Each set is successively submitted to further validation at finer resolutions. For each point of this field, we record which representation of the object has the best matching level.

Step 2: processing at resolution j. At resolution j we inherit from the previous step a set of points for which the matching level is the same at all the coarser resolutions. For each M in Γ_λ, we compute the matching level at resolution j for the edge images that belong to the same representation as the one selected at the previous resolution. This step selects the best candidates among Γ_λ and subdivides it into finer subsets. These finer subsets are successively submitted to validation at the next finer resolution. If these subsets are empty, we go back to the coarser resolution and restart the validation on the next best set Γ_λ'.

Step 3: final validation at the finest resolution 1/2. The validation of the set we get at the finest resolution 1/2 is the same as in Step 2. As it is the finest resolution, the final sets we get are the solution of the localization of the object B in the scene S. To discriminate between these sets, we consider only the matching levels at resolution 1/2, as it is the closest to the original image. The research ends when a set has been validated through all the resolutions. We can also wait for a few sets to be found and select the best one, thus improving the accuracy of the localization. If we have exhausted the points selected at the crudest resolution 3 without finding a solution, there are two options: we can either decide that the localization fails (i.e. the object B is not found in the scene S) or, if the "building" B is small, initialize a new localization beginning at the finer resolution 2.
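A schematic version of this coarse-to-fine search, simplified to a single representation of the object and the scalar test of the previous sketch, could look as follows; the candidate grid step, the dictionary layout and the way coordinates are rescaled between resolutions are all assumptions.

```python
def coarse_to_fine_search(scene_edges, scene_shapes, object_edges,
                          envelopes, thresholds, step=4):
    """Simplified sketch of the multiscale research (single representation).

    scene_edges / object_edges: dicts {resolution: (n, 2) edge arrays};
    scene_shapes: dict {resolution: (rows, cols)} of the smoothed scene images;
    envelopes / thresholds: object size and lambda_j per resolution.
    Returns surviving positions, expressed at the finest resolution.
    """
    resolutions = sorted(scene_edges, reverse=True)        # e.g. 3, 5/2, ..., 1/2
    coarse = resolutions[0]
    rows, cols = scene_shapes[coarse]
    # Step 1: matching-level field at the crudest resolution.
    candidates = [(r, c)
                  for r in range(0, rows, step) for c in range(0, cols, step)
                  if matching_level(scene_edges[coarse], object_edges[coarse],
                                    (r, c), envelopes[coarse], thresholds[coarse])[1]]
    # Steps 2 and 3: re-validate the survivors at each finer resolution.
    for res in resolutions[1:]:
        zoom = scene_shapes[res][0] / rows                 # linear scale between levels
        survivors = [(r, c) for (r, c) in candidates
                     if matching_level(scene_edges[res], object_edges[res],
                                       (int(r * zoom), int(c * zoom)),
                                       envelopes[res], thresholds[res])[1]]
        if not survivors:
            return []                                      # localization fails
        candidates = survivors
    zoom = scene_shapes[resolutions[-1]][0] / rows
    return [(int(r * zoom), int(c * zoom)) for (r, c) in candidates]
```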
5. RESULTS

5.1. Results - parallelisation
We built a database of 40 images of "buildings", extracted from a 3520x3520 pixel scene. Each "building" is 256x256 pixels (Figure 3). The computing times given here include the whole research, from Step 1 to Step 3; they include neither the computation of the QWT nor the edge extraction. The sequential version of the multiscale research has been implemented on an HP 735-125 workstation. 50% of our "buildings" were found with an accuracy of a few pixels in less than 15 s, and only 5% needed more than 45 s; this gives an average computing time of about 25 s. We also tested the algorithm on smaller scenes (of size 1000x1000 pixels) with objects of size 100x100; the average computing time was then only a few seconds. The parallel version has been implemented on a massively parallel computer, SYMPHONIE, an on-board SIMD machine composed of a ring of 1024 processors developed by the CEA/LETI. We implemented the computation of the inclusion field at resolution 3 (Step 1), which takes between 2/3 and 9/10 of the total computing time of the sequential version. The field at this resolution is 440x440 pixels, and the object 32x32. The computing time of this inclusion field was less than 300 ms. For a smaller scene of size 1000x1000 (thus 125x125 at resolution 3), the computing time drops below 30 ms. Although each processor of SYMPHONIE is clocked at only 12 MHz, against 125 MHz for the PA-RISC of the HP workstation, the computing time is improved by a factor of about 50.
5.2. Robustness to noise and luminance changes

The images at resolution 3 are shrunk by a factor 8 compared with the original images. This gives an inherent robustness to small geometric distortions. Two types of non-linear transformations of the luminance of the scene have been studied. The first one shrinks the range of the luminance: intensities between 0 and N are set to 0, and intensities between (255 - N) and 255 are set to 255. The second one modifies the dynamics of the luminance by ±D%. We have also applied Gaussian noise with standard deviation σ. We studied values of N up to N_max = 25, of D up to D_max = 10% and of σ up to σ_max = 30 (25% of the average grey level of our images). We then tried to localize the same "buildings" in these altered scenes. Up to N = N_max/2 and D = ±D_max/2, all the "buildings" were correctly localized. At σ = 15, only two of them (5%) were lost. The recognition rate decreased significantly for D < -D_max/2 or D > D_max, for N > N_max and for σ > 20: for these values, between 20% and 30% of the "buildings" were lost. We observed that well-detailed and well-contrasted "buildings" resisted stronger noise better than the others.
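For reference, the perturbation protocol above can be reproduced with a few lines of Python; the linear stretch about the mean used for the dynamics change is our own reading of "modifies the dynamics by ±D%".

```python
import numpy as np

def perturb_scene(img, N=0, D=0.0, sigma=0.0, rng=None):
    """Apply the Section 5.2 perturbations to a grey-level scene.

    N: intensities in [0, N] are set to 0 and those in [255 - N, 255] to 255;
    D: relative change of the luminance dynamics (e.g. +0.10 for +10%),
       modelled here as a linear stretch about the mean (an assumption);
    sigma: standard deviation of the additive Gaussian noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = img.astype(float)
    out[out <= N] = 0.0
    out[out >= 255 - N] = 255.0
    out = out.mean() + (1.0 + D) * (out - out.mean())   # dynamics change
    out = out + rng.normal(0.0, sigma, size=out.shape)  # Gaussian noise
    return np.clip(out, 0, 255)
```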
6. CONCLUSION

We have presented a new object recognition and localization algorithm that works directly on wavelet-compressed images, without reconstructing the scenes. We have shown that this algorithm is robust to noise: the recognition rate breaks down only under very strong perturbations. The computing time is short on an HP 735-125 workstation: an average of 25 s was observed to localize objects of size 256x256 in aerial scenes of size 3500x3500. The implementation on a parallel computer proved that real time is reachable for scenes of size 1000x1000. Moreover, the algorithm does not need more memory than that required to store the image of the compressed scene, so that an implementation on on-board, dedicated massively parallel hardware looks quite promising.

7. REFERENCES

[1] J.-C. Feauveau. "Analyse multirésolution par ondelettes non orthogonales et bancs de filtres numériques". Ph.D. Dissertation, University of Paris-Sud, France, January 1990.

[2] S. Mallat, S. Zhong. "Characterization of signals from multiscale edges". IEEE Transactions on PAMI, vol. 14, no. 7, 1992.

[3] J. Shen, S. Castan. "An Optimal Linear Operator for Edge Detection". Proceedings of CVPR'86, Miami, 1986.

[4] D. P. Huttenlocher, G. A. Klanderman, W. J. Rucklidge. "Comparing images using the Hausdorff distance". IEEE Transactions on PAMI, vol. 15, no. 9, September 1993.

[5] R. Azencott, F. Durbin, J. Paumard. "Multiscale identification of buildings in compressed large aerial scenes". Proceedings of ICPR'96, Vienna, 1996.

[6] José Paumard. Ph.D. Dissertation, ENS Cachan, to be completed in 1996.
Figure 2: Smoothed images of a database "building", at resolutions 1/2, 1, 3/2, 2, 5/2 and 3
Figure 3: Test scene and 4 typical “buildings”