IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 37, NO. 5, SEPTEMBER 1999


A Feature-Based Image Registration Algorithm Using Improved Chain-Code Representation Combined with Invariant Moments

Xiaolong Dai, Member, IEEE, and Siamak Khorram, Member, IEEE

Abstract—In this paper, a new feature-based approach to automated image-to-image registration is presented. The characteristic of this approach is that it combines an invariant-moment shape descriptor with improved chain-code matching to establish correspondences between the potentially matched regions detected from the two images. It is robust in that it overcomes the difficulties of control-point correspondence by matching the images both in the feature space, using the principle of the minimum distance classifier (based on the combined criteria), and sequentially in the image space, using the root mean-square error (RMSE) rule. In image segmentation, the performance of the Laplacian of Gaussian operator is improved by introducing a new algorithm called thin and robust zero crossing. After the detected edge points are refined and sorted, regions are defined. Region correspondence is then performed by an image-matching algorithm developed in this research. The centers of gravity are then extracted from the matched regions and are used as control points. Transformation parameters are estimated based on the final matched control-point pairs. The algorithm proposed is automated, robust, and of significant value in an operational context. Experimental results using multitemporal Landsat TM imagery are presented.

Index Terms—Automated, feature extraction, image matching, image registration, satellite imagery.

I. INTRODUCTION

IMAGE registration is the process of spatially aligning two images so that corresponding pixels in the two images represent the same physical region of the scene being imaged. Image registration is an inevitable problem arising in many image-processing applications whenever two or more images of the same scene have to be compared pixel by pixel. These applications include the following.
Multisource Data Fusion: Fusion of multisource data usually requires establishing pixel-to-pixel correspondence [1].
Time-Varying Image Analysis for Changes: Change analysis requires reliable and accurate multitemporal image registration. Most existing change-detection algorithms depend considerably on reasonably accurate image registration to be workable [2].

Manuscript received March 25, 1998; revised December 11, 1998. This work was supported in part by the Coastal Change Analysis Program (C-CAP) of the National Oceanic and Atmospheric Administration (NOAA), funded through the North Carolina Sea Grant Program and by SGI/Cray Research, Inc., St. Louis, MO. The authors are with the Center for Earth Observation, North Carolina State University, Raleigh, NC 27695-7106 USA. Publisher Item Identifier S 0196-2892(99)06150-1.

Image Mosaicking: In many regional and/or global remote sensing applications involving multiple scenes, a single coverage is needed, and individual satellite scenes must be mosaicked for further processing [3].
Motion Detection and Object Recognition: Image registration also is a required process in other examples such as virtual reality construction, stereo matching and generation, image composite generation, and object height estimation [4].
Existing image registration techniques fall into two broad categories: manual registration and automated registration. Manual registration techniques commonly have been used in practical applications. In these methods, ground control points (GCP's) are located visually on both the sensed image and the reference image by choosing special and easily recognizable points such as road intersections. By doing so, the steps of feature identification and matching are done simultaneously by human operators. In order to get reasonably good registration results, a large number of control points must be selected across the whole image [4]. This is very tedious, labor-intensive, and repetitive work, especially when multiple wide-area scenes, such as thematic mapper (TM) scenes (over 6000 × 6000 pixels per scene), are involved in a regional and/or global change-detection implementation. These techniques also are subject to the problems of inconsistency, limited accuracy, and, in many instances, lack of availability of reference data for locating GCP's. Therefore, there is a critical need to develop an automated technique that requires little or no operator supervision to coregister multitemporal and/or multisensor images when higher accuracy is desired and image analysis is subject to time constraints [5]. The current automated registration techniques can be classified into two broad categories: area-based and feature-based techniques.
In the area-based algorithms, a small window of points in the sensed image is compared statistically with windows of the same size in the reference image [6]. Window correspondence is based on the similarity measure between two given windows. The measure of similarity is usually the normalized cross correlation. Other useful similarity measures are the correlation coefficient and the sequential-similarity detection [7]. Area-based techniques can be implemented by the Fourier transform using the fast Fourier transform (FFT) [8]. The centers of the matched windows then are used as control points to solve for the transformation parameters between the two images. Although a novel approach has been proposed by [9] to register images with large misalignment by

0196–2892/99$10.00  1999 IEEE

Authorized licensed use limited to: XIDIAN UNIVERSITY. Downloaded on December 26, 2008 at 01:07 from IEEE Xplore. Restrictions apply.


correcting the image rotation first and then establishing correspondences, a majority of the area-based methods have the limitation of registering only images with small misalignment, and therefore, the images must be roughly aligned with each other initially. The correlation measures become unreliable when the images have multiple modalities and the gray-level characteristics vary (e.g., TM and synthetic aperture radar (SAR) data). In contrast, the feature-based methods are more robust and more suitable in these cases. There are two critical procedures generally involved in the feature-based techniques: feature extraction and feature correspondence. A variety of image-segmentation techniques have been used for extraction of edge and boundary features, such as the Canny operator in [10] and [11], the Laplacian of Gaussian (LoG) operator in [12] and [13], the thresholding technique in [14], the classification method in [15], and the region growing in [16]. Images are represented by a set of features after extraction, either in the spatial domain [17], [18] or in the transform domain [20]–[22]. Spatial features usually include edges, boundaries, road intersections, and special structures. In the transform domain, a set of transform coefficients represents the images. Feature representations that are invariant to scaling, rotation, and translation are desirable in the matching process. The general feature representations are chain codes, moment invariants, Fourier descriptors, shape signatures, centroidal profiles, and shape matrices [23]. Centers of gravity of closed boundaries generally have been used as control points [14], [16], [17]. Other points, such as salient points in [6] and locations of maximum curvature in [12], also have been used as control points. Feature correspondence is performed based on the characteristics of the features detected.
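For comparison with the feature-based approach developed in this paper, the normalized cross correlation on which area-based methods rely can be sketched as follows (a minimal illustration; the function name is ours, not the paper's):

```python
import numpy as np

def ncc(win1, win2):
    """Normalized cross correlation between two equally sized windows:
    subtract each window's mean, then correlate and normalize, giving a
    similarity in [-1, 1] that is invariant to linear gray-level changes."""
    a = np.asarray(win1, dtype=float) - np.mean(win1)
    b = np.asarray(win2, dtype=float) - np.mean(win2)
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```

Because the measure is normalized, it is unaffected by gain and offset changes in gray level, but, as noted above, it degrades when the two images come from different modalities.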
A robust algorithm to establish control-point correspondences is one of the most important tasks in automated image registration. Existing feature-matching algorithms include binary correlation, distance transform and chamfer matching, dynamic programming, structural matching, chain-code correlation, distance of invariant moments, and relaxation algorithms [16], [23]. In most existing feature-based techniques, feature correspondence is still the most challenging problem. This paper proposes a new feature-based approach to automated registration of remotely sensed images of the same scene using combined criteria of invariant-moment distance and chain-code matching, as shown in Fig. 1. The critical elements for an automated image registration procedure are explored. These elements include feature identification, feature matching, and transformation-parameter estimation. Special attention has been paid to feature extraction by image segmentation and to control-point correspondence by region matching. In image segmentation, the performance of the LoG zero-crossing edge detector is improved by developing a selective zero-crossing scheme and defining an edge strength (ES) array. Images are segmented by the improved LoG edge-extraction technique. In region matching, a new method for the determination of corresponding regions is proposed using combined criteria of invariant-moment distance and chain-code matching between the detected regions. Each region is represented by affine-invariant moments and improved chain

codes that are affine-invariant features describing the shape of a region. Region matching then is implemented first in the feature space and sequentially in the image space. In the feature space, minimum distance classification is used to identify the most robust control points for initial image transformation. In the image space, region-to-region correspondence is established by the root mean-square error (RMSE) rule. The rest of the paper is organized as follows. In Section II, the image segmentation technique developed in this work is described. In Section III, we focus on the image-matching algorithm proposed to achieve a robust scheme for control-point correspondence. The experimental results are presented in Section IV. The final conclusions are included in Section V.

II. FEATURE EXTRACTION: IMAGE SEGMENTATION

Image segmentation for the purpose of obtaining boundaries is an important step in a feature-based registration system. There are many segmentation techniques available that could be used. However, there is no unique segmentation technique that can perform best on all types of images, and most segmentation techniques are image-dependent [14]. In this paper, we use a technique that can produce a number of desirable boundaries to be used in the control-point finding process. The Laplacian of Gaussian (LoG) operator combines smoothing and second-derivative operations and produces closed edge contours. Based on the image obtained by convolving the original image with the LoG, edges are found at zero-crossing points. Therefore, the LoG zero-crossing operator is deemed appropriate in this study. The LoG filter is well understood, and there are abundant references on the technique itself and its implementations (see [6], [10], and [23]). However, determination of the zero crossings is not without problems. Conventionally, the resultant convolved image is scanned to detect pixels that have a zero value or pixels at which a change of sign has occurred.
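The conventional sign-change scan can be sketched as follows. The kernel parameters match the values reported later in Section IV (sigma = 1.5, 15 × 15 support); the helper names and the naive convolution are illustrative, not the authors' implementation:

```python
import numpy as np

def log_kernel(sigma=1.5, size=15):
    """Laplacian-of-Gaussian kernel (sigma = 1.5 and a 15 x 15 support
    are the values used in the experiments of Section IV)."""
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    s2 = sigma ** 2
    k = (x ** 2 + y ** 2 - 2 * s2) / s2 ** 2 * np.exp(-(x ** 2 + y ** 2) / (2 * s2))
    return k - k.mean()            # zero-sum: flat regions respond with zero

def convolve2d(img, kernel):
    """Naive 'same'-size convolution; adequate for a demonstration."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(np.asarray(img, dtype=float), ((ph, ph), (pw, pw)), mode="edge")
    out = np.empty(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel[::-1, ::-1])
    return out

def zero_crossings(log_img, eps=1e-8):
    """Conventional scan: mark pixels where the LoG response changes sign
    along the horizontal or vertical direction."""
    sgn = np.sign(np.where(np.abs(log_img) < eps, 0.0, log_img))
    zc = np.zeros(log_img.shape, dtype=bool)
    zc[:, :-1] |= sgn[:, :-1] * sgn[:, 1:] < 0
    zc[:-1, :] |= sgn[:-1, :] * sgn[1:, :] < 0
    return zc
```

Applied to a vertical step edge, this marks crossings only near the intensity discontinuity, which is precisely the behavior the thin-and-robust scheme of this section then refines.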
These pixels are marked as edge pixels. To avoid noisy pixels, a threshold is used to cut the edge pixels that have small convolution values [23]. However, this method causes discontinuity at the weak edge pixels. Simple zero-crossing patterns, which are composed of signs of pixel values of the filtered image, are detected along both vertical and horizontal directions in [6]. The authors then used a two-threshold method based on the ES at each zero-crossing point. This was used to refine the edge points and assure continuities at weak edge points. A method that conceptually corresponds to intersecting the convolution surface with a horizontal plane at convolution value zero was introduced in [12]. This assured not only closed contours but also correct connections at junctions. Starting with a seed pixel, its neighboring pixels were expanded until a sign change occurred. Pixels on the boundary were flagged as zero crossings. These methods have two general drawbacks when we use the edge points for the purpose of image registration: thick edges and discontinuity at weak edge points. To overcome these drawbacks, we propose a method, called Thin and Robust Zero-Crossing, which produces thin edges and assures continuities at weak edge points as well. This technique


Fig. 1. Flowchart of the proposed automatic image registration system.

can be performed in two steps: selective zero-crossing point marking, followed by edge refinement and sorting. At the first stage, we mark as an edge point every pixel that satisfies the following three conditions. 1) The pixel is a zero-crossing point (sign change on the convolved image). 2) The pixel lies in the direction of the steepest gradient change. 3) The pixel is the closest pixel to the virtual zero plane of the LoG image among its eight neighbors. Equivalently, if a pixel $(i, j)$ is a zero-crossing point and its LoG value satisfies the condition

$|\mathrm{LoG}(i, j)| \le \min \{ |\mathrm{LoG}(m, n)| : (m, n) \in N_8(i, j),\ \mathrm{sign}[\mathrm{LoG}(m, n)] \ne \mathrm{sign}[\mathrm{LoG}(i, j)] \}$  (1)

where $N_8(i, j)$ is the eight-pixel neighborhood of $(i, j)$ and $\mathrm{sign}$ is the signum function, it is flagged as an edge point. This condition usually preserves the continuities at the T-junctions and assures that the detected edge is only one pixel thick, which indicates the best locational approximation of the edge points. For every marked edge point, the ES is defined as the steepest gradient slope change along the eight directions in its 8-neighborhood, based on the LoG values of the pixel itself and its eight neighbors [11]:

$\mathrm{ES}(i, j) = \max \{ |\mathrm{LoG}(i, j) - \mathrm{LoG}(m, n)| : (m, n) \in N_8(i, j),\ \mathrm{sign}[\mathrm{LoG}(m, n)] \ne \mathrm{sign}[\mathrm{LoG}(i, j)] \}.$  (2)

The results from this stage may be “noisy,” as shown in Fig. 2(a) and (b), but the edges are continuous and thin. More importantly, the noisy edge points can be discarded easily in the edge sorting and refinement processing using the ES array defined above. At the edge refinement and sorting stage, we use the two-threshold method [6] to search and clean the marked edge points to detect contours. This scheme applies the following two conditions to the detected zero-crossing points. 1) The ES at each point along the edges is greater than $T_1$. 2) At least one point on the contour has an ES greater than $T_2$, where $T_1$ and $T_2$ are the preset thresholds and $T_2 > T_1$. Here, $T_1$ preserves the whole contour around the region boundary without incurring discontinuity at weak edge points. Fig. 2(c) and (d) show the results after this scheme was applied; the threshold $T_1$ was set to ten. $T_2$ is set large enough to avoid



Fig. 2. Examples of region extraction by the improved LoG zero-crossing edge detector: (a) edges from improved LoG, 1988, (b) edges from improved LoG, 1994, (c) weak edges removed by ES, 1988, (d) weak edges removed, 1994, (e) edges after contour sorting, 1988, (f) edges after contour sorting, 1994, (g) closed contours detected, 1988, and (h) closed contours detected, 1994.


spurious edges. This two-threshold scheme is implemented by the contour search and sorting program (Fig. 1). A new contour search is initiated whenever one point with an ES value greater than $T_2$ is detected. When a new search for another contour is initiated, the neighboring pixels with a value greater than $T_1$ are accepted as contour points until no neighboring pixels can satisfy this condition. Then, all the ES values along the detected contour are disabled by setting them to zero so that these points will not be visited again. The search operation continues until the whole ES array has been scanned. The detected edge points are shown in Fig. 2(e) and (f). The detected closed-contour points for each contour are then sorted and stored in order in an array for further processing. Sorted closed contours are then marked as region boundaries, as shown in Fig. 2(g) and (h).

III. CONTROL-POINT CORRESPONDENCE: IMAGE MATCHING

After the closed contours are detected from the two images, a correspondence mechanism between the regions must be established to match these two images. Region similarity is obtained by comparing the shapes of the segmented regions. Since the images may have translational, rotational, and scaling differences, the shape measures should be invariant with respect to translation, rotation, and scaling. Some of the techniques that measure shape similarity in this fashion are Fourier descriptors, chain-code matching, shape signatures, centroidal profiles, invariant moments, and shape matrices. In practice, these descriptors have been applied individually or sequentially in the process of image matching. A comparison between these similarity measures can be found in [23]. To achieve better discrimination capability, the modified chain-code matching and the invariant moments are combined to build the correspondence between regions in the two corresponding images.
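Looking back at Section II, the two-stage thin-and-robust zero-crossing procedure can be sketched as follows, under one plausible reading of the marking condition, the ES definition, and the two-threshold rule (names, default thresholds, and neighborhood handling are illustrative):

```python
import numpy as np
from collections import deque

N8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def thin_zero_crossings(log_img):
    """Stage 1 (selective marking): flag a pixel as an edge point when an
    8-neighbor has the opposite LoG sign and the pixel is closer to the
    zero plane than every such neighbor; the edge strength (ES) is taken
    as the largest |LoG difference| to an opposite-signed neighbor."""
    h, w = log_img.shape
    edges = np.zeros((h, w), dtype=bool)
    es = np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            v = log_img[i, j]
            opp = [log_img[i + di, j + dj] for di, dj in N8
                   if log_img[i + di, j + dj] * v < 0]
            if opp and abs(v) <= min(abs(u) for u in opp):
                edges[i, j] = True
                es[i, j] = max(abs(v - u) for u in opp)
    return edges, es

def two_threshold_contours(es, t1=10.0, t2=40.0):
    """Stage 2 (refinement and sorting): seed a new contour wherever
    ES > t2, grow it through 8-connected points with ES > t1, and zero
    out accepted points so they are not visited again."""
    es = es.copy()
    h, w = es.shape
    contours = []
    for si in range(h):
        for sj in range(w):
            if es[si, sj] <= t2:
                continue
            contour, queue = [], deque([(si, sj)])
            es[si, sj] = 0.0
            while queue:
                i, j = queue.popleft()
                contour.append((i, j))
                for di, dj in N8:
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w and es[ni, nj] > t1:
                        es[ni, nj] = 0.0
                        queue.append((ni, nj))
            contours.append(contour)
    return contours
```

The stage-1 rule keeps only the pixel closest to the zero plane on each side of a sign change, which is what makes the detected edges one pixel thick; the stage-2 growth through the lower threshold is what preserves continuity at weak edge points.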


A. Moment Representation of Regions: Invariant Moments

One set of useful shape descriptors is based on the theory of moments. The moment invariants are moment-based descriptors of planar shapes, which are invariant under general translation, rotation, scaling, and reflection transformations [24]. The moments of a binary image $f(x, y)$ are given by

$m_{pq} = \sum_{x} \sum_{y} x^p y^q f(x, y)$  (3)

where $p$ and $q$ define the order of the moment. The center of gravity of the object is given by the object moments

$\bar{x} = m_{10}/m_{00}, \qquad \bar{y} = m_{01}/m_{00}.$  (4)

The central moments are then defined as

$\mu_{pq} = \sum_{x} \sum_{y} (x - \bar{x})^p (y - \bar{y})^q f(x, y).$  (5)

The zero-order moment $m_{00}$ represents the binary object area. Second-order moments express the distribution of matter around the center of gravity; in the case of objects with mass, they are called moments of inertia. Third-order moments express the basic properties of symmetry of the object and are equal to zero for objects with central symmetry. Moments of higher order describe more subtle variations in shape, but they are also more sensitive to noise. The central moments are translation invariant. Under a scale change $x' = \alpha x$, $y' = \alpha y$, the central moments change to $\mu'_{pq} = \alpha^{p+q+2} \mu_{pq}$, and the normalized moments are therefore defined as

$\eta_{pq} = \mu_{pq} / \mu_{00}^{\gamma}, \qquad \gamma = (p + q)/2 + 1.$  (6)

The central moments and the scaling-invariant normalized moments can be employed to produce a set of invariant moments that are further invariant to rotational and reflectional differences. The lower-order invariants, which are composed of the second- and third-order central moments, are usually sufficient for most registration tasks on remotely sensed imagery. Detailed definitions of these invariants can be found in the literature (see [17] and [25]). The following seven invariants are used in this research:

$\phi_1 = \eta_{20} + \eta_{02}$  (7)

$\phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$  (8)

$\phi_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2$  (9)

$\phi_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2$  (10)

$\phi_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]$  (11)

$\phi_6 = (\eta_{20} - \eta_{02})[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})$  (12)

$\phi_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2].$  (13)

All moments $\phi_1$–$\phi_7$ are scale, rotation, and translation invariant. Moments $\phi_1$–$\phi_6$ are reflection invariant as well. The magnitude of $\phi_7$ is reflection invariant, but its sign changes under reflection.

B. Improved Chain-Code Representation of Regions

Another efficient representation of digital curves is the modified Freeman chain code. The standard Freeman chain code has certain drawbacks for correlation between two potentially matched contours, such as being noisy and variant to rotational, scaling, and translational differences. Therefore, the standard chain-code representation is improved by the following four operations, as described in [6]. 1) Shift Operation: A modified version of the standard chain code is produced by a


shifting operation defined recursively over the integers modulo 8, with the shift chosen so that the resulting code is minimized [see [6] for the exact recursion; Eq. (14)].

The shift operation eliminates or reduces the problem of wraparound. 2) Smoothing Operation: Curve smoothing is obtained by a smoothing filter, such as a Gaussian. 3) Normalization Operation: The difference between the means of the corresponding chain-code representations measures the rotational difference between two curves. This rotational difference can be removed by subtracting the means. 4) Resampling Operation: To be scale invariant, the chain code is resampled to the same length as that of its corresponding region. The improvement in the similarity measure gained by introducing these operations can be quantified easily with matching coefficients for a pair of chain codes. For a visual demonstration, a graphical comparison is presented in Fig. 3. The standard chain codes, the shifted chain codes, the shifted and smoothed chain codes, and the modified chain codes computed by all four operations are displayed for comparison for a pair of example matched contours. The improvement achieved by these four operations can be seen easily from the plots in terms of the accuracy of the similarity measure of the matched contours.

C. Control-Point Correspondence by Region Matching

Control-point correspondence is performed in two steps. The first step includes region matching in the feature space based on combined criteria of minimum distance of the seven invariant moments and maximum chain-code matching coefficients between a pair of regions. In this step, two similarity matrices between the detected regions in the two images are first computed. The first similarity matrix is the Euclidean distance matrix in seven-dimensional (7-D) invariant-moment space. The second similarity matrix is the chain-code matching matrix. Based on these two similarity matrices, the minimum distance classifier is used to identify the most robust matches. For example, a minimum of three GCP's is required to fit a general affine bivariate polynomial transformation.
The centers of gravity for these three matched regions are used as the GCP’s. The second step is performed in image space based on the image registration result of the first step. Given the three most robust GCP’s from the result of the first step, the parameters of the first-order bivariate polynomial can be calculated simply. Based on this transformation, we can map the other potential regions in the sensed image into the reference image. The distance between the center of gravity of the mapped region in the sensed image and the actual center of gravity in the reference image is computed. This distance is actually the RMSE at this pair of corresponding points. A threshold easily can be set to avoid false matches. The final image transformation parameters are based on the matched points resulting from the second step of the image match algorithm.
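Before the distance matrix of the next subsection can be built, the seven invariant moments of Section III-A must be computed for each region. A minimal sketch from a binary region mask (`hu_moments` is an illustrative helper name):

```python
import numpy as np

def hu_moments(mask):
    """Compute the seven invariant moments of Section III-A for a binary
    region mask, via the normalized central moments of Eq. (6)."""
    ys, xs = np.nonzero(mask)
    m00 = len(xs)                       # zero-order moment: the region area
    x = xs - xs.mean()                  # coordinates about the center of gravity
    y = ys - ys.mean()

    def eta(p, q):                      # normalized central moment, Eq. (6)
        return (x ** p * y ** q).sum() / m00 ** ((p + q) / 2 + 1)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    phi1 = n20 + n02
    phi2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    phi3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    phi4 = (n30 + n12) ** 2 + (n21 + n03) ** 2
    phi5 = ((n30 - 3 * n12) * (n30 + n12)
            * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
            + (3 * n21 - n03) * (n21 + n03)
            * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2))
    phi6 = ((n20 - n02) * ((n30 + n12) ** 2 - (n21 + n03) ** 2)
            + 4 * n11 * (n30 + n12) * (n21 + n03))
    phi7 = ((3 * n21 - n03) * (n30 + n12)
            * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
            - (n30 - 3 * n12) * (n21 + n03)
            * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2))
    return np.array([phi1, phi2, phi3, phi4, phi5, phi6, phi7])
```

Translating the mask or rotating it by a multiple of 90 degrees (both exact operations on a pixel grid) leaves the seven values unchanged, which is the invariance property the distance matrix depends on.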

1) Invariant-Moment Distance Matrix: Let $\{\phi_k^R(i)\}$ and $\{\phi_k^S(j)\}$, $k = 1, \ldots, 7$, denote the values of the seven invariant moments of the $N_R$ regions detected in the reference image and the $N_S$ regions detected in the sensed image, respectively. For every region $i$ in the reference image and every region $j$ in the sensed image, we compute the invariant-moment distance between regions $i$ and $j$:

$D(i, j) = \left[ \sum_{k=1}^{7} \left( \phi_k^R(i) - \phi_k^S(j) \right)^2 \right]^{1/2}.$  (15)

The result is an $N_R \times N_S$ distance matrix in the 7-D feature space, which represents the similarity relationship between the regions in the two images. To simplify the search process, the impossible matches between region pairs that have more than a certain amount of difference in their boundary lengths (say 35%) are left out by setting the distance values to one (a special flag for ignoring these region pairs) in the distance matrix. Based on the principle of the minimum distance classifier, we conclude that the smaller the distance, the more similar the shapes of the two regions.

2) Chain-Code Matching Matrix: A correlation measure of a region pair proposed in [6] can be used to derive the matching matrix. Let contour $A$ be represented by an $n$-point shifted and smoothed chain code $\{a_k\}$, and let contour $B$ be represented by an $m$-point shifted and smoothed chain code $\{b_k\}$. The longer contour (for example, contour $B$) is resampled to the same length (denoted as $L$) as the shorter contour (contour $A$). For any two region contours $A_i$ and $B_j$ in the reference image and the sensed image, respectively, we can define the following chain-code matching coefficient by sliding the normalized contour $A_i$ over the normalized and resampled contour $B_j$:

$C(i, j) = \max_{0 \le s < L} \frac{1}{L} \sum_{k=1}^{L} \cos\left[ \frac{\pi}{4} \left( a_k - b_{(k+s) \bmod L} \right) \right]$  (16)

where the chain-code curves of the closed contours are treated as periodic sequences

$a_k = a_{k \bmod L}$  (17)

$b_k = b_{k \bmod L}$  (18)

so that a closed contour can be represented by a modulus operation. The sliding process between the contours is achieved by the maximum operation in (16), which finds the best fit between the contour pair; the matching measure is similar to the MSE of two signals. The cosine function guarantees that the matching measure is no greater than one; when $C(i, j) = 1$, there is a perfect match. The result is also an $N_R \times N_S$ matrix in which each element represents another reliable similarity measure between the regions in the two images. By the same token, the impossible matches between region pairs that have more than a certain amount of difference (say 35%) are disabled by setting the matching coefficient value to zero in the matching matrix. We can notice


Fig. 3. Improved chain-code representation of contours: (a) and (b) matched regions, (c) and (d) original chain-code representations, (e) and (f) shifted chain-code representations, (g) and (h) smoothed and shifted chain-code representations, and (i) and (j) resampled and normalized chain-code representations.
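The chain-code machinery illustrated in Fig. 3 can be sketched as follows, under stated simplifications: an unwrapping pass stands in for the exact shift recursion of Eq. (14), and the matching coefficient follows the cosine form of Eq. (16) with codes in 45-degree units (all names are illustrative):

```python
import numpy as np

def improve_chain_code(code, target_len):
    """Apply the four operations of Section III-B to a Freeman chain code
    (values 0-7): unwrapping (a stand-in for the shift of Eq. (14)),
    circular smoothing, resampling to a common length, and mean removal,
    which cancels a constant rotation offset."""
    c = np.asarray(code, dtype=float)
    for i in range(1, len(c)):                      # 1) unwrap 7 <-> 0 jumps
        c[i] -= 8 * round((c[i] - c[i - 1]) / 8)
    c = (np.roll(c, 1) + c + np.roll(c, -1)) / 3.0  # 2) periodic smoothing
    c = np.interp(np.linspace(0, 1, target_len),    # 3) resample
                  np.linspace(0, 1, len(c)), c)
    return c - c.mean()                             # 4) normalize

def chain_match_coeff(a, b):
    """Matching coefficient in the spirit of Eq. (16): slide one code
    circularly over the other and keep the best mean cosine agreement.
    Codes are in 45-degree units, hence the pi/4 factor; the value is at
    most 1, and 1 means a perfect match up to a circular shift."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    best = -1.0
    for s in range(len(a)):
        best = max(best, float(np.mean(np.cos((a - np.roll(b, s)) * np.pi / 4))))
    return best
```

Rotating a contour shifts every code element by the same amount, so after mean removal the two representations coincide; the circular sliding in the coefficient then absorbs any difference in the starting point of the contour trace.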

that the greater the chain-code matching coefficient, the more the contours resemble each other in shape. 3) The Algorithm of Image Matching: Based on the two similarity matrices defined above, the following matching algorithm, “ImageMatch,” is developed.
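The overall flow of such a matching algorithm can be sketched as follows. This is a hedged reconstruction, not the authors' code: the thresholds, the ranking by moment distance alone, and the helper names are illustrative, and `numpy.linalg.lstsq` stands in for the least-squares solve:

```python
import numpy as np

def fit_affine(ref_pts, sensed_pts):
    """Least-squares fit of the six first-order polynomial parameters
    u = a0 + a1*x + a2*y, v = b0 + b1*x + b2*y (three or more pairs)."""
    A = np.column_stack([np.ones(len(ref_pts)), np.asarray(ref_pts, dtype=float)])
    params, *_ = np.linalg.lstsq(A, np.asarray(sensed_pts, dtype=float), rcond=None)
    return params                                   # 3x2; columns give (u, v)

def image_match(D, C, g_ref, g_sen, t_d=0.1, t_c=0.8, t_r=1.0):
    """Keep region pairs with D(i,j) < t_d and C(i,j) > t_c, rank them
    (here by moment distance), fit the transformation to the best three,
    and accept further pairs whose mapped center of gravity lands within
    the RMSE threshold t_r of its counterpart in the sensed image."""
    pairs = sorted(
        ((i, j) for i in range(D.shape[0]) for j in range(D.shape[1])
         if D[i, j] < t_d and C[i, j] > t_c),
        key=lambda ij: D[ij])
    seeds, rest = pairs[:3], pairs[3:]
    params = fit_affine([g_ref[i] for i, _ in seeds],
                        [g_sen[j] for _, j in seeds])
    final = list(seeds)
    for i, j in rest:
        mapped = np.concatenate([[1.0], g_ref[i]]) @ params
        if np.linalg.norm(mapped - g_sen[j]) < t_r:  # RMSE check per pair
            final.append((i, j))
    return final, params
```

The two-stage structure mirrors the text: the feature-space thresholds prune the candidate set cheaply, and the image-space residual check then rejects pairs whose shapes happen to resemble each other but whose locations are inconsistent with the estimated transformation.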

Algorithm ImageMatch: Step 1: Based on the output of the contour search and sorting program, the $N_R$ regions detected in the reference image and the $N_S$ regions detected in the sensed image are identified, ordered, and denoted


as $\{R_i\}$, $i = 1, \ldots, N_R$, and $\{S_j\}$, $j = 1, \ldots, N_S$, respectively.
Step 2: Compute two $N_R \times N_S$ feature matrices: the invariant-moment distance matrix $D$ [defined in (15)] and the chain-code matching matrix $C$ [defined in (16)]. In the invariant-moment computation, we also build two matrices containing the coordinates of the centers of gravity: an $N_R \times 2$ matrix, denoted as $G_R$, for the regions in the reference image, and an $N_S \times 2$ matrix, denoted as $G_S$, for the regions in the sensed image.
Step 3: Searching for potential matches is based on the following condition: for any pair of regions $i$ and $j$, if $D(i, j) < T_d$ and $C(i, j) > T_c$, then the pair is accepted as a potential match, where $T_d$ and $T_c$ are preset thresholds. Let $P$ denote the set of potential matches detected by this condition. This step dramatically reduces the search complexity and limits the following search to the matches with the highest probabilities.
Step 4: Working with the reduced potential-match set $P$, minimum distance classification is performed based on

$(i, j)^{*} = \arg\min_{(i, j) \in P} D(i, j)$  (19)

and the three pairs of regions with the minimum values in this classification are identified as the initial matched regions.
Step 5: Based on the three initial matched pairs of regions, their corresponding centers of gravity are used as the GCP's to fit a first-order bivariate polynomial transformation, solving the following two equations in a least-squares sense for the transformation parameters $(a_0, a_1, a_2)$ and $(b_0, b_1, b_2)$:

$u = a_0 + a_1 x + a_2 y, \qquad v = b_0 + b_1 x + b_2 y$  (20)

where $(x, y)$ and $(u, v)$ are two sets of coordinates of the corresponding matched points in the reference image and the sensed image, respectively. At the same time, these three pairs of potential GCP's are removed from $P$, and the remainder of the potential matches is denoted as $P'$.
Step 6: Based on the transformation estimated above and working with $P'$, any additional matches can be detected by examining the following condition:

$\left[ (u_k - \hat{u}_k)^2 + (v_k - \hat{v}_k)^2 \right]^{1/2} < T_r$  (21)

where $(u_k, v_k)$ are the coordinates of the $k$th potentially matched pair in the sensed image, $(\hat{u}_k, \hat{v}_k)$ is the corresponding reference-image point mapped into the sensed image by (20), and $T_r$ is a preset threshold that is actually the maximum RMSE allowed at the GCP's for the final image registration. If this condition is satisfied, the pair of points is considered a GCP pair. If not, this pair of points is removed from the potential matches.
The output from Step 6 of the algorithm described above is a set of matched regions that are considered the final matches between the detected regions in the two images. Their centers of gravity then are used as the ground control points in the estimation of the transformation parameters.

IV. EXPERIMENTAL RESULTS

In this section, we provide the experimental results using the proposed algorithms for automated image registration. In the experiments, two multitemporal images of the coastal plain of North Carolina were used. The reference image (image 1) was taken by Landsat 5 TM in November 1988. The sensed image (image 2, the image to be transformed) also was taken by Landsat 5 TM, but in December 1994. Subscenes of 512 × 512 pixels from the original images were used. Water boundaries usually are defined better in the infrared bands than in the visible bands; therefore, TM band 5 images were used in the registration experiments, as shown in Fig. 4(a) and (c), respectively.

A. Image Segmentation and Region Extraction

In the image segmentation, the standard deviation of the LoG operator was set to 1.5, and the kernel size was 15 × 15 pixels. The minimum value of ES for edge points ($T_1$) was set to ten, and the value of ES to initiate a contour search ($T_2$) was set to 40. The minimum region boundary length was set to 40 pixels for a region to qualify for region matching. With these parameters set, 22 regions in the reference image and 29 regions in the sensed image were detected by the contour search and sorting program. These detected features are shown in Fig. 4(b) and (d), respectively. The main region boundaries of interest are clearly recognizable in the stretched gray-scale images.
The rest of the feature boundaries, which are short in length and embedded in the image features, are for the most part detected falsely. These features were to be examined for correspondence and removed by the ImageMatch algorithm.

B. Feature Extraction and Control-Point Correspondence

From the detected regions in the two images, we constructed the invariant-moment distance matrix and the chain-code matching matrix. Step 3 in the ImageMatch algorithm identified 21 potential matches out of the 22 × 29 possibilities, where the invariant-moment threshold $T_d$ was set to 0.1 and the chain-code matching threshold $T_c$ was set to 0.8. Three pairs of GCP's with minimum distance and highest correlation were detected and used initially to estimate the transformation parameters. In the transformed image space, four additional matches were detected by Step 6 in the ImageMatch algorithm. The centers of gravity of the corresponding regions were used as control



Fig. 4. Test windows and image segmentation results: (a) reference image: TM band 5, 1988, (b) detected regions from the reference image, (c) sensed image: TM band 5, 1994, and (d) detected regions from the sensed image.

TABLE I COORDINATES OF THE DETECTED CONTROL POINTS
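The chain-code half of the combined criterion can also be illustrated with a small sketch. This is an illustrative stand-in, not the authors' improved chain-code algorithm: it scores two 8-direction Freeman chain codes by their best circular-shift correlation of unit direction vectors, which handles the arbitrary starting point of a closed boundary (handling a rotation between the images would additionally require difference codes):

```python
import numpy as np

def chain_code_similarity(c1, c2):
    """Similarity in [-1, 1] between two 8-direction Freeman chain
    codes of closed boundaries.  Codes are compared through their
    unit direction vectors (so directions 7 and 0 count as close),
    after resampling to a common length, taking the best circular
    starting-point alignment."""
    v1 = np.exp(1j * np.asarray(c1, float) * np.pi / 4.0)
    v2 = np.exp(1j * np.asarray(c2, float) * np.pi / 4.0)
    n = max(len(v1), len(v2))
    # nearest-index resampling of the shorter code to length n
    resample = lambda v: v[np.minimum((np.arange(n) * len(v)) // n, len(v) - 1)]
    a, b = resample(v1), resample(v2)
    best = -1.0
    for s in range(n):  # try every circular start of the second code
        score = float(np.real(np.vdot(np.roll(b, s), a))) / n
        best = max(best, score)
    return best
```

A boundary compared with a circularly shifted copy of itself scores near 1, while unrelated codes score low, so thresholding this score (the paper uses 0.8 for its own chain-code measure) separates plausible from implausible region pairs.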

points. The complete matching results determined by the image-matching algorithm are summarized in Table I.

C. Estimation of Registration Parameters

Given a number of corresponding ground control points from the two images, the transformation parameters can be estimated. The registration of images with rotational, translational, scaling, and image-skew differences can be approximated by the following general affine relationship [16]:

$$\begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = \begin{bmatrix} a_1 & a_2 \\ a_4 & a_5 \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} + \begin{bmatrix} a_3 \\ a_6 \end{bmatrix} \qquad (22)$$

where $(x_1, y_1)$ and $(x_2, y_2)$ are two corresponding points in the reference image and the sensed image, respectively, and $a_1, \ldots, a_6$ are the transformation parameters expressing the scaling, rotational, translational, and skew differences between the two images. The affine mapping function is appropriate for data with flat topography. For some data with nonlinear, local geometric distortions (such as hilly areas with terrain changes), more complex transformation functions may be needed to produce better interpolation results. These mapping functions include thin-plate spline interpolation [26], [27] and the adaptive mapping function [28]. For general affine


IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 37, NO. 5, SEPTEMBER 1999

TABLE II ROOT MEAN-SQUARE ERRORS AT CONTROL POINTS (IN PIXELS)

transformation, a least-squares approach is used to determine the six transformation parameters $\mathbf{a} = (a_1, a_2, a_3, a_4, a_5, a_6)^T$ from the $N$ matched control points

$$\mathbf{a} = \left(\mathbf{G}^{T}\mathbf{G}\right)^{-1}\mathbf{G}^{T}\mathbf{b} \qquad (23)$$

where

$$\mathbf{G} = \begin{bmatrix} x_1^{(1)} & y_1^{(1)} & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_1^{(1)} & y_1^{(1)} & 1 \\ \vdots & & & & & \vdots \\ x_1^{(N)} & y_1^{(N)} & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_1^{(N)} & y_1^{(N)} & 1 \end{bmatrix} \qquad (24)$$

and

$$\mathbf{b} = \left[ x_2^{(1)} \;\; y_2^{(1)} \;\; \cdots \;\; x_2^{(N)} \;\; y_2^{(N)} \right]^{T}. \qquad (25)$$

The RMSE at the GCP's is commonly used to evaluate the performance of the least-squares fitting and the accuracy of the image registration. It is usually defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left[ \left( \hat{x}_2^{(i)} - x_2^{(i)} \right)^2 + \left( \hat{y}_2^{(i)} - y_2^{(i)} \right)^2 \right]} \qquad (26)$$

where $(\hat{x}_2^{(i)}, \hat{y}_2^{(i)})$ is the position of the $i$th control point predicted by the fitted transformation.

D. Image Resampling and Performance Comparison

Since the determination of the parameters of the general affine transformation (22) requires only three control points, the seven matched point pairs were more than sufficient. Given the corresponding control points in the two images, the transformation parameters were estimated using the least-squares technique in (23), yielding the numerical transformation equations.
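The least-squares estimation of the affine parameters and the RMSE check can be sketched as follows. This is a minimal illustration, not the paper's code; the function names and the use of `numpy.linalg.lstsq` in place of the explicit normal equations are choices of this sketch:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares fit of x2 = a1*x1 + a2*y1 + a3 and
    y2 = a4*x1 + a5*y1 + a6 from matched control points.
    src, dst: (N, 2) arrays of (x, y) coordinates, N >= 3."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    n = src.shape[0]
    G = np.zeros((2 * n, 6))
    G[0::2, 0:2] = src          # rows for the x2 equations
    G[0::2, 2] = 1.0
    G[1::2, 3:5] = src          # rows for the y2 equations
    G[1::2, 5] = 1.0
    b = dst.reshape(-1)         # [x2_1, y2_1, x2_2, y2_2, ...]
    a, *_ = np.linalg.lstsq(G, b, rcond=None)
    return a

def rmse(a, src, dst):
    """Root-mean-square residual of the fitted mapping at the GCP's."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    pred_x = a[0] * src[:, 0] + a[1] * src[:, 1] + a[2]
    pred_y = a[3] * src[:, 0] + a[4] * src[:, 1] + a[5]
    return float(np.sqrt(np.mean((pred_x - dst[:, 0]) ** 2
                                 + (pred_y - dst[:, 1]) ** 2)))
```

With exact correspondences the fit recovers the parameters and the RMSE is essentially zero; with noisy control points the RMSE quantifies the residual misregistration at the GCP's, which is how the paper compares automated and manual results.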

Knowing the mapping function, the sensed image was transformed and resampled using cubic convolution interpolation. The resampled image is shown in Fig. 5(a). The resampled image was then mosaicked with the reference image, as shown in Fig. 5(b). The registration accuracy was estimated using (26) at every control point, as shown in Table II. For a fair comparison with the result of manual registration, we visually located and digitized the seven best control points, using a magnifier, on the standard false-color composite

image (TM bands 4, 3, 2) displayed on a 24-bit screen. These seven points were then used to estimate the image transformation parameters. The RMSE in both the x and y directions is below 0.7 pixel, which is about the best accuracy that the manual registration method can achieve. In principle, manual registration results can be improved by changing the control points or by other refinements, but in practice such improvement is limited. The limiting factors mainly include the inaccuracy of the reference data for image-to-map registration and the inevitable positional error or deviation during visual correspondence. These factors are the sources of systematic error in the manual registration method.

The manual registration result was then compared to that of the proposed automated algorithm. The result of this comparison is shown in Table III. The results show that the proposed automated algorithm performs better than manual registration, improving the registration accuracy by more than half a pixel on average.

V. CONCLUSIONS

This paper has dealt with the automatic registration of remotely sensed imagery. This problem arises often in the processing of multitemporal and/or multisensor satellite image analyses, such as multisource data fusion, multitemporal change detection, and image mosaicking. In this work, we explored the critical elements of an automated image registration system using Landsat TM imagery: image segmentation, control-point selection and correspondence, and transformation parameter estimation. Particular attention has been paid to region-feature extraction and automated control-point correspondence. In image segmentation, we developed a method called thin and robust zero crossing based on the conventional LoG operator, and we improved the performance of the LoG zero-crossing edge detector by selectively picking zero-crossing points and defining an ES array.
The images (both the reference image and the sensed image) were then segmented by the thin and robust zero-crossing-edge extraction technique, and the regions with closed boundaries were extracted. Each region was then represented by modified chain codes and invariant moments, which are affine-invariant descriptors describing the shape of a region. The correspondence




TABLE III COMPARISON BETWEEN MANUAL REGISTRATION AND THE PROPOSED ALGORITHM

Fig. 5. (a) Registered image, (b) mosaic of the two images after registration, and (c) overlapped area between the two images after registration, which is ready for further processing (such as digital change detection).

between regions was established using the combined criteria of invariant-moment distance and improved chain-code matching. Region matching was first implemented in the feature space and then sequentially in the image space. In the feature space, minimum distance classification was used to identify the most robust control points for the initial image transformation. In the image space, region-to-region correspondence was established by the RMSE rule. After the correspondence between the regions had been established, the centers of gravity of the corresponding regions were used as control points. The parameters of the transformation were computed by the least-squares rule based on the general affine transformation.

The performance of the proposed algorithm has been demonstrated by registering two multitemporal Landsat TM images taken in different years. Registration accuracy of less than one-third of a pixel has been achieved in the experiments. The proposed automated algorithm outperforms manual registration by over half a pixel on average (in terms of the RMSE at the GCP's). In summary, the technique of automated image registration developed in this work is potentially powerful in terms of its registration accuracy, its degree of automation, and its significant value in an operational context. The approach is also robust, since it overcomes the difficulties of control-point correspondence caused by feature inconsistency. It has numerous applications in automated change detection, image fusion, motion detection, and stereo matching.

ACKNOWLEDGMENT

The authors would like to thank W. E. Snyder and G. L. Bilbro of the Department of Electrical and Computer Engineering and J. Lu of the Department of Biological and Agricultural Engineering, North Carolina State University, Raleigh, for stimulating discussions and generous help with computer coding. The authors also thank the anonymous referees for their critical review and constructive comments.

REFERENCES

[1] X. Dai and S. Khorram, “A hierarchical methodology framework for multisource data fusion in vegetation classification,” Int. J. Remote Sensing, vol. 19, pp. 3697–3701, Dec. 1998.
[2] X. Dai and S. Khorram, “The effects of image misregistration on the accuracy of remotely sensed change detection,” IEEE Trans. Geosci. Remote Sensing, vol. 36, pp. 1566–1577, Sept. 1998.
[3] M. S. Tanaka, S. Tamura, and K. Tanaka, “On assembling subpictures into a mosaic picture,” IEEE Trans. Syst., Man, Cybern., vol. SMC-7, pp. 42–48, Mar. 1977.
[4] L. M. G. Fonseca and B. S. Manjunath, “Registration techniques for multisensor remotely sensed imagery,” Photogramm. Eng. Remote Sensing, vol. 62, pp. 1049–1056, Sept. 1996.
[5] X. Dai and S. Khorram, “Development of a feature-based approach to automated image registration of multisensor and multitemporal remotely sensed imagery,” in Proc. 1997 IEEE Int. Geosci. Remote Sensing Symp. (IGARSS’97), Singapore, Aug. 1997, pp. 243–245.


[6] H. Li, B. S. Manjunath, and S. K. Mitra, “A contour-based approach to multisensor image registration,” IEEE Trans. Image Processing, vol. 4, pp. 320–334, Mar. 1995.
[7] L. G. Brown, “A survey of image registration techniques,” ACM Comput. Surv., vol. 24, no. 4, pp. 325–376, 1992.
[8] A. V. Cideciyan, S. G. Jacobson, C. M. Kemp, R. W. Knightton, and J. H. Magel, “Registration of high resolution images of the retina,” in Proc. SPIE, Medical Imaging VI: Image Processing, Newport Beach, CA, Feb. 1992, vol. 1652, pp. 310–322.
[9] Q. Zheng and R. Chellappa, “A computational vision approach to image registration,” IEEE Trans. Image Processing, vol. 2, pp. 311–326, July 1993.
[10] E. J. M. Rignot, R. Kwok, J. C. Curlander, and S. S. Pang, “Automated multisensor registration: Requirements and techniques,” Photogramm. Eng. Remote Sensing, vol. 57, pp. 1029–1038, Aug. 1991.
[11] G. O. Mason and K. W. Wong, “Image alignment by line triples,” Photogramm. Eng. Remote Sensing, vol. 58, pp. 1329–1334, Sept. 1992.
[12] T. Schenk, J. Li, and C. Toth, “Toward an autonomous system for orienting digital stereopairs,” Photogramm. Eng. Remote Sensing, vol. 57, pp. 1057–1064, Aug. 1991.
[13] J. S. Greenfeld, “An operator-based matching system,” Photogramm. Eng. Remote Sensing, vol. 57, pp. 1049–1055, Aug. 1991.
[14] A. Goshtasby, G. C. Stockman, and C. V. Page, “A region-based approach with subpixel accuracy,” IEEE Trans. Geosci. Remote Sensing, vol. GE-24, pp. 390–399, May 1986.
[15] A. D. Ventura, A. Rampini, and R. Schettini, “Image registration by recognition of corresponding structures,” IEEE Trans. Geosci. Remote Sensing, vol. 28, pp. 305–314, May 1990.
[16] J. Ton and A. K. Jain, “Registering Landsat images by point matching,” IEEE Trans. Geosci. Remote Sensing, vol. 27, pp. 642–651, Sept. 1989.
[17] J. Flusser and T. Suk, “A moment-based approach to registration of images with affine geometric distortion,” IEEE Trans. Geosci. Remote Sensing, vol. 32, pp. 382–387, May 1994.
[18] A. Goshtasby, “Registration of images with geometric distortions,” IEEE Trans. Geosci. Remote Sensing, vol. 26, pp. 60–64, Jan. 1988.
[19] H. H. Li and Y. T. Zhou, “Automatic visual/IR image registration,” Opt. Eng., vol. 35, pp. 391–400, Feb. 1996.
[20] B. S. Reddy and B. N. Chatterji, “An FFT-based technique for translation, rotation, and scale-invariant image registration,” IEEE Trans. Image Processing, vol. 5, pp. 1266–1271, Aug. 1996.
[21] J. Le Moigne, “Parallel registration of multi-sensor remotely sensed imagery using wavelet coefficients,” in Proc. SPIE: Aerosense Wavelet Conf., Orlando, FL, Apr. 1994, vol. 2242, pp. 432–443.
[22] J. P. Djamdji, A. Bijaoui, and R. Maniere, “Geometrical registration of images: The multiresolution approach,” Photogramm. Eng. Remote Sensing, vol. 59, no. 5, pp. 645–653, 1993.
[23] R. M. Haralick and L. G. Shapiro, Computer and Robot Vision, vol. 1. Reading, MA: Addison-Wesley, 1993.
[24] A. K. Jain, Fundamentals of Digital Image Processing. Englewood Cliffs, NJ: Prentice-Hall, 1989.
[25] M. K. Hu, “Visual pattern recognition by moment invariants,” IEEE Trans. Inform. Theory, vol. IT-8, no. 2, pp. 179–187, 1962.
[26] F. L. Bookstein, “Principal warps: Thin-plate splines and the decomposition of deformations,” IEEE Trans. Pattern Anal. Machine Intell., vol. 11, pp. 567–585, June 1989.
[27] I. Barrodale, D. Skea, M. Berkley, R. Kuwahara, and R. Poeckert, “Warping digital images using thin plate splines,” Pattern Recognit., vol. 26, no. 2, pp. 375–376, 1993.
[28] J. Flusser, “An adaptive method for image registration,” Pattern Recognit., vol. 25, no. 1, pp. 45–54, 1992.

Xiaolong Dai (S’95–M’97) received the B.S.E.E. degree in electrical engineering from the Department of Electronics and Information Engineering, Huazhong University of Science and Technology (HUST), Wuhan, China, in 1986, the M.S. degree in systems ecology under a joint graduate program from the Chinese Academy of Forestry and Tsinghua University, Beijing, China, in 1989, and the Ph.D. degree in both electrical engineering and forestry from North Carolina State University (NCSU), Raleigh, in 1997. He was with the Chinese Academy of Sciences from 1989 to 1992, where he worked as an Engineer and Assistant Director in the Laboratory of Systems Ecology, Beijing, China. He has been affiliated with the Center for Earth Observation (CEO) (formerly the Computer Graphics Center), NCSU, since 1992. He is currently a Senior Research Scientist with CEO. Over the past decade, he has been involved in conducting research in digital image processing and machine vision as applied to remotely sensed data and medical imagery. He has published one book in human ecology and more than 35 articles in refereed journals and international conference proceedings in remote sensing, medical imaging, and image analysis. His interests span a wide gamut of interdisciplinary and multidisciplinary fields in science and technology, from electronic engineering to geosciences and information technology. His primary research interests include the theory, technology, and applications of information extraction from imagery, including automated image analysis systems, digital image processing, pattern analysis and machine intelligence, geographic information systems, data visualization and virtual reality, and multisource data fusion.

Siamak Khorram (M’84) received the M.S. degree in engineering and the M.S. degree in ecology from the University of California, Davis (UCD). In 1975, he received the Ph.D. degree with emphasis on remote sensing and image processing under a joint study program from the University of California, Berkeley, and from the Water Science and Engineering Department, UCD. Since 1975, he has been involved in teaching and conducting research in remote sensing, geoinformatics, environmental science and engineering, the development of computer-based information technologies, and systems integration. In 1982, he established the Computer Graphics Center at North Carolina State University (NCSU), Raleigh, as a university-wide organized research center involved in conducting and facilitating research and training in remote sensing and geoinformatics. In 1995 and 1996, he served as the Dean and Vice President for Academic Programs, International Space University (ISU), Strasbourg, France. Currently, he serves on the Board of Trustees of ISU and is a founding member of the Peaceful Uses of Space for all Humanity, Switzerland. In 1997, he established the Center for Earth Observation and serves as the Professor and the Director of this Center. He has taught courses in air photo interpretation and photogrammetry, environmental remote sensing, digital image processing, and information systems. He has authored more than 130 technical publications.
