Parameter Optimization for the Extraction of Matching Points Between High-Resolution Multisensor Images in Urban Areas Youkyung Han, Jaewan Choi, Younggi Byun, and Yongil Kim, Member, IEEE

Abstract—The objective of this paper is to extract a suitable number of evenly distributed matching points, given the characteristics of the site and the sensors involved. The intent is to increase the accuracy of automatic image-to-image registration for high-resolution multisensor data. The initial set of matching points is extracted using a scale-invariant feature transform (SIFT)-based method, which is further used to evaluate the initial geometric relationship between the features of the reference and sensed images. The precise matching points are extracted considering location differences and local properties of features. The values of the parameters used in the precise matching are optimized using an objective function that considers both the distribution of the matching points and the reliability of the transformation model. In case studies, the proposed algorithm extracts an appropriate number of well-distributed matching points and achieves a higher correct-match rate than the SIFT method. The registration results for all sensors are acceptably accurate, with a root-mean-square error of less than 1.5 m.

Index Terms—Automatic image registration, high-resolution multisensor images, parameter optimization, scale-invariant feature transform (SIFT).

I. INTRODUCTION
Increasing numbers of studies and tasks are being performed using high-resolution commercial satellite sensors with a spatial resolution of approximately 1 m because of the high accessibility of such data. These data serve many remote-sensing applications, such as image fusion, change detection, object recognition, and 3-D scene reconstruction. To use the imagery in these applications, image registration is a fundamental preprocessing requirement. Image registration is the process of geometrically overlaying two or more images of the same scene that were acquired at different times, from different viewpoints, or by different sensors [1].

Manuscript received December 30, 2012; revised April 21, 2013, July 30, 2013, and October 17, 2013; accepted November 4, 2013. This work was supported by the National Research Foundation of Korea Grant funded by the Korean government (Ministry of Education and Science Technology) (No. 2012R1A2A2A01045157). (Corresponding author: Y. Kim.) Y. Han and Y. Kim are with the Department of Civil and Environmental Engineering, Seoul National University, Seoul 151-744, Korea (e-mail: [email protected]; [email protected]). J. Choi is with the School of Civil Engineering, Chungbuk National University, Cheongju 361-763, Korea (e-mail: [email protected]). Y. Byun is with the Satellite Information Research Center, Korea Aerospace Research Institute, Daejeon 305-333, Korea (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TGRS.2013.2291001

Most image registration methods, particularly for high-resolution images, can be divided into four steps. First, features, which are the objects that correspond to distinctive and representative points, are extracted from each image. Next, each feature from one image is matched with the corresponding feature of the other image using a similarity measure; the corresponding feature pair is called a matching point. Then, a transformation model is developed with the matching points using a linear or nonlinear function. Finally, one of the images is registered to the other image. The matching points are typically extracted manually, requiring considerable human resources, cost, and time [2]–[4]. Thus, automatic image-to-image registration has received considerable attention.

Compared with low- and medium-resolution images, automatic image registration between high-resolution images is more difficult because of the large distortions in the images [5]. These distortions are caused primarily by the various off-nadir angles of the images. The sensors can observe the scene from different paths and angles, which generates images with significant nonrigid geometric distortions. Changes in height can cause severe relief displacement and local distortion of the image, particularly in urban areas [6], [7]. Thus, a feature-based method that can correct local distortions is recommended for remote-sensing images rather than an area-based method [8], [9]. It is also necessary to perform image registration using nonlinear functions to reduce the distortions in high-resolution images [5]. To apply such nonlinear functions, the matching-point pairs between images should be distributed evenly over the entire region of the image because the distortions in different regions of the image will differ [10]. Regions where points are not extracted could cause local registration errors if this step is not performed [11].

Given these characteristics of high-resolution images, many studies have suggested algorithms to extract matching points or methodologies appropriate for high-resolution images [11]–[14]. In particular, many studies have focused on extracting matching points using the scale-invariant feature transform (SIFT), a typical feature-based matching-point extraction method [15]. These studies have improved the SIFT algorithm and made it more effective [16]–[20]. However, many incorrect matches occur when these algorithms are applied to remote-sensing images [21]. Many studies have adopted and improved the SIFT algorithm to make it more appropriate for high-resolution remote-sensing images, for application to multisensor images [22]–[24] or to large scenes [8], [10], and mostly for increasing the registration accuracy [25]–[28].



Fig. 1. Example of geometric distortion caused by excessive extraction of matching points without considering height variations. (a) Extracted matching points. (b) Registration result after applying the nonrigid transformation model.

However, these studies have focused only on resolving particular problems in preexisting algorithms for specific sensors or study sites and have provided no general solutions. Furthermore, no approach has been proposed for removing matching points extracted from objects, such as buildings or shadows, that can change location according to the off-nadir angle or acquisition time of the image. Therefore, it is necessary to develop an effective automatic registration technique whose accuracy is less affected by the sensor type, the time at which the image was acquired, and the topographical features in the acquired image.

In our previous research on image registration for high-resolution satellite images, we focused on extracting a large number of matching points by improving the SIFT-based method and eliminating outliers using local properties of features [25]. However, our results presented several limitations when applied to the registration of high-resolution multisensor images. First, an excessive number of points in a specific region caused geometric errors in the form of distortions of linear features or areas. Fig. 1 illustrates an example of the geometric distortion caused by excessive points and height variation. The matching points were extracted along roads and the relatively flat adjacent regions [see Fig. 1(a)]. However, the points extracted from the roads and flat regions are at different heights, as the flat regions are approximately 1 m higher than the roads, as estimated by the digital elevation model. This height difference causes the result of the registration, obtained by applying piecewise linear functions, to have linear and shape distortions along the roads and flat regions [see Fig. 1(b)]. Thus, although a large number of matching points is essential for precise automatic registration, inaccuracies caused by height variations and uneven distribution can reduce the reliability of the transformation model. Therefore, the tradeoff between the number of matching points and the reliability of the registration's accuracy is an important issue for image-to-image registration between high-resolution images.

The second limitation of our previous results was that it was difficult to coregister multisensor images because of the diverse properties of the sensors or regions in the image. The terrain and shadows can change according to the off-nadir angle and observation time of the images, even when they are acquired from the same sensor. Moreover, each sensor has its own spatial and radiometric characteristics, so the diversity of properties becomes more pronounced in multisensor images.


As a result, the optimal values of the parameters used to match features can change depending on the sensors and sites. Thus, automatic image registration requires the selection of optimal parameters based on those properties.

The objective of this paper is to extract a suitable number of evenly distributed matching points, given the characteristics of the study site, to increase the accuracy of automatic image-to-image registration for high-resolution multisensor data. We extract features using the SIFT method with a proposed similarity measure that combines descriptor distance with spatial distance. The parameters used in our proposed measure are optimized using an objective function that considers both the distribution of matching points and the reliability of the transformation model. Matching points extracted from shadowed regions or objects with height variations, such as buildings or mountains, are removed using the orientation differences between the features extracted from each image. Then, the matching points are subjected to combined piecewise linear functions and an affine transformation for precise registration. The novelty of this approach in comparison with the previous research is as follows. First, the tradeoff between the number of matching points and the reliability of the registration's accuracy is controlled by the construction of the objective function. Second, we demonstrate that, by optimizing the values of the parameters, automatic registration is possible regardless of the properties of the sensors or sites. Finally, we demonstrate that evenly distributed matching points can be extracted using local properties of features.

II. METHODOLOGY

In the proposed method, the initial matching-point set is extracted by a SIFT-based feature descriptor and used to minimize the coarse location difference between features of the reference and sensed images. The local properties of features are considered to extract precise matching points. The values of the parameters used in the precise matching step are optimized using an objective function that considers the distribution of points and the reliability of the model. Matching points extracted from tall buildings or shadows are removed from the matching-point set using the orientation difference, and the remaining points are used to construct a transformation model that combines the affine and piecewise linear functions for registration accuracy. A flowchart of the proposed method is provided in Fig. 2. The process was programmed using MATLAB release 2012a.

A. Matching-Point Extraction Using Local Properties of Features

The matching points are extracted using the method proposed in our previous research [25]. The features for initial matching are extracted using the SIFT method. The SIFT method consists of three steps: feature extraction, feature description, and feature matching. The feature extraction step approximates the Laplacian with difference-of-Gaussian filters and detects local extrema in scale space as candidate features. At each candidate location, a detailed model is fitted to determine the exact location and scale.



Fig. 2. Flowchart of the proposed method.

The main orientation for each feature is identified based on its local image properties to achieve invariance to image rotation. Each feature is thus denoted by location, scale, and orientation in the feature extraction step. The feature description step of the SIFT method describes each feature based on a region of pixels in its neighborhood that is rotated to its orientation to achieve a rotation-invariant descriptor. The feature matching adopts a minimum Euclidean distance on the descriptor vector for each feature in the reference image to find the nearest neighbor in the sensed image as its corresponding feature. To ensure correct matching, the ratio of the distances to the closest and second closest neighbors should be less than a predefined threshold. Interested readers are referred to [15] for a more detailed explanation of the SIFT method.

Assume that $l$ feature points are extracted from the reference image and $m$ points are extracted from the sensed image. The feature point sets of each image can be expressed as $P = \{p_1, p_2, \cdots, p_l\}$ and $P' = \{p'_1, p'_2, \cdots, p'_m\}$, respectively. Each feature $p_i$ and $p'_i$ consists of a four-dimensional vector of x and y locations, orientation, and scale; these vectors are expressed as $p_i = (x_{p_i}, y_{p_i}, o_{p_i}, s_{p_i})$ and $p'_i = (x'_{p_i}, y'_{p_i}, o'_{p_i}, s'_{p_i})$. Any false-match pairs present in the matching points extracted using the SIFT method should be removed. The matching points are then used to estimate affine coefficients between the reference and sensed images. We calculate the root-mean-square errors (RMSEs) of all matching points and remove the pair with the largest RMSE. The remaining matching points are used to estimate the affine coefficients again, and this process is repeated until all of the RMSEs of the matching points are less than a predefined threshold. These remaining matching points are used to estimate the final affine coefficients for precise matching.

When $n$ points are matched, the matching-point sets $C = \{c_1, c_2, \cdots, c_n\}$ and $C' = \{c'_1, c'_2, \cdots, c'_n\}$ are also represented in each corresponding set as 4-D vectors. Assume that each feature point of the reference image is $c_i$ and the corresponding point in the sensed image is $c'_i$, where $c_i = (x_{c_i}, y_{c_i}, o_{c_i}, s_{c_i})$ and $c'_i = (x'_{c_i}, y'_{c_i}, o'_{c_i}, s'_{c_i})$. Then, the affine transformation $T_a$ is described by

$$\begin{bmatrix} x_{c_i} \\ y_{c_i} \end{bmatrix} = T_a \begin{bmatrix} x'_{c_i} \\ y'_{c_i} \end{bmatrix} \qquad (1)$$

Using the affine coefficients, the locations of the features in the sensed image are transformed to the coordinate system of the reference image. Therefore, each feature of the sensed image $p'_i = (x'_{p_i}, y'_{p_i}, o'_{p_i}, s'_{p_i})$ can be represented as $\tilde{p}_i$ for precise matching

$$\tilde{p}_i = \left(\tilde{x}_{p_i}, \tilde{y}_{p_i}, o'_{p_i}, s'_{p_i}\right) = \left(T_a(x'_{p_i}),\, T_a(y'_{p_i}),\, o'_{p_i},\, s'_{p_i}\right). \qquad (2)$$
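The initial matching and the iterative affine refinement can be summarized in a few lines of code. The following Python sketch (the paper's own implementation was written in MATLAB) assumes OpenCV's SIFT; the function names, thresholds, and grayscale inputs are illustrative rather than the authors' code.

```python
# Minimal sketch of Section II-A: SIFT matching with the ratio test,
# followed by iterative affine estimation that drops the worst pair until
# all residuals fall below a threshold. Assumes OpenCV and NumPy.
import cv2
import numpy as np

def initial_matches(ref_img, sensed_img, ratio=0.7):
    """SIFT features plus ratio-test matching (sensed -> reference)."""
    sift = cv2.SIFT_create()
    kp_ref, des_ref = sift.detectAndCompute(ref_img, None)
    kp_sen, des_sen = sift.detectAndCompute(sensed_img, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    ref_pts, sen_pts = [], []
    for pair in matcher.knnMatch(des_sen, des_ref, k=2):
        if len(pair) < 2:
            continue
        m, n = pair
        if m.distance < ratio * n.distance:  # ratio of closest to second closest
            ref_pts.append(kp_ref[m.trainIdx].pt)
            sen_pts.append(kp_sen[m.queryIdx].pt)
    return np.array(ref_pts), np.array(sen_pts)

def iterative_affine(ref_pts, sen_pts, rmse_threshold=10.0):
    """Estimate T_a, removing the pair with the largest residual until
    every residual is below the threshold (in pixels)."""
    while True:
        A = np.hstack([sen_pts, np.ones((len(sen_pts), 1))])
        M, *_ = np.linalg.lstsq(A, ref_pts, rcond=None)  # [x', y', 1] @ M ~ [x, y]
        residuals = np.linalg.norm(A @ M - ref_pts, axis=1)
        if residuals.max() <= rmse_threshold or len(ref_pts) <= 3:
            return M, ref_pts, sen_pts
        worst = residuals.argmax()
        ref_pts = np.delete(ref_pts, worst, axis=0)
        sen_pts = np.delete(sen_pts, worst, axis=0)
```

The returned coefficient matrix `M` plays the role of the affine transformation $T_a$ in (1) and (2).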

The difference in locations, termed the spatial distance, is calculated as the Euclidean distance between a feature in the sensed image, after it is transformed to the reference coordinate system, and the corresponding feature in the reference image. The spatial distance is then used for precise matching. A circular buffer with a specific radius is generated around each feature in the reference image. The features in the sensed image that fall within the buffer after being transformed to the reference coordinate system become the matching candidates for that reference feature. For a feature $p_i$ in the reference image and a feature $\tilde{p}_j$ of the sensed image that is within the buffer, the spatial distance $SD(p_i, \tilde{p}_j)$ is calculated as

$$SD(p_i, \tilde{p}_j) = \frac{\sqrt{(x_{p_i} - \tilde{x}_{p_j})^2 + (y_{p_i} - \tilde{y}_{p_j})^2}}{r} = \frac{\sqrt{\left(x_{p_i} - T_a(x'_{p_j})\right)^2 + \left(y_{p_i} - T_a(y'_{p_j})\right)^2}}{r} \qquad (3)$$

where $r$ is the radius of the circular buffer. All features of the sensed image within the buffer are candidates to be matching points for each feature of the reference image, so the range of the spatial distance is normalized to [0, 1]. We define the distance $D(p_i, \tilde{p}_j)$ of a corresponding pair $(p_i, \tilde{p}_j)$ for precise matching as

$$D(p_i, \tilde{p}_j) = ED(p_i, \tilde{p}_j) \times \left(1 + SD(p_i, \tilde{p}_j)\right) \qquad (4)$$

where $ED(p_i, \tilde{p}_j)$ is the Euclidean distance between the descriptor vectors, which was used in the initial matching by the SIFT method, and $SD(p_i, \tilde{p}_j)$ is the spatial distance between the two features in the sensed and reference images. When the locations of features $p_i$ and $\tilde{p}_j$ are identical, $SD(p_i, \tilde{p}_j)$ becomes zero, and only the Euclidean distance between the descriptor vectors is used for precise matching. If the distance between these two features equals the radius of the circular buffer, then $SD(p_i, \tilde{p}_j)$ becomes one. A feature pair $p_i$ and $\tilde{p}_j$ whose ratio of the closest to the second closest distances is less than the specified value becomes a matching point when the calculated distance $D(p_i, \tilde{p}_j)$ is smaller than any other distance calculated for the feature $p_i$ and another feature of the sensed image within the buffer, and smaller than the specified threshold $D$.
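To make the combined measure concrete, the following Python sketch implements the buffer-restricted matching of (3) and (4). It assumes the sensed features have already been transformed into the reference frame by $T_a$; the argument names and the acceptance of a lone in-buffer candidate are choices of the sketch, not of the paper.

```python
# Sketch of the precise matching of (3)-(4): candidates are sensed features
# (already in the reference frame) inside a circular buffer of radius r; the
# descriptor distance is weighted by the normalized spatial distance.
import numpy as np

def precise_match(ref_xy, ref_desc, sen_xy, sen_desc, r, dist_threshold, ratio=0.7):
    matches = []
    for i, (p, d) in enumerate(zip(ref_xy, ref_desc)):
        spatial = np.linalg.norm(sen_xy - p, axis=1)
        in_buf = np.where(spatial < r)[0]
        if len(in_buf) == 0:
            continue
        SD = spatial[in_buf] / r                             # eq. (3), in [0, 1]
        ED = np.linalg.norm(sen_desc[in_buf] - d, axis=1)    # descriptor distance
        D = ED * (1.0 + SD)                                  # eq. (4)
        order = np.argsort(D)
        # Ratio test against the second-closest in-buffer candidate
        if len(order) > 1 and D[order[0]] >= ratio * D[order[1]]:
            continue
        if D[order[0]] < dist_threshold:                     # threshold on D
            matches.append((i, in_buf[order[0]]))
    return matches
```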


Fig. 3. Main orientation change of features extracted from a building roof based on the off-nadir angle of the sensor.

B. Outlier Elimination

Image characteristics, such as the acquisition time and off-nadir observation angle, can change the shadows cast and the visible side of a building from image to image. This property can be particularly noticeable in high-resolution images. Thus, matching points extracted from cast-shadow regions or from objects with height variations, such as buildings and mountains, should be eliminated for more reliable registration accuracy. For this reason, we treat the matching points extracted from objects with height variations and shadows as outliers, as well as those with geometrically incorrect locations.

The points extracted within shadows or from building roofs have orientation differences that vary according to the image's characteristics when compared with the general orientation differences of other matching points. Fig. 3 presents the main orientations of features extracted from a building roof. Although the features are extracted from the same building, the main orientation of the feature changes because of the height of the building and the off-nadir angles of the sensors. Using this property, we eliminate outliers based on the orientation difference between matching points $c_i$ and $c'_i$, which can be expressed as $d_{o_i} = o_{c_i} - o'_{c_i}$. The average and standard deviation of the orientation differences between the matching points of the two images, $\bar{d}_o$ and $\sigma_o$, respectively, are calculated. A simple z-score test is then used to detect the outliers:

$$Z_{o_i} = \frac{d_{o_i} - \bar{d}_o}{\sigma_o}. \qquad (5)$$

When the absolute z-value is greater than the threshold of one sigma, the matching point is judged to be an outlier and eliminated from the matching-point sets.
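A minimal Python sketch of this test, assuming orientation differences that need no angle wrap-around handling:

```python
# Sketch of the z-score test of (5) on orientation differences.
import numpy as np

def orientation_inliers(ref_ori, sen_ori, z_threshold=1.0):
    d_o = ref_ori - sen_ori                  # d_oi = o_ci - o'_ci
    z = (d_o - d_o.mean()) / d_o.std()       # eq. (5)
    return np.abs(z) <= z_threshold          # boolean mask of retained pairs
```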

C. Construction of the Objective Function

As mentioned previously, it is difficult to identify and generalize the proper number and distribution of extracted matching points based on the properties of the study sites, such as
image size, land cover, and spatial resolution. An excessive number of matching points may not ensure a more accurate registration result because it can also increase the possibility of extracting inaccurate points. To overcome this problem, many studies have laid regular grids over the original images to extract evenly distributed matching points [5], [8], [10], [29]. The image is divided using a regular grid, and only one point or a certain proportion of the points from each area is selected as matching points. However, these processes are limited because they do not consider the regional properties of the study site. For example, some regions have complicated terrain, and large geometric distortions can occur if one or only a few points from the corresponding grid area are used for registration. In contrast, there is no need to extract matching points from each grid area to correct a flat region. In other words, although the points are evenly extracted, this method does not consider the properties of the site. Furthermore, the number of extracted points depends largely on the size of the grid.

We propose to extract a suitable number of evenly distributed matching points while also considering the properties of the image's regions by constructing an objective function. The objective function $O(D, r)$, which considers the precision of the matching points and their distribution, is composed of the proposed distance, $D$, and the radius of the buffer, $r$, as follows:

$$O(D, r) = \frac{RC(D, r)}{\sigma_{RC}} + \frac{DQ(D, r)}{\sigma_{DQ}} \qquad (6)$$

where $RC$ is a registration consistency that affects the reliability of the model [30], [31] and $DQ$ is a distribution quality that influences the distribution of the extracted points [32]. $\sigma_{RC}$ and $\sigma_{DQ}$ are the standard deviations of the registration consistency and distribution quality, respectively. The registration consistency and distribution quality do not have specified ranges of values, and these two indices have values of zero in the ideal case, when the two images are coregistered equally and the matching points are well distributed. The two indices are divided by their standard deviations to normalize their values in standard deviation units such that they can be compared [33]. The values of $D$ and $r$ are optimized when the objective function is minimized.

Registration Consistency: Registration consistency is used as a measure to evaluate the performance of our registration algorithm as the values of the buffer and distance are changed to find the optimal values. Defining $T_{A,B}$ as the transformation found using A as the sensed image and B as the reference image and $T_{B,A}$ as the reverse transformation, the registration consistency $RC(D, r)$ of $T_{A,B}$ and $T_{B,A}$ over A and B can be formulated as

$$RC(D, r) = \frac{1}{N_A} \sum_{(x, y) \in (I_A \cap I)} \left\| (x, y) - T_{B,A} \circ T_{A,B}(x, y) \right\| \qquad (7)$$

where $(x, y)$ are the coordinates of a pixel in the overlapped image, the composition $T_{B,A} \circ T_{A,B}$ represents the transformation that applies $T_{A,B}$ and then $T_{B,A}$, $I$ is the overlap region of images A and B, $I_A$ is the discrete domain of image A, and $N_A$ is the number of pixels of A within the overlap region. When the value of the registration consistency is small, the constructed model between the two images is reliable. The matching points for constructing $T_{A,B}$ and $T_{B,A}$ are affected by the parameters $D$ and $r$, so the registration consistency function is also affected by these parameters.
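In code, the consistency measure reduces to a round-trip displacement. The following Python sketch assumes the two estimated transformations are available as callables over coordinate arrays; the names are illustrative.

```python
# Sketch of registration consistency (7): the mean displacement remaining
# after mapping pixel coordinates forward with T_AB and back with T_BA.
import numpy as np

def registration_consistency(t_ab, t_ba, overlap_xy):
    """overlap_xy: (N, 2) coordinates of image A pixels inside the overlap I."""
    round_trip = t_ba(t_ab(overlap_xy))
    return np.linalg.norm(overlap_xy - round_trip, axis=1).mean()
```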


Distribution Quality: Distribution quality quantitatively describes how evenly the points are distributed over the image. To determine the distribution quality, triangulations are constructed using the Delaunay algorithm. The distribution quality consists of area and shape descriptors of the triangles: the area descriptor measures the dispersion or variation in the triangular areas of a distribution of matching points, and the shape descriptor measures the dispersion or variation in the triangular shapes. The area descriptor $D_A$ and shape descriptor $D_S$ are formulated as

$$D_A = \sqrt{\frac{\sum_{i=1}^{n} \left( \frac{A_i}{\bar{A}} - 1 \right)^2}{n - 1}}, \qquad \bar{A} = \frac{\sum_{i=1}^{n} A_i}{n} \qquad (8)$$

$$D_S = \sqrt{\frac{\sum_{i=1}^{n} (S_i - 1)^2}{n - 1}}, \qquad S_i = \frac{3 \times \mathrm{Max}(J_i)}{\pi} \qquad (9)$$

where $n$ is the number of triangles, $A_i$ is the area of triangle $i$, $\bar{A}$ is the mean triangle area, and $\mathrm{Max}(J_i)$ is the radian value of the largest internal angle of triangle $i$. Lower descriptor values indicate better point distribution. As the distribution and number of matching points depend on the values of the parameters $D$ and $r$, the area and shape descriptors are also functions of these parameters. Considering both descriptors, the distribution of points can be described by the distribution quality $DQ(D, r)$:

$$DQ(D, r) = D_A \times D_S = \frac{\sqrt{\sum_{i=1}^{n} \left( \frac{A_i}{\bar{A}} - 1 \right)^2 \times \sum_{i=1}^{n} (S_i - 1)^2}}{n - 1}. \qquad (10)$$
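Both descriptors can be computed directly from a Delaunay triangulation of the matching points, as in the following Python sketch (illustrative, not the paper's MATLAB code):

```python
# Sketch of the distribution quality (8)-(10) from a Delaunay triangulation.
import numpy as np
from scipy.spatial import Delaunay

def distribution_quality(points):
    tri = points[Delaunay(points).simplices]          # (n, 3, 2) triangle vertices
    u, v = tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0]
    areas = 0.5 * np.abs(u[:, 0] * v[:, 1] - u[:, 1] * v[:, 0])

    # Largest internal angle Max(J_i) per triangle via the law of cosines
    max_angles = np.empty(len(tri))
    for k, (p, q, s) in enumerate(tri):
        e0, e1, e2 = (np.linalg.norm(q - s), np.linalg.norm(p - s),
                      np.linalg.norm(p - q))          # sides opposite p, q, s
        cosines = ((e1**2 + e2**2 - e0**2) / (2 * e1 * e2),
                   (e0**2 + e2**2 - e1**2) / (2 * e0 * e2),
                   (e0**2 + e1**2 - e2**2) / (2 * e0 * e1))
        max_angles[k] = np.arccos(np.clip(min(cosines), -1.0, 1.0))

    n = len(areas)
    DA = np.sqrt(np.sum((areas / areas.mean() - 1.0) ** 2) / (n - 1))  # eq. (8)
    S = 3.0 * max_angles / np.pi                                       # eq. (9)
    DS = np.sqrt(np.sum((S - 1.0) ** 2) / (n - 1))
    return DA * DS                                                     # eq. (10)
```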

D. Parameter Optimization

To perform the proposed matching step and extract precise matching points, appropriate values for the proposed distance $D$ and buffer radius $r$ must be determined. A simplex optimization algorithm was used to minimize the objective function and optimize the parameter values [34]. This algorithm considers all of the parameters together and is a well-known, simple, and appropriate technique for seeking extrema without derivatives. The simplex method requires initial parameter values; in this paper, the initial values of the proposed distance and circular buffer radius were selected empirically. It is not essential to select highly accurate initial values because the optimized values of the two parameters can be estimated as long as the initial values are selected within a significant range. The initial value of the distance was chosen to be 0.7 in all cases, which is within the significant range examined in [25]. The circular buffer radius can have different effects on the calculation based on the difference in scale of the images, so we deduced the radius from the local properties of the features calculated in the SIFT procedure. Each point of the reference image $c_i$ and the corresponding point of the sensed image $c'_i$ have their own scales, $s_{c_i}$ and $s'_{c_i}$, respectively.

Fig. 4. Objective function optimization process to determine parameter values.

The average ratio of the scale between matching-point pairs, $\bar{d}_s$, is then calculated as

$$\bar{d}_s = \frac{\sum_{i=1}^{n} \left( s'_{c_i} / s_{c_i} \right)}{n} \qquad (11)$$

where $n$ is the number of matching points extracted using the SIFT method. The scale ratio indicates the spatial resolution difference between the two images if the matching points are extracted correctly. Therefore, the scale ratio was used to determine the initial radius of the circular buffer because the effect of the buffer can change based on spatial distance. In this paper, the initial buffer radius for the application of the algorithm was selected as $5 \times \bar{d}_s$, regardless of the properties of the sensors or study site. Selecting the initial parameter values without user intervention means that the automatic application of the algorithm to image registration between multisensor images becomes possible.

Another use of the scale ratio between matching points is in calculating the registration consistency, which is part of the objective function. Registration consistency should evaluate both the forward and inverse transformation models. If there is a scale difference between the images, the effect of a given value of the circular buffer radius on the forward and inverse transformation models will differ. For example, if $\bar{d}_s$ is calculated, the initial buffer radius for the forward transformation model is $5 \times \bar{d}_s$, whereas the radius for the inverse transformation model is still 5. Using this procedure, the spatial effect between the forward and inverse transformation models can be controlled regardless of the spatial resolution of each image. Fig. 4 illustrates the objective function optimization process for determining the values of $D$ and $r$.
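A sketch of the optimization loop, using the Nelder-Mead implementation in SciPy [34]: `evaluate_rc` and `evaluate_dq` stand for the registration-consistency and distribution-quality computations above and are assumptions of this sketch, as is the use of running standard deviations to normalize the two terms of (6), which the paper does not spell out.

```python
# Sketch of the parameter optimization with the Nelder-Mead simplex method.
import numpy as np
from scipy.optimize import minimize

def optimize_parameters(evaluate_rc, evaluate_dq, mean_scale_ratio):
    rc_hist, dq_hist = [], []

    def objective(params):
        D, r = params
        rc, dq = evaluate_rc(D, r), evaluate_dq(D, r)
        rc_hist.append(rc)
        dq_hist.append(dq)
        s_rc = np.std(rc_hist) or 1.0   # guard the first evaluations
        s_dq = np.std(dq_hist) or 1.0
        return rc / s_rc + dq / s_dq    # eq. (6)

    x0 = np.array([0.7, 5.0 * mean_scale_ratio])   # initial D and radius r
    result = minimize(objective, x0, method='Nelder-Mead')
    return result.x                                 # optimized (D, r)
```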


Fig. 5. Study site: Suburban region in Daejeon, South Korea. (a) QuickBird-2. (b) IKONOS-2. (c) KOMPSAT-2.

E. Global and Local Transformations

The extraction of matching points by precise matching according to the proposed similarity measure of distance, and the subsequent removal of outliers based on the circular buffer and orientation difference, together feed a transformation model that is a combination of an affine transformation and piecewise linear functions. The piecewise linear functions, which are suitable when the images have local geometric differences because a matching point affects registration only in its immediate neighborhood, are employed in the mapping function to reduce local geometric distortions caused by terrain in the high-resolution image [35], [36]. This method divides the images into triangular elements using Delaunay triangulation, and an affine transformation is used to map each region in the sensed image to the corresponding region in the reference image. However, with the piecewise linear functions, precise registration accuracy is achieved only within the convex hull of the points from which the triangles are obtained [5]. Thus, we applied a global affine transformation, estimated from the points defining the convex hull, to the areas outside the convex hull. A detailed explanation of this transformation method is provided in [25].
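A Python sketch of this combined mapping follows. SciPy's LinearNDInterpolator is itself piecewise linear over a Delaunay triangulation, so it realizes the local model inside the convex hull; fitting the global affine to all matching points, rather than only to the hull points the paper uses, is a simplification of this sketch.

```python
# Sketch of the combined global/local transformation: piecewise linear
# inside the convex hull of the matching points, global affine outside.
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def build_mapping(ref_pts, sen_pts):
    """Map reference coordinates to sensed coordinates, the usual direction
    for resampling the sensed image onto the reference grid."""
    local = LinearNDInterpolator(ref_pts, sen_pts)   # NaN outside the hull
    A = np.hstack([ref_pts, np.ones((len(ref_pts), 1))])
    M, *_ = np.linalg.lstsq(A, sen_pts, rcond=None)  # global affine fallback

    def mapping(xy):
        out = local(xy)
        outside = np.isnan(out[:, 0])
        if outside.any():
            out[outside] = np.hstack([xy[outside],
                                      np.ones((outside.sum(), 1))]) @ M
        return out
    return mapping
```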

III. RESULTS AND DISCUSSION

A. Study Site

The study site for evaluating the proposed method is in Daejeon, South Korea (see Fig. 5). It is an urban area that consists mainly of buildings, roads, and low mountains. Three types of high-resolution satellite sensors, QuickBird-2, IKONOS-2, and KOMPSAT-2, were employed; the three sensors cover the same region of Daejeon. The QuickBird-2 image has a 0.6-m spatial resolution, whereas the IKONOS-2 and KOMPSAT-2 images have a 1.0-m spatial resolution. The off-nadir angles of the QuickBird-2, IKONOS-2, and KOMPSAT-2 images are 13.50°, 29.68°, and 1.66°, respectively. These varied off-nadir angles allow the algorithm to be evaluated on images with different oblique viewing geometries. The size of each data set used in the case studies is 1000 × 1000 pixels. Employing local or global affine transformations may not suffice for registration scenarios involving full satellite scenes, for which sensor model correction needs to be applied [7]. However, registering part of the entire scene, which is 1000 × 1000 pixels at this site, is sufficient for the application of the transformations. The sensor specifications are summarized in Table I.

TABLE I SENSOR SPECIFICATIONS

We experimented with three cases formed by combining the three sensors. The reference and sensed images of case 1 are QuickBird-2 and IKONOS-2, respectively. Similarly, the reference and sensed images of case 2 are QuickBird-2 and KOMPSAT-2, and those of case 3 are IKONOS-2 and KOMPSAT-2, respectively. By examining the sensors in these different combinations, we can evaluate various conditions, such as the spatial resolution, temporal properties, and off-nadir angle. To help extract matching points, histogram equalization was applied in all cases to enhance the contrast of the images, which were then transformed to an 8-bit radiometric resolution as a preprocessing step.

B. Results of Parameter Optimization

The extraction of the initial matching-point pairs using the SIFT method and the elimination of false-match pairs through geometric error should be performed before the proposed algorithm is applied. Only the remaining points that have an RMSE below the predefined threshold are used to estimate the affine coefficients between the two images, and the features of the sensed image are then transformed to the reference coordinate system using an affine transformation. The transformed location relationship, scale ratio, and orientation difference between the features from the sensed and reference images are then used to find precise matching points.


TABLE II OPTIMIZED PARAMETER VALUES FOR THE LOCAL MATCHING DISTANCE AND CIRCULAR BUFFER RADIUS

The ratio of the first to the second distance for the extraction of the initial matching points was set at 0.7, and the RMSE threshold was set at 10 pixels for all cases. The scale ratio $\bar{d}_s$ between matching-point pairs from the sensed and reference images was 1.846 for case 1, 1.890 for case 2, and 1.060 for case 3. These ratios are similar to the spatial resolution ratios of each case, demonstrating that the algorithm can determine initial parameter values without detailed spatial information about the images. The determined initial parameter values were supplied to the objective function $O(D, r)$, which combines registration consistency with distribution quality, to be minimized using the simplex optimization method. The optimized values of the two parameters are shown in Table II. Matching points were extracted, and image registration was completed, using these parameters.

C. Results of Matching-Point Extraction and Correct-Match Rates

By employing the optimized parameters, 169, 344, and 97 matching points were extracted in cases 1, 2, and 3, respectively. The matching points found by the SIFT method were also extracted for comparison, with the ratio of the distances to the closest and second closest points set at 0.7, equal to the value used in the proposed method. The SIFT method extracted 75, 91, and 34 matching points for cases 1, 2, and 3, respectively. These results demonstrate that the proposed method was able to extract more matches than the SIFT method. The original SIFT algorithm detects matching points using the ratio of the distances from all extracted features of the sensed image to their closest and second closest neighbors. However, the proposed method applies the same ratio of distances only to those features within the circular buffer of each feature. This technique increases the possibility of finding matches because the possible corresponding features are restricted to those within the buffer. In other words, a feature can become a matching point when no more than one similar feature from the sensed image lies within the buffer of the reference feature. This step helps to extract more matches than the original SIFT method, even though outlier elimination is included in our algorithm.

Table III presents the extracted matching points and correct-match rates, which represent the number of points that were extracted correctly compared with the total number of matching points obtained by the SIFT and proposed methods.

TABLE III COMPARISON OF CORRECT-MATCH RATES FOR THE SIFT AND PROPOSED METHODS

Points extracted from building roofs and low mountains with large heights or height variations, as well as those extracted from different geometric locations, were considered false matches. Table III illustrates that the proposed method not only extracts more matching points than the SIFT method but also achieves higher correct-match rates in all cases. When we compare the sensors, we see a tendency toward lower correct-match rates when the KOMPSAT-2 image was included in the analysis. This result can be explained by the fact that the KOMPSAT-2 image has a small off-nadir angle of 1.66°, so building orientation differences are only slightly affected. Furthermore, images with larger off-nadir angles, and larger differences between those angles, have a greater effect on the orientation differences between matching points extracted from objects with height variations that cause relief displacement. This greater sensitivity to orientation differences is demonstrated by the results for case 2, in which there is a small difference between the off-nadir angles of the QuickBird-2 and KOMPSAT-2 images; this case has the lowest correct-match rate.

Fig. 6 presents the matching points extracted by the proposed method, overlaid on the reference image for each case. The points were generally well distributed over the entire image, except for points at mountains and the roofs of tall buildings. False matches mainly occurred for points extracted from the roofs of low buildings. However, false matches caused by differences in geometric location were effectively eliminated by the use of the circular buffer.

D. Registration Results and Accuracy Assessment

The triangulation is constructed from the extracted matching points to generate the transformation model. Fig. 7 presents the constructed triangulation results for all cases. In each triangulated region, the sensed image is transformed to the reference image's corresponding triangulated region. For the region outside the convex hull of the points, the affine transformation coefficients that were estimated using the matching points on the convex hull were used for the transformation model. Fig. 8 presents the mosaic images generated by the combination of piecewise linear functions and an affine transformation. Bilinear interpolation was used for resampling the sensed images in all cases.

Twenty checkpoint pairs, evenly distributed across the image, were manually selected for each case from the results of both the SIFT and proposed methods to verify registration accuracy. The image registration was performed by the affine transformation and the global/local transformation, respectively.


Fig. 6. Matching points extracted using the proposed method. (a) Case 1. (b) Case 2. (c) Case 3.

Fig. 7. Triangular construction on the reference images. (a) Case 1. (b) Case 2. (c) Case 3.

Fig. 8. Mosaic results of the proposed method. (a) Case 1. (b) Case 2. (c) Case 3.

From this evaluation, we can make two types of comparisons for each case. First, which method, SIFT or the proposed one, extracts matching points more effectively? Second, which transformation, the affine or the global/local one, generates a more reliable registration?


The bias and standard deviation of the x and y coordinates of the checkpoints are presented in Table IV, and the RMSEs for cases 1, 2, and 3 are presented in Table V. Comparing the transformation models, the global/local transformation performs better than the affine transformation, except in case 2 when performed by the SIFT method.


TABLE IV BIAS AND STANDARD DEVIATION OF THE CHECKPOINTS

TABLE V REGISTRATION ACCURACY

Some points extracted by the SIFT method in case 2 tended to aggregate in specific regions and to be extracted from buildings. For the piecewise linear functions, all matching points were used for a unique solution, not a least squares solution. Thus, if some points are subject to geometric error, the piecewise linear functions will cause registration error in the regions near those points. This potential error is why nonlinear functions do not always produce better registration results than linear functions; the outcome depends on the precision and distribution of the extracted matching points. In comparing the matching points extracted by the proposed and SIFT methods, the proposed method has higher registration accuracy in all cases regardless of the transformation model. All cases that used the proposed method and the global/local transformation also had lower RMSEs. Cases 1 and 3 had lower RMSEs than case 2 because the matching points extracted from objects with height variations or shadows were effectively removed, and the remaining matching points were used to construct a robust transformation model. Case 2, which presented a higher registration error due to its lower correct-match rate, still demonstrated a reliable RMSE value of less than 1.5 m.

IV. CONCLUSION

In this paper, we have proposed an automatic image-to-image registration method for high-resolution multisensor images in urban areas using the geometric locations and local properties of the features extracted by the SIFT method. The algorithm can be applied regardless of the sensor properties because the parameters are automatically optimized based on those properties. Suitable numbers of evenly distributed matching points, based on the characteristics of the study site, were extracted using an objective function composed of registration consistency and distribution quality. The proposed method achieved a higher correct-match rate than the SIFT method, and the registration results exhibited acceptable accuracy levels, with RMSEs of less than 1.5 m in all cases.

Generally, nonlinear functions are more important for airborne imagery than for satellite imagery because distortions are more severe at the shorter sensor-to-scene distances of airborne acquisitions. The proposed method is nevertheless appropriate for high-resolution multisensor satellite images, even though it uses nonlinear functions, because relief displacement from objects with height variations is dominant in such images. Results from the method may not be highly accurate if the study site is not composed of urban areas that include tall buildings or other objects with height variations. To evaluate the reliability of the proposed method, we will apply the algorithm to various study sites having near-nadir imagery, a large size, or nonurban areas. The issues involved in reducing the computation time and improving the efficiency will also be considered in our future work.

REFERENCES

[1] B. Zitová and J. Flusser, "Image registration methods: A survey," Image Vis. Comput., vol. 21, no. 11, pp. 977–1000, Oct. 2003.
[2] Y. Bentoutou, N. Taleb, K. Kpalma, and J. Ronsin, "An automatic image registration for applications in remote sensing," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 9, pp. 2127–2137, Sep. 2005.
[3] R. A. Schowengerdt, Remote Sensing: Models and Methods for Image Processing. San Diego, CA, USA: Academic, 2006, ch. 3, p. 291.
[4] R. E. Kennedy and W. B. Cohen, "Automated designation of tie-points for image-to-image coregistration," Int. J. Remote Sens., vol. 24, no. 17, pp. 3467–3490, Sep. 2003.
[5] V. Arévalo and J. González, "An experimental evaluation of non-rigid registration techniques on QuickBird satellite imagery," Int. J. Remote Sens., vol. 29, no. 2, pp. 513–527, Jan. 2008.
[6] S. Suri and P. Reinartz, "Mutual-information-based registration of TerraSAR-X and Ikonos imagery in urban areas," IEEE Trans. Geosci. Remote Sens., vol. 48, no. 2, pp. 939–949, Feb. 2010.
[7] P. Reinartz, R. Muller, P. Schwind, S. Suri, and R. Bamler, "Orthorectification of VHR optical satellite data exploiting the geometric accuracy of TerraSAR-X data," ISPRS J. Photogramm. Remote Sens., vol. 66, no. 1, pp. 124–132, Jan. 2011.
[8] C. Huo, C. Pan, L. Huo, and Z. Zhou, "Multilevel SIFT matching for large-size VHR image registration," IEEE Geosci. Remote Sens. Lett., vol. 9, no. 2, pp. 171–175, Mar. 2012.
[9] D. Capel and A. Zisserman, "Computer vision applied to super resolution," IEEE Signal Process. Mag., vol. 20, no. 3, pp. 75–86, May 2003.
[10] L. Wang, Z. Niu, C. Wu, R. Xie, and H. Huang, "A robust multisource image automatic registration system based on the SIFT descriptor," Int. J. Remote Sens., vol. 33, no. 12, pp. 3850–3869, Jun. 2012.
[11] G. Hong and Y. Zhang, "Wavelet-based image registration technique for high-resolution remote sensing images," Comput. Geosci., vol. 34, no. 12, pp. 1708–1720, Dec. 2008.
[12] V. Arévalo and J. González, "Improving piecewise linear registration of high-resolution satellite images through mesh optimization," IEEE Trans. Geosci. Remote Sens., vol. 46, no. 11, pp. 3792–3803, Nov. 2008.
[13] L. Yu, D. Zhang, and E. J. Holden, "A fast and fully automatic registration approach based on point features for multi-source remote-sensing images," Comput. Geosci., vol. 34, no. 7, pp. 838–848, Jul. 2008.
[14] Z. Xiong and Y. Zhang, "A novel interest-point-matching algorithm for high-resolution satellite images," IEEE Trans. Geosci. Remote Sens., vol. 47, no. 12, pp. 4189–4200, Dec. 2009.
[15] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, Nov. 2004.
[16] Y. Ke and R. Sukthankar, "PCA-SIFT: A more distinctive representation for local image descriptors," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Washington, DC, USA, Jun. 2004, pp. 506–513.
[17] K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 10, pp. 1615–1630, Oct. 2005.
[18] A. E. Abdel-Hakim and A. A. Farag, "CSIFT: A SIFT descriptor with color invariant characteristics," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2006, vol. 2, pp. 1978–1983.
[19] R. Lemuz-López and M. Arias-Estrada, "Iterative closest SIFT formulation for robust feature matching," in Proc. Adv. Visual Comput., LNCS, 2006, vol. 4292, pp. 502–513.
[20] J. Morel and G. Yu, "ASIFT: A new framework for fully affine invariant image comparison," SIAM J. Imaging Sci., vol. 2, no. 2, pp. 438–469, Apr. 2009.
[21] Q. Li, G. Wang, J. Liu, and S. Chen, "Robust scale-invariant feature matching for remote sensing image registration," IEEE Geosci. Remote Sens. Lett., vol. 6, no. 2, pp. 287–291, Apr. 2009.
[22] S. Suri, P. Schwind, P. Reinartz, and J. Uhl, "Combining mutual information and scale invariant feature transform for fast and robust multisensor SAR image registration," in Proc. 75th Annu. ASPRS Conf., Baltimore, MD, USA, Mar. 8–13, 2009, pp. 795–806, CD-ROM.
[23] P. Schwind, S. Suri, P. Reinartz, and A. Siebert, "Applicability of the SIFT operator to geometric SAR image registration," Int. J. Remote Sens., vol. 31, no. 8, pp. 1959–1980, Mar. 2010.
[24] B. Fan, C. Huo, C. Pan, and Q. Kong, "Registration of optical and SAR satellite images by exploring the spatial relationship of the improved SIFT," IEEE Geosci. Remote Sens. Lett., vol. 10, no. 4, pp. 657–661, Jul. 2013.
[25] Y. Han, Y. Byun, J. Choi, D. Han, and Y. Kim, "Automatic registration of high-resolution images using local properties of features," Photogramm. Eng. Remote Sens., vol. 78, no. 3, pp. 211–221, Mar. 2012.
[26] H. Gonçalves, L. Corte-Real, and A. Gonçalves, "Automatic image registration through image segmentation and SIFT," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 7, pp. 2589–2600, Jul. 2011.
[27] A. Sedaghat, M. Mokhtarzade, and H. Ebadi, "Uniform robust scale-invariant feature matching for optical remote sensing images," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 11, pp. 4516–4527, Nov. 2011.
[28] Y. Huachao, Z. Shubi, and W. Yongbo, "Robust and precise registration of oblique images based on scale-invariant feature transformation algorithm," IEEE Geosci. Remote Sens. Lett., vol. 9, no. 4, pp. 783–787, Jul. 2013.
[29] D. Liu, P. Gong, M. Kelly, and Q. Guo, "Automatic registration of airborne images with complex local distortion," Photogramm. Eng. Remote Sens., vol. 72, no. 9, pp. 1049–1059, Sep. 2006.
[30] M. Holden, D. Hill, E. Denton, J. Jarosz, T. Cox, T. Rohlfing, J. Goodey, and D. Hawkes, "Voxel similarity measures for 3-D serial MR brain image registration," IEEE Trans. Med. Imaging, vol. 19, no. 2, pp. 94–102, Feb. 2000.
[31] H.-M. Chen, M. K. Arora, and P. K. Varshney, "Mutual information-based registration for remote sensing data," Int. J. Remote Sens., vol. 24, no. 18, pp. 3701–3706, Sep. 2003.
[32] Q. Zhu, B. Wu, and Z. Xu, "Seed point selection method for triangle constrained image matching propagation," IEEE Geosci. Remote Sens. Lett., vol. 3, no. 2, pp. 207–211, Apr. 2006.
[33] G. Yang, C. V. Stewart, M. Sofka, and C.-L. Tsai, "Registration of challenging image pairs: Initialization, estimation, and decision," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 11, pp. 1973–1989, Nov. 2007.
[34] J. A. Nelder and R. Mead, "A simplex method for function minimization," Comput. J., vol. 7, no. 4, pp. 308–313, Jan. 1965.
[35] A. Goshtasby, "Piecewise linear mapping functions for image registration," Pattern Recognit., vol. 19, no. 6, pp. 459–466, Feb. 1986.
[36] L. Zagorchev and A. Goshtasby, "A comparative study of transformation functions for nonrigid image registration," IEEE Trans. Image Process., vol. 15, no. 3, pp. 529–538, Mar. 2006.


Youkyung Han received the B.S. degree in civil, urban, and geosystem engineering and the M.S. and Ph.D. degrees in civil and environmental engineering from Seoul National University, Seoul, Korea, in 2007, 2009, and 2013, respectively. His major research interests include the image processing of remote-sensing data, image segmentation, image classification, and image registration between high-resolution multisensor data. He has recently developed further interest in various applications that use remote sensing and geographic information system data, such as 3-D model generation, the creation and updating of thematic maps, and disaster monitoring.

Jaewan Choi received the B.S. and M.S. degrees in civil, urban, and geosystem engineering and the Ph.D. degree in civil and environmental engineering from Seoul National University, Seoul, Korea, in 2004, 2006, and 2011, respectively. Since 2011, he has been an Assistant Professor with the School of Civil Engineering, Chungbuk National University, Cheongju, Korea. At present, he is an Editorial Board Member of the Journal of the Korean Society for Geospatial Information Systems. His major research interests are related to image fusion algorithms, including pansharpening, multisensor image fusion and data integration, airborne/satellite hyperspectral image processing, and change detection methods using very high resolution imagery.

Younggi Byun received the M.S. and Ph.D. degrees in civil and environmental engineering from Seoul National University, Seoul, Korea, in 2004 and 2011, respectively. He is currently a Senior Researcher in the Satellite Information Research Center at the Korea Aerospace Research Institute, Daejeon, Korea. His major research interests include the image processing of remote-sensing data, image segmentation, change detection, and multisensor image fusion.

Yongil Kim (M’06) received the B.S. degree in urban engineering and the M.S. and Ph.D. degrees in remote sensing from Seoul National University, Seoul, Korea, in 1986, 1988, and 1991, respectively. He joined Seoul National University in 1993, where he is currently a Professor with the Department of Civil and Environmental Engineering. During the past 20 years, he has been a Project Leader for several large projects, such as the standardization of digital road map databases and the development of feature extraction algorithms for remote sensing. His major research interests include remote sensing, global positioning systems, and geographic information systems. Dr. Kim is currently a member of the Surveying Committee of the National Geographic Institute and is also the Director and an Editor of the Journal of the Korean Society for Geo-Spatial Information System, the Journal of the Korean Society of Surveying, Geodesy, Photogrammetry, and Cartography, and the Journal of the Korean Society of Remote Sensing.