Information Sciences 180 (2010) 3434–3443
Covariance intersection based image fusion technique with application to pansharpening in remote sensing

Qing Guo a,b, Siyue Chen b, Henry Leung b, Shutian Liu a,*

a Harbin Institute of Technology, Department of Physics, Harbin 150001, PR China
b Department of Electrical and Computer Engineering, University of Calgary, Calgary, Alberta, Canada T2N 1N4
Article history: Received 14 November 2009; received in revised form 29 April 2010; accepted 6 May 2010.
Keywords: Covariance intersection; Image fusion; Expectation maximization; Multi-spectral image; Panchromatic image; Pansharpening
Abstract

Image fusion of multi-spectral images and panchromatic images has been widely applied to imaging sensors. Multi-spectral images are rich in spectral information, whereas panchromatic images have relatively higher spatial resolution. In this paper, we consider image fusion as an estimation problem, that is, to estimate the ideal scene of the multi-spectral image at the resolution of the panchromatic image. We propose a method combining the covariance intersection (CI) principle with the expectation maximization (EM) algorithm to develop a novel image fusion approach. In contrast to other fusion methods, the proposed scheme takes cross-correlation among the data sources into account, and thus provides consistent and accurate estimates through convex combinations. Since the covariance information is usually unknown in practice, the EM method is employed to provide a maximum likelihood estimate (MLE) of the covariance matrix. Real multi-spectral and panchromatic images are used to evaluate the effectiveness of the proposed EM–CI method. The proposed algorithm is found to preserve both the spectral information of the multi-spectral image and the high spatial resolution information of the panchromatic image more effectively than conventional image fusion techniques.

© 2010 Elsevier Inc. All rights reserved.
1. Introduction

With the rapid growth of the internet and other electronic sources of information, the problem of coherently merging information from multiple sources has become an important issue [21,30,31]. As one type of information processing technique, image fusion is widely used in many practical applications. Images captured by multiple sensors often contain complementary and redundant information. An effective fusion process results in an improved image, which contains more complete information and is more suitable for human visual perception and object recognition [2,20].

In this study, we consider the fusion of multi-spectral (MS) images and panchromatic (Pan) images, which are widely used in remote sensing. This fusion scheme is often referred to as pansharpening [13]. MS images, such as those acquired by the IKONOS and QuickBird satellites, usually have four bands of spectral information (i.e., red (R), green (G), blue (B), and near infrared (NIR)), while the Pan image has good spatial resolution. The goal of fusing these images is therefore to generate a high-resolution MS image, which can be used more effectively for applications such as land classification and road detection.

Image fusion can be classified as pixel-level, feature-level, and symbol-level [3]. Fusion of MS and Pan images at the pixel level is traditionally handled by the intensity-hue-saturation (IHS) transform methods [4,6], the Brovey transform methods [7], the principal component analysis (PCA) methods [6,5,24], the highpass filtering (HPF) method [25], and the wavelet
* Corresponding author. Tel./fax: +86 451 86418042. E-mail address: [email protected] (S. Liu).
doi:10.1016/j.ins.2010.05.010. © 2010 Elsevier Inc. All rights reserved.
transform (WT) method [9,14,23,33]. The IHS transform method first transforms three MS bands from the RGB color space into the spatial (I) and spectral (H, S) information space. Fusion is then performed by replacing the intensity component in the IHS space with the Pan image. The IHS method preserves the spatial resolution of the Pan image; however, according to Prasad et al. [22], it severely distorts the spectral information of the MS image. The method based on the Brovey transform is a simple color-normalization method, and it also usually introduces distortion into the spectral information. The same spectral distortion problem is found in the PCA method, because fusion is performed by replacing the first principal component of the MS image with the Pan image, which results in a significant loss of spectral information. The HPF method simply adds the high-frequency components of the Pan image to the MS image; however, choosing a proper filter size is difficult in this method. The fusion method based on WT is also popular. Zhou et al. [33] employ WT to fuse Landsat TM and SPOT Pan images. The SPOT image and each spectral band of the TM image are decomposed into an orthogonal wavelet representation at a given resolution, which consists of an approximation image and a set of spatially-oriented detail images. Band by band, the approximation image from each TM band is combined with the detail images from SPOT, and the inverse WT is then performed to obtain the fused image. The spectral and spatial quality of the fused image based on this method is better than that based on the IHS transform, PCA, or the Brovey transform [33].

In this paper, our motivation is to generate a high-resolution MS image with more information and better quality than those provided by any individual sensor image alone. The image fusion problem is considered as a statistical estimation problem. More specifically, the objective is to estimate the ideal high-resolution MS image from the observed MS image and the observed Pan image. To this end, covariance intersection (CI), which has been widely used in radar track fusion [28], is employed. CI can produce consistent estimates for any degree of cross-correlation between the input sources through convex combination [12,18]. In addition, CI enforces estimation consistency by means of a convex combination of the inverses of the covariance matrices. When CI is applied to image fusion, it treats the ideal high-resolution MS image as a linear combination of the ideal MS and Pan images, and the fusion coefficients are selected through convex combination with optimized weights. By contrast, many image fusion methods, such as the conventional WT methods, choose the fused coefficients simply by selecting the larger coefficient or by replacement [32,19,15], without any rigorous procedure.

To use the convex combination in CI, the covariance of the estimation error is required, but for most imaging applications this information is not available. Here, we use the expectation maximization (EM) algorithm to give a maximum likelihood estimate (MLE) of the covariance information. Formalized by Dempster et al. [8], the EM algorithm is an iterative procedure that estimates both the parameters and the missing or unobservable data during each iteration. The approach first computes an approximation to the expectation of the log-likelihood function of the complete data conditioned on the current parameter estimate, which is referred to as the expectation step (E-step). In this step, the current estimate of the incomplete data is calculated.
Next, a new parameter estimate is computed by finding the value of the parameter that maximizes the function found in the E-step. This is called the maximization step (M-step). In this study, the ideal MS and Pan images are treated as the missing data, and their covariances are treated as the parameters to be estimated by EM.

The remainder of this paper is organized as follows. The problem of fusing the MS image and the Pan image is formulated in Section 2. This formulation justifies the capability of CI to give a consistent and unbiased estimate for any degree of cross-correlation, which can be used to optimally fuse the images. In Section 3, the EM algorithm is developed to support CI, and a fast suboptimal linear weighting scheme that lets CI give a consistent estimate through convex combination is presented. Performance evaluation of the EM–CI method using real image sensory data is carried out in Section 4, showing that the EM–CI method preserves the spectral information very well. Section 5 concludes this paper.

2. Problem formulation

The observed MS image and the observed Pan image are denoted as $\tilde{x}_1$ and $\tilde{x}_2$, respectively, which are vectors composed of pixel values. To simplify the derivation, one band of the MS image, denoted $\tilde{x}_1$, is used as the running example; the other bands have the same expressions. Due to sensor distortion and noise in the capture process of the MS and Pan images, the observed images can be modeled as "linear observation plus Gaussian noise" [10,16]. That is,
$$\tilde{x}_i = H_i x_i + n_i, \quad i = 1, 2, \qquad (1)$$
where $x_i$ denotes the vector of pixel values from an ideal image, and $H_i$ represents the linear observation operator, which is a known observation matrix. Typical examples of the linear observation operator $H_i$ include optical blur, motion blur, and tomographic projections. $n_i$ is assumed to be zero-mean white Gaussian noise with covariance $\hat{P}_{x_i x_i}$, which is defined in the next paragraph.
From (1), it is noted that neither the ideal image $x_i$ nor its statistics are available. However, the consistent estimate $\hat{x}_i$ can be obtained. That is, if we define
$$\hat{P}_{x_i x_i} = (\tilde{x}_i - \hat{x}_i)(\tilde{x}_i - \hat{x}_i)^T, \quad i = 1, 2, \qquad (2)$$
and
$$P_{x_i x_i} = E[(x_i - \hat{x}_i)(x_i - \hat{x}_i)^T], \quad i = 1, 2, \qquad (3)$$
we have [11]
$$\hat{P}_{x_i x_i} \geq P_{x_i x_i}, \quad i = 1, 2. \qquad (4)$$
The inequality is in the sense of matrix positive definiteness, i.e., $A > B$ if and only if $A - B$ is positive definite. It should be noted that the matrix $\hat{P}_{x_i x_i}$ represents the estimation error covariance, which is considered to be equivalent to the covariance matrix of the noise $n_i$. We further define the cross-correlation as
$$P_{x_1 x_2} = E[(x_1 - \hat{x}_1)(x_2 - \hat{x}_2)^T]. \qquad (5)$$
Our objective is to construct a linear, unbiased estimate $\hat{x}$ that combines $\hat{x}_1$ and $\hat{x}_2$, i.e.,
$$\hat{x} = M_1 \hat{x}_1 + M_2 \hat{x}_2, \qquad (6)$$
so that
$$E[x - \hat{x}] = 0, \qquad (7)$$
provided that
$$M_1 + M_2 = I. \qquad (8)$$
In addition, we want to determine a consistent estimate $\hat{P}_{xx}$ for $P_{xx}$ (where $P_{xx} = E[(x - \hat{x})(x - \hat{x})^T]$), and to find a pair of $M_1$ and $M_2$ such that the upper bound $\hat{P}_{xx}$ is optimal in the sense of minimal trace.
According to the definition of $P_{xx}$, we have
$$P_{xx} = M_1 P_{x_1 x_1} M_1^T + M_1 P_{x_1 x_2} M_2^T + M_2 P_{x_2 x_1} M_1^T + M_2 P_{x_2 x_2} M_2^T. \qquad (9)$$
Similarly,
$$\hat{P}_{xx} = M_1 \hat{P}_{x_1 x_1} M_1^T + M_1 \hat{P}_{x_1 x_2} M_2^T + M_2 \hat{P}_{x_2 x_1} M_1^T + M_2 \hat{P}_{x_2 x_2} M_2^T. \qquad (10)$$
If $P_{x_1 x_2} = 0$, then for any given $M_1$ and $M_2$, the estimate
$$\hat{P}_{xx} = M_1 \hat{P}_{x_1 x_1} M_1^T + M_2 \hat{P}_{x_2 x_2} M_2^T \qquad (11)$$
will be consistent ($\hat{P}_{xx} \geq P_{xx}$) as a direct consequence of (4). The trace of the above $\hat{P}_{xx}$ is then minimized by
$$\hat{P}_{xx} = (\hat{P}_{x_1 x_1}^{-1} + \hat{P}_{x_2 x_2}^{-1})^{-1},$$
$$M_1 = \hat{P}_{xx} \hat{P}_{x_1 x_1}^{-1} = \hat{P}_{x_2 x_2}(\hat{P}_{x_1 x_1} + \hat{P}_{x_2 x_2})^{-1}, \qquad (12)$$
$$M_2 = \hat{P}_{xx} \hat{P}_{x_2 x_2}^{-1} = \hat{P}_{x_1 x_1}(\hat{P}_{x_1 x_1} + \hat{P}_{x_2 x_2})^{-1}.$$

Since the cross-correlation between $x_1$ and $x_2$ is generally non-zero and unknown, CI is employed here to deal with the situation when $P_{x_1 x_2} \neq 0$.

3. EM–CI for image fusion

By virtue of the statistical property of the noise $n_i$ in (1), we can compute the conditional probability density function (PDF) of $\tilde{x}_i$ as [1]
$$p(\tilde{x}_i | x_i) = (2\pi)^{-S/2} |\hat{P}_{x_i x_i}|^{-1/2} \exp\left\{-\frac{1}{2}[\tilde{x}_i - H_i x_i]^T \hat{P}_{x_i x_i}^{-1}[\tilde{x}_i - H_i x_i]\right\}, \quad i = 1, 2, \qquad (13)$$
where $S$ denotes the number of pixels within the vector $\tilde{x}_i$. Taking the logarithm of (13), the logarithmic likelihood function is thus
$$L(x_i | \tilde{x}_i) = -\frac{1}{2}\log|\hat{P}_{x_i x_i}| - \frac{S}{2}\log(2\pi) - \frac{1}{2}[\tilde{x}_i - H_i x_i]^T \hat{P}_{x_i x_i}^{-1}[\tilde{x}_i - H_i x_i], \quad i = 1, 2. \qquad (14)$$
As in factor analysis, $\hat{P}_{x_i x_i}$ is a diagonal matrix under the white Gaussian noise assumption. The E-step of EM computes the expected log-likelihood function, i.e.,
$$EP = E[L(x_i | \tilde{x}_i) | \tilde{x}_i], \quad i = 1, 2. \qquad (15)$$
In the M-step, $\hat{x}_i$ and $\hat{P}_{x_i x_i}$ are obtained by maximizing (15). More specifically, the partial derivatives of the expected log-likelihood function are set to zero. With the constant term ignored, we have
$$\frac{\partial EP}{\partial x_i} = H_i^T \hat{P}_{x_i x_i}^{-1} \tilde{x}_i - H_i^T \hat{P}_{x_i x_i}^{-1} H_i \hat{x}_i = 0, \quad i = 1, 2, \qquad (16)$$
$$\frac{\partial EP}{\partial \hat{P}_{x_i x_i}^{-1}} = -\frac{1}{2}\left[\tilde{x}_i \tilde{x}_i^T - 2 H_i \hat{x}_i \tilde{x}_i^T + H_i \hat{x}_i \hat{x}_i^T H_i^T\right] + \frac{1}{2}\hat{P}_{x_i x_i} = 0, \quad i = 1, 2. \qquad (17)$$
Solving these equations generates
$$\hat{x}_i = (H_i^T \hat{P}_{x_i x_i}^{-1} H_i)^{-1} H_i^T \hat{P}_{x_i x_i}^{-1} \tilde{x}_i, \quad i = 1, 2, \qquad (18)$$
and
$$\hat{P}_{x_i x_i} = \tilde{x}_i \tilde{x}_i^T - 2 H_i \hat{x}_i \tilde{x}_i^T + H_i \hat{x}_i \hat{x}_i^T H_i^T, \quad i = 1, 2. \qquad (19)$$
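To make the iteration concrete, the following is a minimal sketch, in Python with NumPy, of how the updates (18) and (19) could be alternated on one flattened image band. It assumes the image (or image block) is small enough for its covariance to be held in memory; the diagonal restriction follows the factor-analysis remark before (15), and the ridge term `eps` is our own addition for numerical stability. The function name `em_estimate` is illustrative, not from the paper.

```python
import numpy as np

def em_estimate(x_tilde, H, n_iter=20, eps=1e-6):
    """Sketch of the EM iteration of Eqs. (18)-(19) for one image band.

    x_tilde : observed band flattened into a 1-D vector of length S
    H       : known S x N observation matrix (identity if no blur model)
    Returns the ideal-image estimate x_hat and the (diagonal)
    estimation-error covariance P_hat.
    """
    S = x_tilde.size
    P_hat = np.eye(S)                       # initial covariance guess
    x_hat = np.zeros(H.shape[1])
    for _ in range(n_iter):
        P_inv = np.linalg.inv(P_hat)
        # Eq. (18): update the ideal-image estimate.
        A = H.T @ P_inv @ H
        x_hat = np.linalg.solve(A + eps * np.eye(A.shape[0]),
                                H.T @ P_inv @ x_tilde)
        # Eq. (19): update the error covariance.
        Hx = H @ x_hat
        P_hat = (np.outer(x_tilde, x_tilde)
                 - 2.0 * np.outer(Hx, x_tilde)
                 + np.outer(Hx, Hx))
        # Keep P_hat diagonal (white-noise assumption) and positive.
        P_hat = np.diag(np.clip(np.diag(P_hat), eps, None))
    return x_hat, P_hat
```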
Based on $\hat{x}_i$ and $\hat{P}_{x_i x_i}$, $i = 1, 2$, we can combine $\hat{x}$ and $\hat{P}_{xx}$ using CI. For a covariance matrix $P$, the covariance ellipse is the locus of the points
$$B_P(g) = \{x : (x - \bar{x})^T P^{-1} (x - \bar{x}) = g\}, \qquad (20)$$
where $g$ is a constant and $\bar{x}$ is the mean of $x$. The covariance ellipse is a convenient way of visualizing the relative size of covariance matrices. If $P_1 < P_2$, then $B_{P_1}(g) \subset B_{P_2}(g)$. From the geometric interpretation of (10), the covariance ellipse of $\hat{P}_{xx}$ always lies within the intersection of the covariance ellipses of $\hat{P}_{x_1 x_1}$ and $\hat{P}_{x_2 x_2}$, for any possible choice of $\hat{P}_{x_1 x_2}$, as shown in Fig. 1. Hence, a method which finds a $\hat{P}_{xx}$ that encloses the intersection region must be consistent even if there is no knowledge about $P_{x_1 x_2}$ [26,27]. When $P_{x_1 x_2}$ is unknown, in order to obtain an upper bound of (9), the inequality
$$E\left\{\left[\sqrt{c}\,M_1(x_1 - \hat{x}_1) - \frac{1}{\sqrt{c}}\,M_2(x_2 - \hat{x}_2)\right]\left[\sqrt{c}\,M_1(x_1 - \hat{x}_1) - \frac{1}{\sqrt{c}}\,M_2(x_2 - \hat{x}_2)\right]^T\right\} \geq 0 \qquad (21)$$
is utilized, where $c > 0$ is a scalar. It follows that
$$c\,M_1 P_{x_1 x_1} M_1^T + \frac{1}{c}\,M_2 P_{x_2 x_2} M_2^T \geq M_1 P_{x_1 x_2} M_2^T + M_2 P_{x_2 x_1} M_1^T. \qquad (22)$$
From (4) and (22), a consistent estimate $\hat{P}_{xx}$ for $P_{xx}$ can be obtained as
$$\hat{P}_{xx} = (1 + c)\,M_1 \hat{P}_{x_1 x_1} M_1^T + \left(1 + \frac{1}{c}\right) M_2 \hat{P}_{x_2 x_2} M_2^T. \qquad (23)$$

Fig. 1. The covariance $\hat{P}_{xx}$ always lies inside the intersection.

The value of $c$ is then chosen to minimize the trace of $\hat{P}_{xx}$. Note that
$$\mathrm{Tr}(\hat{P}_{xx}) = \mathrm{Tr}(M_1 \hat{P}_{x_1 x_1} M_1^T) + \mathrm{Tr}(M_2 \hat{P}_{x_2 x_2} M_2^T) + c\,\mathrm{Tr}(M_1 \hat{P}_{x_1 x_1} M_1^T) + \frac{1}{c}\,\mathrm{Tr}(M_2 \hat{P}_{x_2 x_2} M_2^T)$$
$$\geq \mathrm{Tr}(M_1 \hat{P}_{x_1 x_1} M_1^T) + \mathrm{Tr}(M_2 \hat{P}_{x_2 x_2} M_2^T) + 2\sqrt{\mathrm{Tr}(M_1 \hat{P}_{x_1 x_1} M_1^T)\,\mathrm{Tr}(M_2 \hat{P}_{x_2 x_2} M_2^T)}$$
$$= \left[\sqrt{\mathrm{Tr}(M_1 \hat{P}_{x_1 x_1} M_1^T)} + \sqrt{\mathrm{Tr}(M_2 \hat{P}_{x_2 x_2} M_2^T)}\right]^2, \qquad (24)$$
where the equality holds when
$$c = \sqrt{\frac{\mathrm{Tr}(M_2 \hat{P}_{x_2 x_2} M_2^T)}{\mathrm{Tr}(M_1 \hat{P}_{x_1 x_1} M_1^T)}}. \qquad (25)$$

For a particular choice, i.e., if $M_1 = \omega_1 \hat{P}_{xx} \hat{P}_{x_1 x_1}^{-1}$, $M_2 = \omega_2 \hat{P}_{xx} \hat{P}_{x_2 x_2}^{-1}$, and $c = \omega_2/\omega_1$, (23) can be further written as
$$\hat{P}_{xx} = \omega_1 \hat{P}_{xx} \hat{P}_{x_1 x_1}^{-1} \hat{P}_{xx} + \omega_2 \hat{P}_{xx} \hat{P}_{x_2 x_2}^{-1} \hat{P}_{xx} = \hat{P}_{xx}\left(\omega_1 \hat{P}_{x_1 x_1}^{-1} + \omega_2 \hat{P}_{x_2 x_2}^{-1}\right)\hat{P}_{xx}, \qquad (26)$$
which is satisfied by
$$\hat{P}_{xx}^{-1} = \omega_1 \hat{P}_{x_1 x_1}^{-1} + \omega_2 \hat{P}_{x_2 x_2}^{-1}. \qquad (27)$$
Meanwhile, (6) becomes
$$\hat{x} = \omega_1 \hat{P}_{xx} \hat{P}_{x_1 x_1}^{-1} \hat{x}_1 + \omega_2 \hat{P}_{xx} \hat{P}_{x_2 x_2}^{-1} \hat{x}_2 \qquad (28)$$
with nonnegative coefficients $\omega_1$ and $\omega_2$ obeying
$$\omega_1 + \omega_2 = 1. \qquad (29)$$
Here, (27) and (28) are obtained as the CI equations.
The weighting coefficients $\omega_1$ and $\omega_2$ are usually chosen to minimize the trace of $\hat{P}_{xx}$. This minimization requires the optimization of a nonlinear cost function which is convex with respect to $\omega_1$ and $\omega_2$ [18]. The minimized trace is obtained by an exhaustive search of $\omega_1$ and $\omega_2$ within the range $[0, 1]$, which carries a high computational burden. A suboptimal non-iterative linear algorithm is therefore desirable. It is well known that the trace of the covariance matrix $\hat{P}_{x_i x_i}$, $i = 1, 2$, provides an uncertainty measure for the estimate $\hat{x}_i$. If $\mathrm{Tr}(\hat{P}_{x_1 x_1}) \ll \mathrm{Tr}(\hat{P}_{x_2 x_2})$, it follows that $\hat{x}_2$ has a much larger estimation error than $\hat{x}_1$. We thus have $\hat{x} \approx \hat{P}_{xx} \hat{P}_{x_1 x_1}^{-1} \hat{x}_1$, which implies $\omega_1 \approx 1$ and $\omega_2 \approx 0$. Similarly, if $\mathrm{Tr}(\hat{P}_{x_1 x_1}) = \mathrm{Tr}(\hat{P}_{x_2 x_2})$, it is expected that $\hat{x} \approx \frac{1}{2}\hat{P}_{xx} \hat{P}_{x_1 x_1}^{-1} \hat{x}_1 + \frac{1}{2}\hat{P}_{xx} \hat{P}_{x_2 x_2}^{-1} \hat{x}_2$; in other words, $\omega_1 = \omega_2 = \frac{1}{2}$. Considering these, an additional linear constraint
$$\omega_1 \mathrm{Tr}(\hat{P}_{x_1 x_1}) - \omega_2 \mathrm{Tr}(\hat{P}_{x_2 x_2}) = 0 \qquad (30)$$
is proposed. Substituting (29) into (30) gives
$$\omega_1 \mathrm{Tr}(\hat{P}_{x_1 x_1}) - (1 - \omega_1)\,\mathrm{Tr}(\hat{P}_{x_2 x_2}) = 0,$$
$$(1 - \omega_2)\,\mathrm{Tr}(\hat{P}_{x_1 x_1}) - \omega_2\,\mathrm{Tr}(\hat{P}_{x_2 x_2}) = 0. \qquad (31)$$
When $\mathrm{Tr}(\hat{P}_{x_1 x_1}) + \mathrm{Tr}(\hat{P}_{x_2 x_2}) > 0$, $\omega_1$ and $\omega_2$ can be computed by
$$\omega_1 = \frac{\mathrm{Tr}(\hat{P}_{x_2 x_2})}{\mathrm{Tr}(\hat{P}_{x_1 x_1}) + \mathrm{Tr}(\hat{P}_{x_2 x_2})}, \qquad \omega_2 = \frac{\mathrm{Tr}(\hat{P}_{x_1 x_1})}{\mathrm{Tr}(\hat{P}_{x_1 x_1}) + \mathrm{Tr}(\hat{P}_{x_2 x_2})}. \qquad (32)$$
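The fast CI combination then reduces to a handful of matrix operations. Below is a minimal sketch implementing (27), (28), and (32), under the assumption that both covariance estimates are invertible; the function name `fast_ci_fuse` is our own.

```python
import numpy as np

def fast_ci_fuse(x1, P1, x2, P2):
    """Sketch of the fast (suboptimal linear) covariance intersection.

    x1, x2 : estimates from the two sources (1-D vectors)
    P1, P2 : their estimation-error covariances
    Returns the fused estimate (28) and fused covariance (27),
    with weights from the trace-based rule (32).
    """
    t1, t2 = np.trace(P1), np.trace(P2)
    w1 = t2 / (t1 + t2)   # Eq. (32): the noisier source gets the smaller weight
    w2 = t1 / (t1 + t2)
    P1_inv = np.linalg.inv(P1)
    P2_inv = np.linalg.inv(P2)
    P_fused = np.linalg.inv(w1 * P1_inv + w2 * P2_inv)          # Eq. (27)
    x_fused = P_fused @ (w1 * P1_inv @ x1 + w2 * P2_inv @ x2)   # Eq. (28)
    return x_fused, P_fused
```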
It is noted that $\mathrm{Tr}(\hat{P}_{x_i x_i}) \geq 0$ ensures that $0 \leq \omega_i \leq 1$, $i = 1, 2$.

4. Performance evaluation

We treat image fusion as an estimation problem in this paper. The observed images (the MS and Pan images) are modeled as a linear transformation of the ideal images plus Gaussian noise. The goal is to recover the ideal images and combine them through a convex combination of the inverses of the covariance matrices. We use the trace of the covariance matrix of the estimation error as the metric of image fusion performance. The exact expression of this metric cannot be derived, because the cross-correlation between $x_1$ and $x_2$ cannot be obtained, but an upper bound is available from the CI theory. To minimize the upper bound, the covariance matrix $\hat{P}_{x_i x_i}$ and the unbiased estimate $\hat{x}_i$ are estimated using the EM algorithm. The weights $\omega_1$ and $\omega_2$ for the observed images can be obtained by an exhaustive search; to reduce the computational complexity of finding them, we also derived a suboptimal linear algorithm.

In this paper, QuickBird MS and Pan images are used to evaluate the performance of the proposed EM–CI method. The spatial resolution of the MS image is 2.8 m, while that of the Pan image is 0.7 m. The low-resolution MS image is
re-sampled using cubic interpolation to the same size as the high-resolution Pan image, as shown in Figs. 2(a) and 3(a). The Pan images are shown in Figs. 2(b) and 3(b). Our objective is to increase the spatial resolution of the MS image by injecting the spatial details of the Pan image into the MS image, while preserving the spectral information of the MS image as much as possible.
Fig. 2. Experiments on the first pair of QuickBird images: (a) the re-sampled multi-spectral image; (b) the panchromatic image; (c) the fused image by EM–CI; (d) the fused image by WT; (e) the fused image by PCA.
Fig. 3. Experiments on the second pair of QuickBird images: (a) the re-sampled multi-spectral image; (b) the panchromatic image; (c) the fused image by EM–CI; (d) the fused image by WT; (e) the fused image by PCA.
It is known that the spatial detail of a Pan image is mostly carried by its high-frequency components, while the spectral information of an MS image is mostly carried by its low-frequency components. If the high-frequency components of the MS image are simply substituted by the high-frequency components of the Pan image [33,15], the spatial resolution is improved, but the spectral information contained in the high-frequency components of the MS image is lost. To avoid this, WT is first applied to the MS and Pan images to extract the spatial detail and spectral information, respectively. Then, keeping the spectral information of the MS image untouched, the spatial details of the MS and Pan images are fused using EM–CI. After the inverse WT, the fused images are obtained, as shown in Figs. 2(c) and 3(c). To further minimize the spectral distortion, histogram matching [17] is applied to the Pan image, band by band, so that its brightness and contrast best match those of each MS band. The fusion of the Pan details is applied at a fixed decomposition scale determined by the spatial resolutions of the Pan and MS images (the base-2 logarithm of the scale ratio).
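As a rough illustration of how these pieces could fit together, the sketch below uses the PyWavelets package together with the `em_estimate` and `fast_ci_fuse` sketches given earlier. Several simplifications are ours: the observation operator $H$ is taken to be the identity, whole detail subbands are processed at once (a practical implementation would work block-wise and exploit the diagonal covariance structure), and the wavelet `db4` and the function name `pansharpen_band` are illustrative choices rather than the paper's.

```python
import numpy as np
import pywt

def pansharpen_band(ms_band, pan, levels=2):
    """Sketch of the EM-CI pansharpening pipeline for one MS band.

    ms_band : one band of the re-sampled MS image (2-D, same size as pan)
    pan     : histogram-matched Pan image (2-D array)
    levels  : decomposition scale (log2 of the Pan/MS resolution ratio)
    """
    ms_coeffs = pywt.wavedec2(ms_band, 'db4', level=levels)
    pan_coeffs = pywt.wavedec2(pan, 'db4', level=levels)

    # Keep the MS approximation (spectral content) untouched.
    fused = [ms_coeffs[0]]

    # Fuse the detail subbands of MS and Pan with CI instead of replacement.
    for ms_details, pan_details in zip(ms_coeffs[1:], pan_coeffs[1:]):
        fused_details = []
        for d_ms, d_pan in zip(ms_details, pan_details):
            shape = d_ms.shape
            # EM gives the estimates and error covariances (H = identity here).
            x1, P1 = em_estimate(d_ms.ravel(), np.eye(d_ms.size))
            x2, P2 = em_estimate(d_pan.ravel(), np.eye(d_pan.size))
            x_fused, _ = fast_ci_fuse(x1, P1, x2, P2)
            fused_details.append(x_fused.reshape(shape))
        fused.append(tuple(fused_details))

    return pywt.waverec2(fused, 'db4')
```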
The performance of EM–CI is compared with two conventional methods. The first is the WT method: WT is applied to each MS band and to the Pan image; band by band, the approximations are kept untouched while the spatial details are replaced by the corresponding coefficients of the Pan image; the inverse WT then yields the fused images, shown in Figs. 2(d) and 3(d). The second method is based on PCA. The MS image is transformed with PCA to obtain the principal components, and the first principal component of the MS image is replaced with the histogram-matched Pan image. The fused images are obtained when the new first principal component and the remaining principal components are transformed back with the inverse PCA transform, and are shown in Figs. 2(e) and 3(e). In these figures, the WT and EM–CI methods use the same WT decomposition scale.

From these figures, it is observed that the fused images using EM–CI preserve most of the spectral information of the MS images while improving the spatial resolution. For the fused images by WT, the spectral performance is slightly inferior to that of EM–CI, while the spatial performance is improved. The PCA results show serious spectral distortion, especially in the white areas of the fused image, although the spatial details are also improved.

Besides the subjective evaluation, a quantitative assessment of the fusion performance is also necessary. Two sets of criteria are used in our study to evaluate the spectral and spatial performance, respectively. The spectral quality of the fused image is measured by the correlation coefficient (CC) and the spectral discrepancy (SPD). CC describes the degree of correlation between two images, providing a similarity measure of the spectral information between the fused image and the multi-spectral image. For each band, CC is defined as
$$CC = \frac{\sum_{m=1}^{M}\sum_{n=1}^{N}[MSF(m,n) - \mu][MSR(m,n) - \mu_{MSR}]}{\sqrt{\sum_{m=1}^{M}\sum_{n=1}^{N}[MSF(m,n) - \mu]^2 \sum_{m=1}^{M}\sum_{n=1}^{N}[MSR(m,n) - \mu_{MSR}]^2}}, \qquad (33)$$
where $MSF(m,n)$ and $MSR(m,n)$ denote the pixel values of the fused MS image and the reference image at position $(m,n)$, respectively, $M \times N$ is the image size, and $\mu$ and $\mu_{MSR}$ are the intensity averages of $MSF$ and $MSR$, respectively. Because no MS image at 0.7 m is available for comparison, following the Wald protocol [29], the original MS image at the resolution of 2.8 m is used as $MSR$. The fused image $MSF$ is obtained by fusing the MS and Pan images at a degraded scale. More specifically, the original MS image at 2.8 m resolution is degraded to 11.2 m, and the resolution of the original Pan image is reduced from 0.7 m to 2.8 m. The degraded MS and Pan images are then fused to generate $MSF$ at 2.8 m resolution, and CC is computed using $MSF$ and $MSR$ at this degraded scale, i.e., 2.8 m. The desirable value of CC is 1. At each band, SPD is computed by
$$SPD = \frac{\sum_{m,n}|MSF(m,n) - MSR(m,n)|}{MN}. \qquad (34)$$
A small SPD indicates good spectral performance. The indices defined above only evaluate the spectral difference between corresponding bands of the original and fused images. To evaluate the global spectral quality, the ERGAS coefficient is calculated as a global description of the spectral quality over all the fused multi-spectral bands. It is defined as
$$ERGAS = 100\,\frac{d_h}{d_l}\sqrt{\frac{1}{K}\sum_{k=1}^{K}\left[\frac{RMSE(k)}{\mu(k)}\right]^2}, \qquad (35)$$
where $d_h/d_l$ is the ratio between the resolutions of the high-resolution Pan image and the low-resolution MS image (for QuickBird data, $d_h/d_l = 1/4$), $\mu(k)$ is the average value of the $k$th band, and $K$ is the number of bands. The $RMSE(k)$ of each spectral band is computed by
$$RMSE(k) = \sqrt{\frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N}[MSF_k(m,n) - MSR_k(m,n)]^2}. \qquad (36)$$
Lower values of the ERGAS coefficient imply higher spectral quality of the fused image; the desirable value of ERGAS is therefore 0. For the spatial quality evaluation, we use the average gradient (AG) index. AG describes the changing features of the image texture and the detailed information; larger values of the AG index correspond to higher spatial resolution. The AG index of the fused images at each band can be computed by
$$AG = \frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N}\sqrt{\frac{\left[\frac{\partial MSF(m,n)}{\partial m}\right]^2 + \left[\frac{\partial MSF(m,n)}{\partial n}\right]^2}{2}}. \qquad (37)$$
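These four indices translate directly into code. The following is a minimal sketch assuming each band is given as a 2-D float array; in (35) we take $\mu(k)$ to be the mean of the reference band (the text leaves the choice implicit), and the partial derivatives in (37) are approximated by finite differences via `np.gradient`.

```python
import numpy as np

def cc(msf, msr):
    """Correlation coefficient between fused and reference bands, Eq. (33)."""
    f, r = msf - msf.mean(), msr - msr.mean()
    return (f * r).sum() / np.sqrt((f ** 2).sum() * (r ** 2).sum())

def spd(msf, msr):
    """Spectral discrepancy, Eq. (34)."""
    return np.abs(msf - msr).mean()

def ergas(msf_bands, msr_bands, ratio=1/4):
    """Global ERGAS index over all bands, Eqs. (35)-(36).

    msf_bands, msr_bands : lists of 2-D arrays, one per band
    ratio                : d_h/d_l (1/4 for QuickBird)
    """
    terms = []
    for f, r in zip(msf_bands, msr_bands):
        rmse = np.sqrt(((f - r) ** 2).mean())       # Eq. (36)
        terms.append((rmse / r.mean()) ** 2)        # mu(k): reference-band mean
    return 100.0 * ratio * np.sqrt(np.mean(terms))  # Eq. (35)

def ag(msf):
    """Average gradient, Eq. (37), with finite-difference gradients."""
    gm, gn = np.gradient(msf.astype(float))
    return np.sqrt((gm ** 2 + gn ** 2) / 2.0).mean()
```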
Table 1
Objective evaluation of fusion performance using the first pair of QuickBird images. CC, SPD, and ERGAS measure spectral quality; AG measures spatial quality.

Fusion method   Band      CC        SPD    ERGAS        AG
EM–CI           R     0.9801     6.7148   2.6290   25.3757
                G     0.9770     6.2489            20.9784
                B     0.9799     6.8925            25.7214
WT              R     0.9489    11.1940   4.1885   28.1817
                G     0.9418    10.1281            23.6103
                B     0.9474    11.5785            28.9022
PCA             R     0.8972    13.7137   6.2148   25.3470
                G     0.8721    12.5861            23.4976
                B     0.8733    15.1401            26.0551
Table 2
Objective evaluation of fusion performance using the second pair of QuickBird images. CC, SPD, and ERGAS measure spectral quality; AG measures spatial quality.

Fusion method   Band      CC        SPD    ERGAS        AG
EM–CI           R     0.9775     6.7511   3.4483   23.5711
                G     0.9762     6.7138            21.0510
                B     0.9792     6.9887            24.1559
WT              R     0.9463    10.8203   5.3411   26.3574
                G     0.9428    10.5158            23.3114
                B     0.9478    11.3393            27.4033
PCA             R     0.9117    13.2300   6.7806   24.6685
                G     0.9226    11.9986            23.7542
                B     0.8977    14.7983            25.9798
The results of these objective evaluations on the two pairs of test images are listed in Tables 1 and 2. From these values, it is clear that the spectral distortion introduced by EM–CI is less than that of WT and PCA for both the local and the global evaluation of spectral quality. For the spatial performance, EM–CI is slightly inferior to WT and PCA. As the human visual system is more sensitive to spatial resolution than to spectral content, the results of the EM–CI approach may not appear significantly better than those of the WT approach. The WT method preserves the detailed spatial information through the multi-scale decomposition and the complete replacement by the Pan image at each decomposition scale. However, due to this replacement, the partial spectral information still contained in the high-frequency components of the MS image is lost. For PCA, the simple replacement results in a serious loss of spectral information, because the first principal component of the MS image, which contains much spectral information, is completely discarded. In contrast, EM–CI utilizes all the information from the two source images while keeping the low-frequency components of the MS image untouched. In addition, EM–CI fuses the two images through convex combination with optimized weights so as to minimize the spectral distortion. These two properties explain the superior spectral performance of EM–CI over WT and PCA. Meanwhile, EM–CI optimally injects the spatial details of the Pan image into the MS image, so the detailed information from both images is preserved. Overall, EM–CI is capable of enhancing the spatial quality of the MS image while preserving the spectral content to a greater extent: compared with the WT and PCA methods, the proposed EM–CI method preserves more spectral information at the cost of a slightly smaller improvement in spatial quality.

5. Conclusions

Covariance intersection, which combines multiple data sources through convex combinations for any degree of cross-correlation, is one of the most popular information fusion approaches. In this paper, we propose a novel EM–CI approach for the fusion of MS and Pan images. More specifically, the ideal MS and Pan images are estimated by EM along with the covariance matrices of the estimation error. Then, CI is applied to combine the two images and provide a consistent estimate of the high-resolution MS image. The proposed method accurately preserves both the spectral information and the high-resolution spatial information. Its effectiveness is demonstrated with real remote sensing images.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant Nos. 10674038 and 10974039, and jointly supported by the State Scholarship of the China Scholarship Council for study abroad (Grant No. [2007]3020).
Ms. Qing Guo thanks Rob Edwards for his help with the English wording of the manuscript. The authors would like to thank the anonymous reviewers for their comments and suggestions.

References

[1] J.A. Bilmes, A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models, Technical Report ICSI-TR-97-021, University of California, Berkeley, 1998.
[2] F. Bovolo, L. Bruzzone, L. Capobianco, A. Garzelli, S. Marchesi, F. Nencini, Analysis of the effects of pansharpening in change detection on VHR images, IEEE Geoscience and Remote Sensing Letters 7 (1) (2010) 53–57.
[3] D.M. Bulanon, T.F. Burks, V. Alchanatis, Image fusion of visible and thermal images for fruit detection, Biosystems Engineering 103 (2009) 12–22.
[4] W.J. Carper, T.M. Lillesand, R.W. Kiefer, The use of intensity-hue-saturation transformations for merging SPOT panchromatic and multispectral image data, Photogrammetric Engineering and Remote Sensing 56 (1990) 459–467.
[5] P.S. Chavez, A.Y. Kwarteng, Extracting spectral contrast in Landsat Thematic Mapper image data using selective component analysis, Photogrammetric Engineering and Remote Sensing 55 (1989) 339–348.
[6] P.S. Chavez, S.C. Sildes, J.A. Anderson, Comparison of three different methods to merge multiresolution and multispectral data: Landsat TM and SPOT panchromatic, Photogrammetric Engineering and Remote Sensing 57 (1991) 295–303.
[7] D.L. Civco, Y. Wang, J.A. Silander, Characterizing forest ecosystems in Connecticut by integrating Landsat TM and SPOT panchromatic data, in: Proceedings of the ACSM/ASPRS Annual Convention, Charlotte, NC, vol. 2, ASPRS, Reston, VA, 1995, pp. 216–224.
[8] P. Dempster, N. Laird, D. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society Series B 39 (1977) 1–38.
[9] B.G. Duport, The use of multiresolution analysis and wavelet transform for merging SPOT panchromatic and multispectral image data, Photogrammetric Engineering and Remote Sensing 62 (1996) 1057–1066.
[10] M.A.T. Figueiredo, R.D. Nowak, An EM algorithm for wavelet-based image restoration, IEEE Transactions on Image Processing 12 (8) (2003) 906–916.
[11] A.H. Jazwinski, Stochastic Processes and Filtering Theory, Academic Press, 1970.
[12] S.J. Julier, J.K. Uhlmann, A non-divergent estimation algorithm in the presence of unknown correlations, in: Proceedings of the American Control Conference, vol. 4, 1997, pp. 2369–2373.
[13] M.M. Khan, J. Chanussot, L. Alparone, Hyperspectral pansharpening using QNR optimization constraint, in: WHISPERS Conference, 2009.
[14] H. Li, B.S. Manjunath, S.K. Mitra, Multisensor image fusion using the wavelet transform, Graphical Models and Image Processing 57 (1995) 235–245.
[15] S. Li, J.T. Kwok, Y. Wang, Using the discrete wavelet frame transform to merge Landsat TM and SPOT panchromatic images, Information Fusion 3 (2002) 17–23.
[16] J.S. Lim, Two-Dimensional Signal and Image Processing, Prentice Hall Press, New Jersey, USA, 1990.
[17] J. Morovic, J. Shaw, P.L. Sun, A fast, non-iterative and exact histogram matching algorithm, Pattern Recognition Letters 23 (2002) 127–135.
[18] W. Niehsen, Information fusion based on fast covariance intersection filtering, in: Proceedings of the Fifth International Conference on Information Fusion, vol. 2, 2002, pp. 901–904.
[19] J. Núñez, X. Otazu, O. Fors, A. Prades, V. Palà, R. Arbiol, Multiresolution-based image fusion with additive wavelet decomposition, IEEE Transactions on Geoscience and Remote Sensing 37 (1999) 1204–1211.
[20] C. Pohl, J.L. Van Genderen, Multisensor image fusion in remote sensing: concepts, methods, and applications, International Journal of Remote Sensing 19 (1998) 823–854.
[21] L.I. Perlovsky, Cognitive high level information fusion, Information Sciences 177 (2007) 2099–2118.
[22] N. Prasad, S. Saran, S.P.S. Kushwaha, P.S. Roy, Evaluation of various image fusion techniques and imaging scales for forest features interpretation, Current Science 81 (2001) 1218–1224.
[23] P. Scheunders, S.D. Baeker, Fusion and merging of multispectral images using multiscale fundamental forms, Journal of the Optical Society of America A 18 (2001) 2468–2477.
[24] V.P. Shah, N.H. Younan, R.L. King, An efficient pan-sharpening method via a combined adaptive PCA approach and contourlets, IEEE Transactions on Geoscience and Remote Sensing 46 (5) (2008) 1323–1335.
[25] V.K. Shettigara, A generalized component substitution technique for spatial enhancement of multispectral images using a higher resolution data set, Photogrammetric Engineering and Remote Sensing 58 (1992) 561–567.
[26] J.K. Uhlmann, Dynamic map building and localization: new theoretical foundations, PhD Thesis, University of Oxford, 1995.
[27] J. Uhlmann, General data fusion for estimates with unknown cross covariances, in: Proceedings of the SPIE Aerosense Conference, 1996, pp. 165–173.
[28] N.G. Wah, Y. Rong, Comparison of decentralized tracking algorithms, in: Proceedings of the Sixth International Conference on Information Fusion, 2003, pp. 107–113.
[29] L. Wald, T. Ranchin, M. Mangolini, Fusion of satellite images of different spatial resolutions: assessing the quality of resulting images, Photogrammetric Engineering and Remote Sensing 63 (6) (1997) 691–699.
[30] R.R. Yager, A framework for multi-source data fusion, Information Sciences 163 (2004) 175–200.
[31] G. Yang, Y. Lin, P. Bhattacharya, A driver fatigue recognition model based on information fusion and dynamic Bayesian network, Information Sciences 180 (2010) 1942–1954.
[32] D.A. Yocky, Multiresolution wavelet decomposition image merger of Landsat Thematic Mapper and SPOT panchromatic data, Photogrammetric Engineering and Remote Sensing 62 (1996) 1067–1074.
[33] J. Zhou, D.L. Civco, J.A. Silander, A wavelet transform method to merge Landsat TM and SPOT panchromatic data, International Journal of Remote Sensing 19 (1998) 743–757.