Information Sciences 180 (2010) 3434–3443
Covariance intersection based image fusion technique with application to pansharpening in remote sensing

Qing Guo a,b, Siyue Chen b, Henry Leung b, Shutian Liu a,*

a Harbin Institute of Technology, Department of Physics, Harbin 150001, PR China
b Department of Electrical and Computer Engineering, University of Calgary, Calgary, Alberta, Canada T2N 1N4
Article history: Received 14 November 2009; received in revised form 29 April 2010; accepted 6 May 2010.
Keywords: Covariance intersection; Image fusion; Expectation maximization; Multi-spectral image; Panchromatic image; Pansharpening
Abstract

Image fusion of multi-spectral images and panchromatic images has been widely applied to imaging sensors. Multi-spectral images are rich in spectral information, whereas panchromatic images have relatively higher spatial resolution. In this paper, we consider image fusion as an estimation problem, that is, to estimate the ideal scene of the multi-spectral image at the resolution of the panchromatic image. We propose a method combining the covariance intersection (CI) principle with the expectation maximization (EM) algorithm to develop a novel image fusion approach. In contrast to other fusion methods, the proposed scheme takes cross-correlation among the data sources into account, and thus provides consistent and accurate estimates through convex combinations. Since the covariance information is usually unknown in practice, the EM method is employed to provide a maximum likelihood estimate (MLE) of the covariance matrix. Real multi-spectral and panchromatic images are used to evaluate the effectiveness of the proposed EM–CI method. The proposed algorithm is found to preserve both the spectral information of the multi-spectral image and the high spatial resolution information of the panchromatic image more effectively than conventional image fusion techniques.

© 2010 Elsevier Inc. All rights reserved.
1. Introduction

With the rapid growth of the internet and other electronic sources of information, the problem of coherently merging information from multiple sources has become an important issue [21,30,31]. As one type of information processing technique, image fusion is widely used in many practical applications. Images captured by multiple sensors often contain complementary and redundant information. An effective fusion process results in an improved image, which contains more complete information and is more suitable for human visual perception and object recognition [2,20].

In this study, we consider the fusion of multi-spectral (MS) images and panchromatic (Pan) images, which are widely used in remote sensing. This fusion scheme is often referred to as pansharpening [13]. MS images, such as those acquired by the IKONOS and QuickBird satellites, usually have four bands of spectral information (i.e., red (R), green (G), blue (B), and near infrared (NIR)), while the Pan image has good spatial resolution. The goal of fusing these images is therefore to generate a high-resolution MS image, which can be used more effectively for applications such as land classification and road detection.

Image fusion can be classified as pixel-level, feature-level, and symbol-level [3]. Fusion of MS and Pan images at the pixel level is traditionally handled by the intensity-hue-saturation (IHS) transform methods [4,6], the Brovey transform methods [7], the principal component analysis (PCA) methods [6,5,24], the highpass filtering (HPF) method [25], and the wavelet
* Corresponding author. Tel./fax: +86 451 86418042. E-mail address: [email protected] (S. Liu).
doi:10.1016/j.ins.2010.05.010. © 2010 Elsevier Inc. All rights reserved.
transform (WT) method [9,14,23,33]. The IHS transform method first transforms three MS bands from the RGB color space into the spatial (I) and spectral (H, S) information space. Fusion is then performed by replacing the intensity component in the IHS space with the Pan image. The IHS method preserves the spatial resolution of the Pan image; however, according to Prasad et al. [22], it severely distorts the spectral information of the MS image. The method based on the Brovey transform is a simple color-normalization method, and it also usually introduces distortion into the spectral information. The same spectral distortion problem is found in the PCA method, because fusion is performed by replacing the first principal component of the MS image with the Pan image, which results in a significant loss of spectral information. The HPF method simply adds the high-frequency components of the Pan image to the MS image; however, choosing a proper filter size is difficult in this method. The fusion method based on WT is also popular. Zhou et al. [33] employ WT to fuse Landsat TM and SPOT Pan images. The SPOT image and each spectral band of the TM image are decomposed into an orthogonal wavelet representation at a given resolution, which consists of an approximation image and a set of spatially-oriented detail images. Band by band, the approximation image from each TM band is combined with the detail images from SPOT, and the inverse WT is then performed to obtain the fused image. The spectral and spatial quality of the fused image based on this method is better than that based on the IHS transform, PCA, or the Brovey transform [33].

In this paper, our motivation is to generate a high-resolution MS image with more information and better quality than those provided by any individual sensor image alone. The image fusion problem is considered as a statistical estimation problem. More specifically, the objective is to estimate the ideal high-resolution MS image from the observed MS image and the observed Pan image. To this end, covariance intersection (CI), which has been widely used in radar track fusion [28], is employed. CI can produce consistent estimates for any degree of cross-correlation between the input sources through convex combination [12,18]. In addition, CI enforces estimation consistency by means of a convex combination of the inverses of the covariance matrices. When CI is applied to image fusion, it treats the ideal high-resolution MS image as a linear combination of the ideal MS and Pan images, and the fusion coefficients are selected through convex combination with optimized weights. By contrast, many image fusion methods, such as the conventional WT methods, choose the fused coefficients simply by selecting the larger coefficient or by replacement [32,19,15], without any rigorous procedure.

To use the convex combination in CI, the covariance of the estimation error is required, but for most imaging applications this information is not available. Here, we use the expectation maximization (EM) algorithm to give a maximum likelihood estimate (MLE) of the covariance information. Formalized by Dempster et al. [8], the EM algorithm is an iterative procedure that estimates both the parameters and the missing or unobservable data during each iteration. The approach first computes an approximation to the expectation of the log-likelihood function of the complete data conditioned on the current parameter estimate, which is referred to as the expectation step (E-step). In this step, the current estimate of the incomplete data is calculated.
Next, a new parameter estimate is computed by finding the value of the parameter that maximizes the function found in the E-step. This is called the maximization step (M-step). In this study, the ideal MS and Pan images are treated as the missing data, and their covariances are treated as the parameters to be estimated by EM.

The remainder of this paper is organized as follows. The problem of fusing the MS image and the Pan image is formulated in Section 2. This formulation justifies the capability of CI to give a consistent and unbiased estimate for any degree of cross-correlation, which can be used to optimally fuse the images. In Section 3, the EM algorithm is developed to support CI, and a fast suboptimal linear weighting scheme that lets CI give a consistent estimate through convex combination is presented. Performance evaluation of the EM–CI method using real image sensory data is carried out in Section 4, showing that the EM–CI method preserves the spectral information very well. Section 5 concludes this paper.

2. Problem formulation

The observed MS image and the observed Pan image are denoted as $\tilde{x}_1$ and $\tilde{x}_2$, respectively, which are vectors composed of pixel values. To simplify the derivation, one band of the MS image, denoted $\tilde{x}_1$, is used as the running example; the other bands have the same expressions. Due to sensor distortion and noise in the capture process of the MS and Pan images, the observed images can be modeled as "linear observation plus Gaussian noise" [10,16]. That is,
$$\tilde{x}_i = H_i x_i + n_i, \quad i = 1, 2, \qquad (1)$$
where $x_i$ denotes the vector of pixel values from an ideal image, and $H_i$ represents the linear observation operator, which is a known observation matrix. Typical examples of the linear observation operator $H_i$ include optical blur, motion blur, and tomographic projections. $n_i$ is assumed to be zero-mean white Gaussian noise with covariance $\hat{P}_{x_i x_i}$, which is defined in the next paragraph.
From (1), it is noted that neither the ideal image $x_i$ nor its statistics are available. However, the consistent estimate $\hat{x}_i$ can be obtained. That is, if we define
$$\hat{P}_{x_i x_i} = (\tilde{x}_i - \hat{x}_i)(\tilde{x}_i - \hat{x}_i)^T, \quad i = 1, 2, \qquad (2)$$
and
$$P_{x_i x_i} = E[(x_i - \hat{x}_i)(x_i - \hat{x}_i)^T], \quad i = 1, 2, \qquad (3)$$
we have [11]
$$\hat{P}_{x_i x_i} \geq P_{x_i x_i}, \quad i = 1, 2. \qquad (4)$$
The inequality is in the sense of matrix positive definiteness, i.e., $A > B$ if and only if $A - B$ is positive definite. It should be noted that the matrix $\hat{P}_{x_i x_i}$ represents the estimation error covariance, which is considered to be equivalent to the covariance matrix of the noise $n_i$. We further define the cross-correlation as
$$P_{x_1 x_2} = E[(x_1 - \hat{x}_1)(x_2 - \hat{x}_2)^T]. \qquad (5)$$
Our objective is to construct a linear, unbiased estimate $\hat{x}$ that combines $\hat{x}_1$ and $\hat{x}_2$, i.e.,
$$\hat{x} = M_1 \hat{x}_1 + M_2 \hat{x}_2, \qquad (6)$$
so that
$$E[x - \hat{x}] = 0, \qquad (7)$$
provided that
$$M_1 + M_2 = I. \qquad (8)$$
In addition, we want to determine a consistent estimate $\hat{P}_{xx}$ for $P_{xx}$ (where $P_{xx} = E[(x - \hat{x})(x - \hat{x})^T]$), and to find a pair of $M_1$ and $M_2$ such that the upper bound $\hat{P}_{xx}$ is optimal in the sense of minimal trace.
According to the definition of $P_{xx}$, we have
$$P_{xx} = M_1 P_{x_1 x_1} M_1^T + M_1 P_{x_1 x_2} M_2^T + M_2 P_{x_2 x_1} M_1^T + M_2 P_{x_2 x_2} M_2^T. \qquad (9)$$
Similarly,
$$\hat{P}_{xx} = M_1 \hat{P}_{x_1 x_1} M_1^T + M_1 \hat{P}_{x_1 x_2} M_2^T + M_2 \hat{P}_{x_2 x_1} M_1^T + M_2 \hat{P}_{x_2 x_2} M_2^T. \qquad (10)$$
If $P_{x_1 x_2} = 0$, then for any given $M_1$ and $M_2$, the estimate
$$\hat{P}_{xx} = M_1 \hat{P}_{x_1 x_1} M_1^T + M_2 \hat{P}_{x_2 x_2} M_2^T \qquad (11)$$
will be consistent ($\hat{P}_{xx} \geq P_{xx}$) as a direct consequence of (4). The trace of the above $\hat{P}_{xx}$ is then minimized by
$$\hat{P}_{xx} = (\hat{P}_{x_1 x_1}^{-1} + \hat{P}_{x_2 x_2}^{-1})^{-1},$$
$$M_1 = \hat{P}_{xx} \hat{P}_{x_1 x_1}^{-1} = \hat{P}_{x_2 x_2}(\hat{P}_{x_1 x_1} + \hat{P}_{x_2 x_2})^{-1}, \qquad (12)$$
$$M_2 = \hat{P}_{xx} \hat{P}_{x_2 x_2}^{-1} = \hat{P}_{x_1 x_1}(\hat{P}_{x_1 x_1} + \hat{P}_{x_2 x_2})^{-1}.$$

Since the cross-correlation between $x_1$ and $x_2$ is generally non-zero and unknown, CI is employed here to deal with the situation when $P_{x_1 x_2} \neq 0$.

3. EM–CI for image fusion

By virtue of the statistical property of the noise $n_i$ in (1), we can compute the conditional probability density function (PDF) of $\tilde{x}_i$ as [1]
$$p(\tilde{x}_i | x_i) = (2\pi)^{-S/2} |\hat{P}_{x_i x_i}|^{-1/2} \exp\left\{-\frac{1}{2}[\tilde{x}_i - H_i x_i]^T \hat{P}_{x_i x_i}^{-1}[\tilde{x}_i - H_i x_i]\right\}, \quad i = 1, 2, \qquad (13)$$
where $S$ denotes the number of pixels within the vector $\tilde{x}_i$. Taking the logarithm of (13), the logarithmic likelihood function is thus
$$L(x_i | \tilde{x}_i) = -\frac{1}{2}\log|\hat{P}_{x_i x_i}| - \frac{S}{2}\log(2\pi) - \frac{1}{2}[\tilde{x}_i - H_i x_i]^T \hat{P}_{x_i x_i}^{-1}[\tilde{x}_i - H_i x_i], \quad i = 1, 2. \qquad (14)$$
As in factor analysis, $\hat{P}_{x_i x_i}$ is a diagonal matrix under the white Gaussian noise assumption. The E-step of EM computes the expected log-likelihood function, i.e.,
$$EP = E[L(x_i | \tilde{x}_i) | \tilde{x}_i], \quad i = 1, 2. \qquad (15)$$
In the M-step, $\hat{x}_i$ and $\hat{P}_{x_i x_i}$ are obtained by maximizing (15). More specifically, the partial derivatives of the expected log-likelihood function are set to zero. With the constant term ignored, we have
$$\frac{\partial EP}{\partial x_i} = H_i^T \hat{P}_{x_i x_i}^{-1} \tilde{x}_i - H_i^T \hat{P}_{x_i x_i}^{-1} H_i \hat{x}_i = 0, \quad i = 1, 2, \qquad (16)$$
$$\frac{\partial EP}{\partial \hat{P}_{x_i x_i}^{-1}} = -\frac{1}{2}\left[\tilde{x}_i \tilde{x}_i^T - 2 H_i \hat{x}_i \tilde{x}_i^T + H_i \hat{x}_i \hat{x}_i^T H_i^T\right] + \frac{1}{2}\hat{P}_{x_i x_i} = 0, \quad i = 1, 2. \qquad (17)$$
Solving these equations generates
$$\hat{x}_i = (H_i^T \hat{P}_{x_i x_i}^{-1} H_i)^{-1} H_i^T \hat{P}_{x_i x_i}^{-1} \tilde{x}_i, \quad i = 1, 2, \qquad (18)$$
and
$$\hat{P}_{x_i x_i} = \tilde{x}_i \tilde{x}_i^T - 2 H_i \hat{x}_i \tilde{x}_i^T + H_i \hat{x}_i \hat{x}_i^T H_i^T, \quad i = 1, 2. \qquad (19)$$
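To make the iteration concrete, the following is a minimal sketch, in Python with NumPy, of how the updates (18) and (19) could be alternated on one flattened image band. It assumes the image (or image block) is small enough for its covariance to be held in memory; the diagonal restriction follows the factor-analysis remark before (15), and the ridge term `eps` is our own addition for numerical stability. The function name `em_estimate` is illustrative, not from the paper.

```python
import numpy as np

def em_estimate(x_tilde, H, n_iter=20, eps=1e-6):
    """Sketch of the EM iteration of Eqs. (18)-(19) for one image band.

    x_tilde : observed band flattened into a 1-D vector of length S
    H       : known S x N observation matrix (identity if no blur model)
    Returns the ideal-image estimate x_hat and the (diagonal)
    estimation-error covariance P_hat.
    """
    S = x_tilde.size
    P_hat = np.eye(S)                       # initial covariance guess
    x_hat = np.zeros(H.shape[1])
    for _ in range(n_iter):
        P_inv = np.linalg.inv(P_hat)
        # Eq. (18): update the ideal-image estimate.
        A = H.T @ P_inv @ H
        x_hat = np.linalg.solve(A + eps * np.eye(A.shape[0]),
                                H.T @ P_inv @ x_tilde)
        # Eq. (19): update the error covariance.
        Hx = H @ x_hat
        P_hat = (np.outer(x_tilde, x_tilde)
                 - 2.0 * np.outer(Hx, x_tilde)
                 + np.outer(Hx, Hx))
        # Keep P_hat diagonal (white-noise assumption) and positive.
        P_hat = np.diag(np.clip(np.diag(P_hat), eps, None))
    return x_hat, P_hat
```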
Based on $\hat{x}_i$ and $\hat{P}_{x_i x_i}$, $i = 1, 2$, we can combine $\hat{x}$ and $\hat{P}_{xx}$ using CI. For a covariance matrix $P$, the covariance ellipse is the locus of the points
$$B_P(g) = \{x : (x - \bar{x})^T P^{-1} (x - \bar{x}) = g\}, \qquad (20)$$
where $g$ is a constant and $\bar{x}$ is the mean of $x$. The covariance ellipse is a convenient way of visualizing the relative size of covariance matrices. If $P_1 < P_2$, then $B_{P_1}(g) \subset B_{P_2}(g)$. From the geometric interpretation of (10), the covariance ellipse of $\hat{P}_{xx}$ always lies within the intersection of the covariance ellipses of $\hat{P}_{x_1 x_1}$ and $\hat{P}_{x_2 x_2}$, for any possible choice of $\hat{P}_{x_1 x_2}$, as shown in Fig. 1. Hence, a method which finds a $\hat{P}_{xx}$ that encloses the intersection region must be consistent even if there is no knowledge about $P_{x_1 x_2}$ [26,27]. When $P_{x_1 x_2}$ is unknown, in order to obtain an upper bound of (9), the inequality
$$E\left\{\left[\sqrt{c}\,M_1(x_1 - \hat{x}_1) - \frac{1}{\sqrt{c}}\,M_2(x_2 - \hat{x}_2)\right]\left[\sqrt{c}\,M_1(x_1 - \hat{x}_1) - \frac{1}{\sqrt{c}}\,M_2(x_2 - \hat{x}_2)\right]^T\right\} \geq 0 \qquad (21)$$
is utilized, where $c > 0$ is a scalar. It follows that
$$c\,M_1 P_{x_1 x_1} M_1^T + \frac{1}{c}\,M_2 P_{x_2 x_2} M_2^T \geq M_1 P_{x_1 x_2} M_2^T + M_2 P_{x_2 x_1} M_1^T. \qquad (22)$$
From (4) and (22), a consistent estimate $\hat{P}_{xx}$ for $P_{xx}$ can be obtained as
$$\hat{P}_{xx} = (1 + c)\,M_1 \hat{P}_{x_1 x_1} M_1^T + \left(1 + \frac{1}{c}\right) M_2 \hat{P}_{x_2 x_2} M_2^T. \qquad (23)$$

Fig. 1. The covariance $\hat{P}_{xx}$ always lies inside the intersection.

The value of $c$ is then chosen to minimize the trace of $\hat{P}_{xx}$. Note that
$$\mathrm{Tr}(\hat{P}_{xx}) = \mathrm{Tr}(M_1 \hat{P}_{x_1 x_1} M_1^T) + \mathrm{Tr}(M_2 \hat{P}_{x_2 x_2} M_2^T) + c\,\mathrm{Tr}(M_1 \hat{P}_{x_1 x_1} M_1^T) + \frac{1}{c}\,\mathrm{Tr}(M_2 \hat{P}_{x_2 x_2} M_2^T)$$
$$\geq \mathrm{Tr}(M_1 \hat{P}_{x_1 x_1} M_1^T) + \mathrm{Tr}(M_2 \hat{P}_{x_2 x_2} M_2^T) + 2\sqrt{\mathrm{Tr}(M_1 \hat{P}_{x_1 x_1} M_1^T)\,\mathrm{Tr}(M_2 \hat{P}_{x_2 x_2} M_2^T)}$$
$$= \left[\sqrt{\mathrm{Tr}(M_1 \hat{P}_{x_1 x_1} M_1^T)} + \sqrt{\mathrm{Tr}(M_2 \hat{P}_{x_2 x_2} M_2^T)}\right]^2, \qquad (24)$$
where the equality holds when
$$c = \sqrt{\frac{\mathrm{Tr}(M_2 \hat{P}_{x_2 x_2} M_2^T)}{\mathrm{Tr}(M_1 \hat{P}_{x_1 x_1} M_1^T)}}. \qquad (25)$$

For a particular choice, i.e., if $M_1 = \omega_1 \hat{P}_{xx} \hat{P}_{x_1 x_1}^{-1}$, $M_2 = \omega_2 \hat{P}_{xx} \hat{P}_{x_2 x_2}^{-1}$, and $c = \omega_2/\omega_1$, (23) can be further written as
$$\hat{P}_{xx} = \omega_1 \hat{P}_{xx} \hat{P}_{x_1 x_1}^{-1} \hat{P}_{xx} + \omega_2 \hat{P}_{xx} \hat{P}_{x_2 x_2}^{-1} \hat{P}_{xx} = \hat{P}_{xx}\left(\omega_1 \hat{P}_{x_1 x_1}^{-1} + \omega_2 \hat{P}_{x_2 x_2}^{-1}\right)\hat{P}_{xx}, \qquad (26)$$
which is satisfied by
$$\hat{P}_{xx}^{-1} = \omega_1 \hat{P}_{x_1 x_1}^{-1} + \omega_2 \hat{P}_{x_2 x_2}^{-1}. \qquad (27)$$
Meanwhile, (6) becomes
$$\hat{x} = \omega_1 \hat{P}_{xx} \hat{P}_{x_1 x_1}^{-1} \hat{x}_1 + \omega_2 \hat{P}_{xx} \hat{P}_{x_2 x_2}^{-1} \hat{x}_2 \qquad (28)$$
with nonnegative coefficients $\omega_1$ and $\omega_2$ obeying
$$\omega_1 + \omega_2 = 1. \qquad (29)$$
Here, (27) and (28) are obtained as the CI equations.
The weighting coefficients $\omega_1$ and $\omega_2$ are usually chosen to minimize the trace of $\hat{P}_{xx}$. This minimization requires the optimization of a nonlinear cost function which is convex with respect to $\omega_1$ and $\omega_2$ [18]. The minimized trace is obtained by an exhaustive search of $\omega_1$ and $\omega_2$ within the range $[0, 1]$, which carries a high computational burden. A suboptimal non-iterative linear algorithm is therefore desirable. It is well known that the trace of the covariance matrix $\hat{P}_{x_i x_i}$, $i = 1, 2$, provides an uncertainty measure for the estimate $\hat{x}_i$. If $\mathrm{Tr}(\hat{P}_{x_1 x_1}) \ll \mathrm{Tr}(\hat{P}_{x_2 x_2})$, it follows that $\hat{x}_2$ has a much larger estimation error than $\hat{x}_1$. We thus have $\hat{x} \approx \hat{P}_{xx} \hat{P}_{x_1 x_1}^{-1} \hat{x}_1$, which implies $\omega_1 \approx 1$ and $\omega_2 \approx 0$. Similarly, if $\mathrm{Tr}(\hat{P}_{x_1 x_1}) = \mathrm{Tr}(\hat{P}_{x_2 x_2})$, it is expected that $\hat{x} \approx \frac{1}{2}\hat{P}_{xx} \hat{P}_{x_1 x_1}^{-1} \hat{x}_1 + \frac{1}{2}\hat{P}_{xx} \hat{P}_{x_2 x_2}^{-1} \hat{x}_2$; in other words, $\omega_1 = \omega_2 = \frac{1}{2}$. Considering these, an additional linear constraint
$$\omega_1 \mathrm{Tr}(\hat{P}_{x_1 x_1}) - \omega_2 \mathrm{Tr}(\hat{P}_{x_2 x_2}) = 0 \qquad (30)$$
is proposed. Substituting (29) into (30) gives
$$\omega_1 \mathrm{Tr}(\hat{P}_{x_1 x_1}) - (1 - \omega_1)\,\mathrm{Tr}(\hat{P}_{x_2 x_2}) = 0,$$
$$(1 - \omega_2)\,\mathrm{Tr}(\hat{P}_{x_1 x_1}) - \omega_2\,\mathrm{Tr}(\hat{P}_{x_2 x_2}) = 0. \qquad (31)$$
When $\mathrm{Tr}(\hat{P}_{x_1 x_1}) + \mathrm{Tr}(\hat{P}_{x_2 x_2}) > 0$, $\omega_1$ and $\omega_2$ can be computed by
$$\omega_1 = \frac{\mathrm{Tr}(\hat{P}_{x_2 x_2})}{\mathrm{Tr}(\hat{P}_{x_1 x_1}) + \mathrm{Tr}(\hat{P}_{x_2 x_2})}, \qquad \omega_2 = \frac{\mathrm{Tr}(\hat{P}_{x_1 x_1})}{\mathrm{Tr}(\hat{P}_{x_1 x_1}) + \mathrm{Tr}(\hat{P}_{x_2 x_2})}. \qquad (32)$$
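The fast CI combination then reduces to a handful of matrix operations. Below is a minimal sketch implementing (27), (28), and (32), under the assumption that both covariance estimates are invertible; the function name `fast_ci_fuse` is our own.

```python
import numpy as np

def fast_ci_fuse(x1, P1, x2, P2):
    """Sketch of the fast (suboptimal linear) covariance intersection.

    x1, x2 : estimates from the two sources (1-D vectors)
    P1, P2 : their estimation-error covariances
    Returns the fused estimate (28) and fused covariance (27),
    with weights from the trace-based rule (32).
    """
    t1, t2 = np.trace(P1), np.trace(P2)
    w1 = t2 / (t1 + t2)   # Eq. (32): the noisier source gets the smaller weight
    w2 = t1 / (t1 + t2)
    P1_inv = np.linalg.inv(P1)
    P2_inv = np.linalg.inv(P2)
    P_fused = np.linalg.inv(w1 * P1_inv + w2 * P2_inv)          # Eq. (27)
    x_fused = P_fused @ (w1 * P1_inv @ x1 + w2 * P2_inv @ x2)   # Eq. (28)
    return x_fused, P_fused
```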
It is noted that $\mathrm{Tr}(\hat{P}_{x_i x_i}) \geq 0$ ensures that $0 \leq \omega_i \leq 1$, $i = 1, 2$.

4. Performance evaluation

We treat image fusion as an estimation problem in this paper. The observed images (the MS and Pan images) are modeled as a linear transformation of the ideal images plus Gaussian noise. The goal is to recover the ideal images and combine them through a convex combination of the inverses of the covariance matrices. We use the trace of the covariance matrix of the estimation error as the metric of image fusion performance. The exact expression of this metric cannot be derived, because the cross-correlation between $x_1$ and $x_2$ cannot be obtained, but an upper bound is available from the CI theory. To minimize the upper bound, the covariance matrix $\hat{P}_{x_i x_i}$ and the unbiased estimate $\hat{x}_i$ are estimated using the EM algorithm. The weights $\omega_1$ and $\omega_2$ for the observed images can be obtained by an exhaustive search; to reduce the computational complexity of finding them, we also derived a suboptimal linear algorithm.

In this paper, QuickBird MS and Pan images are used to evaluate the performance of the proposed EM–CI method. The spatial resolution of the MS image is 2.8 m, while that of the Pan image is 0.7 m. The low-resolution MS image is
re-sampled using cubic interpolation to the same size as the high-resolution Pan image, as shown in Figs. 2(a) and 3(a). The Pan images are shown in Figs. 2(b) and 3(b). Our objective is to increase the spatial resolution of the MS image by injecting the spatial details of the Pan image into the MS image, while preserving the spectral information of the MS image as much as possible.
Fig. 2. Experiments on the first pair of QuickBird images: (a) the re-sampled multi-spectral image; (b) the panchromatic image; (c) the fused image by EM–CI; (d) the fused image by WT; (e) the fused image by PCA.
Fig. 3. Experiments on the second pair of QuickBird images: (a) the re-sampled multi-spectral image; (b) the panchromatic image; (c) the fused image by EM–CI; (d) the fused image by WT; (e) the fused image by PCA.
It is known that the spatial detail of a Pan image is mostly carried by its high-frequency components, while the spectral information of an MS image is mostly carried by its low-frequency components. If the high-frequency components of the MS image are simply substituted by the high-frequency components of the Pan image [33,15], the spatial resolution is improved, but the spectral information contained in the high-frequency components of the MS image is lost. To avoid this, WT is first applied to the MS and Pan images to extract the spatial detail and spectral information, respectively. Then, keeping the spectral information of the MS image untouched, the spatial details of the MS and Pan images are fused using EM–CI. After the inverse WT, the fused images are obtained, as shown in Figs. 2(c) and 3(c). To further minimize the spectral distortion, histogram matching [17] is applied to the Pan image, band by band, so that its brightness and contrast best match those of each MS band. The fusion of the Pan details is applied at a fixed decomposition scale determined by the spatial resolutions of the Pan and MS images (the base-2 logarithm of the scale ratio).
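As a rough illustration of how these pieces could fit together, the sketch below uses the PyWavelets package together with the `em_estimate` and `fast_ci_fuse` sketches given earlier. Several simplifications are ours: the observation operator $H$ is taken to be the identity, whole detail subbands are processed at once (a practical implementation would work block-wise and exploit the diagonal covariance structure), and the wavelet `db4` and the function name `pansharpen_band` are illustrative choices rather than the paper's.

```python
import numpy as np
import pywt

def pansharpen_band(ms_band, pan, levels=2):
    """Sketch of the EM-CI pansharpening pipeline for one MS band.

    ms_band : one band of the re-sampled MS image (2-D, same size as pan)
    pan     : histogram-matched Pan image (2-D array)
    levels  : decomposition scale (log2 of the Pan/MS resolution ratio)
    """
    ms_coeffs = pywt.wavedec2(ms_band, 'db4', level=levels)
    pan_coeffs = pywt.wavedec2(pan, 'db4', level=levels)

    # Keep the MS approximation (spectral content) untouched.
    fused = [ms_coeffs[0]]

    # Fuse the detail subbands of MS and Pan with CI instead of replacement.
    for ms_details, pan_details in zip(ms_coeffs[1:], pan_coeffs[1:]):
        fused_details = []
        for d_ms, d_pan in zip(ms_details, pan_details):
            shape = d_ms.shape
            # EM gives the estimates and error covariances (H = identity here).
            x1, P1 = em_estimate(d_ms.ravel(), np.eye(d_ms.size))
            x2, P2 = em_estimate(d_pan.ravel(), np.eye(d_pan.size))
            x_fused, _ = fast_ci_fuse(x1, P1, x2, P2)
            fused_details.append(x_fused.reshape(shape))
        fused.append(tuple(fused_details))

    return pywt.waverec2(fused, 'db4')
```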
The performance of EM–CI is compared with two conventional methods. The first is the WT method: WT is applied to each MS band and to the Pan image; band by band, the approximations are kept untouched while the spatial details are replaced by the corresponding coefficients of the Pan image; the inverse WT then yields the fused images, shown in Figs. 2(d) and 3(d). The second method is based on PCA. The MS image is transformed with PCA to obtain the principal components, and the first principal component of the MS image is replaced with the histogram-matched Pan image. The fused images are obtained when the new first principal component and the remaining principal components are transformed back with the inverse PCA transform, and are shown in Figs. 2(e) and 3(e). In these figures, the WT and EM–CI methods use the same WT decomposition scale.

From these figures, it is observed that the fused images using EM–CI preserve most of the spectral information of the MS images while improving the spatial resolution. For the fused images by WT, the spectral performance is slightly inferior to that of EM–CI, while the spatial performance is improved. The PCA results show serious spectral distortion, especially in the white areas of the fused image, although the spatial details are also improved.

Besides the subjective evaluation, a quantitative assessment of the fusion performance is also necessary. Two sets of criteria are used in our study to evaluate the spectral and spatial performance, respectively. The spectral quality of the fused image is measured by the correlation coefficient (CC) and the spectral discrepancy (SPD). CC describes the degree of correlation between two images, providing a similarity measure of the spectral information between the fused image and the multi-spectral image. For each band, CC is defined as
$$CC = \frac{\sum_{m=1}^{M}\sum_{n=1}^{N}[MSF(m,n) - \mu][MSR(m,n) - \mu_{MSR}]}{\sqrt{\sum_{m=1}^{M}\sum_{n=1}^{N}[MSF(m,n) - \mu]^2 \sum_{m=1}^{M}\sum_{n=1}^{N}[MSR(m,n) - \mu_{MSR}]^2}}, \qquad (33)$$
where $MSF(m,n)$ and $MSR(m,n)$ denote the pixel values of the fused MS image and the reference image at position $(m,n)$, respectively, $M \times N$ is the image size, and $\mu$ and $\mu_{MSR}$ are the intensity averages of $MSF$ and $MSR$, respectively. Because no MS image at 0.7 m is available for comparison, following the Wald protocol [29], the original MS image at the resolution of 2.8 m is used as $MSR$. The fused image $MSF$ is obtained by fusing the MS and Pan images at a degraded scale. More specifically, the original MS image at 2.8 m resolution is degraded to 11.2 m, and the resolution of the original Pan image is reduced from 0.7 m to 2.8 m. The degraded MS and Pan images are then fused to generate $MSF$ at 2.8 m resolution, and CC is computed using $MSF$ and $MSR$ at this degraded scale, i.e., 2.8 m. The desirable value of CC is 1. At each band, SPD is computed by
$$SPD = \frac{\sum_{m,n}|MSF(m,n) - MSR(m,n)|}{MN}. \qquad (34)$$
A small SPD indicates good spectral performance. The indices defined above only evaluate the spectral difference between corresponding bands of the original and fused images. To evaluate the global spectral quality, the ERGAS coefficient is calculated as a global description of the spectral quality over all the fused multi-spectral bands. It is defined as
$$ERGAS = 100\,\frac{d_h}{d_l}\sqrt{\frac{1}{K}\sum_{k=1}^{K}\left[\frac{RMSE(k)}{\mu(k)}\right]^2}, \qquad (35)$$
where $d_h/d_l$ is the ratio between the resolutions of the high-resolution Pan image and the low-resolution MS image (for QuickBird data, $d_h/d_l = 1/4$), $\mu(k)$ is the average value of the $k$th band, and $K$ is the number of bands. The $RMSE(k)$ of each spectral band is computed by
$$RMSE(k) = \sqrt{\frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N}[MSF_k(m,n) - MSR_k(m,n)]^2}. \qquad (36)$$
Lower values of the ERGAS coefficient imply higher spectral quality of the fused image; the desirable value of ERGAS is therefore 0. For the spatial quality evaluation, we use the average gradient (AG) index. AG describes the changing features of the image texture and the detailed information; larger values of the AG index correspond to higher spatial resolution. The AG index of the fused images at each band can be computed by
$$AG = \frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N}\sqrt{\frac{\left[\frac{\partial MSF(m,n)}{\partial m}\right]^2 + \left[\frac{\partial MSF(m,n)}{\partial n}\right]^2}{2}}. \qquad (37)$$
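These four indices translate directly into code. The following is a minimal sketch assuming each band is given as a 2-D float array; in (35) we take $\mu(k)$ to be the mean of the reference band (the text leaves the choice implicit), and the partial derivatives in (37) are approximated by finite differences via `np.gradient`.

```python
import numpy as np

def cc(msf, msr):
    """Correlation coefficient between fused and reference bands, Eq. (33)."""
    f, r = msf - msf.mean(), msr - msr.mean()
    return (f * r).sum() / np.sqrt((f ** 2).sum() * (r ** 2).sum())

def spd(msf, msr):
    """Spectral discrepancy, Eq. (34)."""
    return np.abs(msf - msr).mean()

def ergas(msf_bands, msr_bands, ratio=1/4):
    """Global ERGAS index over all bands, Eqs. (35)-(36).

    msf_bands, msr_bands : lists of 2-D arrays, one per band
    ratio                : d_h/d_l (1/4 for QuickBird)
    """
    terms = []
    for f, r in zip(msf_bands, msr_bands):
        rmse = np.sqrt(((f - r) ** 2).mean())       # Eq. (36)
        terms.append((rmse / r.mean()) ** 2)        # mu(k): reference-band mean
    return 100.0 * ratio * np.sqrt(np.mean(terms))  # Eq. (35)

def ag(msf):
    """Average gradient, Eq. (37), with finite-difference gradients."""
    gm, gn = np.gradient(msf.astype(float))
    return np.sqrt((gm ** 2 + gn ** 2) / 2.0).mean()
```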
Table 1
Objective evaluation of fusion performance using the first pair of QuickBird images. CC, SPD, and ERGAS measure spectral quality; AG measures spatial quality.

Fusion method   Band      CC        SPD    ERGAS        AG
EM–CI           R     0.9801     6.7148   2.6290   25.3757
                G     0.9770     6.2489            20.9784
                B     0.9799     6.8925            25.7214
WT              R     0.9489    11.1940   4.1885   28.1817
                G     0.9418    10.1281            23.6103
                B     0.9474    11.5785            28.9022
PCA             R     0.8972    13.7137   6.2148   25.3470
                G     0.8721    12.5861            23.4976
                B     0.8733    15.1401            26.0551
Table 2
Objective evaluation of fusion performance using the second pair of QuickBird images. CC, SPD, and ERGAS measure spectral quality; AG measures spatial quality.

Fusion method   Band      CC        SPD    ERGAS        AG
EM–CI           R     0.9775     6.7511   3.4483   23.5711
                G     0.9762     6.7138            21.0510
                B     0.9792     6.9887            24.1559
WT              R     0.9463    10.8203   5.3411   26.3574
                G     0.9428    10.5158            23.3114
                B     0.9478    11.3393            27.4033
PCA             R     0.9117    13.2300   6.7806   24.6685
                G     0.9226    11.9986            23.7542
                B     0.8977    14.7983            25.9798
The results of these objective evaluations on the two pairs of test images are listed in Tables 1 and 2. From these values, it is clear that the spectral distortion introduced by EM–CI is less than that of WT and PCA for both the local and the global evaluation of spectral quality. For the spatial performance, EM–CI is slightly inferior to WT and PCA. As the human visual system is more sensitive to spatial resolution than to spectral content, the results of the EM–CI approach may not appear significantly better than those of the WT approach. The WT method preserves the detailed spatial information through the multi-scale decomposition and the complete replacement by the Pan image at each decomposition scale. However, due to this replacement, the partial spectral information still contained in the high-frequency components of the MS image is lost. For PCA, the simple replacement results in a serious loss of spectral information, because the first principal component of the MS image, which contains much spectral information, is completely discarded. In contrast, EM–CI utilizes all the information from the two source images while keeping the low-frequency components of the MS image untouched. In addition, EM–CI fuses the two images through convex combination with optimized weights so as to minimize the spectral distortion. These two properties explain the superior spectral performance of EM–CI over WT and PCA. Meanwhile, EM–CI optimally injects the spatial details of the Pan image into the MS image, so the detailed information from both images is preserved. Overall, EM–CI is capable of enhancing the spatial quality of the MS image while preserving the spectral content to a greater extent: compared with the WT and PCA methods, the proposed EM–CI method preserves more spectral information at the cost of a slightly smaller improvement in spatial quality.

5. Conclusions

Covariance intersection, which combines multiple data sources through convex combinations for any degree of cross-correlation, is one of the most popular information fusion approaches. In this paper, we propose a novel EM–CI approach for the fusion of MS and Pan images. More specifically, the ideal MS and Pan images are estimated by EM along with the covariance matrices of the estimation error. Then, CI is applied to combine the two images and provide a consistent estimate of the high-resolution MS image. The proposed method accurately preserves both the spectral information and the high-resolution spatial information. Its effectiveness is demonstrated with real remote sensing images.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant Nos. 10674038 and 10974039, and jointly supported by the State Scholarship of the China Scholarship Council for study abroad (Grant No. [2007]3020).
Ms. Qing Guo thanks Rob Edwards for his help with the English wording of the manuscript. The authors would like to thank the anonymous reviewers for their comments and suggestions.

References

[1] J.A. Bilmes, A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models, Technical Report ICSI-TR-97-021, University of California, Berkeley, 1998.
[2] F. Bovolo, L. Bruzzone, L. Capobianco, A. Garzelli, S. Marchesi, F. Nencini, Analysis of the effects of pansharpening in change detection on VHR images, IEEE Geoscience and Remote Sensing Letters 7 (1) (2010) 53–57.
[3] D.M. Bulanon, T.F. Burks, V. Alchanatis, Image fusion of visible and thermal images for fruit detection, Biosystems Engineering 103 (2009) 12–22.
[4] W.J. Carper, T.M. Lillesand, R.W. Kiefer, The use of intensity-hue-saturation transformations for merging SPOT panchromatic and multispectral image data, Photogrammetric Engineering and Remote Sensing 56 (1990) 459–467.
[5] P.S. Chavez, A.Y. Kwarteng, Extracting spectral contrast in Landsat Thematic Mapper image data using selective component analysis, Photogrammetric Engineering and Remote Sensing 55 (1989) 339–348.
[6] P.S. Chavez, S.C. Sildes, J.A. Anderson, Comparison of three different methods to merge multiresolution and multispectral data: Landsat TM and SPOT panchromatic, Photogrammetric Engineering and Remote Sensing 57 (1991) 295–303.
[7] D.L. Civco, Y. Wang, J.A. Silander, Characterizing forest ecosystems in Connecticut by integrating Landsat TM and SPOT panchromatic data, in: Proceedings of the ACSM/ASPRS Annual Convention, Charlotte, NC, vol. 2, ASPRS, Reston, VA, 1995, pp. 216–224.
[8] P. Dempster, N. Laird, D. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society Series B 39 (1977) 1–38.
[9] B.G. Duport, The use of multiresolution analysis and wavelet transform for merging SPOT panchromatic and multispectral image data, Photogrammetric Engineering and Remote Sensing 62 (1996) 1057–1066.
[10] M.A.T. Figueiredo, R.D. Nowak, An EM algorithm for wavelet-based image restoration, IEEE Transactions on Image Processing 12 (8) (2003) 906–916.
[11] A.H. Jazwinski, Stochastic Processes and Filtering Theory, Academic Press, 1970.
[12] S.J. Julier, J.K. Uhlmann, A non-divergent estimation algorithm in the presence of unknown correlations, in: Proceedings of the American Control Conference, vol. 4, 1997, pp. 2369–2373.
[13] M.M. Khan, J. Chanussot, L. Alparone, Hyperspectral pansharpening using QNR optimization constraint, in: WHISPERS Conference, 2009.
[14] H. Li, B.S. Manjunath, S.K. Mitra, Multisensor image fusion using the wavelet transform, Graphical Models and Image Processing 57 (1995) 235–245.
[15] S. Li, J.T. Kwok, Y. Wang, Using the discrete wavelet frame transform to merge Landsat TM and SPOT panchromatic images, Information Fusion 3 (2002) 17–23.
[16] J.S. Lim, Two-Dimensional Signal and Image Processing, Prentice Hall Press, New Jersey, USA, 1990.
[17] J. Morovic, J. Shaw, P.L. Sun, A fast, non-iterative and exact histogram matching algorithm, Pattern Recognition Letters 23 (2002) 127–135.
[18] W. Niehsen, Information fusion based on fast covariance intersection filtering, in: Proceedings of the Fifth International Conference on Information Fusion, vol. 2, 2002, pp. 901–904.
[19] J. Núñez, X. Otazu, O. Fors, A. Prades, V. Palà, R. Arbiol, Multiresolution-based image fusion with additive wavelet decomposition, IEEE Transactions on Geoscience and Remote Sensing 37 (1999) 1204–1211.
[20] C. Pohl, J.L. Van Genderen, Multisensor image fusion in remote sensing: concepts, methods, and applications, International Journal of Remote Sensing 19 (1998) 823–854.
[21] L.I. Perlovsky, Cognitive high level information fusion, Information Sciences 177 (2007) 2099–2118.
[22] N. Prasad, S. Saran, S.P.S. Kushwaha, P.S. Roy, Evaluation of various image fusion techniques and imaging scales for forest features interpretation, Current Science 81 (2001) 1218–1224.
[23] P. Scheunders, S.D. Baeker, Fusion and merging of multispectral images using multiscale fundamental forms, Journal of the Optical Society of America A 18 (2001) 2468–2477.
[24] V.P. Shah, N.H. Younan, R.L. King, An efficient pan-sharpening method via a combined adaptive PCA approach and contourlets, IEEE Transactions on Geoscience and Remote Sensing 46 (5) (2008) 1323–1335.
[25] V.K. Shettigara, A generalized component substitution technique for spatial enhancement of multispectral images using a higher resolution data set, Photogrammetric Engineering and Remote Sensing 58 (1992) 561–567.
[26] J.K. Uhlmann, Dynamic map building and localization: new theoretical foundations, PhD Thesis, University of Oxford, 1995.
[27] J. Uhlmann, General data fusion for estimates with unknown cross covariances, in: Proceedings of the SPIE Aerosense Conference, 1996, pp. 165–173.
[28] N.G. Wah, Y. Rong, Comparison of decentralized tracking algorithms, in: Proceedings of the Sixth International Conference on Information Fusion, 2003, pp. 107–113.
[29] L. Wald, T. Ranchin, M. Mangolini, Fusion of satellite images of different spatial resolutions: assessing the quality of resulting images, Photogrammetric Engineering and Remote Sensing 63 (6) (1997) 691–699.
[30] R.R. Yager, A framework for multi-source data fusion, Information Sciences 163 (2004) 175–200.
[31] G. Yang, Y. Lin, P. Bhattacharya, A driver fatigue recognition model based on information fusion and dynamic Bayesian network, Information Sciences 180 (2010) 1942–1954.
[32] D.A. Yocky, Multiresolution wavelet decomposition image merger of Landsat Thematic Mapper and SPOT panchromatic data, Photogrammetric Engineering and Remote Sensing 62 (1996) 1067–1074.
[33] J. Zhou, D.L. Civco, J.A. Silander, A wavelet transform method to merge Landsat TM and SPOT panchromatic data, International Journal of Remote Sensing 19 (1998) 743–757.