SALIENCE PRESERVING IMAGE FUSION WITH DYNAMIC RANGE COMPRESSION

Chao Wang (1), Qiong Yang (2), Xiaoou Tang (2) and Zhongfu Ye (1)

1. Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, China
2. Microsoft Research Asia, Beijing Sigma Center, Zhichun Road, Hai Dian District, Beijing, China
ABSTRACT

Gradient conveys important salient features in images. Traditional gradient-based fusion methods generally treat the gradients from multiple channels as a multi-valued vector and compute its global statistics under the assumption of identical distribution. However, different source channels may reflect different important salient features, and their gradients are, in general, non-identically distributed. This prevents existing methods from preserving salience successfully. In this paper, we propose to fuse the gradients from multiple channels according to their saliency. We first measure the salience map of each channel's gradient, and then use the saliency to weight each channel's contribution when computing the global statistics. Gradients with high saliency are properly highlighted in the target gradient, and salient features in the sources are thereby well preserved. Furthermore, we handle the dynamic range problem by applying range compression to the target gradient, which effectively reduces halo artifacts.
Fig. 1. PCA on i.i.d. and non-i.i.d. samples. Blue dot: sample of class 1; green dot: sample of class 2; red line: component axis. (a) PCA on i.i.d. samples; (b) PCA on double-distribution samples; (c) weighted PCA on the double-distribution samples in (b), where the weight of each sample is the square of its l2-norm.
1. INTRODUCTION

Multi-channel image fusion has been an area of intense research activity in recent years. A number of fusion strategies exist for the general image fusion problem. One is preset weighted fusion, which uses a preset weight to average multiple bands into a single-channel image. With this strategy, salient features in different bands may be completely obliterated when the weights are chosen inappropriately, and it is always difficult to determine a priori which bands should be emphasized over others. Another is the fusion decision map, often used in multi-resolution methods based on pyramidal decompositions [4,7] and the wavelet transform [1,2,3]. It usually selects the "most relevant" band to be incorporated into the fused image. However, it is sensitive to outliers. Zhang and Blum [2] used a weight based on the activity level and the similarity between bands to fuse the multiple sources, but they did not take global statistics into account. Therefore, the fused result is not guaranteed to be statistically optimal. Furthermore, as the number of bands increases, the weighting strategy becomes remarkably more complex.

Statistical information about multi-band images has been investigated in many tasks, such as visualization of remote sensing imagery and multimodal medical imaging. Multi-band images are generally treated as a multi-valued vector, and their global statistics are analyzed. For instance, principal component analysis (PCA) is used to find the linear mapping function that is optimal in the statistical sense. Following the widely accepted assumption that the human visual system is more sensitive to local intensity changes than to absolute luminance [8], Socolinsky and Wolff [5,6] utilized the global statistics of the multi-channel image's gradients instead of its intensities. They constructed a contrast vector field for multispectral image visualization, and reported that it reveals significantly more interpretive information than intensity-based statistical methods.
Fig. 2. The origin of halo in gradient-field visualization: a 1D case. f1 and f2 are two source signals, df1 and df2 are their corresponding gradients, and df is the fused gradient using the max operation; f and fd are the reconstruction results without and with dynamic range constraints.

In Socolinsky's method, the multiple channels are assumed to be identically distributed. However, gradients from different channels may reflect different important features, and their distributions are, in general, non-identical. Thus, when a minority of channels embody the important features, such features may be obliterated by the unimportant ones. For example, in Fig. 3 the source channels (a)-(c) do not clearly show the books on the bottom shelves behind the right chair, but (d) and (e) contain this information. The identical-distribution assumption may cause the salient features from channels (d) and (e) to be obliterated by the other channels. For an illustrative view, we take PCA as an example (Fig. 1), since Socolinsky's target gradient can be deemed the principal component of all source channels' gradients, up to a normalization factor, when the Euclidean metric is used. Assume that the samples are the gradients at a given pixel from multiple channels. In Fig. 1a, all channels manifest salient features; PCA finds their principal component and thereby gives a good representation of the salient feature. But when some channels have small gradients (green dots in Fig. 1b), the PCA results become meaningless (Fig. 1b). Therefore, different weights should be assigned to different channels: the more salient the features a channel conveys, the larger the weight assigned to it. This protects the important information from being obliterated by the unimportant, and thereby properly preserves the salience of the channels in the fusion process. This is shown in Fig. 1c, where the principal axis is quite close to that in Fig. 1a when we weight each sample by the square of its l2-norm.
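To make the weighting idea of Fig. 1 concrete, the following minimal numpy sketch (on illustrative synthetic data, not the paper's) compares the principal axis computed with and without the squared-l2-norm sample weights:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for Fig. 1b: a few large-gradient samples along a
# salient direction (class 1) and many small, less informative ones (class 2).
class1 = rng.normal(0.0, 1.0, (100, 2)) * np.array([3.0, 0.3])
class2 = rng.normal(0.0, 1.0, (5000, 2)) * np.array([0.1, 0.6])
samples = np.vstack([class1, class2])

def principal_axis(x, w=None):
    """First eigenvector of the (optionally weighted) second-moment matrix."""
    w = np.ones(len(x)) if w is None else w
    m = (x * w[:, None]).T @ x / w.sum()
    vals, vecs = np.linalg.eigh(m)
    return vecs[:, np.argmax(vals)]

print(principal_axis(samples))                                # pulled toward the many small samples
print(principal_axis(samples, np.sum(samples ** 2, axis=1)))  # recovers the salient axis, as in Fig. 1c
```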
Fig. 3. Multi-exposure image fusion: room. (a)-(e) are source channels [9]; (f)-(i) are fusion results by DWT, Socolinsky's method, the proposed method, and Goshtasby's method.

In addition, since the images are fused in the gradient field rather than directly by intensity, the visualization may produce a result whose dynamic range exceeds the permitted one. In such a case, when the dynamic range constraint is enforced, halo occurs. A 1D example is shown in Fig. 2. Assume the dynamic range is [0, 1]. f1 and f2 are two source signals with the same step at different locations. After their gradients df1 and df2 are fused by the max operation, denoted df, the direct reconstruction leads to a result f exceeding the range [0, 1]. When the dynamic range limitation is then imposed, the hard constraint acts like a stick being forced to bend, and halos occur at the sharp jumps in the result fd. Such halos degrade visual perception, as around the door frame in Fig. 3g. Thus, the range of the target gradient should be controlled for halo reduction.

In this paper, we propose to assign each channel's gradient an importance weight when computing the global statistics. We first measure the salience map of each channel, and then compute the importance weight based on the saliency. After that, we use the weighted gradients to construct the contrast form, so that gradients with high saliency are properly highlighted in the target gradient, and salient features in the sources are thereby well preserved, similar to Fig. 1c. To further improve the result, we apply dynamic range compression to the target gradient, so that halos are effectively reduced.

2. SALIENCE PRESERVING FUSION

2.1. Gradient fusion with salience map

2.1.1. Importance weights based on salience map

Denote N registered image channels as {f_k, k = 1, 2, ..., N}, and let Ω be the region of the whole image. All source channels are adjusted
to have the same mean gradient as the channel with the maximum mean gradient, so that they can be compared and computed on the same level. For channel f_k and pixel p, we measure the salience of p in f_k as follows [11]:

S_k^0(p) = \operatorname{mean}_{q \in \Theta_p} \{ d(f_k(p), f_k(q)) \},   (1)

S_k(p) = \operatorname{rescale}\left\{ 1 - S_k^0(p) / \max_{q \in \Omega} S_k^0(q) \right\},   (2)

where Θ_p is the neighborhood of pixel p; rescale is an operation that keeps the dynamic range of S_k within [0, 1], \operatorname{rescale}(A) = (A - A_{\min})/(A_{\max} - A_{\min}); and d(a, b) is defined as d(a, b) = e^{-(b-a)^2 / 2\sigma^2}. In this paper, we set Θ_p to be a 5 × 5 neighborhood of p, and σ² = 100. Such an S_k(p) represents the contrast around p, and thus measures the local salience. An S_k(p) closer to 1 means pixel p is more important within channel k. We compare all the salience maps S_k and assign a normalized weight to each pixel in each channel:

\omega_k(p) = \frac{S_k(p)^n}{\sqrt{\sum_{l=1}^{N} S_l(p)^{2n}}}.   (3)

Here, ω_k is defined to be the importance weight of channel k. The positive parameter n controls the degree to which the fused gradient resonates with channels of high salience.
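As a concrete reading of eqs. (1)-(3), here is a minimal numpy sketch. The border handling (reflection padding) and the exponent n = 2 are our assumptions, since the paper fixes only Θ_p and σ²:

```python
import numpy as np

def salience_map(f, sigma2=100.0, radius=2):
    """Eqs. (1)-(2): local salience of one channel f (float HxW array).
    radius=2 gives the paper's 5x5 neighborhood; reflection padding at the
    image borders is an assumed detail."""
    H, W = f.shape
    padded = np.pad(f, radius, mode="reflect")
    acc = np.zeros_like(f)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            q = padded[radius + dy : radius + dy + H, radius + dx : radius + dx + W]
            acc += np.exp(-((q - f) ** 2) / (2.0 * sigma2))    # d(f(p), f(q))
    s0 = acc / (2 * radius + 1) ** 2                           # eq. (1)
    s = 1.0 - s0 / s0.max()                                    # eq. (2), inner term
    return (s - s.min()) / (s.max() - s.min() + 1e-12)         # rescale to [0, 1]

def importance_weights(channels, n=2.0):
    """Eq. (3): per-pixel weights across the N channels (n = 2 assumed)."""
    T = np.stack([salience_map(f) for f in channels]) ** n     # S_k(p)^n
    return T / np.sqrt((T ** 2).sum(axis=0) + 1e-12)           # / sqrt(sum_l S_l^(2n))
```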
2.1.2. Importance-weighted contrast form

We then construct the importance-weighted contrast form as follows:

C(p) = \begin{bmatrix} \sum_k \left( \omega_k(p) \frac{\partial f_k}{\partial x} \right)^2 & \sum_k \omega_k^2(p) \frac{\partial f_k}{\partial x} \frac{\partial f_k}{\partial y} \\ \sum_k \omega_k^2(p) \frac{\partial f_k}{\partial x} \frac{\partial f_k}{\partial y} & \sum_k \left( \omega_k(p) \frac{\partial f_k}{\partial y} \right)^2 \end{bmatrix}.   (4)

It can be rewritten as

C(p) = \begin{bmatrix} \omega_1 \frac{\partial f_1}{\partial x} & \omega_2 \frac{\partial f_2}{\partial x} & \cdots & \omega_N \frac{\partial f_N}{\partial x} \\ \omega_1 \frac{\partial f_1}{\partial y} & \omega_2 \frac{\partial f_2}{\partial y} & \cdots & \omega_N \frac{\partial f_N}{\partial y} \end{bmatrix} \begin{bmatrix} \omega_1 \frac{\partial f_1}{\partial x} & \omega_1 \frac{\partial f_1}{\partial y} \\ \omega_2 \frac{\partial f_2}{\partial x} & \omega_2 \frac{\partial f_2}{\partial y} \\ \vdots & \vdots \\ \omega_N \frac{\partial f_N}{\partial x} & \omega_N \frac{\partial f_N}{\partial y} \end{bmatrix}.
Fig. 4. Multi-exposure image fusion: garage. (a)-(f) are source channels [10]. (g) and (h) are the DWT and Socolinsky's results, respectively. (i) and (j) are the proposed results, without and with dynamic range compression. (k) and (l) are the target gradients of (h) and (i).

From a statistical perspective, if we deem \left( \frac{\partial f_k}{\partial x}, \frac{\partial f_k}{\partial y} \right)^t, k = 1, 2, ..., N, to be the samples, the contrast form C(p) is exactly their weighted covariance matrix with weights ω_k(p).
2.1.3. Target gradient field

The target gradient V(p) at pixel p is constructed by eigen-decomposition of C(p). The magnitude of the gradient is the square root of the maximum eigenvalue of C(p), and the direction is the corresponding eigenvector. Such a target gradient V is actually the principal component of the source channels' gradients weighted by their importance, which is the optimal representation of the weighted gradients in the least-mean-square-error sense. By applying the importance weights of eq. (3) to the contrast form, the target gradient field preserves the salience of the sources well (compare Fig. 1c with Fig. 1a).
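A minimal numpy sketch of Secs. 2.1.2-2.1.3 follows, building the 2 × 2 contrast form of eq. (4) at every pixel and extracting the target gradient from its principal eigen-pair in closed form. Orienting the eigenvector by the sum of the weighted source gradients is our assumption; the sign of an eigenvector is otherwise ambiguous:

```python
import numpy as np

def target_gradient(channels, weights):
    """channels, weights: (N, H, W) float arrays; returns (Vx, Vy)."""
    gy, gx = np.gradient(channels, axis=(1, 2))      # per-channel derivatives
    wx, wy = weights * gx, weights * gy              # importance-weighted gradients
    # Entries of the 2x2 contrast form C(p), eq. (4), at every pixel.
    cxx, cyy, cxy = (wx ** 2).sum(0), (wy ** 2).sum(0), (wx * wy).sum(0)
    # Largest eigenvalue of the symmetric 2x2 matrix, in closed form.
    tr, det = cxx + cyy, cxx * cyy - cxy ** 2
    lam = 0.5 * tr + np.sqrt(np.maximum(0.25 * tr ** 2 - det, 0.0))
    # Two algebraically equivalent eigenvector candidates; pick the better
    # conditioned one pixelwise to avoid degenerate (0, 0) vectors.
    v1x, v1y, v2x, v2y = lam - cyy, cxy, cxy, lam - cxx
    use1 = v1x ** 2 + v1y ** 2 >= v2x ** 2 + v2y ** 2
    vx, vy = np.where(use1, v1x, v2x), np.where(use1, v1y, v2y)
    norm = np.sqrt(vx ** 2 + vy ** 2) + 1e-12
    # Orient by the summed weighted gradients (assumed convention), and set
    # the magnitude to sqrt(max eigenvalue) as in Sec. 2.1.3.
    sgn = np.where(vx * wx.sum(0) + vy * wy.sum(0) < 0, -1.0, 1.0)
    mag = np.sqrt(lam)
    return sgn * mag * vx / norm, sgn * mag * vy / norm
```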
2.2. Dynamic range compression for halo reduction

As analyzed in Sec. 1, the gradient V may be over-enlarged and thus cause halo artifacts. Since a large gradient can still be easily perceived after slight attenuation, and a slight boost of small gradients helps bring out weak detail, we modify the target gradient with the following function, similar to [8]:

V^*(p) = \left( \frac{\alpha}{|V(p)|} \right)^{1-\beta} \cdot V(p).   (5)

This is a two-parameter family of functions. The parameter β lies within (0, 1); it controls the strength of the attenuation for large gradients (and of the boost for small gradients). The parameter α determines which gradient magnitudes remain unchanged (multiplied by a scale factor of 1). Gradients of magnitude larger than α are attenuated, while gradients of magnitude smaller than α are slightly magnified. In the following sections, β is set to 0.8 and α = 0.8 · mean{|V|}. Such a modified gradient V* generally reduces halos. Since strong edges can still be easily observed, the target V* preserves the salience, and we use it as the final target gradient of our proposed method.

2.3. Reconstruction from target gradient

Given the target gradient V*, the fused result is the 2D function g that minimizes

\int_\Omega |\nabla g - V^*|^2 \, d\Omega, \qquad g(x, y) \in [0, 255].   (6)

Such a functional as (6) can be solved by iterative steps [6]:

g^{t+1/2}(p) = g^t(p) + \frac{1}{4} \left( \Delta g^t(p) - \operatorname{div} V^*(p) \right),
g^{t+1}(p) = \max(0, \min(255, g^{t+1/2}(p))).   (7)
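The sketch below implements eq. (5) and the iteration of eq. (7). Periodic boundary handling via np.roll, the backward-difference divergence, and the mid-gray initialization are simplifications for brevity, not the exact scheme of [6]:

```python
import numpy as np

def compress_range(vx, vy, beta=0.8, alpha_scale=0.8):
    """Eq. (5): attenuate large gradient magnitudes, slightly boost small ones."""
    mag = np.sqrt(vx ** 2 + vy ** 2)
    alpha = alpha_scale * mag.mean()            # alpha = 0.8 * mean{|V|}
    scale = (alpha / (mag + 1e-12)) ** (1.0 - beta)
    return vx * scale, vy * scale

def reconstruct(vx, vy, iters=2000):
    """Eq. (7): Jacobi-style iteration toward min |grad g - V*|^2 with
    g clamped to [0, 255] at every step."""
    div = np.zeros_like(vx)                     # div V* via backward differences
    div[:, 1:] += vx[:, 1:] - vx[:, :-1]
    div[1:, :] += vy[1:, :] - vy[:-1, :]
    g = np.full_like(vx, 128.0)                 # mid-gray start (assumed)
    for _ in range(iters):
        lap = (np.roll(g, 1, 0) + np.roll(g, -1, 0) +
               np.roll(g, 1, 1) + np.roll(g, -1, 1) - 4.0 * g)
        g = g + 0.25 * (lap - div)              # g^(t+1/2)
        g = np.clip(g, 0.0, 255.0)              # g^(t+1): range constraint
    return g
```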
Note that our approach is presented in a general mathematical form, so it can be applied to an arbitrary number of channels. Furthermore, since it is based on global statistics such as the first- and second-order moments, it is robust to outliers.

3. EXPERIMENTAL RESULTS

3.1. Image fusion

A variety of experiments were conducted to show the effectiveness of our method. We first compare our results with those of Socolinsky's method.
Fig. 5. Color removal results. (a) color source; (b) Photoshop grayscale; (c) Socolinsky's [6]; (d) proposed; (e) Gooch's [12].

As Fig. 4 shows, by using importance weights on the gradients from six channels, our method (Fig. 4i and Fig. 4j) presents the car in the garage more clearly than Socolinsky's result (Fig. 4h). This can also be seen from the target gradient maps (Fig. 4k and Fig. 4l), where our target gradient in the garage is clearer than Socolinsky's, relative to other regions. The figures also reflect the effectiveness of the dynamic range compression of Sec. 2.2. Comparing Fig. 4h, Fig. 4i and Fig. 4j, we observe that the halo in Socolinsky's result (such as the halo around the pipeline) is the most serious. The halo in Fig. 4i still exists, although it is much weaker than in Fig. 4h. After applying dynamic range compression, the halo artifacts are greatly reduced. Note that the car in the garage remains as clear as in Fig. 4i, which indicates that the salience is still well preserved after range compression. We also compare our result with the DWT result [1]. As shown in Fig. 4g, the car in the garage is less clear than in our results; furthermore, "ringing" occurs near strong edges due to the spatial extent of the filters.

Another multi-exposure image fusion result is shown in Fig. 3. In our result (Fig. 3h), more salient features are uncovered than in Socolinsky's method (Fig. 3g) [6], such as the books on the bottom shelves behind the right chair. This is because we weight each channel's gradient based on its importance, while Socolinsky treats them equally. The halo is also obvious in Socolinsky's result, while unobservable in ours. Compared with the DWT result (Fig. 3f) [1], we have no "ringing" side effect, which may obliterate details at the same location. The result from [9] (Fig. 3i) is also given, where the visualization of the bottom-shelf region is worse than in our result.
3.2. Other applications

Color removal is similar to image fusion in the sense that the information from the RGB channels is integrated into a single channel. As a simple example, we directly apply our fusion method to the task of color removal, as shown in Fig. 5. (a) is a map in RGB color, where the island (green) has the same luminance as the sea (blue). The direct grayscale version produced by Photoshop cannot discriminate them, so the island disappears in (b). (c) is the result of RGB fusion using the method in [6], where halo artifacts are severe around the island and the characters. (d) is the fusion result using the proposed method, where the information in the source is well preserved and halo is absent. (e) is the state-of-the-art color removal result [12].
4. CONCLUSION

In this paper, a salience-preserving image fusion algorithm is proposed that fuses multi-channel images into a single image based on importance-weighted gradients, where the importance weight is measured through the salience map of each channel. In this way, salient features in the sources are well preserved, as demonstrated by a variety of experiments. Dynamic range compression on the target gradient is further employed for halo reduction. The new algorithm can fuse images with an arbitrary number of bands, and it is also valuable for other applications; we applied it to color removal as an example.

5. REFERENCES

[1] H. Li, B.S. Manjunath, and S.K. Mitra, "Multisensor Image Fusion Using the Wavelet Transform", Graphical Models and Image Processing, 1995, 57, (3), pp. 235-245.
[2] Z. Zhang and R.S. Blum, "A Categorization of Multiscale-Decomposition-Based Image Fusion Schemes with a Performance Study for a Digital Camera Application", Proceedings of the IEEE, 1999, 87, (8), pp. 1315-1326.
[3] G. Pajares and J.M. Cruz, "A wavelet-based image fusion tutorial", Pattern Recognition, 2004, 37, pp. 1855-1872.
[4] Z. Liu, K. Tsukada, K. Hanasaki, Y.K. Ho, and Y.P. Dai, "Image fusion by using steerable pyramid", Pattern Recognition Letters, 2001, 22, pp. 929-939.
[5] D.A. Socolinsky and L.B. Wolff, "Multispectral Image Visualization Through First-Order Fusion", IEEE Transactions on Image Processing, 2002, 11, (8), pp. 923-931.
[6] D.A. Socolinsky, "A Variational Approach to Image Fusion", Ph.D. thesis, The Johns Hopkins University, 2002.
[7] P.J. Burt and E.H. Adelson, "The Laplacian Pyramid as a Compact Image Code", IEEE Transactions on Communications, 1983, 31, (4), pp. 532-540.
[8] R. Fattal, D. Lischinski, and M. Werman, "Gradient Domain High Dynamic Range Compression", ACM SIGGRAPH, 2002.
[9] A. Goshtasby, "Image Fusion Systems Research", available at http://www.ablen.com/hosting/imagefusion/resources/images/imgfsr/imgfsr.html, 2006.
[10] "Source exposures courtesy of Shree Nayar", available at http://www.cs.huji.ac.il/~danix/hdr/pages/columbia.html, 2006.
[11] Y.-F. Ma and H.-J. Zhang, "Contrast-based image attention analysis by using fuzzy growing", ACM Multimedia, 2003, pp. 374-381.
[12] A.A. Gooch, S.C. Olsen, J. Tumblin, and B. Gooch, "Color2Gray: Salience-Preserving Color Removal", ACM SIGGRAPH, 2005, pp. 634-639.