
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 9, NO. 12, DECEMBER 2000

A Translation- and Scale-Invariant Adaptive Wavelet Transform

Huilin Xiong, Tianxu Zhang, and Y. S. Moon

Abstract—This paper presents a new approach to the translation- and scale-invariance problem of the discrete wavelet transform (DWT). Using a signal-dependent filter, whose impulse response is calculated from the first two moments of the original signal and a scale function of an orthonormal wavelet, we adaptively renormalize a signal. The renormalized signal is then decomposed by using the algorithm of the conventional DWT. The final wavelet transform coefficients, called adaptive wavelet invariant moments (AWIM), are proved to be both translation- and scale-invariant. Furthermore, as an application, we define a new textural feature in the framework of our adaptive wavelet decomposition, show its stability to shift and scaling, and demonstrate its efficiency for the task of scale-invariant texture identification.

Index Terms—Scale-invariance, texture identification, translation-invariance, wavelet transform.

I. INTRODUCTION

Manuscript received June 10, 1998; revised June 2, 2000. This work was supported in part by the National Natural Science Foundation of China under Grant 69875005 and Doctoral Program Foundation of the State Education Commission of China. H. Xiong and T. Zhang are with the State Key Laboratory for Image Processing and Intelligence Control, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China (e-mail: [email protected]; [email protected]). Y. S. Moon is with the Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin NT, Hong Kong (e-mail: ysmoon@cse.cuhk.edu.hk). Publisher Item Identifier S 1057-7149(00)09396-9.

IT IS well known that the conventional discrete wavelet transform (DWT) of a digital signal is sensitive to the location of the signal: the energy distributions of the wavelet coefficients of two signals may be quite different even if the two signals differ only by a time (or space) shift. This drawback can cause serious problems when the wavelet multiresolution representation is used for tasks such as pattern discrimination or recognition. To deal with this shift-invariance problem of the DWT, several methods [2], [7], [8], [11], [12] have been proposed. Liang and Parks [2], [3] calculated the wavelet transforms for all circular shifts and selected the “best” one that minimized a cost function. The wavelet coefficients, together with the shift, give the original signal a translation-invariant representation. Cohen et al. [6]–[8] proposed a similar algorithm to achieve shift-invariant wavelet packet decomposition. The notion of translation invariance (or shift invariance) in the literature mostly involves a procedure of finding a best set of DWT coefficients among all time (or space) shifts to represent the signal.

However, what is a good representation of a signal? This is an aim-dependent question. For applications such as data compression or coding, a good representation of a signal should be “compact” so as to reduce the correlation and redundancy of the data, and the energy distribution of the representation should be concentrated (in the literature on “best basis” representation, entropy is used to measure the concentration). The conventional discrete wavelet or wavelet packet transform provides an ideal framework for data compression, because the redundancy and correlation of the wavelet coefficients are very small. Moreover, by using the entropy criterion [4], we can adaptively decompose a signal in a tree structure so as to minimize the entropy of the representation. But when we face a problem such as pattern discrimination or recognition, it is very desirable that a good representation of a signal be insensitive to shift and scaling. In this case, it is not important whether a representation adopts a “compact” format or not, and this is why over-sampled wavelet transforms [11] or even continuous wavelet transforms are frequently used.

In this paper, from the viewpoint of functional analysis, we propose a new way to deal with the translation- and scale-invariance problem of the discrete wavelet transform (DWT). First, we adaptively renormalize the original signal. This procedure can be accomplished by using a signal-dependent filter whose impulse response is adaptively calculated from the first two moments of the original signal and a scale function of an orthonormal wavelet. Then, the renormalized signal is decomposed using the conventional DWT, and the final wavelet coefficients, called adaptive wavelet invariant moments (AWIM), are proved to be both translation- and scale-invariant. The adaptive wavelet decomposition we propose represents a signal in a multiscale format. It may not be the “best” according to the entropy criterion, but it is translation- and scale-invariant and is therefore better suited to recognition purposes.
As an application, we apply our translation- and scale-invariant wavelet decomposition to the task of texture identification. We define a new texture feature in the framework of our adaptive wavelet decomposition. This texture feature, which consists of the relative energy values of AWIM at each scale, is invariant with respect to shift, scaling, and gray scale transforms and, as experiments show, is very effective for the task of scale-invariant texture discrimination.

This paper is organized as follows. In Section II, we prove that the AWIM of a nonnegative signal are both translation- and scale-invariant. In Section III, an efficient algorithm to approximate AWIM is proposed. Section IV presents some experimental results for two-dimensional (2-D) digital signals (images) to verify the invariance of AWIM. In Section V, we define a new multiresolution texture feature, illustrate its stability to shift and scaling, and demonstrate its efficiency for the task of scale-invariant texture discrimination.
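The shift sensitivity of the conventional DWT described in the Introduction can be seen in a very small example. The following is a minimal sketch, not taken from the paper: it assumes a one-level orthonormal Haar transform (chosen only because it is short to write out), and the helper names `haar_dwt_level`, `detail_energy`, and `circular_shift` are hypothetical. A step edge aligned with the Haar pairs has no detail energy, while the same signal circularly shifted by one sample puts all of its edge energy into the detail band.

```python
import math

def haar_dwt_level(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail

def detail_energy(signal):
    """Sum of squared detail coefficients after one Haar level."""
    _, detail = haar_dwt_level(signal)
    return sum(d * d for d in detail)

def circular_shift(signal, k):
    """Circularly shift a list to the right by k samples."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)

step = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]   # edge aligned with the Haar pairs
shifted = circular_shift(step, 1)                  # same signal, moved by one sample

print("detail energy, original:", detail_energy(step))
print("detail energy, shifted :", detail_energy(shifted))
```

Both signals have the same total energy, yet the split between approximation and detail coefficients changes completely with the shift, which is the behavior that makes raw DWT energies unreliable as recognition features.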

1057–7149/00$10.00 © 2000 IEEE

XIONG et al.: TRANSLATION- AND SCALE-INVARIANT ADAPTIVE WAVELET TRANSFORM

II. WAVELET INVARIANT MOMENTS

First of all, in this paper, by translation- and scale-invariance, we mean that, for a signal , the transform coefficients of are the same as the transform coefficients of , where and is an arbitrary real number. From the viewpoint of functional analysis, the wavelet transform coefficients of a signal are the projections of the function onto the multiresolution subspaces and , where , [1], is an orthonormal wavelet function, and is the corresponding scale function. The moments of a signal can also be viewed as projections of the function onto the polynomial spaces , where (naturally, the bases of and are orthonormal, but the basis of is not). So, in a sense, we can call the wavelet coefficients wavelet moments. Like the general moments, wavelet moments lack translation and scale invariance. Nevertheless, we can construct translation-invariant and scale-invariant wavelet moments by imitating the procedure of constructing invariant moments from the general moments. Following the notation in [1], is a multiresolution analysis (MRA) of such that

We assume a finite energy signal satisfying and [because the digital signal we consider is of finite length, these conditions can be easily satisfied. For is not positive, we can find a real number example, if such that ]. The following are some notations that we will use


where

Theorem: If exist,

, (

, and ,

), then

Proof: Because , , according to the theory of probability, there exists a random variable , whose density function is , and , , where and are the mathematical expectation and variance of , respectively. Hence

(2.1)

For , let . We can easily verify that the density function of the random variable is , i.e., , and , . Hence

Because of the orthonormality of and , we can easily prove the following proposition.

Proposition: are the orthonormal bases of , are the orthonormal bases of , and furthermore, .

According to the proposition above, can be decomposed as

(2.2)

Since , , therefore . From (2.1) and (2.2), combining , we have . Similarly, we can prove .

The above theorem tells us that, for a nonnegative signal , the wavelet transform coefficients of the “renormalized” signal (we omit the subindex here) are translation-invariant and scale-invariant. We call these coefficients adaptive wavelet invariant moments (AWIM). The main problem is how to efficiently calculate the AWIM for a given digital signal.
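The idea behind the theorem, renormalizing a nonnegative signal by its first two moments so that shifted and rescaled copies collapse to the same function, can be checked numerically. The sketch below is an illustration under stated assumptions rather than the paper's algorithm: it assumes the renormalized signal has the form (sigma / mass) * f(sigma * u + mean), where mass, mean, and sigma are the zeroth moment, the centroid, and the standard deviation of f, and it assumes the transformed signal is g(x) = f(a x + b) with a > 0. The test signal and the names `moments` and `renormalize` are hypothetical.

```python
import math

def moments(func, lo, hi, n=20000):
    """Midpoint-rule estimates of mass, mean and standard deviation of func on [lo, hi]."""
    h = (hi - lo) / n
    xs = [lo + (i + 0.5) * h for i in range(n)]
    ws = [func(x) for x in xs]
    mass = sum(ws) * h
    mean = sum(x * w for x, w in zip(xs, ws)) * h / mass
    var = sum((x - mean) ** 2 * w for x, w in zip(xs, ws)) * h / mass
    return mass, mean, math.sqrt(var)

def renormalize(func, lo, hi):
    """Assumed renormalization: zero mean, unit variance, unit mass."""
    mass, mean, sigma = moments(func, lo, hi)
    return lambda u: sigma / mass * func(sigma * u + mean)

def f(x):
    # Hypothetical nonnegative test signal: two Gaussian bumps.
    return math.exp(-(x - 1.0) ** 2) + 0.5 * math.exp(-((x + 2.0) ** 2) / 0.5)

a, b = 2.5, -3.0            # arbitrary scale (a > 0) and shift

def g(x):
    # Shifted and rescaled copy of f.
    return f(a * x + b)

f_tilde = renormalize(f, -20.0, 20.0)
g_tilde = renormalize(g, -20.0, 20.0)

# Compare the two renormalized signals on a grid of points in [-2, 2].
max_diff = max(abs(f_tilde(u) - g_tilde(u)) for u in [-2.0 + 0.1 * k for k in range(41)])
print("largest difference between renormalized signals:", max_diff)
```

Because the two renormalized signals coincide, any fixed transform applied afterwards, including a conventional DWT, yields the same coefficients for both inputs. The discrete algorithm in Section III approximates this continuous renormalization.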

III. EFFICIENT ALGORITHM FOR COMPUTING AWIM

If is the scale function of the Daubechies orthonormal wavelet which has the minimal support, the support of is [0, 3]. Let us assume a signal belongs to the wavelet subspace , and is the discrete-time signal of . Walter [9] showed that an interpolant can be constructed such that , where in . Therefore, , , where can be derived from the convolution of and (in actual computation, only a finite number of are regarded as nonzero, for example, ). Consequently, we have

where and can be calculated by using the “cascade algorithm” [1]. We call the sequence

the “mother mask” of the renormalizing procedure. Subsequently, the projection of the renormalized signal onto the subspace can be computed as

and

(3.1)

For a given integer , we first find all integers, , which satisfy the inequality

Let be an integer satisfying ; we will project and onto the subspace . Then, from the “mother mask,” we select , where ( ), to construct a signal-dependent filter whose impulse response is , where

Let . Because the supports of and are [0, 3] and , respectively, the support of is . Taking as a unit, we divide the interval into intervals

where , , and indicates the integer part of . When is large enough, the value in every interval approximates a constant. Therefore, approximately, we have a formula

To compute , we can view the above formula as a convolution of a mask

with another mask

Finally, according to (3.1), we can compute , the projections of the renormalized signal onto . Now, let us summarize the algorithm for calculating AWIM in Table I. We can see that the computational complexity of the first two steps of the algorithm is in the order of ( is the length of the signal), and that of the third and fourth steps is in the order of . Therefore, the computational complexity of the whole algorithm is in the order of .

This algorithm can be easily extended to the case of a 2-D digital signal . First, compute the first-order moments and the second-order central moments . Then, for every row of the 2-D digital signal, implement steps 1–3 of the algorithm described in Table I, substituting the parameters by , and subsequently, for every column, also implement steps 1–3 of the algorithm in Table I, substituting the parameters by . Finally, following the conventional algorithm of the 2-D DWT, we obtain all the wavelet invariant moments of the original signal at each scale.

IV. EXPERIMENTS FOR VERIFYING THE INVARIANCE OF AWIM

Using a linear interpolation algorithm to change the resolution of big images ( pixels), we first obtained some Lenna and boat images in different sizes, and embedded each of them

in a black frame image of size 256 × 256 pixels. Then, we calculated the corresponding AWIM at each scale according to the algorithm described in Table I, and made comparisons of these AWIM at the same scale.

TABLE I STEPS OF THE ALGORITHM FOR CALCULATING AWIM

Fig. 1. (a) Original Lenna image (205 × 205 × 8 ppb) embedded in a black frame (256 × 256), (b) renormalized Lenna image, (c) AWIM image, and (d) binary AWIM image.

Fig. 2. Arrangement of scales.

In the procedure of renormalizing an original image, we moved the coordinate origin of the coefficients to the center of the original image. Fig. 1 shows a renormalized Lenna image [Fig. 1(b)], its AWIM image [Fig. 1(c)], and the binary AWIM image [Fig. 1(d)] (threshold is set to be 7). Fig. 2 shows the arrangement of scales. Fig. 3 presents two Lenna images which are of the same size but located at different positions, and their binary AWIM images (threshold is 7).

Fig. 3. (a) and (b) Two Lenna images of the same size (154 × 154 × 8 ppb) but located at different positions. (c) and (d) The binary AWIM images of (a) and (b) (threshold is 7).

TABLE II AVERAGE GRAY VALUE ERROR PER PIXEL OF THE TWO AWIM IMAGES IN FIG. 3

Table II shows the average gray value error per pixel (GEPP) of the two AWIM images [Fig. 3(b)] at each scale. The GEPP at scale is calculated by

where and denote the corresponding AWIM of the two AWIM images at scale , and is the size of the subimage of the AWIM image at scale 0, whose center is located at the center of the AWIM subimage of scale 0. As we see, the errors of the two AWIM images at different scales are zero, and this demonstrates the translation-invariance of AWIM.

Fig. 4. (a) and (b) Four boat images and four Lenna images in different sizes (144 × 144 × 8 ppb, 166 × 166 × 8 ppb, 192 × 192 × 8 ppb, and 218 × 218 × 8 ppb, respectively). (c) Binary AWIM images of boat images (threshold is 7).

Fig. 4 shows four boat images and four Lenna images in different sizes [Fig. 4(a) and (b)] (their sizes are 144 × 144 pixels, 166 × 166 pixels, 192 × 192 pixels, and 218 × 218 pixels, respectively), and their binary AWIM images [Fig. 4(c)], with threshold set to 7. Table III presents the GEPP and the relative energy errors (REE) of AWIM at every scale for Lenna images in Fig. 4. Table IV shows the GEPP and REE of AWIM at every scale for boat images in Fig. 4. The REE at scale is calculated by the formula

where and denote the corresponding AWIM of the two AWIM images at scale , and and denote the energy of the two AWIM images at scale . It may be seen that, as a whole, the errors (GEPP and REE) at every scale are minor, or even negligible at some scales, relative to the total energy of AWIM. Moreover, the smaller the size difference between the two images, the smaller the errors of AWIM.
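The two error measures used in this section can be sketched in a few lines. The exact formulas are not reproduced here, so the forms below are assumptions chosen only to match the verbal descriptions: GEPP is taken as the mean absolute difference per coefficient, and REE as the energy of the difference divided by the geometric mean of the two subband energies. The function names and the tiny 2 × 2 arrays are hypothetical.

```python
def gepp(a, b):
    """Assumed GEPP: mean absolute difference per pixel of two equally sized 2-D arrays."""
    count = sum(len(row) for row in a)
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / count

def energy(a):
    """Sum of squared coefficients of a 2-D array."""
    return sum(x * x for row in a for x in row)

def ree(a, b):
    """Assumed REE: energy of the difference relative to sqrt(E_a * E_b)."""
    diff = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return diff / (energy(a) * energy(b)) ** 0.5

# Hypothetical AWIM subimages at one scale for two versions of the same image.
awim_1 = [[1.0, 2.0], [3.0, 4.0]]
awim_2 = [[1.0, 2.0], [3.0, 5.0]]

print("GEPP:", gepp(awim_1, awim_2))
print("REE :", ree(awim_1, awim_2))
```

Under either measure, identical coefficient arrays give zero error, which is the outcome reported for the pure translation case.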


TABLE III EXPERIMENT RESULTS TO SHOW THE AWIM ERRORS BETWEEN LENNA IMAGES

TABLE IV EXPERIMENT RESULTS TO SHOW THE AWIM ERRORS BETWEEN BOAT IMAGES

Fig. 5. Six versions of a texture (row 1: T1, T2, T3; row 2: T4, T5, T6)

V. AN APPLICATION FOR SCALE-INVARIANT TEXTURE IDENTIFICATION

In the methods [14], [15] of texture analysis based on wavelet or wavelet packet transforms, feature vectors consisting of a set of energy or entropy values of wavelet coefficients are constructed for texture classification, segmentation, and recognition. However, features of this kind are not stable with respect to shift and scaling because the conventional discrete wavelet or wavelet packet transform lacks the translation-invariant property, and this may cause some problems. In this section, as an application, we apply our translation- and scale-invariant wavelet decomposition to the task of scale-invariant texture identification. We assume that the relative energy distribution of AWIM in each scale provides unique information for texture discrimination. Therefore, to characterize a texture pattern, we define a feature vector, which consists of the relative energy values of AWIM in each scale, and experiments show that this texture feature is very effective for the task of scale-invariant texture discrimination.


Fig. 6. (a) Values of RES in Fig. 5, which were calculated in the framework of the conventional wavelet transform. (b) Values of RES in Fig. 5 calculated in the framework of the shift-invariant wavelet decomposition [2]. (c) Values of RES in Fig. 5, which were calculated in the framework of our adaptive wavelet decomposition.

A. A Multiresolution Texture Signature

As shown in Fig. 2, the total energy of AWIM is

A relative energy signature (RES for short) of a texture is defined as

where the compensatory factor, which balances the size difference of different subbands, is the ratio of the total number of AWIM to the number of AWIM in the subband (see Fig. 2). For example, if the size of is and (Fig. 2), the total number of AWIM is equal to

Therefore

Because the AWIMs are translation- and scale-invariant, the feature vector we defined is stable to shift and scaling. In addition, it is also invariant to the grayscale transform, as is evident from the definition of the signature. Fig. 5 shows six versions of a texture, which were obtained by changing the resolution of a texture image along the vertical and horizontal directions according to different scale factors. Fig. 6(c) shows values of the relative energy signature (RES). Fig. 6(a) and (b) also give the values of RES for the images in Fig. 5; these values, however, were calculated in the framework of the conventional DWT [Fig. 6(a)] and the shift-invariant wavelet decomposition [2] [Fig. 6(b)]. We can see that RES in the framework of our adaptive wavelet decomposition is

much more stable than the RES calculated in the framework of the conventional DWT and the shift-invariant DWT [2].

TABLE V EXPERIMENT RESULTS FOR SCALE-INVARIANT TEXTURE DISCRIMINATION

Fig. 7. Twenty classes of texture pattern (from $\omega_1$ to $\omega_{20}$).

B. Scale-Invariant Texture Identification

We employ a simple minimum-distance classifier to evaluate the efficiency of our multiresolution texture signature for scale-invariant texture discrimination. Each texture pattern is represented by a prototype signature , calculated by using some typical samples of the pattern such that

(5.1)

Considering the relatively small values of the energy in for a given texture pattern , we define the distance of and as

where and represent the components of and , respectively. Texture belongs to a pattern if is minimum among all possible patterns.
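The signature and the minimum-distance rule of this section can be sketched as follows. This is a hedged illustration, not the paper's exact formulas: the signature is assumed to be each subband's share of the total energy multiplied by the compensatory factor (total number of coefficients divided by the number in that subband), and the distance is assumed to be a sum of absolute differences, each divided by the prototype component, as one way to account for small energy values. The function names, the subband data, and the prototype values are all hypothetical.

```python
def relative_energy_signature(subbands):
    """Assumed RES: size-compensated share of the total energy, one value per subband.

    subbands: list of flat coefficient lists, one list per subband.
    """
    energies = [sum(c * c for c in band) for band in subbands]
    total_energy = sum(energies)
    total_count = sum(len(band) for band in subbands)
    return [(total_count / len(band)) * e / total_energy
            for band, e in zip(subbands, energies)]

def relative_distance(signature, prototype):
    """Assumed distance: absolute differences scaled by the (nonzero) prototype components."""
    return sum(abs(s - p) / p for s, p in zip(signature, prototype))

def classify(signature, prototypes):
    """Minimum-distance rule over a dict mapping pattern label to prototype signature."""
    return min(prototypes, key=lambda label: relative_distance(signature, prototypes[label]))

# Hypothetical coefficients for two subbands of different sizes.
subbands = [[1.0, 2.0], [3.0, 4.0, 5.0, 6.0]]
signature = relative_energy_signature(subbands)

# Multiplying every coefficient by a constant gain leaves the signature unchanged.
brighter = [[3.0 * c for c in band] for band in subbands]
signature_brighter = relative_energy_signature(brighter)

# Hypothetical prototype signatures for two texture patterns.
prototypes = {"pattern_1": [0.2, 1.4], "pattern_2": [1.0, 1.0]}
print("signature:", signature)
print("assigned pattern:", classify(signature, prototypes))
```

Because the signature is a ratio of energies, a uniform gain applied to the coefficients cancels out, which illustrates why a feature of this kind is insensitive to a grayscale scaling.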

C. Experiment Result for Scale-Invariant Texture Identification

We selected 20 textures as our basic classes of texture patterns (from $\omega_1$ to $\omega_{20}$). Each of them was stored as a ppb digital image as shown in Fig. 7. For each texture pattern, we selected 25 typical samples [size in (pixel)] to compute the representative signature by (5.1), and 100 random samples [also size in (pixel)] from which a set of testing samples was produced. In detail, for each random sample, six testing samples were generated by changing the resolution along the vertical and the horizontal directions by different factors. Fig. 5 shows an example. In other words, each random sample has six different versions. Therefore, for each texture pattern, we have 600 testing samples, so that the total number of testing samples is 12 000.

Table V presents the results of texture identification by using the texture signature computed in the framework of the conventional DWT, the shift-invariant DWT [2], and our adaptive wavelet transform (AWT), respectively. It may be seen that, in the framework of our adaptive multiresolution representation, the feature vector (or signature), due to its insensitivity to translation and scale transforms, is much more effective for scale-invariant texture discrimination than the one obtained from the framework of the conventional wavelet multiresolution representation.

VI. CONCLUSION

From the viewpoint of functional analysis, we present a new approach to the translation- and scale-invariance problem of the discrete wavelet transform. Based on the theory of interpolation in wavelet subspaces, we adaptively renormalize the original signal by using an orthonormal scale function and the first two moments of the signal. This procedure can be accomplished by using an efficient algorithm. The renormalized signal is then decomposed according to the conventional wavelet decomposition, and the final wavelet coefficients, called AWIM, are proved to be both translation- and scale-invariant (naturally, the AWIM are not strictly scale-invariant because the renormalizing algorithm is approximate). As an application, we define a relative energy signature for texture identification in the framework of our adaptive wavelet decomposition. Experimental results show that this multiresolution signature is suitable for the task of scale-invariant texture identification due to its stability to shift and scaling.

ACKNOWLEDGMENT

The authors would like to thank Dr. K. Ramchandran and anonymous referees for their encouragement and helpful comments, which have significantly improved the presentation of the paper.


REFERENCES

[1] I. Daubechies, “Ten lectures on wavelets,” in CBMS-NSF Regional Conference Series in Applied Mathematics. Philadelphia, PA: SIAM, 1992.
[2] J. Liang and T. W. Parks, “A translation invariant wavelet representation algorithm with applications,” IEEE Trans. Signal Processing, vol. 44, pp. 225–232, Feb. 1996.
[3] J. Liang and T. W. Parks, “A two-dimensional translation invariant wavelet representation and its applications,” in Proc. Int. Conf. Image Processing, Austin, TX, Nov. 13–16, 1994, pp. 66–70.
[4] R. R. Coifman and M. V. Wickerhauser, “Entropy-based algorithms for best basis selection,” IEEE Trans. Inform. Theory, vol. 38, pp. 713–718, Mar. 1992.
[5] I. Cohen, S. Raz, and D. Malah, “Shift invariant wavelet packet bases,” in Proc. 20th IEEE Int. Conf. Acoustics, Speech, Signal Processing, Detroit, MI, May 8–12, 1995, pp. 1081–1084.
[6] I. Cohen, S. Raz, and D. Malah, “Shift-invariant adaptive trigonometric decomposition,” in Proc. 4th Eur. Conf. Speech, Communication, Technology, Madrid, Spain, Sept. 18–21, 1995, pp. 247–250.
[7] I. Cohen, S. Raz, and D. Malah, “Orthonormal shift-invariant adaptive local trigonometric decomposition,” Signal Process., vol. 57, no. 1, 1997.
[8] I. Cohen, S. Raz, and D. Malah, “Orthonormal shift-invariant wavelet packet decomposition and representation,” Signal Process., vol. 57, no. 3, 1997.
[9] G. Walter, “A sampling theorem for wavelet subspaces,” IEEE Trans. Inform. Theory, vol. 38, Mar. 1992.
[10] S. G. Mallat, “A theory for multiresolution signal decomposition: The wavelet representation,” IEEE Trans. Pattern Anal. Machine Intell., vol. 11, pp. 674–693, July 1989.
[11] S. Mallat, “Zero-crossings of a wavelet transform,” IEEE Trans. Inform. Theory, vol. 37, pp. 1019–1033, July 1991.
[12] S. G. Mallat and S. Zhong, “Characterization of signals from multiscale edges,” IEEE Trans. Pattern Anal. Machine Intell., vol. 14, pp. 710–732, July 1992.
[13] A. Laine and J. Fan, “Texture classification by wavelet packet signatures,” IEEE Trans. Pattern Anal. Machine Intell., vol. 15, no. 11, pp. 1186–1190, 1993.
[14] O. Pichler, A. Teuner, and B. J. Hosticka, “A comparison of texture feature extraction using adaptive Gabor filtering, pyramidal and tree structured wavelet transforms,” Pattern Recognit., vol. 29, no. 5, pp. 733–742, 1996.
[15] M. Unser, “Texture classification and segmentation using wavelet frames,” IEEE Trans. Image Processing, vol. 4, pp. 1549–1560, Nov. 1995.

Huilin Xiong was born in Hubei, China, in 1964. He received the B.Sc. and M.Sc. degrees in applied mathematics from Wuhan University, Wuhan, China, in 1985 and 1988, respectively, and the Ph.D. degree in pattern recognition and intelligent control from the Institute of Pattern Recognition and Artificial Intelligence, Huazhong University of Science and Technology, Wuhan, in 1999. He is now a Researcher with the MIP Research Group, Department of Computer Science and Engineering, Chinese University of Hong Kong. His research interests include wavelet-based image multiresolution analysis, content-based image database indexing and retrieval, image data compression, and parallel image processing algorithms.

Tianxu Zhang was born in Chongqing, China, on May 18, 1947. He received the B.Sc. degree from the University of Science and Technology of China, the M.Sc. degree from Huazhong University of Science and Technology, China, and the Ph.D. degree in biomedical engineering from Zhejiang University, Wuhan, China. He has been a Professor and Director with the Institute of Pattern Recognition and Artificial Intelligence, Huazhong University of Science and Technology since 1993. His research interests include image processing, computer vision, pattern recognition, parallel processing, and medical imaging. He has published more than 100 academic and technical papers.

Y. S. Moon graduated from the University of Manitoba, Winnipeg, MB, Canada, in 1973, and received the M.Sc. and Ph.D. degrees from the University of Toronto, Toronto, ON, Canada. Since 1978, he has been with the Chinese University of Hong Kong, where he is now a Senior Lecturer of computer science. His research interests include e-commerce, smart card systems, multimedia networks, and Chinese computing. He has published more than 80 scientific papers.