FRACTAL CODING VERSUS CLASSIFIED TRANSFORM CODING

Jaroslaw Domaszewicz, Slawomir Kuklinski, and Vinay A. Vaishampayan

Institute of Telecommunications, Warsaw University of Technology, Nowowiejska 15/19, 00-665 Warsaw, Poland
Department of Electrical Engineering, Texas A&M University, College Station, Texas 77843, USA

[email protected], [email protected], [email protected]

ABSTRACT
Fractal coding and classified transform coding exhibit strong structural similarity, but they use different types of redundancy in image data: piecewise self-similarity in one case and local correlation in the other. Comparing the performance of the two techniques leads to a quantitative, compression-oriented definition of piecewise self-similarity. The amount of piecewise self-similarity is evaluated for sample images. For a moderate number of domain blocks, classified transform coding consistently outperforms fractal coding, and the images are not found to be piecewise self-similar. As the number of domain blocks increases, the performance gap becomes negligible.
1. INTRODUCTION

Fractal coding [1] is a recent approach to image compression. Its basic premise is that images exhibit a type of redundancy called piecewise self-similarity. In a piecewise self-similar image, a block of waveform data can be related to another one so that the two resemble each other. Compression is achieved if one of them (a range block) is encoded by providing a reference to the other (a domain block). The encoding of a range consists of choosing the most similar domain and approximating the range by a linear combination of the domain and some predefined vectors. Since a range block and the corresponding domain block can be located anywhere in the image, piecewise self-similarity is different from the local redundancy exploited by more traditional image compression techniques.

Classified transform coding (CTC), an extension of well-known transform coding, is another recent approach to image compression. (In [2] the CTC technique is called "adaptive transform coding using a mixture of principal components." Since the term "adaptive transform coding" has been used to denote transform coding with adaptive bit allocation, we use the name introduced above.) During the design of a CTC coder, a collection of classes is specified, and a class-specific linear transform is associated with each class. An image block to be encoded is classified and transformed by the transform associated with the class to which it belongs. A rationale for CTC, as opposed to regular transform coding, is that the assumption of stationarity, implied by using a single set of basis blocks (e.g., KLT or DCT), is not true for images. A conceptual formulation of the CTC approach can be found in [3]. More recently, the CTC technique, design algorithms, and performance results have been presented in [4, 5, 2]. Results in [4, 2] show that a CTC coder outperforms a traditional KLT transform coder for sample images. In [5] performance results for a memoryless Gaussian source are given.

Fractal coding and classified transform coding exhibit strong structural similarity. In both approaches a block from the original image is encoded by selecting a basis and then using the basis vectors to approximate the block; the encoded data consist of the basis index and the quantized coefficients of the linear combination. The crucial difference lies in the nature of the bases: the key elements of the bases used by a fractal coder are image-dependent domain blocks. In CTC, on the other hand, the bases are image-independent (but carefully designed). The different collections of bases reflect the different redundancies used by the two approaches: local correlation in one case and piecewise self-similarity in the other. Since in both cases the quality of the approximation produced by the encoder is the primary factor affecting the quality of the reconstructed image (the other factor has to do with quantization), comparing the approximations can be a means to assess the relative usefulness of the two redundancies for compression. Based on the results of such comparisons, a compression-oriented evaluation of the amount of self-similarity in images can be performed.
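To make the shared structure concrete, the following is a minimal Python sketch of the encoding step that both techniques have in common: pick, from a collection of orthonormal bases, the one whose span is closest to the block, and transmit the basis index together with the quantized coefficients. The numpy representation and the uniform quantizer step size are assumptions made for this illustration; neither coder's actual implementation is shown here.

    import numpy as np

    def encode_block(x, bases, step=8.0):
        """Encode a flattened block x (length N) against a collection of orthonormal
        bases; bases[k] is an (M, N) array whose rows span the subspace S_k.
        Returns the index of the closest subspace and the quantized coefficients."""
        best_k, best_err, best_coeffs = None, np.inf, None
        for k, B in enumerate(bases):
            coeffs = B @ x                              # coefficients of the projection onto S_k
            err = np.sum((x - B.T @ coeffs) ** 2)       # squared distance from x to S_k
            if err < best_err:
                best_k, best_err, best_coeffs = k, err, coeffs
        q = np.round(best_coeffs / step).astype(int)    # uniform scalar quantization
        return best_k, q

    def decode_block(k, q, bases, step=8.0):
        """Reconstruct the block from the basis index and the quantized coefficients."""
        return bases[k].T @ (q * step)

In fractal coding the k-th basis contains an image-dependent domain block; in CTC all K bases are designed in advance and are image-independent.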
2. FRACTAL CODING

In a simple fractal transform algorithm used in the experiments presented below, the original image is partitioned into non-overlapping range blocks of size N = B × B. Also, a collection of domain blocks, {d_1, d_2, ..., d_K}, is extracted from the original. The number of domain blocks, K, and the way they are extracted from the image are a designer's choice. Originally the domain blocks are of size 2B × 2B, so they are reduced to the size of a range block by averaging four adjacent pixels. They are then orthogonalized with respect to the fixed subspace, which is spanned by an orthonormal basis {b_1, b_2, ..., b_{M-1}}. Next, the algorithm finds the best domain block for every range block. Let S(x_1, x_2, ..., x_n) denote the subspace of R^N spanned by x_1, x_2, ..., x_n. Let S_n = S(b_1, b_2, ..., b_{M-1}, O(d_n)), where O(·) is the operation of size reduction and orthogonalization. Let P_S(·) be the operator projecting its argument onto the subspace S. Then d_k is the best domain block for the range block x if S_k is the subspace closest to x. In other words,
    ||x − P_{S_k}(x)|| ≤ ||x − P_{S_n}(x)||   for n = 1, 2, ..., K.      (1)

The index k and the quantized coefficients of the representation of P_{S_k}(x) with respect to the basis {b_1, b_2, ..., b_{M-1}, O(d_k)} are sent to the decoder. The MSE distortion incurred by fractal coding can be written as
    D = D_a^f + D_q^f + D_b^f,      (2)

where D_a^f, which we call the approximation distortion, is the average of ||x − P_{S_k}(x)||^2 over all the range blocks x in the image (this term is related to the collage error), D_q^f is due to the quantization of the coefficients, and D_b^f is due to a difference between the original domain blocks and the reconstructed ones (the subspaces S_n at the decoder are somewhat different from those at the encoder). These terms are not independent; e.g., the quality of the quantization can affect D_b^f. Experience shows that the last term is fairly small.
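As a concrete illustration of the search in (1) and of the approximation distortion D_a^f in (2), the sketch below implements the domain preprocessing (2 × 2 averaging followed by orthogonalization against the fixed subspace) and the full search for the best domain of a given range block. This is a minimal Python sketch assuming numpy arrays and a fixed subspace given as orthonormal rows; it is not the implementation used in the experiments.

    import numpy as np

    def shrink(domain_2B):
        """Reduce a 2B x 2B domain block to B x B by averaging each 2x2 group of pixels."""
        B2 = domain_2B.shape[0]
        return domain_2B.reshape(B2 // 2, 2, B2 // 2, 2).mean(axis=(1, 3))

    def preprocess_domain(domain_2B, fixed_basis):
        """O(d): shrink the domain, remove its component in the fixed subspace,
        and normalize.  fixed_basis is an (M-1, N) array with orthonormal rows."""
        d = shrink(domain_2B).ravel()
        d = d - fixed_basis.T @ (fixed_basis @ d)    # orthogonalize against the fixed subspace
        norm = np.linalg.norm(d)
        return d / norm if norm > 0 else d

    def best_domain(range_block, domains_2B, fixed_basis):
        """Full search over all domain blocks: return the index k minimizing
        ||x - P_{S_k}(x)||^2, together with that squared approximation error."""
        x = range_block.ravel()
        x_fixed = fixed_basis.T @ (fixed_basis @ x)  # projection of x onto the fixed subspace
        best_k, best_err = -1, np.inf
        for k, d2B in enumerate(domains_2B):
            od = preprocess_domain(d2B, fixed_basis)
            proj = x_fixed + od * (od @ x)           # projection of x onto S_k
            err = np.sum((x - proj) ** 2)
            if err < best_err:
                best_k, best_err = k, err
        return best_k, best_err

Averaging the per-block minimum errors returned by best_domain over all range blocks of an image gives the approximation distortion D_a^f.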
3. CLASSIFIED TRANSFORM CODING

In the following description we use a notation that emphasizes the similarities with fractal coding. A CTC algorithm processes non-overlapping blocks of size N = B × B. At the design stage K subspaces, S_1, S_2, ..., S_K, of R^N are selected. Each subspace is of dimension M < N and is specified by an orthonormal basis. The block x belongs to the k-th class if S_k is the subspace closest to x. Transform coefficients are calculated using the basis of S_k (M < N means that zonal sampling is applied). The class index and the quantized transform coefficients are sent to the decoder. The MSE distortion incurred by CTC can be written as

    D = D_a^c + D_q^c,      (3)

where the two terms originate the same way as in fractal coding. Since the basis vectors used by the encoder and the decoder are identical, there is no third term in the CTC case.

    Given: K, M, initial subspaces {S_1^(0), S_2^(0), ..., S_K^(0)}, training blocks <x_1, x_2, ...>, and ε ≥ 0
    n ← 0, D^(0) ← ∞
    repeat
        n ← n + 1
        /* classify */
        for k = 1, 2, ..., K
            C_k^(n) ← { i : k = arg min_j ||x_i − P_{S_j^(n−1)}(x_i)|| }
        D^(n) ← Σ_k Σ_{i ∈ C_k^(n)} ||x_i − P_{S_k^(n−1)}(x_i)||^2
        /* recalculate the bases */
        for k = 1, 2, ..., K
            S_k^(n) ← S(KLT_M(C_k^(n)))
    until (D^(n−1) − D^(n)) / D^(n) < ε
    return {S_1^(n), S_2^(n), ..., S_K^(n)}

Figure 1: A CTC design algorithm.
3.1. A CTC Design Algorithm

The collection of subspaces (or, more precisely, bases) used by a CTC coder is designed to match the distribution of image blocks. Since a coder restricted to a single class works just like a regular transform coder with zonal sampling, the optimum subspace should be spanned by the first M basis vectors of the KLT corresponding to the blocks in the class. The design algorithms in [2], which integrate classification and KLT basis vector extraction, are based on neural networks. An alternative approach using the eigenvectors of an autocorrelation matrix estimate is also possible ([1], Chapter 9, and [5]). Such an algorithm is presented in Fig. 1 (the operation KLT_M(·) returns the first M KLT basis vectors). It closely resembles the LBG algorithm for vector quantizer design. The initial subspaces can be obtained by running the algorithm for half the number of classes and splitting each subspace into two.
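A direct numpy rendering of the loop in Fig. 1 might look as follows; the eigendecomposition of each class's autocorrelation estimate supplies the first M KLT basis vectors. This is a sketch only: the fixed-subspace (DC) constraint used in the experiments and the subspace-splitting initialization are omitted, and all names are illustrative rather than taken from the original coders.

    import numpy as np

    def klt_basis(blocks, M):
        """First M KLT basis vectors (as rows) for the blocks given as rows of `blocks`."""
        R = blocks.T @ blocks / len(blocks)              # autocorrelation matrix estimate
        eigvals, eigvecs = np.linalg.eigh(R)             # eigenvalues in ascending order
        return eigvecs[:, ::-1][:, :M].T                 # top-M eigenvectors as rows

    def design_ctc(blocks, bases, M, eps=1e-3, max_iter=50):
        """LBG-like design: alternate classification and per-class KLT updates.
        `blocks` is (num_blocks, N); `bases` is a list of K (M, N) orthonormal arrays."""
        prev_D = np.inf
        for _ in range(max_iter):
            # classify: squared distance of every block to every subspace
            errs = np.stack([np.sum((blocks - blocks @ B.T @ B) ** 2, axis=1) for B in bases])
            labels = errs.argmin(axis=0)
            D = errs[labels, np.arange(len(blocks))].sum()
            if (prev_D - D) / D < eps:
                break
            prev_D = D
            # recalculate each basis from the blocks assigned to its class
            bases = [klt_basis(blocks[labels == k], M) if np.any(labels == k) else B
                     for k, B in enumerate(bases)]
        return bases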
3.2. Comparing CTC to Fractal Coding
The common notation exposes the similarity of the two techniques. Here we only note that in both of them encoding is computationally intensive, while decoding is quite simple. The basic difference has to do with the implied redundancies mentioned above and with the way the bases are chosen. Further, in fractal coding the bases are constrained in that each contains only one unique vector (a domain block); all the remaining vectors, i.e., the fixed subspace basis vectors, are common to all of them. Using two or more domains would make the approach computationally infeasible. A CTC coder, on the other hand, can be designed so that the entire basis is class specific. Moreover, even though the CTC design is carried out for M < N, the bases can afterwards be extended to the full KLT sets, thus allowing very high quality encodings with low compression ratios. In fractal coding quality is limited by the approximation distortion. The decoding procedures are also different. A code produced by a fractal coder can be easily decoded at any resolution (creating artificial details if needed); a CTC code does not have this property.
4. DEFINING PIECEWISE SELF-SIMILARITY

Note that even though the bases used by the fractal transform depend on the image being encoded, the technique cannot be called adaptive. The collection of the bases is not a result of any signal-specific optimization procedure. Rather, one has to make the best use of what one is given. The rationale for using domain blocks is the assumed piecewise self-similarity of images. It is clear that in most images at least some range blocks can be approximated quite well by some domain blocks. The essence of piecewise self-similarity, however, as considered for the purpose of compression, is the possibility to use domain blocks to consistently obtain high quality approximations of the range blocks, so that the overall distortion is small.

We propose a compression-oriented and quantitative definition of piecewise self-similarity of a given image. The definition is based on comparing performance results of the two coding techniques. To make the comparison fair, we choose the same size of a block N, number of bases K, and dimensionality M. Further, we impose the same constraint on the bases used by the CTC coder as the one in effect for the fractal transform: they all contain the basis of some fixed subspace. We assume that the remaining vectors in the bases for the CTC coder (the counterparts of the domain blocks) have been carefully designed using a training sequence drawn from many different images. This way the dependence of the results on the training sequence can be neglected.

Intuitively, in the compression context, an image is piecewise self-similar if image-dependent domain blocks can outperform generic, image-independent vectors in approximating the range blocks, i.e., if D_a^f < D_a^c. Hence the amount of piecewise self-similarity for a given image can be defined as

    s = 10 log(D_a^c / D_a^f).      (4)

The larger s, the more self-similar the image is. The definition is compression-oriented, since D_a^f and D_a^c constitute a major component of the overall distortion introduced by the respective coders. When quoting numerical values of s, one should specify N, K, M, and the fixed subspace used.
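Since the tables below report the approximation distortions as PSNR values, s can also be read off directly as their difference: under the usual convention PSNR = 10 log10(255^2 / D), which we assume here, equation (4) gives s = PSNR_a^f − PSNR_a^c. The small helper below illustrates both routes; it is an illustrative sketch, not part of the original coders.

    import math

    def self_similarity_from_mse(D_a_ctc, D_a_fractal):
        """Equation (4): s = 10 log10(D_a^c / D_a^f), in dB."""
        return 10.0 * math.log10(D_a_ctc / D_a_fractal)

    def self_similarity_from_psnr(psnr_a_fractal, psnr_a_ctc):
        """Equivalent form when the approximation distortions are quoted as PSNR
        (assuming PSNR = 10 log10(255^2 / D)): s is simply the PSNR gap."""
        return psnr_a_fractal - psnr_a_ctc

For example, the Lenna entry of Table 1 (8 × 8 blocks, K = 1024) gives s = 29.07 − 29.46 = −0.39 dB, i.e., the image is not piecewise self-similar in that setting.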
5. NUMERICAL RESULTS

In order to find the amount of piecewise self-similarity in sample images, collections of CTC bases were designed for blocks of size 8 × 8 and 4 × 4 (blocks of these sizes are most frequently used in fractal transforms). The training sequence was obtained from six 256 × 256 images taken from the USC image collection: "chemical," "couple," "girl," "house," "town," and "tree." A total of 23814 (96774) training sequence blocks was used for blocks of size 8 × 8 (4 × 4). The results were obtained for five 512 × 512 test images. In all the experiments the fixed subspace was spanned by the DC vector only (M = 2). Prior to running the CTC design algorithm presented in Section 3, the DC component was removed from all the training sequence blocks. The domain blocks used in the fractal coder were located on a lattice whose spacing was selected so as to obtain a desired number of blocks. Spatial contraction by averaging was applied. No rotations or reflections of the domain blocks were used. Full search was applied in both algorithms.

Figure 2 presents the approximation distortion D_a as a function of K for the "Lenna" image. Tables 1 and 2 contain the values of D_a for all the test images and two values of K. As can be seen from these results, for the number of subspaces not exceeding 4096, CTC consistently outperforms fractal coding (except for one case, in which fractal coding is better by 0.02 dB). Hence, according to (4), the images are not self-similar for this range of K. However, the performance gap becomes insignificant as K increases.
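For reference, the domain pool described above (blocks of size 2B × 2B taken on a regular lattice and contracted by 2 × 2 averaging) might be generated along the following lines. The square-lattice spacing rule is an assumption made for illustration; the paper only states that the spacing was chosen so as to yield the desired number of blocks.

    import numpy as np

    def extract_domain_pool(image, B, K_target):
        """Collect roughly K_target domain blocks of size 2B x 2B from `image`
        on a regular lattice, and contract each one to B x B by 2x2 averaging."""
        H, W = image.shape
        per_dim = max(1, int(round(np.sqrt(K_target))))     # lattice points per dimension
        ys = np.linspace(0, H - 2 * B, per_dim).astype(int)
        xs = np.linspace(0, W - 2 * B, per_dim).astype(int)
        pool = []
        for y in ys:
            for x in xs:
                block = image[y:y + 2 * B, x:x + 2 * B]
                contracted = block.reshape(B, 2, B, 2).mean(axis=(1, 3))
                pool.append(contracted)
        return np.array(pool)        # shape (K, B, B), with K = per_dim ** 2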
Figure 2: Approximation distortion, D_a, as a function of the number of subspaces K for the 512 × 512 Lenna image (PSNR [dB] versus K = 4, 16, 64, 256, 1024, 4096; curves: 8 × 8 CTC, 8 × 8 fractal, 4 × 4 CTC, 4 × 4 fractal).

Table 1: Approximation distortion (PSNR [dB]) for test images; block size 8 × 8.

                K = 1024            K = 2048
    Image       Fractal   CTC       Fractal   CTC
    Lenna       29.07     29.46     29.46     29.64
    Peppers     28.74     29.07     29.23     29.27
    Baboon      21.08     21.50     21.26     21.66
    Bridge      24.09     24.72     24.34     24.86
    Sailboat    24.19     25.06     24.49     25.20

Table 2: Approximation distortion (PSNR [dB]) for test images; block size 4 × 4.

                K = 1024            K = 4096
    Image       Fractal   CTC       Fractal   CTC
    Lenna       34.56     35.21     35.76     36.05
    Peppers     34.02     34.35     36.16     35.14
    Baboon      25.48     25.97     26.52     26.79
    Bridge      29.00     29.48     30.06     30.28
    Sailboat    29.73     30.38     30.73     31.10

An additional experiment consisted of extracting the domain blocks from a test image and using them to approximate the range blocks of the other test images (8 × 8 blocks, K = 1024). Only in two out of five cases ("lenna" and "peppers") were the domain blocks extracted from an image itself best in approximating its range blocks. For the other test images, using "foreign" domain blocks incurred smaller approximation distortion.

6. CONCLUSION
For a moderate number of domain blocks, no piecewise self-similarity is found in the sample images. On the other hand, the benefit of carefully designing the basis vectors, as opposed to drawing them from the image itself, becomes quite small for a large number of subspaces. We hypothesize that the behavior of fractal coding is best explained not in terms of piecewise self-similarity, but using the concept of a random "codebook of subspaces" drawn from the appropriate source (in our case, images of natural objects). The results indicate that when the size of the codebook is comparable to or exceeds the number of blocks to be encoded, the advantages of a design based on MSE are not significant, and the random codebook performs just as well. An additional conclusion is that in fractal coding a large number of domains should be used. Otherwise, CTC is a preferable technique.

7. REFERENCES
[1] Yuval Fisher, editor. Fractal Image Compression. Springer-Verlag, New York, 1995.

[2] Robert Douglas Dony. Adaptive Transform Coding of Images Using a Mixture of Principal Components. PhD thesis, McMaster University, Hamilton, Ontario, Canada, 1996.

[3] Manfred Tasto and Paul A. Wintz. Image coding by adaptive block quantization. IEEE Transactions on Communication Technology, COM-19(6):957-972, December 1971.

[4] Robert D. Dony and Simon Haykin. Optimally integrated adaptive learning. In Proc. ICASSP-93, pages I-609 to I-612, Minneapolis, MN, April 1993.

[5] J. Domaszewicz and V. A. Vaishampayan. Structural limitations of self-affine and partially self-affine fractal compression. In Proc. SPIE Visual Communications and Image Processing, Cambridge, MA, November 1993.