Lossless Compression of Grayscale Images via Context Tree Weighting

Nicklas Ekstrand
Dept. of Information Theory, Lund University, P.O. Box 118, S-221 00 Lund, Sweden
email: [email protected]

REFERENCE: N. Ekstrand, "Lossless Compression of Grayscale Images via Context Tree Weighting", in Proceedings Data Compression Conference (DCC), IEEE Computer Society Press, pp. 132-139, April 1996.

Abstract
In this article we report on a study of how to use the context tree weighting (CTW) algorithm for lossless image compression. This algorithm has been shown to perform optimally, in terms of redundancy, for a wide class of data sources. Our study shows that the algorithm can be applied successfully to image compression even in its basic form. We also report on possible modifications of the basic CTW algorithm that let it work more efficiently on image data. Our research is currently focused on the compression of medical grayscale images.
1 Introduction

In various situations one must be able to recover the original image exactly from its compressed representation. In those situations one relies on lossless compression techniques. It is clear from information-theoretic arguments that the compression performance of these techniques is noticeably worse than that of techniques in which a certain distortion of the restored picture is allowed. Furthermore, the subject of lossless image coding is less well studied than that of lossy compression methods. Stimulated by the promising results of the CTW algorithm [WST], we have studied ways of applying it to lossless image compression. The central idea upon which this algorithm
is based is that we are not interested in a good estimate of the source model per se, but rather in a coding probability distribution that results in short codewords, and that such a coding distribution can be obtained through model weighting.
2 Technical Description

Our approach to lossless image compression consists of basically five parts: linear prediction, Gray encoding, serialization, estimation and encoding. In the first step we subtract a linear combination of the neighbouring pixel values from each pixel value, exploiting the fact that neighbouring pixels are usually highly correlated. The difference between the original and predicted value is then Gray encoded. These first two recoding steps lead to a simpler estimation process than we would have for the original data. In the third step we have to transform the image from two-dimensional to one-dimensional data (serialization). This is usually done by scanning the picture line by line. After the serialization the data is analyzed (estimated) by a suitable statistical method; our approach is to use the context tree weighting algorithm. Finally, the compression (encoding) of the picture is done using, preferably, an arithmetic encoder [HoVi], [Jones].
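The paper does not spell out the Gray mapping further; assuming the standard binary-reflected Gray code (a plausible reading, since it keeps the codewords of numerically adjacent residuals close in Hamming distance), a minimal sketch:

```python
def gray_encode(n: int) -> int:
    # Binary-reflected Gray code: numerically adjacent values
    # map to codewords that differ in exactly one bit.
    return n ^ (n >> 1)

def gray_decode(g: int) -> int:
    # Invert by XOR-folding the shifted prefixes back together.
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n
```

For example, 127 and 128 differ in all 8 bits, but their Gray codes 64 and 192 differ in a single bit, which helps the per-bitplane modelling described in Section 2.3.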
2.1 The Context-Tree Weighting Method
In our estimation process we use the context tree weighting method presented in [WST]. Figure 1 shows a traditional context tree together with the estimated and weighted probabilities, $P_e^s$ and $P_w^s$, for each node $s$. The weighted root probability $P_w^\lambda$ is used as input to an arithmetic coder.

Figure 1: The binary weighted context tree. Each node $s$ consists of the counters $(a_s, b_s)$ for 0s and 1s, the estimated probability and the weighted probability. The root is denoted by $\lambda$.

For calculating the estimated probability, $P_e$,
[WST] use the Krichevsky–Trofimov (KT) estimator, which is defined as:

Definition 2.1 (Krichevsky–Trofimov estimator) For a sequence with $a$ zeroes and $b$ ones, the estimated probability is:

$$P_e(a,b) = \begin{cases} \dfrac{a - 1/2}{a + b}\, P_e(a-1, b) & \text{if } a \ge 1 \\[4pt] \dfrac{b - 1/2}{a + b}\, P_e(a, b-1) & \text{if } b \ge 1 \\[4pt] 1 & \text{if } a = 0 \text{ and } b = 0. \end{cases} \qquad (1)$$
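A minimal sketch of the recursion in Eq. (1), using exact rational arithmetic for clarity (a practical coder works incrementally and in fixed precision; the function name is ours):

```python
from fractions import Fraction

def kt_probability(a: int, b: int) -> Fraction:
    # KT block probability of any sequence containing a zeroes and b ones.
    if a == 0 and b == 0:
        return Fraction(1)
    if a >= 1:
        # (a - 1/2) / (a + b) written with integer numerator/denominator.
        return Fraction(2 * a - 1, 2 * (a + b)) * kt_probability(a - 1, b)
    return Fraction(2 * b - 1, 2 * (a + b)) * kt_probability(a, b - 1)
```

Sequentially this amounts to predicting a zero with probability $(a + 1/2)/(a + b + 1)$; e.g. the sequence 0, 0, 1 gets probability $\frac{1}{2} \cdot \frac{3}{4} \cdot \frac{1}{6} = \frac{1}{16}$, and indeed `kt_probability(2, 1) == Fraction(1, 16)`.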
It is well known that this estimator also works well when the counts are very skewed, e.g. $a \gg 1$ and $b \approx 0$. The weighted probabilities in the context tree are defined as follows:

Definition 2.2 In a weighted context tree the estimated and weighted probabilities for a node with context $s$ are recursively (from leaf to root) calculated as:

$$P_e^s = P_e(a_s, b_s), \qquad (2)$$

where $a_s$ and $b_s$ are the counters for zeroes and ones, and

$$P_w^s = \begin{cases} \dfrac{P_e^s + P_w^{s0} P_w^{s1}}{2} & \text{if node } s \text{ has sons} \\[4pt] P_e^s & \text{if node } s \text{ is a leaf.} \end{cases} \qquad (3)$$
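A small illustrative sketch of the recursion in Eqs. (2)–(3) on a depth-1 tree (names are ours; a real implementation updates these quantities incrementally as bits arrive rather than recomputing the whole tree):

```python
from fractions import Fraction

def kt(a: int, b: int) -> Fraction:
    # KT estimator of Definition 2.1.
    if a == 0 and b == 0:
        return Fraction(1)
    if a >= 1:
        return Fraction(2 * a - 1, 2 * (a + b)) * kt(a - 1, b)
    return Fraction(2 * b - 1, 2 * (a + b)) * kt(a, b - 1)

class Node:
    def __init__(self, a, b, son0=None, son1=None):
        self.a, self.b = a, b          # counters a_s, b_s for this context
        self.son0, self.son1 = son0, son1

    def weighted(self) -> Fraction:
        pe = kt(self.a, self.b)        # Eq. (2)
        if self.son0 is None:          # leaf: P_w^s = P_e^s
            return pe
        # internal node: Eq. (3)
        return (pe + self.son0.weighted() * self.son1.weighted()) / 2

# Depth-1 tree: the sons' counts partition the root's counts.
root = Node(4, 1, son0=Node(3, 0), son1=Node(1, 1))
```

By construction the weighted probability is at least half the better of the two competing hypotheses (memoryless vs. split by context), which is the essence of the weighting idea.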
2.2 Linear Prediction
As the first step in our encoder we make a transform, where we replace the original pixel values by the difference between the pixel value and a linear prediction based on the neighbouring pixels, see Figure 2. When we scan the image data as in the figure
Figure 2: Scanning when making the differential picture is done line by line. The linear predictor at position (x, y) may use the values from the dashed pixels a, b, c and d.
it is possible to perform the inverse operation in the decoder. We use a linear predictor similar to the one used in lossless JPEG:
Definition 2.3 (Linear predictor) For a position $(x, y)$ with surrounding pixels $a, b, c$ and $d$ as in Figure 2, the linear prediction is calculated as:

$$L_{(x,y)} = \frac{\omega_a X_a + \omega_b X_b + \omega_c X_c + \omega_d X_d}{\omega_a + \omega_b + \omega_c + \omega_d}, \qquad (4)$$

where $X_p$ denotes the original data in position $p$, and the $\omega_i$ are positive integer weights. The transformed value for a position, $Y_{(x,y)}$, is calculated as:

$$Y_{(x,y)} = X_{(x,y)} - \lfloor L_{(x,y)} + 1/2 \rfloor. \qquad (5)$$
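A direct transcription of Eqs. (4)–(5); the equal weights in the example are only an illustrative constant-weight choice, and the function names are ours:

```python
import math

def predict(x_a, x_b, x_c, x_d, w=(1, 1, 1, 1)):
    # Linear prediction L(x,y) of Eq. (4) from the four neighbours.
    wa, wb, wc, wd = w
    return (wa * x_a + wb * x_b + wc * x_c + wd * x_d) / (wa + wb + wc + wd)

def residual(x, prediction):
    # Transformed value Y(x,y) of Eq. (5); floor(L + 1/2) rounds to nearest.
    return x - math.floor(prediction + 0.5)

def reconstruct(y, prediction):
    # Decoder-side inverse: the decoder sees the same (already decoded)
    # neighbours, hence computes the same prediction.
    return y + math.floor(prediction + 0.5)
```

For example, with neighbours 100, 102, 101, 99 the prediction is 100.5 and the pixel value 103 maps to the residual 2, which the decoder inverts exactly.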
One way of determining the weights $\omega_i$ is by minimizing

$$\epsilon^2_{(x-1,y)} + \epsilon^2_{(x-1,y-1)} + \epsilon^2_{(x,y-1)} + \epsilon^2_{(x+1,y-1)}, \qquad (6)$$

where $\epsilon_p = X_p - L_p$. Unfortunately this will not always give the best result for the estimator; a simple choice of constant weights for all pixels sometimes gives better results. After the linear prediction we obtain, for most images, probability distributions as in Figure 3. These results are similar to those obtained by lossless JPEG. Empirically, for our medical images the residuals appear to follow a double exponential (Laplacian) distribution. What we gain by using this transform is a reduction of the model complexity in the estimator and, hence, a saving in code word length. The disadvantage is that the remaining information is harder to extract, which makes the compression process more involved.
Figure 3: Two examples of the distribution after linear prediction. The left distribution is from a medical image and the right is from the famous LENA picture.
2.3 The Context-Tree Weighting Method Applied to Images
When applying the context tree weighting algorithm to image data we have to deal with some problems. First of all, from [WST] it is clear that applying the CTW algorithm directly to 8-bit pixel data would lead to a high model cost, which results in a large redundancy (poor compression performance). For that reason we apply the CTW algorithm to the binary bitplane image data obtained by Gray encoding of the 8-bit pixel data. Next we note that after the linear prediction the correlation with the neighbouring pixels has been reduced. We would therefore need a very deep tree to retrieve the useful information. On the other hand, every extra node in the tree is an extra parameter to estimate, resulting in higher model and storage costs. For now we have ignored this aspect, at the cost of possibly worse performance, since the precise trade-off mechanisms for optimization are not yet well understood. Moreover, in image data we observe that the least significant bitplanes are less compressible than the more significant ones. This is because, after linear prediction and Gray encoding, the noise and information in the image are located in these bitplanes. For these bitplanes we use the Laplace estimator instead of the KT estimator in the context tree weighting algorithm:

Definition 2.4 (Laplace estimator) For a sequence with $a$ zeroes and $b$ ones, we define the estimated probability as:

$$P_e(a,b) = \begin{cases} \dfrac{a}{a+b+1}\, P_e(a-1, b) & \text{if } a \ge 1 \\[4pt] \dfrac{b}{a+b+1}\, P_e(a, b-1) & \text{if } b \ge 1 \\[4pt] 1 & \text{if } a = 0 \text{ and } b = 0. \end{cases} \qquad (7)$$
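A rational-arithmetic sketch of the recursion in Eq. (7), which is equivalent to the closed form $a!\,b!/(a+b+1)!$ (the function name is ours):

```python
from fractions import Fraction

def laplace_probability(a: int, b: int) -> Fraction:
    # Laplace block probability for a zeroes and b ones, Eq. (7).
    if a == 0 and b == 0:
        return Fraction(1)
    if a >= 1:
        return Fraction(a, a + b + 1) * laplace_probability(a - 1, b)
    return Fraction(b, a + b + 1) * laplace_probability(a, b - 1)
```

Sequentially this is the familiar add-one rule $P(0 \mid a, b) = (a+1)/(a+b+2)$. For balanced counts it assigns more probability than the KT estimator (e.g. 1/6 versus 1/8 for $a = b = 1$), while KT wins for skewed counts, consistent with the crossover range quoted in the text.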
The Laplace estimator performs better than KT in the probability range of approximately 0.12–0.88. Finally, an important decision that has to be made is how the context of a pixel (bit) in a given bitplane is defined. This is closely related to the process of scanning or serialization of the bitplane, the subject of the next subsection.
2.4 Serialization
We have two ways of scanning the image in the serialization part. One alternative is to serialize by scanning the image line by line as in Figure 2, and bitplane by bitplane. In this way we get a context from the pixels in the neighbourhood, as shown in Figure 4. The other alternative is to scan the image line by line, but taking all bitplanes of each pixel together. With this alternative it would also be possible to use a multi-alphabet estimator. As context for the estimator we select only those bits from neighbourhood pixels that will result in better compression. Consequently, for photographic images like LENA it is almost pointless to use the least significant bit in the context, because of its noisy characteristics. In our experimental encoder we make a selection among 5 types of contexts. This information must be sent to the decoder, and we refer to it as context selection.
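The two scanning alternatives can be sketched as follows (pure Python over a 2-D list of pixel values, plane 0 being the least significant; function names are ours):

```python
def serialize_by_bitplane(img, depth=8):
    # Alternative 1: bitplane by bitplane, each plane scanned line by line.
    return [(pix >> k) & 1
            for k in range(depth)
            for row in img
            for pix in row]

def serialize_by_pixel(img, depth=8):
    # Alternative 2: line by line, emitting all bitplanes of each pixel
    # together (which would also admit a multi-alphabet estimator).
    return [(pix >> k) & 1
            for row in img
            for pix in row
            for k in range(depth)]
```

On the 2x2 image [[2, 1], [3, 0]] with depth 2 the two orders give [0, 1, 1, 0, 1, 0, 1, 0] and [0, 1, 1, 0, 1, 1, 0, 0] respectively: the same bits, serialized differently.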
2.5 Our Experimental Encoder
Our experimental encoder is presented in Figure 5. The decoder performs the inverse operations in reversed order. The estimator uses one context tree per bitplane since the characteristics of the bitplanes are very different. The serializer determines how to retrieve the context for each bitplane. This information must be transmitted to the decoder (context selection). In our experimental encoder we limited the size of the context trees with a maximum-depth and a maximum-number-of-nodes constraint. This means that in some cases we have a lower model cost, and in some cases a higher redundancy, compared to the maximum trees.

Figure 4: When at position X on bitplane k, bits from all previous bitplanes and all previous bits on the current bitplane may be used as context. These bits are dashed in the figure.
3 Results

Table 1 summarizes our results and allows a comparison between lossless JPEG, JBIG, CALIC and the algorithm described in this paper. The first two are included for comparison, and CALIC¹ (Context-based, Adaptive, Lossless Image Coder) is mentioned since it performs well on our test image set².

Picture   JPEG    JBIG    CALIC   Our results
Backen    0.265   0.219   0.188   0.178
Buk       0.329   0.300   0.249   0.235
Skalle    0.252   0.184   0.139   0.130
Thorax    0.390   0.361   0.288   0.266
Lena      0.631   0.613   0.561   0.531

Table 1: Compression ratio for various images. The ratio is the compressed size divided by the original size. All pictures except Lena are images of medical nature.
All the pictures are 8-bit grayscale images of size 512x512. The first four are medical images and the last one is the famous Lena picture. As can be seen, our algorithm performs better than lossless JPEG and JBIG, and as well as one of the better algorithms available today. Presently our algorithm requires significant memory and computational resources. However, since we applied essentially the basic CTW algorithm, we can decrease both memory and computational complexity by specializing it to image data.

¹ Presented CALIC results are from [CALIC].
² Images available for anonymous ftp from: ftp.dit.lth.se/pub/nicklas.

Figure 5: Block diagram of our experimental encoder: linear prediction transformer, Gray encoder and serializer feed the weighted context tree, whose probability drives the arithmetic encoder; the context selection is sent to the decoder.
4 Discussion

We have presented a method for using the CTW algorithm for lossless image compression. The results are comparable to those of one of the better algorithms available today. Our next step is to reduce the model cost. Our linear predictor is therefore also subject to improvement or replacement. Furthermore, since we used the weighting parameters of the basic CTW algorithm, we expect better compression performance when we adapt the weighting to the image data characteristics.
Figure 6: Our medical images. Original size 512x512 pixels with 256 grayscale levels.

References

[CALIC] Nasir Memon, private communication, June 1995.
[HoVi] P. Howard and J. Vitter, "Arithmetic Coding for Data Compression", Proceedings of the IEEE, vol. 82, no. 6, June 1994, pp. 857–865.
[JBIG] CCITT Draft Recommendation T.82, ISO/IEC Draft International Standard 11544, Coded Representation of Picture and Audio Information – Progressive Bi-level Image Compression, April 1992.
[JPEG] W. Pennebaker and J. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, 1993.
[Jones] C. Jones, "An efficient coding system for long source sequences", IEEE Trans. Inf. Theory, vol. 27, no. 3, May 1981, pp. 280–291.
[WST] F. Willems, Y. Shtarkov and T. Tjalkens, "The Context-Tree Weighting Method: Basic Properties", IEEE Trans. Inf. Theory, vol. 41, no. 3, May 1995, pp. 653–664.