LOSSLESS IMAGE COMPRESSION USING BURROWS-WHEELER TRANSFORM (METHODS AND TECHNIQUES)

Elfitrin Syahrul, Julien Dubois, Vincent Vajnovszki - Laboratoire Electronique, Informatique et Image (Le2i), Université de Bourgogne, France
Taoufik Saidani, Mohamed Atri - Laboratoire Electronique et Microélectronique (Lab. It06), Université de Monastir, Tunisie
[email protected]

Abstract. The Burrows-Wheeler Transform (BWT) is a combinatorial algorithm originally created for text compression (as in bzip2) that has recently been applied to image compression. This paper focuses on the impact of a compression scheme based on this combinatorial transform on high-resolution medical images. It reviews the original scheme and several improvements that have been developed for the post-processing of the BWT output in this context. The performances of these techniques are compared and discussed, and the influence of image size and data format is also considered.
1. Introduction
The performance of the Burrows-Wheeler Compression Algorithm (BWCA) has improved steadily since its creation [1]. Many improvements have been presented over the past years. Some of them address the calculation of the Burrows-Wheeler Transform (BWT) itself; other studies address the entropy coding of the data stream; finally, many publications concern the middle part of the algorithm, where the BWT output symbols are prepared for the subsequent entropy coding. This paper reviews different BWCA techniques for image compression.
2. Original scheme
A typical scheme of the Burrows-Wheeler Compression Algorithm (BWCA) was introduced by Abel [1]. It consists of four stages, as shown in Figure 1. Each stage transforms the input data and passes its output to the next stage; the stages are processed sequentially from left to right. The first stage is the BWT itself. It sorts the data so that symbols with a similar context are grouped closely together, and it keeps the number of symbols constant during the transformation. The second stage, called in this article the Global Structure Transform (GST), transforms the local context of the symbols into a global context. A typical representative of a GST stage is the Move-To-Front transform (MTF), which Burrows and Wheeler introduced in their original publication [2]: it was the first algorithm used as a GST stage in the original BWCA scheme. The MTF stage is a List Update Algorithm (LUA), which replaces the input symbols with their corresponding ranking values. Just like the BWT stage, the LUA stage does not alter the number of symbols.

Figure 1. Typical scheme of the Burrows-Wheeler Compression Algorithm.

The third stage typically shrinks the number of symbols by applying a Run-Length Encoding (RLE) scheme. Different algorithms have been presented for this purpose; the Zero Run Transform (RLE-0) from Wheeler has been found to be an efficient one. The last stage is the Entropy Coding (EC) stage, which compresses the symbols using an adapted model. We focus on lossless compression because of the targeted medical applications; nevertheless, this scheme can be used for lossy image compression as well. In the lossy configuration, a DCT-based preprocessing is added before compression [3].

3. Method evolution

3.1 Improvements of BWT
Several authors have presented improvements to the original algorithm. Andersson and Nilsson [4] published the Radix Sort algorithm, which can be used as a first sorting step during the BWT. In his final BWT research report, Fenwick described several BWT sort improvements, including sorting long words instead of single bytes [5]. Kurtz and Balkenhol presented several papers on BWT sorting with suffix trees, which need less space than other suffix tree implementations and run in linear time [6]. Sadakane described a fast suffix-array sorting scheme [7], and Larsson presented an extended suffix-array sorting scheme [8]. Building on already sorted suffixes, Seward developed in 2000 two fast suffix sorting algorithms called "copy" and "cache" [9]. Itoh and Tanaka presented a fast sorting algorithm called the "two-stage suffix sort" [10], which Kao improved with new techniques that are very fast for sequences of repeated symbols [11]. Manzini and Ferragina [12] improved suffix-array sorting techniques based on the results of Seward and of Itoh and Tanaka.
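To make the object of these sorting optimizations concrete, the forward BWT can be sketched with a naive rotation sort (an illustration only, not the authors' code: its quadratic-time sort is exactly what the suffix-sorting algorithms above replace):

```python
def bwt(data: bytes):
    """Naive forward Burrows-Wheeler Transform: sort all cyclic
    rotations of the input and return the last column together with
    the index of the original rotation (needed for inversion)."""
    n = len(data)
    # Sort rotation start positions by the rotation they generate.
    rotations = sorted(range(n), key=lambda i: data[i:] + data[:i])
    # The BWT output is the symbol preceding each sorted rotation.
    last_column = bytes(data[(i - 1) % n] for i in rotations)
    primary_index = rotations.index(0)
    return last_column, primary_index
```

For example, `bwt(b"banana")` groups the three `a`-contexts together and yields `b"nnbaaa"`, illustrating how symbols with similar context end up adjacent in the output.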
3.2 Improvement of RLE
The main function of the RLE is to support the probability estimation of the next stage. Long runs of identical values tend to overestimate the global symbol probabilities, which leads to lower compression; Balkenhol and Shtarkov call this phenomenon "the pressure of runs" [13]. The RLE stage helps to decrease this pressure. In order to improve the probability estimation of the EC stage, common BWCA schemes position the RLE stage directly in front of the EC stage [1]. One common RLE stage for BWT-based compressors is Run-Length Encoding Zero (RLE-0). Wheeler suggested coding only the runs of the 0 symbol and no runs of other symbols, since 0 is the symbol with the most runs. To this end, an offset of 1 is added to all symbols greater than 0. The run length is incremented by one, and all bits of its binary representation except the most significant bit, which is always 1, are stored with the symbols 0 and 1. Some authors have suggested an RLE stage before the BWT stage for speed optimization and to reduce the BWT input size, but such a stage generally deteriorates the compression ratio [14]; instead, specific sorting algorithms are used to handle runs of symbols in practically linear time [9,10,11,12]. Another type of run-length encoding is RLE-2, which has been used by Abel [1]. The RLE-2 stage replaces every run of two or more symbols by a run of exactly two symbols. In contrast to other approaches, the length of the run is not placed behind the two symbols inside the symbol stream but transmitted in a separate data stream, so the length information does not disturb the context of the main data stream.
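Wheeler's RLE-0 as described above can be sketched as follows (a minimal illustration, not the authors' implementation; the input is assumed to be the rank stream produced by the GST stage):

```python
def rle0_encode(ranks):
    """Zero Run Transform (RLE-0): runs of 0 are replaced by the
    binary digits of (run_length + 1) with the leading 1 bit dropped,
    written using the symbols 0 and 1; every non-zero symbol is
    shifted up by 1 so that the values 0 and 1 stay free for run codes."""
    out = []
    i = 0
    while i < len(ranks):
        if ranks[i] == 0:
            run = 0
            while i < len(ranks) and ranks[i] == 0:
                run += 1
                i += 1
            # bin(run + 1) is '0b1...', so [3:] drops '0b' and the MSB.
            for bit in bin(run + 1)[3:]:
                out.append(int(bit))
        else:
            out.append(ranks[i] + 1)  # offset non-zero symbols by 1
            i += 1
    return out
```

This yields the usual RLE-0 codes: a run of one zero becomes `0`, a run of two becomes `1`, a run of three becomes `0 0`, and so on.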
3.3 Improvement of the Global Structure Transform
Most GST stages use a recency ranking scheme for the List Update problem, such as the Move-To-Front (MTF) algorithm used in the original BWCA approach of Burrows and Wheeler. Many authors have presented improved MTF stages based on a delayed behavior, such as the MTF-1 and MTF-2 approaches of Balkenhol et al. or the "sticky" version by Fenwick [5]. Another approach, which achieves a much better compression ratio than MTF stages, is the Weighted Frequency Count (WFC) stage presented by Deorowicz [14]; this scheme, however, has a very high computational cost. Other GST schemes, like Inversion Frequencies (IF) [13], use a distance measurement between occurrences of the same symbol. Similar to the WFC stage of Deorowicz, Abel presented a list of counters, the Incremental Frequency Count (IFC) [1]; its difference from the WFC stage is a much lower computational cost.
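For reference, the baseline MTF stage that these schemes refine can be sketched as (an illustrative byte-alphabet version, not a specific author's code):

```python
def mtf_encode(data: bytes):
    """Move-To-Front: each symbol is replaced by its current rank in a
    recency list, and the symbol is then moved to the front. Runs of
    identical symbols in the BWT output therefore become runs of zeros."""
    table = list(range(256))  # recency list, initially 0..255
    ranks = []
    for byte in data:
        r = table.index(byte)
        ranks.append(r)
        table.pop(r)
        table.insert(0, byte)
    return ranks
```

Note that the output has exactly as many symbols as the input, only with a distribution skewed towards small ranks, which is what the following RLE and EC stages exploit.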
3.4 Improvement of Entropy Coding
The very first proposal of Burrows and Wheeler was to use a Huffman coder as the last stage; it is fast and simple, but an arithmetic coder is a better choice to achieve a higher compression ratio. Abel modified the arithmetic coding because the coding of the IFC output inside the EC stage has a strong influence on the compression rate: it is not sufficient to compress the index stream with a simple arithmetic coder using a common order-n context. The index frequencies of the IFC output have a nonlinear decay; even after the use of an RLE-2 stage, index 0 is still, on average, the most common index symbol.
4. Experiments and Results
The experiments use medical images from the IRMA (Image Retrieval in Medical Applications) database [15]. This database consists of primary (directly digital) and secondarily digitized X-ray films in portable network graphics (PNG) and tagged image file format (TIFF), at 8 bits per pixel (bpp); example images are shown in Figure 2. The image sizes range between 101 KB and 4684 KB.
Figure 2. Examples of tested images. Upper row: directly digital; lower row: secondarily captured. From left to right: hand; head; pelvis; chest, frontal; chest, lateral.

The first experiment applied the original BWCA chain (Figure 1) to medical images from the IRMA database, in both directly digital and secondarily digitized formats. These are high-resolution images. The results of this test are presented in Table 1.
For this study, lossless compression schemes are used as references. We selected TIFF, the raw image file format, JPEG, and JPEG 2000, the latter based on wavelet decomposition. Table 1 summarizes the observed compression ratios. The original BWCA scheme obtains a better compression ratio than JPEG, but JPEG 2000 remains significantly better: the average compression ratio is 2.650 for JPEG 2000 against 2.387 for the original BWCA. The original BWCA beats JPEG 2000 on only two images, the second images of Heads Secondary and Pelvis Secondary. For the latter, the BWCA compression ratio is 3.104 while JPEG 2000 reaches only 2.178, a quite significant difference of 0.926. The original scheme proposed by Burrows and Wheeler has a few flaws. RLE-0 is not very effective at reducing the data, because many runs of identical characters remain after RLE-0; employing Move-To-Front (MTF) as the GST before RLE-0 cannot reduce this phenomenon effectively, because MTF transforms one string of symbols into another string of the same length, only with a different distribution.
Another GST, the Incremental Frequency Count (IFC) introduced by Abel [1], is compared with MTF. It avoids a disadvantage of the MTF, which always moves each new symbol directly to the front of the list, no matter how seldom the symbol has appeared in the recent past. IFC builds on the Weighted Frequency Count (WFC) technique of Deorowicz [13], which weights the frequency of all symbols in the recent past; symbols outside a sliding window are no longer taken into account. By choosing a proper window size and weights, the WFC achieves very good results, but at a high computational cost, since the weighting of the symbols within the sliding window and the sorting of the list must be recalculated for each symbol processed; IFC was proposed to reduce this weakness [1]. In general, the model of our test is based on the model of Lehmann [16], as shown in Figure 3.
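The counter-ranking idea behind IFC can be sketched as follows. This is a much-simplified illustration of the principle, not Abel's actual algorithm: the rescaling period is an arbitrary assumption, and the tie-breaking needed for a real encoder/decoder pair is omitted.

```python
def ifc_encode(data: bytes, halve_at: int = 256):
    """Simplified frequency-count ranking: each symbol is ranked by a
    running counter that is incremented on every occurrence; counters
    are periodically halved so recent symbols dominate, approximating
    a sliding window at far lower cost than recomputing WFC weights."""
    counters = {s: 0 for s in range(256)}
    out = []
    for step, sym in enumerate(data, 1):
        # Rank = number of symbols with a strictly larger counter.
        rank = sum(1 for c in counters.values() if c > counters[sym])
        out.append(rank)
        counters[sym] += 1
        if step % halve_at == 0:  # rescale: forget the distant past
            for s in counters:
                counters[s] //= 2
    return out
```

The key contrast with MTF is visible on a rare symbol: MTF would jump it straight to rank 0, whereas here it only climbs as its counter grows.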
Figure 3. The improved BWCA with an RLE-2 stage after the BWT stage.
Table 1. Comparative first results using the BWCA original scheme.

Name of image              | Size of raw image | TIFF      | Comp. ratio | JPEG      | Comp. ratio | JPEG 2000 | Comp. ratio | BWCA original scheme | Comp. ratio
Hands Primary              | 2 235 688 | 1 434 628 | 1.558 | 994 043   | 2.249 | 746 812   | 2.994 | 921 077   | 2.427
                           | 1 120 960 | 778 982   | 1.439 | 553 455   | 2.025 | 404 790   | 2.769 | 503 559   | 2.226
Hands Secondary            | 431 172   | 227 802   | 1.893 | 201 901   | 2.136 | 157 759   | 2.733 | 201 396   | 2.141
                           | 1 667 040 | 782 492   | 2.130 | 761 412   | 2.189 | 573 070   | 2.909 | 608 922   | 2.738
Heads Primary              | 1 515 533 | 1 071 570 | 1.414 | 760 802   | 1.992 | 593 391   | 2.554 | 681 419   | 2.224
                           | 2 839 656 | 1 838 850 | 1.544 | 1 284 695 | 2.210 | 966 688   | 2.938 | 1 119 363 | 2.537
Heads Secondary            | 2 788 500 | 1 297 898 | 2.148 | 1 179 829 | 2.363 | 951 033   | 2.932 | 1 041 038 | 2.679
                           | 3 256 000 | 1 441 664 | 2.259 | 1 357 005 | 2.399 | 1 277 882 | 2.548 | 1 143 073 | 2.848
Pelvis Primary             | 3 239 730 | 2 772 998 | 1.168 | 1 877 742 | 1.725 | 1 589 535 | 2.038 | 1 770 899 | 1.829
                           | 3 126 784 | 2 592 926 | 1.206 | 1 740 236 | 1.797 | 1 485 588 | 2.105 | 1 661 580 | 1.882
Pelvis Secondary           | 1 076 768 | 803 374   | 1.340 | 506 967   | 2.124 | 420 919   | 2.558 | 501 369   | 2.148
                           | 7 036 956 | 3 184 574 | 2.210 | 3 374 061 | 2.086 | 3 230 414 | 2.178 | 2 267 335 | 3.104
Thoraces Frontal Primary   | 3 713 600 | 3 244 154 | 1.145 | 2 046 205 | 1.815 | 1 830 742 | 2.028 | 2 011 249 | 1.846
                           | 3 405 076 | 2 912 946 | 1.169 | 1 806 522 | 1.885 | 1 611 065 | 2.114 | 1 780 515 | 1.912
Thoraces Frontal Secondary | 6 957 060 | 2 832 738 | 2.456 | 2 651 775 | 2.624 | 2 047 942 | 3.397 | 2 431 091 | 2.862
                           | 7 006 860 | 3 374 332 | 2.077 | 3 027 914 | 2.314 | 2 543 669 | 2.755 | 2 607 353 | 2.687
Thoraces Lateral Primary   | 6 184 913 | 4 357 022 | 1.420 | 2 590 276 | 2.388 | 2 115 375 | 2.924 | 2 430 634 | 2.545
                           | 2 186 181 | 1 836 094 | 1.191 | 1 227 943 | 1.780 | 1 053 533 | 2.075 | 1 170 793 | 1.867
Thoraces Lateral Secondary | 5 859 510 | 3 611 076 | 1.623 | 1 957 078 | 2.994 | 1 429 536 | 4.099 | 1 773 996 | 3.303
                           | 220 580   | 220 778   | 0.999 | 112 457   | 1.961 | 93 861    | 2.350 | 114 544   | 1.926
Table 2. Comparison of the BWCA original scheme and its improved variants.

Name of image | Image size (.tiff) | Image size (.raw) | Original BWCA | Comp. ratio | BWCA using IFC | Comp. ratio | BWCA using RLE-2 symbols | Comp. ratio
knee_0   | 2 701 240 | 2 696 640 | 791 153   | 3.408 | 736 782   | 3.660 | 774 751   | 3.481
knee_1   | 2 655 704 | 2 651 176 | 825 728   | 3.211 | 763 734   | 3.471 | 804 860   | 3.294
leg_0    | 1 728 972 | 1 725 500 | 568 049   | 3.038 | 537 945   | 3.208 | 563 096   | 3.064
leg_1    | 1 318 720 | 1 315 640 | 526 059   | 2.501 | 498 760   | 2.638 | 522 071   | 2.520
pelvis_0 | 3 124 892 | 3 119 852 | 1 642 274 | 1.900 | 1 566 239 | 1.992 | 1 624 185 | 1.921
pelvis_1 | 3 034 932 | 3 029 956 | 1 571 699 | 1.928 | 1 495 185 | 2.026 | 1 555 180 | 1.948
sinus_0  | 2 424 218 | 2 419 802 | 811 206   | 2.983 | 761 426   | 3.178 | 798 670   | 3.030
sinus_1  | 2 241 804 | 2 237 492 | 804 833   | 2.780 | 760 809   | 2.941 | 795 408   | 2.813
breast_0 | 3 752 938 | 3 746 730 | 983 851   | 3.808 | 936 388   | 4.001 | 974 505   | 3.845
breast_1 | 3 678 612 | 3 672 396 | 1 096 598 | 3.349 | 1 046 294 | 3.510 | 1 084 486 | 3.386
foot_0   | 3 125 062 | 3 119 694 | 782 377   | 3.987 | 731 163   | 4.267 | 766 994   | 4.067
foot_1   | 2 235 408 | 2 231 304 | 752 290   | 2.966 | 702 304   | 3.177 | 736 294   | 3.030
hand_0   | 2 500 096 | 2 484 368 | 822 399   | 3.021 | 759 646   | 3.270 | 799 105   | 3.109
hand_1   | 2 535 246 | 1 279 773 | 411 577   | 3.109 | 380 944   | 3.359 | 400 375   | 3.196
head_0   | 1 088 424 | 2 651 925 | 599 394   | 4.424 | 572 652   | 4.631 | 595 266   | 4.455
head_1   | 2 608 068 | 2 603 188 | 724 980   | 3.591 | 691 279   | 3.766 | 719 246   | 3.619
spine_0  | 1 759 608 | 1 755 944 | 917 233   | 1.914 | 873 973   | 2.009 | 908 556   | 1.933
spine_1  | 1 786 082 | 1 782 450 | 924 235   | 1.929 | 877 948   | 2.030 | 915 842   | 1.946
thorax_0 | 3 537 852 | 3 531 492 | 1 614 415 | 2.187 | 1 535 721 | 2.300 | 1 601 907 | 2.205
thorax_1 | 2 854 408 | 2 849 280 | 1 227 010 | 2.322 | 1 170 481 | 2.434 | 1 216 031 | 2.343

The model in Figure 3 is used to compare the effect of IFC and MTF. The results of this comparison are given in Table 2: IFC decreases the data size by about 4.3%. We also compare RLE-0 with another RLE model, RLE-2, as proposed in [16]; RLE-2 increases the average compression performance of the original BWCA by around 1.5%.

The effects of a block-oriented scheme are also investigated. The compression rates increase with the image resolution, which can be observed by splitting the image into blocks. The blocks are processed one by one, each producing one compressed data stream, and the streams are regrouped to produce the compressed image. The resulting bit stream is compared with the compression of the full-resolution image. The results of this test are presented in Tables 3 and 4 for the image Hand Primary, directly digital (see Figure 2). The image has been split into 10 blocks, with four different block sizes (Table 3). The block-oriented scheme yields a lower compression ratio, but it decreases the processing time significantly; this trade-off is discussed in the next experiment, presented in Table 5.

We extend this study by investigating the data format. In the 4-bit process, each pixel of the BWT input is split into two parts: the 4 least significant bits (LSB) and the 4 most significant bits (MSB) are separated, and each of them becomes a new 8-bit character. The input of the BWT is therefore doubled, but the maximum value of each character is now 15, whereas with 8-bit characters the maximum value is 255. The aim of this reduction of the BWT input alphabet is to increase the number of identical symbols. The same process is applied for the 2-bit and 1-bit decompositions: in the 2-bit process, the data become 4 times larger than the fully processed 8-bit image, but the alphabet contains only 4 symbols. The results of bit decomposition with the BWCA original and modified schemes are shown in Tables 3 and 4. Bit decomposition does not significantly increase the compression ratios; there is only a small variation for each block. Binary plane decomposition can therefore be considered: the algorithm can then be adapted to the binary nature of the data. Binary decomposition also opens the possibility of hardware implementations based on logical operators, which can be the simplest and cheapest implementation compared to current solutions [17].

The running time is also studied, and the results are presented in Table 5. The timings, expressed in milliseconds, were obtained on a 2.13 GHz Pentium with 1 GB of RAM. They show that the running time does not depend only on the image size, but also on the nature of the image, as the different results in lines 1 to 4 of Table 5 show.
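The 4-bit decomposition described above can be sketched as follows (an illustrative implementation, not the authors' code; the LSB-before-MSB emission order is an assumption):

```python
def split_nibbles(pixels: bytes) -> bytes:
    """Bit decomposition, 4-bit case: every 8-bit pixel is split into
    its low and high nibble, each emitted as a separate byte. The BWT
    input doubles in length, but each symbol now lies in 0..15, which
    increases the number of identical symbols the BWT can group."""
    out = bytearray()
    for p in pixels:
        out.append(p & 0x0F)         # 4 least significant bits
        out.append((p >> 4) & 0x0F)  # 4 most significant bits
    return bytes(out)
```

The 2-bit and 1-bit decompositions follow the same pattern with masks of 0x03 and 0x01, quadrupling and octupling the input length respectively.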
Table 3. Results of image decomposition using the BWCA original scheme.

Block size (.raw) | 8 bits  | Comp. ratio | 4 bits  | Comp. ratio | 2 bits  | Comp. ratio | 1 bit   | Comp. ratio
262 144    | 98 309  | 2.667 | 95 508  | 2.745 | 91 181  | 2.875 | 92 122  | 2.846
262 144    | 140 923 | 1.860 | 142 920 | 1.834 | 138 394 | 1.894 | 142 401 | 1.841
262 144    | 135 903 | 1.929 | 134 693 | 1.946 | 133 530 | 1.963 | 135 678 | 1.932
262 144    | 130 594 | 2.007 | 128 790 | 2.035 | 129 165 | 2.030 | 130 844 | 2.003
230 400    | 69 222  | 3.328 | 66 390  | 3.470 | 63 063  | 3.653 | 62 500  | 3.686
230 400    | 113 890 | 2.023 | 113 399 | 2.032 | 110 177 | 2.091 | 113 292 | 2.034
230 400    | 129 343 | 1.781 | 129 919 | 1.773 | 129 312 | 1.782 | 132 488 | 1.739
230 400    | 109 198 | 2.110 | 108 026 | 2.133 | 104 283 | 2.209 | 103 829 | 2.219
141 312    | 71 461  | 1.977 | 69 369  | 2.037 | 69 511  | 2.033 | 72 029  | 1.962
124 200    | 50 119  | 2.478 | 48 127  | 2.581 | 45 893  | 2.706 | 46 028  | 2.698
Average CR |         | 2.216 |         | 2.259 |         | 2.324 |         | 2.296
Table 4. Results of image decomposition using the BWCA modified scheme.

Block size (.raw) | 8 bits  | Comp. ratio | 4 bits  | Comp. ratio | 2 bits  | Comp. ratio | 1 bit   | Comp. ratio
262 144    | 97 330  | 2.693 | 98 790  | 2.654 | 95 308  | 2.750 | 98 390  | 2.664
262 144    | 140 810 | 1.862 | 142 920 | 1.834 | 140 916 | 1.860 | 146 317 | 1.792
262 144    | 135 573 | 1.934 | 137 500 | 1.907 | 137 129 | 1.912 | 140 729 | 1.863
262 144    | 129 918 | 2.018 | 131 853 | 1.988 | 132 301 | 1.981 | 135 742 | 1.931
230 400    | 67 447  | 3.416 | 69 272  | 3.326 | 67 569  | 3.410 | 68 649  | 3.356
230 400    | 113 789 | 2.025 | 115 532 | 1.994 | 113 045 | 2.038 | 117 231 | 1.965
230 400    | 129 031 | 1.786 | 131 357 | 1.754 | 132 115 | 1.744 | 136 000 | 1.694
230 400    | 108 492 | 2.124 | 109 792 | 2.099 | 108 200 | 2.129 | 109 339 | 2.107
141 312    | 71 565  | 1.975 | 73 601  | 1.920 | 72 320  | 1.954 | 75 060  | 1.883
124 200    | 50 113  | 2.478 | 51 537  | 2.410 | 49 545  | 2.507 | 50 337  | 2.467
Average CR |         | 2.231 |         | 2.189 |         | 2.229 |         | 2.172

Figure 4. Decomposition of the image Hand.

The first and second lines of Table 5 give the processing times obtained for two different blocks extracted from the whole image; the two blocks, which have the same size, are shown in Figure 4. The left image contains many pixels with similar gray levels, and Table 5 shows that it requires more processing time than the right image. The running time therefore depends not only on the image (or block) size but also on the image content. The algorithm sorts and regroups pixels with similar gray levels, and the number of sorting operations depends on the number of similar pixels, so the processing time increases with data redundancy.

For the 8-bit arrangement, real-time encoding and decoding can be obtained. Obviously, the processing time increases with the image size and with the decomposition into data planes: the binary arrangement, for instance, requires processing 8 planes instead of only one for the 8-bit arrangement, so with a sequential implementation the processing time is about 8 times higher. The binary data arrangement is intended for parallel hardware implementations based on logical operators; a sequential software implementation is therefore not the target for this data arrangement.
5. Conclusion and Perspectives
The BWCA lossless compression scheme achieves reduction rates of up to 4 when applied to radiographs.
Table 5. Running times for image decomposition (in milliseconds; "-" marks values lost in the source).

Block size (.raw) | 8 bits enc | 8 bits dec | 4 bits enc | 4 bits dec | 2 bits enc | 2 bits dec | 1 bit enc | 1 bit dec
262 144 | 547 | 32 | 1 391 | 31 | 4 156 | 125 | 13 765 | 281
262 144 | 469 | 15 | 1 203 | 30 | 3 256 | 125 | 10 125 | 313
262 144 | 500 | 16 | 1 251 | 47 | 3 515 | 125 | 11 563 | 312
262 144 | 500 | 31 | 1 218 | 31 | 3 515 | 187 | 11 422 | 312
230 400 | 484 | 16 | 1 359 | 31 | 4 265 | 78  | 14 406 | 234
230 400 | 406 | 16 | 1 031 | 32 | 3 000 | 94  | 9 641  | 250
230 400 | 391 | 31 | 984   | 32 | 2 813 | 109 | 8 734  | 266
230 400 | 453 | 16 | 1 172 | 31 | 3 641 | 125 | 12 000 | 297
141 312 | 235 | -  | -     | -  | -     | -   | -      | -