LOSSLESS IMAGE COMPRESSION USING BURROWS WHEELER TRANSFORM (METHODS AND TECHNIQUES)

Elfitrin Syahrul, Julien Dubois, Vincent Vajnovszki, Taoufik Saidani*, Mohamed Atri*
Laboratoire Electronique Informatique et Image – Le2i, Université de Bourgogne, France
*Laboratoire Electronique et Microélectronique – Lab. It06, Université de Monastir, Tunisie
[email protected]

Abstract

The Burrows-Wheeler Transform (BWT) is a combinatorial algorithm originally created for text compression (it is used, for example, in bzip2) that has recently been applied to image compression. This paper focuses on the impact of a compression scheme based on this combinatorial transform on high-resolution medical images. It reviews the original scheme and some improvements that have been developed for the post-processing of the BWT in this context. The performances of these techniques are compared and discussed. Moreover, the impact of image size and data format is also considered.

1. Introduction

The performance of the Burrows-Wheeler Compression Algorithm (BWCA) has improved steadily since its creation [1]. Many improvements to this algorithm have been presented over the years. Some of them address the calculation of the Burrows-Wheeler Transform (BWT) itself; other studies address the entropy coding of the data stream. Finally, many publications concern the middle part of the algorithm, where the BWT output symbols are prepared for the subsequent entropy coding. This paper reviews different BWCA techniques in image compression.

2. Original scheme

A typical scheme of the Burrows-Wheeler Compression Algorithm (BWCA) was introduced by Abel [1]. It consists of four stages, as shown in Figure 1. Each stage transforms the input data and passes its output to the next stage; the stages are processed sequentially from left to right. The first stage is the BWT itself. It sorts the data so that symbols with similar contexts are grouped closely together, and it keeps the number of symbols constant during the transformation. The second stage is called here the Global Structure Transform (GST); it transforms the local context of the symbols into a global context. A typical representative of a GST stage is the Move-To-Front transform (MTF), which Burrows and Wheeler introduced in their original publication [2] and which was the first algorithm used as the GST stage in the original BWCA scheme. The MTF stage is a List Update Algorithm (LUA), which replaces the input symbols with corresponding ranking values. Like the BWT stage, the LUA stage does not alter the number of symbols.
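As an illustration, a minimal MTF encoder fits in a few lines; the sketch below assumes a byte alphabet and makes no attempt at efficiency.

    def mtf_encode(data: bytes) -> list[int]:
        """Move-To-Front: emit each symbol's position in a self-organizing
        list, then move that symbol to the front, so recently seen symbols
        get small ranks -- which is what makes BWT output compressible."""
        alphabet = list(range(256))
        ranks = []
        for byte in data:
            rank = alphabet.index(byte)
            ranks.append(rank)
            alphabet.insert(0, alphabet.pop(rank))
        return ranks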

Figure 1. Typical scheme of the Burrows-Wheeler Compression Algorithm.

The third stage typically shrinks the number of symbols by applying a Run Length Encoding (RLE) scheme. Different algorithms have been presented for this purpose; the Zero Run Transform (RLE-0) from Wheeler has proved to be an efficient one. The last stage is the Entropy Coding (EC) stage, which compresses the symbols using an adapted model. We focus on lossless compression because of the targeted applications in the medical field; nevertheless, this scheme can be considered for lossy image compression as well. In the lossy configuration, a DCT-based preprocessing step is added before compression [3].


3. Method evolution

3.1 Improvements of BWT


Several authors have presented improvements to the original algorithm. Andersson and Nilsson published in [4] a radix sort algorithm that can be used as a first sorting step of the BWT. In his final BWT research report, Fenwick described several BWT sort improvements, including sorting long words instead of single bytes [5]. Kurtz and Balkenhol presented several papers about BWT sorting stages based on suffix trees, which need less space than other suffix tree implementations and are linear in time [6]. Sadakane described a fast suffix-array sorting scheme [7], and Larsson presented an extended suffix-array sorting scheme [8]. Building on already sorted suffixes, Seward developed in 2000 two fast suffix sorting algorithms called "copy" and "cache" [9]. Itoh and Tanaka presented a fast sorting algorithm called the "two-stage suffix sort" [10]. Kao improved the two-stage suffix sort with new techniques that make it very fast for sequences of repeated symbols [11]. Manzini and Ferragina [12] improved suffix-array sorting techniques based on the results of Seward and of Itoh and Tanaka.
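To make concrete what these sorting papers optimize, here is a deliberately naive BWT built by sorting all cyclic rotations. It is a sketch only: the cited suffix-sorting methods compute the same output in (near-)linear time and space instead of the worst-case O(n^2 log n) behavior shown here.

    def bwt_encode(data: bytes) -> tuple[bytes, int]:
        """Naive BWT: sort all cyclic rotations of the input, emit the
        last column of the sorted rotation matrix, plus the row index
        of the original string (needed to invert the transform)."""
        n = len(data)
        order = sorted(range(n), key=lambda i: data[i:] + data[:i])
        last_column = bytes(data[(i - 1) % n] for i in order)
        return last_column, order.index(0)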

3.2 Improvement of RLE

The main function of the RLE is to support the probability estimation of the next stage. Long runs of identical values tend to overestimate the global symbol probability, which leads to lower compression. Balkenhol and Shtarkov call this phenomenon "the pressure of runs" [13]. The RLE stage helps to decrease this pressure. In order to improve the probability estimation of the EC stage, common BWCA schemes position the RLE stage directly in front of the EC stage [1]. One common RLE stage for BWT-based compressors is Run Length Encoding Zero (RLE-0). Wheeler suggested coding only the runs of the 0 symbol and no runs of other symbols, since 0 is the symbol with the most runs. To this end, an offset of 1 is added to symbols greater than 0. The run length is incremented by one, and all bits of its binary representation except the most significant bit (which is always 1) are stored using the symbols 0 and 1. Some authors have suggested an RLE stage before the BWT stage for speed optimization and for reducing the BWT input, but such a stage generally deteriorates the compression ratio [14]. Instead, specific sorting algorithms are used to handle runs of symbols in practically linear time [9,10,11,12]. Another type of Run Length Encoding is RLE-2, which has been used by Abel [1]. The RLE-2 stage replaces every run of two or more symbols by a run of exactly two symbols. In contrast to other approaches, the length of the run is not placed behind the two symbols inside the symbol stream but transmitted in a separate data stream, so the length information does not disturb the context of the main data stream.
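A sketch of RLE-0 following this description is given below. The bit order used to emit the run-length digits is a convention the description leaves open, so least-significant-bit-first is assumed here.

    def rle0_encode(ranks: list[int]) -> list[int]:
        """Wheeler's Zero Run Transform: nonzero symbols are shifted up
        by 1; a run of z zeros is coded as the binary digits of (z + 1)
        without the leading 1 bit, written with the symbols 0 and 1."""
        out, i = [], 0
        while i < len(ranks):
            if ranks[i] == 0:
                z = 0
                while i < len(ranks) and ranks[i] == 0:
                    z, i = z + 1, i + 1
                n = z + 1
                while n > 1:                # emit digits, LSB first
                    out.append(n & 1)
                    n >>= 1
            else:
                out.append(ranks[i] + 1)    # offset keeps 0 and 1 free
                i += 1
        return out

For example, a run of three zeros gives n = 4 (binary 100); dropping the leading 1 leaves the two symbols 0, 0.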

3.3 Improvement of Global Structure Transform

Most GST stages use a recent-ranking scheme for the List Update problem, like the Move-To-Front (MTF) algorithm used in the original BWCA approach of Burrows and Wheeler. Many authors have presented improved MTF stages based on a delayed behavior, such as the MTF-1 and MTF-2 approaches of Balkenhol et al. or a sticky version by Fenwick [5]. Another approach, which achieves a much better compression ratio than MTF stages, is the Weighted Frequency Count (WFC) stage presented by Deorowicz [14]; this scheme, however, has a very high computational cost. Other GST schemes, like Inversion Frequencies (IF) [13], use a distance measurement between occurrences of the same symbol. Similar to the WFC stage of Deorowicz, Abel presented a list of counters, the Incremental Frequency Count (IFC) [1]; its difference to the WFC stage is that it minimizes calculation.
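The sketch below conveys the WFC idea under assumed parameters: the window size and the weight function w(d) = 1/d are illustrative stand-ins for the weightings Deorowicz actually evaluates.

    def wfc_encode(data: bytes, window: int = 2048) -> list[int]:
        """Weighted Frequency Count (sketch): rank each symbol by a
        weighted count of its occurrences inside a sliding window, with
        recent occurrences weighted more.  Recomputing the scores for
        every position is what gives WFC its high computational cost."""
        ranks = []
        for i, sym in enumerate(data):
            scores = {}
            for j in range(max(0, i - window), i):
                scores[data[j]] = scores.get(data[j], 0.0) + 1.0 / (i - j)
            order = sorted(scores, key=scores.get, reverse=True)
            ranks.append(order.index(sym) if sym in scores else len(order))
        return ranks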

3.4 Improvement of Entropy Coding

The very first proposition of Burrows and Wheeler was to use a Huffman coder as the last stage; it is fast and simple, but an arithmetic coder is the better choice to achieve a higher compression ratio. Abel modified the arithmetic coding because the coding of the IFC output inside the EC stage has a strong influence on the compression rate; indeed, it is not sufficient to compress the index stream with a simple arithmetic coder using a common order-n context. The index frequencies of the IFC output have a nonlinear decay, and even after the use of an RLE-2 stage, index 0 is still the most common index symbol on average.
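One way to see why the model matters is to measure the zero-order entropy of the index stream; a hypothetical helper for such a check might look as follows.

    import math
    from collections import Counter

    def order0_entropy(indices: list[int]) -> float:
        """Bits per symbol under a memoryless model.  On GST output the
        distribution is heavily skewed toward index 0, so a coder whose
        model matches that nonlinear decay gets close to this bound."""
        n = len(indices)
        counts = Counter(indices)
        return -sum(c / n * math.log2(c / n) for c in counts.values())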

4. Experiments and Results

The experiments use medical images from the IRMA (Image Retrieval in Medical Applications) database [15]. This database consists of primarily and secondarily digitized X-ray films in portable network graphics (PNG) and tagged image file format (TIFF), with 8 bits per pixel (bpp); examples are shown in Figure 2. The image sizes range between 101 KB and 4684 KB.

Figure 2. Examples of tested images. Upper row: directly digital; lower row: secondarily captured. From left to right: hand; head; pelvis; chest, frontal; chest, lateral.

The first experiment applied the original BWCA chain (Figure 1) to medical images from the IRMA database, in both directly digital and secondarily digitized formats. These are high-resolution images. The results of this test are presented in Table 1.

For this study, lossless compression schemes are used as references. We selected TIFF, the raw image file format, and the Joint Photographic Experts Group formats JPEG and JPEG 2000, the latter based on wavelet decomposition. Table 1 summarizes the observed compression ratios. The original BWCA scheme achieves a better compression ratio than JPEG, but JPEG 2000 is significantly better than the original BWCA scheme: the average compression ratio of JPEG 2000 is 2.650 versus 2.387 for the original BWCA. The original BWCA outperforms JPEG 2000 on only two images, namely the second images of Heads Secondary and Pelvis Secondary. For Pelvis Secondary, the compression ratio of the original BWCA scheme is 3.104, while JPEG 2000 reaches only 2.178; the difference of 0.926 is quite significant. The original scheme proposed by Burrows and Wheeler has a few flaws. Employing RLE-0 is not effective at reducing the data, because many runs of identical characters still exist after RLE-0. Employing Move-To-Front (MTF) as the GST before RLE-0 cannot reduce this phenomenon effectively, because MTF transforms one string of symbols into another string of the same length, only with a different distribution.

Another GST, the Incremental Frequency Count (IFC) introduced by Abel [1], is compared with MTF. It avoids a disadvantage of the MTF: MTF always moves each new symbol directly to the front of the list, no matter how seldom the symbol has appeared in the recent past. IFC builds on the technique of the Weighted Frequency Count (WFC) from Deorowicz [14], which weights the frequency of all symbols in the recent past; symbols outside a sliding window are no longer taken into account. By choosing the proper window size and weights, the WFC achieves very good results, but it has a high computational cost, since the weighting of the symbols within the sliding window and the sorting of the list have to be recalculated for each symbol processed. IFC was proposed to reduce this weakness [1]. In general, the model of our test is based on the model of Lehmann [16], as shown in Figure 3.
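A rough IFC-style ranker is sketched below; the increment and rescaling constants are illustrative, not Abel's tuned values.

    def ifc_encode(data: bytes, increment: int = 32, limit: int = 4096) -> list[int]:
        """Incremental Frequency Count (sketch): keep one counter per
        symbol, output the current symbol's rank among the counters,
        bump its counter, and halve all counters once the bumped one
        grows too large so that the statistics follow the recent past."""
        counters = [0] * 256
        out = []
        for sym in data:
            out.append(sum(c > counters[sym] for c in counters))
            counters[sym] += increment
            if counters[sym] >= limit:
                counters = [c >> 1 for c in counters]
        return out

Unlike MTF, a rarely seen symbol does not jump to rank 0 here; its counter, and hence its rank, improves only gradually.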

Figure 3. The improved BWCA with an RLE-2 stage after the BWT stage.

Table 1. First results using the original BWCA scheme compared with TIFF, JPEG and JPEG 2000 (sizes in bytes; CR = compression ratio; two images per category).

Name of image              | Raw size  | TIFF      | CR    | JPEG      | CR    | JPEG 2000 | CR    | BWCA      | CR
Hands Primary              | 2 235 688 | 1 434 628 | 1.558 |   994 043 | 2.249 |   746 812 | 2.994 |   921 077 | 2.427
                           | 1 120 960 |   778 982 | 1.439 |   553 455 | 2.025 |   404 790 | 2.769 |   503 559 | 2.226
Hands Secondary            |   431 172 |   227 802 | 1.893 |   201 901 | 2.136 |   157 759 | 2.733 |   201 396 | 2.141
                           | 1 667 040 |   782 492 | 2.130 |   761 412 | 2.189 |   573 070 | 2.909 |   608 922 | 2.738
Heads Primary              | 1 515 533 | 1 071 570 | 1.414 |   760 802 | 1.992 |   593 391 | 2.554 |   681 419 | 2.224
                           | 2 839 656 | 1 838 850 | 1.544 | 1 284 695 | 2.210 |   966 688 | 2.938 | 1 119 363 | 2.537
Heads Secondary            | 2 788 500 | 1 297 898 | 2.148 | 1 179 829 | 2.363 |   951 033 | 2.932 | 1 041 038 | 2.679
                           | 3 256 000 | 1 441 664 | 2.259 | 1 357 005 | 2.399 | 1 277 882 | 2.548 | 1 143 073 | 2.848
Pelvis Primary             | 3 239 730 | 2 772 998 | 1.168 | 1 877 742 | 1.725 | 1 589 535 | 2.038 | 1 770 899 | 1.829
                           | 3 126 784 | 2 592 926 | 1.206 | 1 740 236 | 1.797 | 1 485 588 | 2.105 | 1 661 580 | 1.882
Pelvis Secondary           | 1 076 768 |   803 374 | 1.340 |   506 967 | 2.124 |   420 919 | 2.558 |   501 369 | 2.148
                           | 7 036 956 | 3 184 574 | 2.210 | 3 374 061 | 2.086 | 3 230 414 | 2.178 | 2 267 335 | 3.104
Thoraces Frontal Primary   | 3 713 600 | 3 244 154 | 1.145 | 2 046 205 | 1.815 | 1 830 742 | 2.028 | 2 011 249 | 1.846
                           | 3 405 076 | 2 912 946 | 1.169 | 1 806 522 | 1.885 | 1 611 065 | 2.114 | 1 780 515 | 1.912
Thoraces Frontal Secondary | 6 957 060 | 2 832 738 | 2.456 | 2 651 775 | 2.624 | 2 047 942 | 3.397 | 2 431 091 | 2.862
                           | 7 006 860 | 3 374 332 | 2.077 | 3 027 914 | 2.314 | 2 543 669 | 2.755 | 2 607 353 | 2.687
Thoraces Lateral Primary   | 6 184 913 | 4 357 022 | 1.420 | 2 590 276 | 2.388 | 2 115 375 | 2.924 | 2 430 634 | 2.545
                           | 2 186 181 | 1 836 094 | 1.191 | 1 227 943 | 1.780 | 1 053 533 | 2.075 | 1 170 793 | 1.867
Thoraces Lateral Secondary | 5 859 510 | 3 611 076 | 1.623 | 1 957 078 | 2.994 | 1 429 536 | 4.099 | 1 773 996 | 3.303
                           |   220 580 |   220 778 | 0.999 |   112 457 | 1.961 |    93 861 | 2.350 |   114 544 | 1.926

Table 2. Comparison of the original BWCA scheme and its improved variants (sizes in bytes; CR = compression ratio).

Name of image | Image size (.tiff) | Image size (.raw) | Original BWCA | CR    | BWCA using IFC | CR    | BWCA using RLE-2 | CR
knee_0        | 2 701 240          | 2 696 640         |   791 153     | 3.408 |   736 782      | 3.660 |   774 751        | 3.481
knee_1        | 2 655 704          | 2 651 176         |   825 728     | 3.211 |   763 734      | 3.471 |   804 860        | 3.294
leg_0         | 1 728 972          | 1 725 500         |   568 049     | 3.038 |   537 945      | 3.208 |   563 096        | 3.064
leg_1         | 1 318 720          | 1 315 640         |   526 059     | 2.501 |   498 760      | 2.638 |   522 071        | 2.520
pelvis_0      | 3 124 892          | 3 119 852         | 1 642 274     | 1.900 | 1 566 239      | 1.992 | 1 624 185        | 1.921
pelvis_1      | 3 034 932          | 3 029 956         | 1 571 699     | 1.928 | 1 495 185      | 2.026 | 1 555 180        | 1.948
sinus_0       | 2 424 218          | 2 419 802         |   811 206     | 2.983 |   761 426      | 3.178 |   798 670        | 3.030
sinus_1       | 2 241 804          | 2 237 492         |   804 833     | 2.780 |   760 809      | 2.941 |   795 408        | 2.813
breast_0      | 3 752 938          | 3 746 730         |   983 851     | 3.808 |   936 388      | 4.001 |   974 505        | 3.845
breast_1      | 3 678 612          | 3 672 396         | 1 096 598     | 3.349 | 1 046 294      | 3.510 | 1 084 486        | 3.386
foot_0        | 3 125 062          | 3 119 694         |   782 377     | 3.987 |   731 163      | 4.267 |   766 994        | 4.067
foot_1        | 2 235 408          | 2 231 304         |   752 290     | 2.966 |   702 304      | 3.177 |   736 294        | 3.030
hand_0        | 2 500 096          | 2 484 368         |   822 399     | 3.021 |   759 646      | 3.270 |   799 105        | 3.109
hand_1        | 2 535 246          | 1 279 773         |   411 577     | 3.109 |   380 944      | 3.359 |   400 375        | 3.196
head_0        | 1 088 424          | 2 651 925         |   599 394     | 4.424 |   572 652      | 4.631 |   595 266        | 4.455
head_1        | 2 608 068          | 2 603 188         |   724 980     | 3.591 |   691 279      | 3.766 |   719 246        | 3.619
spine_0       | 1 759 608          | 1 755 944         |   917 233     | 1.914 |   873 973      | 2.009 |   908 556        | 1.933
spine_1       | 1 786 082          | 1 782 450         |   924 235     | 1.929 |   877 948      | 2.030 |   915 842        | 1.946
thorax_0      | 3 537 852          | 3 531 492         | 1 614 415     | 2.187 | 1 535 721      | 2.300 | 1 601 907        | 2.205
thorax_1      | 2 854 408          | 2 849 280         | 1 227 010     | 2.322 | 1 170 481      | 2.434 | 1 216 031        | 2.343

The model in Figure 3 is used to compare the effect of IFC and MTF. The results of the IFC and MTF comparison can be seen in Table 2: IFC decreases the data by 4.3%. We also compare RLE-0 with another RLE model, the RLE-2 from Lehmann's model [16], which increases the compression performance of the original BWCA by around 1.5%. The effects of a block-oriented scheme are also investigated. The compression rates increase with the image resolution; this can be observed by splitting the image into blocks. The blocks are processed one by one, each providing one compressed data stream, and the streams are regrouped to produce the compressed image. The resulting bit stream is compared with the full-resolution image compression. The results of this test are presented in Tables 3 and 4. These tables give the results for the directly digital image Hand Primary (see Figure 2). The image has been split into 10 blocks (see Table 3), and four different block sizes are considered. This block-oriented scheme yields a lower compression ratio; nevertheless, it significantly decreases the processing time. This effect is discussed in the next experiment, presented in Table 5. We extend this study by investigating the data format. In the 4-bit process, each pixel of the BWT input is split into 2 parts: the 4 least significant bits (LSB) and the 4 most significant bits (MSB) are separated, and each part becomes a new 8-bit character. The input of the BWT is therefore doubled, but the maximum value of each character is 15, whereas for 8-bit characters it is 255. The aim of reducing the alphabet of the BWT input is to increase the number of identical symbols. The same process is applied in the 2-bit and 1-bit decompositions. In the 2-bit process, the image becomes 4 times larger than the fully processed 8-bit image, but there are only 4 possible symbols.
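A sketch of this decomposition is shown below, assuming a NumPy 8-bit grayscale image and row-wise interleaving of the chunks (a detail the description above leaves open).

    import numpy as np

    def decompose(img: np.ndarray, bits: int) -> np.ndarray:
        """Split each 8-bit pixel into 8/bits chunks of `bits` bits, each
        stored in its own byte: the BWT input grows by a factor of 8/bits
        while the alphabet shrinks to 2**bits symbols (16 for 4 bits,
        4 for 2 bits, 2 for 1 bit)."""
        chunks = 8 // bits
        mask = (1 << bits) - 1
        planes = [(img >> (bits * (chunks - 1 - k))) & mask
                  for k in range(chunks)]
        return np.stack(planes, axis=-1).reshape(img.shape[0], -1).astype(np.uint8)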

The results of bit decomposition with the original and the modified BWCA schemes are shown in Tables 3 and 4. Bit decomposition does not significantly change the compression ratios; there is only a small variation for each block. Therefore, binary plane decomposition can be considered: the algorithm can then be adapted to the binary nature of the images, and binary decomposition opens the possibility of a hardware implementation based on logical operators, which can represent the simplest and lowest-cost implementation compared to existing solutions [17]. The running time is also studied, and the results are presented in Table 5. The times are reported in milliseconds, measured on a 2.13 GHz Pentium with 1 GB of RAM. The results show that the running time depends not only on the image size but also on the nature of the image, as seen in the different results in lines 1 to 4 of Table 5.

Table 3. Results of image decomposition using the original BWCA scheme (sizes in bytes; CR = compression ratio).

Block size (.raw) | 8 bits  | CR    | 4 bits  | CR    | 2 bits  | CR    | 1 bit   | CR
262 144           |  98 309 | 2.667 |  95 508 | 2.745 |  91 181 | 2.875 |  92 122 | 2.846
262 144           | 140 923 | 1.860 | 142 920 | 1.834 | 138 394 | 1.894 | 142 401 | 1.841
262 144           | 135 903 | 1.929 | 134 693 | 1.946 | 133 530 | 1.963 | 135 678 | 1.932
262 144           | 130 594 | 2.007 | 128 790 | 2.035 | 129 165 | 2.030 | 130 844 | 2.003
230 400           |  69 222 | 3.328 |  66 390 | 3.470 |  63 063 | 3.653 |  62 500 | 3.686
230 400           | 113 890 | 2.023 | 113 399 | 2.032 | 110 177 | 2.091 | 113 292 | 2.034
230 400           | 129 343 | 1.781 | 129 919 | 1.773 | 129 312 | 1.782 | 132 488 | 1.739
230 400           | 109 198 | 2.110 | 108 026 | 2.133 | 104 283 | 2.209 | 103 829 | 2.219
141 312           |  71 461 | 1.977 |  69 369 | 2.037 |  69 511 | 2.033 |  72 029 | 1.962
124 200           |  50 119 | 2.478 |  48 127 | 2.581 |  45 893 | 2.706 |  46 028 | 2.698
Average CR        |         | 2.216 |         | 2.259 |         | 2.324 |         | 2.296

Table 4. Results of image decomposition using the modified BWCA scheme (sizes in bytes; CR = compression ratio).

Block size (.raw) | 8 bits  | CR    | 4 bits  | CR    | 2 bits  | CR    | 1 bit   | CR
262 144           |  97 330 | 2.693 |  98 790 | 2.654 |  95 308 | 2.750 |  98 390 | 2.664
262 144           | 140 810 | 1.862 | 142 920 | 1.834 | 140 916 | 1.860 | 146 317 | 1.792
262 144           | 135 573 | 1.934 | 137 500 | 1.907 | 137 129 | 1.912 | 140 729 | 1.863
262 144           | 129 918 | 2.018 | 131 853 | 1.988 | 132 301 | 1.981 | 135 742 | 1.931
230 400           |  67 447 | 3.416 |  69 272 | 3.326 |  67 569 | 3.410 |  68 649 | 3.356
230 400           | 113 789 | 2.025 | 115 532 | 1.994 | 113 045 | 2.038 | 117 231 | 1.965
230 400           | 129 031 | 1.786 | 131 357 | 1.754 | 132 115 | 1.744 | 136 000 | 1.694
230 400           | 108 492 | 2.124 | 109 792 | 2.099 | 108 200 | 2.129 | 109 339 | 2.107
141 312           |  71 565 | 1.975 |  73 601 | 1.920 |  72 320 | 1.954 |  75 060 | 1.883
124 200           |  50 113 | 2.478 |  51 537 | 2.410 |  49 545 | 2.507 |  50 337 | 2.467
Average CR        |         | 2.231 |         | 2.189 |         | 2.229 |         | 2.172


Figure 4. Decomposition of the image Hand.

The first and second lines in Table 5 give the processing times obtained for two different blocks extracted from the whole image. The two blocks, shown in Figure 4, have the same size. The left image contains many pixels with similar gray levels, and Table 5 shows that it requires more processing time than the right image. Therefore, the running time depends not only on the image size (or block size) but also on the image content. The algorithm sorts and regroups pixels of similar gray level, and the amount of sorting


depends on the number of similar pixels. Therefore, the processing time increases with data redundancy. For the 8-bit image decomposition, real-time encoding and decoding can be obtained. Obviously, the processing time increases with the image size and with the decomposition into data planes: the binary data arrangement requires processing 8 planes instead of one for the 8-bit arrangement, so with a sequential implementation the processing time is 8 times higher. The binary data arrangement is intended for a parallel hardware implementation based on logical operators; a sequential software implementation is therefore not appropriate for this data arrangement.

5. Conclusion and Perspectives

The BWCA lossless compression scheme achieves compression ratios of up to 4 when applied to radiographs.

Table 5. Running time of the image decomposition (milliseconds; enc = encoding, dec = decoding; – = value not available).

Block size (.raw) | 8 bits enc | dec | 4 bits enc | dec | 2 bits enc | dec | 1 bit enc | dec
262 144           |        547 |  32 |      1 391 |  31 |      4 156 | 125 |    13 765 | 281
262 144           |        469 |  15 |          – |   – |      3 256 | 125 |    10 125 | 313
262 144           |      1 251 |  30 |      1 203 |  47 |      3 515 | 125 |    11 563 | 312
262 144           |        500 |  31 |      1 218 |  31 |      3 515 | 187 |    11 422 | 312
230 400           |        484 |  16 |      1 359 |  31 |      4 265 |  78 |    14 406 |   –
230 400           |        406 |  16 |      1 031 |   – |      3 000 |  94 |     9 641 | 250
230 400           |        391 |  31 |        984 |  32 |      2 813 | 109 |     8 734 | 266
230 400           |        453 |  16 |      1 172 |  31 |      3 641 | 125 |    12 000 | 297
141 312           |        234 |  32 |          – |   – |          – |   – |         – |   –
124 200           |        235 |   – |          – |   – |          – |   – |         – |   –