
A Generalized Interpolative Vector Quantization Method for Jointly Optimal Quantization, Interpolation and Binarization of Text Images

Faramarz Fekri, Russell M. Mersereau, Ronald W. Schafer
Center for Signal & Image Processing, Georgia Institute of Technology, Atlanta, GA 30332-0250
email: [email protected], Tel: (404) 894-8361, Fax: (404) 894-8363
EDICS: 3.2, 1.10

Abstract

This paper presents an approach for the effective combination of interpolation with binarization of gray level text images to reconstruct a high resolution binary image from a lower resolution gray level one. We study two nonlinear interpolative techniques for text image interpolation. These nonlinear interpolation methods map quantized low dimensional 2×2 image blocks to higher dimensional 4×4 (possibly binary) blocks using a table lookup operation. The first method performs interpolation of text images using context-based, nonlinear, interpolative vector quantization (NLIVQ). This system has a simple training procedure, and its performance (for gray level high resolution images) is comparable to that of our more sophisticated generalized interpolative VQ (GIVQ) approach, which is the second method. In GIVQ, we jointly optimize the quantizer and interpolator to find matched codebooks for the low and high resolution images. Then, to obtain the binary codebook that incorporates binarization with interpolation, we introduce a binary constrained optimization method using GIVQ. In order to incorporate the nearest neighbor constraint on the quantizer while minimizing the distortion in the interpolated image, a deterministic-annealing-based optimization technique is applied. With a few interpolation examples, we demonstrate the superior performance of this method over the NLIVQ method (especially for binary outputs) and other standard techniques, e.g., bilinear interpolation and pixel replication.

This work was supported by a grant from the Hewlett Packard Corporation.


1 INTRODUCTION

Increasing image resolution is of great interest for many imaging applications such as image enlargement (in medical imaging and digital photography) and enhancement of coded images (in multimedia applications). The application that motivated this work is the problem of gray level scanning of essentially binary text images at low resolution (e.g., 300 dpi) followed by reproduction at a higher resolution (e.g., 600 dpi) by printing on binary devices.

Standard approaches to interpolation rely on unrealistic assumptions about text images. Bilinear interpolation, for example, assumes continuity of the image. Bicubic interpolation imposes continuity on both the image and its first derivatives. Spline techniques may make assumptions about the continuity of even higher order derivatives. In effect, all these standard methods are examples of bandlimited interpolation. But many text images scanned at low to medium resolution are not good candidates for bandlimited interpolation, because the Nyquist condition is not satisfied near the high-contrast edges. Among the standard techniques, nearest neighbor interpolation (pixel replication) requires the least computation, but results in blockiness in the interpolated image. Bilinear interpolation and spline techniques, while providing better quality in the smooth portions of images, tend to smooth edges and other discontinuities. Finally, the sinc interpolator (a spline of infinite order [1]) cannot be used in practice because its infinite filter length causes a lack of locality and high complexity.

As an alternative to these standard techniques, other interpolation methods exploit information relevant to edge preservation to enhance the quality of the resulting images. Median filtering [2], [3] approximately preserves the sharpness of isolated image transitions. However, the edge preserving property of the median filter does not apply at corners or certain other two-dimensional structures. Another class of edge preserving interpolation methods is the directional interpolation techniques [4], [5], [6], [7], which perform one-dimensional interpolation along the minimum variation

direction. Directional interpolation schemes classify image regions as edgy or smooth, and then individually interpolate these regions using an appropriate directional interpolator. Near an edge, the location and orientation of the edge are estimated and used as the basis for the interpolation. One problem with directional interpolation techniques is that downsampling in edgy regions introduces uncertainty into the estimates of orientation and location. This can create large interpolation errors at sharp edges, such as on the contours of the letters in text images.

In multiresolution methods [8], [9] the available bandwidth is extended to half of the new sampling rate. These methods exploit the regularity of edges across resolution scales to estimate the high frequency information that is required for the interpolation. These algorithms estimate the regularity of edges by measuring the rate of decay of wavelet transform coefficients across scales, and attempt to preserve the underlying regularity by extrapolating a new high frequency subband to be used in high resolution image reconstruction. In [10] the spatial correlation (mainly at the edges) across successive scales is exploited to estimate a new high frequency subband that is required by the synthesis bank of the filter bank. In all of these interpolation techniques, better performance comes at the expense of higher complexity than required by the standard linear methods.

As depicted in Fig. 1, the focus of our work is to produce an interpolated binary image. Interpolation and binarization, when performed in two separate steps, result in suboptimal performance because the operations are carried out independently and the distortion measure used for interpolation does not match that used for binarization. This paper addresses the following question: Can interpolation be combined effectively with binarization using VQ?

Vector quantization has established itself as a powerful tool in speech and image processing problems that require clustering and classification. An overview of the widespread applications of VQ in image processing is given in [11]. There are a variety of examples in which image compression is combined with classification [12], with histogram equalization [13], and with volume rendering [14]. The objective of these techniques is either to lessen the complexity of the processing or to provide better quality by jointly optimizing two individual

processing steps. The successful application of vector quantization to combined compression and binarization of natural images [15] encourages us to apply it to the problem of joint optimization of interpolation and binarization. In [16], the author showed that nonlinear interpolative VQ (NLIVQ) can be used to map an observable random vector X to a finite set of estimated values of a random vector Y. However, as we show later, mapping a set of low resolution image blocks into a collection of thresholded higher resolution reproduction blocks by the NLIVQ method does not produce promising binary outputs. As an alternative to the highly suboptimal NLIVQ method, we propose a generalized interpolative VQ (GIVQ) technique [17], which is a special case of generalized vector quantization (GVQ) [18]. GIVQ is a joint optimization method using deterministic annealing [19]. During the design phase of GIVQ, the training vectors are assigned to clusters in a probabilistic fashion. Consequently, the joint optimization of the quantization, interpolation, and binarization can be formulated within a probabilistic structure.

The paper is organized as follows. In Section 2, we introduce the necessary concepts for designing the nonlinear interpolative VQ and generalized interpolative VQ for image interpolation. Section 3 is devoted to the mathematical formulation of the generalized interpolative VQ algorithm based on deterministic annealing. The joint optimization of interpolation and binarization is described in Section 4. In Section 5, we present the experimental results. Finally, Section 6 provides open problems and concluding remarks.

2 IMAGE INTERPOLATION USING NLIVQ

2.1 Interpolative VQ System

An interpolative VQ system is fully specified by two codebooks, one (the encoder codebook of size N, labeled C in Fig. 2) for the low resolution image blocks and the other (the decoder codebook of size N, labeled C* in Fig. 2) for the corresponding high resolution image blocks, and a rule for mapping the feature vectors to the signal vectors. In interpolative VQ, the encoder takes k-dimensional (e.g., 2×2) image blocks of a low resolution (e.g., 300 dpi) image as its input and quantizes them based on the partitioning specified by the codebook C. These quantized image blocks are mapped into blocks of dimension l (e.g., 4×4) by the interpolative decoder from codebook C*. In other words, the interpolative VQ takes low dimensional feature vectors X and maps them into higher dimensional signal vectors Y, producing a high resolution image.
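As an illustration of this table lookup operation, the following sketch (hypothetical NumPy code, not the authors' implementation) encodes a batch of flattened 2×2 blocks against codebook C by a nearest neighbor search and returns the matching 4×4 blocks from codebook C*:

```python
import numpy as np

def interpolative_vq(low_res_blocks, enc_codebook, dec_codebook):
    """Map k-dimensional feature blocks to l-dimensional signal blocks by table lookup.

    low_res_blocks: (M, 4) array of flattened 2x2 gray level blocks.
    enc_codebook:   (N, 4) array, the encoder codebook C.
    dec_codebook:   (N, 16) array, the decoder codebook C* of flattened 4x4 blocks.
    """
    # Nearest neighbor encoding: squared-error distance to every codeword in C.
    dists = ((low_res_blocks[:, None, :] - enc_codebook[None, :, :]) ** 2).sum(axis=2)
    indices = dists.argmin(axis=1)   # encoder output: one index per input block
    return dec_codebook[indices]     # interpolative decoder: table lookup into C*
```

The reconstruction quality therefore rests entirely on how the two codebooks are designed, which is the subject of the remainder of the paper.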

2.2 Design Methodology of Nonlinear Interpolative VQ

Nonlinear interpolative vector quantization (NLIVQ) [16] was introduced primarily to reduce the complexity of the quantizer by operating on a reduced dimensional feature vector extracted from a high dimensional signal vector. In this section, we apply NLIVQ to the image interpolation problem. An NLIVQ system has a sequential design methodology. A modified nearest neighbor VQ is first designed to partition the k-dimensional feature space R^k into N regions, where N is the codebook size. The partition regions R_i are defined as

$$ R_i = \{ x \in \mathbb{R}^k : Q(x) = x_i \}, \qquad i = 1, 2, \ldots, N. \qquad (1) $$

Here Q(·) is a vector quantizer whose output is constrained to be one of a finite set of N vectors. Assume (x_t, y_t) to be the feature and signal vector pairs in the training set. Then the objective of the NLIVQ method is to quantize optimally the feature vector x_t using a training set of feature data without regard to the statistical correlation between x_t and y_t. In this approach, we seek the set of codewords {x_i}, i = 1, ..., N, that gives the minimum average distortion when quantizing the entire set of low resolution training data; i.e., we seek a codebook C of size N such that the mean squared error is minimized:

$$ \min_{\{x_i\}_{i=1}^{N}} D = \min_{\{x_i\}_{i=1}^{N}} \left\{ \sum_{x_t} d_f\big(x_t, Q(x_t)\big) \right\}, \qquad (2) $$

where d_f(·,·) is the squared-error measure in the feature space.

The encoder codebook can be designed using any standard vector quantization technique such as the LBG algorithm [20], or a more robust method such as deterministic annealing [19]. We used a principal-axis splitting technique [21] to derive the codewords {x_i}. Starting from an initial codeword x_1, the training system undergoes a sequence of codebook augmentations until the number of low resolution reproduction codewords n reaches the specified number N (for example, 64). In each augmentation the number of codewords n is incremented by one, and the average distortion in (2) is minimized with respect to {x_i}, i = 1, ..., n, by running the generalized Lloyd algorithm on the whole training set. For this method to work well for our problem, we have found that the training algorithm requires that we first remove the "all white" and "all black" blocks from the low resolution training data, so that these blocks, which are quite numerous in the training data, do not influence the codewords derived for the remaining blocks.

After finishing the low resolution encoder design, the high resolution decoder reproduction words {y_i} are chosen to minimize the average distortion of the interpolated image (for the given low resolution encoder); i.e.,

$$ y_i = E\{\, y_t \mid x_t \in R_i \,\}, \qquad i = 1, 2, \ldots, N. \qquad (3) $$

This equation implies that the reproduction codeword for the interpolative decoder is the centroid (Euclidean mean in the squared-error case) of all of the higher dimensional blocks Y whose input X has been mapped to the same index by the encoder.

Vector quantization by itself exploits the inter-pixel correlation that exists within an input image block, but it does not take advantage of the correlation between adjacent image blocks. One possible solution is to increase the size of the image blocks, but this requires a very large codebook to improve the performance, which is unattractive because of the computational cost required for its implementation. Another approach to improving the quality of the interpolated images is to use context instead of encoding the input blocks independently. Suppose (as an example) that the input and output blocks are 2×2 and 4×4, respectively. Then, as shown in

Fig. 3, a more feasible solution is to map overlapping 2×2 input blocks into 4×4 output blocks and then to extract the four non-overlapping center pixels from the 4×4 output image blocks in the codebook C* [22]. In this method (lapped NLIVQ), each 2×2 input block overlaps eight other input blocks. Consequently, each pixel in the low resolution image is used four times in the interpolation process. It is worth noting that the actual implementation of the system only requires storing the four center pixels of the 4×4 codewords in the codebook C*. In Section 5, we demonstrate the successful application of context-based NLIVQ to the interpolation of gray level images. However, NLIVQ is a suboptimal procedure because the interpolation and quantization are not designed jointly. Furthermore, the NLIVQ design objective of minimizing the quantization error in X is not matched to the objective of the interpolation problem, which is to minimize the distortion in the output signal vector Y. More importantly, for digital binarization applications, a joint optimization of interpolation and binarization under a common distortion measure produces better results than a sequential design methodology. This leads us to propose our generalized interpolative vector quantization technique, GIVQ [17].
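Before turning to GIVQ, the sequential NLIVQ design of this section can be summarized in the following sketch (hypothetical Python; scikit-learn's KMeans is used as a stand-in for the LBG/principal-axis-splitting encoder design, and the decoder codewords are the conditional centroids of (3)):

```python
import numpy as np
from sklearn.cluster import KMeans

def train_nlivq(x_train, y_train, codebook_size=64):
    """Sequential NLIVQ design: encoder codebook first, then decoder centroids.

    x_train: (M, 4) flattened low resolution 2x2 blocks (gray levels in [0, 1]).
    y_train: (M, 16) corresponding flattened high resolution 4x4 blocks.
    """
    # Encoder design: minimize the feature space distortion of (2).
    # KMeans is a stand-in for the LBG / principal-axis-splitting design in the text.
    km = KMeans(n_clusters=codebook_size, n_init=10).fit(x_train)
    enc_codebook = km.cluster_centers_                    # codebook C

    # Decoder design (3): y_i is the centroid of the signal blocks whose
    # feature blocks were mapped to partition cell R_i by the encoder.
    dec_codebook = np.zeros((codebook_size, y_train.shape[1]))
    for i in range(codebook_size):
        members = (km.labels_ == i)
        if members.any():
            dec_codebook[i] = y_train[members].mean(axis=0)   # codebook C*
    return enc_codebook, dec_codebook
```

For the lapped variant, only the four center pixels of each 4×4 decoder codeword would be stored and emitted.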

3 GENERALIZED INTERPOLATIVE VQ

3.1 Formulation of Generalized Interpolative VQ Algorithm

Generalized interpolative VQ (GIVQ) is a special case of GVQ [18]. The successful application of GVQ to the coding of noisy sources [23] encouraged us to apply it to the interpolation problem. In GVQ the estimate of a random vector Y is formed from a random vector X using an estimator h(·) that is constrained to take on a finite set of N values. The mapping h(x) is viewed as a generalized vector quantizer (GVQ) that optimally generates a quantized approximation to Y from an observation of X. In GIVQ a low dimensional feature vector X is mapped into a higher dimensional (possibly binary) signal vector Y, producing a high resolution image. GIVQ defines a partition of the k-dimensional input space R^k by dividing that space into N regions, where N is the codebook size. The partition regions R_i are defined as

$$ R_i = \{ x \in \mathbb{R}^k : h(x) = y_i \}, \qquad i = 1, 2, \ldots, N. \qquad (4) $$

The GIVQ design objective is to minimize the error caused by estimating Y using h(x), defined as d_i(Y, h(x)).

GIVQ is described by the same block diagram as NLIVQ; i.e., as depicted in Fig. 2, GIVQ consists of an encoder followed by an interpolative decoder. The encoder maps the low dimensional input vector X to an index i ∈ {1, 2, ..., N} by applying a nearest neighbor rule over a low dimensional codebook C of size N. Then the interpolative decoder looks up the corresponding higher dimensional signal vector Y in codebook C*. The main difference between GIVQ and NLIVQ is that NLIVQ chooses the partitions to minimize the distortion in the input space (optimum quantizer), while GIVQ chooses them to minimize the distortion in the output space. Consequently, unlike in NLIVQ, the codewords of GIVQ in codebook C are not necessarily the centroids of the input (feature) space vectors assigned to the same partition. Like GVQ, for a mean squared error distortion measure the optimum GIVQ satisfies the necessary conditions given in [18].

To formulate the GIVQ problem, let the sets {x_i} and {y_i}, i = 1, ..., N, be the codewords in the low and high dimensional vector spaces, respectively. Also let d_f(·,·) and d_i(·,·) be the distortion measures in the low dimensional vector space X and the higher dimensional vector space Y, respectively. Then, for a given set of training pairs T = {(x_t, y_t)}, we want to optimize the codewords {x_i} and {y_i} so that the total distortion in the signal space Y is minimized:

$$ \min_{\{x_j\},\{y_j\}} D = \min_{\{x_j\},\{y_j\}} \left\{ \sum_{(x_t, y_t)} d_i\big(y_t, h(x_t)\big) \right\}. \qquad (5) $$

Here h(·) is a GIVQ mapping function that is consistent with the VQ nearest neighbor encoding rule:

$$ \text{if } \; i = \arg\min_{j} \big\{ d_f(x_t, x_j) \big\}, \;\text{ then let } \; h(x_t) = y_i. \qquad (6) $$

From (5) and (6), it is obvious that the joint optimization problem is not a trivial one. Deterministic annealing has been shown by [23] to be a successful method for solving the optimization problem while imposing the nearest neighbor constraint given by (6). Therefore, we have chosen this approach for the joint optimization of the quantizer and interpolator.

3.2 Optimization of GIVQ by Deterministic Annealing

Deterministic annealing (DA) [19] provides a probabilistic encoding rule for GIVQ that can be exploited to enforce the nearest neighbor constraint on the encoder while minimizing the distortion in the signal space Y. Although convergence to the global minimum is not assured, it has been shown to be a successful method for avoiding many local minima. Randomization of the partition subject to a constraint on the encoder entropy results in a Gibbs distribution. This data clustering becomes a fuzzy membership operation in which each vector X is assigned to every cluster by associative probabilities given by the Gibbs distribution [19]

$$ p(x \in R_j) = \frac{\exp\!\big(-\gamma\, d_f(x, x_j)\big)}{\sum_{k=1}^{N} \exp\!\big(-\gamma\, d_f(x, x_k)\big)}, \qquad j = 1, 2, \ldots, N, \qquad (7) $$

where γ is a positive scalar parameter controlling the degree of randomness. By this fuzzy membership, each input vector X belongs to all of the clusters with a probability that depends on its distance from the codewords representing those clusters. In this formulation it is assumed that the data assignment is an independent operation that ignores the correlation between adjacent image blocks. The associated Shannon entropy for this random partitioning is defined by

$$ H = - \sum_{x} \sum_{j=1}^{N} p(x \in R_j) \log\big[p(x \in R_j)\big]. \qquad (8) $$
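The following sketch (hypothetical NumPy code; gamma denotes the Gibbs scalar of (7)) computes the fuzzy memberships and the partition entropy for a batch of training vectors:

```python
import numpy as np

def gibbs_memberships(x_train, enc_codebook, gamma):
    """Fuzzy memberships p(x_t in R_j) of (7); returns an (M, N) array."""
    d = ((x_train[:, None, :] - enc_codebook[None, :, :]) ** 2).sum(axis=2)  # d_f(x_t, x_j)
    logits = -gamma * d
    logits -= logits.max(axis=1, keepdims=True)   # subtract the row maximum for stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def partition_entropy(P):
    """Shannon entropy (8) of the random partition described by the memberships P."""
    return float(-(P * np.log(P + 1e-12)).sum())
```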

Now the optimization of GIVQ can be formulated as the minimization of the objective distortion D defined by (5) subject to the encoder entropy constraint (8):

$$ \min_{\{x_i\},\{y_i\},\gamma} F = \min_{\{x_i\},\{y_i\},\gamma} \{\, D - \lambda H \,\}, \qquad (9) $$

in which λ is a temperature in the annealing process. By this definition, we have an effective objective distortion. The free energy F is minimized to obtain the minimal distortion D in the signal vector Y, while the nearest neighbor encoding rule is imposed by gradually reducing the randomness H through a gradually decreasing temperature λ. Using (5), (8), and (9) gives

$$ F = \sum_{(x_t, y_t)} \sum_{j=1}^{N} p(x_t \in R_j) \Big\{ d_i(y_t, y_j) + \lambda \log\big[p(x_t \in R_j)\big] \Big\}. \qquad (10) $$

To obtain the necessary optimality conditions for minimizing F, we set to zero the derivatives of F with respect to {x_i}, {y_i}, and γ. By solving the resulting equations for the case of the squared-error measure in d_f(·,·) and d_i(·,·), we find that each representative in the signal vector Y is defined as the center of mass of the fuzzy cluster

$$ y_j = \frac{\displaystyle\sum_{(x_t, y_t)} p(x_t \in R_j)\, y_t}{\displaystyle\sum_{x_t} p(x_t \in R_j)}, \qquad j = 1, 2, \ldots, N, \qquad (11) $$

and the corresponding representative in the feature vector space can be derived as

$$ x_j = \frac{\displaystyle\sum_{(x_t, y_t)} \big(F_{x_j} - F_{x_t}\big)\, p(x_t \in R_j)\, x_t}{\displaystyle\sum_{x_t} \big(F_{x_j} - F_{x_t}\big)\, p(x_t \in R_j)}, \qquad j = 1, 2, \ldots, N. \qquad (12) $$

Here F_{x_t} and F_{x_j} are defined for any (x_t, y_t) as

$$ F_{x_t} = \sum_{j=1}^{N} p(x_t \in R_j)\, F_{x_j}, \qquad (13) $$

$$ F_{x_j} = d_i(y_t, y_j) + \lambda \log\big[p(x_t \in R_j)\big]. \qquad (14) $$

The scalar parameter γ should satisfy the following optimality equation:

$$ \frac{\partial F}{\partial \gamma} = \sum_{(x_t, y_t)} \sum_{j=1}^{N} F_{x_j}\, p(x_t \in R_j) \left\{ \sum_{k=1}^{N} p(x_t \in R_k)\, d_f(x_t, x_k) - d_f(x_t, x_j) \right\} = 0. \qquad (15) $$
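A minimal sketch of these optimality conditions follows (hypothetical NumPy code; P is the (M, N) membership matrix produced, e.g., by the gibbs_memberships helper sketched earlier, and lam denotes the temperature):

```python
import numpy as np

def update_decoder_codebook(P, y_train):
    """Fuzzy center-of-mass update (11): y_j = sum_t p_tj * y_t / sum_t p_tj."""
    return (P.T @ y_train) / P.sum(axis=0)[:, None]

def free_energy(P, y_train, dec_codebook, lam):
    """Free energy (10), assembled from the per-pair terms F_{x_j} of (14)."""
    # d_i(y_t, y_j) for every training pair and every codeword: an (M, N) array.
    d_i = ((y_train[:, None, :] - dec_codebook[None, :, :]) ** 2).sum(axis=2)
    F_xj = d_i + lam * np.log(P + 1e-12)       # eq. (14), one row per training pair
    return float((P * F_xj).sum())             # F = sum over t of F_{x_t}, using (13)
```

The encoder codewords and the scalar do not have such a simple closed form; as described next, they are updated by gradient descent on F at each temperature.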

The roles of the temperature λ and the scalar γ in the optimization can be seen through (10). While λ determines the relative contributions of entropy and signal space distortion in the overall cost function, the parameter γ controls the entropy itself through the probability in (8). At the beginning of the annealing process the temperature is high, and consequently more weight is given to the entropy than to the distortion. But gradually, as the temperature is decreased, the cost function becomes solely a function of the distortion. GIVQ can be trained by randomly initializing both the feature space codewords {x_i} and the signal space codewords {y_i}, then optimizing (10) with respect to the parameters at each given temperature. To optimize F with respect to {x_i}, {y_i}, and γ at any fixed temperature λ, we use (11) for the decoder parameters and a gradient descent method for both γ and the encoder parameters. Therefore, the optimization of the codewords and γ is iterative for a given temperature.

The starting and stopping values of the temperature, and the trajectory by which it is reduced, depend on the training data (the dimension of both the feature and signal space data, and the range of their variation). For training data with input and output blocks of dimension 2×2 and 4×4, respectively, and gray scale values ranging from zero to one, our experiments showed that starting from λ = 10 and reducing the temperature by 2 percent at each stage until reaching a value of about λ = 0.001 was an appropriate choice. It is also worth noting that the scalar γ has to be initialized carefully. This parameter, which is optimized during the course of the annealing process, becomes very large as the annealing process reaches its conclusion. Like the temperature, the starting value for γ depends on the training data and should be chosen small enough that the randomness is maximized in the beginning. However, if we choose too small a value, all the initial codewords merge together undesirably.
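The outer loop of the training procedure is then a simple cooling schedule. The sketch below is a skeleton only: the inner optimization at each fixed temperature (the closed-form update (11) together with the gradient descent steps on the encoder codewords and on the scalar) is passed in as a callback, and the default values follow the numbers quoted above.

```python
def anneal(optimize_at_temperature, lam_start=10.0, cooling=0.98, lam_stop=1e-3):
    """Deterministic annealing driver for GIVQ training.

    optimize_at_temperature(lam) should iterate, at the fixed temperature lam,
    the decoder update (11) and gradient descent on the encoder codewords and
    the scalar until the free energy (10) stops decreasing.
    """
    lam = lam_start
    while lam > lam_stop:
        optimize_at_temperature(lam)
        lam *= cooling       # reduce the temperature by 2 percent at each stage
```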

4 BINARIZATION OF TEXT IMAGES

4.1 Binary Constrained GIVQ Method

Our approach for combined interpolation and binarization of text images involves designing an interpolative VQ whose encoder codewords {x_i} are gray level and whose decoder codewords {y_i} are binary. This is a special case of GIVQ in which a gray level low dimensional feature vector X is mapped into a binary higher dimensional signal vector Y, producing a high resolution binary image. In the following, we present a simple extension of the GIVQ algorithm for the design of jointly optimal encoder and interpolative decoder codebooks. It is worth noting that a suboptimal algorithm for interpolation and binarization might very well result in better-looking images than an optimal one, if the distortion criterion is not an effective or consistent gauge of visual perception. In the optimization of GIVQ, we use a mean-squared-error distortion measure. In addition to making the mathematical development feasible, it is also suitable for bilevel text and line art because for binary signals it is equivalent to a Hamming distance. However, the same approach could not be expected to work well with halftoned natural images.

Here we assume that the range of image intensities is [0, 1]. We use the previous training set (x_t, y_t) in the low and high resolution spaces, except that y_t is constrained to be a binary version of the high dimensional signal vector. One method of imposing the binary constraint on the codewords {y_i} in the optimization of the free energy F in (10) is to use a simple thresholding operation, which gives the nearest binary codeword according to the mean-squared-error distortion measure. Consequently, we optimize F with respect to {x_i}, {y_i}, and γ at any given temperature λ as described in the previous section; then we threshold the codewords {y_i} and continue with the optimization of F at the next temperature. By this iterative technique the initial binary codewords are improved through the deterministic annealing process. It is important to note that both the decoder codewords {y_i} and the distribution of the signal vectors Y in this case lie on the vertices of a hypercube. Consequently, the free energy F takes on only discrete values as the annealing process proceeds. Thus, the optimization by a gradient descent method is easily trapped in a local minimum and is sensitive to the initial locations of the encoder and decoder codewords. Therefore, it is helpful to start with only one codeword and use the splitting procedure described in the next section to avoid local minima.
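Under the squared-error measure, the nearest binary codeword is obtained simply by thresholding each component at the midpoint of the intensity range; a one-line sketch (hypothetical code, assuming intensities in [0, 1]):

```python
import numpy as np

def binarize_decoder_codebook(dec_codebook, threshold=0.5):
    """Snap each decoder codeword to its nearest binary vector (elementwise threshold)."""
    return (dec_codebook >= threshold).astype(dec_codebook.dtype)
```

Within the annealing loop, this operation would be applied to codebook C* after the parameter updates at each temperature, and the thresholded codewords carried into the next temperature.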

4.2 Algorithm Description

The algorithm starts at a very high temperature and a small value of γ, with only one initial representative in both the feature and signal spaces (N = 1). As the temperature decreases, the balance between distortion and entropy in (10) shifts toward less randomness. For each gradually decreasing temperature, the representatives and the scalar parameter γ are optimized. This procedure continues until λ reaches a critical value given by

$$ \lambda_c = 2\, \mu\!\left( C_{xy}^{T} C_{xy} C_{yy}^{-1} \right). \qquad (16) $$

(This formula is derived in [19] for the case in which the feature space and the signal space are the same.) Here μ(·) is an eigenvalue of its matrix argument that is reached by λ during the temperature schedule, and C_xy and C_yy are cross-covariance and covariance matrices defined for each cluster by taking into account the probabilistic data assignment:

$$ C_{xy}^{j} = \frac{\displaystyle\sum_{(x_t, y_t)} p(x_t \in R_j)\,(x_t - \hat{x}_t)(y_t - \hat{y}_t)^{T}}{\displaystyle\sum_{x_t} p(x_t \in R_j)}, \qquad j = 1, 2, \ldots, N, \qquad (17) $$

$$ C_{yy}^{j} = \frac{\displaystyle\sum_{(x_t, y_t)} p(x_t \in R_j)\,(y_t - \hat{y}_t)(y_t - \hat{y}_t)^{T}}{\displaystyle\sum_{x_t} p(x_t \in R_j)}, \qquad j = 1, 2, \ldots, N, \qquad (18) $$

where x̂_t and ŷ_t are the mean values of x_t and y_t, respectively. At the critical temperature, the critical cluster's codeword in the higher dimensional space Y is split in the direction of the eigenvector corresponding to μ. The split of the corresponding feature (low dimensional) codeword X is initiated along the direction of the projection of that eigenvector onto the feature space (this projection was approximated by the corresponding coordinates of the signal space eigenvector in the feature space). This procedure of decreasing the temperature and optimizing (10) continues as described, and every time the temperature hits the critical temperature of any cluster, the corresponding codeword is split. By proceeding in this fashion, the number of codewords increases to the desired value. At that point the splitting is stopped and the temperature is driven to zero while the parameters in (10) are optimized. In the limit as λ → 0 (and for a large value of γ) the randomness is highly limited while the distortion in the signal space is minimized through (10).
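To illustrate (16)-(18), the sketch below (hypothetical NumPy code) computes the fuzzy cross-covariance and covariance matrices of one cluster and the resulting critical temperature; taking the largest eigenvalue of the matrix argument is our reading of (16), not a statement from the paper.

```python
import numpy as np

def cluster_critical_temperature(x_train, y_train, enc_codebook, gamma, j):
    """Critical temperature of cluster j, following (16)-(18)."""
    # Fuzzy memberships p(x_t in R_j) from the Gibbs distribution (7).
    d = ((x_train[:, None, :] - enc_codebook[None, :, :]) ** 2).sum(axis=2)
    d -= d.min(axis=1, keepdims=True)        # stabilize the exponentials
    P = np.exp(-gamma * d)
    P /= P.sum(axis=1, keepdims=True)
    w = P[:, j]
    x_mean = (w[:, None] * x_train).sum(axis=0) / w.sum()
    y_mean = (w[:, None] * y_train).sum(axis=0) / w.sum()
    dx, dy = x_train - x_mean, y_train - y_mean
    C_xy = np.einsum('t,ti,tj->ij', w, dx, dy) / w.sum()   # cross-covariance, eq. (17)
    C_yy = np.einsum('t,ti,tj->ij', w, dy, dy) / w.sum()   # covariance, eq. (18)
    M = C_xy.T @ C_xy @ np.linalg.inv(C_yy)
    # Assumption: the relevant eigenvalue in (16) is the largest one.
    return 2.0 * float(np.max(np.linalg.eigvals(M).real))
```

When the temperature drops below this value, the cluster's signal-space codeword is split along the corresponding eigenvector, as described above.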

5 EXPERIMENTAL RESULTS

To demonstrate the successful interpolation of text images using interpolative VQ, we conducted several experiments. We first present some examples of interpolation of a gray level image into a higher resolution gray level image using context-based (lapped) nonlinear interpolative VQ. We then consider the problem of rendering a gray level image into a higher resolution binary image. We compare the performance of NLIVQ and GIVQ with that obtained by standard interpolation methods. It is worth noting that we do not start with a binary image at the low resolution. In fact, to obtain better quality at the output of the interpolation process, we scan text images with 256 gray levels rather than binary values. Thus, the goal is to map a low resolution gray level text image into a high resolution (binary) output.

To train the interpolative VQ system, we required a set of 300 dpi images and their 600 dpi counterparts. Inasmuch as we did not have actual images at the two resolutions that were carefully co-registered, we generated the 300 dpi low resolution training images by lowpass filtering the 600 dpi training images using a simple separable filter with a row and column impulse response given by

$$ h(n) = 0.25\,\delta(n) + 0.5\,\delta(n-1) + 0.25\,\delta(n-2) \qquad (19) $$

and downsampling by a factor of two horizontally and vertically. Our preliminary experiments showed that lowpass filtering prior to downsampling slightly reduces the artifacts in the interpolated images compared with simple downsampling without lowpass filtering. With this approach, there is a one-to-one correspondence between (e.g., 2×2) input blocks in the low resolution images and (e.g., 4×4) output blocks in the originally scanned high resolution training images. These pairs of blocks (x_t, y_t) in the low and high resolution spaces were used to train the system. The training set was obtained from a half page of single-spaced text in 8 point Times New Roman font that was scanned at 600 dpi. Then, as explained in Section 2.2, all the input blocks and their corresponding output blocks that were either all black or all white were removed (in NLIVQ) or reduced in density (in GIVQ) from the training set. Note that the codewords associated with all white or all black clusters are known in advance with the NLIVQ method, so we can append them after training. However, this is not true in GIVQ. Therefore, in GIVQ the density of all black and all white blocks was reduced to the density of the other clusters in the training data. The final training set contains about 60000 vector pairs (input, output). We trained on Times New Roman font and tested on one line of text that was not included in the training set. Different inputs were tested, including text in Courier, Times, New York, and Italic Times fonts. The results of the tests on the different fonts were similar to or slightly better than the results of the tests on the Courier font, which are presented here.

In the codebooks that we have designed, the encoder codebook C was first chosen to have dimension four (i.e., 2×2 gray level blocks) and size N, while the interpolative decoder codebook C* had a dimension of sixteen (i.e., 4×4 blocks) with the same number of codewords as codebook C. For a given low resolution gray level (256 levels) image, the input is divided into overlapping 2×2 blocks, then each block is quantized independently by the encoding rule to a codeword in codebook C. The index of the associated codeword is then used by the interpolative decoder (i.e., the 600 dpi VQ decoder) to select the four center pixels of the corresponding 4×4 codeword from codebook C* using a table lookup operation, producing the corresponding higher resolution image.
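The generation of the low resolution training images described above is straightforward to reproduce; a sketch (hypothetical code using SciPy's convolve2d, assuming a 2-D gray level image array with intensities in [0, 1]):

```python
import numpy as np
from scipy.signal import convolve2d

def make_low_res(image_600dpi):
    """Simulate a 300 dpi scan: separable lowpass filter of (19), then 2x decimation."""
    h = np.array([0.25, 0.5, 0.25])          # row and column impulse response of (19)
    kernel = np.outer(h, h)                  # separable 2-D lowpass filter
    smoothed = convolve2d(image_600dpi, kernel, mode='same', boundary='symm')
    return smoothed[::2, ::2]                # downsample by two horizontally and vertically
```

Each 2×2 block of the result is then paired with the co-located 4×4 block of the original 600 dpi image to form the training pairs (x_t, y_t).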

Although the results are presented for interpolation by a factor of two, the method can be adapted for interpolation by a rational ratio (e.g., going from 200 dpi input to 300 dpi output). However, we may not be able to use lapped VQ, and therefore the overall quality might be degraded. Moreover, the complexity of VQ can become an issue for some ratios (e.g., 4:7). Note that in re-sampling by 4:7, the input and output blocks have to be 4×4 and 7×7 (or a multiple of these), respectively. Since a mean-removed VQ cannot be used, the size of the codebooks must be increased for this case. It is worth noting that our preliminary results on mean-removed VQ were not promising, owing to the need for a more accurate estimate of the mean value of the higher resolution blocks from the low resolution block. Using the mean value of the low resolution blocks as an estimate of the mean value of the higher resolution blocks resulted in blurred outputs.

5.1 Interpolation of Gray Level Text Images

In the following experiments, we report the successful interpolation of text images using context-based (lapped) NLIVQ. The NLIVQ system was trained on text printed in 8 point Times New Roman font and then tested using the Courier font. Codebooks of sizes 128 and 64 were designed. As shown in Fig. 4 (the font is magnified ten times by sample pixel replication so that fine differences can be readily observed), the differences between the image originally scanned at 600 dpi and the VQ images resulting from the codebook sizes of 128 and 64 are subtle. Comparing Fig. 4(c) with Fig. 4(d) shows that using context (lapped NLIVQ) noticeably improves the quality of the higher resolution image. Fig. 4(e) shows the output of an NLIVQ system that was trained on an 8 point font and applied to 10 point input text. All of these examples demonstrate that the lapped NLIVQ system produces very good high resolution text images for different font sizes and types.

In another experiment, we compared standard interpolation techniques with the NLIVQ method. By inspection of the interpolated images shown in Fig. 5, it is obvious that the edges are reconstructed more sharply by our lapped NLIVQ than by either the bilinear or pixel replication methods.

5.2 Interpolation and Binarization of Text Images

As mentioned earlier, we proposed GIVQ mainly for the simultaneous interpolation and binarization of text images. We trained a binary constrained GIVQ and an NLIVQ system over a training set (about 63000 vector pairs) consisting of binary high resolution (600 dpi) blocks and their corresponding gray level low resolution (300 dpi) blocks. Both the NLIVQ and GIVQ systems were trained using 8 point Times New Roman font text and then tested using the Courier font at the same size. In this experiment only a codebook of size 64 was designed, because of the extremely long codebook design time. Fig. 6(a) and (f) (the font is again magnified ten times by sample pixel replication so that fine differences can be readily observed) show the 300 dpi gray level image and the desired 600 dpi binary image, respectively. The results of interpolation from 300 dpi to a binary 600 dpi image using the pixel replication method, the bilinear interpolation technique, and the NLIVQ algorithm are presented in Fig. 6(b), (c) and (d), respectively. The binary images are produced by thresholding the gray level high resolution images generated by these methods. Furthermore, Fig. 6(e) shows the result of interpolation from 300 dpi to a binary 600 dpi image using the binary constrained GIVQ system. By inspection of these figures, we notice that the image generated by pixel replication is very jagged and suffers from blocking effects. Bilinear interpolation results in fat characters. Moreover, the serifs on the letters are filled in by this method. It is also noticeable that the thresholded NLIVQ produces burst errors. Comparison of the images shows that GIVQ produces the binary image that is closest to the ideal binary image, although it is not perfect.

6 OPEN PROBLEMS AND CONCLUDING REMARKS

In this paper, we presented two algorithms for text image resolution enhancement. We first showed that the NLIVQ system that is implemented with contextual interpolation produced very good high resolution gray level text images for different font sizes and types. We also demonstrated the superiority of this interpolation technique with respect to standard methods such as bilinear interpolation and pixel replication. Then, based on the promising results of the NLIVQ algorithm, we proposed the generalized interpolative vector quantization (GIVQ) method to design jointly optimal codebooks for the encoder and decoder. Finally, we solved the problem of the joint optimization of interpolation and binarization under a common distortion measure by using a binary constrained GIVQ algorithm. Consequently, by this method, the interpolation and binarization of images are carried out in a single step by a table lookup operation. Our preliminary results showed better performance of GIVQ over NLIVQ for binary images.

Further improvement of the high resolution gray level images that are produced by the NLIVQ method can probably be achieved by using a larger codebook and a very long training sequence. The immediate consequences of this solution are additional searching time for encoding and greater hardware complexity. A more appropriate approach might be to use a distortion measure based on human visual perception. Recall that in VQ training we used a squared-error distortion measure. Using a frequency-weighted squared-error measure [15] in the VQ training might improve the quality of the resulting images without adding complexity to the interpolation process.

It would be interesting to consider the following modifications to the joint interpolation and binarization method (binary constrained GIVQ). Recall that at each optimization step of the annealing process, we selected the binary codewords by a simple thresholding operation, which gives the nearest binary codeword according to the mean-squared-error distortion measure. An interesting open problem is to modify the optimization of the free energy in the binary GIVQ algorithm so that the amount of distortion energy falling into the human visual passband is minimized. This may be done by a full search (among all the possible binary vectors) to select a binary vector (for the decoder codebook) that minimizes a low frequency-weighted squared-error distortion at each optimization step of the annealing process.

A simpler method to improve the results of binary interpolated images might be formulated by observing the significant effects of burst errors on the quality of the reconstructed binary image.

One can modify the optimization of the free energy in the binary GIVQ algorithm so that the maximum quantization error in the signal space is minimized (a minimax constraint). This would reduce the burst errors, whose contributions to the total quantization error are very small but which have a significant impact on the binary image quality. Finally, a simple post-processing step (e.g., median filtering) might improve the uniformity of text image features in the GIVQ binary output. It is also an interesting open problem to compare the results of the binary constrained GIVQ method with those of techniques based on a search for optimal binary representatives [24], [25] for text images.


References

[1] A. Aldroubi, M. Unser, and M. Eden, "Cardinal Spline Filters: Stability and Convergence to the Ideal Sinc Interpolator," Signal Processing, vol. 28, pp. 127–138, 1992.

[2] B. Zeng and A. N. Venetsanopoulos, "A Comparative Study of Several Nonlinear Image Interpolation Schemes," Proc. SPIE, vol. 1818, pp. 21–29, 1992.

[3] A. Lehtonen and M. Renfors, "Nonlinear Quincunx Interpolation Filtering," Proc. SPIE, vol. 1360, pp. 135–142, 1990.

[4] J. Allebach and P. W. Wong, "Edge-Directed Interpolation," Proc. IEEE Intern. Conf. on Image Proc., pp. 707–710, 1996.

[5] S. D. Bayrakeri and R. M. Mersereau, "A New Method for Directional Image Interpolation," Proc. IEEE Intern. Conf. on Acoustics, Speech, and Sig. Proc., pp. 2383–2386, 1995.

[6] K. Jensen and D. Anastassiou, "Spatial Resolution Enhancement of Images using Nonlinear Interpolation," Proc. IEEE Intern. Conf. on Acoustics, Speech, and Sig. Proc., pp. 2045–2048, 1990.

[7] V. R. Algazi, E. F. Gary, and R. Potharlanka, "Directional Interpolation of Images Based on Visual Properties and Rank Order Filtering," Proc. IEEE Intern. Conf. on Acoustics, Speech, and Sig. Proc., pp. 3005–3008, 1991.

[8] S. G. Chang, Z. Cvetkovic, and M. Vetterli, "Resolution Enhancement of Images Using Wavelet Transform Extrema Extrapolation," Proc. IEEE Intern. Conf. on Acoustics, Speech, and Sig. Proc., pp. 2379–2382, 1995.

[9] W. K. Carey, D. B. Chuang, and S. S. Hemami, "Regularity Preserving Image Interpolation," Proc. IEEE Intern. Conf. on Image Proc., 1997.

[10] B. Simon, J. Y. Mertes, Ph. Ciblat, and B. Macq, "Local Interpolation in Multiresolution Decomposition of Images," Proc. IEEE Intern. Conf. on Image Proc., 1996.

[11] P. C. Cosman, K. L. Oehler, E. A. Riskin, and R. M. Gray, "Using Vector Quantization for Image Processing," Proc. IEEE, vol. 81, pp. 1326–1341, 1993.

[12] K. L. Oehler and R. M. Gray, "Combining Image Classification and Image Compression Using Vector Quantization," Proc. IEEE Data Compression Conf., pp. 2–11, 1993.

[13] P. C. Cosman, E. A. Riskin, and R. M. Gray, "Combining Vector Quantization and Histogram Equalization," Information Processing and Management, vol. 28, no. 6, pp. 681–686, 1992.

[14] P. Ning and L. Hesselink, "Vector Quantization for Volume Rendering," Proc. 1992 ACM Workshop on Volume Visualization, pp. 69–74, 1992.

[15] R. A. Vander Kam, P. A. Chou, E. A. Riskin, and R. M. Gray, "An Algorithm for Joint Vector Quantizer and Halftoner Design," Proc. IEEE Intern. Conf. on Acoustics, Speech, and Sig. Proc., vol. 3, pp. 497–500, 1992.

[16] A. Gersho, "Optimal Nonlinear Interpolative Vector Quantization," IEEE Trans. on Comm., vol. 38, pp. 1285–1287, 1990.

[17] F. Fekri, R. M. Mersereau, and R. W. Schafer, "A Generalized Interpolative VQ Method for Jointly Optimal Quantization and Interpolation of Images," Proc. IEEE Intern. Conf. on Acoustics, Speech, and Sig. Proc., vol. 5, pp. 2657–2661, 1998.

[18] A. Gersho, "Optimal Vector Quantized Nonlinear Estimation," IEEE Intl. Symp. on Inform. Theory, p. 170, 1993.

[19] K. Rose, E. Gurewitz, and G. C. Fox, "Vector Quantization by Deterministic Annealing," IEEE Trans. on Inform. Theory, vol. 38, no. 4, pp. 1249–1258, July 1992.

[20] Y. Linde, A. Buzo, and R. M. Gray, "An Algorithm for Vector Quantizer Design," IEEE Trans. on Comm., vol. 28, pp. 84–95, 1980.

[21] X. Wu and K. Zhang, "A Better Tree-Structured Vector Quantizer," Proc. Data Compression Conf., pp. 392–401, 1991.

[22] F. Fekri, R. M. Mersereau, and R. W. Schafer, "Enhancement of Text Images Using a Context Based Nonlinear Interpolative Vector Quantization Method," Proc. IEEE Intern. Conf. on Image Proc., pp. 237–241, 1998.

[23] A. Rao, D. Miller, K. Rose, and A. Gersho, "A Generalized VQ Method for Combined Compression and Estimation," Proc. IEEE Intern. Conf. on Acoustics, Speech, and Sig. Proc., pp. 2032–2035, 1996.

[24] M. Analoui and J. P. Allebach, "Model Based Halftoning Using Direct Binary Search," Proc. SPIE, vol. 1666, pp. 96–108, 1992.

[25] D. J. Lieberman and J. P. Allebach, "Digital Halftoning Using the Direct Binary Search Algorithm," Proc. IST International Conference on High Technology, pp. 114–124, 1996.

Figure 1: Definition of the basic problem. (Input image → 300 dpi scanner → X → nonlinear interpolator → Y → 600 dpi printer → output image.)

Figure 2: Block diagram of the interpolative vector quantization method for text image interpolation. (X → encoder with codebook C of size N → index i → interpolative decoder with codebook C* of size N → Y.)

Figure 3: Illustration of context and non-context block mapping. (Non-context: 2×2 non-overlapping input blocks map to 4×4 non-overlapping output blocks; context: 2×2 overlapping input blocks map to 2×2 output blocks.)

Figure 4: Comparison of the results of interpolation from 300 dpi to 600 dpi using different NLIVQ systems trained on the 8 point Times New Roman font and then tested on the Courier font. (a) Originally scanned 600 dpi image (8 point Courier font, magnified for display). (b) NLIVQ with codebook size of 128 tested on 8 point Courier. (c) NLIVQ with codebook size of 64 tested on 8 point Courier. (d) NLIVQ without using context and with codebook size of 64 tested on 8 point Courier. (e) NLIVQ with codebook size of 128 tested on 10 point Courier.

Figure 5: Comparison of the results of standard interpolation techniques with that of the NLIVQ system trained using Times New Roman font and tested on 8 point Courier font. (a) Originally scanned 600 dpi image (8 point Courier font). (b) Interpolated from 300 dpi to 600 dpi using a codebook size of 128 trained on 8 point Times New Roman. (c) Interpolated from 300 dpi to 600 dpi using bilinear interpolation. (d) Interpolated from 300 dpi to 600 dpi using pixel replication.

Figure 6: Comparison of the results of the standard interpolation and binarization techniques with those of the GIVQ and NLIVQ systems with codebook size of 64 trained on the 8 point Times New Roman font. (a) Blurred 300 dpi 8 point Courier font. (b) High resolution binary image obtained by using pixel replication and thresholding. (c) High resolution binary image obtained by using bilinear interpolation and thresholding. (d) High resolution binary image obtained by the NLIVQ method and thresholding. (e) High resolution binary image obtained by binary constrained GIVQ. (f) Ideal 600 dpi 8 point Courier font.