A Multigrid Approach to the 1-D Quantization Problem

Yair Koren, Irad Yavneh and Alon Spira
Department of Computer Science
Technion—Israel Institute of Technology
Haifa 32000, Israel
May, 2003

Abstract— A multigrid framework for the 1-D (scalar) quantization problem is presented. The framework is based on the Lloyd-Max iterative process, which is a central building block in many quantization algorithms. This process iteratively improves a given initial solution and generally converges to a local minimum of the quantization distortion. In contrast to the classical Lloyd-Max process, the convergence of the multigrid algorithm is practically independent of the number of representation levels sought. Using this approach, a local minimum is reached to machine accuracy within a few iterations, and the need for traditional stopping criteria is alleviated. In addition, the proposed method often achieves better local minima than other quantization methods, such as classical Lloyd-Max and LBG. The complexity of the proposed method is O(n) operations in the continuous case and 3N + O(n) in the discrete case, where n is the number of representation levels sought and N is the size of the discrete probability density function.

Index Terms— Quantization, Lloyd-Max, LBG, multigrid, FAS.

I. INTRODUCTION

Quantization [8], [9] is the process of representing a continuum with only a finite number of representatives, or of representing an initially rich amount of discrete data with fewer representatives. Rounding off real numbers to the nearest integer is a simple form of quantization. More formally, given a positive integer n and a random process whose probability density function p : [L, R] → IR, or an approximation thereof, is known, a quantizer q : [L, R] → IR is defined by n representation levels {r_i}_{i=1}^n and n decision regions {(d_{i-1}, d_i)}_{i=1}^n, where

  L = d_0 < r_1 < d_1 < ... < d_{i-1} < r_i < d_i < ... < d_{n-1} < r_n < d_n = R.

That is, q(x) is a piecewise constant approximation of x in which all x \in (d_{i-1}, d_i) are represented by r_i. An optimal quantizer achieves minimal distortion, defined as the expectation of the squared quantization error,

  D(q) = E[\|X - q(X)\|^2] = \sum_{i=1}^{n} \int_{d_{i-1}}^{d_i} \|x - r_i\|^2 \, p(x) \, dx,    (1)

where \|\cdot\| is a norm. Henceforth, we assume \|x - y\|^2 = (x - y)^2, i.e., the L2 norm. This error measure is frequently referred to as the mean squared error (MSE). In the discrete case, where p(x) is non-zero at only a finite number of points {x_j}_{j=1}^N, a similar functional is minimized:

  D(q) = E[\|X - q(X)\|^2] = \sum_{i=1}^{n} \sum_{x_j \in (d_{i-1}, d_i)} (x_j - r_i)^2 \, p(x_j).    (2)

In practice, when solving the discrete problem (2), it is usually assumed that r_i \in {x_j}_{j=1}^N for all i. Certain error measures, such as MSE, as well as other monotonic measures, allow polynomial-time globally optimal solvers [5], [16], [19] for (2) with discrete r_i's. These solvers use a dynamic programming scheme which greatly reduces the complexity of an exhaustive search while maintaining global optimality. We are unaware of polynomial-time optimal solvers for either the discrete problem (2) or the continuous problem (1) when the r_i's are continuous. The complexity of the fastest algorithm [19] for globally minimizing (2) with discrete r_i's and the MSE error measure is O(nN). This algorithm exploits quantizer monotonicities along with an elegant matrix search algorithm [1]. In the sequel, we present an algorithm which tries to minimize (1) in O(n) and (2) in 3N + O(n) operations. Our algorithm assures only local optimality, but it permits continuous p(x) and r_i's. Moreover, for large enough n, educated initialization is possible [18], [15], which greatly increases the chances that our method will find the global minimum.

Henceforth, we treat the continuous problem (1), where both p(x) and the r_i's are continuous, and we permit p(x) to be any non-negative integrable function, not necessarily a probability density function. The results throughout this paper apply equally to the discrete case (2). Necessary conditions for a minimum are

  \partial D / \partial r_i = 0,  1 \le i \le n,
  \partial D / \partial d_i = 0,  1 \le i \le n - 1.

These conditions yield

  r_i = \frac{\int_{d_{i-1}}^{d_i} x\,p(x)\,dx}{\int_{d_{i-1}}^{d_i} p(x)\,dx}, \qquad d_i = \frac{r_i + r_{i+1}}{2}.    (3)

Lloyd [12] and Max [14] came up (separately) with the above equations and proposed "ping-ponging" back and forth between them,

  d_i^k = \frac{r_i^{k-1} + r_{i+1}^{k-1}}{2}, \qquad k = 1, 2, 3, ...    (4)

  r_i^k = \frac{\int_{d_{i-1}^k}^{d_i^k} x\,p(x)\,dx}{\int_{d_{i-1}^k}^{d_i^k} p(x)\,dx}, \qquad k = 1, 2, 3, ...    (5)

that is, optimizing the d_i's according to the r_i's, followed by optimizing the r_i's according to the new d_i's, and so on. Here r^0 is an initial guess at the optimal representation levels. This process converges monotonically to a solution which satisfies (3) (see [12]). Fleischer showed that a sufficient condition for a globally minimal distortion is that log(p(x)) be concave [7]. In general, a minimum achieved by the above descent process is not necessarily global.

Since the Lloyd-Max iterative process requires an initial guess at the solution in order to start iterating, many heuristic methods have been proposed which try to find a good initial approximation [18], [13], [11], [15]. After acquiring such an initial approximation, Lloyd-Max iterations may be applied in order to bring it to a local minimum. As an example, the LBG algorithm [11] is a grid refinement process which starts by finding a solution (using the generalized Lloyd algorithm) for two r_i's and then splits each r_i into two r_i's. This process is repeated recursively until the number of r_i's equals n. The generalized Lloyd algorithm is the above iterative process (4), (5) with the improvement in distortion used as a stopping criterion; that is, if the improvement made at a certain iteration does not exceed some predefined threshold, the process is stopped.
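For concreteness, here is a minimal sketch of the "ping-pong" process (4), (5) for a discrete density, using NumPy; the function name, the fixed iteration count and the example density are our own illustrative choices, not part of the original formulation.

```python
import numpy as np

def lloyd_max(x, p, r, num_iters=100):
    """Alternate between (4) and (5): midpoints d from levels r,
    then centroids r from the decision boundaries d."""
    for _ in range(num_iters):
        d = 0.5 * (r[:-1] + r[1:])          # (4): d_i = (r_i + r_{i+1}) / 2
        idx = np.searchsorted(d, x)         # decision region of each point x_j
        for i in range(len(r)):             # (5): r_i = centroid of region i
            mask = idx == i
            if p[mask].sum() > 0:
                r[i] = (x[mask] * p[mask]).sum() / p[mask].sum()
    return r, 0.5 * (r[:-1] + r[1:])

# Example: 32 levels for a discrete approximation of a Weibull-type density.
x = np.linspace(1e-3, 1.67, 2000)
p = 6 * x**2 * np.exp(-2 * x**3)
r, d = lloyd_max(x, p, r=np.linspace(0.01, 1.6, 32).copy())
```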

The multigrid framework [3], [4], [17] is an efficient iterative method mostly used to solve elliptic partial differential equations numerically. Such equations can be solved iteratively by applying relaxation schemes such as Jacobi or Gauss-Seidel. These schemes are local, owing to the local nature of elliptic differential equations. As such, relaxation schemes can eliminate only local (high-frequency) errors. Large-scale (low-frequency) errors cannot be detected locally and are therefore not reduced efficiently. The multigrid framework recursively applies local relaxations to decimated versions of the data, exploiting the fact that low frequencies on a fine grid transform to relatively higher frequencies on coarser grids. The idea is that all errors are local on some scale, and they can be reduced efficiently by a local process only on that scale.

The Lloyd-Max process, although not applied to a discretized differential equation, is a local (nonlinear) process, somewhat similar to damped Jacobi relaxation. We show that the multigrid framework can accelerate this process, yielding convergence rates which appear independent of the number of representation levels. These results are similar to those obtained when applying multigrid methods to elliptic differential equations, for which they are considered optimal. To a large extent, our motivation for studying the scalar problem is to develop insight and tools for dealing with the vector quantization problem, which is considered more important and is considerably more difficult. In most other applications, multigrid methods have been readily generalized to higher dimensions. We are therefore optimistic about the relevance of the methods developed here to the vector quantization problem. Of course, the present method can already be used in certain vector quantization algorithms which use scalar quantization as a sub-procedure [2]. Since Lloyd-Max is a gradient descent process, we believe that our method may work with other distortion measures too, as long as they are differentiable; we have not tested our method with other distortion measures.

The paper is organized as follows. Section II provides a detailed description of the multigrid FAS algorithm and its application to the Lloyd-Max process. Section III analyzes the computational complexity of the multigrid algorithm. Section IV analyzes the convergence for constant p(x). Section V compares the classical Lloyd-Max, LBG and multigrid methods with respect to convergence rates and the final distortion value achieved. Section VI concludes and proposes future work.

II. THE MULTIGRID ALGORITHM

The key to speeding up the Lloyd-Max iterative process is the multigrid full approximation scheme (FAS), introduced in 1977 by Brandt [3]. The method is used almost exclusively for solving (usually nonlinear) partial differential equations [4], [17]. Iterative methods for solving boundary value problems often consist of discretizing the problem on a uniform grid and approximating the derivatives using some finite difference (or volume, or element) scheme. Once this is done, an initial solution can be improved by iterative relaxations. The problem is that derivatives are local. That is, at each iteration, the approximate solution at each point is directly affected by its local neighbors alone. In contrast, the solution at any given point generally depends on the solution at all other points in the domain, many of which are distant. Propagation of the solution from the boundaries inwards, for example, may require many iterations. Multigrid methods overcome this problem.

We first describe the FAS algorithm informally, and later proceed with the formal definition. Instead of exhaustively applying Lloyd-Max iterations (4) and (5), only a small number (ν1) of such relaxations is performed. The resulting approximate solution is then coarsened by restriction (sampling) to half the resolution. This "descent" process, consisting of relaxation/restriction, is performed recursively until only a single variable remains, at which point a solution can easily be calculated. This coarse solution is used to generate a correction term which is interpolated at double the resolution. The correction term is added to the next (finer) level's approximate solution, where a small number (ν2) of additional Lloyd-Max relaxations are performed. This "ascent" process, consisting of interpolation/relaxation, is repeated recursively until returning to the initial finest resolution with a new approximate solution. The whole process is called a "V-cycle": "V" represents the coarsening of the approximate solution until reaching the coarsest resolution (descent), followed by refining (ascent) until returning to the original finest resolution with a new approximate solution.

For a more formal description of the algorithm, we adopt the following conventions. We combine the expressions for both the representation levels and the decision regions (3) into a single expression for the decision regions:

  d_i = \frac{1}{2} \left[ \frac{\int_{d_{i-1}}^{d_i} x\,p(x)\,dx}{\int_{d_{i-1}}^{d_i} p(x)\,dx} + \frac{\int_{d_i}^{d_{i+1}} x\,p(x)\,dx}{\int_{d_i}^{d_{i+1}} p(x)\,dx} \right], \qquad i = 1, ..., n - 1.    (6)

We define the operator L by

  L(d)_i \equiv d_i - \frac{1}{2} \left[ \frac{\int_{d_{i-1}}^{d_i} x\,p(x)\,dx}{\int_{d_{i-1}}^{d_i} p(x)\,dx} + \frac{\int_{d_i}^{d_{i+1}} x\,p(x)\,dx}{\int_{d_i}^{d_{i+1}} p(x)\,dx} \right].    (7)

L(d) can be considered the gradient of the distortion. The problem is now defined by

  L(d) = f,    (8)

where d is the vector of inner decision-region boundaries, [d_1, d_2, ..., d_{n-1}]^T, and f is the right-hand side. Initially f is zero, but this will change at coarser resolutions. The role of f is to adapt the coarse-grid equations to represent the original fine-grid problem, e.g., [17, §5.3.7]. Without a modified f, each grid approximates a different problem, namely, the quantization problem corresponding to that grid. In fact, a modified f on the coarse grids is the key difference between multigrid and LBG. d̃ denotes the current approximation (the tilde will be dropped when no confusion may arise). The residual is defined by

  r(d̃) = f - L(d̃).    (9)

Note that

  r(d̃_old) = d̃_new - d̃_old,    (10)

where d̃_new is the result of applying a Lloyd-Max relaxation (12) to d̃_old.

Upper indices will henceforth denote the number of decision regions, n, which will be omitted when no confusion may arise. For simplicity, we will assume n to be a power of two. I_{n/2}^n will represent the interpolation operator which is applied to the correction term obtained from the coarser, n/2-region problem. I_n^{n/2} will represent the restriction operator for the residual r(d̃^n), and Ĩ_n^{n/2} the restriction operator for the approximate solution d̃^n (from resolution n to resolution n/2). Each V-cycle is considered one iteration. Like the Lloyd-Max iteration, the V-cycle's input is an initial guess and its output is a new approximation. A V-cycle is recursively defined as follows:

d̃^n ← V^n(d̃^n, f^n)
1) Relax L(d̃^n) = f^n, ν1 times.
2) If n = 2:
   a) Return d̃^n.
   Else:
   a) Calculate the residual: r(d̃^n) = f^n - L(d̃^n).
   b) Set d̃^{n/2} = Ĩ_n^{n/2} d̃^n and f^{n/2} = I_n^{n/2} r(d̃^n) + L(d̃^{n/2}).
   c) Recursively call d̃^{n/2} ← V^{n/2}(d̃^{n/2}, f^{n/2}).
3) Interpolate and add the correction: d̃^n ← d̃^n + I_{n/2}^n (d̃^{n/2} - Ĩ_n^{n/2} d̃^n).
4) Relax L(d̃^n) = f^n, ν2 times.

At the coarsest resolution, n = 2, where d̃^n consists of a single coordinate, the ν1 relaxations can be replaced by an exact calculation of the solution. Henceforth, we denote the V-cycle with ν1 pre-relaxations and ν2 post-relaxations V_{ν1,ν2}. We generally use V_{1,1} cycles, which will be assumed from now on unless otherwise stated.
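The V-cycle above translates almost directly into code. The following schematic sketch assumes the relaxation, the operator L and the transfer operators of this section are supplied as functions (all names are our own illustrative choices; the exact solve at the coarsest level is replaced here by relaxation, as the algorithm permits):

```python
def v_cycle(d, f, L, relax, restrict_solution, restrict_residual,
            interpolate, nu1=1, nu2=1):
    """One FAS V-cycle for L(d) = f; d holds the n-1 inner boundaries."""
    n = len(d) + 1                          # number of decision regions
    for _ in range(nu1):                    # step 1: pre-relaxation
        d = relax(d, f)
    if n == 2:
        return d                            # step 2: coarsest level reached
    r = f - L(d)                            # step 2a: residual
    d_coarse = restrict_solution(d)         # step 2b: restrict solution ...
    f_coarse = restrict_residual(r) + L(d_coarse)   # ... and coarse right-hand side
    v = v_cycle(d_coarse, f_coarse, L, relax, restrict_solution,
                restrict_residual, interpolate, nu1, nu2)   # step 2c: recurse
    # step 3: interpolate and add the correction; the second argument carries
    # the fine-level d needed by the nonlinear interpolation (16).
    d = d + interpolate(v - restrict_solution(d), d)
    for _ in range(nu2):                    # step 4: post-relaxation
        d = relax(d, f)
    return d
```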

To further clarify the role of f, observe that the n/2-resolution equation is (step 2b)

  L(d̃^{n/2}) = I_n^{n/2} (f^n - L(d̃^n)) + L(Ĩ_n^{n/2} d̃^n).    (11)

Upon convergence, since the residual, f^n - L(d^n), is zero, we obtain L(d^{n/2}) = L(Ĩ_n^{n/2} d^n). Hence, as the process converges, the coarse solution, d^{n/2}, tends to the fine solution, d^n, restricted to the coarse resolution (e.g., sampled at every other point). Note that the probability density function p(x) remains at the finest resolution (continuous) throughout the algorithm. The operator L at different resolutions is defined by (7) and differs only due to the different densities of the approximate solution d̃.

We have yet to define the relaxation method and the interpolation/restriction schemes. We propose the Lloyd-Max process as the relaxation. The Lloyd-Max equations (4) and (5), with only the decision regions as variables, become

  d_i^{new} = f_i + \frac{1}{2} \left[ \frac{\int_{d_{i-1}^{old}}^{d_i^{old}} x\,p(x)\,dx}{\int_{d_{i-1}^{old}}^{d_i^{old}} p(x)\,dx} + \frac{\int_{d_i^{old}}^{d_{i+1}^{old}} x\,p(x)\,dx}{\int_{d_i^{old}}^{d_{i+1}^{old}} p(x)\,dx} \right],    (12)

with 1 ≤ i ≤ n - 1. We actually prefer an over-relaxed version of this method, given by

  d_i^{over} = (1 - ω) d_i^{old} + ω d_i^{new} = d_i^{old} + ω r(d^{old})_i,    (13)

where d_i^{new} is given by (12) and ω is a constant relaxation parameter. We use ω = 4/3, suggested by our analysis in Section IV.

The restriction scheme, Ĩ_n^{n/2}, for the current approximation, d̃^n, is injection. That is, when transferring the approximate solution to a coarser resolution we simply sample d̃^n at the even indices:

  [Ĩ_n^{n/2} d̃^n]_i = d̃^n_{2i},    i = 1, ..., n/2 - 1.    (14)

The restriction scheme for the residual r(d̃^n), I_n^{n/2}, is the so-called full-weighting operator, scaled by a factor 4:

  [I_n^{n/2} r(d̃^n)]_i = r(d̃^n)_{2i-1} + 2 r(d̃^n)_{2i} + r(d̃^n)_{2i+1},    i = 1, ..., n/2 - 1.    (15)

The factor 4 is a necessary scaling, which is explained in the analysis section.

Let c^{n/2} = d̃^{n/2} - Ĩ_n^{n/2} d̃^n denote the coarse-resolution correction that is interpolated and added to d̃^n in step 3 of the algorithm. Using this notation, our interpolation scheme is defined by

  [I_{n/2}^n (c^{n/2})]_i = c^{n/2}_{i/2}    for even i,

  [I_{n/2}^n (c^{n/2})]_i = \frac{d̃^n_{i+1} - d̃^n_i}{d̃^n_{i+1} - d̃^n_{i-1}} c^{n/2}_{(i-1)/2} + \frac{d̃^n_i - d̃^n_{i-1}}{d̃^n_{i+1} - d̃^n_{i-1}} c^{n/2}_{(i+1)/2}    for odd i.    (16)

Here, d̃^n is the fine-resolution approximation before the coarse-resolution correction is added. We use this nonlinear interpolation to maintain the strict order of d̃^n, i.e., d̃^n_i > d̃^n_{i-1}, i = 1, ..., n. This property is not ensured by the usual linear interpolation, for example. The interpolation scheme (16) preserves the order of d̃^n, provided that d̃^{n/2} is itself properly ordered. However, the ordering might still be lost during the relaxation. While the usual Lloyd-Max process always preserves ordering, our Lloyd-Max relaxations (12) at all but the finest resolution include a modified, non-zero right-hand side f, which may cause disordering. Therefore, after each relaxation the ordering is tested and, if compromised, the properly ordered d̃^n from before the relaxation is used. Note that the ordering of d̃^n cannot be compromised at the finest resolution, since f^n = 0 there. Even when ordering is preserved, convergence is not assured. A slight variation of our algorithm, which ensures monotonic convergence, is described in the appendix.

III. COMPLEXITY OF FAS

We analyze the computational complexity of the V_{1,1} cycle, that is, ν1 = ν2 = 1, for both the discrete (2) and continuous (1) cases.

A. Discrete Quantization

The multigrid algorithm consists of several parts: relaxation (step 1), computing the residual (step 2a), restriction of both the solution and the residual, computing L(d̃^{n/2}) (step 2b), and interpolation (step 3). In this section we shall assume that all integrals in the referenced equations are sums, as appropriate for the minimization of (2). Before we begin, we pre-compute the prefix sums \sum_{j=1}^{k} x_j p(x_j) and \sum_{j=1}^{k} p(x_j) for 1 ≤ k ≤ N. This operation requires 2N additions and N multiplications, and it allows us to compute, for example, each of the ratios in (12) using 2 subtractions and one division.
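As a concrete illustration of the pre-computation, the following sketch (our own, using NumPy; array names are ours) builds the two prefix-sum arrays and then evaluates a centroid ratio from (12) in O(1):

```python
import numpy as np

# One-time pre-computation: N multiplications and 2N additions.
x = np.linspace(0.0, 1.0, 1000)          # the support points x_j
p = np.ones_like(x) / len(x)             # a discrete density p(x_j)
cum_xp = np.cumsum(x * p)                # prefix sums of x_j p(x_j)
cum_p = np.cumsum(p)                     # prefix sums of p(x_j)

def centroid(a, b):
    """Centroid of (x_a, x_b]: each ratio in (12) via 2 subtractions
    and 1 division on the prefix sums."""
    return (cum_xp[b] - cum_xp[a]) / (cum_p[b] - cum_p[a])
```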

The pre-computation thus requires a one-time effort of N multiplications and 2N additions.

As an example, we analyze the complexity of a Lloyd-Max iteration. Computing d_i^{new} in (6) requires 3 additions/subtractions, 1 multiplication and 1 division, because only one ratio of sums in (12) needs to be computed for each i. Computing d^{new} on the fine grid therefore requires 3n additions/subtractions, n multiplications and n divisions. Consider applying this Lloyd-Max process on all grids. Recall that the FAS algorithm is recursive. Summing up the complexity over all grids yields at most 12n additions/subtractions, 4n multiplications and 4n divisions, since \sum_{i=0}^{\log n} 2^{-i} < 2 and the operation is performed both on the way down (coarsening) and on the way up (refining). Table I presents the computational complexity of the V_{1,1} cycle.

  section                        per coordinate           on the fine grid
                                 div.   mul.   add.       div.    mul.    add.
  over-relaxation (13)           1      2      6          2n      4n      12n
  computing residual (9)         1      1      5          n       n       5n
  solution restriction (14)      0      0      0          0       0       0
  residual restriction (15)      0      1      2          0       n/2     n
  computing L(d̃^{n/2})           0      0      0          0       0       0
  interpolation (16)             1      2      4          n/2     n       2n
  overall                        3      6      17         7n/2    13n/2   20n

TABLE I. THE COMPLEXITY OF THE VARIOUS PARTS OF THE V_{1,1} CYCLE, MEASURED IN EXACT DIVISION (DIV.), MULTIPLICATION (MUL.) AND ADDITION/SUBTRACTION (ADD.) OPERATIONS.

Remarks regarding Table I:
1) Only the relaxations are performed both when coarsening and when refining. The restrictions (14), (15), the residual computation (9) and the computation of L(d̃^{n/2}) (7) are performed only when coarsening. The interpolation (16) is performed only when refining.
2) The restriction of the solution is simply injection and therefore requires no computational effort.
3) The finest restricted residual has n/2 coordinates, which is why only n/2 points need to be computed on the fine grid for it.
4) The same L(d̃^{n/2}) is used in steps 1 and 2b. We therefore account for this computation in step 1 alone, which is why the corresponding row of Table I contains zeros.
5) The interpolation (16) is performed at each grid only for half of the points. Also, for any odd point, the two coefficients in (16) add up to 1.

Summing up over all grids yields the following result: the complexity of a single V_{1,1} cycle in the discrete case is 7n divisions, 13n multiplications and 40n additions/subtractions.

B. Continuous Quantization

In the continuous case (1), no pre-computation is needed. When p(x) and x p(x) are analytically integrable, their integrals may be computed using their primitive functions. Otherwise, for continuous p(x), an arbitrarily accurate numerical integration scheme may be used, such as Gauss quadrature, e.g., [6, §7.4]. A Lloyd-Max relaxation therefore requires an additional 2n integral evaluations, as does a residual computation. Summing up over all grids yields the following result: the complexity of a single V_{1,1} cycle in the continuous case is 12n integral evaluations, 7n divisions, 13n multiplications and 26n additions.

Since the integral evaluations are the bulk of the complexity in the continuous case, and since the Lloyd-Max process requires 2n integral evaluations per iteration, the complexity of a V_{1,1} cycle is roughly 6 times the complexity of a Lloyd-Max iteration.

IV. CONVERGENCE ANALYSIS

An exact quantitative convergence analysis of the multigrid algorithm for the nonlinear problem is beyond reach. We therefore analyze the case where p(x) is constant, for which the problem is linear and lends itself to Fourier analysis. First introduced into multigrid methods in [3], this approach has become the standard tool for accurate prediction of the convergence behavior of multigrid algorithms [17]. Nevertheless, we include sufficient detail to maintain a self-contained presentation. Much experience with multi-level algorithms, supported by the numerical results below, indicates that the analysis is relevant also for the nonlinear problem, so long as p(x) undergoes slow relative variations.

Assume then that p(x) ≡ 1. Equation (8) yields

  L(d)_i = (L^n d)_i = d_i - \frac{1}{2} \left[ \frac{\int_{d_{i-1}}^{d_i} x\,dx}{\int_{d_{i-1}}^{d_i} dx} + \frac{\int_{d_i}^{d_{i+1}} x\,dx}{\int_{d_i}^{d_{i+1}} dx} \right]
         = d_i - \frac{1}{2} \left[ \frac{(d_i^2 - d_{i-1}^2)/2}{d_i - d_{i-1}} + \frac{(d_{i+1}^2 - d_i^2)/2}{d_{i+1} - d_i} \right]
         = \frac{-d_{i-1} + 2 d_i - d_{i+1}}{4} = 0    (17)

for i = 1, ..., n - 1, with boundary conditions d_0 = 0, d_n = 1. Here, L^n is a tri-diagonal matrix of size n - 1 by n - 1, with 1/2 on the main diagonal and -1/4 on the upper and lower diagonals. Equation (17) is nothing but the 1-D discrete Laplace equation, scaled by a constant. Note that Equation (17) lacks the division by the squared mesh width h², which is constant for any specified grid. When coarsening the grid, the squared grid width becomes H² = 4h². This explains the factor of four in the residual transfer (15).

For the linear problem (17), the Lloyd-Max relaxation (12) reduces to

  d_i^{new} = \frac{d_{i-1}^{old} + 2 d_i^{old} + d_{i+1}^{old}}{4},    i = 1, ..., n - 1.    (18)

We first analyze the convergence of this iteration. Let e^{old} ≡ d - d^{old} and e^{new} ≡ d - d^{new} denote the errors in our current approximation of the solution, d, before and after a single Lloyd-Max iteration. Then, e^{old} and e^{new} evidently satisfy (18), with boundary conditions e_0^{old} = e_n^{old} = e_0^{new} = e_n^{new} = 0. That is, the error propagation is given by

  e^{new} = R^n e^{old},

where R^n is the n - 1 by n - 1 tri-diagonal iteration matrix with 1/2 on the main diagonal and 1/4 on the upper and lower diagonals. We can determine the asymptotic convergence behavior of the error by the standard eigenvalue analysis.

Theorem 1: Let R^n be the Lloyd-Max iteration matrix. Then, the kth eigenvector of R^n, k = 1, ..., n - 1, is the vector φ_k^n, whose ith component is given by

  (φ_k^n)_i = \sin(kπi/n),    i = 1, ..., n - 1,    (19)

and the corresponding kth eigenvalue is given by

  R̂_k^n = \cos^2(kπ/2n).    (20)

Proof. For i = 1, ..., n - 1,

  (R^n φ_k^n)_i = (1/4) [(φ_k^n)_{i-1} + 2 (φ_k^n)_i + (φ_k^n)_{i+1}]
              = (1/4) [\sin(kπ(i-1)/n) + 2 \sin(kπi/n) + \sin(kπ(i+1)/n)]
              = (1/2) [1 + \cos(kπ/n)] \sin(kπi/n)
              = R̂_k^n (φ_k^n)_i,

using trigonometric identities and the fact that (19) equals zero for i = 0 and i = n.

Remarks.
1) The spectral radius of R^n is given by

  ρ(R^n) = \max_k |R̂_k^n| = \cos^2(π/2n) < 1.

Thus, the Lloyd-Max relaxation converges but, for n ≫ 1, we obtain

  ρ(R^n) = 1 - (π/2n)^2 + O(n^{-4}),

implying that (except for very special initial approximations) O(n²) iterations are asymptotically required to reduce the error by an order of magnitude.
2) Since, by (17) and (18), L^n = I - R^n, where I is the n - 1 by n - 1 identity matrix, the vectors φ_k^n are also eigenvectors of L^n, with eigenvalues

  L̂_k^n = 1 - \cos^2(kπ/2n) = \sin^2(kπ/2n),    k = 1, ..., n - 1.

Kieffer [10] has shown that obtaining each decimal digit of precision of the distortion (1) requires O(1) Lloyd-Max iterations. This result ignores the influence of the number of representation levels on the rate of convergence. For the simple case where p(x) ≡ 1, we have shown here that the number of iterations required is in fact quadratic in the number of representation levels.

We next turn to the analysis of the multigrid V-cycle. We use the standard approach of restricting the analysis to a two-grid algorithm, where the problem at the coarse resolution is solved exactly rather than recursively. The fine resolution, n, is assumed to be even. Let e^{old} and e^{new} now denote the errors in the approximations to d before and after a single complete two-grid FAS cycle, as described by the algorithm in Section II. The algorithm is conveniently divided into three parts: (a) ν1 Lloyd-Max pre-relaxation iterations; (b) the coarse-resolution correction; (c) ν2 post-relaxation iterations. Accordingly, the error propagation can be written as

  e^{new} = M^n e^{old} = (R^n)^{ν2} C^n (R^n)^{ν1} e^{old},    (21)

where C^n is the n - 1 by n - 1 coarse-resolution correction matrix, which is derived next.

Let e^{pre} = (R^n)^{ν1} e^{old} denote the error immediately after the pre-relaxation and before the coarse-resolution correction step, and let

  e^{post} = C^n e^{pre}

denote the error immediately after the coarse-resolution correction. Thus, the approximation, d^{pre}, just before the coarse-resolution correction is given by d - e^{pre}, and the residual which is computed in step (2a) of the algorithm is therefore given by

  r^n = f - L^n (d - e^{pre}) = L^n e^{pre},    (22)

where we have used the linearity of L^n and the fact that L^n d = f. (In the present problem, f ≡ 0, but we generalize slightly for similarity to the general case.) In step (2b) of the algorithm, the residual is transferred to the coarse resolution using the restriction operator, I_n^{n/2}. Next, the coarse-resolution problem (11) is solved (exactly, under the assumptions of this analysis) by application of the operator (L^{n/2})^{-1}. Note that (L^{n/2})^{-1} transforms the coarse approximation to the fine-grid residual into a coarse approximation to the fine-grid error. Finally, in step (3) of the algorithm, the correction is interpolated using I_{n/2}^n and added to the current fine-resolution approximation, i.e., d^{post} = d - e^{post} is equal to d^{pre} plus the interpolated correction. Hence, e^{post} is equal to e^{pre} minus the interpolated correction. Following these steps we obtain

  C^n = I - I_{n/2}^n (L^{n/2})^{-1} I_n^{n/2} L^n.    (23)

Here, L^{n/2}, the coarse-resolution approximation to L^n, is the n/2 - 1 by n/2 - 1 tri-diagonal matrix with 1/2 on the main diagonal and -1/4 on the upper and lower diagonals, and I is the n - 1 by n - 1 identity matrix. The full-weighting restriction operator, I_n^{n/2} (15), is the matrix of size n/2 - 1 by n - 1 given by

  [I_n^{n/2}]_{i,j} = 2 for j = 2i;  1 for j = 2i ± 1;  0 otherwise.    (24)

Our interpolation matrix, I_{n/2}^n, is nonlinear (16). For the analysis, where p(x) ≡ 1, we linearize it around the known solution of equal-sized decision regions, which yields simple linear interpolation. The corresponding matrix is just a scaled transpose of the restriction:

  I_{n/2}^n = \frac{1}{2} (I_n^{n/2})^T.    (25)

The vectors φ_k^n are not eigenvectors of C^n or M^n. However, we next show that the two-dimensional subspaces spanned by the pairs {φ_k^n, φ_{n-k}^n} are invariant under multiplication by M^n. This will allow us to compute the spectral radius of M^n and deduce the asymptotic convergence factor of the two-grid V-cycle.

Lemma 1:
1) The kth eigenvector of L^{n/2}, k = 1, ..., n/2 - 1, is the vector φ_k^{n/2}, whose ith component is given by

  (φ_k^{n/2})_i = \sin(kπi/(n/2)),    i = 1, ..., n/2 - 1,

and the corresponding kth eigenvalue is given by

  L̂_k^{n/2} = \sin^2(kπ/n).

2) For any real constants a and b, we have, for k = 1, ..., n/2 - 1,

  I_n^{n/2} (a φ_k^n + b φ_{n-k}^n) = 4 [a \cos^2(kπ/2n) - b \sin^2(kπ/2n)] φ_k^{n/2},    (26)

which is an (n/2 - 1)-vector of zeros when k = n/2.
3) For k = 1, ..., n/2 - 1,

  I_{n/2}^n φ_k^{n/2} = \cos^2(kπ/2n) φ_k^n - \sin^2(kπ/2n) φ_{n-k}^n.

Proof.
1) The proof is analogous to that of Theorem 1 and the associated Remark 2.
2) For i = 1, ..., n/2 - 1,

  [I_n^{n/2} (a φ_k^n + b φ_{n-k}^n)]_i
    = a [\sin(kπ(2i-1)/n) + 2 \sin(2kπi/n) + \sin(kπ(2i+1)/n)]
      + b [\sin((n-k)π(2i-1)/n) + 2 \sin(2(n-k)πi/n) + \sin((n-k)π(2i+1)/n)]
    = 2a \sin(2kπi/n) [1 + \cos(kπ/n)] + 2b \sin(2(n-k)πi/n) [1 + \cos((n-k)π/n)]
    = 2 \sin(2kπi/n) [a (1 + \cos(kπ/n)) - b (1 - \cos(kπ/n))]
    = 4 [a \cos^2(kπ/2n) - b \sin^2(kπ/2n)] (φ_k^{n/2})_i.

3) For linear interpolation and i = 1, ..., n - 1, k = 1, ..., n/2 - 1, we have

  [I_{n/2}^n φ_k^{n/2}]_i = \sin(kπi/n) for even i;  (1/2) [\sin(kπ(i+1)/n) + \sin(kπ(i-1)/n)] for odd i.

Due to the fact that

  (1/2) [\sin(kπ(i+1)/n) + \sin(kπ(i-1)/n)] = \cos(kπ/n) \sin(kπi/n)

and

  \sin(kπi/n) = \sin((n-k)πi/n) for odd i;  -\sin((n-k)πi/n) for even i,

the vector I_{n/2}^n φ_k^{n/2} is uniquely representable as the linear combination

  (1/2) [(1 + \cos(kπ/n)) φ_k^n - (1 - \cos(kπ/n)) φ_{n-k}^n],

from which the result follows.

Definition 1: The following matrices are referred to as the symbols of the corresponding operators:

  L̂_{k,n-k}^n = diag(L̂_k^n, L̂_{n-k}^n),
  R̂_{k,n-k}^n = diag(R̂_k^n, R̂_{n-k}^n),
  Î_{n,k}^{n/2} = 4 [\cos^2(kπ/2n), -\sin^2(kπ/2n)],
  Î_{n/2,k}^n = \frac{1}{4} (Î_{n,k}^{n/2})^T,
  Ĉ_k^n = I - Î_{n/2,k}^n (L̂_k^{n/2})^{-1} Î_{n,k}^{n/2} L̂_{k,n-k}^n,
  M̂_k^n = (R̂_{k,n-k}^n)^{ν2} Ĉ_k^n (R̂_{k,n-k}^n)^{ν1},

where I is now the 2 by 2 identity matrix.

Theorem 2: For any real constants a and b, we have, for k = 1, ..., n/2,

  M^n (a φ_k^n + b φ_{n-k}^n) = a^{new} φ_k^n + b^{new} φ_{n-k}^n,    (27)

where

  [a^{new}; b^{new}]^T = M̂_k^n [a; b]^T.    (28)

Hence, the subspaces spanned by the pairs {φ_k^n, φ_{n-k}^n} are invariant under multiplication by M^n.

Proof. This follows from Lemma 1 and Definition 1.

Remarks.
1) Theorem 2 implies that M^n can be decomposed into an almost (2 × 2)-block-diagonal matrix D and a diagonalizing matrix V. Formally, M^n = V D V^T, where

  D = blockdiag(M̂_1^n, ..., M̂_{n/2-1}^n, (R̂_{n/2}^n)^{ν1+ν2})

and V is a matrix of column vectors given by

  V = [φ_1^n, φ_{n-1}^n, ..., φ_{n/2-1}^n, φ_{n/2+1}^n, φ_{n/2}^n].

Note that D and V are both real (n - 1) × (n - 1) matrices and (R̂_{n/2}^n)^{ν1+ν2} is a scalar. By further diagonalizing D we may obtain the eigenvalues and eigenvectors of M^n. The latter turn out to be a pair of linear combinations of {φ_k^n, φ_{n-k}^n} for each pair {φ_k^n, φ_{n-k}^n}, k = 1, ..., n/2 - 1. In addition, the vector φ_{n/2}^n is also an eigenvector of M^n.
2) Denote

  c_k = \cos^2(kπ/2n),    s_k = \sin^2(kπ/2n).

Then a straightforward computation yields

  M̂_k^n = [ s_k c_k^{ν1+ν2}    s_k^{ν1} c_k^{1+ν2} ;
            s_k^{1+ν2} c_k^{ν1}    c_k s_k^{ν1+ν2} ].

The eigenvalues of M̂_k^n, k = 1, ..., n/2, are zero and

  λ_k^n = s_k c_k^{ν1+ν2} + s_k^{ν1+ν2} c_k.

The spectral radius of M^n is given by

  ρ(M^n) = \max_{k=1,...,n/2} λ_k^n.

In particular, for ν1 = ν2 = 1, which we use in our tests, we get ρ(M^n) = 0.25 for any even n. Thus the asymptotic convergence factor per multigrid cycle employing a single pre-relaxation and a single post-relaxation Lloyd-Max iteration is expected to be 0.25, independently of n, whereas O(n²) iterations are required to achieve a fixed error reduction using plain Lloyd-Max iterations.
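The two-grid prediction is easy to check numerically. The following sketch (our own verification code, using NumPy) builds R^n, L^n and the transfer operators (24), (25), forms M^n with one pre- and one post-relaxation, and confirms that its spectral radius is 0.25:

```python
import numpy as np

def tridiag(m, lo, di, up):
    """m x m tri-diagonal matrix with constants lo, di, up."""
    return (np.diag(np.full(m - 1, lo), -1) + np.diag(np.full(m, di))
            + np.diag(np.full(m - 1, up), 1))

n = 64
R = tridiag(n - 1, 0.25, 0.5, 0.25)          # Lloyd-Max iteration matrix
Ln = tridiag(n - 1, -0.25, 0.5, -0.25)       # fine-grid operator (17)
Lc = tridiag(n // 2 - 1, -0.25, 0.5, -0.25)  # coarse-grid operator
Ifw = np.zeros((n // 2 - 1, n - 1))          # full weighting, scaled by 4 (24)
for i in range(n // 2 - 1):
    Ifw[i, 2 * i: 2 * i + 3] = [1.0, 2.0, 1.0]
P = 0.5 * Ifw.T                              # linearized interpolation (25)
C = np.eye(n - 1) - P @ np.linalg.inv(Lc) @ Ifw @ Ln   # (23)
M = R @ C @ R                                # (21) with nu1 = nu2 = 1
print(max(abs(np.linalg.eigvals(M))))        # prints 0.25 (up to round-off)
```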

A. Relaxation Parameter

We can improve the performance still further by introducing a relaxation parameter (13), which we optimize using the two-level analysis. The thus modified Lloyd-Max iteration (13) in the linear case is given by

  d_i^{over} = (1 - ω) d_i^{old} + ω \frac{d_{i-1}^{old} + 2 d_i^{old} + d_{i+1}^{old}}{4},    (29)

where ω is a constant parameter. We wish to choose the ω that minimizes ρ(M^n) independently of n. With the relaxation parameter, R̂_k^n is given by

  R̂_k^n = 1 - ω + ω \cos^2(kπ/2n).    (30)

For ν1 = ν2 = 1, the eigenvalues of M̂_k^n turn out to be zero and

  λ_k^n = \frac{1}{8} [5ω^2 - 12ω + 8 + ω(3ω - 4) \cos(2kπ/n)].    (31)

The optimal ω, independently of n, is that which minimizes

  \sup_n ρ(M^n) = \sup_n \max_k |λ_k^n|.

Since λ_k^n depends linearly on \cos(2kπ/n), we only need to check the extremal values, i.e., \cos(2kπ/n) = ±1. This yields

  \sup_n ρ(M^n) = \max[(ω - 1)^2, 0.25 (ω - 2)^2],

which is minimized when ω = 4/3, yielding

  \sup_n ρ(M^n) = 1/9.

Thus, our expected convergence factor per cycle improves from 1/4 to 1/9 when we employ the optimal relaxation parameter. Our numerical tests show that ω = 4/3 remains nearly optimal for non-constant p(x) as well, and we use this value throughout.
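A quick numerical check of this one-dimensional minimization (our own throwaway code):

```python
import numpy as np

omegas = np.linspace(1.0, 2.0, 10001)
rho = np.maximum((omegas - 1) ** 2, 0.25 * (omegas - 2) ** 2)
print(omegas[np.argmin(rho)], rho.min())   # ~1.3333 and ~0.1111, i.e. 4/3 and 1/9
```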

V. SIMULATIONS AND RESULTS

A. Convergence Rates

We first demonstrate the application of multigrid V_{1,1} cycles to the class of Weibull density functions

  p(x) = αβ x^{β-1} e^{-αx^β},    x > 0,    α, β > 0,

and compare the results with those of the Lloyd-Max and LBG methods [11]. This is precisely the set of problems used in [18]. Various values of α and β yield various commonly encountered density functions. Note that β ≥ 1 yields a single minimum, since log(p(x)) is concave [7]. As in [18], we truncate the tails of these infinite densities at the point x_0 for which \int_{x_0}^{\infty} p(x)\,dx < 10^{-4}.

[Fig. 1. Weibull density functions p(x) = αβ x^{β-1} e^{-αx^β}, x > 0, α, β > 0. Solid: α = 1, β = 1. Dashed: α = 2, β = 2. Dash-dot: α = 2, β = 3.]

The examples we present are of continuous p(x)'s. Application of our method to the discrete versions of these p(x)'s yielded similar results, especially when N, the size of the discrete probability density function, was large. LBG and Lloyd-Max are identical once LBG reaches the highest resolution. We consider the LBG algorithm a method of acquiring an initial guess at the solution. Therefore, all LBG plots throughout disregard the lower resolutions and display the progress starting from the highest resolution, where Lloyd-Max iterations are performed until convergence.

Consider the example where α = 2, β = 3, Ω = [0, 1.67], and assume that 32 representation levels are sought. The initial d̃_i values for Lloyd-Max and multigrid are placed uniformly over [0, 1.67].
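For this family the tail integral has a closed form, \int_{x_0}^{\infty} αβ x^{β-1} e^{-αx^β} dx = e^{-αx_0^β}, so the truncation point can be computed directly; a small sketch (our own illustration) reproduces the domain [0, 1.67] used here:

```python
import numpy as np

def truncation_point(alpha, beta, tail=1e-4):
    # Solve exp(-alpha * x0**beta) = tail for x0.
    return (np.log(1.0 / tail) / alpha) ** (1.0 / beta)

print(truncation_point(2, 3))   # ~1.66, matching the domain [0, 1.67]
```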

[Fig. 2. Weibull density p(x) with α = 2, β = 3 and n = 32. Uniform quantization used as initial guess at the solution. Panels: converged solution (n = 32); L2 norm of residual vs. iterations; residual reduction factor vs. iterations; |D^k - D^opt| vs. iterations.]

Referring to Fig. 2, the upper left-hand plot displays the converged solution, using any of the three methods, over the density function p(x) = 6x² e^{-2x³}. Solid lines describe the representation levels (r_i's) and dotted lines the decision regions (d_i's). As expected, the higher the density function, the denser the representation levels underneath it. The upper right-hand plot displays the L2 norm of the residual (9). ∇D = 0 is a necessary condition for optimality; as previously mentioned, L(d^n) can be considered the gradient of the distortion, and so we use the L2 norm of r(d̃^n) = -L(d̃^n) as a measure of convergence. The lower right-hand plot displays the error |D^k - D^opt|, i.e., the distance between the distortion at iteration k and the optimal distortion. Note the similar behavior of the residual and the error in distortion. Applying the multigrid method causes both the residual and the error to drop by about an order of magnitude at each V-cycle, as predicted by the Fourier analysis for constant p(x). The error reaches machine zero before the residual in this example, but this is not always the case. In general, we recommend applying V-cycles until the residual reaches machine zero. Applying the Lloyd-Max method, the residual and error drop by a single order of magnitude approximately every 800 iterations. As can be seen, the LBG method does not improve the convergence rate of either the residual or the error over the Lloyd-Max method; it only provides a better initial guess (for this example) with respect to the distortion value. The process was stopped when the residual reached 10^{-12} (approximately machine zero). Multigrid converged after 13 iterations, Lloyd-Max after 7,638 iterations and LBG after 6,392 iterations.

The lower left-hand plot displays the residual reduction factor against the number of iterations:

  \|r(d̃^n)_{k+1}\|_2 / \|r(d̃^n)_k\|_2,    (32)

where k is the iteration number and the subscript 2 refers to the L2 norm. Indeed, we see that the multigrid method reduces the residual by about an order of magnitude per V-cycle. Both the LBG and the Lloyd-Max residual reduction factors rise quickly and stabilize at about 0.997. The initial lower factor is a result of high-frequency error components, which are efficiently reduced by the local Lloyd-Max process.

In the analysis section we have shown that, for constant p(x), the number of Lloyd-Max iterations needed to lower the residual by an order of magnitude is proportional to the square of the number of representation levels. Table II shows that the behavior is similar for the nonlinear case p(x) = 6x² e^{-2x³}. Referring to Table II, the first column indicates the number of representation levels sought, n, and the remaining columns provide the number of iterations needed for the residual to converge to 10^{-12}, for both Lloyd-Max and multigrid.

  n     Lloyd-Max   multigrid
  2     54          19
  4     186         16
  8     630         14
  16    2,175       13
  32    7,638       13
  64    30,576      12
  128   112,239     11

TABLE II. NUMBER OF ITERATIONS NEEDED TO REDUCE THE RESIDUAL TO MACHINE ZERO (10^{-12}) FOR p(x) = 6x² e^{-2x³} WHEN n REPRESENTATION LEVELS ARE SOUGHT.

For Lloyd-Max, Table II demonstrates how doubling the number of representation levels slows the convergence down by approximately a factor of four, as predicted by the analysis (for constant p(x)). On the other hand, raising the number of representation levels actually reduces the number of multigrid iterations required. This is because the gap between the L2 norm of the initial residual and machine zero decreases as n grows, whereas the convergence rate remains constant. Other p's display behavior similar to that described by Table II, as long as the relative change in p(x) is locally non-dramatic; more formally, as long as

  \max_{x \in (d_{i-1}, d_{i+1})} p(x) / \min_{x \in (d_{i-1}, d_{i+1})} p(x) < M    for all i,

where M is some integer, for example, M = 100. Note that when this criterion is not met, the standard Lloyd-Max process slows down as well. In the extreme situation where p(x) = 0 almost everywhere in some decision region [d_{i-1}, d_i], the Lloyd-Max process is not well defined, since the denominator of (5) is zero.

Table III displays the number of V_{1,1} cycles needed for the multigrid method to converge for various α, β and n. These results demonstrate that the independence of the multigrid convergence rate of n is not restricted to constant p(x).

  n \ α,β   1,1   1,2   1,3   2,1   2,2   2,3   3,1   3,2   3,3
  2         21    19    19    21    19    19    21    19    19
  4         18    16    17    18    16    16    18    15    16
  8         18    15    14    18    15    14    18    14    14
  16        18    14    13    17    14    13    17    14    13
  32        16    13    13    16    13    13    16    13    12
  64        16    13    12    15    12    12    15    12    12
  128       15    12    11    14    12    11    14    11    11
  256       14    11    11    13    11    11    13    11    11
  512       13    11    10    12    10    10    12    10    10
  1024      12    10    10    11    10    10    11    10    10

TABLE III. NUMBER OF V_{1,1} CYCLES NEEDED TO REDUCE THE MULTIGRID RESIDUAL TO 10^{-12}, WITH RESPECT TO THE NUMBER OF REPRESENTATION LEVELS n (LEFT COLUMN) AND α, β (TOP ROW).

B. Stopping Criteria

[Fig. 3. Stopping criterion for the Lloyd-Max process. n = 32. Uniform quantization used as initial guess at the solution. Panels: Lloyd-Max converged solution; L2 norm of residual vs. iterations; residual reduction factor vs. iterations; distortion vs. iterations.]

Fig. 3 demonstrates how the common stopping criterion based on the improvement in distortion can be problematic. In this example, p(x) is a polynomial of degree 13, obeying p(x) > 0 for all x ∈ [0, 1]. Uniform quantization was used as the initial guess, with Lloyd-Max iterations performed thereafter.


Iteration 50 provides little improvement in distortion, whereas iterations 150 to 200 offer almost no improvement at all. Using the improvement in distortion as the stopping criterion, the process would halt after 50 or 150 iterations (depending on the threshold). As can be seen, the distortion drops around iteration 350. Note that although the improvement in distortion between iterations 300 and 400 seems slight, it may have a profound impact on the quantization result (e.g., the visual result when quantizing the gray levels of an image). At iterations 50 and 150 the residual is still over 10^{-5}, which does not rule out a further drop in the distortion. Once the residual has dropped to machine zero, improvement of the distortion by further iterations is not possible. Table II implies that using such a stopping criterion for the Lloyd-Max process is impractical, especially for a large number of representation levels. On the other hand, Tables II and III show that it is a very practical criterion for the multigrid method, which swiftly brings the residual to machine zero regardless of the number of representation levels.

C. Improved Local Minima

The next common example demonstrates how the multigrid method may achieve lower distortion than the Lloyd-Max and LBG methods. Consider the step function

  p(x) = 1 for x < 1/3;  8 for x ≥ 1/3.    (33)

Assume that we are searching for 16 representation levels (n = 16). Referring to Fig. 4, multigrid achieves a lower distortion than both the LBG and the Lloyd-Max methods. An exhaustive search among all locally optimal quantizers shows that the multigrid method achieves the globally minimal distortion in this case.

[Fig. 4. Multigrid copes with multiple local minima. n = 16. Uniform quantization used as initial guess at the solution. Panels: Lloyd-Max, multigrid and LBG converged solutions; distortion vs. iterations.]

The initial guess obtained by LBG for this example is not good. Once LBG reaches the highest resolution, the Lloyd-Max process is not capable of remedying the situation. Moving representation levels to better locations may require a temporary increase in the distortion value. Since each Lloyd-Max optimization step can only lower the distortion [12], there may be no way of moving representation levels. Indeed, with respect to Fig. 4, moving a single representation level from the higher step to the lower step would improve the distortion of the LBG result.

In general, the LBG method solves the original quantization problem at each resolution and uses each such solution as the initial guess for the next, higher resolution. In contrast, the multigrid method uses the coarse resolutions to solve error equations (8) for the finer resolutions (via the modified right-hand side f); i.e., the multigrid method is focused on solving the original high-resolution problem at all times. Note that while both multigrid and Lloyd-Max permit a non-monotonic residual norm, only multigrid permits non-monotonic distortion, which may help in avoiding local minima (see the appendix).

VI. CONCLUSIONS AND FUTURE WORK

We have presented a multigrid acceleration of the Lloyd-Max iterative process for scalar quantization. The accelerated algorithm is applicable to both the continuous (1) and discrete (2) versions of the problem and normally requires O(n) and 3N + O(n) operations, respectively. Multigrid convergence rates of both the distortion and the residual seem independent of the number of representation levels, whereas the convergence rates of the Lloyd-Max and LBG methods depend quadratically on this factor. Furthermore, we have shown that the multigrid algorithm may achieve minima with lower distortion than both the Lloyd-Max and LBG methods. Finally, the multigrid method alleviates the need for traditional stopping criteria, since the residual is brought to machine zero in a matter of several iterations. The traditional Lloyd-Max (and therefore LBG) methods require a great number of iterations to bring the residual to machine zero, and therefore using the residual as a convergence criterion is impractical for these methods.

The Lloyd-Max method is applicable to the multidimensional problem (vector quantization). In the 2-D case the decision regions are polygons. The optimal decision regions can be found by computing the Voronoi diagram of the representation levels, and the optimal representation levels are still the centers of mass of the decision regions. In higher dimensions the decision regions become polytopes. The multigrid results shown in 1-D are very promising, and we are currently researching a generalization of the acceleration method to the multidimensional (vector) problem.

APPENDIX

The distortion of the multigrid method does not always converge monotonically for general p(x). Indeed, when the basic formulation is applied to certain problems in which p(x) undergoes a large relative change within some decision region, the method may "bounce" between various approximate local minima during the V-cycles. This can actually be considered favorable, since the solution with the best distortion can be stored over a number of iterations and used as the initial approximate solution for the following procedure, which ensures monotonic convergence. During each V-cycle, before accepting the correction from the coarser resolution, check whether it indeed reduces the distortion value of the solution d̃ (or the residual). If it does, accept the corrected d̃ as the new approximation; otherwise continue with the previous approximate solution, ignoring the "correction". Thus, we can run several ordinary V-cycles, storing the best results achieved. Then, if the convergence is not monotone, we return to the best approximation found and continue with the monotonically converging V-cycle. The step-function solution in Fig. 4 was obtained by applying 5 ordinary V-cycles followed by monotonic V-cycles until convergence. The method must converge, because the relaxations at the finest resolution always reduce the distortion value. The worst possible scenario is that none of the coarse grids reduce the distortion, in which case the monotonic algorithm behaves like the Lloyd-Max process (at about three times the cost). In practice, the monotonic V-cycles are still found to converge quite quickly.
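A minimal sketch of this safeguard (our own simplification: it tests the result of a whole V-cycle rather than each coarse-grid correction individually; the `v_cycle` and `distortion` routines are assumed to be supplied):

```python
def monotonic_v_cycle(d, f, v_cycle, distortion):
    """Accept a V-cycle's result only if it does not increase the distortion."""
    d_new = v_cycle(d, f)
    return d_new if distortion(d_new) <= distortion(d) else d

def solve(d, f, v_cycle, distortion, warmup=5, max_cycles=50):
    best = d
    for k in range(max_cycles):
        # A few ordinary V-cycles first, then monotonic ones, as in Fig. 4.
        d = v_cycle(d, f) if k < warmup else monotonic_v_cycle(d, f, v_cycle, distortion)
        if distortion(d) < distortion(best):
            best = d                      # store the best solution seen so far
    return best
```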


REFERENCES

[1] A. Aggarwal, M. Klawe, S. Moran, P. Shor, and R. Wilber, "Geometric applications of a matrix-searching algorithm," Algorithmica, vol. 2, pp. 195–208, 1987.
[2] R. Balasubramanian, C. Bouman, and J. P. Allebach, "Sequential scalar quantization of vectors: An analysis," IEEE Transactions on Image Processing, vol. 4, no. 9, pp. 1282–1295, September 1995.
[3] A. Brandt, "Multi-level adaptive solutions to boundary-value problems," Mathematics of Computation, vol. 31, pp. 333–390, 1977.
[4] W. L. Briggs, V. Henson, and S. F. McCormick, A Multigrid Tutorial, 2nd ed. SIAM, 2000.
[5] J. D. Bruce, "Optimal quantization," Sc.D. thesis, M.I.T., May 1964.
[6] G. Dahlquist and A. Björck, Numerical Methods. Prentice-Hall, 1974.
[7] P. Fleischer, "Sufficient conditions for achieving minimum distortion in a quantizer," IEEE International Convention Record, pp. 104–111, 1964.
[8] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Kluwer Academic Publishers, 1992.
[9] R. M. Gray and D. L. Neuhoff, "Quantization," IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 1–63, October 1998.
[10] J. C. Kieffer, "Exponential rate of convergence for Lloyd's method I," IEEE Transactions on Information Theory, vol. IT-28, no. 2, pp. 205–210, March 1982.
[11] Y. Linde, A. Buzo, and R. M. Gray, "An algorithm for vector quantizer design," IEEE Transactions on Communications, vol. COM-28, pp. 84–95, January 1980.
[12] S. P. Lloyd, "Least squares quantization in PCM," IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129–137, March 1982 (unpublished Bell Laboratories technical note, 1957).
[13] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, 1967, pp. 281–296.
[14] J. Max, "Quantizing for minimum distortion," IEEE Transactions on Information Theory, vol. 6, no. 1, pp. 7–12, March 1960.
[15] G. M. Roe, "Quantizing for minimum distortion," IEEE Transactions on Information Theory, vol. IT-10, no. 4, pp. 384–385, October 1964.
[16] D. K. Sharma, "Design of absolutely optimal quantizers for a wide class of distortion measures," IEEE Transactions on Information Theory, vol. IT-24, no. 6, pp. 693–702, November 1978.
[17] U. Trottenberg, C. Oosterlee, and A. Schüller, Multigrid. Academic Press, 2001.
[18] X. Wu, "On initialization of Max's algorithm for optimum quantization," IEEE Transactions on Communications, vol. 38, no. 10, pp. 1653–1656, October 1990.
[19] X. Wu and K. Zhang, "Quantizer monotonicities and globally optimal scalar quantizer design," IEEE Transactions on Information Theory, vol. 39, no. 3, pp. 1049–1053, May 1993.