On the Whiteness of High Resolution Quantization Errors

Harish Viswanathan and Ram Zamir

Lucent Technologies, Bell Labs, Holmdel, NJ 07733, and Dept. of Electrical Engineering - Systems, Tel Aviv University, Israel

September 11, 2000

Abstract

A common belief in quantization theory says that the quantization noise process resulting from uniform scalar quantization of a correlated discrete time process tends to be white in the limit of small distortion ("high resolution"). A rule of thumb for this property to hold is that the source samples have a "smooth" joint distribution. We give a precise statement of this property, and generalize it to non-uniform quantization and to vector quantization. We show that the quantization errors resulting from independent quantizations of dependent real random variables become asymptotically uncorrelated (although not necessarily statistically independent) if the joint Fisher information under translation of the two variables is finite and the quantization cells shrink uniformly as the distortion tends to zero.

Key words: high resolution, asymptotic whiteness, Fisher information, quantization noise, multiterminal source coding.

This research was started when the authors were with Cornell University, Ithaca, NY. The second author was supported in part by the Wolfson Research Awards, administered by the Israel Academy of Science and Humanities.


I Introduction

High resolution quantization theory provides efficient design tools and explicit formulas for coding of continuous sources with small distortion [6, 5, 10]. The importance of this theory lies in the fact that similar concepts do not exist for coding at arbitrary resolution. In Shannon theory, high resolution source coding leads to useful asymptotic results for the rate-distortion function [9], and for the rate-distortion region in multiuser source coding [17, 11]. Many results and properties in high resolution quantization theory, although simple to state and justify heuristically, are hard to prove rigorously. One such property is the asymptotic whiteness of the quantization error process [6, sec. 5.6], which says, in effect, that independent "fine" quantization of two "smoothly" dependent random variables generates uncorrelated errors. The Asymptotic Whiteness Property (AWP) plays an important role in practical speech and image compression, where the quantization noise spectrum has a strong perceptual effect [8]. The AWP also gives interesting insight into the behavior of multiterminal coding of correlated continuous sources [1, 16, 17], where the correlation between the errors at separate terminals may affect the estimation error at the centralized decoder. Consider, for example, scalar quantization of a stationary process X = X_1, X_2, ... using a uniform quantizer Q_\Delta^u(\cdot) with step size \Delta, i.e.,

$$Q_\Delta^u(x_n) = \arg\min_{\hat x \in \{0, \pm\Delta, \pm 2\Delta, \dots\}} |x_n - \hat x|, \qquad n = 1, 2, \dots \qquad (1)$$

Let Z_n = Q_\Delta^u(X_n) - X_n denote the error in quantizing the n-th sample of X. The AWP implies that under certain conditions on the pairwise distribution of X, the correlation coefficient between any two quantization error samples satisfies

$$\rho_k = \frac{E Z_n Z_{n+k}}{E Z_n^2} \to 0 \quad \text{as } \Delta \to 0, \text{ for all } k \neq 0. \qquad (2)$$

Under these conditions the mean squared error in quantization satisfies E Z_n^2 = D \approx \Delta^2/12 [12], where throughout the paper \approx means that the ratio between the corresponding quantities goes to 1; thus (2) amounts to E Z_n Z_{n+k}/\Delta^2 \to 0 as \Delta \to 0.

The uniform scalar quantizer above provides a simple example that captures the spirit of the AWP. The AWP extends straightforwardly, with simple conditions, to the case of vector lattice quantization, which subsumes the scalar quantization example above. Specifically, we show rigorously in Theorem 1 that the correlation coefficient between lattice quantization error vectors vanishes asymptotically under the simple condition that a source density exists. The regularity of the quantization scheme combined with this condition is sufficient to guarantee the AWP. Related work regarding second moments of lattice quantization noise appears in [10, 12, 18]. Our main result in this paper (Theorem 2) generalizes the AWP to non-uniform quantization. Unlike in the lattice quantization case, here the quantization cells are not necessarily convex, and may even be unions of disconnected regions, as happens in the case of multiterminal source coding [17]. This more general formulation of the AWP requires stronger conditions on the joint distribution of (X_n, X_{n+k}). Non-uniform quantization provides a particularly interesting implication of the AWP in which correlation and statistical dependence play different roles. It is not hard to construct examples of a source and a non-uniform quantizer such that the quantization errors are strongly statistically dependent, but still asymptotically uncorrelated.

The intuition behind the AWP comes from the combination of two ideas:

1. Local uniformity: If the joint distribution of the source samples is "smooth", then it is approximately uniform inside small cells (corresponding to high resolution quantization).

2. Rectangular partition: Independent quantization of random variables X \in \mathcal X and Y \in \mathcal Y induces a rectangular ("Cartesian") partitioning of the (\mathcal X, \mathcal Y)-plane. If, furthermore, the joint distribution of (X, Y) is exactly uniform in some rectangular cell, then the quantization errors are conditionally independent given that cell. If the joint distribution is piecewise uniform with respect to all cells, then the overall quantization errors are uncorrelated (although not necessarily statistically independent).

The property of rectangular partition above seems simple and clear. The main purpose of this paper is to make a precise statement of the idea of local uniformity, to propose a sufficient condition for it to hold, and to prove a general form of the AWP using the local uniformity condition.
For lattice quantization, existence of the joint probability density of the source turns out to be sufficient, as stated in Theorem 1. For general non-uniform quantization our condition is based on the finiteness of the Fisher Information under translation, a quantity which is a function of the joint distribution of the source samples. Lemma 1 provides an intermediate result on local uniformity in terms of the Fisher Information, while Theorem 2 states the AWP for non-uniform quantizers. Section II summarizes these results, and Sections III and IV provide the proofs.
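The scalar AWP described by (1) and (2) is easy to probe numerically. The following sketch is our own illustration, not part of the paper: an AR(1) Gaussian process is an arbitrary stand-in for a "smoothly" correlated source, and the lag-one error correlation is estimated at a coarse and a fine step size.

```python
import math
import random

def awp_demo(step, n=200_000, a=0.9, seed=0):
    """Quantize an AR(1) Gaussian process X_n = a*X_{n-1} + W_n (stationary
    variance 1) with a uniform quantizer of the given step size, and return
    the empirical lag-one correlation coefficient of the errors
    Z_n = Q(X_n) - X_n, i.e. an estimate of rho_1 in (2)."""
    rng = random.Random(seed)
    sigma_w = math.sqrt(1.0 - a * a)
    x, z = 0.0, []
    for _ in range(n):
        x = a * x + rng.gauss(0.0, sigma_w)
        z.append(step * round(x / step) - x)   # quantization error of this sample
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n
    cov = sum((z[i] - mean) * (z[i + 1] - mean) for i in range(n - 1)) / (n - 1)
    return cov / var

rho_coarse = awp_demo(step=4.0)   # cells large relative to the source spread
rho_fine = awp_demo(step=0.05)    # high resolution
```

At the coarse step most samples fall into a single cell, so the error essentially tracks the source itself and inherits its correlation; at the fine step the estimated correlation is close to zero, as (2) predicts.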


II Summary of Results

A. Lattice Quantization

A lattice quantizer is a vector quantizer in which the quantization cells are Voronoi regions of a k-dimensional lattice and the reconstructions \hat x are geometric centroids of the cells [12]. If the polytope P is the Voronoi region of the lattice, then each cell in the quantizer is congruent to P. A sequence of lattice quantization schemes Q^x_N with distortion tending to zero is obtained with quantization regions congruent to P_{\lambda_N} = \{\lambda_N x : x \in P\}, with \lambda_N > 0 and \lambda_N \to 0 as N \to \infty. Consider a sequence of pairs of lattice quantizers Q^x_N and Q^y_N, not necessarily derived from the same lattice, with reconstructions \hat x_N and \hat y_N, respectively. Denote the distortions of such a sequence by

$$D_{x,N} = E\|X - \hat X_N\|^2, \qquad D_{y,N} = E\|Y - \hat Y_N\|^2. \qquad (3)$$

Let

$$\rho_N = \frac{E\{(X - \hat X_N)^t (Y - \hat Y_N)\}}{\sqrt{D_{x,N} D_{y,N}}} \qquad (4)$$

be the correlation coefficient between the quantization errors, where (\cdot)^t denotes transpose. We have the following result on the correlation coefficient between the X, Y quantization errors.

Theorem 1 (AWP for Lattice Quantizers) Let (X, Y) \in (\mathcal X, \mathcal Y), where \mathcal X = \mathcal Y = R^k, be correlated random vectors with source density p(x, y). For the sequence of pairs of lattice quantizers with distortions tending to zero,

$$\rho_N \to 0 \quad \text{as } N \to \infty.$$

The basic form of the AWP in (2), regarding uniform quantization of a stationary source with a density, follows as a corollary from this theorem (setting k = 1, X = Xn and Y = Xn+k ). Note that Theorem 1 can also be deduced from the analysis in [10] on multidimensional companding with locally quadratic distortion measures (see also [12]).
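Theorem 1 can also be checked by simulation. The sketch below is again our own illustration under assumed parameters: a jointly Gaussian pair (X, Y) in R^2 and the scaled cubic lattice \Delta Z^2 stand in for a general source and lattice, and the empirical version of the correlation coefficient (4) is computed at a coarse and a fine scale.

```python
import random

def lattice_awp(step, n=100_000, rho=0.8, seed=1):
    """Quantize correlated 2-D vectors X, Y with the scaled cubic lattice
    step*Z^2 and return the empirical correlation coefficient of eq. (4)."""
    rng = random.Random(seed)
    num = 0.0       # accumulates (X - X^)^t (Y - Y^)
    dx = dy = 0.0   # accumulate the distortions D_x, D_y
    w = (1.0 - rho * rho) ** 0.5
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        y = [rho * x[0] + w * rng.gauss(0, 1), rho * x[1] + w * rng.gauss(0, 1)]
        ex = [v - step * round(v / step) for v in x]   # lattice quantization error of X
        ey = [v - step * round(v / step) for v in y]   # lattice quantization error of Y
        num += ex[0] * ey[0] + ex[1] * ey[1]
        dx += ex[0] ** 2 + ex[1] ** 2
        dy += ey[0] ** 2 + ey[1] ** 2
    return num / (dx * dy) ** 0.5   # sample sizes cancel in the ratio

rho_coarse = lattice_awp(step=4.0)
rho_fine = lattice_awp(step=0.05)
```

Only the existence of the joint density matters here, consistent with the condition of Theorem 1; the correlation of the error vectors is large at coarse scale and nearly zero at fine scale.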

B. Non-uniform Quantization

We first define the Fisher Information under translation of X (or in short, the FI of X). Let X \in R^m have a density p(x), and define the FI of X as [2, 4, 3]

$$J(X) = \int \frac{1}{p(x)} \left\| \frac{\partial p(x)}{\partial x} \right\|^2 dx = E\left\{ \left\| \frac{\partial \ln p(X)}{\partial X} \right\|^2 \right\} \qquad (5)$$

where \partial p(x)/\partial x (respectively \partial \ln p(x)/\partial x) denotes the gradient (i.e., vector of partial derivatives) of p(x) (respectively of \ln p(x)) with respect to x_1, ..., x_m, \|\cdot\| denotes the Euclidean norm, and E\{\cdot\} denotes expectation with respect to X. Finite FI means that the quantity \|\partial \ln p(x)/\partial x\| = (1/p(x)) \|\partial p(x)/\partial x\| is finite for most probable x's. Roughly, this implies that the relative variation of the density

$$\max_{x, y \in T} \frac{|p(x) - p(y)|}{p(y)}$$

inside a small cell T \subset R^m is small, i.e., p(x) is locally uniform inside T. The following lemma suggests a way to quantify the degree of local uniformity of the source, relative to a "fine" vector quantizer (not necessarily a lattice), in terms of the FI of the source and the quantizer's distortion:

Lemma 1 (Local Uniformity) Let Q: R^m \to R^m be a vector quantizer which encodes X \in R^m with mean squared error distortion D, i.e., E\|Q(X) - X\|^2 = D. Let p(x) denote the density of X. If the norm of the Hessian of \ln p(x) is bounded, i.e., \|\partial^2 \ln p(x)/\partial x^2\| < K for all x in R^m for some K, then as D \to 0

$$E\left| \ln \frac{p(X)}{p(Q(X))} \right| \le \sqrt{J(X) \cdot D} + O(D), \qquad (6)$$

where O(D) = (K/2) D, and J(X) is the FI of the source.

Note that for small D the term \sqrt{J(X) D} dominates the bound. Thus, if the FI is finite and the distortion is small, then the ratio of the density at an arbitrary point x to the density at the reconstruction point Q(x) is on the average close to one, or equivalently "p(x) is locally uniform".

As discussed in the Introduction, our main result uses the finiteness of the joint FI to show that errors resulting from independent quantization of source samples are asymptotically uncorrelated. The following setting generalizes the lattice quantization scheme above to non-uniform vector quantization with joint centroid decoding. Let X \in \mathcal X, Y \in \mathcal Y, where \mathcal X = \mathcal Y = R^k, be random vectors with joint density p(x, y). Let

$$i(x): \mathcal X \to \{1, 2, \dots, N_x\}, \qquad j(y): \mathcal Y \to \{1, 2, \dots, N_y\} \qquad (7)$$

induce two partitions of R^k corresponding to independent quantization of X and Y, respectively. We do not exclude the case where N_x or N_y are infinite. Let

$$T_i^x = \{x : i(x) = i\} \quad \text{and} \quad T_j^y = \{y : j(y) = j\}$$

denote the cells of these partitions, and let





$$(\hat x, \hat y) = Q\big(i(x), j(y)\big), \qquad (8)$$

where Q: \{1, 2, \dots, N_x\} \times \{1, 2, \dots, N_y\} \to (\hat{\mathcal X}, \hat{\mathcal Y}) denotes the quantizer reconstruction function. We define Q(i, j) as the joint centroid of the cell T_i^x \times T_j^y relative to the source distribution, i.e.,

$$i(x) = i, \; j(y) = j \implies \begin{cases} \hat x = E\{X \mid (X, Y) \in T_i^x \times T_j^y\} \\ \hat y = E\{Y \mid (X, Y) \in T_i^x \times T_j^y\} \end{cases} \qquad (9)$$

where here E\{\cdot\} denotes conditional expectation with respect to the true distribution of (X, Y) inside the cell T_i^x \times T_j^y. Note that the encoding of x and the encoding of y as in (7) are independent; however, as opposed to the case of lattice quantization, the reconstruction function (9) depends on both marginal partition indices, and depends on the joint distribution of (X, Y). This reconstruction function is typical of distribution optimized quantization [8], or of joint decoding in multiterminal source coding [1]. It coincides with the geometric centroid (as in the lattice case) for a cell T_i^x \times T_j^y if the joint conditional distribution of (X, Y) inside this cell is exactly uniform. Consider a sequence of pairs of partition functions i_N(x), j_N(y) of \mathcal X, \mathcal Y, N = 1, 2, \dots, and a corresponding sequence of reconstruction functions (\hat x_N, \hat y_N), such that

$$D_{x,N} = E\|X - \hat X_N\|^2 \to 0, \qquad D_{y,N} = E\|Y - \hat Y_N\|^2 \to 0 \qquad (10)$$

at the same rate. We make the following assumption on the asymptotic distribution of the quantization errors as N \to \infty.

Moment Condition: There exists some \epsilon > 0 such that

$$\limsup_{N \to \infty} E\left( \frac{\|X - \hat X_N\|^2}{D_{x,N}} \right)^{1+\epsilon} < \infty, \qquad \limsup_{N \to \infty} E\left( \frac{\|Y - \hat Y_N\|^2}{D_{y,N}} \right)^{1+\epsilon} < \infty. \qquad (11)$$

The moment condition guarantees that the cells shrink uniformly over a large enough region of the support of the density p(x, y). For example, the moment condition holds for uniform / lattice quantization provided the source has a density. Some easy to check conditions that imply the moment condition are given in the Appendix. We emphasize that for general partitions the moment condition is needed even when the source density has a bounded support. A similar assumption (UACI-p hypothesis) on the quantization error has been made in [13]. Our main result in the paper is summarized in Theorem 2 below. It states that under the assumptions above, the correlation coefficient between the X, Y quantization errors vanishes asymptotically, provided that the joint FI of (X, Y) is finite.

Theorem 2 (AWP for Non-Uniform Quantizers) Let (X, Y) \in (\mathcal X, \mathcal Y), where \mathcal X = \mathcal Y = R^k, be correlated random vectors with continuous source density p(x, y), a.s. continuously differentiable \ln p(x, y) and joint FI J(X, Y) < \infty as defined in (5). Let i_N(x) and j_N(y) be a sequence of independent partition functions of \mathcal X and \mathcal Y, let \hat x_N and \hat y_N be the corresponding reconstructions (9), and let \rho_N be the correlation coefficient between the quantization errors as in (4). If the sequence (\hat x_N, \hat y_N) satisfies (10) and (11), then

$$\rho_N \to 0 \quad \text{as } N \to \infty. \qquad (12)$$
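The setting of Theorem 2 can also be illustrated numerically. In the sketch below (our own construction, not from the paper) the partitions (7) are non-uniform log-companded scalar bins, the joint centroids (9) are approximated by empirical cell means, and the error correlation coefficient (4) is estimated at a coarse and a fine resolution.

```python
import math
import random

def centroid_awp(delta, n=200_000, rho=0.8, seed=2):
    """Independent non-uniform (log-companded) scalar partitions of X and Y,
    joint-centroid reconstruction per rectangular cell (i, j) as in (9)
    (centroids estimated empirically from the samples), and the resulting
    empirical error correlation coefficient (4)."""
    rng = random.Random(seed)
    # non-uniform bin index: cells widen away from the origin
    idx = lambda v: int(round(math.copysign(math.log1p(abs(v)), v) / delta))
    w = math.sqrt(1.0 - rho * rho)
    pairs = [(x, rho * x + w * rng.gauss(0, 1))
             for x in (rng.gauss(0, 1) for _ in range(n))]
    cells = {}
    for x, y in pairs:
        cells.setdefault((idx(x), idx(y)), []).append((x, y))
    # empirical joint centroid (x^, y^) of each rectangular cell T_i^x x T_j^y
    cent = {k: (sum(p[0] for p in v) / len(v), sum(p[1] for p in v) / len(v))
            for k, v in cells.items()}
    num = dx = dy = 0.0
    for x, y in pairs:
        cx, cy = cent[(idx(x), idx(y))]
        ex, ey = x - cx, y - cy
        num += ex * ey
        dx += ex * ex
        dy += ey * ey
    return num / math.sqrt(dx * dy)

rho_coarse = centroid_awp(delta=2.0)   # a handful of large cells
rho_fine = centroid_awp(delta=0.05)    # high resolution
```

With a few large cells the errors remain strongly correlated (the joint centroid cannot undo the within-cell dependence), while at high resolution local uniformity takes over and the estimated correlation is small, even though the errors need not be statistically independent.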

III Derivation of Results for Lattice Quantization

Proof of Theorem 1: For lattice quantization we have

$$\|x - \hat x_N\| \le \lambda_N \, \mathrm{diam}(P)$$

where P is the polytope associated with the Voronoi region of the lattice. Furthermore, Linder and Zeger [12] have shown that for tessellating quantizers, and in particular for k-dimensional lattice quantizers,

$$\lim_{\lambda_N \to 0} \frac{D_{x,N}}{\lambda_N^2} = \frac{1}{k V(P)} \int_P \|x - \hat x\|^2 dx$$

where V(P) is the volume of P and \hat x is the centroid of P. Hence it follows that

$$\frac{\|x - \hat x_N\|}{\sqrt{D_{x,N}}} \le \frac{\lambda_N \, \mathrm{diam}(P)}{\sqrt{D_{x,N}}} \approx \frac{\mathrm{diam}(P)}{\sqrt{\frac{1}{k V(P)} \int_P \|x - \hat x\|^2 dx}} \le M$$

for sufficiently large N. From a similar statement for Y, we thus have, for some M, \|x - \hat x_N\|/\sqrt{D_{x,N}} \le M and \|y - \hat y_N\|/\sqrt{D_{y,N}} \le M for all sufficiently large N. Let, for (x, y) \in T_i^x \times T_j^y,

$$\tilde p^N(x, y) = \frac{\int_{T_i^x \times T_j^y} p(x, y)\, dx\, dy}{\int_{T_i^x \times T_j^y} dx\, dy} =: \bar p_{i,j}.$$

Then,

$$|\rho_N| = \left| \int \frac{(x - \hat x_N)^t}{\sqrt{D_{x,N}}} \frac{(y - \hat y_N)}{\sqrt{D_{y,N}}}\, p(x, y)\, dx\, dy \right| \le M^2 \int \left| p(x, y) - \tilde p^N(x, y) \right| dx\, dy + \left| \sum_{i,j} \bar p_{i,j} \int_{T_i^x} \frac{(x - \hat x_N)^t}{\sqrt{D_{x,N}}}\, dx \int_{T_j^y} \frac{(y - \hat y_N)}{\sqrt{D_{y,N}}}\, dy \right| = M^2 \int \left| p(x, y) - \tilde p^N(x, y) \right| dx\, dy \to 0 \quad \text{as } N \to \infty,$$

where the sum vanishes since each of the integrals is zero for every (i, j) (the reconstruction is the geometric centroid of each lattice cell), and the remaining term vanishes since, by Lebesgue's differentiation theorem [15], \tilde p^N \to p as N \to \infty, which via Scheffé's theorem [7] implies convergence in mean. \square

IV Derivation of Results for Non-uniform Quantization

Proof of Lemma 1: By Taylor expansion in the Lagrange remainder form ([14]) we have

$$\ln \frac{p(x)}{p(Q(x))} = (x - Q(x))^t \frac{\partial \ln p(x)}{\partial x} - \frac{1}{2} (x - Q(x))^t \frac{\partial^2 \ln p(z)}{\partial z^2} (x - Q(x))$$

for some vector z between x and Q(x). Now,

$$\left| (x - Q(x))^t \frac{\partial^2 \ln p(z)}{\partial z^2} (x - Q(x)) \right| \le \|x - Q(x)\| \cdot \left\| \frac{\partial^2 \ln p(z)}{\partial z^2} (x - Q(x)) \right\| \le \left\| \frac{\partial^2 \ln p(z)}{\partial z^2} \right\| \cdot \|x - Q(x)\|^2 \le K \|x - Q(x)\|^2$$

where the last inequality is from the boundedness assumption of the Lemma. Hence, applying the Cauchy-Schwarz inequality to the first term we get

$$\int \left| \ln \frac{p(x)}{p(Q(x))} \right| p(x)\, dx \le \int \left| (x - Q(x))^t \frac{\partial \ln p(x)}{\partial x} \right| p(x)\, dx + \frac{K}{2} \int \|x - Q(x)\|^2 p(x)\, dx$$
$$\le \left( \int \left\| \frac{\partial \ln p(x)}{\partial x} \right\|^2 p(x)\, dx \right)^{1/2} \left( \int \|x - Q(x)\|^2 p(x)\, dx \right)^{1/2} + \frac{K}{2} \int \|x - Q(x)\|^2 p(x)\, dx = \sqrt{D J(X)} + \frac{K}{2} D$$

where the last equality follows from the definition of J(X). \square
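As a concrete sanity check (our own example, not part of the paper), the quantities appearing in Lemma 1 can be evaluated in closed form for an i.i.d. Gaussian source:

```latex
% X ~ N(0, \sigma^2 I_m): all assumptions of Lemma 1 hold with explicit constants.
\[
\frac{\partial \ln p(x)}{\partial x} = -\frac{x}{\sigma^2},
\qquad
J(X) = \frac{E\|X\|^2}{\sigma^4} = \frac{m}{\sigma^2},
\qquad
\left\| \frac{\partial^2 \ln p(x)}{\partial x^2} \right\| = \frac{1}{\sigma^2} = K,
\]
\[
\text{so that (6) reads}\qquad
E\left| \ln \frac{p(X)}{p(Q(X))} \right|
\;\le\; \frac{\sqrt{m D}}{\sigma} + \frac{D}{2\sigma^2}
\;\longrightarrow\; 0 \quad \text{as } D \to 0 .
\]
```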

Proof of Theorem 2: Fix N, and let A_N be a subset of R^k \times R^k to be specified later. Define \tilde p^N(x, y) to be a piecewise uniform approximation of p(x, y) inside A_N, relative to the partition function i_N(x) \times j_N(y), and a copy of p(x, y) outside A_N, i.e.,

$$\tilde p^N(x, y) = \begin{cases} \dfrac{\int_{A_N \cap T_i^x \times T_j^y} p(x, y)\, dx\, dy}{\int_{A_N \cap T_i^x \times T_j^y} dx\, dy} = \dfrac{P_{X,Y}[A_N \cap T_i^x \times T_j^y]}{V(A_N \cap T_i^x \times T_j^y)} & \text{for } (x, y) \in A_N \cap T_i^x \times T_j^y \\[2mm] p(x, y) & \text{for } (x, y) \in A_N^c \end{cases} \qquad (13)$$

where P_{X,Y}[T] denotes the probability of the set T induced by (X, Y), and V(\cdot) denotes volume.

Define the pointwise correlation coefficient

$$\eta_N(x, y) = \frac{(x - \hat x_N)^t (y - \hat y_N)}{\sqrt{D_{x,N} D_{y,N}}}.$$

Starting from the definition of \rho_N we have

$$\rho_N = \int \eta_N(x, y)\, p(x, y)\, dx\, dy = E_{\tilde p^N}[\eta_N(X, Y)] + \int_{A_N} \eta_N(x, y) \big( p(x, y) - \tilde p^N(x, y) \big)\, dx\, dy$$

where E_{\tilde p^N}[\cdot] denotes expectation w.r.t. the distribution \tilde p^N defined in (13). Let M_N \in R be a sequence to be specified later. We have

$$\rho_N = E_{\tilde p^N}[\eta_N(X, Y)] + \int_{\{|\eta_N(x,y)| \le M_N\} \cap A_N} \eta_N(x, y) \big( p(x, y) - \tilde p^N(x, y) \big)\, dx\, dy + \int_{\{|\eta_N(x,y)| > M_N\} \cap A_N} \eta_N(x, y) \big( p(x, y) - \tilde p^N(x, y) \big)\, dx\, dy$$

whose absolute value can be bounded as

$$|\rho_N| \le \big| E_{\tilde p^N}[\eta_N(X, Y)] \big| \qquad (14a)$$
$$+\; M_N \int_{A_N} \big| p(x, y) - \tilde p^N(x, y) \big|\, dx\, dy \qquad (14b)$$
$$+\; \int_{\{|\eta_N(x,y)| > M_N\} \cap A_N} |\eta_N(x, y)|\, p(x, y)\, dx\, dy \qquad (14c)$$
$$+\; \int_{\{|\eta_N(x,y)| > M_N\} \cap A_N} |\eta_N(x, y)|\, \tilde p^N(x, y)\, dx\, dy. \qquad (14d)$$

The motivation for breaking up the cross-correlation into four parts as in (14) is that it allows us to apply the notions of "local uniformity", "rectangular partition", and the "moment condition" to show that the cross-correlation tends to zero at high resolution. Roughly, (14a) will tend to zero at high resolution due to \tilde p^N being uniform in rectangular cells over most of R^{2k}. For an appropriately chosen sequence of sets A_N and constants M_N (see (17) and Proposition 1), (14b) will tend to zero as \tilde p^N better approximates p at high resolution. Finally, the terms (14c) and (14d) vanish due to the "moment condition". We now present a rigorous analysis of the various terms. The following propositions lead to the fact that each of the four terms in (14) tends to zero as D_{x,N}, D_{y,N} \to 0.

Define r_x(A_N) and r_y(A_N) to be the maximum radii of the x-cells and the y-cells inside A_N, respectively:

$$r_x(A_N) = \sup\{\|x - \hat x_N\| : (x, y) \in A_N\}$$
$$r_y(A_N) = \sup\{\|y - \hat y_N\| : (x, y) \in A_N\}$$

where (\hat x_N, \hat y_N) = Q(i_N(x), j_N(y)). We are now ready to state and prove the propositions leading to the proof of Theorem 2.

Proposition 1 (Choice of "good" A_N) Under the moment condition, for a continuous source density p(x, y) and continuously differentiable \ln p(x, y) with J(X, Y) < \infty, there exists a sequence of sets A_N and a \delta > 0 such that, as N \to \infty,

1. P(A_N^c) = O(D_{x,N}^{\delta(1+\epsilon)} + D_{y,N}^{\delta(1+\epsilon)})
2. r_x^2(A_N) = O(D_{x,N}^{1-\delta})
3. r_y^2(A_N) = O(D_{y,N}^{1-\delta})
4. \sup_{A_N} \|\partial \ln p(x, y)/\partial(x, y)\| = O(D_{x,N}^{-\delta(1+\epsilon)} D_{y,N}^{-\delta(1+\epsilon)}).

Furthermore, A_N can be chosen so that A_N \cap T_i^x \times T_j^y is a rectangle S_i^x \times S_j^y.

Proof: Let r_{x,N}, r_{y,N}, t_N be three sequences of real numbers such that r_{x,N}^2 = \Theta(D_{x,N}^{1-\delta}), r_{y,N}^2 = \Theta(D_{y,N}^{1-\delta}) and t_N = \Theta(D_{x,N}^{-\delta(1+\epsilon)} D_{y,N}^{-\delta(1+\epsilon)}), where the notation x_N = \Theta(y_N) is used to imply that 0 < \liminf \frac{x_N}{y_N} \le \limsup \frac{x_N}{y_N} < \infty. Let

$$A_N^1 = \{(x, y) : \|x - \hat x_N\| \le r_{x,N}\}$$
$$A_N^2 = \{(x, y) : \|y - \hat y_N\| \le r_{y,N}\}$$
$$A_N^3 = \left\{(x, y) : \left\| \frac{\partial \ln p_{X,Y}(x, y)}{\partial(x, y)} \right\| \le t_N \right\}.$$

From the definition of A_N^1 and the moment condition we have

$$P((A_N^1)^c) \le \frac{E[\|X - \hat X_N\|^{2+2\epsilon}]}{r_{x,N}^{2+2\epsilon}} = \frac{O(D_{x,N}^{1+\epsilon})}{r_{x,N}^{2+2\epsilon}}.$$

Hence

$$P((A_N^1)^c) \le \frac{O(D_{x,N}^{1+\epsilon})}{D_{x,N}^{(1-\delta)(1+\epsilon)}} = O(D_{x,N}^{\delta(1+\epsilon)}).$$

Similarly,

$$P((A_N^2)^c) \le O(D_{y,N}^{\delta(1+\epsilon)}).$$

Note that

$$E\left\| \frac{\partial \ln p_{X,Y}(x, y)}{\partial(x, y)} \right\| \le \left( E\left\| \frac{\partial \ln p_{X,Y}(x, y)}{\partial(x, y)} \right\|^2 \right)^{1/2} = \sqrt{J(X, Y)} < \infty$$

and hence

$$P_{X,Y}\left[ \left\{ (x, y) : \left\| \frac{\partial \ln p(x, y)}{\partial(x, y)} \right\| > t_N \right\} \right] \le \frac{\sqrt{J(X, Y)}}{t_N} \qquad (15)$$

or

$$P\left[ \left\| \frac{\partial \ln p(x, y)}{\partial(x, y)} \right\| > t_N \right] \le O(D_{x,N}^{\delta(1+\epsilon)} D_{y,N}^{\delta(1+\epsilon)}).$$

Choose A_N = A_N^1 \cap A_N^2 \cap A_N^3. It follows that

$$P(A_N^c) \le P((A_N^1)^c) + P((A_N^2)^c) + P((A_N^3)^c) = O(D_{x,N}^{\delta(1+\epsilon)} + D_{y,N}^{\delta(1+\epsilon)}),$$
$$r_x(A_N)^2 \le r_{x,N}^2 = O(D_{x,N}^{1-\delta}),$$
$$r_y(A_N)^2 \le r_{y,N}^2 = O(D_{y,N}^{1-\delta}),$$
$$\sup_{A_N} \left\| \frac{\partial \ln p(x, y)}{\partial(x, y)} \right\| \le t_N = O(D_{x,N}^{-\delta(1+\epsilon)} D_{y,N}^{-\delta(1+\epsilon)}).$$

Thus we have demonstrated by construction that it is possible to find a sequence of A_N's simultaneously satisfying conditions 1, 2, 3 and 4 in the statement of the proposition. Suppose A_N \cap T_i^x \times T_j^y is not a rectangle for some cell (i, j). Then for some (x_o, y_o) \in A_N \cap T_i^x \times T_j^y,

$$A_N' = A_N \cup \big( \{(x, y) : \|x - x_o\| < 2 r_{x,N}, \; \|y - y_o\| < 2 r_{y,N}\} \cap (T_i^x \times T_j^y) \big)$$

guarantees that A_N' \cap T_i^x \times T_j^y is a rectangle without violating conditions 1, 2, 3; furthermore, condition 4 continues to hold because of the continuity of \partial \ln p(x, y)/\partial(x, y). \square

Proposition 2 (Regarding (14b))

$$\int |p(x, y) - \tilde p^N(x, y)|\, dx\, dy \le \sqrt{ 4 \sqrt{r_x(A_N)^2 + r_y(A_N)^2}\, \sup_{A_N} \left\| \frac{\partial \ln p(x, y)}{\partial(x, y)} \right\| }.$$

Proof: Since p(x, y) is a continuous function, by the mean value theorem ([14]) there exists a point (x_{ij}, y_{ij}) in the convex hull of A_N \cap (T_i^x \times T_j^y) such that

$$\tilde p^N(x, y) = \frac{\int_{A_N \cap T_i^x \times T_j^y} p(x, y)\, dx\, dy}{V(A_N \cap T_i^x \times T_j^y)} = p(x_{ij}, y_{ij}) \quad \text{for all } (x, y) \in A_N \cap (T_i^x \times T_j^y).$$

Hence the divergence (in nats) between p and \tilde p^N satisfies [3]

$$D(p \| \tilde p^N) = \int p(x, y) \ln \frac{p(x, y)}{\tilde p^N(x, y)}\, dx\, dy \le \int_{A_N} p(x, y) \left| \ln \frac{p(x, y)}{\tilde p^N(x, y)} \right| dx\, dy = \sum_{i,j} \int_{A_N \cap T_i^x \times T_j^y} p(x, y) \left| \ln \frac{p(x, y)}{p(x_{ij}, y_{ij})} \right| dx\, dy.$$

Now by Taylor series expansion in Lagrange remainder form (see Rudin [14]) we have

$$|\ln p(x, y) - \ln p(x_{ij}, y_{ij})| = \left| (x - x_{ij})^t \frac{\partial \ln p(\tilde x_{ij}, \tilde y_{ij})}{\partial x} + (y - y_{ij})^t \frac{\partial \ln p(\tilde x_{ij}, \tilde y_{ij})}{\partial y} \right| \le \left( \|x - x_{ij}\|^2 + \|y - y_{ij}\|^2 \right)^{1/2} \left\| \frac{\partial \ln p(\tilde x_{ij}, \tilde y_{ij})}{\partial(x, y)} \right\|$$

for some vector (\tilde x_{ij}, \tilde y_{ij}) on the line joining (x, y) and (x_{ij}, y_{ij}). Since (x_{ij}, y_{ij}) is in the convex hull of A_N \cap T_i^x \times T_j^y, we have \|x - x_{ij}\| < 2 r_x(A_N) and \|y - y_{ij}\| < 2 r_y(A_N). Thus,

$$D(p \| \tilde p^N) \le 2 \left( r_x(A_N)^2 + r_y(A_N)^2 \right)^{1/2} \sup_{A_N} \left\| \frac{\partial \ln p(x, y)}{\partial(x, y)} \right\|.$$

The statement of the proposition follows from the fact that for any two distributions d_V(P_1, P_2)^2 \le 2 D(P_1 \| P_2), where d_V(\cdot) denotes the variational distance [3]. \square

Proposition 3 (Regarding (14a)) Define the partial second moments outside A_N:

$$m_x(A_N^c) = \int_{A_N^c} \|x - \hat x_N\|^2 p(x, y)\, dx\, dy$$
$$m_y(A_N^c) = \int_{A_N^c} \|y - \hat y_N\|^2 p(x, y)\, dx\, dy.$$

Then

$$\big| E_{\tilde p^N}[\eta_N(X, Y)] \big| \le (f_x(A_N))^{1/2} (f_y(A_N))^{1/2} + \left( \frac{m_x(A_N^c)}{D_{x,N}} \right)^{1/2} \left( \frac{m_y(A_N^c)}{D_{y,N}} \right)^{1/2}$$

where

$$f_x(A_N) = 4 \frac{r_x(A_N)^2}{D_{x,N}} \int_{A_N} |\tilde p^N(x, y) - p(x, y)|\, dx\, dy + 4 \frac{r_x(A_N)^2}{D_{x,N}} P(A_N^c) + \frac{m_x(A_N^c)}{D_{x,N}} \qquad (16)$$
$$f_y(A_N) = 4 \frac{r_y(A_N)^2}{D_{y,N}} \int_{A_N} |\tilde p^N(x, y) - p(x, y)|\, dx\, dy + 4 \frac{r_y(A_N)^2}{D_{y,N}} P(A_N^c) + \frac{m_y(A_N^c)}{D_{y,N}}.$$

Proof: The proof involves a tedious analysis starting from the definition of E_{\tilde p^N}[\eta_N(X, Y)] and is relegated to the appendix.

Proposition 4 (Regarding (16))

1. m_x(A_N^c)/D_{x,N} \to 0 as P(A_N^c) \to 0.
2. m_y(A_N^c)/D_{y,N} \to 0 as P(A_N^c) \to 0.

Proof: Let

$$e_N^x(x, y) = \frac{\|x - \hat x_N\|}{\sqrt{D_{x,N}}}, \qquad e_N^y(x, y) = \frac{\|y - \hat y_N\|}{\sqrt{D_{y,N}}}.$$

Now

$$\frac{m_x(A_N^c)}{D_{x,N}} = \int_{A_N^c} |e_N^x(x, y)|^2 p(x, y)\, dx\, dy \le M^2 P(A_N^c) + \sup_N \int_{\{e_N^x(x, y) > M\}} |e_N^x(x, y)|^2 p(x, y)\, dx\, dy.$$

Similarly

$$\frac{m_y(A_N^c)}{D_{y,N}} \le M^2 P(A_N^c) + \sup_N \int_{\{e_N^y(x, y) > M\}} |e_N^y(x, y)|^2 p(x, y)\, dx\, dy.$$

Hence to prove the proposition, it is sufficient to show that \eta_N(x, y), |e_N^x(x, y)|^2, |e_N^y(x, y)|^2 are uniformly integrable for N = 1, 2, \dots [7]. By the Cauchy-Schwarz inequality

$$E[|\eta_N(X, Y)|^{1+\epsilon}] \le \left( \frac{E[\|X - \hat X_N\|^{2+2\epsilon}]}{D_{x,N}^{1+\epsilon}} \right)^{1/2} \left( \frac{E[\|Y - \hat Y_N\|^{2+2\epsilon}]}{D_{y,N}^{1+\epsilon}} \right)^{1/2} \le K_0$$

for some constant K_0 by the moment condition. Also by the same moment condition

$$E[|e_N^x(x, y)|^{2(1+\epsilon)}] \le K_1, \qquad E[|e_N^y(x, y)|^{2(1+\epsilon)}] \le K_2$$

for some constants K_1, K_2. The required uniform integrability conditions follow by [7, p. 224, ex. 5.1]. \square

Now we are ready to specify the sequences A_N and M_N. We pick

$$M_N = O(D_{x,N}^{-\beta} D_{y,N}^{-\beta}) \qquad (17)$$

for some \beta > 0, and A_N as given in Proposition 1.

Proposition 5 (Regarding (14c) and (14d)) For M_N of (17) and A_N of Proposition 1, (14c) and (14d) are zero for large enough N.

Proof: For (x, y) \in A_N,

$$\frac{|\eta_N(x, y)|}{M_N} \le \frac{1}{M_N} \frac{\|x - \hat x_N\|}{\sqrt{D_{x,N}}} \frac{\|y - \hat y_N\|}{\sqrt{D_{y,N}}} \le \frac{1}{M_N} \frac{r_x(A_N)}{\sqrt{D_{x,N}}} \frac{r_y(A_N)}{\sqrt{D_{y,N}}} = O(D_{x,N}^{\beta - \delta/2} D_{y,N}^{\beta - \delta/2}) < 1$$

for N sufficiently large (taking \beta > \delta/2). Thus the integration sets in (14c) and (14d) are empty as N \to \infty. \square

Proof of the Theorem: The theorem is proved by arguing that each of the four terms in (14) tends to zero as D_{x,N}, D_{y,N} \to 0. We start with (14a). From (i) the choice of A_N as in Proposition 1, (ii) our assumption in (10) that O(D_{x,N}) = O(D_{y,N}), and (iii) the bound in Proposition 2, we have

$$\int |p(x, y) - \tilde p^N(x, y)|\, dx\, dy \le \sqrt{ O(D_{x,N}^{(1-\delta)/2})\, O(D_{x,N}^{-2\delta(1+\epsilon)}) } = O\big(D_{x,N}^{\frac{1}{4} - (\frac{5}{4} + \epsilon)\delta}\big).$$

Combining this with the bound in Proposition 3 and the limit in Proposition 4 gives

$$f_x(A_N) = O(D_{x,N}^{-\delta})\, O\big(D_{x,N}^{\frac{1}{4} - (\frac{5}{4} + \epsilon)\delta}\big) + O(D_{x,N}^{-\delta})\, O(D_{x,N}^{\delta(1+\epsilon)}) + o(1) \qquad (18)$$
$$\to 0 \qquad (19)$$

for 0 < \delta < 1/(9/4 + \epsilon). Similarly it can be shown that f_y(A_N) \to 0. Thus we have shown that (14a) vanishes by (i) the choice of A_N as in Proposition 1, (ii) the bound in Proposition 3, and (iii) the limits in Proposition 4. Next, consider (14b):

$$M_N \int_{A_N} |p(x, y) - \tilde p^N(x, y)|\, dx\, dy \stackrel{(a)}{\le} 2 M_N \left( r_x(A_N)^2 + r_y(A_N)^2 \right)^{1/4} \left( \sup_{A_N} \left\| \frac{\partial \ln p(x, y)}{\partial(x, y)} \right\| \right)^{1/2}$$
$$\stackrel{(b)}{\le} O(D_{x,N}^{-\beta} D_{y,N}^{-\beta})\, O\big( (D_{x,N}^{1-\delta} + D_{y,N}^{1-\delta})^{1/4} \big)\, O\big( (D_{x,N} D_{y,N})^{-\delta(1+\epsilon)/2} \big) \to 0$$

where (a) is by Proposition 2 and (b) is by the choice of the sequence M_N as in (17) and the sequence A_N as in Proposition 1; the limit follows by choosing \beta and \delta sufficiently small.

The third and the fourth terms, (14c) and (14d), vanish by Proposition 5. Thus we have shown that all four terms in (14) vanish for an appropriate choice of M_N, A_N and sufficiently small \delta. Thus the theorem is proved. \square

Appendix

A. Conditions implying the moment condition

The following are stronger conditions that imply the moment condition.

1. Uniform boundedness:

$$\sup_N \frac{\|x - \hat x_N\|}{\sqrt{D_{x,N}}} \le M$$

and a similar condition for Y clearly imply the moment condition.

2. If g(x) is an integrable function such that

$$\int g(x)^{1+\epsilon} p_X(x)\, dx < \infty$$

and

$$\left( \frac{\|x - \hat x_N\|}{\sqrt{D_{x,N}}} \right)^2 \le g(x),$$

then the moment condition for X follows.

B. Proof of Proposition 3

First separate the averaging over A_N and A_N^c:

$$E_{\tilde p^N}[\eta_N(x, y)] = \frac{\int_{A_N} (x - \hat x_{ij})^t (y - \hat y_{ij})\, \tilde p^N(x, y)\, dx\, dy}{\sqrt{D_{x,N} D_{y,N}}} + \frac{\int_{A_N^c} (x - \hat x_{ij})^t (y - \hat y_{ij})\, \tilde p^N(x, y)\, dx\, dy}{\sqrt{D_{x,N} D_{y,N}}}. \qquad (20)$$

Note that our partition into terms involving integration inside the region A_N and outside A_N is natural, since the two terms vanish for different reasons. The first term would be identically zero if the reconstructions \hat x_{ij}, \hat y_{ij} were the geometric centroids. However, they are not geometric centroids but are second moment centroids w.r.t. the density p(x, y). Nevertheless, as the quantization becomes increasingly refined and the quantization cell size tends to zero, the second moment centroid does converge to the geometric centroid. The second term vanishes essentially by our moment criterion, which guarantees that there are no "bad" cells as the quantization becomes increasingly refined.

Now we consider each of the two terms separately and show that they are bounded by the two terms in the statement of the proposition.

$$\frac{\int_{A_N} (x - \hat x_{ij})^t (y - \hat y_{ij})\, \tilde p^N(x, y)\, dx\, dy}{\sqrt{D_{x,N} D_{y,N}}} = \sum_{i,j} \frac{P(S_i^x \times S_j^y)}{V(S_i^x \times S_j^y)} \frac{\int_{S_i^x \times S_j^y} (x - \hat x_{ij})^t (y - \hat y_{ij})\, dx\, dy}{\sqrt{D_{x,N} D_{y,N}}} = \sum_{i,j} \frac{P(S_i^x \times S_j^y)}{\sqrt{D_{x,N} D_{y,N}}}\, a_{ij}^t b_{ij}$$

where we have defined

$$S_i^x \times S_j^y = A_N \cap T_i^x \times T_j^y,$$
$$a_{ij} = \frac{\int_{S_i^x} (x - \hat x_{ij})\, dx}{V(S_i^x)}, \qquad b_{ij} = \frac{\int_{S_j^y} (y - \hat y_{ij})\, dy}{V(S_j^y)},$$

and where A_N \cap T_i^x \times T_j^y is guaranteed to be a rectangle by the choice of A_N as in Proposition 1. Now we compute a_{ij}:

$$a_{ij} = \frac{\int_{S_i^x} (x - \hat x_{ij})\, dx}{V(S_i^x)} = \frac{1}{V(S_i^x)} \left[ \int_{S_i^x} x\, dx - \hat x_{ij} V(S_i^x) \right] = \frac{\int_{S_i^x} x\, dx}{V(S_i^x)} - \frac{\int_{T_i^x \times T_j^y} x\, p(x, y)\, dx\, dy}{P(T_i^x \times T_j^y)}$$
$$= \frac{\int_{S_i^x \times S_j^y} x\, \tilde p^N(x, y)\, dx\, dy}{P(S_i^x \times S_j^y)} - \frac{\int_{T_i^x \times T_j^y} x\, p(x, y)\, dx\, dy}{P(T_i^x \times T_j^y)}$$
$$= \frac{1}{P(S_i^x \times S_j^y)} \left[ \int_{S_i^x \times S_j^y} x\, [\tilde p^N(x, y) - \gamma_{ij}\, p(x, y)]\, dx\, dy - \gamma_{ij} \int_{A_N^c \cap T_i^x \times T_j^y} x\, p(x, y)\, dx\, dy \right]$$
$$= \frac{1}{P(S_i^x \times S_j^y)} \left[ \int_{S_i^x \times S_j^y} (x - \hat x_{ij})\, [\tilde p^N(x, y) - \gamma_{ij}\, p(x, y)]\, dx\, dy - \gamma_{ij} \int_{A_N^c \cap T_i^x \times T_j^y} (x - \hat x_{ij})\, p(x, y)\, dx\, dy \right]$$
$$= \frac{1}{P(S_i^x \times S_j^y)} \left[ \alpha_x(i, j) + \beta_x(i, j) \right]$$

where we have set

$$\gamma_{ij} = \frac{P(S_i^x \times S_j^y)}{P(T_i^x \times T_j^y)},$$
$$\alpha_x(i, j) = \int_{S_i^x \times S_j^y} (x - \hat x_{ij})\, [\tilde p^N(x, y) - \gamma_{ij}\, p(x, y)]\, dx\, dy,$$

$$\beta_x(i, j) = -\gamma_{ij} \int_{A_N^c \cap T_i^x \times T_j^y} (x - \hat x_{ij})\, p(x, y)\, dx\, dy.$$

Similarly, letting

$$\alpha_y(i, j) = \int_{S_i^x \times S_j^y} (y - \hat y_{ij})\, [\tilde p^N(x, y) - \gamma_{ij}\, p(x, y)]\, dx\, dy,$$
$$\beta_y(i, j) = -\gamma_{ij} \int_{A_N^c \cap T_i^x \times T_j^y} (y - \hat y_{ij})\, p(x, y)\, dx\, dy,$$

we get

$$b_{ij} = \frac{1}{P(S_i^x \times S_j^y)} \left[ \alpha_y(i, j) + \beta_y(i, j) \right].$$

To simplify notation set P(S_i^x \times S_j^y) = \tilde p_{ij}. Then by the Cauchy-Schwarz inequality

$$\left| \sum_{i,j} \tilde p_{ij} \frac{a_{ij}^t b_{ij}}{\sqrt{D_{x,N} D_{y,N}}} \right| \le \left( \sum_{i,j} \tilde p_{ij} \left\| \frac{a_{ij}}{\sqrt{D_{x,N}}} \right\|^2 \right)^{1/2} \left( \sum_{i,j} \tilde p_{ij} \left\| \frac{b_{ij}}{\sqrt{D_{y,N}}} \right\|^2 \right)^{1/2}$$
$$\le \left( \sum_{i,j} \tilde p_{ij} \left( \frac{\|\alpha_x(i,j)\|}{\tilde p_{ij} \sqrt{D_{x,N}}} + \frac{\|\beta_x(i,j)\|}{\tilde p_{ij} \sqrt{D_{x,N}}} \right)^2 \right)^{1/2} \left( \sum_{i,j} \tilde p_{ij} \left( \frac{\|\alpha_y(i,j)\|}{\tilde p_{ij} \sqrt{D_{y,N}}} + \frac{\|\beta_y(i,j)\|}{\tilde p_{ij} \sqrt{D_{y,N}}} \right)^2 \right)^{1/2}$$
$$\le \left( 2 \sum_{i,j} \tilde p_{ij} \left( \frac{\|\alpha_x(i,j)\|}{\tilde p_{ij} \sqrt{D_{x,N}}} \right)^2 + 2 \sum_{i,j} \tilde p_{ij} \left( \frac{\|\beta_x(i,j)\|}{\tilde p_{ij} \sqrt{D_{x,N}}} \right)^2 \right)^{1/2} \left( 2 \sum_{i,j} \tilde p_{ij} \left( \frac{\|\alpha_y(i,j)\|}{\tilde p_{ij} \sqrt{D_{y,N}}} \right)^2 + 2 \sum_{i,j} \tilde p_{ij} \left( \frac{\|\beta_y(i,j)\|}{\tilde p_{ij} \sqrt{D_{y,N}}} \right)^2 \right)^{1/2} \qquad (21)$$

Now from the definition of \beta_x(i, j),

$$\frac{\|\beta_x(i, j)\|}{\tilde p_{ij} \sqrt{D_{x,N}}} \le \frac{\gamma_{ij} \int_{A_N^c \cap T_i^x \times T_j^y} \|x - \hat x_{ij}\|\, p(x, y)\, dx\, dy}{P(S_i^x \times S_j^y) \sqrt{D_{x,N}}} \le \frac{\gamma_{ij}}{P(S_i^x \times S_j^y)} \left( \int_{A_N^c \cap T_i^x \times T_j^y} \|x - \hat x_{ij}\|^2 p(x, y)\, dx\, dy \right)^{1/2} \frac{P(A_N^c \cap T_i^x \times T_j^y)^{1/2}}{\sqrt{D_{x,N}}}.$$

Thus,

$$\sum_{i,j} \tilde p_{ij} \left( \frac{\|\beta_x(i, j)\|}{\tilde p_{ij} \sqrt{D_{x,N}}} \right)^2 \le \sum_{i,j} \gamma_{ij}^2 \frac{P(A_N^c \cap T_i^x \times T_j^y)}{P(S_i^x \times S_j^y)} \frac{\int_{A_N^c \cap T_i^x \times T_j^y} \|x - \hat x_{ij}\|^2 p(x, y)\, dx\, dy}{D_{x,N}} \le \frac{\int_{A_N^c} \|x - \hat x_N\|^2 p(x, y)\, dx\, dy}{D_{x,N}} = \frac{m_x(A_N^c)}{D_{x,N}}. \qquad (22)$$

Now from the definition of \alpha_x(i, j) we have

$$\|\alpha_x(i, j)\| \le \int_{S_i^x \times S_j^y} \|x - \hat x_{ij}\|\, |\tilde p^N(x, y) - \gamma_{ij}\, p(x, y)|\, dx\, dy \le r_x(A_N) \int_{S_i^x \times S_j^y} |\tilde p^N(x, y) - \gamma_{ij}\, p(x, y)|\, dx\, dy$$

where the last inequality is because of the definition of r_x(A_N). Hence

$$\sum_{i,j} \tilde p_{ij} \left( \frac{\|\alpha_x(i, j)\|}{\tilde p_{ij} \sqrt{D_{x,N}}} \right)^2 \le 4 \frac{r_x(A_N)^2}{D_{x,N}} \sum_{i,j} \tilde p_{ij} \left( \frac{\int_{S_i^x \times S_j^y} |\tilde p^N(x, y) - \gamma_{ij}\, p(x, y)|\, dx\, dy}{2\, P(S_i^x \times S_j^y)} \right)^2$$
$$\stackrel{(a)}{\le} 4 \frac{r_x(A_N)^2}{D_{x,N}} \sum_{i,j} P(S_i^x \times S_j^y) \left( \frac{\int_{S_i^x \times S_j^y} |\tilde p^N(x, y) - \gamma_{ij}\, p(x, y)|\, dx\, dy}{2\, P(S_i^x \times S_j^y)} \right)$$
$$\le 4 \frac{r_x(A_N)^2}{D_{x,N}} \sum_{i,j} \int_{S_i^x \times S_j^y} |\tilde p^N(x, y) - \gamma_{ij}\, p(x, y)|\, dx\, dy$$
$$\le 4 \frac{r_x(A_N)^2}{D_{x,N}} \left[ \int_{A_N} |\tilde p^N(x, y) - p(x, y)|\, dx\, dy + \sum_{i,j} (1 - \gamma_{ij}) P(S_i^x \times S_j^y) \right]$$
$$\le 4 \frac{r_x(A_N)^2}{D_{x,N}} \int_{A_N} |\tilde p^N(x, y) - p(x, y)|\, dx\, dy + 4 \frac{r_x(A_N)^2}{D_{x,N}} P(A_N^c) \qquad (23)$$

where (a) is because \sum_i p_i s_i^2 \le \sum_i p_i s_i whenever s_i \le 1. Similarly the terms corresponding to y can also be bounded. Combining equations (22), (23), (21) and using the definitions of f_x(A_N) and f_y(A_N) we have

$$\left| \frac{\int_{A_N} (x - \hat x_{ij})^t (y - \hat y_{ij})\, \tilde p^N(x, y)\, dx\, dy}{\sqrt{D_{x,N} D_{y,N}}} \right| \le (f_x(A_N))^{1/2} (f_y(A_N))^{1/2}. \qquad (24)$$

Now we consider the second term in equation (20):

$$\left| \frac{\int_{A_N^c} (x - \hat x_{ij})^t (y - \hat y_{ij})\, \tilde p^N(x, y)\, dx\, dy}{\sqrt{D_{x,N} D_{y,N}}} \right| = \left| \frac{\int_{A_N^c} (x - \hat x_{ij})^t (y - \hat y_{ij})\, p(x, y)\, dx\, dy}{\sqrt{D_{x,N} D_{y,N}}} \right|$$
$$\stackrel{(b)}{\le} \left( \frac{\int_{A_N^c} \|x - \hat x_{ij}\|^2 p(x, y)\, dx\, dy}{D_{x,N}} \right)^{1/2} \left( \frac{\int_{A_N^c} \|y - \hat y_{ij}\|^2 p(x, y)\, dx\, dy}{D_{y,N}} \right)^{1/2} = \left( \frac{m_x(A_N^c)}{D_{x,N}} \right)^{1/2} \left( \frac{m_y(A_N^c)}{D_{y,N}} \right)^{1/2}$$

where (b) is by the Cauchy-Schwarz inequality. The proposition follows from the last inequality and equations (24) and (20). \square

Acknowledgements

We thank Toby Berger for discussions that motivated this work, and thank Tamas Linder for his comments on an early version of this paper that helped greatly improve its quality.

References

[1] T. Berger. Multiterminal Source Coding. In G. Longo, editor, The Information Theory Approach to Communications, Springer-Verlag, New York, 1977.
[2] R. E. Blahut. Principles and Practice of Information Theory. Addison-Wesley, Reading, MA, 1987.
[3] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, New York, 1991.
[4] A. Dembo, T. M. Cover, and J. A. Thomas. Information theoretic inequalities. IEEE Trans. Information Theory, IT-37:1501-1518, Nov. 1991.
[5] A. Gersho. Asymptotically optimal block quantization. IEEE Trans. Information Theory, IT-25:373-380, July 1979.
[6] A. Gersho and R. M. Gray. Vector Quantization and Signal Compression. Kluwer Academic Pub., Boston, 1992.
[7] R. Durrett. Probability: Theory and Examples. Wadsworth & Brooks/Cole Pub., California, 1991.
[8] N. S. Jayant and P. Noll. Digital Coding of Waveforms. Prentice-Hall, Englewood Cliffs, NJ, 1984.
[9] T. Linder and R. Zamir. On the asymptotic tightness of the Shannon lower bound. IEEE Trans. Information Theory, IT-40:2026-2031, Nov. 1994.
[10] T. Linder, R. Zamir, and K. Zeger. Multidimensional companding for non-difference distortion measures: the rate distortion function. IEEE Trans. Information Theory, IT-45:533-547, March 1999.
[11] T. Linder, R. Zamir, and K. Zeger. The multiple description rate region for high resolution source coding. In Proc. of Data Compression Conf., pages 149-158, Snowbird, Utah, March 1998.
[12] T. Linder and K. Zeger. Asymptotic entropy constrained performance of tessellating and universal randomized lattice quantization. IEEE Trans. Information Theory, pp. 575-579, March 1994.
[13] S. Na and D. L. Neuhoff. Bennett's integral for vector quantizers. IEEE Trans. Information Theory, IT-41:886-900, July 1995.
[14] W. Rudin. Principles of Mathematical Analysis, page 111. McGraw-Hill Pub., New York.
[15] W. Rudin. Real and Complex Analysis. McGraw-Hill Pub., New York.
[16] H. Viswanathan and T. Berger. The quadratic Gaussian CEO problem. IEEE Trans. Information Theory, IT-43:1549-1559, Sept. 1997.
[17] R. Zamir and T. Berger. Multiterminal source coding with high resolution. IEEE Trans. Information Theory, IT-45:106-117.
[18] R. Zamir and M. Feder. On lattice quantization noise. IEEE Trans. Information Theory, IT-42:1152-1159, July 1996.
