A Converse Coding Theorem for Mismatched Decoding at the Output of Binary-Input Memoryless Channels

V. B. Balakirsky
The author is with The Data Security Association "Confident", St.-Petersburg, Russia.
E-mail: [email protected]

Abstract. An upper bound on the maximal transmission rate over binary-input memoryless channels, provided that the decoding decision rule is given, is derived. If the decision rule is equivalent to maximum likelihood decoding (matched decoding), then the bound coincides with the channel capacity. Otherwise (mismatched decoding), it coincides with a known lower bound.

Key words: channel capacity, mismatched decoding.

The work was supported by a Scholarship from the Swedish Institute, Stockholm, Sweden. 


1 Introduction

Shannon's coding theorem on the capacity of memoryless channels [1] may be presented as a result of maximization of the transmission rate over all possible block codes and all possible decoding algorithms. An open problem in information theory is to generalize this theorem to the case when optimization over decoding algorithms is forbidden [2-10]. More precisely, the decoder calculates the value of some distortion (metric) function for each codeword and decides that the codeword with the smallest distortion was sent. Since the distortion function is not necessarily matched to the channel's characteristics, the maximal transmission rate can be less than the channel capacity. The current state of the mismatched decoding problem and its connections with other open problems of information and coding theory are given in the recent paper by I. Csiszar and P. Narayan [9], and we cannot do it better.

We deal with binary-input memoryless channels and prove a converse statement to the direct coding theorem [3]. As a result, we obtain that the maximal transmission rate over any binary-input memoryless channel can be found for any distortion function using a single-letter characterization.

The paper is organized as follows. In Section 2 we introduce the notation used in the analysis. The main result is formulated and discussed in Section 3. In Section 4 we discuss the basic ideas of the proof of the theorem, which lead to the results called a 'combinatorial approximation lemma' and a 'permutation lemma'. Section 5 is devoted to the proof of the combinatorial approximation lemma, and Section 6 is devoted to the proof of the permutation lemma. Some properties of mismatched decoding and the basic ideas of the proof are illustrated in the Appendix for specific data.

2 Notation

1. The channel input and output alphabets will be denoted by $X$ and $Y$, respectively. We assume that $X = \{0,1\}$, but write $X$ instead of $\{0,1\}$, meaning that many of the further considerations can be extended to the general case.

2. The number of symbols $x \in X$ in $x \in X^n$, the number of symbols $y \in Y$ in $y \in Y^n$, and the number of pairs of symbols $(x,y) \in X \times Y$ in $(x,y) \in X^n \times Y^n$ will be denoted by
$$
n_x(x) = \sum_{j=1}^n \chi\{x_j = x\}, \qquad
n_y(y) = \sum_{j=1}^n \chi\{y_j = y\}, \qquad
n_{x,y}(x,y) = \sum_{j=1}^n \chi\{x_j = x,\, y_j = y\}.
$$
Hereafter, $\chi$ denotes the indicator function of the event in the braces: $\chi\{\cdot\} = 1$ if the statement is true and $\chi\{\cdot\} = 0$ otherwise.

3. We introduce special notation for the empirical probability distributions generated by the numbers $n_x(x)$, $n_y(y)$, and $n_{x,y}(x,y)$, where $(x,y) \in X \times Y$. These distributions will be referred to as the compositions of $x \in X^n$, $y \in Y^n$, and $(x,y) \in X^n \times Y^n$:
$$
\mathrm{Comp}(x) = \{n_x(x)/n\}, \qquad
\mathrm{Comp}(y) = \{n_y(y)/n\}, \qquad
\mathrm{Comp}(x,y) = \{n_{x,y}(x,y)/n\}.
$$
Furthermore, we introduce the conditional composition of $y \in Y^n$ given $x \in X^n$:
$$
\mathrm{Comp}(y|x) = \{n_{x,y}(x,y)/n_x(x)\}.
$$

4. The probability distributions $\{P_x\}$, $\{V_x(y)\}$, and $\{W_x(y)\}$ will be denoted by $P$, $V$, and $W$, respectively.

5. The set of types $\mathcal{P}^n$ on $X^n$ and the set of conditional types $\mathcal{V}_P^n$ on $Y^n$ given $P \in \mathcal{P}^n$ are introduced as
$$
\mathcal{P}^n = \{ P : n \cdot P_x \text{ are integers for all } x \in X \}, \qquad
\mathcal{V}_P^n = \{ V : n \cdot P_x \cdot V_x(y) \text{ are integers for all } (x,y) \in X \times Y \}.
$$
The set of sequences of type $P \in \mathcal{P}^n$ and the set of sequences of conditional type $V \in \mathcal{V}_P^n$ given $x \in T_P^n$ are denoted by
$$
T_P^n = \{ x \in X^n : \mathrm{Comp}(x) = P \}, \qquad
T_V^n(x) = \{ y \in Y^n : \mathrm{Comp}(y|x) = V \}.
$$

6. A code $G^n$ of rate $R$ and length $n$, consisting of $e^{nR}$ codewords, such that each codeword has the type $P$, i.e.,
$$
\mathrm{Comp}(x) = P, \quad \text{for all } x \in G^n,
$$
will be referred to as a $P$-composition code and denoted by $G_P^n$.

7. The distortion (metric) function is introduced as $d = \{d_x(y)\}$, where
$$
d_0(y) = 0, \qquad 0 \le d_x(y) \le d_{\max} < \infty, \quad \text{for all } x \in X,\ y \in Y.
$$
An additive extension of this function is defined as
$$
d(x,y) = \sum_{j=1}^n d_{x_j}(y_j).
$$

8. We use the following notation for the marginal distribution $PV$ on $Y$, the entropy function $H(PV)$, the conditional entropy function $H(V|P)$, the mutual information function $I(P,V)$, and the average distortion function in the ensemble $\{ X \times Y;\ P_x \cdot V_x(y) \}$:
$$
PV(y) = \sum_x P_x \cdot V_x(y), \qquad
H(PV) = -\sum_y PV(y) \cdot \ln PV(y), \qquad
H(V|P) = -\sum_{x,y} P_x \cdot V_x(y) \cdot \ln V_x(y),
$$
$$
I(P,V) = H(PV) - H(V|P), \qquad
d(P,V) = \sum_{x,y} P_x \cdot V_x(y) \cdot d_x(y).
$$

9. To simplify formalization, we will write
$$
|\tilde V - V| = \max_{x,y} |\tilde V_x(y) - V_x(y)|
$$
for any conditional probability distributions $\tilde V = \{\tilde V_x(y)\}$ and $V = \{V_x(y)\}$.

The notation above is conventional and is given to make the paper self-contained. The function introduced below seems to be new. This function will be used throughout the paper.

Definition 2.1 (Upsilon Notation): Let $\{\alpha_n\}$ be a given sequence. We introduce a Boolean function $\Upsilon(\{\alpha_n\})$ such that $\Upsilon(\{\alpha_n\}) = $ 'TRUE' if and only if the following statement is valid: "there exist an $\varepsilon > 0$ and $n_0(\varepsilon) < \infty$ such that $\alpha_n > \varepsilon$ for all $n > n_0(\varepsilon)$". Otherwise, $\Upsilon(\{\alpha_n\}) = $ 'FALSE'. For the values of the function $\Upsilon$ we will use the relation $\le$, meaning that
$$
\text{'FALSE'} \le \text{'FALSE'}, \qquad \text{'FALSE'} \le \text{'TRUE'}, \qquad \text{'TRUE'} \le \text{'TRUE'}.
$$
For any given sequences $\{\alpha_n\}$ and $\{\beta_n\}$, we say that $\{\beta_n\}$ approximates $\{\alpha_n\}$ if
$$
\Upsilon(\{\alpha_n\}) \le \Upsilon(\{\beta_n\}), \quad \text{i.e.,} \quad \Upsilon(\{\alpha_n\}) = \text{'TRUE'} \implies \Upsilon(\{\beta_n\}) = \text{'TRUE'},
$$
and that $\{\alpha_n\}$ approximates $\{\beta_n\}$ if
$$
\Upsilon(\{\beta_n\}) \le \Upsilon(\{\alpha_n\}), \quad \text{i.e.,} \quad \Upsilon(\{\beta_n\}) = \text{'TRUE'} \implies \Upsilon(\{\alpha_n\}) = \text{'TRUE'}.
$$
If both statements are valid, we say that $\{\alpha_n\}$ and $\{\beta_n\}$ approximate each other. In this case we write
$$
\Upsilon(\{\alpha_n\}) = \Upsilon(\{\beta_n\}), \quad \text{i.e.,} \quad \Upsilon(\{\alpha_n\}) = \text{'TRUE'} \iff \Upsilon(\{\beta_n\}) = \text{'TRUE'}.
$$
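The notation of this section is easy to mirror in code. The following Python sketch (not part of the original paper; all names are illustrative) computes compositions, the conditional composition, and the single-letter quantities $I(P,V)$ and $d(P,V)$ of item 8 for small examples.

```python
import numpy as np
from collections import Counter

def composition(x, alphabet):
    """Empirical distribution Comp(x) of a sequence over a finite alphabet."""
    n = len(x)
    cnt = Counter(x)
    return np.array([cnt[a] / n for a in alphabet])

def conditional_composition(x, y, X, Y):
    """Conditional composition Comp(y|x): a |X| x |Y| row-stochastic matrix."""
    V = np.zeros((len(X), len(Y)))
    for a, b in zip(x, y):
        V[X.index(a), Y.index(b)] += 1.0
    row = V.sum(axis=1, keepdims=True)
    return np.divide(V, row, out=np.zeros_like(V), where=row > 0)

def mutual_information(P, V):
    """I(P, V) = H(PV) - H(V|P), in nats, for input distribution P and channel V."""
    PV = P @ V
    H_out = -np.sum(PV[PV > 0] * np.log(PV[PV > 0]))
    H_cond = -np.sum((P[:, None] * V)[V > 0] * np.log(V[V > 0]))
    return H_out - H_cond

def average_distortion(P, V, d):
    """d(P, V) = sum_{x,y} P_x V_x(y) d_x(y)."""
    return float(np.sum(P[:, None] * V * d))

# Example with the (rounded) matrices used later in the Appendix.
P = np.array([0.5, 0.5])
V = np.array([[0.474, 0.240, 0.135, 0.151],
              [0.151, 0.135, 0.240, 0.474]])
d = np.array([[0, 0, 0, 0],
              [4, 3, 1, 0]], dtype=float)
print(mutual_information(P, V), average_distortion(P, V, d))
```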

3 Statement of the Problem and a Converse Coding Theorem

Our considerations of mismatched decoding for a memoryless channel $W$ are based on constructing references to a memoryless channel $V$, which can be called a 'test channel'. For formal convenience, we introduce the test channel as a real channel, existing between the sender and the receiver, and consider the system model for information transmission given in Fig. 1. Let us suppose that a $P$-composition block code $G_P^n$ is used to transmit data over two parallel memoryless channels $V = \{V_x(y)\}$ and $W = \{W_x(y')\}$. The conditional probabilities that the decoders $D$ and $D'$ receive the vectors $y = (y_1,\dots,y_n) \in Y^n$ and $y' = (y_1',\dots,y_n') \in Y^n$, when a codeword $x = (x_1,\dots,x_n) \in G_P^n$ was sent, are given as
$$
V(y|x) = \prod_{j=1}^n V_{x_j}(y_j), \qquad W(y'|x) = \prod_{j=1}^n W_{x_j}(y_j'). \tag{3.1}
$$
We suppose that the decoders $D$ and $D'$ estimate the transmitted codeword as $\hat x \in G_P^n$ and $\hat x' \in G_P^n$, respectively, using the same partitioning of the space $Y^n$ with respect to the minimal value of the distortion (metric) function $d$, i.e.,
$$
d(\hat x, y) = \min_{x \in G_P^n} d(x,y), \qquad d(\hat x', y') = \min_{x' \in G_P^n} d(x',y'). \tag{3.2}
$$
If the minimum is attained for several codewords, we assume that the received vector is decoded incorrectly.
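As an illustration of the decision rule (3.2) and of the tie-breaking convention, here is a small Python sketch of a minimum-distortion decoder; it is not from the paper, and a tie is declared an error exactly as assumed above.

```python
import numpy as np

def mismatched_decode(codebook, y, d):
    """Return the index of the codeword minimizing the additive metric
    d(x, y) = sum_j d[x_j, y_j]; return None if the minimum is not unique
    (a tie is counted as a decoding error, as assumed in the text)."""
    scores = [d[x, y].sum() for x in codebook]
    best = min(scores)
    winners = [i for i, s in enumerate(scores) if s == best]
    return winners[0] if len(winners) == 1 else None

# Toy example: two codewords of the same composition, metric d_0 = 0, d_1 = (4, 3, 1, 0).
d = np.array([[0, 0, 0, 0],
              [4, 3, 1, 0]], dtype=float)
codebook = [np.array([0, 0, 1, 1]), np.array([1, 1, 0, 0])]
y = np.array([0, 1, 3, 3])
print(mismatched_decode(codebook, y, d))
```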

Then the decoding error probabilities for the codeword $x$ can be expressed as
$$
P_d^{(n)}(x,V) = \sum_y V(y|x) \cdot \chi\{ d(x,y) \ge D(x,y) \}, \qquad
P_d^{(n)}(x,W) = \sum_{y'} W(y'|x) \cdot \chi\{ d(x,y') \ge D(x,y') \}, \tag{3.3}
$$
where
$$
D(x,y) = \min_{\hat x \in G_P^n \setminus \{x\}} d(\hat x, y), \qquad
D(x,y') = \min_{\hat x' \in G_P^n \setminus \{x\}} d(\hat x', y'). \tag{3.4}
$$
Let
$$
P_d^{(n)}(V) = \max_{x \in G_P^n} P_d^{(n)}(x,V), \qquad
P_d^{(n)}(W) = \max_{x \in G_P^n} P_d^{(n)}(x,W) \tag{3.5}
$$

denote the maximal decoding error probabilities for the code $G_P^n$, and let $\{P_d^{(n)}(V)\}$ and $\{P_d^{(n)}(W)\}$ denote the sequences of the maximal decoding error probabilities constructed for a sequence of codes $\{G_P^n\}$ having a fixed rate $R$. The upsilon notation, introduced in Section 2, allows us to represent the lower bound on the maximal transmission rate for mismatched decoding as a corollary of the result formulated below.

Theorem 3.1: Let $P$ be a given probability distribution on $X = \{0,1\}$ and let $W$ be a given binary-input memoryless channel. Let
$$
\mathcal{V}_d(P,W) = \{ V : PV = PW,\ d(P,V) \le d(P,W) \}. \tag{3.6}
$$
Then
$$
\Upsilon(\{P_d^{(n)}(V)\}) \le \Upsilon(\{P_d^{(n)}(W)\}) \tag{3.7}
$$
for all $V \in \mathcal{V}_d(P,W)$.

Corollary 3.2: If
$$
R > C_d(P,W), \tag{3.8}
$$
where
$$
C_d(P,W) = \min_{V \in \mathcal{V}_d(P,W)} I(P,V) \tag{3.9}
$$
and the set $\mathcal{V}_d(P,W)$ is defined in (3.6), then
$$
\Upsilon(\{P_d^{(n)}(W)\}) = \text{'TRUE'} \tag{3.10}
$$
for all $P$-composition codes $G_P^n$, i.e., there exist $\varepsilon_d(P,W) > 0$ and $n_0(\varepsilon,d,P,W) < \infty$ such that
$$
P_d^{(n)}(W) \ge \varepsilon_d(P,W), \quad \text{for all } n > n_0(\varepsilon,d,P,W),
$$
and all $P$-composition codes $G_P^n$.

Proof: For a sequence of $P$-composition codes $\{G_P^n\}$, we use the converse statement to the coding theorem for the channel $V$ that minimizes the mutual information at the right-hand side of (3.9) and conclude that, if (3.8) is valid, then
$$
\Upsilon(\{P_d^{(n)}(V)\}) = \text{'TRUE'}. \tag{3.11}
$$
Combining (3.7) and (3.11), we obtain (3.10). Q.E.D.

Corollary 3.3: If
$$
R > C_d(W), \tag{3.12}
$$
where
$$
C_d(W) = \max_P C_d(P,W) \tag{3.13}
$$
and the function $C_d(P,W)$ is defined in (3.9), then
$$
\Upsilon(\{P_d^{(n)}(W)\}) = \text{'TRUE'} \tag{3.14}
$$
for all codes $G^n$, i.e., there exist $\varepsilon_d(W) > 0$ and $n_0(\varepsilon,d,W) < \infty$ such that
$$
P_d^{(n)}(W) \ge \varepsilon_d(W), \quad \text{for all } n > n_0(\varepsilon,d,W), \tag{3.15}
$$
and all codes $G^n$.

Proof: There exists $P$ such that any sequence of codes $\{G^n\}$, having a fixed code rate, contains a subsequence of $P$-composition codes $\{G_P^n\}$ having asymptotically the same rate [2]. Thus, (3.14) follows from (3.10). Q.E.D.
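The single-letter bound (3.9) can be evaluated numerically. The sketch below is not from the paper; it estimates $C_d(P,W)$ for a small alphabet by minimizing $I(P,V)$ over row-stochastic $V$ under the linear constraints $PV = PW$ and $d(P,V) \le d(P,W)$, using scipy.optimize, and the helper names are mine.

```python
import numpy as np
from scipy.optimize import minimize

def mismatch_capacity_bound(P, W, d):
    """Numerically evaluate C_d(P, W) = min I(P, V) over channels V with
    PV = PW and d(P, V) <= d(P, W)  (the minimization in (3.9))."""
    nx, ny = W.shape
    PW = P @ W
    dPW = np.sum(P[:, None] * W * d)

    def I(v):
        V = v.reshape(nx, ny)
        PV = P @ V
        eps = 1e-12
        H_out = -np.sum(PV * np.log(PV + eps))
        H_cond = -np.sum(P[:, None] * V * np.log(V + eps))
        return H_out - H_cond

    cons = [
        # each row of V sums to 1
        {'type': 'eq', 'fun': lambda v: v.reshape(nx, ny).sum(axis=1) - 1.0},
        # output marginal is preserved: PV = PW
        {'type': 'eq', 'fun': lambda v: P @ v.reshape(nx, ny) - PW},
        # average distortion does not exceed that of W
        {'type': 'ineq', 'fun': lambda v: dPW - np.sum(P[:, None] * v.reshape(nx, ny) * d)},
    ]
    res = minimize(I, W.flatten(), method='SLSQP',
                   bounds=[(0.0, 1.0)] * (nx * ny), constraints=cons)
    return res.fun, res.x.reshape(nx, ny)

# Channel and metric from the Appendix (Table 1).
P = np.array([0.5, 0.5])
W = np.array([[0.375, 0.375, 0.125, 0.125],
              [0.250, 0.000, 0.250, 0.500]])
d = np.array([[0, 0, 0, 0],
              [4, 3, 1, 0]], dtype=float)
Cd, Vstar = mismatch_capacity_bound(P, W, d)
print(Cd, Vstar)
```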

Corollary 3.4: Let $PW_0 = \{PW_0(y)\}$ be a given probability distribution on $Y$ and let $\gamma \le 0$ be a given constant. If
$$
V_x(y) = f_x \cdot \varphi(y) \cdot e^{\gamma d_x(y)}, \tag{3.16}
$$
where $\{f_x\}$ and $\{\varphi(y)\}$ are chosen in such a way that
$$
\sum_y V_x(y) = 1, \quad \text{for all } x \in X, \tag{3.17}
$$
and
$$
\sum_x P_x \cdot V_x(y) = PW_0(y), \quad \text{for all } y \in Y, \tag{3.18}
$$
then the maximal transmission rate, when a $P$-composition code is used, is not greater than
$$
C_d(P,W) = H(PW_0) - H(V|P)
$$
for all binary-input channels $W$ such that
$$
PW = PW_0, \qquad d(P,W) \ge d(P,V). \tag{3.19}
$$
Proof: If the marginal distribution on $Y$ is fixed by $PW_0$, then the minimization of the mutual information function at the right-hand side of (3.9) is equivalent to the maximization of the conditional entropy function $H(V|P)$ under the linear restrictions (3.17) and (3.18). This function is concave, and we obtain (3.16) as a result of optimization using Lagrange multipliers. The restrictions (3.19) coincide with the restrictions at the right-hand side of (3.6) when we consider them as restrictions on $W$ given $PW$ and $V$. Q.E.D.

Corollary 3.5: For any $P$-composition code $G_P^n$, decoding which minimizes the distortion function $d$ is equivalent to maximum likelihood decoding for the channel $V$ given in (3.16), and the maximal transmission rate is not greater than $I(P,V)$.
Proof: Using (3.1) and (3.16) we write

$$
\ln V(y|x) = n \cdot \sum_x P_x \cdot \ln f_x + \sum_{j=1}^n \ln \varphi(y_j) + \gamma \cdot d(x,y).
$$
Since $\gamma < 0$, maximization of the function $\ln V(y|x)$ over all codewords is equivalent to minimization of the distortion function $d(x,y)$. Q.E.D.

Discussion: Let $V$ be a probability distribution minimizing the mutual information at the right-hand side of (3.9). We are interested in the case
$$
I(P,V) < R < I(P,W),
$$
because, otherwise, maximum likelihood decoding does not achieve an arbitrarily small error probability. Let
$$
A_d^n(x) = \{ y : d(x,y) \le n \cdot d(P,W) \}.
$$
Then $T_V^n(x),\, T_W^n(x) \subseteq A_d^n(x)$. Roughly speaking, a vector $y \in T_W^n(x)$ will be realized as a result of transmission of $x$ over the channel $W$. Since $R < I(P,W)$, we can choose a code $G_P^n$ for which "almost" all the vectors belonging to $T_W^n(x)$ do not coincide with the vectors belonging to $T_W^n(x')$, $x' \in G_P^n \setminus \{x\}$. However, to analyze the decoding error probability in our case it is necessary to estimate the number of $d$-"bad" points for $x$, i.e., the size of the intersection of $T_W^n(x)$ and $A_d^n(x')$, $x' \in G_P^n \setminus \{x\}$. As is well known [2],
$$
\ln |T_W^n(x)|/n \simeq H(W|P), \qquad \ln |T_V^n(x)|/n \simeq H(V|P),
$$
and, because $V$ minimizes the mutual information at the right-hand side of (3.9), it maximizes the conditional entropy function $H(V|P)$. Therefore, most of the vectors belonging to $A_d^n(x)$ have the conditional type $V$. Since $R > I(P,V)$, the size of the intersection of $T_V^n(x)$ and the union of $T_V^n(x')$, $x' \in G_P^n \setminus \{x\}$, is asymptotically the same as $|T_V^n(x)|$ (note that we use only a weak converse statement for the channel $V$ and lower-bound this size as $\varepsilon \cdot |T_V^n(x)|$, where $\varepsilon > 0$ does not depend on $n$). In fact, the main result of the paper is the statement that, for binary-input channels, this condition is sufficient to show that the size of the intersection between $T_W^n(x)$ and the union of $A_d^n(x')$, $x' \in G_P^n \setminus \{x\}$, is asymptotically the same as $|T_W^n(x)|$.
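Corollary 3.4 describes the minimizing test channel as an exponentially tilted distribution $V_x(y) = f_x \varphi(y) e^{\gamma d_x(y)}$. The following sketch is my own illustration, not the author's code: it fixes $\gamma \le 0$ and finds $f_x$ and $\varphi(y)$ by alternating normalization so that (3.17) and (3.18) hold, then reports $H(PW_0) - H(V|P)$.

```python
import numpy as np

def tilted_test_channel(P, PW0, d, gamma, iters=2000):
    """Construct V_x(y) = f_x * phi(y) * exp(gamma * d_x(y)) satisfying
    sum_y V_x(y) = 1 and sum_x P_x V_x(y) = PW0(y) by alternating scaling
    (an iterative-proportional-fitting style sketch)."""
    M = np.exp(gamma * d)                   # tilting kernel
    f = np.ones(len(P))
    phi = np.ones(len(PW0))
    for _ in range(iters):
        f = 1.0 / (M @ phi)                 # enforce row sums = 1
        phi = PW0 / (P @ (f[:, None] * M))  # enforce output marginal = PW0
    V = f[:, None] * phi[None, :] * M
    H_out = -np.sum(PW0 * np.log(PW0))
    H_cond = -np.sum(P[:, None] * V * np.log(np.where(V > 0, V, 1.0)))
    return V, H_out - H_cond

P = np.array([0.5, 0.5])
PW0 = np.array([0.3125, 0.1875, 0.1875, 0.3125])
d = np.array([[0, 0, 0, 0],
              [4, 3, 1, 0]], dtype=float)
V, bound = tilted_test_channel(P, PW0, d, gamma=-0.5)
print(np.round(V, 3), bound)
```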

4 Basic Ideas of the Proof of the Converse Coding Theorem

4.1 Combinatorial Approximation of Memoryless Channels

Let us consider the transmission of a codeword $x \in G_P^n$ over a memoryless channel $V$. Suppose that $V$ is a type on $Y^n$ given $P \in \mathcal{P}^n$, i.e., $V \in \mathcal{V}_P^n$. Then we may write
$$
\mathrm{Comp}(y|x) \approx V \quad \text{with high probability},
$$
meaning that, with high probability, the conditional compositions of the received sequences given $x$ are close to the conditional type $V$. The exact formulation of this statement is given below.

Lemma 4.1 ([2, Section 2.1]): Let
$$
N(P,V) = \{ n : P \in \mathcal{P}^n \text{ and } V \in \mathcal{V}_P^n \}
$$
be the set of lengths such that a given probability distribution $P$ is a type on $X^n$ and a given conditional probability distribution $V$ is a conditional type on $Y^n$ for that $P$. For any increasing sequence of lengths $\{n\}$ whose elements belong to $N(P,V)$, there exist sequences $\{\delta_n\}$ such that
$$
\delta_n \to 0, \quad \sqrt{n}\,\delta_n \to \infty, \quad \text{as } n \to \infty,
$$
and, for any $x \in T_P^n$,
$$
\sum_y V(y|x) \cdot \chi\{ |\mathrm{Comp}(y|x) - V| < \delta_n \} \ge 1 - \epsilon_n, \tag{4.1}
$$
where $\{\epsilon_n\}$ is a sequence, depending on $\{\delta_n\}$, such that $\epsilon_n \to 0$ as $n \to \infty$.

Convention 4.2: We assume that $P_x > 0$ for all $x \in X$ and that $n \in N(P,V)$. In later considerations, when we deal with two channels, $V$ and $W$, we assume that $n \in N(P,V) \cap N(P,W)$. The sequence $\{\delta_n\}$, satisfying the conditions of Lemma 4.1, is assumed to be given.

Let
$$
[V] = \{ \tilde V \in \mathcal{V}_P^n : |\tilde V - V| < \delta_n \}. \tag{4.2}
$$
Then the inequality (4.1) can be rewritten as
$$
V(T^n_{[V]}(x)\,|\,x) \ge 1 - \epsilon_n \tag{4.3}
$$
for any $x \in T_P^n$, where
$$
T^n_{[V]}(x) = \bigcup_{\tilde V \in [V]} T^n_{\tilde V}(x). \tag{4.4}
$$
Note also that
$$
d(x,y)/n = d(P,V), \quad \text{for all } y \in T_V^n(x), \tag{4.5}
$$
and
$$
|d(x,y)/n - d(P,V)| < \delta_n \cdot d_{\max}, \quad \text{for all } y \in T^n_{[V]}(x). \tag{4.6}
$$
The conditional probability to receive $y$ depends only on $\mathrm{Comp}(y|x)$, and, for all $\tilde V \in \mathcal{V}_P^n$, we may write
$$
V(y|x) = p(\tilde V | V) \cdot \frac{1}{|T^n_{\tilde V}(x)|}, \quad \text{if } \mathrm{Comp}(y|x) = \tilde V, \tag{4.7}
$$
where
$$
p(\tilde V | V) = |T^n_{\tilde V}(x)| \cdot \exp\Big\{ n \sum_{x,y} P_x \cdot \tilde V_x(y) \cdot \ln V_x(y) \Big\}. \tag{4.8}
$$
Therefore, information transmission over a memoryless channel $V$ can be represented as a choice of the conditional composition in accordance with the distribution $p(\tilde V|V)$, $\tilde V \in \mathcal{V}_P^n$, and a choice of a particular received sequence in accordance with the uniform distribution on $T^n_{\tilde V}(x)$. The step, which we call 'a combinatorial approximation', consists in the substitution of different probabilities, $p'(\tilde V|V)$, $\tilde V \in \mathcal{V}_P^n$, for $p(\tilde V|V)$, $\tilde V \in \mathcal{V}_P^n$, in the expression at the right-hand side of (4.7). Then we obtain a different channel, which may have memory. For example, we can assign $p'(\tilde V|V)$ as the uniform distribution on the set $[V]$ and, because of (4.3), expect that the decoding error probability for such a channel is approximately the same as for $V$. However, we will use a less tight approximation and assign $p'(\tilde V|V)$ as the indicator function of the event $\tilde V = V$. The definition below formalizes this step, and the statement of Lemma 4.4 shows that this approximation can be used for our purposes. The considerations above are also valid for the channel $W$, and we continue the parallel definitions of Section 3.

Definition 4.3: The channels $V^n = \{V^n(y|x)\}$ and $W^n = \{W^n(y'|x)\}$, where
$$
V^n(y|x) = \begin{cases} |T_V^n(x)|^{-1}, & \text{if } y \in T_V^n(x) \\ 0, & \text{otherwise} \end{cases}
\qquad
W^n(y'|x) = \begin{cases} |T_W^n(x)|^{-1}, & \text{if } y' \in T_W^n(x) \\ 0, & \text{otherwise,} \end{cases} \tag{4.9}
$$
will be referred to as the combinatorial channels $V^n$ and $W^n$. For these channels we define the decoding error probabilities for a codeword $x \in G_P^n$,
$$
P_d^{(n)}(x,V^n) = \sum_y V^n(y|x) \cdot \chi\{ d(x,y) \ge D(x,y) \}, \qquad
P_d^{(n)}(x,W^n) = \sum_{y'} W^n(y'|x) \cdot \chi\{ d(x,y') \ge D(x,y') \}, \tag{4.10}
$$
the maximal decoding error probabilities,
$$
P_d^{(n)}(V^n) = \max_{x \in G_P^n} P_d^{(n)}(x,V^n), \qquad
P_d^{(n)}(W^n) = \max_{x \in G_P^n} P_d^{(n)}(x,W^n), \tag{4.11}
$$
and the sequences $\{P_d^{(n)}(V^n)\}$, $\{P_d^{(n)}(W^n)\}$ in the same way as for the channels $V$ and $W$ in (3.3)-(3.5).

Lemma 4.4 (Combinatorial Approximation Lemma): The sequences of the maximal decoding error probabilities for the memoryless channels $V$ and $W$, defined in (3.1), and for the combinatorial channels $V^n$ and $W^n$, defined in (4.9), approximate each other, i.e.,
$$
\Upsilon(\{P_d^{(n)}(V)\}) = \Upsilon(\{P_d^{(n)}(V^n)\}), \qquad
\Upsilon(\{P_d^{(n)}(W)\}) = \Upsilon(\{P_d^{(n)}(W^n)\}). \tag{4.12}
$$
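A combinatorial channel as in Definition 4.3 is easy to simulate: given $x$ of composition $P$ and a conditional type $V$ with integer counts $nP_xV_x(y)$, a uniformly chosen element of $T_V^n(x)$ is obtained by fixing the multiset of output symbols for each input symbol and permuting it at random. A hedged Python sketch (illustrative names only):

```python
import numpy as np

def sample_combinatorial_channel(x, V_counts, rng):
    """Draw y uniformly from T_V^n(x).
    V_counts[a][b] is the exact number of positions j with x_j = a, y_j = b,
    i.e. n * P_a * V_a(b); the counts must sum to the number of occurrences of a in x."""
    x = np.asarray(x)
    y = np.empty_like(x)
    for a, counts in enumerate(V_counts):
        pos = np.flatnonzero(x == a)
        assert len(pos) == sum(counts), "counts must match the composition of x"
        symbols = np.repeat(np.arange(len(counts)), counts)  # fixed multiset for input a
        y[pos] = rng.permutation(symbols)                     # random arrangement => uniform on the type class
    return y

rng = np.random.default_rng(0)
# Appendix example: n = 2000, x = 0^1000 1^1000, conditional type V of Table 1.
x = np.array([0] * 1000 + [1] * 1000)
V_counts = [[474, 240, 135, 151],   # counts for input 0
            [151, 135, 240, 474]]   # counts for input 1
y = sample_combinatorial_channel(x, V_counts, rng)
```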

4.2 Permutation Lemma

The results of the previous subsection are valid for any memoryless channel. In particular, they are valid for a channel $V$ satisfying the restrictions
$$
V \in \mathcal{V}_P^n, \qquad I(P,V) \le I(P,W), \qquad PV = PW, \qquad d(P,V) \le d(P,W). \tag{4.13}
$$
The idea of the proof of the theorem is to connect the maximal decoding error probabilities for the combinatorial channels $V^n$ and $W^n$ in such a way that $\{P_d^{(n)}(W^n)\}$ approximates $\{P_d^{(n)}(V^n)\}$, i.e.,
$$
\Upsilon(\{P_d^{(n)}(V^n)\}) \le \Upsilon(\{P_d^{(n)}(W^n)\}).
$$
Then, based on the combinatorial approximation lemma, we can use this inequality in the middle part of a logical chain whose first part is the statement
$$
\Upsilon(\{P_d^{(n)}(V)\}) = \Upsilon(\{P_d^{(n)}(V^n)\})
$$
and whose last part is the statement
$$
\Upsilon(\{P_d^{(n)}(W^n)\}) = \Upsilon(\{P_d^{(n)}(W)\}).
$$
As a result, we obtain (3.7) and prove the theorem.

Lemma 4.5 (Permutation Lemma): Let the probability distributions $P \in \mathcal{P}^n$ and $W \in \mathcal{V}_P^n$ be given, and let $G_P^n$ be a $P$-composition binary code. Let $V$ be some probability distribution satisfying the restrictions (4.13). Then the sequence of the maximal decoding error probability for the combinatorial channel $W^n$ approximates the sequence of the maximal decoding error probability for the combinatorial channel $V^n$, i.e.,
$$
\Upsilon(\{P_d^{(n)}(V^n)\}) \le \Upsilon(\{P_d^{(n)}(W^n)\}). \tag{4.14}
$$

5 Proof of the Combinatorial Approximation Lemma

The distortion (metric) function was introduced in Section 2 in such a way that, for binary-input channels, $d_0(y) = 0$ and $0 \le d_1(y) \le d_{\max} < \infty$ for all $y \in Y$. Without loss of generality, we assume that
$$
d_1(0) = 0. \tag{5.1}
$$
Proposition 5.1: Let $\overline V{}^n = \{\overline V{}^n(y|x)\}$ and $\underline V{}^n = \{\underline V{}^n(y|x)\}$ be the combinatorial channels constructed for the memoryless channels $\overline V = \{\overline V_x(y)\}$ and $\underline V = \{\underline V_x(y)\}$ such that
$$
\overline V_0(y) = V_0(y) = \underline V_0(y), \quad \text{for all } y \in Y, \tag{5.2}
$$
and
$$
\overline V_1(y) = \begin{cases} V_1(y) + (|Y|-1)\delta_n, & \text{if } y = 0 \\ V_1(y) - \delta_n, & \text{if } y \ne 0 \end{cases}
\qquad
\underline V_1(y) = \begin{cases} V_1(y) - (|Y|-1)\delta_n, & \text{if } y = 0 \\ V_1(y) + \delta_n, & \text{if } y \ne 0. \end{cases} \tag{5.3}
$$
Then the following statements are valid:
$$
\Upsilon(\{P_d^{(n)}(\overline V{}^n)\}) \le \Upsilon(\{P_d^{(n)}(V)\}) \le \Upsilon(\{P_d^{(n)}(\underline V{}^n)\}), \tag{5.4}
$$
where
$$
P_d^{(n)}(\overline V{}^n) = \max_{x \in G_P^n} P_d^{(n)}(x, \overline V{}^n), \qquad
P_d^{(n)}(\underline V{}^n) = \max_{x \in G_P^n} P_d^{(n)}(x, \underline V{}^n),
$$
and
$$
P_d^{(n)}(x, \overline V{}^n) = \sum_y \overline V{}^n(y|x) \cdot \chi\{ d(x,y) \ge D(x,y) \}, \qquad
P_d^{(n)}(x, \underline V{}^n) = \sum_y \underline V{}^n(y|x) \cdot \chi\{ d(x,y) \ge D(x,y) \}.
$$

Proof of the Combinatorial Approximation Lemma Based on Proposition 5.1: Let us note that Proposition 5.1 also gives the inequalities
$$
\Upsilon(\{P_d^{(n)}(V')\}) \le \Upsilon(\{P_d^{(n)}(\overline V{}^n)\}), \qquad
\Upsilon(\{P_d^{(n)}(\underline V{}^n)\}) \le \Upsilon(\{P_d^{(n)}(V'')\}), \tag{5.5}
$$
where
$$
V_0'(y) = V_0(y) = V_0''(y), \quad \text{for all } y \in Y,
$$
and
$$
V_1'(y) = \begin{cases} V_1(y) + 2(|Y|-1)\delta_n, & \text{if } y = 0 \\ V_1(y) - 2\delta_n, & \text{if } y \ne 0 \end{cases}
\qquad
V_1''(y) = \begin{cases} V_1(y) - 2(|Y|-1)\delta_n, & \text{if } y = 0 \\ V_1(y) + 2\delta_n, & \text{if } y \ne 0. \end{cases}
$$
Using the converse statement to the coding theorem for the channels $V'$, $V$, and $V''$, we obtain
$$
\Upsilon(\{P_d^{(n)}(V')\}) = \Upsilon(\{P_d^{(n)}(V)\}) = \Upsilon(\{P_d^{(n)}(V'')\}). \tag{5.6}
$$
Hence, (5.4)-(5.6) lead to the statement
$$
\Upsilon(\{P_d^{(n)}(\overline V{}^n)\}) = \Upsilon(\{P_d^{(n)}(V)\}) = \Upsilon(\{P_d^{(n)}(\underline V{}^n)\}). \tag{5.7}
$$
However, as it is easy to see,
$$
\Upsilon(\{P_d^{(n)}(\overline V{}^n)\}) \le \Upsilon(\{P_d^{(n)}(V^n)\}) \le \Upsilon(\{P_d^{(n)}(\underline V{}^n)\}), \tag{5.8}
$$

and combining (5.7) and (5.8) we complete the proof of the combinatorial approximation lemma.

Proof of Proposition 5.1: The idea of the proof is to construct references to the vectors $\bar y \in T^n_{\overline V}(x)$ and $\underline y \in T^n_{\underline V}(x)$ located at the minimal Hamming distance from the received vector $y \in T^n_{[V]}(x)$. Let us introduce the sets
$$
S^n_{\overline V}(x,y) = \{ \bar y \in T^n_{\overline V}(x) : d_H(y,\bar y) = \min_{y' \in T^n_{\overline V}(x)} d_H(y,y') \}, \qquad
S^n_{\underline V}(x,y) = \{ \underline y \in T^n_{\underline V}(x) : d_H(y,\underline y) = \min_{y' \in T^n_{\underline V}(x)} d_H(y,y') \}.
$$
Then, using (5.1), we note that
$$
\chi\{ d(x,\bar y) \ge D(x,\bar y) \} \le \chi\{ d(x,y) \ge D(x,y) \} \le \chi\{ d(x,\underline y) \ge D(x,\underline y) \} \tag{5.9}
$$
for all $\bar y \in S^n_{\overline V}(x,y)$ and $\underline y \in S^n_{\underline V}(x,y)$. Let us also introduce the uniform distributions on the sets $S^n_{\overline V}(x,y)$ and $S^n_{\underline V}(x,y)$:
$$
R^n_{\overline V}(\bar y|x,y) = \begin{cases} |S^n_{\overline V}(x,y)|^{-1}, & \text{if } \bar y \in S^n_{\overline V}(x,y) \\ 0, & \text{otherwise} \end{cases}
\qquad
R^n_{\underline V}(\underline y|x,y) = \begin{cases} |S^n_{\underline V}(x,y)|^{-1}, & \text{if } \underline y \in S^n_{\underline V}(x,y) \\ 0, & \text{otherwise.} \end{cases} \tag{5.10}
$$
Note that
$$
\sum_{y \in T^n_{[V]}(x)} V(y|x) \cdot R^n_{\overline V}(\bar y|x,y) = \overline V{}^n(\bar y|x). \tag{5.11}
$$
To prove (5.11), we use the symmetry properties of the sets $T^n_{[V]}(x)$ and $T^n_{\overline V}(x)$ and conclude that the sum at the left-hand side of (5.11) is the same for all $\bar y \in T^n_{\overline V}(x)$ and that this sum is equal to zero if $\bar y \notin T^n_{\overline V}(x)$. Therefore, this sum gives the uniform distribution on $T^n_{\overline V}(x)$, which is $\overline V{}^n(\bar y|x)$. Using (5.9)-(5.11), we write
$$
\begin{aligned}
P_d^{(n)}(x,V) &\ge \sum_{y \in T^n_{[V]}(x)} V(y|x) \cdot \chi\{ d(x,y) \ge D(x,y) \} \\
&= \sum_{y \in T^n_{[V]}(x)} \sum_{\bar y} V(y|x) \cdot R^n_{\overline V}(\bar y|x,y) \cdot \chi\{ d(x,y) \ge D(x,y) \} \\
&\ge \sum_{y \in T^n_{[V]}(x)} \sum_{\bar y} V(y|x) \cdot R^n_{\overline V}(\bar y|x,y) \cdot \chi\{ d(x,\bar y) \ge D(x,\bar y) \} \\
&= \sum_{\bar y} \overline V{}^n(\bar y|x) \cdot \chi\{ d(x,\bar y) \ge D(x,\bar y) \} \\
&= P_d^{(n)}(x, \overline V{}^n).
\end{aligned} \tag{5.12}
$$
Similar considerations lead to the inequality
$$
P_d^{(n)}(x,V) \le \epsilon_n + P_d^{(n)}(x, \underline V{}^n). \tag{5.13}
$$
Since the inequalities (5.12) and (5.13) are valid for all $x \in G_P^n$, we obtain
$$
P_d^{(n)}(\overline V{}^n) \le P_d^{(n)}(V) \le \epsilon_n + P_d^{(n)}(\underline V{}^n)
$$
and prove (5.4). Q.E.D.

6 Proof of the Permutation Lemma

6.1 Basic Ideas of the Proof

The proof of the permutation lemma can be presented as a result of several sequential steps, which are described in Subsections 6.2-6.6. We introduce a combinatorial broadcast channel $\{F^n(y,y'|x)\}$, which has a codeword $x \in G_P^n$ at the input and the elements of the sets $T^n_V(x)$ and $T^n_W(x)$ at the output. The probabilities $F^n(y,y'|x)$ are assigned in such a way that
$$
\sum_{y'} F^n(y,y'|x) = V^n(y|x), \qquad \sum_{y} F^n(y,y'|x) = W^n(y'|x), \tag{6.1}
$$
and
$$
F^n(y,y'|x) > 0 \iff d_H(y,y') = k, \tag{6.2}
$$
where
$$
k = \min_{y' \in T^n_W(x)} d_H(y,y'), \quad y \in T^n_V(x). \tag{6.3}
$$
If $R > I(P,V)$, then the decoder $D$, which receives $y$, cannot realize a reliable decoding procedure, i.e., with high probability there exists an incorrect codeword $\hat x$ such that
$$
d(\hat x, y) \le d(x,y). \tag{6.4}
$$
Using (6.1) and the inequality $d(P,V) \le d(P,W)$ we conclude that
$$
d(x,y) \le d(x,y'), \tag{6.5}
$$
i.e., the conditions for the decoder $D'$, which receives $y'$, are worse than the conditions for the decoder $D$ from the point of view of the correct codeword $x$. Our intention is to prove that the conditions for $D'$, as a rule, are not worse than the conditions for $D$ from the point of view of the incorrect codeword $\hat x$, i.e., as a rule,
$$
d(\hat x, y') \le d(\hat x, y). \tag{6.6}
$$
Then, using (6.4)-(6.6), we conclude that, as a rule, $d(\hat x, y') \le d(x,y')$, and $D'$ cannot do better than $D$.

The result is named a 'permutation lemma' since $y \in T^n_V(x)$ and $y' \in T^n_W(x)$ can be obtained one from the other by permutations of the components. This statement follows from the equation $PV = PW$, which is very important for our considerations. The minimal number of pairwise permutations transforming $y$ to some element of the set $T^n_W(x)$ is equal to $k/2$, where $k$ is defined in (6.3). In Subsection 6.2 we describe the structure of these permutations and note that there is a complementary property: for any given $x \in X$, the set $Y$ can be split into two disjoint subsets, $Y_x^+$ and $Y_x^-$, such that
$$
x_j = x,\ y_j \ne y_j' \implies y_j \in Y_x^-,\ y_j' \in Y_x^+ \ \text{ or } \ y_j \in Y_x^+,\ y_j' \in Y_x^-. \tag{6.7}
$$
On the basis of this fact, we represent $F^n$ as a combination of four conditional distributions, $Q^n$, $U^{n-k}$, $\bar V^k$, and $\bar W^k$, in Subsection 6.3. The distribution $U^{n-k}$ is used to assign the coinciding symbols in $y$ and $y'$, and we interpret it as a common element of $V^n$ and $W^n$. The distributions $\bar V^k$ and $\bar W^k$ are used to assign the non-coinciding symbols in $y$ and $y'$, and we interpret them as individual contributions of $V^n$ and $W^n$. A binary vector $z$ of length $n$ and Hamming weight $k$, which determines the positions $j$ such that $y_j \ne y_j'$, is assigned in accordance with the distribution $Q^n$. At the end of Subsection 6.3 we represent the transformation of $y \in T^n_V(x)$ to $y' \in T^n_W(x)$ as an interchange of the individual contributions of $V^n$ and $W^n$.

Since $d(P,V) \le d(P,W)$, all the transformations of $y \in T^n_V(x)$ to $y' \in T^n_W(x)$ do not decrease the value of the distortion function for the codeword $x$. If $\hat x \in G_P^n \setminus \{x\}$ is an incorrect codeword, then a particular transformation either increases or decreases the value of the distortion function for $\hat x$. The result depends on the distributions of the components $y_j \ne y_j'$ located at positions where $x_j \ne \hat x_j$. In Subsection 6.4 we show that there are distributions which definitely do not increase this value. Therefore, we conclude that $d(x,y') \ge d(\hat x,y')$ if $d(x,y) \ge d(\hat x,y)$, and we call these transformations of $y$ to $y'$ permutations conserving relations between distortions. The last step of the proof is to show that the decoder $D$ can restrict the code $G_P^n$ in such a way that the distribution on the components where $y$ is transformed to $y'$ is fixed. On the other hand, the restriction of the code keeps the converse statement to the coding theorem for the channel $V$. These considerations are given in Subsection 6.5. A very short formal proof of the permutation lemma is given in Subsection 6.6. We recommend reading Subsections 6.2-6.5 simultaneously with the material given in the Appendix, where the main steps of the proof are illustrated numerically.

6.2 The Structure of Minimal Permutations Between $T^n_V(x)$ and $T^n_W(x)$

Let
$$
Y_x^+ = \{ y \in Y : V_x(y) > W_x(y) \}, \qquad Y_x^- = \{ y \in Y : V_x(y) < W_x(y) \}, \tag{6.8}
$$
$$
k_x(y) = \begin{cases} nP_x \cdot (V_x(y) - W_x(y)), & \text{if } y \in Y_x^+ \\ 0, & \text{if } y \in Y_x^-, \end{cases}
\qquad
k(y) = \sum_x k_x(y), \qquad k_x = \sum_y k_x(y).
$$
For binary-input channels, the equation $PV = PW$ means that
$$
P_0 \cdot V_0(y) + P_1 \cdot V_1(y) = P_0 \cdot W_0(y) + P_1 \cdot W_1(y), \quad \text{for all } y \in Y.
$$
Therefore, as it is easy to see,
$$
Y_0^- = Y_1^+, \qquad Y_0^+ = Y_1^-, \qquad k = \sum_y k(y) = \sum_x k_x,
$$
where the parameter $k$ is defined in (6.3).

Let us suppose that we want to get a vector $y' \in T^n_W(x)$ from $y \in T^n_V(x)$ using a minimal number of permutations of the components (these permutations are referred to as minimal permutations). If $y \in Y_0^+$, then $V_0(y) > W_0(y)$, $V_1(y) < W_1(y)$, and
$$
nP_0 \cdot V_0(y) - nP_0 \cdot W_0(y) = nP_1 \cdot W_1(y) - nP_1 \cdot V_1(y)
$$
because of the condition $PV = PW$. A similar statement is valid for all $y \in Y_1^+$. Therefore, we should select $k_0(y)$ indices $j$ such that $(x_j,y_j) = (0,y)$ for all $y \in Y_0^+$ and $k_1(y)$ indices $j$ such that $(x_j,y_j) = (1,y)$ for all $y \in Y_1^+$. Then $k_0$ and $k_1$ indices $j$ such that $x_j = 0$ and $x_j = 1$ will be selected, and $k_0 = k_1$. The components of the vector $y$ located at the $k_0$ positions where $x_j = 0$ should be replaced with the components located at the $k_1$ positions where $x_j = 1$, and vice versa. As a result of this procedure, we obtain a vector $y' \in T^n_W(x)$ with the following properties:
$$
n_{xyy'}(x,y,y) = nP_x \cdot \min\{ V_x(y), W_x(y) \}, \qquad
\sum_{y' \ne y} n_{xyy'}(x,y,y') = \begin{cases} nP_x \cdot (V_x(y) - W_x(y)), & \text{if } y \in Y_x^+ \\ nP_x \cdot (W_x(y) - V_x(y)), & \text{if } y \in Y_x^-, \end{cases}
$$
where $n_{xyy'}(x,y,y')$ is the number of indices $j$ such that $x_j = x$, $y_j = y$, and $y_j' = y'$.
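The minimal-permutation construction of this subsection can be written out directly: compute the counts $k_x(y)$, pick that many positions for each pair $(x,y)$ with $y \in Y_x^+$, and swap the selected components between the $x = 0$ and $x = 1$ positions. A sketch under the paper's binary-input assumption (function and variable names are mine):

```python
import numpy as np

def minimal_permutation(x, y, V, W, P, rng):
    """Transform y in T_V^n(x) into some y' in T_W^n(x) by k/2 pairwise swaps,
    assuming a binary input alphabet and PV = PW (Subsection 6.2)."""
    n = len(x)
    y_new = y.copy()
    selected = {0: [], 1: []}
    for a in (0, 1):
        for b in range(V.shape[1]):
            k_ab = round(n * P[a] * (V[a, b] - W[a, b]))
            if k_ab > 0:                               # b belongs to Y_a^+
                pos = np.flatnonzero((x == a) & (y == b))
                selected[a].extend(rng.choice(pos, size=k_ab, replace=False))
    assert len(selected[0]) == len(selected[1])        # k_0 = k_1
    # interchange the selected components between the x = 0 and x = 1 positions
    i0, i1 = np.array(selected[0]), np.array(selected[1])
    y_new[i0], y_new[i1] = y[i1], y[i0]
    return y_new

rng = np.random.default_rng(1)
P = np.array([0.5, 0.5])
V = np.array([[0.474, 0.240, 0.135, 0.151],
              [0.151, 0.135, 0.240, 0.474]])
W = np.array([[0.375, 0.375, 0.125, 0.125],
              [0.250, 0.000, 0.250, 0.500]])
x = np.array([0] * 1000 + [1] * 1000)
y = np.concatenate([np.repeat(np.arange(4), [474, 240, 135, 151]),
                    np.repeat(np.arange(4), [151, 135, 240, 474])])
y_prime = minimal_permutation(x, y, V, W, P, rng)
```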

6.3 Decomposition of the Distributions $V^n$ and $W^n$

Convention 6.1 ($Z$-Convention): For any binary vector $z = (z_1,\dots,z_n)$, we introduce the sets
$$
\bar Z = \{ j : z_j = 0 \}, \qquad Z = \{ j : z_j = 1 \},
$$
and associate $z$ with the pair $(\bar Z, Z)$. For any vector $u$, we write $u = (u_{\bar Z}, u_Z)$, where $u_{\bar Z}$ is the vector composed of the components $u_j$, $j \in \bar Z$, and $u_Z$ is the vector composed of the components $u_j$, $j \in Z$ (in further considerations we substitute $x$, $y$, and $y'$ for $u$).

Let us define the conditional probability distribution $Q = \{Q_x(z)\}$ on $\{0,1\}$ in such a way that
$$
Q_x(z) = \begin{cases} (nP_x - k_x)/nP_x, & \text{if } z = 0 \\ k_x/nP_x, & \text{if } z = 1 \end{cases}
$$
for all $x \in X$. Since $Q$ is a type on $\{0,1\}^n$ given $x \in T_P^n$, we can also refer to the set
$$
T^n_Q(x) = \{ z \in \{0,1\}^n : \mathrm{Comp}(x,z) = \{ P_x \cdot Q_x(z) \} \}
$$
and define a combinatorial channel $Q^n = \{Q^n(z|x)\}$ as the uniform distribution on $T^n_Q(x)$, i.e.,
$$
Q^n(z|x) = \begin{cases} |T^n_Q(x)|^{-1}, & \text{if } z \in T^n_Q(x) \\ 0, & \text{otherwise.} \end{cases}
$$
We also define the conditional probability distributions $U = \{U_x(y)\}$, $\bar V = \{\bar V_x(y)\}$, and $\bar W = \{\bar W_x(y)\}$ in such a way that
$$
U_x(y) = \frac{nP_x \cdot \min\{V_x(y), W_x(y)\}}{nP_x - k_x}, \qquad
\bar V_x(y) = \frac{nP_x \cdot V_x(y) - (nP_x - k_x) \cdot U_x(y)}{k_x}, \qquad
\bar W_x(y) = \frac{nP_x \cdot W_x(y) - (nP_x - k_x) \cdot U_x(y)}{k_x} \tag{6.9}
$$
for all $x \in X$, $y \in Y$. Given $x \in T_P^n$ and $z \in T^n_Q(x)$, we refer to the sets
$$
T^{n-k}_U(x_{\bar Z}) = \{ y_{\bar Z} \in Y^{n-k} : \mathrm{Comp}(x_{\bar Z}, y_{\bar Z}) = \{ (nP_x - k_x) \cdot U_x(y)/(n-k) \} \},
$$
$$
T^k_{\bar V}(x_Z) = \{ y_Z \in Y^k : \mathrm{Comp}(x_Z, y_Z) = \{ k_x \cdot \bar V_x(y)/k \} \}, \qquad
T^k_{\bar W}(x_Z) = \{ y_Z' \in Y^k : \mathrm{Comp}(x_Z, y_Z') = \{ k_x \cdot \bar W_x(y)/k \} \},
$$
where we use the definition of Comp, given in Section 2, as the set of ratios of the number of entries $(x,y)$ in a particular pair of vectors and the length of these vectors. Let us introduce the combinatorial channels $U^{n-k} = \{U^{n-k}(y_{\bar Z}|x_{\bar Z})\}$, $\bar V^k = \{\bar V^k(y_Z|x_Z)\}$, and $\bar W^k = \{\bar W^k(y_Z'|x_Z)\}$ as the uniform distributions on the sets $T^{n-k}_U(x_{\bar Z})$, $T^k_{\bar V}(x_Z)$, and $T^k_{\bar W}(x_Z)$, i.e.,
$$
U^{n-k}(y_{\bar Z}|x_{\bar Z}) = \begin{cases} |T^{n-k}_U(x_{\bar Z})|^{-1}, & \text{if } y_{\bar Z} \in T^{n-k}_U(x_{\bar Z}) \\ 0, & \text{otherwise} \end{cases}
\qquad
\bar V^k(y_Z|x_Z) = \begin{cases} |T^k_{\bar V}(x_Z)|^{-1}, & \text{if } y_Z \in T^k_{\bar V}(x_Z) \\ 0, & \text{otherwise} \end{cases}
\qquad
\bar W^k(y_Z'|x_Z) = \begin{cases} |T^k_{\bar W}(x_Z)|^{-1}, & \text{if } y_Z' \in T^k_{\bar W}(x_Z) \\ 0, & \text{otherwise.} \end{cases}
$$
Then we can represent the process of generating the vectors $y \in T^n_V(x)$ and $y' \in T^n_W(x)$, located at the Hamming distance $k$, as a transmission of $x$ over a combinatorial broadcast channel (Fig. 2), defined by the probabilities
$$
F^n(y,y'|x) = \sum_z Q^n(z|x) \cdot f^{n-k}(y_{\bar Z}, y_{\bar Z}'|x) \cdot \bar V^k(y_Z|x_Z) \cdot \bar W^k(y_Z'|x_Z),
$$
where
$$
f^{n-k}(y_{\bar Z}, y_{\bar Z}'|x) = U^{n-k}(y_{\bar Z}|x_{\bar Z}) \cdot \chi\{ y_{\bar Z}' = y_{\bar Z} \}.
$$
These probabilities satisfy (6.1), (6.2), and the triples
$$
( y_{\bar Z} = y_{\bar Z}',\ y_Z,\ y_Z' ) \in ( T^{n-k}_U(x_{\bar Z}),\ T^k_{\bar V}(x_Z),\ T^k_{\bar W}(x_Z) ) \tag{6.10}
$$
are generated at the output of the channel. The decoder $D$ receives $(y_{\bar Z}, y_Z)$, while the decoder $D'$ receives $(y_{\bar Z}, y_Z')$ (note that $z$ is unknown to $D$ and $D'$). From the point of view of $D$, transmission of the codeword $x$ over the combinatorial channel $V^n$ can be represented as a process consisting of two steps: 1) the channel assigns a binary vector $z \in T^n_Q(x)$; 2) the channel distributes $n-k$ components of the vector $y$ corresponding to the components 0 of the vector $z$ in accordance with the distribution $U^{n-k}$, and $k$ components of the vector $y$ corresponding to the components 1 of the vector $z$ in accordance with the distribution $\bar V^k$. From the point of view of $D'$, transmission of the codeword $x$ over the combinatorial channel $W^n$ can be described in the same way, but the distribution $\bar W^k$ should be used at the second step instead of $\bar V^k$.

Note that the distributions $\bar W$ and $\bar V$ have the complementary property
$$
\bar V_0(y) = \bar W_1(y), \qquad \bar W_0(y) = \bar V_1(y), \tag{6.11}
$$
for all $y \in Y$. Note also that the average distortion functions $d(P,V)$ and $d(P,W)$ can be written as
$$
d(P,V) = \sum_y \frac{nP_1 - k_1}{n} \cdot U_1(y) \cdot d_1(y) + \sum_y \frac{k_1}{n} \cdot \bar V_1(y) \cdot d_1(y), \qquad
d(P,W) = \sum_y \frac{nP_1 - k_1}{n} \cdot U_1(y) \cdot d_1(y) + \sum_y \frac{k_1}{n} \cdot \bar W_1(y) \cdot d_1(y),
$$
where we used the equations $d_0(y) = 0$ for all $y \in Y$. Therefore, the restriction $d(P,V) \le d(P,W)$ can be represented as the inequality
$$
\sum_y (\bar V_1(y) - \bar W_1(y)) \cdot d_1(y) \le 0. \tag{6.12}
$$

6.4 Permutations Conserving Relations Between Distortions for Binary-Input Channels

In this subsection we show the idea for a simplified case and then extend the considerations to a more general case. Let $z \in T^n_Q(x)$, $y \in T^n_V(x)$, and $y' \in T^n_W(x)$ be given. Suppose also that $\hat x \in G_P^n \setminus \{x\}$ is fixed. Let
$$
k_{x\hat x} = \sum_{j: z_j = 1} \chi\{ (x_j, \hat x_j) = (x, \hat x) \}, \qquad
k_{x\hat x}(y) = \sum_{j: z_j = 1} \chi\{ (x_j, \hat x_j, y_j) = (x, \hat x, y) \}, \qquad
k_{x\hat x}(y') = \sum_{j: z_j = 1} \chi\{ (x_j, \hat x_j, y_j') = (x, \hat x, y') \} \tag{6.13}
$$
denote the number of entries $(x,\hat x)$ in the pair $(x_Z, \hat x_Z)$, the number of entries $(x,\hat x,y)$ in the triple $(x_Z, \hat x_Z, y_Z)$, and the number of entries $(x,\hat x,y')$ in the triple $(x_Z, \hat x_Z, y_Z')$, respectively.

Proposition 6.2: Let $y \in T^n_V(x)$ and $y' \in T^n_W(x)$ be connected by (6.10). If $\hat x \in G_P^n$ and $z \in T^n_Q(x)$ are chosen in such a way that
$$
k_{x\hat x}(y) = k_{x\hat x} \cdot \bar V_x(y), \quad \text{for all } y \in Y, \qquad
k_{x\hat x}(y') = k_{x\hat x} \cdot \bar W_x(y'), \quad \text{for all } y' \in Y, \tag{6.14}
$$
then
$$
d(\hat x, y) \le d(x,y) \implies d(\hat x, y') \le d(x,y').
$$
Proof: Let
$$
\Delta = (d(x,y') - d(\hat x,y')) - (d(x,y) - d(\hat x,y)). \tag{6.15}
$$
Since
$$
\begin{Bmatrix} d(x,y) \\ d(x,y') \end{Bmatrix} = d(x_{\bar Z}, y_{\bar Z}) + \begin{Bmatrix} d(x_Z, y_Z) \\ d(x_Z, y_Z') \end{Bmatrix}, \qquad
\begin{Bmatrix} d(\hat x,y) \\ d(\hat x,y') \end{Bmatrix} = d(\hat x_{\bar Z}, y_{\bar Z}) + \begin{Bmatrix} d(\hat x_Z, y_Z) \\ d(\hat x_Z, y_Z') \end{Bmatrix},
$$
we change the order of summands in (6.15) and write
$$
\Delta = (d(x_Z, y_Z') - d(x_Z, y_Z)) - (d(\hat x_Z, y_Z') - d(\hat x_Z, y_Z)).
$$
Thus,
$$
\begin{aligned}
\Delta &= \Big( \sum_{y'} k_{10}(y') \cdot d_1(y') - \sum_y k_{10}(y) \cdot d_1(y) \Big) - \Big( \sum_{y'} k_{01}(y') \cdot d_1(y') - \sum_y k_{01}(y) \cdot d_1(y) \Big) \\
&= \sum_y k_{10} \cdot (\bar W_1(y) - \bar V_1(y)) \cdot d_1(y) - \sum_y k_{01} \cdot (\bar W_0(y) - \bar V_0(y)) \cdot d_1(y) \\
&= \sum_y (k_{10} + k_{01}) \cdot (\bar W_1(y) - \bar V_1(y)) \cdot d_1(y) \ge 0.
\end{aligned} \tag{6.16}
$$
The first equation in (6.16) follows from the note that we can consider only non-coinciding components of $x$ and $\hat x$. The second equation follows from (6.14). Then we have used (6.11) and (6.12). Hence,
$$
d(x,y') - d(\hat x,y') = d(x,y) - d(\hat x,y) + \Delta \ge 0.
$$
Q.E.D.

The statement below generalizes the considerations. The proof is omitted since it is similar to the proof of Proposition 6.2.

Proposition 6.3: Let $y \in T^n_V(x)$ and $y' \in T^n_W(x)$ be connected by (6.10). If $\hat x \in G_P^n$ and $z \in T^n_Q(x)$ are chosen in such a way that
$$
|k_{x\hat x}(y) - k_{x\hat x} \cdot \bar V_x(y)| \le \lambda_n \cdot k_{x\hat x}, \quad \text{for all } (x,\hat x,y) \in X \times X \times Y, \qquad
|k_{x\hat x}(y') - k_{x\hat x} \cdot \bar W_x(y')| \le \lambda_n \cdot k_{x\hat x}, \quad \text{for all } (x,\hat x,y') \in X \times X \times Y, \tag{6.17}
$$
where $k_{x\hat x}$, $k_{x\hat x}(y)$, and $k_{x\hat x}(y')$ are defined in (6.13), and
$$
\sum_y (\bar W_1(y) - \bar V_1(y)) \cdot d_1(y) \ge 2 \lambda_n d_{\max}, \tag{6.18}
$$
then
$$
d(\hat x, y) \le d(x,y) \implies d(\hat x, y') \le d(x,y').
$$

6.5 Restrictions of the Code $G_P^n$

In this subsection we deal with the probabilistic ensemble $\{ X \times Y \times Z;\ P_x \cdot V_x(y) \cdot \hat Q_{x,y}(z) \}$, where $Z = \{0,1\}$,
$$
\hat Q_{x,y}(z) = \begin{cases} 1 - k_x(y)/(nP_x \cdot V_x(y)), & \text{if } z = 0 \\ k_x(y)/(nP_x \cdot V_x(y)), & \text{if } z = 1, \end{cases} \tag{6.19}
$$
and the parameters $k_x(y)$, $x \in X$, $y \in Y$, are defined in (6.8). We consider the distribution $\hat Q = \{\hat Q_{x,y}(z)\}$ as a conditional type on $\{0,1\}^n$ given $y \in T^n_V(x)$ and $x \in T_P^n$ and introduce the set
$$
T^n_{\hat Q}(x,y) = \{ z \in \{0,1\}^n : n_{xyz}(x,y,z) = nP_x \cdot V_x(y) \cdot \hat Q_{x,y}(z) \text{ for all } (x,y,z) \in X \times Y \times Z \},
$$
where $n_{xyz}(x,y,z)$ denotes the number of entries $(x,y,z)$ in the triple $(x,y,z)$. Let the uniform distributions on $T^n_{\hat Q}(x,y)$ be given as
$$
\hat Q^n(z|x,y) = \begin{cases} |T^n_{\hat Q}(x,y)|^{-1}, & \text{if } z \in T^n_{\hat Q}(x,y) \\ 0, & \text{otherwise.} \end{cases}
$$
Let us fix $\hat x \in G_P^n$, $y \in T^n_V(x)$, and $z \in T^n_{\hat Q}(x,y)$. In Subsection 6.2 we partitioned the output alphabet $Y$ into two disjoint subsets, $Y_0^+$ and $Y_1^+$ (see (6.8)), and we can say either $y \in Y_0^+$ or $y \in Y_1^+$ for any $y \in Y$. Let $x(y) \in \{0,1\}$ be defined in such a way that
$$
x(y) = x \iff y \in Y_x^+, \tag{6.20}
$$
and let
$$
k_{x\hat x}(y) = \sum_{j: z_j = 1} \chi\{ (\hat x_j, y_j) = (\hat x, y) \}, \quad x = x(y). \tag{6.21}
$$
Besides, let
$$
k_{x\hat x} = \sum_{y \in Y_x^+} \sum_{j: z_j = 1} \chi\{ (\hat x_j, y_j) = (\hat x, y) \}. \tag{6.22}
$$
If $k_{x\hat x} > 0$, then $\{ k_{x\hat x}(y)/k_{x\hat x},\ y \in Y_x^+ \}$ are probability distributions on $Y_x^+$ for all pairs $(x,\hat x) \in X \times X$. In the definition below, we construct a subcode of $G_P^n$ consisting of codewords for which these distributions are close to $\{\bar V_x(y),\ y \in Y\}$ for all $\hat x \in X$.

Definition 6.4: The code $G_P^n(y_Z)$ will be referred to as a $V$-restricted code given $y \in T^n_V(x)$ and $z \in T^n_{\hat Q}(x,y)$ if it consists of the codewords $\hat x \in G_P^n$ such that
$$
|k_{x\hat x}(y) - k_{x\hat x} \cdot \bar V_x(y)| \le \lambda_n \cdot k_{x\hat x}, \quad \text{for all } (x,\hat x,y) \in X \times X \times Y, \tag{6.23}
$$
where the parameters $k_{x\hat x}(y)$ and $k_{x\hat x}$ are defined in (6.21), (6.22).

Proposition 6.5: Let
$$
\hat P_d^{(n)}(x,V^n) = \sum_{y,z} V^n(y|x) \cdot \hat Q^n(z|x,y) \cdot \chi\{ d(x,y) \ge D(x,y|z) \},
$$
where
$$
D(x,y|z) = \min_{\hat x \in G_P^n(y_Z) \setminus \{x\}} d(\hat x, y). \tag{6.24}
$$
Then there exist sequences $\{\lambda_n\}$ such that
$$
\lambda_n \to 0, \quad \sqrt{n}\,\lambda_n \to \infty, \quad \text{as } n \to \infty, \tag{6.25}
$$
and that the sequences $\{P_d^{(n)}(V^n)\}$ and $\{\hat P_d^{(n)}(V^n)\}$ approximate each other, i.e.,
$$
\Upsilon(\{P_d^{(n)}(V^n)\}) = \Upsilon(\{\hat P_d^{(n)}(V^n)\}). \tag{6.26}
$$
Proof: First of all, we write
$$
\Upsilon(\{\hat P_d^{(n)}(V^n)\}) \le \Upsilon(\{P_d^{(n)}(V^n)\}), \tag{6.27}
$$
since $x \in G_P^n(y_Z)$ for all $y \in T^n_V(x)$ and $z \in T^n_{\hat Q}(x,y)$. Using (6.8), (6.19), and (6.20) we note that $\hat Q_{x,y}(1) > 0$ only if $x = x(y)$. Therefore, we can introduce the following conditional probability distribution on $\{0,1\}$:
$$
\tilde Q_y(z) = \hat Q_{x,y}(z), \quad \text{where } x = x(y).
$$
The distribution $\tilde Q = \{\tilde Q_y(z)\}$ can also be considered as a conditional type on $\{0,1\}^n$ given $y \in T^n_{PV}$, i.e., we refer to the set
$$
T^n_{\tilde Q}(y) = \{ z \in \{0,1\}^n : \mathrm{Comp}(y,z) = \{ PV(y) \cdot \tilde Q_y(z) \} \}.
$$

Let the uniform distributions on $T^n_{\tilde Q}(y)$ be given as
$$
\tilde Q^n(z|y) = \begin{cases} |T^n_{\tilde Q}(y)|^{-1}, & \text{if } z \in T^n_{\tilde Q}(y) \\ 0, & \text{otherwise.} \end{cases}
$$
Let us fix $V$-restricted codes for all $z \in T^n_{\tilde Q}(y)$ in the same way as for the vectors $z \in T^n_{\hat Q}(x,y)$ and define the decoding error probability for the codeword $x$ as
$$
\tilde P_d^{(n)}(x,V^n) = \sum_{y,z} V^n(y|x) \cdot \tilde Q^n(z|y) \cdot \chi\{ d(x,y) \ge D(x,y|z) \},
$$
where the function $D(x,y|z)$ is defined in (6.24). Note that
$$
\tilde P_d^{(n)}(x,V^n) = \sum_{y,z} V^n(y|x) \cdot \tilde Q^n(z|y) \cdot \chi\{ x \notin G_P^n(y_Z) \}
+ \sum_{y,z} V^n(y|x) \cdot \tilde Q^n(z|y) \cdot \chi\{ x \in G_P^n(y_Z) \} \cdot \chi\{ d(x,y) \ge D(x,y|z) \} \tag{6.28}
$$
and
$$
\hat P_d^{(n)}(x,V^n) \ge \sum_{y,z} V^n(y|x) \cdot \tilde Q^n(z|y) \cdot \chi\{ x \in G_P^n(y_Z) \} \cdot \chi\{ d(x,y) \ge D(x,y|z) \}, \tag{6.29}
$$
since the vectors $z \in T^n_{\tilde Q}(y)$ put stronger restrictions on the incorrect codewords included into the collection $G_P^n(y_Z)$ for all $y \in T^n_V(x)$ compared to the vectors $z \in T^n_{\hat Q}(x,y)$. However, using the same arguments as in Lemma 4.1, we conclude that there exist sequences $\{\lambda_n\}$ satisfying (6.25) such that
$$
\sum_{y,z} V^n(y|x) \cdot \tilde Q^n(z|y) \cdot \chi\{ x \in G_P^n(y_Z) \} \ge 1 - \mu_n, \tag{6.30}
$$
where $\{\mu_n\}$ is a sequence, depending on $\{\lambda_n\}$, such that $\mu_n \to 0$ as $n \to \infty$. Thus, combining (6.28)-(6.30), we conclude that
$$
\Upsilon(\{\tilde P_d^{(n)}(V^n)\}) \le \Upsilon(\{\hat P_d^{(n)}(V^n)\}). \tag{6.31}
$$
The condition $R > I(P,V)$ does not give an opportunity to partition the space consisting of all possible pairs $(y,z)$, where $y \in T^n_{PV}$ and $z \in T^n_{\tilde Q}(y)$. Therefore,
$$
\Upsilon(\{P_d^{(n)}(V^n)\}) \le \Upsilon(\{\tilde P_d^{(n)}(V^n)\}). \tag{6.32}
$$
As a result of (6.31) and (6.32), we have
$$
\Upsilon(\{P_d^{(n)}(V^n)\}) \le \Upsilon(\{\hat P_d^{(n)}(V^n)\}), \tag{6.33}
$$
and we complete the proof using (6.27) and (6.33). Q.E.D.

Note that $x \in G_P^n(y_Z)$ and that, if $\hat x \in G_P^n(y_Z)$, then the first group of inequalities in (6.17) is satisfied. Similar considerations are valid for the decoder $D'$, which receives a vector $y' \in T^n_W(x)$. To avoid more complex notation, we distinguish between the parameters of $D$ and $D'$ by writing $y'$ instead of $y$, both for the received vector and for its components. Let
$$
\hat Q_{x,y'}(z) = \begin{cases} 1 - k_x(y')/(nP_x \cdot W_x(y')), & \text{if } z = 0 \\ k_x(y')/(nP_x \cdot W_x(y')), & \text{if } z = 1, \end{cases}
$$

$$
T^n_{\hat Q}(x,y') = \{ z \in \{0,1\}^n : n_{xy'z}(x,y',z) = nP_x \cdot W_x(y') \cdot \hat Q_{x,y'}(z) \text{ for all } (x,y',z) \in X \times Y \times Z \},
$$
$$
\hat Q^n(z|x,y') = \begin{cases} |T^n_{\hat Q}(x,y')|^{-1}, & \text{if } z \in T^n_{\hat Q}(x,y') \\ 0, & \text{otherwise.} \end{cases}
$$
Definition 6.6: The code $G_P^n(y_Z')$ will be referred to as a $W$-restricted code given $y' \in T^n_{PW}$ and $z \in T^n_{\hat Q}(x,y')$ if it consists of the codewords $\hat x \in G_P^n$ such that
$$
|k_{x\hat x}(y') - k_{x\hat x} \cdot \bar W_x(y')| \le \lambda_n \cdot k_{x\hat x}, \quad \text{for all } (x,\hat x,y') \in X \times X \times Y, \tag{6.34}
$$
where
$$
k_{x\hat x}(y') = \sum_{j: z_j = 1} \chi\{ (\hat x_j, y_j') = (\hat x, y') \}, \quad x = x(y'), \qquad
k_{x\hat x} = \sum_{y' \in Y_x^-} \sum_{j: z_j = 1} \chi\{ (\hat x_j, y_j') = (\hat x, y') \},
$$
and
$$
x(y') = x \iff y' \in Y_x^-.
$$
Proposition 6.7: Let
$$
\hat P_d^{(n)}(x,W^n) = \sum_{y',z} W^n(y'|x) \cdot \hat Q^n(z|x,y') \cdot \chi\{ d(x,y') \ge D(x,y'|z) \},
$$
where
$$
D(x,y'|z) = \min_{\hat x \in G_P^n(y_Z') \setminus \{x\}} d(\hat x, y').
$$
Then there exist sequences $\{\lambda_n\}$ such that
$$
\lambda_n \to 0, \quad \sqrt{n}\,\lambda_n \to \infty, \quad \text{as } n \to \infty, \tag{6.35}
$$
and that the sequence $\{\hat P_d^{(n)}(W^n)\}$ approximates the sequence $\{\hat P_d^{(n)}(V^n)\}$, i.e.,
$$
\Upsilon(\{\hat P_d^{(n)}(V^n)\}) \le \Upsilon(\{\hat P_d^{(n)}(W^n)\}). \tag{6.36}
$$
Proof: Let us suppose that $d(x,y) \ge D(x,y|z)$ for some $y \in T^n_V(x)$ and $z \in T^n_{\hat Q}(x,y)$. Then there exists an incorrect codeword $\hat x$ such that
$$
\hat x \in G_P^n(y_Z) \quad \text{and} \quad d(\hat x, y) \le d(x,y). \tag{6.37}
$$
Let the parameters $k_{x\hat x}$, defined in (6.22), correspond to this codeword. We generate $y' \in T^n_W(x)$ such that (6.10) is valid, i.e., we set $y_{\bar Z}' = y_{\bar Z}$ and distribute the components of the vector $y_Z' \in T^k_{\bar W}(x_Z)$. It means that we distribute $k_0$ symbols $y \in Y_0^-$ on the $k_{00}$ and $k_{01}$ positions $j$ where $x_j = 0$, and $k_1$ symbols $y \in Y_1^-$ on the $k_{10}$ and $k_{11}$ positions $j$ where $x_j = 1$. However, these positions are fixed and, typically, we obtain the same conditional distribution on the $k_{x0}$ and $k_{x1}$ positions as a result of this procedure (note that the main difficulty of the previous analysis was the point that there are exponentially many incorrect codewords, and we could not fix one of them). Therefore, we need to assign $\lambda_n$ in such a way that the distributions which do not satisfy (6.34) will be classified as large deviations. We refer to the arguments leading to the Delta-convention again [2, Convention 2.11] and conclude that we can take a sequence $\{\lambda_n\}$ satisfying (6.35). Then we note that the conditions (6.23) and (6.34) coincide with (6.17) for the codeword $\hat x$ satisfying (6.37). We suppose that
$$
d(P,V) \le d(P,W) - 2 \lambda_n d_{\max} \tag{6.38}
$$
and use the result of Proposition 6.3. The formal steps are given below:
$$
\begin{aligned}
\hat P_d^{(n)}(x,V^n) &= \sum_{y,z} V^n(y|x) \cdot \hat Q^n(z|x,y) \cdot \chi\{ \exists \hat x : \hat x \in G_P^n(y_Z) \text{ and } d(x,y) \ge d(\hat x,y) \} \\
&\to \sum_{y,z} V^n(y|x) \cdot \hat Q^n(z|x,y) \cdot \sum_{y_Z'} \bar W^k(y_Z') \cdot \chi\{ \exists \hat x : \hat x \in G_P^n(y_Z),\ \hat x \in G_P^n(y_Z'),\ \text{and } d(x,y) \ge d(\hat x,y) \} \\
&\le \sum_{y,z} V^n(y|x) \cdot \hat Q^n(z|x,y) \cdot \sum_{y_Z'} \bar W^k(y_Z') \cdot \chi\{ \exists \hat x : \hat x \in G_P^n(y_Z') \text{ and } d(x,y') \ge d(\hat x,y') \} \\
&= \sum_{y,z} V^n(y|x) \cdot \hat Q^n(z|x,y) \cdot \sum_{y_Z'} \bar W^k(y_Z') \cdot \chi\{ d(x,y') \ge D(x,y'|z) \} \\
&= \sum_{y',z} W^n(y'|x) \cdot \hat Q^n(z|x,y') \cdot \chi\{ d(x,y') \ge D(x,y'|z) \} \\
&= \hat P_d^{(n)}(x,W^n),
\end{aligned}
$$
where we also used the equations
$$
V^n(y|x) \cdot \hat Q^n(z|x,y) \cdot \bar W^k(y_Z') = Q^n(z|x) \cdot U^{n-k}(y_{\bar Z}) \cdot \bar V^k(y_Z) \cdot \bar W^k(y_Z') = W^n(y'|x) \cdot \hat Q^n(z|x,y') \cdot \bar V^k(y_Z).
$$
Finally, we note that the sequence $\{\lambda_n\}$ vanishes with $n$, and we can set $\lambda_n = 0$ in (6.38) when we formulate the result using the upsilon function. Q.E.D.

31

6.6 Formal Proof of the Permutation Lemma

The upsilon notation allows us to write the whole proof of the permutation lemma in one line:
$$
\Upsilon(\{P_d^{(n)}(V^n)\}) = \Upsilon(\{\hat P_d^{(n)}(V^n)\}) \le \Upsilon(\{\hat P_d^{(n)}(W^n)\}) = \Upsilon(\{P_d^{(n)}(W^n)\}),
$$
where we have used (6.26) and (6.36), i.e., (4.14) is valid. The proof of the theorem is obtained when we also write $\Upsilon(\{P_d^{(n)}(V)\}) =$ as the leftmost term of this line, and $= \Upsilon(\{P_d^{(n)}(W)\})$ as the rightmost term.

7 Acknowledgement

The author wishes to thank Prof. R. Johannesson for the possibility to work on the mismatched decoding problem in his department and for the help in preparing the manuscript. The author is also grateful to Prof. R. Ahlswede, Prof. N. Cai, Prof. I. Csiszar, Prof. G. Kaplan, Prof. A. Lapidoth, Prof. N. Merhav, Prof. P. Narayan, and Prof. S. Shamai (Shitz) for their interest in the results and fruitful discussions.


Appendix: An Example of Mismatched Decoding

Let
$$
Y = \{0,1,2,3\}, \qquad P_0 = P_1 = 1/2.
$$
The marginal probabilities $\{PW(y)\}$ and the values of the distortion function $\{d_x(y)\}$ are defined in Table 1 by the vector $PW$ and the matrix $d$, respectively. Note that the minimization of the distortion function $d$, when a fixed $P$-composition code is used, is equivalent to maximum likelihood decoding for the memoryless channel defined by the following matrix of transition probabilities:
$$
W_0 = \begin{bmatrix} 1/2 & 1/4 & 1/8 & 1/8 \\ 1/8 & 1/8 & 1/4 & 1/2 \end{bmatrix}.
$$
Indeed,
$$
-\log_2 W_0 = \begin{bmatrix} 1 & 2 & 3 & 3 \\ 3 & 3 & 2 & 1 \end{bmatrix},
$$
and the minimization of the distortion function
$$
d_x'(y) = -\log_2 W_{0,x}(y) \tag{A.1}
$$
is equivalent to the minimization of the function
$$
d_x(y) = c \cdot (d_x'(y) + f_x + \varphi(y)) \tag{A.2}
$$
for any $c > 0$, $\{f_x\}$, and $\{\varphi(y)\}$. If
$$
c = 1, \qquad f_0 = 0,\ f_1 = 2, \qquad \varphi(y) = \log_2 W_{0,0}(y), \quad \text{for } y = 0,1,2,3,
$$
then the values of the distortion function defined by (A.1), (A.2) coincide with the values given in the matrix $d$.

The matrix $W$ given in Table 1 was obtained by changing the first two columns of $W_0$. It does not affect the marginal probability distribution on $Y$, but increases the average distortion:
$$
d(P,W_0) = 0.5 \cdot (4 \cdot 1/8 + 3 \cdot 1/8 + 1 \cdot 1/4 + 0 \cdot 1/2) = 0.5625, \qquad
d(P,W) = 0.5 \cdot (4 \cdot 1/4 + 3 \cdot 0 + 1 \cdot 1/4 + 0 \cdot 1/2) = 0.625.
$$
Therefore, $V = W_0$ satisfies the restrictions
$$
PV = PW, \qquad d(P,V) \le d(P,W)
$$
and belongs to the minimization domain in (3.9). The transition probabilities $V = \{V_x(y)\}$, given in Table 1 by the matrix $V$, satisfy the same restrictions, since
$$
PV(0) = PV(3) = 0.5 \cdot (0.474 + 0.151) = 0.3125, \qquad
PV(1) = PV(2) = 0.5 \cdot (0.240 + 0.135) = 0.1875,
$$
$$
d(P,V) = 0.5 \cdot (4 \cdot 0.151 + 3 \cdot 0.135 + 1 \cdot 0.240 + 0 \cdot 0.474) = 0.6245.
$$
However,
$$
H(V|P) = 1.807 > H(W_0|P) = 1.213.
$$
Thus, in comparison with $W_0$, the distribution $V$ gives stronger restrictions on the maximal transmission rate, expressed as follows:
$$
C_d(P,W) = H(PW) - H(V|P).
$$
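The arithmetic of this example is easy to re-check numerically. A short verification sketch (not in the original paper) that recomputes the marginals and the average distortions quoted above:

```python
import numpy as np

P = np.array([0.5, 0.5])
d = np.array([[0, 0, 0, 0],
              [4, 3, 1, 0]], dtype=float)
W0 = np.array([[1/2, 1/4, 1/8, 1/8],
               [1/8, 1/8, 1/4, 1/2]])
W = np.array([[0.375, 0.375, 0.125, 0.125],
              [0.250, 0.000, 0.250, 0.500]])
V = np.array([[0.474, 0.240, 0.135, 0.151],
              [0.151, 0.135, 0.240, 0.474]])

avg_dist = lambda Q: np.sum(P[:, None] * Q * d)     # d(P, Q)
marginal = lambda Q: P @ Q                          # PQ(y)

print(marginal(W0), marginal(W), marginal(V))       # all equal (0.3125, 0.1875, 0.1875, 0.3125)
print(avg_dist(W0), avg_dist(W), avg_dist(V))       # 0.5625, 0.625, 0.6245
```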

It is easy to see that maximum likelihood decoding for the memoryless channel $V$ is also equivalent to the minimization of the distortion function $d = \{d_x(y)\}$. Indeed,
$$
-\log_2 V = \begin{bmatrix} 1.077 & 2.059 & 2.889 & 2.727 \\ 2.727 & 2.889 & 2.059 & 1.077 \end{bmatrix},
$$
and if
$$
c = 1.21, \qquad f_0 = 0,\ f_1 = 1.65, \qquad \varphi(y) = \log_2 V_0(y), \quad \text{for } y = 0,1,2,3,
$$
then
$$
d_x(y) = c \cdot ( -\log_2 V_x(y) + f_x + \varphi(y) ).
$$
Let us suppose that $n = 2000$ and that $x = 0^{1000} 1^{1000}$ is the transmitted codeword (we denote by $s^i$ the sequence consisting of $i$ repetitions of the symbol $s$). Then
$$
y = 0^{474}\, 1^{240}\, 2^{135}\, 3^{151}\, 0^{151}\, 1^{135}\, 2^{240}\, 3^{474} \in T^n_V(x),
$$
and all the other vectors of the set $T^n_V(x)$ are obtained by all permutations of the components $y_j$, $j \in J_0$, and all permutations of the components $y_j$, $j \in J_1$, where we denote the set of the first 1000 positions by $J_0$ and the set of the last 1000 positions by $J_1$. If we want to get a vector $y' \in T^n_W(x)$ from $y$ using the minimal number of pairwise permutations of the components, we have to select 99, 10, and 26 positions $j \in J_0$ where $y_j = 0$, 2, and 3, respectively, and 135 positions $j \in J_1$ where $y_j = 1$. The corresponding components should be interchanged. This procedure is illustrated in Table 2.

We can represent the transmission of the codeword $x$ over the channel $V^n$ as follows: 1) the channel assigns 135 positions $j \in J_0$ and 135 positions $j \in J_1$; 2) the channel distributes 99, 10, and 26 symbols 0, 2, and 3 on the selected positions of $J_0$ and generates 135 symbols 1 on the selected positions of $J_1$; 3) the channel distributes $474-99$, $240$, $135-10$, and $151-26$ symbols 0, 1, 2, and 3 on the remaining 865 positions of $J_0$, and 151, 240, and 474 symbols 0, 2, and 3 on the remaining 865 positions of $J_1$. The distributions of step 2) are given by $\bar V$, and the distributions of step 3) are given by $U$ (Table 1). The vectors $y' \in T^n_W(x)$, located at the Hamming distance $k = 270$ from $y$, will be obtained if we substitute $(J_1, J_0)$ for $(J_0, J_1)$ in this procedure. Then the distribution of step 2) will be given by $\bar W$.

We are interested in the case
$$
e^{nR} > \frac{2000!}{625!\,375!\,375!\,625!} \cdot \left( \frac{1000!}{474!\,135!\,240!\,151!} \right)^{-2}
$$

and suppose that the decoder $D$, having received one of
$$
|T^n_V(x)| = \left( \frac{1000!}{474!\,135!\,240!\,151!} \right)^2
$$
vectors $y \in T^n_V(x)$, can assign one of
$$
|T^n_{\hat Q}(x,y)| = \binom{474}{99} \binom{135}{135} \binom{135}{10} \binom{151}{26}
$$
binary vectors $z \in T^n_{\hat Q}(x,y)$ of Hamming weight 270 such that there are 99, 135, 10, and 26 positions $j$ containing $z_j = 1$ and $(x_j,y_j) = (0,0)$, $(1,1)$, $(0,2)$, and $(0,3)$, respectively.

Let $\hat x \in G_P^n \setminus \{x\}$ denote an incorrect codeword, and let $k_{00}(y)$ and $k_{01}(y)$ denote the number of indices $j$ such that $(\hat x_j, y_j) = (0,y)$ and $(\hat x_j, y_j) = (1,y)$, respectively, for $y = 0, 2, 3$. Furthermore, let
$$
k_{00} = k_{00}(0) + k_{00}(2) + k_{00}(3), \qquad k_{01} = k_{01}(0) + k_{01}(2) + k_{01}(3).
$$
Note that we write the first index 0 because, in our case, $Y_0^+ = \{0,2,3\}$ and $x(y) = 0$ for $y = 0$, 2, and 3. Then the codeword $\hat x$ belongs to a collection, which was called a $V$-restricted code $G_P^n(y_Z)$, if
$$
|k_{00}(0) - k_{00} \cdot 99/135| < \lambda_n \cdot k_{00}, \qquad
|k_{00}(2) - k_{00} \cdot 10/135| < \lambda_n \cdot k_{00}, \qquad
|k_{00}(3) - k_{00} \cdot 26/135| < \lambda_n \cdot k_{00},
$$
and
$$
|k_{01}(0) - k_{01} \cdot 99/135| < \lambda_n \cdot k_{01}, \qquad
|k_{01}(2) - k_{01} \cdot 10/135| < \lambda_n \cdot k_{01}, \qquad
|k_{01}(3) - k_{01} \cdot 26/135| < \lambda_n \cdot k_{01}.
$$
We examine the characteristics of the decoding when the decoder selects the codeword with the minimal distortion which belongs to $G_P^n(y_Z)$, and takes the average value of the decoding error probability for the codeword $x$ over all possible vectors $z \in T^n_{\hat Q}(x,y)$. Note that, in this procedure, the decoder has some helping information, since the vectors $z$ are assigned depending on $x$. However, such a restriction of the code $G_P^n$ is equivalent to the procedure when the decoder restricts the code using uniformly distributed vectors $\tilde z$ having Hamming weight
$$
625 \cdot 99/474 + 375 \cdot 135/135 + 375 \cdot 10/135 + 625 \cdot 26/151 \approx 639,
$$
such that there are $625 \cdot 99/474$, $375 \cdot 135/135$, $375 \cdot 10/135$, and $625 \cdot 26/151$ positions $j$ containing $\tilde z_j = 1$ and $y_j = 0$, 1, 2, and 3, respectively. An assignment of the vector $\tilde z$ does not depend on the transmitted codeword, and the possibility of a reliable decoding procedure with the 'helping' information would lead to the possibility of an improvement of the behaviour which is based on the received vector $y$. Similar considerations are also valid for the decoder $D'$, which receives a vector $y' \in T^n_W(x)$ and 'constructs' $W$-restricted codes. The decoding error probability for $D'$ is not less than for $D$ when they use restricted codes, and this note gives the necessary connection between the behaviour of $D$ and $D'$, which allows us to lower-bound the decoding error probability for $D'$.

Table 1: The matrices of transition probabilities which determine the characteristics of mismatched decoding for the channel W with respect to the distortion function d. In each two-row block, the upper row corresponds to the input x = 0 and the lower row to x = 1.

            y = 0                 y = 1                 y = 2                 y = 3

PW =   (1/2 + 1/8)/2         (1/4 + 1/8)/2         (1/8 + 1/4)/2         (1/8 + 1/2)/2
         = 0.3125              = 0.1875              = 0.1875              = 0.3125

d  =        0                     0                     0                     0
            4                     3                     1                     0

W0 =       1/2                   1/4                   1/8                   1/8
           1/8                   1/8                   1/4                   1/2

W  =   1/2 - 1/8 = 0.375     1/4 + 1/8 = 0.375     1/8 = 0.125           1/8 = 0.125
       1/8 + 1/8 = 0.250     1/8 - 1/8 = 0.000     1/4 = 0.250           1/2 = 0.500

V  =       0.474                 0.240                 0.135                 0.151
           0.151                 0.135                 0.240                 0.474

U  =   0.375/0.865 = 0.433   0.240/0.865 = 0.277   0.125/0.865 = 0.145   0.125/0.865 = 0.145
       0.151/0.865 = 0.175   0.000/0.865 = 0.000   0.240/0.865 = 0.277   0.474/0.865 = 0.548

V̄  =   0.099/0.135 = 0.733   0.000/0.135 = 0.000   0.010/0.135 = 0.074   0.026/0.135 = 0.193
       0.000/0.135 = 0.000   0.135/0.135 = 1.000   0.000/0.135 = 0.000   0.000/0.135 = 0.000

W̄  =   0.000/0.135 = 0.000   0.135/0.135 = 1.000   0.000/0.135 = 0.000   0.000/0.135 = 0.000
       0.099/0.135 = 0.733   0.000/0.135 = 0.000   0.010/0.135 = 0.074   0.026/0.135 = 0.193

Table 2: The structure of a vector y in T_V^2000(x), which is transformed into a vector y' in T_W^2000(x); the probability distributions V and W are given in Table 1 and x = 0^1000 1^1000.

                       x = 0           x = 1

Y_x^+  =             {0, 2, 3}          {1}
Y_x^-  =               {1}           {0, 2, 3}

n_x0(x,y) =          375 + 99         151 +   0
n_x1(x,y) =          240 +  0           0 + 135
n_x2(x,y) =          125 + 10         240 +   0
n_x3(x,y) =          125 + 26         474 +   0

k_x(0) =                99                0
k_x(1) =                 0              135
k_x(2) =                10                0
k_x(3) =                26                0

k_x  =                 135              135

k(0) = 99,  k(1) = 135,  k(2) = 10,  k(3) = 26,  k = 270

[Figure 1: A system model of information transmission for mismatched decoding. The codeword x in G_P^n is sent over the two parallel channels V and W; decoder D receives y and outputs the estimate x-hat in G_P^n, while decoder D' receives y' and outputs x-hat' in G_P^n.]

[Figure 2: A broadcast model for mismatched decoding. The codeword x in G_P^n enters the channel Q^n, which produces z in T_Q^n(x); the channels U^{n-k}, V̄^k, and W̄^k then generate y_Z̄ in T_U^{n-k}(x_Z̄), y_Z in T_V̄^k(x_Z), and y'_Z in T_W̄^k(x_Z); decoder D receives (y_Z̄, y_Z) and outputs x-hat in G_P^n, while decoder D' receives (y_Z̄, y'_Z) and outputs x-hat' in G_P^n.]

References

[1] C. E. Shannon, "A mathematical theory of communication," Bell Syst. Techn. J., vol. 27, pp. 379-423 and 623-656, July and Oct. 1948.
[2] I. Csiszar and J. Korner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic Press, 1981.
[3] J. Y. N. Hui, Fundamental issues of multiple accessing. Ph.D. dissertation, MIT, 1983.
[4] I. Csiszar and J. Korner, "Graph decomposition: A new key to coding theorems," IEEE Trans. Inform. Theory, vol. IT-27, pp. 5-12, Jan. 1981.
[5] V. B. Balakirsky, "Coding theorems for discrete memoryless channels with given decision rule," Lecture Notes in Computer Science, no. 573, Proceedings of the 1st French-Soviet Workshop on Algebraic Coding, July 1991, pp. 142-150.
[6] A. Lapidoth, "Information rates for mismatched decoders," in Proc. 2nd Winter Meeting on Coding and Information Theory (Essen, Germany, Dec. 1993), pp. 12-15.
[7] I. Csiszar and P. Narayan, "Channel capacity for a given decoding rule," in Proc. IEEE Int. Symp. on Information Theory (Trondheim, Norway, June-July 1994), p. 378.
[8] A. Lapidoth, "Mismatched decoding and the multiple access channel," in Proc. IEEE Int. Symp. on Information Theory (Trondheim, Norway, June-July 1994), p. 382.
[9] I. Csiszar and P. Narayan, "Channel capacity for a given decoding rule," IEEE Trans. Inform. Theory, vol. IT-41, pp. 35-43, Jan. 1995.
[10] N. Merhav, G. Kaplan, A. Lapidoth, and S. Shamai (Shitz), "On information rates for mismatched decoders," IEEE Trans. Inform. Theory, vol. IT-40, pp. 1953-1967, Nov. 1994.
