IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 37, NO. 6, NOVEMBER 1991


Information Rates for a Discrete-Time Gaussian Channel with Intersymbol Interference and Stationary Inputs

Shlomo Shamai (Shitz), Senior Member, IEEE, Lawrence H. Ozarow, Member, IEEE, and Aaron D. Wyner, Fellow, IEEE

Abstract - Bounds are presented on I_{i.i.d.}, the achievable information rate for a discrete Gaussian channel with intersymbol interference (ISI) present and i.i.d. channel input symbols governed by an arbitrary predetermined distribution p_x(x). Upper bounds on I, the achievable information rate with the symbol independence demand relaxed, are given as well. The bounds are formulated in terms of the average mutual information of a memoryless Gaussian channel with scaled i.i.d. input symbols governed by the same symbol distribution p_x(x), where the scaling value is interpreted as an enhancement (upper bounds) or degradation (lower bounds) factor. The bounds apply to channel symbols with an arbitrary symbol distribution p_x(x), discrete as well as continuous, and thus facilitate bounding the capacity of the ISI (dispersive) Gaussian channel under a variety of constraints imposed on the identically distributed channel symbols. The use of the bounds is demonstrated for binary (two-level) i.i.d. symmetric symbols and a channel with causal ISI. In particular, channels with two and three ISI coefficients, that is, ISI memory of degree one and two, respectively, are examined. The bounds on I_{i.i.d.} are compared to the approximated (by Monte Carlo methods) known value of I_{i.i.d.} and their tightness is considered. An application of the new lower bound on I_{i.i.d.} yields an improvement on previously reported lower bounds for the capacity of the continuous-time strictly bandlimited (or bandpass) Gaussian channel with either peak power or simultaneously peak power and bandlimiting constraints imposed on the channel's input waveform.

Index Terms - ISI, additive Gaussian channel, capacity, average mutual information.

I. INTRODUCTION

Consider the discrete-time Gaussian channel (DTGC) with intersymbol interference (ISI) described by

y_k = Σ_l h_l x_{k-l} + n_k,    (1)

where {x_k} are stationary identically distributed real-valued channel input symbols, {y_k} are the corresponding channel output observables, {h_k} are real ISI coefficients, and {n_k} are independent identically distributed (i.i.d.) zero-mean Gaussian noise samples with variance E(n_k²) = σ².

Manuscript received July 19, 1990; revised February 18, 1991. This work was done at AT&T Bell Laboratories, Murray Hill, NJ. S. Shamai (Shitz) is with the Electrical Engineering Department, Technion-Israel Institute of Technology, Haifa 32000, Israel. L. H. Ozarow is with the General Electric Corporate Research and Development Center, Room KWC 611, P.O. Box 8, Schenectady, NY 12301. A. D. Wyner is with AT&T Bell Laboratories, Room 2C-365, 600 Mountain Ave., Murray Hill, NJ 07974. IEEE Log Number 9101891.

A convenient way to describe the channel (1) in matrix notation is

y^N = H^N x^N + n^N,    (2)

and it rests on the notion of the N-block DTGC [1], [2]. Here y^N = (y_0, y_1, ..., y_{N-1})^T, x^N = (x_0, x_1, ..., x_{N-1})^T, and n^N = (n_0, n_1, ..., n_{N-1})^T are column vectors with N components standing, respectively, for the output samples, the channel symbols, and the noise samples, and the superscript T denotes the transpose operation. The equivalence between (2) and (1) is evident for N → ∞ [1]; in this case, which is the one of interest here, "end effects" are suppressed [1] and the rows of H = H^∞ are specified by circular shifts of the ISI coefficients {h_l}. We assume throughout finite energy ||h||² = Σ_l h_l² < ∞.
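To make the model (1)-(2) concrete, the following short Python sketch simulates one block of the ISI channel. The coefficient values, block length, and noise level are illustrative assumptions, not parameters taken from the paper.

```python
import numpy as np

def simulate_isi_channel(x, h, sigma, rng):
    """Simulate y_k = sum_l h_l x_{k-l} + n_k for one block of symbols x (causal ISI)."""
    # Convolve the input symbols with the ISI coefficients
    y_clean = np.convolve(x, h)[: len(x)]
    # Add i.i.d. zero-mean Gaussian noise of variance sigma^2
    return y_clean + sigma * rng.standard_normal(len(x))

rng = np.random.default_rng(0)
h = np.array([1.0, 0.5])                 # example ISI coefficients (assumed)
sigma = 0.7                              # noise standard deviation (assumed)
x = rng.choice([-1.0, 1.0], size=16)     # i.i.d. binary symbols
print(simulate_isi_channel(x, h, sigma, rng))
```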

The matched-filter upper bound on I_{i.i.d.} (Theorem 2, proved in Appendix A) is

I_{i.i.d.} ≤ I_UM = I(||h|| x + v; x),    (9)

where x is a random variable with the probability function p_x(a), v is a zero-mean Gaussian variable with the same distribution as that of n_k in (1) (variance σ²), and the power enhancement factor is the norm

||h||² = Σ_l h_l².    (10)

The notion "matched filter bound" stems from the fact, evidenced in Appendix A, that I_UM corresponds to a single-shot transmission, meaning that only one symbol is transmitted. For uncoded communication this assumption leads to the matched-filter lower bound on error probability [3]. Again, the upper bound I_UM (9) is formulated in terms of the mutual information of a memoryless channel with i.i.d. inputs, where ||h||² (10) takes on the interpretation of a power enhancement factor, as opposed to the power degradation factor ρ² ≤ ||h||² in (7), appearing in the lower bound I_L. The Gaussian upper bound I_UG to follow results immediately by invoking standard arguments (see Appendix A) and is stated in the following lemma.

Lemma 1 - Gaussian Bound:

I_{i.i.d.} ≤ I_UG = (1/2π) ∫_0^π ln(1 + (P_A/σ²)|H(λ)|²) dλ,    (11)

where P_A = E|x|². This upper bound on I_{i.i.d.} equals the mutual information I_UG = lim_{N→∞} (1/N) I(y^N; x^N) in the case where {x_i} are assumed to be i.i.d. Gaussian random variables with variance (average power) P_A. For symmetric binary symbols this kind of bound was mentioned and used in [25]. Two additional upper bounds, I_U1(i.i.d.) and I_U2(i.i.d.), are stated respectively in Lemmas B1 and B2 in Appendix B.

C. Upper Bounds - Identically Distributed Symbols

We now relax the independence demand and assume that the symbols {x_i}, which are not necessarily i.i.d., are identically distributed, with each symbol governed by the probability function p_x(a). The upper bounds stated in Theorem 3 and Lemma 2 are proved in Appendix A. Theorem 3 gives

I ≤ I(ξ x + v; x),    (12)

where the enhancement factor is

ξ = max_{0≤λ≤π} |H(λ)|    (13)

and H(λ) is given by (8). The enhancement factor ξ is interpreted as the maximal gain of the ISI "transfer function" H(λ). Note that ξ ≤ Σ_l |h_l|.

The Gaussian-based upper bound I_UG' stated next is specified by the average mutual information over this channel, taking {x_l} to be Gaussian symbols with the same correlation as that corresponding to the actual symbols.

Lemma 2 - Gaussian Information Bound:

I = lim_{N→∞} (1/N) I(y^N; x^N) ≤ I_UG' = (1/2π) ∫_0^π ln(1 + S_x(λ)|H(λ)|²/σ²) dλ,    (14)

where H(λ) is given by (8) and where

S_x(λ) = Σ_{l=-∞}^{∞} r_x(l) e^{ilλ}    (15)

stands for the discrete power spectral density of the sequence {x_l}, for which r_x(l) = E(x_{k+l} x_k) denotes the correlation coefficients. For i.i.d. symbols the bound I_UG' (14) reduces to I_UG (11). Clearly,

I ≤ I_UG' ≤ C_A = (1/2π) ∫_0^π ln[max(Θ|H(λ)|², 1)] dλ,    (16)

with Θ being the solution of

∫_0^π max(Θ - |H(λ)|^{-2}, 0) dλ = π P_A/σ².    (17)

The value C_A is interpreted as the capacity under an average power constraint [1] that results by maximizing (14) over all S_x(λ) that satisfy a symbol average power constraint, that is, r_x(0) = E(x²) = (1/π) ∫_0^π S_x(λ) dλ = P_A. First, note that for no ISI, ||h|| = |h_0|, and i.i.d. symbols, we have I = I_L = I_UM, while I ≤ I_UG = I_UG' = C_A. For Gaussian symbols {x_i} with correlation coefficients r_x(l), and ISI present, we have I = I_UG'. Assume now that {x_i} are i.i.d. and each symbol x is a discrete symmetrically distributed random variable that takes on a finite number of possible values and satisfies E(|x|²) = P_A. It is clear that for P_A/σ² → ∞ both I_{i.i.d.} and I_UM approach the entropy ℋ(x) of the discrete random variable x, where ℋ stands for the standard entropy function [10], thus resembling the correct behaviour of I_{i.i.d.}, while I_UG → ∞. For low SNR (P_A/σ² → 0), it can be shown, in a similar way to that used in [25] for binary symbols, that I_{i.i.d.} → I_UM → (1/2)||h||² P_A/σ²; see also [35] for similar arguments.
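The Gaussian bound (11) and the water-filling capacity (16)-(17) are straightforward to evaluate numerically. The sketch below is a minimal illustration under the normalization of (16)-(17) as written above; the example ISI vector, the bisection search for Θ, and the discretization of λ are implementation choices, not part of the paper.

```python
import numpy as np

def H_abs2(lam, h):
    """|H(lambda)|^2 for H(lambda) = sum_l h_l exp(-i l lambda)."""
    l = np.arange(len(h))
    return np.abs(np.sum(h[None, :] * np.exp(-1j * np.outer(lam, l)), axis=1)) ** 2

def gaussian_bound_IUG(h, snr, n=4000):
    """I_UG of (11) in nats/symbol, with snr = P_A / sigma^2."""
    lam = np.linspace(0.0, np.pi, n)
    return np.trapz(np.log(1.0 + snr * H_abs2(lam, h)), lam) / (2.0 * np.pi)

def waterfilling_CA(h, snr, n=4000):
    """C_A of (16)-(17): bisect for the water level Theta, then integrate."""
    lam = np.linspace(0.0, np.pi, n)
    g = H_abs2(lam, h)
    def power(theta):                      # left-hand side of (17) divided by pi
        return np.trapz(np.maximum(theta - 1.0 / g, 0.0), lam) / np.pi
    lo, hi = 0.0, 1.0
    while power(hi) < snr:                 # grow the bracket until it covers the constraint
        hi *= 2.0
    for _ in range(60):                    # bisection on Theta
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if power(mid) < snr else (lo, mid)
    theta = 0.5 * (lo + hi)
    return np.trapz(np.log(np.maximum(theta * g, 1.0)), lam) / (2.0 * np.pi)

h = np.array([1.0, 0.5]) / np.sqrt(1.25)   # example normalized ISI vector (assumed)
for snr_db in (0.0, 6.0, 12.0):
    snr = 10 ** (snr_db / 10)
    print(snr_db, gaussian_bound_IUG(h, snr), waterfilling_CA(h, snr))
```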


In Appendix B, another two upper bounds on I_{i.i.d.}, denoted by I_U1(i.i.d.) (B.1) and I_U2(i.i.d.) (B.4), are derived. These bounds may turn out, for certain cases, to be tighter than I_UM (9) and I_UG (11) presented here; see the further discussion in Appendix B.

III. APPLICATIONS

We apply here several of the bounds presented in the previous section to some interesting examples. In Section III-A, we address the binary symmetric case, that is, {x_i} are i.i.d. binary symmetrically distributed [25] symbols, and a channel with causal minimum-phase ISI whose memory order is L-1, that is, h_l = 0 for l < 0 and l ≥ L. In particular, we examine the cases of L = 2 and L = 3. In Section III-B, we specialize to lower bounds for the continuous-time bandlimited baseband channel with either a peak power limit (PPL) [9] or simultaneous bandwidth limit and PPL (BPPL) [14], [15] constraints imposed on the continuous-time channel input signal. The relevant results for the bandpass case are also mentioned.

A. Binary Symmetric Symbols with Causal Finite Minimum-Phase ISI

Consider the binary symmetric case, that is, x_i are i.i.d. binary symbols taking on the values ±√P_M with equal probability 1/2. This is an interesting application since in several communication problems the transmitter is restricted to use only binary alphabets [9], [10], [16], [18], [25], [27], [32]. We specialize here to the causal minimum-phase ISI representation, as is the case at the output of the sample-whitened matched filter (or the feedforward part of the DFE equalizer [3]), and assume that the ISI memory is of degree L-1, that is, h_l = 0 for l < 0 and l ≥ L. The lower bound

I_L = C_b(h_0² P_M/σ²)    (18a)

and the upper bound

I_UM = C_b(||h||² P_M/σ²)    (18b)

are given in terms of C_b(R) = I(√R a + β; a), the capacity of a Gaussian scalar channel with binary inputs, where a is a binary random variable taking on the values ±1 with equal probability 1/2 and β is a normalized Gaussian random variable. The argument R is, therefore, interpreted as the signal-to-noise ratio. The notation C_b(R) is used since it is actually the capacity of the memoryless Gaussian channel with binary inputs, and it is determined by a single integral (19) [10, Problem 4.22], [25, eq. (4.14)]. Equivalent forms appear in [4, p. 153] and [32, p. 274]. The function C_b(R) has been evaluated numerically, for example, in [25] and [32], while closed-form bounds [33] are further discussed in the next section.

For the sake of simplicity we turn now to the case of only two nonzero ISI coefficients (ISI memory of degree 1), having the values h_0 = (1+a²)^{-1/2}, h_1 = a(1+a²)^{-1/2}, and h_i = 0 for i ≠ 0, 1. We use here the convenient normalization [25] ||h||² = h_0² + h_1² = 1. The parameter -1 ≤ a ≤ 1 determines the amount of ISI present; a = 0 corresponds to no ISI, while a = +1 (-1) corresponds to the duobinary (dicode) case [3], [5], which yields the maximum ISI possible with memory of degree 1. This very simple model is important in some practical cases encountered in magnetic recording [6], [16], [18], [25], [27]. For this case the bounds I_L (20a), I_UM (20b), and I_UG (20c) follow from (18a), (18b), and (11), respectively, where for I_UG we substitute in (8)

|H(λ)|² = h_0² + h_1² + 2 h_0 h_1 cos λ = 1 + (2a/(1+a²)) cos λ

and use the integral [36, Section 4.224, p. 329]. The upper bounds I_U1(i.i.d.) and I_U2(i.i.d.), specialized to this binary case (given, respectively, by (B.5a) and (B.5b) in Appendix B), were found to be less tight than min(I_UM, I_UG).

The bounds I_L (20a), I_UM (20b), and I_UG (20c) (in bits/channel use) are shown in Fig. 1 for a = 1 (the duobinary case) versus the signal-to-noise ratio P_M/σ² and are compared to the approximated value of I_{i.i.d.} calculated in [25, Fig. 4.4] using Monte Carlo techniques. The bounds I_L and I_UM are 3 dB apart, and the matched-filter upper bound I_UM (20b) is tighter for low and moderate values of the signal-to-noise ratio P_M/σ², while the lower bound I_L (20a) is found to be tighter for high values of the signal-to-noise ratio. Note that the Gaussian upper bound I_UG is remarkably tight for small values of the signal-to-noise ratio, P_M/σ² ≤ 0 dB, and it is the preferred upper bound in the region P_M/σ² ≤ 2.5 (4 dB).

We turn now to examine the ISI case where L = 3 and h_0 = h_2 = 1/2, h_1 = √(1/2) (so that h_0² + h_1² + h_2² = 1), which was considered also in [25, Figs. 2.3 and 3.1]. For this channel,

|H(λ)|² = (cos λ + √(1/2))².

Fig. 1. Bounds on the information rate I_{i.i.d.} (bits/channel symbol) for symmetric i.i.d. binary symbols and two equal ISI coefficients h_0 = h_1 = 1/√2 (ISI memory of degree one) versus signal-to-noise ratio P_M/σ² (dB): lower bound I_L (20a), matched-filter upper bound I_UM (20b), Gaussian upper bound I_UG (20c), and the approximated value of I_{i.i.d.} obtained by Monte Carlo techniques [25].

Fig. 2. Bounds on the information rate I_{i.i.d.} (bits/channel symbol) for symmetric i.i.d. binary symbols and three ISI coefficients h_0 = h_2 = 1/2, h_1 = √(1/2) (ISI memory of degree two) versus signal-to-noise ratio P_M/σ² (dB): lower bound I_L (21a), matched-filter upper bound I_UM (21b), Gaussian upper bound I_UG (21c), and the approximated value of I_{i.i.d.} obtained by Monte Carlo techniques [25].
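The function C_b(R) used in (18) and (20) is available only as a one-dimensional integral, so it is convenient to evaluate it numerically. The sketch below uses the standard expression C_b(R) = R - E[ln cosh(R + √R Z)] (nats), with Z standard normal; this is one common equivalent form for the binary-input AWGN capacity and is not necessarily the paper's exact integral (19). The Gauss-Hermite order and the example channel are arbitrary choices.

```python
import numpy as np

def C_b(R, order=80):
    """Capacity (nats/use) of Y = sqrt(R)*X + Z, X = +/-1 equiprobable, Z ~ N(0,1)."""
    # Gauss-Hermite nodes/weights integrate against exp(-t^2); substitute z = sqrt(2) t.
    t, w = np.polynomial.hermite.hermgauss(order)
    z = np.sqrt(2.0) * t
    expect = np.sum(w * np.log(np.cosh(R + np.sqrt(R) * z))) / np.sqrt(np.pi)
    return R - expect

# Lower and matched-filter upper bounds (18a), (18b) for a causal ISI channel:
h = np.array([1.0, 1.0]) / np.sqrt(2.0)        # duobinary example, ||h||^2 = 1
for snr_db in (0.0, 5.0, 10.0):
    snr = 10 ** (snr_db / 10)
    I_L = C_b(h[0] ** 2 * snr)                 # (18a)
    I_UM = C_b(np.sum(h ** 2) * snr)           # (18b)
    print(snr_db, I_L / np.log(2), I_UM / np.log(2))   # in bits/channel use
```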

The bounds I_L (21a), I_UM (21b), and the Gaussian upper bound

I_UG = (1/2π) ∫_0^π ln[1 + (P_M/σ²)(cos λ + √(1/2))²] dλ    (21c)

(in bits/channel use) are shown in Fig. 2 along with the Monte Carlo approximated value of I_{i.i.d.} [25, Fig. 4.6]. Indeed the bounds are looser in this case, with a larger ISI memory (L-1 = 2), when compared to the previous case with unit (L-1 = 1) ISI memory. However, the lower bound I_L seems to capture the behavior of I_{i.i.d.} for large signal-to-noise ratio P_M/σ² values (which is determined basically by h_0), while the upper bound is tight for asymptotically low values of P_M/σ², for which I_UM (as well as I_UG) → (1/2) P_M/σ², in agreement with the exact asymptotic behavior of [25, Corollary 4.2]. In midrange values of the signal-to-noise ratio, the bounds I_L and I_UM seem not to be tight. This observation is believed to hold in general, and it is further supported by the Gaussian case (i.e., x_i are i.i.d. Gaussian), for which I_UG → (1/2) P_M/σ² for P_M/σ² → 0 and C_A, I_UG → (1/2) ln(ρ² P_M/σ²) for P_M/σ² → ∞. Note that for the Gaussian case and for asymptotically high signal-to-noise ratio P_M/σ² → ∞, C_A → I_UG [20], [34], evidencing that no loss in capacity under the symbol average power constraint is incurred by using i.i.d. Gaussian inputs. We conjecture that the same holds for non-Gaussian continuous symbols as well.
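For reference, I_{i.i.d.} itself can be approximated by simulation, in the spirit of the Monte Carlo evaluation cited from [25]: h(Y|X) is known in closed form, and the output entropy rate is estimated from a long simulated block by a forward recursion over the ISI state. The recursion below is one standard way to compute the output density for a finite-memory channel; it is offered as an illustrative sketch under assumed parameter values, not as the specific procedure used in [25].

```python
import numpy as np

def I_iid_binary_mc(h, snr, N=50_000, seed=1):
    """Monte Carlo estimate (nats/symbol) of I_iid for i.i.d. +/-1 inputs,
    causal ISI h (length L) and noise variance 1/snr, so that P_M/sigma^2 = snr."""
    rng = np.random.default_rng(seed)
    L, sigma = len(h), 1.0 / np.sqrt(snr)
    x = rng.choice([-1.0, 1.0], size=N)
    y = np.convolve(x, h)[:N] + sigma * rng.standard_normal(N)   # edge effects negligible for large N

    # Enumerate all 2^(L-1) ISI states (the previous L-1 symbols).
    states = np.array(np.meshgrid(*([[-1.0, 1.0]] * (L - 1)))).T.reshape(-1, L - 1) \
        if L > 1 else np.zeros((1, 0))
    alpha = np.full(len(states), 1.0 / len(states))   # uniform initial state distribution
    log_py = 0.0
    for k in range(N):
        new_alpha = np.zeros_like(alpha)
        for s, past in enumerate(states):
            for xk in (-1.0, 1.0):
                mean = h[0] * xk + np.dot(h[1:], past)
                lik = np.exp(-(y[k] - mean) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
                nxt = np.concatenate(([xk], past[:-1])) if L > 1 else past
                s_next = int(np.argmax(np.all(states == nxt, axis=1))) if L > 1 else 0
                new_alpha[s_next] += 0.5 * alpha[s] * lik
        c = new_alpha.sum()                            # p(y_k | y_0, ..., y_{k-1})
        log_py += np.log(c)
        alpha = new_alpha / c
    h_Y = -log_py / N                                  # estimate of the output entropy rate
    h_Y_given_X = 0.5 * np.log(2 * np.pi * np.e * sigma ** 2)
    return h_Y - h_Y_given_X

# duobinary example at 5 dB (values assumed for illustration), printed in bits/channel use
print(I_iid_binary_mc(np.array([1.0, 1.0]) / np.sqrt(2.0), 10 ** 0.5) / np.log(2))
```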

B. Lower Bounds on the Capacity of the Bandlimited Continuous-Time Channel with PPL or BPPL Constraints

We turn our attention to the strictly bandlimited continuous-time channel for which the channel filter's transfer function is D(f) = 1 for |f| ≤ W and 0 otherwise. The transmitted channel input s(t) = Σ_k x_k g(t - kT) is taken to be a PAM signal, where g(t) stands for the pulse shape and T is the symbol duration. The signal s(t) is constrained either to be peak power limited to P_M [9] (abbreviated here as the PPL constraint) or to satisfy both a PPL constraint and a strict bandwidth constraint [14], [15], that is, s(t) is of bandwidth no larger than W (these joint constraints are abbreviated BPPL). We specialize here to lower bounds on the capacity of this channel under the PPL and BPPL constraints.⁶ Following [9], [15] we restrict the signal to the PAM class for the baseband case considered here. The channel symbols are chosen to be i.i.d. digits x_i satisfying the peak constraint |x_i| ≤ √P_M (where the subscript M stands for maximum). The pulse shape g(t) is rectangular, g(t) = 1 for |t| ≤ T/2 [9], for the PPL constraint, and a spectral cosine,⁷ g(t) = (π²/4)[1 - (2t/T)²]^{-1} cos(πt/T), for the BPPL case, while the symbol duration is T = (2W)^{-1} [9], [15]. It has been verified that the signals s(t) so constructed satisfy the respective PPL [9] and BPPL [15] constraints.

⁶ For upper bounds on capacity under these constraints, see [30] and [15].
⁷ In [14], a spectral triangular pulse g(t) = [(πt/T)^{-1} sin(πt/T)]² was selected.


The frequency response of the receiver filter is chosen to match d(t) = F^{-1}{D(f)}, where F and F^{-1} stand for the Fourier transform pair. Evaluating ρ using the calculations reported in [9] and [15] gives ρ_PPL = e/π and ρ_BPPL = π/8 for the PPL and BPPL cases, respectively. Thus, by Theorem 1 and with proper scaling, the lower bound on capacity (per channel symbol of duration T = (2W)^{-1}) is

I_L = I((ρ/σ) x + β; x),    (22)

where ρ equals ρ_PPL or ρ_BPPL for the PPL and BPPL constraints, respectively, and where σ² = N_0 W, with N_0/2 standing for the two-sided spectral density of the additive white Gaussian noise. The random variable x may take any probability function satisfying |x| ≤ √P_M. The random variable β stands, as usual, for a normalized zero-mean Gaussian variable with unit variance. In [9], [14], [15], x had to be chosen continuously distributed; otherwise the convolutional inequality of entropy powers [22], upon which the derivations of [9], [14], [15] rely, collapses. Here, free from such restrictions, we choose the channel symbol distribution to maximize the bound in (22). This maximizing distribution is well known and reported in [37]. Denote by C_s(R) the capacity derived in [37], that is,

C_s(R) = sup I(a + β; a),    (23)

where the supremum is taken over all distributions of the real random variable a satisfying |a| ≤ √R and where β is a zero-mean, real, unit-variance (E(β²) = 1) Gaussian random variable. The optimized bound I_LO for this channel equals the optimized I_L multiplied by 2W (measured in nats per second), and thus takes the form

I_LO = 2W C_s[(e/π)² P_M/(N_0 W)],   PPL constraint,
I_LO = 2W C_s[(π/8)² P_M/(N_0 W)],   BPPL constraint.    (24)

It has been shown [37] that the distribution of the random variable a in (23) achieving C_s(R) is discrete and, further, for R ≤ 6.25 [37, Fig. 3] it is binary symmetric, while for R → ∞ it approaches a uniform distribution. It follows [37] that C_s(R) coincides with C_b(R) for R ≤ 6.25.

The lower bounds reported here (24) are strictly tighter than those reported in [9], [15], since in [9] and [15] a uniform distribution for x on [-√P_M, √P_M] was applied, whereas here the optimizing distribution is used. However, the improvement, measured with respect to the signal-to-noise ratio P_M/(N_0 W), decreases from 10 log_10(eπ/2) = 6.3 dB, achieved for asymptotically low signal-to-noise ratios P_M/(N_0 W) → 0, until it completely vanishes for asymptotically high signal-to-noise ratios P_M/(N_0 W) → ∞.

The bandpass case with either the PPL [9] or BPPL [15] constraints can be treated in a similar manner, since Theorem 1 applies also to the complex case (see the Extensions and Comments in Appendix A). In this case, where QAM signalling is employed, the channel symbols {x_i} are i.i.d. complex random variables satisfying |x_i| ≤ √P_M, and {n_k} are i.i.d. complex Gaussian random variables with independent, zero-mean real and imaginary components, each of variance σ² = 2 N_0 W. The analysis yields

I_LO = 2W C_ce[(e/π)² P_M/(N_0 W)],   bandpass and PPL constraint,
I_LO = 2W C_ce[(π/8)² P_M/(N_0 W)],   bandpass and BPPL constraint,    (25)

where C_ce(R) stands for the capacity found in [38], which is also defined by (23); however, a is now a complex random variable and β is a zero-mean complex Gaussian random variable with normalized i.i.d. components [E(Re β)² = E(Im β)² = 1, E((Re β)(Im β)) = 0]. In [38], it has been proved that the distribution of the complex random variable a achieving C_ce(R) is uniform in arg(a) and independently discrete in |a|. For R ≤ 6, the constant-envelope distribution [38], that is, all probability mass placed on the peak amplitude, is optimal, while for R → ∞ the optimal distribution approaches the one that is uniform over a disk of radius √R. This observation yields, therefore, C_ce(R) → ln[(2e)^{-1} R] as R → ∞, while in the constant-envelope regime C_ce(R) is given [39] by a single integral involving ψ(τ) = 2τ exp[-R(1+τ²)] I_0(2Rτ), where I_0(·) stands for the zero-order modified Bessel function.

The improvement of the lower bounds I_LO (25) over the bounds reported in [9] and [15], which were derived for complex input symbols uniformly distributed over a disk of radius √P_M, measured with respect to the signal-to-noise ratio P_M/(N_0 W), decreases from 10 log_10(2e) = 7.35 dB, achieved for asymptotically low signal-to-noise ratios P_M/(N_0 W) → 0, until it completely vanishes for asymptotically high signal-to-noise ratios P_M/(N_0 W) → ∞.
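As a quick numerical check of the two quoted low-SNR improvement factors (a worked evaluation of the stated expressions, nothing more):

```python
import math
print(10 * math.log10(math.e * math.pi / 2))   # baseband improvement: ~6.30 dB
print(10 * math.log10(2 * math.e))             # bandpass improvement: ~7.35 dB
```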

IV. DISCUSSION AND CONCLUSION

We focus here on the achievable information rates for the classical discrete-time Gaussian channel with ISI present and with identically distributed, not necessarily


Gaussian, symbols. Lower and upper bounds on I_{i.i.d.} (the information rate for i.i.d. but otherwise arbitrary channel input symbols), as well as upper bounds on I (the information rate for identically distributed input symbols, not necessarily independent), are derived. The bounds are formulated in terms of the average mutual information between the output and the input of a scalar memoryless Gaussian channel. This formulation enables a unified treatment of discrete as well as continuous channel symbol distributions within the same underlying framework. These bounds are therefore easily calculated, either analytically or by bounding them again using the extensive results and techniques developed for memoryless channels. To demonstrate this, we turn back to Section III-A, where binary symbols are considered, and note that C_b(R), which is expressed in integral form (19), can be further lower bounded in closed form as in (26a)-(26c),

where h_b(a) is the binary entropy function [10]. Equations (26a) and (26b) were taken from [33] (where N = 2 in the notation of [33] was substituted), while (26c) is the cutoff rate for the binary case [4, p. 142], which clearly lower bounds C_b(R). Evident upper bounds on I_{i.i.d.} and I based on a Gaussian assumption are also mentioned. The lower bound on I_{i.i.d.} (6) can be used to lower bound the capacity of the dispersive (ISI) Gaussian channel under a variety of constraints imposed on the input symbols which do not preclude the use of i.i.d. symbols. Upper bounds on capacity are constructed by supremizing the upper bounds on I over the relevant constraints induced on an individual input symbol. Incorporating the convolutional inequality of entropy powers [22]⁸ with the lower bound I_L (6) reduces exactly to the lower bounds derived using the standard technique described in detail in [9]. Assuming a causal ISI channel (as observed, for example, at the output of a sampled whitened matched filter or a feedforward equalizer), the lower bound in Theorem 1 is interpreted as the average mutual information of a zero-forcing decision-feedback equalizer having ideal errorless feedback decisions. Note, however, that the errorless past-decision assumption has not been employed here to derive this lower bound. We conclude, therefore, that as far as the average mutual information I_{i.i.d.} is concerned, ignoring the information carried by the rest of the ISI coefficients {h_i, i > 0} overcompensates for the optimistic assumption of errorless past decisions, yielding an overall lower bound I_L on I_{i.i.d.}. Indeed, this lower bound seems to capture the exact asymptotic behavior of I_{i.i.d.} for high signal-to-noise ratio values and continuous inputs, as has been demonstrated by the examples in Section III and by the Gaussian case, for which I_{i.i.d.} is given by (11).

Other approaches which lead to memoryless channel representations, such as the orthogonalization methods [10], [20], [28] and the Tomlinson nonlinear filter [29], cannot be directly used due to the difficulties in translating the constraints imposed on {x_l}, the channel inputs, to a corresponding set of constraints imposed on {x̃_l}, the inputs of the resultant memoryless channels. This translation is straightforward for a block average power constraint. For example, if {x_l}, the outputs of a Tomlinson filter, are demanded to be i.i.d. with a given probability function, it is not at all clear how to restrict {x̃_l}, the inputs of the Tomlinson filter, to satisfy this demand. For the special case of uniformly distributed (within the extreme levels) i.i.d. {x̃_l}, it is readily verified that the outputs {x_l} are also uniformly distributed i.i.d. random variables. Also, in this special case, the bound in Theorem 1 is superior to the Tomlinson-based bound, and that is due to the information-destroying modulo operation at the Tomlinson receiver. The information loss incurred by the modulo operation diminishes with increasing signal-to-noise ratio.

The matched-filter upper bound I_UM (9) shows that, under a given average power constraint at the channel output, that is, with ||h||² kept constant, ISI cannot improve the information rate I_{i.i.d.} over that of an ISI-less channel (that is, h = h_0). This is attributed mainly to the fact that the symbols {x_i} were chosen i.i.d., as is also concluded in [25] for the binary and Gaussian cases. This feature is not necessarily true if optimal statistical dependence (induced by the capacity-achieving statistics) is introduced into the channel symbols, as has been demonstrated for Gaussian symbols in [1]. This is clearly evidenced by the upper bound (12) on I, which shows that the increase in the information rate cannot exceed the corresponding information rate for an ISI-less channel with h_0 taken to be the maximal value of the ISI "transfer" function, that is, h_0 = max_{0≤λ≤π} |H(λ)|.

⁸ Whenever the channel symbols are continuous random variables.
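The degradation factor ρ of (7), recovered in Appendix A as ρ = σ/σ_p with σ_p the stationary innovation (prediction error) of the noise at the zero-forcing DFE output, then reduces to the geometric-mean gain of |H(λ)|². Under that assumption, the sketch below checks the Toeplitz/Szegő limit (A.9) numerically against an explicit circulant N-block matrix for an example channel; the channel values and discretizations are illustrative.

```python
import numpy as np

def rho_squared_integral(h, n=20000):
    """Geometric-mean gain exp((1/pi) * int_0^pi ln|H(lambda)|^2 dlambda)."""
    lam = np.linspace(1e-6, np.pi - 1e-6, n)
    l = np.arange(len(h))
    Habs2 = np.abs(np.exp(-1j * np.outer(lam, l)) @ h) ** 2
    return np.exp(np.trapz(np.log(Habs2), lam) / np.pi)

def rho_squared_circulant(h, N=4096):
    """(det H^N)^{2/N} for the circulant N-block matrix built from h (cf. (A.9))."""
    c = np.zeros(N)
    c[: len(h)] = h
    eig2 = np.abs(np.fft.fft(c)) ** 2     # |H(2*pi*k/N)|^2, moduli of the circulant eigenvalues
    return np.exp(np.mean(np.log(eig2)))

h = np.array([1.0, 0.5])                   # example ISI coefficients (assumed)
print(rho_squared_integral(h), rho_squared_circulant(h))   # the two values should agree closely
```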

ACKNOWLEDGMENT

The authors are grateful to A. Dembo for interesting discussions and to an anonymous reviewer for his careful reading of the manuscript and useful suggestions.

APPENDIX A
PROOFS

In this appendix, we prove Theorems 1, 2, and 3 and Lemmas 1 and 2, which appear in Section II. We assume here that the symbols x_l and the Gaussian noise samples n_l are i.i.d. real random variables. Extensions to the complex case are briefly discussed at the end of this appendix. We further assume a nonsingular channel, that is, H^N is a nonsingular matrix. Nevertheless, in the context of this paper, this is only a technical assumption that permits simple proofs. All the results remain valid provided |H(λ)| is integrable, which is guaranteed since |H(λ)|² was assumed integrable (finite power). Note, however, that if H(λ) equals zero over a region (not merely at isolated zeros), then ρ = 0 in (7), yielding a trivial lower bound in Theorem 1. In the proofs of Theorems 1 and 2 and Lemma 1 it is assumed that {x_l} is an i.i.d. sequence, while this assumption is relaxed in the proofs of Theorem 3 and Lemma 2.

Proof of Theorem 1: Since H^N is nonsingular,

I(y^N; x^N) = I(z^N; x^N),    (A.1)

where

z^N = x^N + m^N,    (A.2)

z^N = (H^N)^{-1} y^N, and m^N = (H^N)^{-1} n^N is a Gaussian vector with correlation matrix Γ_m^N = σ² (H^N)^{-1} (H^N)^{-T}. The function h_d(·) stands here for the differential entropy [10]. The chain rule [10, ch. 2] yields

(1/N) I(z^N; x^N) = (1/N) Σ_{l=0}^{N-1} [h_d(x_l + m_l | z^{l-1}) - h_d(m_l | m^{l-1})].    (A.3)

Conditioning, which does not increase the differential entropy [10, ch. 2], gives

h_d(x_l + m_l | z^{l-1}) ≥ h_d(x_l + m_l | z^{l-1}, x^{l-1}) = h_d(x_l + m_l | x^{l-1}, m^{l-1}),    (A.4)

where the right-hand-side equality in (A.4) follows since z^{l-1} = x^{l-1} + m^{l-1} (A.2). Express m_l by

m_l = E(m_l | m^{l-1}) + p_l,    (A.5)

where p_l is an innovation Gaussian random variable statistically independent of the Gaussian vector m^{l-1}. The function E(m_l | m^{l-1}), denoting conditional expectation, is a linear function of m^{l-1} since the random variables involved are jointly Gaussian. Now, since x_l, m^{l-1}, x^{l-1}, and p_l are all statistically independent, by (A.4) and (A.5),

h_d(x_l + m_l | z^{l-1}) ≥ h_d(x_l + m_l | x^{l-1}, m^{l-1}) = h_d(x_l + p_l | m^{l-1}) = h_d(x_l + p_l).    (A.6)

Using the entropy chain rule and (A.5) yields

h_d(m_l | m^{l-1}) = h_d(p_l).    (A.7)


Inserting now (A.7) and (A.6) into (A.3) and using (A.1) gives

(1/N) I(z^N; x^N) ≥ (1/N) Σ_{l=0}^{N-1} I(x_l + p_l; x_l).    (A.8)

The limit innovation variable p = lim_{l→∞} p_l, which also takes the interpretation of the stationary innovation variable in the noise prediction process [34], is a well-defined Gaussian random variable with variance

σ_p² = E(p²) = σ² [lim_{N→∞} (det H^N)^{2/N}]^{-1}.    (A.9)

Invoking the asymptotic properties of log determinants of Toeplitz matrices [23] specifies σ_p² in terms of the ISI "transfer function" H(λ) (8), which is given by the discrete Fourier transform of {h_l}. The proof of Theorem 1 is completed by following standard arguments [10] (see also [1] for more details) to show that the right-hand side of (A.8) converges to I(x + p; x) = I(ρx + ρp; x), and, letting v = ρp (which has variance σ²), it can readily be verified that ρ = (σ_p/σ)^{-1}. □

The bound can be further tightened by relaxing the conditioning in (A.4), that is, conditioning on x^{l-k}, k = 2, 3, ..., rather than on x^{l-1} (k = 1). This yields an improvement on the bound I_L by a factor, expressed as a conditional mutual information, the evaluation of which is given in terms of a (k-1)-fold integral. Another straightforward lower bound for i.i.d. symbols {x_l} results by applying the inequality (1/N) I(z^N; x^N) ≥ I(z; x) = I(x + m; x) [10], which is interpreted also as the mutual information achieved by employing ideal interleaving in (A.2), which effectively cancels the correlation present among the components of the vector m^N. The resultant bound is given by (6) with ρ in (7) replaced here by [E(m²)/σ²]^{-1/2}, where

E(m²) = (σ²/π) ∫_0^π |H(λ)|^{-2} dλ.

This lower bound is found to be inferior to the one given in Theorem 1, as is evidenced by the Jensen inequality⁹

[(1/π) ∫_0^π |H(λ)|^{-2} dλ]^{-1/2} ≤ (exp[(1/π) ∫_0^π ln|H(λ)|^{-2} dλ])^{-1/2} = ρ.

Proof of Theorem 2: By the chain rule,

(1/N) I(y^N; x^N) = (1/N) Σ_{l=0}^{N-1} I(y^N; x_l | x^{l-1})    (A.10)

≤ (1/N) Σ_{l=0}^{N-1} I(y^N; x_l | x^{l-1}, x_{l+1}, ..., x_{N-1}).    (A.11)

For i.i.d. symbols, I(x^{l-1}, x_{l+1}, ..., x_{N-1}; x_l) = 0 and, therefore, I(y^N; x_l | x^{l-1}, x_{l+1}, ..., x_{N-1}) = I(y^N, x^{l-1}, x_{l+1}, ..., x_{N-1}; x_l), which is evaluated by a straightforward calculation since the ISI effect of x^{l-1}, x_{l+1}, ..., x_{N-1} is fully neutralized.¹⁰ In other words, to evaluate I(y^N; x_l | x^{l-1}, x_{l+1}, ..., x_{N-1}) we may set x_k = 0 for k = 0, 1, ..., l-1, l+1, ..., N-1 in (2), which leaves us with the formulation

y_k = h_{k-l} x_l + n_k,   k = 0, 1, ..., N-1.    (A.12)

It then follows by noting that

I(y^N; x_l) = I(ŷ_l; x_l),    (A.13)

where ŷ_l = Σ_{k=0}^{N-1} h_{k-l} y_k is the maximal-ratio combining that arises from the maximum-likelihood rule when applied to (A.12). It is clearly seen, using power rescaling, that I(ŷ_l; x_l) = I(||h|| x + v; x), which upon substitution in (A.11), taking the limit for N → ∞, yields Theorem 2. □

Proof of Lemma 1: It is well known that the average mutual information (1/N) I(y^N; x^N) is upper bounded by the value attained under a Gaussian distribution of the vector x^N with a given correlation matrix E(x^N x^{NT}) [10]. Thus, in our case, letting x_i be i.i.d. Gaussian random variables with E(x_i²) = P_A yields I_{i.i.d.} ≤ I_UG (11) [1], which sets the upper bound stated in Lemma 1. □

Proof of Theorem 3: We use (A.1) and (A.2), which stay valid also for non-i.i.d. {x_l}, the case examined here. Let now

m^N = ψ^N + φ^N,    (A.15)

where ψ^N and φ^N are independent zero-mean Gaussian vectors, with ψ^N having the covariance matrix ξ^{-2} σ² I_N. Here I_N stands for the N × N unit matrix and ξ is a nonnegative scaling factor to be determined. This representation (A.15) is possible if

⁹ See [3, ch. 8], where this inequality is stated in the context of the output signal-to-noise ratio superiority of the zero-forcing DFE over the linear zero-forcing equalizer.
¹⁰ This is identified as the mutual information with errorless past and future "decisions" that are provided as side information.


Γ_φ^N = Γ_m^N - ξ^{-2} σ² I_N is nonnegative definite, where Γ_m^N and Γ_φ^N stand for the covariance matrices of the Gaussian vectors m^N and φ^N, respectively. The minimum value of ξ^{-2} (maximum value of ξ²) for which this is satisfied is

ξ² = [minimal eigenvalue of (H^{NT} H^N)^{-1}]^{-1} = maximal eigenvalue of (H^{NT} H^N).    (A.18)

Now,

(1/N) I(z^N; x^N) = (1/N) I(x^N + ψ^N + φ^N; x^N) ≤ (1/N) I(x^N + ψ^N; x^N),    (A.19)

since we have ignored the additional noise component φ^N, which is independent of both ψ^N and x^N. Now, since ψ^N is composed of i.i.d. Gaussian components whose variance is ξ^{-2} σ²,

(1/N) I(x^N + ψ^N; x^N) ≤ I(x + ψ; x) = I(ξx + ξψ; x),    (A.20)

where the right-hand-side equality in (A.20) results by scaling by the factor ξ. Invoking the well-known result [23]

lim_{N→∞} ξ² = lim_{N→∞} maximal eigenvalue of (H^{NT} H^N) = max_{0≤λ≤π} |H(λ)|²,    (A.21)

where H(λ) is given by (8), and noting that the variance of ξψ is σ², concludes the proof of the theorem. □

Proof of Lemma 2: Lemma 2 is well known [10], and the proof follows that of Lemma 1. Noting again that replacement of the original symbols {x_i} by Gaussian symbols which satisfy the same second-order moments r_x(l) increases the average mutual information yields the upper bound I_UG'. The bound I_UG' is in turn upper bounded by C_A, which is the supremum of I_UG' over all the possible correlation matrices satisfying the symbolwise average power constraint E(x_l²) ≤ P_A. □

A. Extensions and Comments

Throughout the paper, for the sake of simplicity, we have considered only the real case, that is, the channel symbols x_i, the noise samples n_i, and the ISI coefficients h_i are real valued. However, all the proofs are easily extended to the complex case where x_i, n_i, and h_i are complex valued. This is possible since the basic relation h_d(Au) = ln|det A| + h_d(u) extends also to a complex vector u composed of continuous random variables (in our case a Gaussian noise vector) and an arbitrary complex nonsingular matrix A, where we interpret differential entropies of continuously distributed complex random variables according to h_d(u) = h_d(Re(u), Im(u)). Note also that the asymptotic properties of log determinants of Toeplitz matrices extend to the complex case [23] as well. The main modification required is introducing conjugate and transpose-conjugate operations wherever needed and taking care of dimensionality issues, that is, dimension 2 for the complex case and 1 for the real case. Thus the results of Section II extend, with minor notational modifications, also to complex-valued {x_i}, {h_i}, and {n_i}, accounting for bandpass systems.

The technique used in the proof of Theorem 3 could also have been used to derive an alternative lower bound on I_{i.i.d.} to that stated in Theorem 1. This is accomplished by adding an additional independent Gaussian vector τ^N to m^N such that τ^N + m^N forms a Gaussian vector with i.i.d. components. This yields a lower bound of the same structure as in Theorem 1, with ρ (7) replaced by min_{0≤λ≤π} |H(λ)|. This lower bound falls short of the one given in Theorem 1, since ρ ≥ min_{0≤λ≤π} |H(λ)|.

APPENDIX B
UPPER BOUNDS ON I_{i.i.d.}

In this appendix, we formulate the upper bounds I_U1(i.i.d.) and I_U2(i.i.d.) on I_{i.i.d.} and compare them with the upper bounds I_UM and I_UG given by Theorem 2 and Lemma 1. The proofs of these upper bounds, detailed in [41, Appendix B], are somewhat lengthy and are, therefore, omitted.

Lemma B1: For i.i.d. symbols {x_i}, the bound I_U1(i.i.d.) of (B.1) holds, where the random variables u_0 and u_1 are defined by u_0 = (x_0 + x_1)/√2 and u_1 = (x_0 - x_1)/√2, having the respective probability functions p_{u_0}(a) and p_{u_1}(a). Here p_x(a) stands for the probability function of x_0 and x_1, which by our assumptions are i.i.d. random variables, and * denotes convolution. The random variable v is a zero-mean Gaussian variable independent of u_0, u_1, and x, having the same probability function as that of n_k in (1) (that is, variance σ²), and ρ_h denotes a correlation coefficient determined by the ISI coefficients {h_l}. For a symmetric distribution of x it follows that p_{u_0}(a) = p_{u_1}(a).


Lemma B2: For i.i.d. symbols {x_i}, the bound I_U2(i.i.d.) of (B.4) holds, where x assumes the probability function p_x(a) and v is an independent Gaussian variable having the same distribution as that of n_k in (1) (variance σ²).

First note that when no ISI is present, ||h|| = |h_0| and ρ_h = 0; thus I_{i.i.d.} = I_U2(i.i.d.) while I_{i.i.d.} ≤ I_U1(i.i.d.). For the Gaussian case, that is, {x_i} chosen to be i.i.d. Gaussian (with E(|x|²) = P_A) and with ISI present, I_{i.i.d.} = I_UG ≤ I_U1(i.i.d.) ≤ I_UM. This relation between I_U1(i.i.d.) and I_UM in the Gaussian case is established by noting that u_0 and u_1 are i.i.d. Gaussian random variables having the same variance P_A and using the fact that ln(·) is a concave function. For high signal-to-noise ratio, P_A/σ² >> 1, it is readily seen that I_U1(i.i.d.) = I_U2(i.i.d.) < I_UM. For low signal-to-noise ratio, still assuming Gaussian i.i.d. symbols with ISI present, we find that I_{i.i.d.} = I_UG = I_U1(i.i.d.) = I_UM = (1/2)||h||² P_A/σ², while I_U2(i.i.d.) is useless due to the nonnegative term -(1/2) ln(1 - |ρ_h|²), independent of the signal-to-noise ratio P_A/σ², which is present in the bound (B.4).

Assume now that x is a discrete symmetrically distributed random variable that takes on M possible values and satisfies E(|x|²) = P_A. It is clear that, for P_A/σ² → ∞, both I_{i.i.d.} and I_UM → ℋ(x) ≤ ln M, where ℋ stands for the standard entropy function [10], while I_UG → ∞, I_U1(i.i.d.) → 2ℋ(u_0) - ℋ(x) > ℋ(x), and I_U2(i.i.d.) → ℋ(x) - (1/2) ln(1 - |ρ_h|²) > ℋ(x). We notice that also for low SNR, that is, P_A/σ² → 0, I_UM = I_U1(i.i.d.) → (1/2)||h||² P_A/σ², while I_U2(i.i.d.), as was already mentioned, is useless. From this short discussion we conclude that the relative tightness of the four upper bounds on I_{i.i.d.} presented here depends on the specific case, that is, the actual distribution of x, the ISI vector h, and the region of the signal-to-noise ratio of interest. For discrete symbols having a bounded entropy ℋ(x) and high signal-to-noise ratios, the bounds I_UG, I_U1(i.i.d.), and I_U2(i.i.d.) do not approach the correct limit ℋ(x), while I_UM does.

When specialized to the first example in Section III-A, with ISI memory of degree 1, the previous upper bounds (in nats/channel use) assume the form

I_U1(i.i.d.) = I_t([1 - |a|/(1+a²)] P_M/σ²) + I_t([1 + |a|/(1+a²)] P_M/σ²) - C_b(P_M/σ²)    (B.5a)

and

I_U2(i.i.d.) = 2 C_b([1 - a²/(1+a²)²] P_M/σ²) - C_b(P_M/σ²) - (1/2) ln(1 - a²/(1+a²)²),    (B.5b)

where the function I_t(R) = I(√R a + β; a), in which a is a ternary random variable with the probability function Prob(a = 0) = 1/2, Prob(a = ±1) = 1/4, and β is a normalized Gaussian random variable. Following [32, p. 274], I_t(R) is readily evaluated numerically.

REFERENCES

[1] W. Hirt and J. L. Massey, "Capacity of the discrete-time Gaussian channel with intersymbol interference," IEEE Trans. Inform. Theory, vol. 34, pp. 380-388, May 1988.
[2] B. S. Tsybakov, "Capacity of a discrete Gaussian channel with a filter," Probl. Peredach. Inform., vol. 6, pp. 78-82, 1970.
[3] E. A. Lee and D. G. Messerschmitt, Digital Communication. Norwell, MA: Kluwer Academic Publishers, 1989.
[4] A. J. Viterbi and J. K. Omura, Principles of Digital Communication and Coding. New York: McGraw-Hill, 1979.
[5] R. W. Lucky, J. Salz, and E. J. Weldon, Principles of Data Communication. New York: McGraw-Hill, 1968.
[6] K. A. S. Immink, "Coding techniques for the noisy magnetic recording channel: A state-of-the-art report," IEEE Trans. Commun., vol. 37, no. 5, pp. 413-419, May 1989.
[7] I. N. Andersen, "Sample-whitened matched filters," IEEE Trans. Inform. Theory, vol. IT-18, pp. 363-378, May 1972.
[8] J. M. Cioffi, G. P. Dudevoir, M. V. Eyuboglu, and G. D. Forney, Jr., "MMSE decision-feedback equalizers and coding - Part I: General results," to appear in IEEE Trans. Commun.
[9] L. H. Ozarow, A. D. Wyner, and J. Ziv, "Achievable rates for a constrained Gaussian channel," IEEE Trans. Inform. Theory, vol. 34, pp. 365-370, May 1988.
[10] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[11] L. H. Brandenburg and A. D. Wyner, "Capacity of the Gaussian channel with memory: The multivariate case," Bell Syst. Tech. J., vol. 53, no. 5, pp. 745-778, May-June 1974.
[12] W. Toms and T. Berger, "Capacity and error exponent of a channel modeled as a linear dynamic system," IEEE Trans. Inform. Theory, vol. IT-19, pp. 124-126, Jan. 1973.
[13] J. M. Wozencraft and I. M. Jacobs, Principles of Communication Engineering. New York: Wiley, 1965.
[14] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication. Urbana, IL: Univ. of Illinois Press, 1949.
[15] S. Shamai (Shitz), "On the capacity of a Gaussian channel with peak power and bandlimited input signals," Archiv fur Elektronik und Ubertragungstechnik, vol. 42, no. 6, pp. 340-346, 1988.
[16] J. K. Wolf and G. Ungerboeck, "Trellis coding for partial-response channels," IEEE Trans. Commun., vol. 34, pp. 765-773, Aug. 1986.
[17] G. D. Forney, Jr. and A. R. Calderbank, "Coset codes for partial response channels, or, coset codes with spectral nulls," IEEE Trans. Inform. Theory, vol. 35, pp. 925-943, Sept. 1989.
[18] A. R. Calderbank, C. Heegard, and T. A. Lee, "Binary convolutional codes with application to magnetic recording," IEEE Trans. Inform. Theory, vol. IT-32, pp. 797-815, Nov. 1986.
[19] M. V. Eyuboglu and G. D. Forney, Jr., "Trellis precoding: Combined coding, precoding and shaping for intersymbol interference channels," to appear in IEEE Trans. Inform. Theory.
[20] S. Kasturia, J. T. Aslanis, and J. M. Cioffi, "Vector coding for partial response channels," IEEE Trans. Inform. Theory, vol. 36, pp. 741-762, July 1990.
[21] I. Bar-David and S. Shamai (Shitz), "Information rates for magnetic recording with a slope-limited magnetization model," IEEE Trans. Inform. Theory, vol. 35, pp. 956-962, Sept. 1989.

[22] N. M. Blachman, "The convolutional inequality for entropy powers," IEEE Trans. Inform. Theory, vol. IT-11, pp. 267-271, Apr. 1965.
[23] U. Grenander and G. Szego, Toeplitz Forms and Their Applications. New York: Chelsea, 1984.
[24] S. Shamai (Shitz) and I. Bar-David, "A lower bound on the cut-off rate for dispersive Gaussian channels with peak-limited inputs," IEEE Trans. Commun., vol. 39, pp. 1058-1064, July 1991.
[25] W. Hirt, "Capacity and information rates of discrete-time channels with memory," Doctoral dissertation (Diss. ETH No. 8671), Swiss Federal Inst. of Technol. (ETH), Zurich, Switzerland, 1988.
[26] H. Sasano, M. Kasahara, and T. Namekawa, "Evaluation of the exponent function E(R) for channels with intersymbol interference," Electron. and Commun. in Japan, vol. 65-A, no. 2, pp. 28-37, 1982.
[27] S. Shamai (Shitz) and A. Dembo, "Bounds on the symmetric binary cut-off rate for dispersive Gaussian channels," submitted to IEEE Trans. Commun.
[28] J. W. Lechleider, "The optimum combination of block codes and receivers for arbitrary channels," IEEE Trans. Commun., vol. 38, pp. 615-621, May 1990.
[29] R. Price, "Nonlinearly feedback-equalized PAM vs. capacity for noisy filter channels," in Proc. Int. Conf. Commun., June 1972, pp. 22-12 to 22-17.
[30] S. Shamai (Shitz) and I. Bar-David, "Upper bounds on the capacity for a constrained Gaussian channel," IEEE Trans. Inform. Theory, vol. 35, pp. 1079-1084, Sept. 1989.
[31] A. R. Calderbank and L. H. Ozarow, "Nonequiprobable signaling on the Gaussian channel," IEEE Trans. Inform. Theory, vol. 36, pp. 726-740, July 1990.
[32] R. E. Blahut, Principles and Practice of Information Theory. Reading, MA: Addison-Wesley, 1987.
[33] L. H. Ozarow and A. D. Wyner, "On the capacity of the Gaussian channel with a finite number of input levels," IEEE Trans. Inform. Theory, vol. 36, pp. 1426-1428, Nov. 1990.
[34] M. V. Eyuboglu, "Detection of coded modulation signals on linear, severely distorted channels using decision-feedback noise prediction with interleaving," IEEE Trans. Commun., vol. 36, pp. 401-409, Apr. 1988.
[35] J. L. Massey, "All signal sets centered about the origin are optimal at low energy-to-noise ratios on the AWGN channel," in Abstracts of Papers, IEEE Int. Symp. Inform. Theory, Ronneby, Sweden, June 1976, pp. 80-81.
[36] I. S. Gradshteyn and I. M. Ryzhik, Tables of Integrals, Series, and Products. New York: Academic Press, 1980.
[37] J. G. Smith, "The information capacity of amplitude- and variance-constrained scalar Gaussian channels," Inform. Contr., vol. 18, pp. 203-219, 1971.
[38] S. Shamai (Shitz) and I. Bar-David, "Capacity of peak and average-power constrained quadrature Gaussian channels," in Abstracts of Papers, IEEE Int. Symp. Inform. Theory, Ann Arbor, MI, Oct. 1986, p. 66.
[39] A. D. Wyner, "Bounds on communication with polyphase coding," Bell Syst. Tech. J., vol. 45, no. 4, pp. 523-559, Apr. 1966.
[40] S. Shamai (Shitz), "Information rates for the peak- and slope-limited magnetization model with binary signaling," EE Pub. No. 761, Technion, Haifa, Israel, 1990.
[41] S. Shamai (Shitz), A. D. Wyner, and L. H. Ozarow, "Information rates for a Gaussian channel with intersymbol interference and stationary inputs," Internal Tech. Memo., AT&T Bell Laboratories, 1990.
