
1997 IEEE International Symposium on Circuits and Systems, June 9-12, 1997, Hong Kong

PERFORMANCE OF THE STRENGTH-REDUCED ADAPTIVE FILTER ARCHITECTURE FOR 51.84 Mb/s ATM-LAN

Manish Goel and Naresh R. Shanbhag

Coordinated Science Laboratory/ECE Department, University of Illinois at Urbana-Champaign, 1308 W. Main, Urbana, IL 61801. E-mail: [mgoel,shanbhag]@uivlsi.csl.uiuc.edu

Abstract

In this paper, a pipelined strength-reduced (PIPSR) adaptive filter architecture is employed as a receive equalizer for 51.84 Mb/s ATM-LAN over unshielded twisted pair category-3 (UTP-3) wiring. This architecture provides the advantage of low-power dissipation and high-speed operation. Simulation results are presented to investigate the effect of the level of pipelining on the steady-state signal-to-noise ratio at the slicer ($SNR_{slicer}$). The results indicate that speed-ups of up to 160 can be achieved with about 0.8 dB loss in $SNR_{slicer}$.

1. Introduction

Strength reduction is an algebraic transformation that has been proposed [1] to trade off multipliers against adders in a complex multiplication, thereby achieving power reduction. In [3], we proposed the application of the strength reduction transformation at the algorithmic level to adaptive systems involving complex signals and filters. It was shown in [3] that the strength-reduced (SR) filter enables power savings of 21-25% over the traditional cross-coupled (CC) filter with no loss in performance. However, the application of strength reduction increases the critical path, and hence an inherently pipelined SR (PIPSR) architecture was also presented. Furthermore, by trading off the throughput gained through pipelining with power supply scaling [1], it was demonstrated that additional power savings of 40-69% are feasible. Clearly, the SR and PIPSR architectures are an attractive alternative to the traditional CC architecture for high bit-rate communications and digital signal processing applications.

In this paper, we demonstrate an application of the SR and PIPSR architectures as an equalizer for quadrature amplitude modulation (QAM) [5] receivers. We employ a QAM receiver for 51.84 Mb/s [4] ATM-LAN over 100 meters of unshielded twisted pair category-3 (UTP-3) cable employing the 16-CAP (carrierless amplitude/phase) modulation scheme. It must be mentioned that the 16-CAP line code was chosen as the ATM-LAN standard over UTP-3 at 51.84 Mb/s [4]. Therefore, this study is of great interest, as it can lead to low-power and cost-effective ATM-LAN transceivers. We present the simulation results for this application in order to investigate the effect of the level of pipelining on the steady-state signal-to-noise ratio at the slicer ($SNR_{slicer}$).

2. Pipelined Strength-reduced (PIPSR) Equalizer

In this section, we review the strength reduction transformation and the development of the PIPSR architecture [3] from the CC architecture. The reader is referred to [3] for more details; we present only the final results here.

A Strength Reduction Transformation

Consider the problem of computing the product of two complex numbers $(a + jb)$ and $(c + jd)$ as shown below,

$$(a + jb)(c + jd) = (ac - bd) + j(ad + bc). \tag{2.1}$$

From (2.1), a direct-mapped architectural implementation would require a total of four real multiplications and two real additions to compute the complex product. Application of strength reduction involves reformulating (2.1) as follows,

$$\begin{aligned} (a - b)d + a(c - d) &= ac - bd, \\ (a - b)d + b(c + d) &= ad + bc, \end{aligned} \tag{2.2}$$

where we see that strength reduction reduces the number of multipliers by one at the expense of three additional adders. Typically, multiplications are more expensive than additions, and hence we achieve an overall savings in hardware.
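The trade described above can be checked numerically. The following is a minimal Python sketch, not part of the original paper; the function names are illustrative. It computes the same complex product once with four real multiplications and once with the three-multiplication form, in which the product $(a - b)d$ is shared by the real and imaginary outputs.

```python
def complex_mul_direct(a, b, c, d):
    """(a + jb)(c + jd) using four real multiplications and two additions."""
    return a * c - b * d, a * d + b * c


def complex_mul_strength_reduced(a, b, c, d):
    """Same product using three real multiplications and five additions."""
    shared = (a - b) * d          # computed once, used by both outputs
    real = shared + a * (c - d)   # equals ac - bd
    imag = shared + b * (c + d)   # equals ad + bc
    return real, imag


if __name__ == "__main__":
    print(complex_mul_direct(3, 2, 5, -4))
    print(complex_mul_strength_reduced(3, 2, 5, -4))
```

Both calls return the same pair. The multiplier count drops from four to three, while the adder count grows from two to five, which is the three-adder penalty mentioned in the text.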

This research was supported by Analog Devices Inc. and the University of Illinois.

B Strength-reduced (SR) Architecture

The SR architecture [3] is obtained by applying the strength reduction transformation at the algorithmic level instead of at the multiply-add level described in the previous subsection. Starting with the complex LMS algorithm, assume that the filter input is a complex signal $X(n)$ given by $X(n) = X_r(n) + jX_i(n)$, where $X_r(n)$ and $X_i(n)$ are the real and the imaginary parts of the input signal vector $X(n)$. Furthermore, if the filter $W(n)$ is also complex ($W(n) = c(n) + jd(n)$), then the complex LMS algorithm is given by,

$$\begin{aligned} e(n) &= d(n) - W^H(n-1)X(n), \\ W(n) &= W(n-1) + \mu e^*(n)X(n), \end{aligned} \tag{2.3}$$

Figure 1: Low-power strength-reduced adaptive filter architecture

Figure 2: Pipelined strength-reduced adaptive filter architecture

where $\mu$ is the step-size, $d(n)$ is the desired signal, $e(n)$ is the error and $W(n)$ is the coefficient vector. Also, $e^*(n)$ represents the complex conjugate of the signal $e(n)$ and $W^H(n)$ represents the Hermitian of $W(n)$. Traditionally, the complex LMS algorithm is implemented via the CC architecture, which is described by the following equations,

$$y_r(n) = c^T(n-1)X_r(n) + d^T(n-1)X_i(n), \tag{2.4a}$$

$$y_i(n) = c^T(n-1)X_i(n) - d^T(n-1)X_r(n), \tag{2.4b}$$

$$c(n) = c(n-1) + \mu[e_r(n)X_r(n) + e_i(n)X_i(n)], \tag{2.4c}$$

$$d(n) = d(n-1) + \mu[e_r(n)X_i(n) - e_i(n)X_r(n)], \tag{2.4d}$$

where $e(n) = e_r(n) + je_i(n)$ and the F-block output is given by $y(n) = y_r(n) + jy_i(n)$. Equations (2.4a)-(2.4b) and (2.4c)-(2.4d) define the computations in the F-block and the WUD-block, respectively. A direct-mapped implementation of (2.4) would require $8N$ multipliers and adders. We see that (2.4a)-(2.4b) has two complex multiplications (inner products) and hence can benefit from the application of strength reduction. Doing so results in the following equations, which describe the F-block computations of the SR architecture [3],

$$\begin{aligned} y_r(n) &= c_1^T(n-1)X_r(n) - d^T(n-1)X_1(n), \\ y_i(n) &= d_1^T(n-1)X_i(n) - d^T(n-1)X_1(n), \end{aligned} \tag{2.5}$$

where $X_1(n) = X_r(n) - X_i(n)$, $c_1(n) = c(n) + d(n)$, and $d_1(n) = c(n) - d(n)$. Similarly, the WUD computation is described by,

$$\begin{aligned} c_1(n) &= c_1(n-1) + \mu[eX_1(n) + eX_3(n)], \\ d_1(n) &= d_1(n-1) + \mu[eX_2(n) + eX_3(n)], \end{aligned} \tag{2.6}$$

where $eX_1(n) = 2e_r(n)X_i(n)$, $eX_2(n) = 2e_i(n)X_r(n)$, $eX_3(n) = e_1(n)X_1(n)$, $e_1(n) = e_r(n) - e_i(n)$, and $X_1(n) = X_r(n) - X_i(n)$. It is easy to show that the SR architecture (see Fig. 1) requires only $6N$ multipliers and $8N + 3$ adders. This is the reason why the SR architecture results in 21-25% power savings [3] over the CC architecture.

Figure 3: The CAP transmitter

C Pipelined Strength-reduced (PIPSR) Architecture

The dotted line in Fig. 1 indicates the critical path of the SR architecture. As explained in [3], both the SR and the CC architectures are bounded by a maximum possible clock rate due to the computations in this critical path. This throughput limitation is eliminated via the application of the relaxed look-ahead transformation [7] to the SR architecture (see (2.5)-(2.6)). The relaxed look-ahead transformation is an approximation of the look-ahead transformation [6], and it results in hardware-efficient pipelined adaptive filter architectures. Application of relaxed look-ahead to the SR architecture in (2.5)-(2.6) results in the following equations that describe the F-block computations in the PIPSR architecture,

$$\begin{aligned} y_r(n) &= c_1^T(n-D_2)X_r(n) - d^T(n-D_2)X_1(n), \\ y_i(n) &= d_1^T(n-D_2)X_i(n) - d^T(n-D_2)X_1(n), \end{aligned} \tag{2.7}$$

where $D_2$ is the number of delays introduced before feeding the filter coefficients into the F-block. Similarly, the computations of the WUD-block of the PIPSR architecture are given by,

$$\begin{aligned} c_1(n) &= c_1(n-D_2) + \mu\sum_{i=0}^{LA-1}[eX_1(n-D_1-i) + eX_3(n-D_1-i)], \\ d_1(n) &= d_1(n-D_2) + \mu\sum_{i=0}^{LA-1}[eX_2(n-D_1-i) + eX_3(n-D_1-i)], \end{aligned} \tag{2.8}$$

where $eX_1(n)$, $eX_2(n)$ and $eX_3(n)$ are defined in the previous subsection and $D_1 \ge 0$ are the delays introduced into the error feedback loop. A block-level implementation of the PIPSR architecture is shown in Fig. 2, where the $D_1$ and $D_2$ delays will be employed to pipeline the various operators such as adders and multipliers at a fine-grain level. The high throughput of the PIPSR architecture can be traded off with supply voltage reduction, resulting in additional power savings [3] of 40-69%. Therefore, the PIPSR architecture results in 60-90% power savings as compared to the serial CC architecture.
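As a numerical sanity check of the algebra in this section, the Python sketch below runs the cross-coupled updates (2.4a)-(2.4d) next to a strength-reduced recursion built from $c_1 = c + d$, $d_1 = c - d$ and $X_1 = X_r - X_i$. It is not the authors' simulation code. The function and parameter names (`cc_lms`, `pipsr_lms`, `delay1`, `delay2`, `lookahead`), the test signal, the target filter and the step sizes are all illustrative assumptions. The sketch also assumes that $d$ is recovered as $(c_1 - d_1)/2$ and that the pipelined update sums `lookahead` error terms delayed by `delay1`, starting from coefficients delayed by `delay2`.

```python
import random


def dot(u, v):
    """Real inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(u, v))


def tap_vector(x, n, n_taps):
    """X(n) = [x(n), x(n-1), ...], zero-padded before the first sample."""
    return [x[n - k] if n - k >= 0 else 0j for k in range(n_taps)]


def cc_lms(x, desired, n_taps, mu):
    """Cross-coupled complex LMS following (2.4a)-(2.4d)."""
    c, d, errors = [0.0] * n_taps, [0.0] * n_taps, []
    for n in range(len(x)):
        taps = tap_vector(x, n, n_taps)
        xr, xi = [t.real for t in taps], [t.imag for t in taps]
        y = complex(dot(c, xr) + dot(d, xi), dot(c, xi) - dot(d, xr))
        e = desired[n] - y
        c = [w + mu * (e.real * r + e.imag * i) for w, r, i in zip(c, xr, xi)]
        d = [w + mu * (e.real * i - e.imag * r) for w, r, i in zip(d, xr, xi)]
        errors.append(e)
    return errors, c, d


def pipsr_lms(x, desired, n_taps, mu, delay1=0, delay2=1, lookahead=1):
    """Strength-reduced LMS; the defaults give the unpipelined SR recursion.

    Assumed pipelined form: coefficients delayed by delay2 (D2) feed the
    F-block, and the update sums `lookahead` (LA) error terms that are
    delayed by delay1 (D1).
    """
    if delay2 < 1 or delay1 < 0 or lookahead < 1:
        raise ValueError("need delay2 >= 1, delay1 >= 0, lookahead >= 1")
    zeros = [0.0] * n_taps
    c1_hist, d1_hist = [], []      # c1_hist[n] holds c_1(n), same for d_1
    ex1, ex2, ex3, errors = [], [], [], []
    for n in range(len(x)):
        taps = tap_vector(x, n, n_taps)
        xr, xi = [t.real for t in taps], [t.imag for t in taps]
        x1 = [r - i for r, i in zip(xr, xi)]
        c1 = c1_hist[n - delay2] if n >= delay2 else zeros
        d1 = d1_hist[n - delay2] if n >= delay2 else zeros
        d = [(p - q) / 2 for p, q in zip(c1, d1)]   # d = (c_1 - d_1) / 2
        shared = dot(d, x1)                          # product shared by y_r, y_i
        e = desired[n] - complex(dot(c1, xr) - shared, dot(d1, xi) - shared)
        ex1.append([2 * e.real * i for i in xi])
        ex2.append([2 * e.imag * r for r in xr])
        ex3.append([(e.real - e.imag) * v for v in x1])
        new_c1, new_d1 = list(c1), list(d1)
        for m in range(n - delay1, n - delay1 - lookahead, -1):
            if m >= 0:  # error terms before the first sample count as zero
                new_c1 = [w + mu * (p + q) for w, p, q in zip(new_c1, ex1[m], ex3[m])]
                new_d1 = [w + mu * (p + q) for w, p, q in zip(new_d1, ex2[m], ex3[m])]
        c1_hist.append(new_c1)
        d1_hist.append(new_d1)
        errors.append(e)
    return errors, c1_hist[-1], d1_hist[-1]


def make_test_signal(num, seed=0):
    """Random 4-point input and a noise-free desired signal (made-up target)."""
    rng = random.Random(seed)
    x = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(num)]
    target = [0.5 - 0.2j, 0.1 + 0.3j, -0.05j]
    desired = [sum(w.conjugate() * x[n - k] for k, w in enumerate(target) if n >= k)
               for n in range(num)]
    return x, desired


if __name__ == "__main__":
    x, desired = make_test_signal(400)
    _, c, d = cc_lms(x, desired, 3, 0.02)
    _, c1, d1 = pipsr_lms(x, desired, 3, 0.02)
    print(max(abs(p - (q + r)) for p, q, r in zip(c1, c, d)))
    print(max(abs(p - (q - r)) for p, q, r in zip(d1, c, d)))
```

With the default arguments the strength-reduced recursion should track $c + d$ and $c - d$ of the cross-coupled filter up to rounding error, which is the "no loss in performance" property claimed for the SR architecture. Passing larger `delay1`, `delay2` or `lookahead` values exercises the assumed pipelined form, typically with a smaller step size.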

Figure 4: The 16-CAP signal constellation

3. Application to 51.84 Mb/s ATM-LAN

In this section, we will study the performance of the proposed low-power adaptive filter architecture in a high-speed digital communication system. In particular, we will employ the proposed architecture as an adaptive equalizer in a CAP-QAM modulation scheme for a data rate of 51.84 Mb/s over 100 meters of unshielded twisted-pair (UTP-3) wiring. 16-CAP is currently the line code of choice in this application [4]. While the standard does specify the line code to be 16-CAP, there is flexibility in choosing the transmitter and receiver structure. For the purpose of demonstrating the low-power adaptive filter, we have assumed a CAP transmitter [4] and a QAM receiver [5]. In addition to channel distortion, the received signal has a near-end crosstalk (NEXT) signal superimposed upon it. The NEXT impairment occurs due to the physical proximity of the wire pairs used for duplex-mode operation. We present a brief description of the CAP transmitter [4] and the QAM receiver [2, 5].

A The CAP Transmitter

The block diagram of a digital CAP transmitter is shown in Fig. 3. The bit stream is first passed through a scrambler. The scrambled bits are then fed into a 16-CAP encoder, which maps blocks of 4 bits into one of 16 different complex symbols shown in Fig. 4. The real and imaginary symbol streams are processed by digital shaping filters. This requires the shaping filters to be operated at a sampling frequency $f_s$ that is at least twice the maximum frequency component of the transmit spectrum. The outputs of the filters are subtracted and the result is passed through a digital-to-analog (D/A) converter, which is followed by an interpolating low-pass filter. The output spectrum is broadband with a bandwidth of 25.92 MHz. The bit rate of 51.84 Mb/s and the 16-CAP signal constellation imply a symbol rate of 12.96 Mbaud. Hence, the chosen transmit spectrum has 100% excess bandwidth.

B The QAM Receiver

The QAM receiver in Fig. 5 first demodulates the received signal (which is sampled at 51.84 Msamples/s), such that the output of the lowpass filters (LPF) has energy from DC to 12.96 MHz. This allows us to downsample the LPF output by a factor of two. The resulting complex signal can then be filtered via the adaptive equalizer, which can be implemented as the traditional CC architecture or the proposed PIPSR architecture. The equalizer output is sampled at the symbol rate of 12.96 MHz and is then passed through the slicer to generate the detected symbols. The error across the slicer is employed to adapt the equalizer coefficients once every symbol period. The detected symbols are also decoded to generate the received bit-stream.
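The rate figures and the slicer step described in this section can be illustrated with a short Python sketch. It is a hypothetical illustration, not the authors' receiver. The per-axis amplitudes in `LEVELS` are an assumption (a square 16-point grid at -3, -1, 1, 3), and the function names are invented for this example.

```python
import math

# Assumed per-axis amplitudes of the 16-point constellation.
LEVELS = (-3, -1, 1, 3)


def slice_symbol(z):
    """Return the constellation point closest to the equalizer output z."""
    def nearest(v):
        return min(LEVELS, key=lambda level: abs(v - level))
    return complex(nearest(z.real), nearest(z.imag))


def slicer_error(z):
    """Decision-directed error: detected symbol minus equalizer output."""
    return slice_symbol(z) - z


def link_rates(bit_rate_mbps=51.84, points=16, bandwidth_mhz=25.92):
    """Symbol rate (Mbaud) and fractional excess bandwidth from the quoted figures."""
    bits_per_symbol = int(math.log2(points))   # 4 bits per 16-CAP symbol
    symbol_rate = bit_rate_mbps / bits_per_symbol
    excess = bandwidth_mhz / symbol_rate - 1.0
    return symbol_rate, excess


if __name__ == "__main__":
    print(link_rates())
    print(slice_symbol(0.8 - 2.6j), slicer_error(0.8 - 2.6j))
```

`link_rates()` reproduces the arithmetic in the text: 51.84 Mb/s at 4 bits per symbol gives 12.96 Mbaud, and a 25.92 MHz bandwidth is twice the symbol rate, that is, 100% excess bandwidth. The value returned by `slicer_error` plays the role of the error that adapts the equalizer coefficients once per symbol period.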

C Simulation Results

A signal-to-noise ratio (SNR) of 23.25 dB at the slicer ensures that the bit-error probability with 16-CAP is less than $10^{-10}$. The values of the step-size $\mu$ employed in the simulations were deliberately made powers of two so that the hardware implementation requires only shift-right operations while implementing the multiplications by $\mu$ in the weight-update (WUD) block. In general, we employed gear-shifting after the first 120,000 symbols and again after 240,000 symbols. Furthermore, the receive equalizer was chosen to have a span of 32 symbol periods, which implies 128 complex coefficients in the adaptive equalizer.

In Fig. 6, we plot the $SNR_{slicer}$ with respect to the speed-up, where the speed-up is defined as the ratio of $T_{SR}$ to $T_{PIPSR}$. It is clear that $SNR_{slicer}$ degrades by less than 0.8 dB for speed-ups of up to 160. Speed-ups of up to 50 or 60 may be sufficient for most practical applications, for which the performance loss is less than 0.27 dB. Thus, the proposed structure has substantial speed-ups with negligible performance loss. Clearly, the $SNR_{slicer}$ is greater than 25 dB (which means a margin of approximately 2 dB) for most of the levels of pipelining.

For all cases, $SNR_{slicer}$ increases to 20 dB within 4 ms (approximately 50,000 symbols). This is indicated in Fig. 7, where the convergence plot for SR and PIPSR (speed-up of 160) for 360,000 symbols is shown. It can be seen that the degradation in the steady-state $SNR_{slicer}$ due to the pipelining is less than 1.0 dB. This is an attractive result, given that the pipelined architecture enables power savings of 60-90%. Also worth noting is the fact that the total convergence time is 28 ms, while a few hundred milliseconds is acceptable for this application.

Thus, we conclude that the proposed architecture is a viable alternative for QAM-based receivers, especially in an ATM-LAN environment. It must be mentioned that the low-power architecture presented in this paper is applicable to any communication system which employs two-dimensional signal constellations. While we have demonstrated the application of the proposed architecture for 51.84 Mb/s ATM-LAN, numerous other applications exist. The finite-precision analysis of the pipelined strength-reduced architecture was also carried out, and it was found that the precision requirements of the low-power architectures are similar to those of the traditional cross-coupled architecture.

Figure 5: The QAM receiver

Figure 6: $SNR_{slicer}$ vs. speed-up

Figure 7: Convergence curves for the error across the slicer

References

[1] A. Chandrakasan et al., "Minimizing power using transformations," IEEE Trans. Comp.-Aided Design, vol. 14, no. 1, pp. 12-31, Jan. 1995.

[2] R. D. Gitlin, J. F. Hayes, and S. B. Weinstein, Data Communications Principles. NY: Plenum Press, 1992.

[3] M. Goel and N. R. Shanbhag, "Low-power adaptive filter architectures via strength reduction," in Proceedings of International Symposium on Low Power Electronics and Design, (Monterey, CA), pp. 217-220, August 1996.

[4] G. H. Im and J. J. Werner, "51.84 Mb/s 16-CAP ATM-LAN standard," IEEE Journal on Selected Areas in Communications, vol. 13, no. 4, pp. 620-632, May 1995.

[5] E. A. Lee and D. G. Messerschmitt, Digital Communication. Boston, MA: Kluwer Academic Publishers, 1994.

[6] K. K. Parhi and D. G. Messerschmitt, "Pipeline interleaving and parallelism in recursive digital filters - part II: Pipelined incremental block filtering," IEEE Trans. Acoust., Speech, and Signal Process., vol. 37, pp. 1118-1134, July 1989.

[7] N. R. Shanbhag and K. K. Parhi, Pipelined Adaptive Digital Filters. Kluwer Academic Publishers, 1994.