IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 37, NO. 8, AUGUST 1989, p. 785

Parallel Viterbi Algorithm Implementation: Breaking the ACS-Bottleneck

Gerhard Fettweis and Heinrich Meyr

Abstract—The central unit of a Viterbi decoder is a data-dependent feedback loop which performs an add-compare-select (ACS) operation. This nonlinear recursion is the only bottleneck for a high-speed parallel implementation. This paper presents a solution for implementing the Viterbi algorithm in parallel hardware for high data rates. For a fixed processing speed of given hardware it allows a linear speedup of the throughput rate by a linear increase in hardware complexity. A systolic array implementation is presented. The method described here is based on the underlying finite-state property. It is therefore possible to transfer the method to other types of algorithms which contain a data-dependent feedback loop and have a finite-state property.

I. INTRODUCTION

To boost the achievable throughput rate of an implementation of an algorithm, parallel and/or pipelined architectures can be used. For high-speed implementations of an algorithm, architectures are desired that at most lead to a linear increase in hardware complexity for a linear speedup of the throughput rate when the limit of the computational speed of the hardware is reached. An architecture that achieves this linear dependency is referred to as a linear scale solution. It can be derived for a number of algorithms, such as those of the plain feedforward type. A linear scale solution can also be found for algorithms containing linear feedback loops [1]. However, a linear scale solution has not yet been achieved for algorithms containing a data-dependent decision feedback. An algorithm of the latter type is the Viterbi algorithm (VA), which is related to dynamic programming [2].

In this paper, a linear scale solution (architecture) is presented which allows the implementation of the VA despite the fact that the VA contains a data-dependent decision feedback loop. In Section II the VA and its application are described. Section III introduces the new method which achieves the linear scale solution. The add-compare-select (ACS) unit of the VA can be implemented with this new method as a systolic array, as shown in Section IV. Investigations concerning the implementation of the survivor memory are found in Section V. Conclusions form the contents of the summarizing Section VI.

II. PROBLEM DEFINITION

In 1967, the VA was presented as a method of decoding convolutional codes [3]. In the meantime it has proven to be a solution for a variety of digital estimation problems. The VA is an efficient realization of optimum sequence estimation of a finite-state discrete-time Markov process, where optimality can be achieved by criteria such as maximum-likelihood or maximum a posteriori. For a tutorial on the VA see [4]. Below, the VA is explained only briefly to introduce the notation used.

The underlying discrete-time Markov process has a number of N_z states z_i. At time (n+1)T a transition takes place from the state of time nT to the new state of time (n+1)T. The transitions are independent (Markov process) and are observed through a memoryless (noisy) channel.¹ The transition dynamics can be described by a trellis diagram, see Fig. 1. Note that parallel transition branches can also exist, as in Fig. 1 from z1 to z2. To simplify the notation, we assume T = 1 and the transition probabilities to be time invariant.

The VA estimates (reconstructs) the path the Markov process has taken through the trellis recursively (sequence estimation). At each new time instant n and for every state, the VA calculates the optimum path which leads to that state, and discards all other paths already at time n as nonoptimal. This is accomplished by summing a probability measure called the state metric Γ_{n,z_i} for each state z_i at every time instant n. At the next time instant n+1, depending on the newly observed transition, a transition metric λ_{n,z_k→z_i} is calculated for all possible transition branches of the trellis. The algorithm for obtaining the updated Γ_{n+1,z_i} can be described in the following way. It is called the add-compare-select (ACS) unit of the VA. For each state z_i and all its predecessor states z_k, choose that path as optimum according to the following decision:

    Γ_{n+1,z_i} := max (Γ_{n,z_k} + λ_{n,z_k→z_i})
                   (all possible z_k→z_i).

The surviving path has to be updated for each state and has to be stored in an additional memory called the survivor memory. For a sufficiently large number of observed transitions (survivor depth B) it is highly probable that all N_z paths merge when they are followed back. Hence, the number B of transitions which have to be stored as the path leading to each state is finite, which allows the estimated transition of time instant n − B to be determined.

Note that when parallel branches [(a) and (b)] exist, one can find the maximum of their transition metrics before the ACS procedure is performed, since

    Γ_{n,z_k} + max (λ^(a)_{n,z_k→z_i}, λ^(b)_{n,z_k→z_i}) = max (Γ_{n,z_k} + λ^(a)_{n,z_k→z_i}, Γ_{n,z_k} + λ^(b)_{n,z_k→z_i}).

Therefore, the notation used here assumes that the maximum metric of each set of parallel branches is found prior to the ACS operation being performed; it is the one referred to as λ_{n,z_k→z_i}.

Paper approved by the Editor for Coding Theory and Applications of the IEEE Communications Society. Manuscript received August 12, 1987; revised May 1, 1988. This paper was presented in part at ICC'88, Philadelphia, PA, June 12-15, 1988. The authors are with Aachen University of Technology, Templergraben 55, 5100 Aachen, West Germany. IEEE Log Number 8929111.
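The ACS recursion above can be sketched in software. The following minimal Python model (states, branch metrics, and values are hypothetical, chosen only for illustration, not taken from the paper) resolves each set of parallel branches first and then performs one add-compare-select step:

```python
# Minimal sketch (not the paper's hardware) of one add-compare-select
# (ACS) step of the Viterbi algorithm, with hypothetical example values.

def acs_step(gamma, branches):
    """gamma: dict state -> state metric Gamma_{n,z}.
    branches: dict (z_k, z_i) -> list of transition metrics lambda for
    all parallel branches z_k -> z_i observed at time n."""
    # Parallel branches are resolved first: only the best metric of each
    # set can lie on an optimal path.
    lam = {edge: max(ms) for edge, ms in branches.items()}
    new_gamma, decisions = {}, {}
    states = {zi for (_, zi) in lam}
    for zi in states:
        # Gamma_{n+1,zi} := max over predecessors zk of Gamma_{n,zk} + lambda
        preds = [(gamma[zk] + lam[(zk, zi)], zk)
                 for (zk, z) in lam if z == zi]
        new_gamma[zi], decisions[zi] = max(preds)
    return new_gamma, decisions

gamma = {"z1": 0.0, "z2": -1.0, "z3": -2.0}
branches = {("z1", "z2"): [0.3, 0.8],   # two parallel branches z1 -> z2
            ("z2", "z2"): [0.1],
            ("z1", "z1"): [0.5],
            ("z3", "z1"): [2.0]}
gamma, dec = acs_step(gamma, branches)
print(gamma, dec)
```

The `decisions` dictionary records the selected predecessor of each state; these decisions are what the survivor memory has to store.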

An implementation of the VA, called the Viterbi decoder

¹ For certain problems the VA has proven to be an efficient solution even for channels with memory (intersymbol interference [5], [6]).

0090-6778/89/0800-0785$01.00 © 1989 IEEE


Fig. 1. General example of a trellis.

Fig. 2. Pipeline structure of a Viterbi decoder.

(VD), can be modeled as shown in Fig. 2, where it is broken up into its three basic pipelined components: the computation unit of the transition metrics, the add-compare-select (ACS) unit, and the survivor memory.
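The three pipelined units of Fig. 2 can be sketched as the following toy Python model of a complete decode. The 2-state process, its branch outputs, and the observations are hypothetical and serve only to show how the units fit together:

```python
# Toy software model of the three pipelined decoder units of Fig. 2
# (transition metrics -> ACS -> survivor memory). The 2-state process
# and its branch outputs are hypothetical, chosen only for illustration.
import math

STATES = (0, 1)
OUT = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 0.0}  # branch outputs

def transition_metric_unit(rx):
    # simple feedforward unit: one metric per branch (negative squared error)
    return {edge: -(rx - out) ** 2 for edge, out in OUT.items()}

def acs_unit(gamma, lam):
    new_gamma, dec = {}, {}
    for zi in STATES:
        new_gamma[zi], dec[zi] = max(
            (gamma[zk] + lam[(zk, zi)], zk) for zk in STATES)
    return new_gamma, dec

def survivor_memory(decisions, final_state):
    # trace the stored decisions backwards to recover the surviving path
    path = [final_state]
    for dec in reversed(decisions):
        path.append(dec[path[-1]])
    return path[::-1]

gamma = {0: 0.0, 1: -math.inf}     # known start state 0
decisions = []
for rx in [0.1, 0.9, 0.2]:         # hypothetical noisy observations
    gamma, dec = acs_unit(gamma, transition_metric_unit(rx))
    decisions.append(dec)
best = max(STATES, key=lambda z: gamma[z])
path = survivor_memory(decisions, best)
print(path)
```

In this model only `acs_unit` contains the data-dependent feedback (its output `gamma` is its own next input), which is exactly the loop the paper identifies as the bottleneck.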

III. THE HIGH-SPEED PARALLEL IMPLEMENTATION OF THE VITERBI ALGORITHM

A high-speed implementation of the VA can only be achieved by increasing the speed of computation of all three of its units. This can be done conventionally for the transition metric unit, as it is of simple feedforward type and can easily be implemented with a parallel and pipelined architecture, thus allowing a linear scale solution. The survivor memory follows, as a slave unit, the decision feedback of the predecessor master unit (ACS). Since the ACS unit is much more complex, it is the bottleneck which limits the throughput rate. Consequently, we omit the survivor memory in the discussion below and return to it in Section V.

The ACS procedure has to be computed independently for each individual state. Thus, a parallel number of N_z computation cells, called ACS cells, can be implemented in the ACS unit to perform the ACS operation separately for every state (then called a shuffle-exchange ACS unit). When a very high decoding speed is required, this approach is limited by the maximum achievable computation speed of an ACS cell [7]. Since the ACS feedback loop contains data-dependent multiplexing, it seems that the above-mentioned approach provides the maximum parallelism that can be achieved for the ACS unit. The nonlinear ACS feedback loop does not allow any linear algebraic methods to be used to obtain a highly parallel/pipelined architecture, as is done for linear feedback loops, e.g., in [1]. However, a linear scale solution can be found.

A. Introducing the M-Step Trellis

The underlying Markov process is time discrete with rate 1/T, i.e., transitions take place in intervals of length T between nT and (n+1)T. Consequently, the trellis describing the process is also time discrete with rate 1/T. However, the same Markov process can also be viewed in time intervals of length MT, i.e., by observing the transitions from nMT to (n+1)MT.
Thus, the trellis describing the process at this lower rate is time discrete with rate 1/(MT). Since this trellis combines M transitions of the original trellis, we refer to it as the M-step trellis (Ms trellis, Ms transition, Ms-VA, etc.), while the original trellis from now on is referred to as the 1-step trellis (1s trellis, 1s-VA, etc.).

Fig. 3. Principle of the M-step trellis shown for a simple example (M = 3).

An illustration for a simple example with M = 3 is given in Fig. 3. Now the Ms trellis can be used for Viterbi decoding of the same process, allowing the Ms-ACS loop to be computed M times more slowly. But the number of transition branches in the Ms trellis increases exponentially as M increases linearly. Also, states are connected by transition branches which were not connected in the 1s trellis.
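The exponential growth of parallel Ms branches can be made concrete with a small brute-force enumeration. The fully connected 3-state 1s trellis and its metric values below are hypothetical:

```python
# Brute-force illustration (hypothetical metrics) of the M-step trellis:
# every length-M path of the 1-step trellis is one Ms branch, so the
# number of parallel Ms branches between two states grows exponentially
# with M (here N_z**(M-1) for a fully connected trellis).
from itertools import product

STATES = ("z1", "z2", "z3")
# fully connected 1-step trellis (metric values arbitrary)
LAM = {(a, b): 0.1 * i for i, (a, b) in enumerate(product(STATES, STATES))}

def ms_branches(src, dst, M):
    """All length-M 1-step paths src -> dst, i.e., parallel Ms branches."""
    paths = [[src]]
    for _ in range(M):
        paths = [p + [s] for p in paths for s in STATES]
    return [p for p in paths if p[-1] == dst]

M = 3
b = ms_branches("z1", "z2", M)
print(len(b))          # 3**(M-1) = 9 parallel Ms branches z1 -> z2
best = max(sum(LAM[(p[i], p[i + 1])] for i in range(M)) for p in b)
```

Selecting `best` by enumeration like this costs O(N_z**(M-1)) per state pair, which is the exponential effort the next subsection avoids.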

B. Linear Scale Solution

As was mentioned in Section II, to achieve a fast Ms-ACS unit, parallel branches of the trellis should be eliminated before the Ms-ACS unit uses them. This has to be done to simplify the Ms-ACS procedure as far as possible and to minimize the time required by the comparison and feedback. This is the actual place where the exponential increase in implementation effort of the Ms trellis arises. However, the search for the optimum Ms transition branch of each set of parallel branches can be carried out by the VA using the 1s trellis. This can be explained as follows.

Let us calculate the Ms transition metrics for the simple example shown in Fig. 3, e.g., from state z1 to all N_z = 3 states z1, z2, and z3. The exponential increase in branches of the Ms trellis is illustrated by the rooted tree shown in Fig. 4. However, this rooted tree can be redrawn as a trellis as shown in Fig. 5, which we refer to as a rooted 1s trellis, in contrast to a 1s trellis as shown in Fig. 3. Hence, to estimate the optimum transition metric of the Ms trellis from state z1 at time nT to all three states at time (n + M)T, the VA can be used based on this rooted 1s trellis. This has to be done for all states (see Fig. 5). In general, the maximum transition metric out of every set of parallel branches can be computed and selected by applying the VA to decode each (of N_z) rooted 1s trellises.

The most important aspect of this approach, which allows the breaking of the ACS feedback bottleneck, is that the length (number of transitions or steps) of each rooted 1s trellis equals M. Thus, the computational complexity of each 1s-VA is asymptotically linearly dependent on M. Furthermore, the 1s-VA's are independent of each other and independent of the state metrics Γ of the Ms-VA. They can therefore be computed with pipelined and/or parallel 1s-VD's. Note that the parallel rooted 1s trellises, as shown in Fig. 5, use the transition metrics based on the original trellis. Hence, the implementation of the transition metric unit is not affected by the parallel ACS implementation.
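The rooted-1s-trellis computation can be sketched as follows. The trellis and metric values are hypothetical; the point is that the cost is O(M·N_z²) per root state instead of the N_z^(M−1) parallel branches of brute-force enumeration:

```python
# Sketch of Section III-B (hypothetical metrics): the best Ms transition
# metric out of each set of parallel branches is found by running the
# 1-step VA on a "rooted" 1s trellis of length M.
from itertools import product

STATES = ("z1", "z2", "z3")
LAM = {(a, b): 0.1 * i for i, (a, b) in enumerate(product(STATES, STATES))}
M = 3

def rooted_1s_va(root):
    # the first step is a plain "load": transition metrics leaving the
    # root simply become the state metrics
    gamma = {z: LAM[(root, z)] for z in STATES}
    for _ in range(M - 1):            # M-1 ACS operations suffice
        gamma = {zi: max(gamma[zk] + LAM[(zk, zi)] for zk in STATES)
                 for zi in STATES}
    return gamma                      # Ms metrics root -> every state

ms = rooted_1s_va("z1")
print(ms)
```

Running `rooted_1s_va` once per root state yields all Ms transition metrics; since the N_z runs share no data, they can be computed by independent parallel or pipelined 1s-VD's, as the text states.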

C. Linear Scale Solution: Verification

We recall that the bottleneck in a high-speed implementation of the VA is the ACS unit, containing a maximum number of N_z ACS cells in its parallel version (shuffle-exchange ACS unit). Thus, for a discussion of time and hardware scaling we need to introduce the two total cycle times of the ACS feedback loops: first, τ for the Ms-ACS unit of the Ms-VD, and second, θ for the 1s-ACS unit of the 1s-VD. The Ms trellis in general has a much higher connectivity, i.e., many more states are connected by transition branches in the Ms


FETTWEIS AND MEYR: PARALLEL VITERBI ALGORITHM IMPLEMENTATION

Fig. 4. Rooted tree of Ms transitions leaving state z1, for M = 3 of the example of Fig. 3.

Fig. 5. The N_z rooted 1s trellises of the example of Fig. 3 (M = 3).

Fig. 6. Timing diagram of the decoding cycles of the Ms-VD and the 1s-ACS units given the time scale of the Ms transitions.

Fig. 7. Multiplexed structure of a Ms/1s-VD implementation.

trellis than in the 1s trellis. For this reason the minimum achievable τ is in general greater than θ. Another time parameter which has already been introduced is the transition rate 1/T of the underlying Markov process. Thus, given τ and T, this directly implies the minimum M of the Ms trellis by

    M ≥ τ/T.    (1)

Since the 1s-VA is based on a rooted trellis, the first step of the 1s-VA is simply a "load" operation where transition metrics are loaded as state metrics. And because each 1s-VA is computed over a limited interval of only M transitions, (at most) a number of M − 1 add-compare-select operations have to be performed by each 1s-VD, while the "missing" Mth ACS operation is inherently performed by the Ms-VD². Thus each 1s-ACS unit needs the finite time (M − 1)θ to perform one 1s-VA, and can therefore be time-multiplexed to perform the 1s-VA's, see Fig. 6. Then the number L of 1s-ACS units needed for each (of N_z) rooted 1s trellises is given by

    L = ⌈(M − 1)θ/(MT)⌉.    (2)
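The timing relations can be sketched numerically. The hardware figures below are hypothetical, and equation (2) is used in the reconstructed form L = ⌈(M − 1)θ/(MT)⌉, inferred from the surrounding text rather than taken verbatim from the original:

```python
# Numeric sketch of relations (1) and (2) with hypothetical hardware
# figures; theta and tau are the loop times of the 1s- and Ms-ACS units
# and T is the required symbol interval. Equation (2) is a reconstruction.
import math

T, theta, tau = 1.0, 4.0, 6.0              # hypothetical time units
M = math.ceil(tau / T)                     # (1): M >= tau / T
L = math.ceil((M - 1) * theta / (M * T))   # (2), reconstructed form
print(M, L)                                # M = 6, L = 4

# sanity check: rooted 1s trellises arrive once per Ms interval M*T and
# each occupies a 1s-ACS unit for (M - 1)*theta; with round-robin
# assignment to L units, a unit is always free when its next job arrives
busy_until = [0.0] * L
ok = True
for j in range(100):
    start = j * M * T
    unit = j % L
    ok = ok and busy_until[unit] <= start
    busy_until[unit] = start + (M - 1) * theta
print(ok)                                  # True
```

The simulated round-robin schedule confirms that L time-multiplexed 1s-ACS units per rooted-trellis stream keep up with one new rooted 1s trellis per Ms interval.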

During the time interval LMT the 1s-ACS unit has to have finished a complete computation over the M transitions of a 1s trellis in order to be multiplexed to its next 1s trellis (see Fig. 6). In other words, each 1s-ACS unit carries out a 1s-VA which is based on a rooted 1s trellis of every Lth Ms transition. The computations which are carried out on the LN_z 1s-ACS units have to be synchronized in such a way that their outputs (ready after every M − 1 1s-ACS operations) form the sequence of transition metrics which is needed for the Ms-VD. The resulting block diagram of the parallel VD is given in Fig. 7.

² Note that one ACS operation comprises up to N_z ACS computations, one for each state.

Equation (1) implies that for a given τ the minimum M depends linearly on the required rate 1/T, but M does not influence the amount of ACS hardware needed. The factor

that indicates the complexity of the hardware is given by L (LN_z 1s-ACS units), and (2) shows a linear dependency of L on the rate 1/T for a given θ. Therefore, an implementation of the ACS procedure of the VA is found which is a linear scale solution.

Note that for a given T the achievable τ only implies the M needed, but the influence of M on the required L given in (2) is negligible. Therefore, this linear scale solution is independent of the Ms-VD, i.e., for a desired speedup of 1/T only additional 1s-ACS cells are necessary and no additional Ms-VD hardware is required. The complexity of the implementation only depends on θ (leading to the required L). Thus, the Ms-VD can be computed with a long cycle time τ without influencing the amount of L-fold ACS hardware required. This is a very important result, since the Ms trellis in general has a much greater connectivity and therefore the Ms-ACS unit has many more additions and comparisons to perform than the 1s-ACS unit.

The parallel VD implementation, which we refer to as the Ms/1s-VD, requires an additional multiplicity of 1s-ACS units by the number of states N_z and the speedup L. Now, each 1s-ACS unit in its fully parallel shuffle-exchange implementation comprises N_z ACS cells. Thus, the linear scale solution presented here is linear assuming a given trellis, i.e., a given Markov process, but depends on the number of states at least by O(N_z²). However, various implementation architectures can be chosen for a 1s-ACS unit [8]. Each is characterized by its complexity A and its decoding cycle time θ. When a Ms/1s-VD is composed of such 1s-ACS units, a speedup by L leads to a complexity C proportional to

    C ∝ A L N_z.    (3)

By (2) the speedup L is proportional to θ/T (L ∝ θ/T), which combined with (3) yields the (complexity) × (cycle-time) measure

    C T ∝ A θ N_z.    (4)

This states that the additional multiplicity factor of N_z also arises in the CT measure of the parallel Ms/1s-VD when compared to the Aθ measure [8] of a corresponding single implementation. We mention the fact that another new linear scale solution for the VD, with proportionality between CT and Aθ, is outlined in [9], [10].

IV. SYSTOLIC ARRAY IMPLEMENTATION

The newly derived parallel VD can easily be implemented as a simple multiplexed structure as shown in Fig. 7.

Fig. 8. Systolic array solution of the Ms/1s-VD (N_z = 3 columns of 1s-ACS units, L = M − 1 rows), clocked at time instants lθ (rate 1/θ; l is the time index). Here, λ_n is the complete set of 1s transition metrics of time instant n.

The 1s-ACS cells have to be clocked in such a way that the Ms-VD receives their results in the correct time slots, which implies that the rate of multiplexing has to equal the clock rate of the Ms-VD: 1/τ = 1/(MT). Now, for θ = τ = MT this leads to an overall synchronous system, in which at each time instant lθ (l as time index) the computation of a set of N_z parallel 1s-VA's is started and the same number of 1s-VA's is completed. Therefore, instead of implementing a set of parallel multiplexed 1s-ACS units, one can also implement a pipeline structure of these 1s-ACS units. The pipeline has the length M − 1 (θ = τ = MT ⇒ L = M − 1) and the computation of each 1s-VA is pipelined through this implementation. At the end of this pipeline the results of the last iteration of the 1s-VA's can simply be fed to the Ms-ACS unit. This is shown in Fig. 8 for an example with N_z = 3, M = 9, and τ = θ = MT (⇒ L = M − 1). The systolic array is clocked at the time instants lθ, which results in a throughput rate of 1/T = M/θ = (L + 1)/θ. Each column of the array computes the (M − 1)-fold ACS procedure based on one rooted 1s trellis. Therefore, a parallel number of N_z columns has to be implemented.

As a result, this systolic array implementation consists of a number of N_z independent parallel columns, each made up of cells which communicate only in the top-down direction. Since the input (transition metrics) to all ACS units of one row is the same, only one conventional 1s transition metric unit has to be implemented for each row. To minimize the interconnection wiring between the rows of each column of the array, the methods presented in [11]-[13] and/or of the cascade processor presented in [8] can be applied (here as a pure feedforward implementation). The systolic array can be transferred to a wavefront array solution

Fig. 9. N_z-fold pipelined and interleaved systolic Ms/1s-VD. Here, λ_n is the complete set of 1s transition metrics of time instant n. The sets of λ_i are fed in P = N_z times in a row; therefore, the index l is incremented every N_z clock cycles. The array is clocked at rate P/θ = N_z/θ.

which can be easier to clock in the case of a large array (clock skew).

For any implementation (systolic/wavefront array or multiplexed version) the 1s-ACS units can be divided into a set of P pipelined (latched) parts, e.g., for P = 3 into three parts, with part 1: add, part 2: compare, and part 3: select. Therefore, depending on the number P of pipelined parts, P 1s-VA's can be interleaved in one ACS unit. An especially interesting pipelined architecture can be derived for the systolic array solution of Fig. 8, since the whole 1s array is of simple feedforward structure. If one column is pipelined by P = N_z, then the processing performed by all N_z columns can be pipeline-interleaved [15] in this one column, see Fig. 9. Hence, this new array is clocked at rate P/θ = N_z/θ. The main advantages of this pipelined systolic array are the better exploitation of the processing hardware and the reduced amount of wiring required. The wiring is reduced in particular between the 1s array and the Ms-VD. Here the simple array supplies the Ms-VD in parallel with N_z² Ms transition metrics (equal to 1s state metrics), whereas the pipelined array supplies the Ms-VD serially, N_z times in a row, with N_z metrics each. This allows the Ms-VD to carry out a serial processing of its ACS procedure, which is another major advantage of the pipelined array.

V. SURVIVOR MEMORY

By introducing the Ms/1s approach, a linear scale solution was presented for the ACS unit of a parallel high-speed Ms/1s-VD. A linear scale solution can also easily be found for the transition metric unit. However, such a linear scale solution cannot be found for the total survivor memory needed. The size of the survivor memory of each 1s-VD is linearly


Fig. 10. Schematic view of the hierarchical order of trellises (first hierarchy: M1-step trellis; original trellis: 1-step trellis).

dependent on M (the length of each rooted 1s trellis). Since LN_z 1s-VD's are implemented, the total size of the survivor memory of the 1s-VD's is a linear function of MLN_z. A speedup of the transition rate 1/T by a factor b leads to an increase in both L and M by the factor b, (1), (2). Therefore, the speedup results in an increase of the required memory by b², which does not lead to a linear scale solution.

However, one possible solution is given as follows. Since the 1s-VA is carried out only over a limited 1s trellis which consists of M 1s transitions, the decisions of the 1s-ACS units can simply be stored in RAMs. Then, according to the decision of the Ms-VD, only the optimum path has to be decoded by reading the contents of the RAM (e.g., at the low rate 1/θ) and tracing back the path wanted. Thus, because the survivor memory can be implemented with RAM, its realization is not a bottleneck.

Another possible solution is not to implement a 1s survivor memory at all for the N_z parallel 1s-VD's, but to store the 1s transition metrics in a RAM. Then, after the corresponding Ms transition has been decoded by the Ms-VD, its beginning and ending states are known (coarse-grain decoding). Therefore, a simple additional 1s-ACS structure can be implemented to decode the fine-grain 1s transitions of the correct Ms transition (with the help of its stored 1s metrics). Note that the RAM space required here for the 1s transitions again depends on ML and therefore increases by b².

As was pointed out in Section II, the survivor only has to be implemented for a finite depth B. Since the Ms trellis always unites a set of M 1s transitions into one Ms transition, the survivor depth of the Ms-VD decreases when M is increased. For M > B it then takes on the minimum value of two Ms steps. Thus the survivor memory of the Ms-VD does not lead to any implementation problems.
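The second solution can be sketched as follows. The 2-state trellis and the stored metric values are hypothetical; the point is that once the Ms-VD has pinned the begin and end states of one Ms transition, a small constrained 1s-ACS pass over the stored metrics recovers the fine-grain path:

```python
# Sketch of the second survivor-memory solution of Section V, with
# hypothetical metrics: the 1s transition metrics of one Ms step are
# kept in RAM, and after the Ms-VD has fixed the begin and end states,
# a small 1s-ACS pass re-decodes the path inside that Ms transition.
STATES = (0, 1)

# stored 1s metrics lam_ram[m][(zk, zi)] for one decoded Ms step (M = 3)
lam_ram = [{(0, 0): 0.9, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.8},
           {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.3, (1, 1): 0.2},
           {(0, 0): 0.4, (0, 1): 0.6, (1, 0): 0.9, (1, 1): 0.1}]

def fine_grain(begin, end):
    # constrained 1s-VA: start pinned to `begin`, traceback from `end`
    gamma = {z: (0.0 if z == begin else float("-inf")) for z in STATES}
    decs = []
    for lam in lam_ram:
        new, dec = {}, {}
        for zi in STATES:
            new[zi], dec[zi] = max((gamma[zk] + lam[(zk, zi)], zk)
                                   for zk in STATES)
        decs.append(dec)
        gamma = new
    path = [end]
    for dec in reversed(decs):
        path.append(dec[path[-1]])
    return path[::-1]

print(fine_grain(0, 1))
```

Only the decoded Ms transition is ever traced, so this re-decoding runs at the low Ms rate rather than the full symbol rate.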
VI. CONCLUSIONS

The presented method of implementing the VA allows the use of hardware with a limited processing speed to achieve a very high throughput rate, i.e., the desired rate of decoding. It is a linear scale solution. The approach presented here is based on the principal idea of introducing two hierarchies of trellises. In general this can also be extended to additional hierarchies, see Fig. 10. Thus a whole variety of VD systems can be developed. However, in most cases this leads to a larger hardware complexity.

The compare-select feedback procedure based on a finite-state process is not limited to the VA. Generally, it is a well-known element of dynamic programming. Thus, the method described here for the special case of dynamic programming, the VA, may also be a solution, or be of help in finding new high-speed implementations of related algorithms.

To show that the method described is of practical interest, a view of our design is given here; we examine the VLSI

implementation of a VD with the help of 1.5 μm CMOS standard and macrocell ASIC's [14]. One 6-bit ACS cell as a standard cell block takes up about 0.4 mm² chip area and operates at 20 MHz. For N_z = 4, a speedup by a factor of θ/T = 6 to achieve a 120 MHz baud rate requires L = 6. This leads to a number of 96 1s-ACS cells, which yields a chip area of approximately 40 mm² (shuffle-exchange 1s-ACS units). With the help of pipelining and interleaving, the number of ACS cells and thus the chip area can be reduced (e.g., to half). Even considering on-chip overhead, this example clearly shows the practicability of the method described in this paper.

REFERENCES

[1] M. Bellanger et al., "TDM/FDM transmultiplexer: Digital polyphase and FFT," IEEE Trans. Commun., vol. COM-22, pp. 1199-1205, Sept. 1974.
[2] R. E. Bellman and S. E. Dreyfus, Applied Dynamic Programming. Princeton, NJ: Princeton University Press, 1962.
[3] A. J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Trans. Inform. Theory, vol. IT-13, pp. 260-269, Apr. 1967.
[4] G. D. Forney, "The Viterbi algorithm," Proc. IEEE, vol. 61, pp. 268-278, Mar. 1973.
[5] —, "Maximum-likelihood sequence estimation of digital sequences in the presence of intersymbol interference," IEEE Trans. Inform. Theory, vol. IT-18, pp. 363-378, May 1972.
[6] G. Ungerboeck, "Adaptive maximum-likelihood receiver for carrier-modulated data-transmission systems," IEEE Trans. Commun., vol. COM-22, pp. 624-636, May 1974.
[7] J. Snyder, "High speed Viterbi decoding of high rate codes," in Proc. 7th ICDSC, Phoenix, AZ, 1983, Conf. Rec., pp. XII16-XII23.
[8] P. G. Gulak and T. Kailath, "Locally connected VLSI architectures for the Viterbi algorithm," IEEE J. Select. Areas Commun., vol. SAC-6, pp. 527-537, 1988.
[9] G. Fettweis and H. Meyr, "A modular variable speed Viterbi decoding implementation for high data rates," in Conf. Rec. EUSIPCO-88, Grenoble, France, Sept. 1988.
[10] G. Fettweis and H. Meyr, "Verfahren zur Ausführung des Viterbi-Algorithmus mit Hilfe parallelverarbeitender Strukturen" ("Method for carrying out the Viterbi algorithm with the aid of parallel-processing structures"), German Pat. pend. No. P37 21 884.0, July 1987.
[11] H. Burkhardt and L. C. Barbosa, "Contributions to the application of the Viterbi algorithm," IEEE Trans. Inform. Theory, vol. IT-31, pp. 626-634, Sept. 1985.
[12] D. J. Coggins, D. J. Skellern, and B. S. Vucetic, "A partitioning scheme based on state relabelling for an 8 Mbps single chip Viterbi decoder," in Proc. 10th SITA, Enoshima Island, Japan, Nov. 1987, vol. 2, ED2-1, pp. 643-648.
[13] C. M. Rader, "Memory management in a Viterbi decoder," IEEE Trans. Commun., vol. COM-29, pp. 1399-1401, Sept. 1981.
[14] E. Hörbst, C. Müller-Schloer, and H. Schwärtzel, Design of VLSI Circuits Based on VENUS. New York: Springer-Verlag, 1987.
[15] K. K. Parhi and D. G. Messerschmitt, "Concurrent cellular VLSI adaptive filter architectures," IEEE Trans. Circuits Syst., vol. CAS-34, pp. 1141-1151, Oct. 1987.

Gerhard Fettweis (S’84) was born in Wilrijk (Antwerpen), Belgium, on March 16, 1962. He received the Dip1.-Ing. degree in electrical engineering from the Aachen University of Technology in 1986. During 1986 he was with the communications group of the Brown Boveri Corporation research center, Baden, Switzerland to work on his diploma thesis. He is currently working towards the Ph.D. degree in electrical engineering at the Aachen University of Technology, West Germany. His interests are in dieital communications, especiallv the interaction between algorithm and architecture for high-speed parailel VLSI implementations. ~~

Heinrich Meyr (M'75-SM'83-F'86) received the Dipl.-Ing. and Ph.D. degrees from the Swiss Federal Institute of Technology (ETH), Zurich, in 1967 and 1973, respectively. From 1968 to 1970 he held research positions at Brown Boveri Corporation, Zurich, and the Swiss Federal Institute for Reactor Research. From 1970 to the summer of 1977 he was with Hasler Research Laboratory, Bern, Switzerland. His last position at Hasler was Manager of the Research Department. During 1974 he was a Visiting Assistant Professor with the Department of Electrical Engineering, University of Southern California, Los Angeles. Since the summer of 1977 he has been Professor of


Electrical Engineering at the Aachen University of Technology (RWTH), Aachen, West Germany. His research focuses on synchronization, digital signal processing and, in particular, on algorithms and architectures suitable for VLSI implementation. In this area he is frequently in demand as a consultant to industrial concerns. He has published work in various fields and journals and holds over a dozen patents. Dr. Meyr served as a Vice Chairman for the 1978 IEEE Zurich Seminar and as an International Chairman for the 1980 National Telecommunications Conference, Houston, TX. He served as Associate Editor for the IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING from 1982 to 1985, and as Vice President for International Affairs of the IEEE Communications Society.