
An Efficient and Effective VLSI Architecture for a Wavelet-based Broadband Sonar Signal Detection System

Sheng Cheng, Department of Electrical Engineering, University of Warwick, Coventry, UK

Chien-Hsun Tseng, Department of Electrical Engineering, University of Warwick, Coventry, UK

Marina Cole, Department of Electrical Engineering, University of Warwick, Coventry, UK

Abstract - A compact and fast hardware architecture, based on a digital approximation and implemented with current VLSI technology as a field-programmable gate array (FPGA) structure, has been developed for broadband sonar signal detection. The VLSI implementation aims to maximize the speed of a hybrid algorithm developed previously for an underwater active sonar echolocation system. It consists of two parts: a Discrete Wavelet Transform (DWT) filter bank and a Continuous Wavelet Transform (CWT) convolver. The DWT filter bank is employed to minimize unwanted sonar channel characteristics, thus enhancing the signal received from the target. The output of the DWT denoising block is then processed by the CWT convolver in order to estimate the target's motion parameters. Simulation results demonstrate that the VLSI architecture provides an efficient and accurate solution for sonar signal detection.

I. INTRODUCTION

An underwater active sonar echolocation system, which transmits and receives sonar waveforms and then processes them for target identification and localization, is considered in this paper. The signal processing techniques used in such a system are commonly known as correlation processing. Under wideband conditions, these techniques evaluate the received signals in the time-scale domain by cross-correlating the incoming signals with a replica of the known transmitted signal. Because the correlation process and the CWT operation [1] have an equivalent structure as similarity measurements, the wideband detection process can be carried out in the CWT domain, which offers significant advantages in fine time and scale resolution.

A fast hybrid de-noising algorithm has been developed previously [2]. It uses an intelligent fuzzy system called ANFIS for adaptive noise reduction (ANR), in combination with wavelet transform techniques for underwater target motion estimation (TME). The ANFIS, which first appeared in [3], is well suited to tracking both nonlinear and linear relations among signals. It is, however, suited to off-line implementation. In this paper we concentrate on the hardware implementation of the TME part of the algorithm; the ANR part has been described in detail elsewhere [2].

A compact and efficient hardware architecture using a digital approximation and VLSI technology in the form of an FPGA structure has been developed to implement the TME block. Because the TME is the core activity of the algorithm, fast and accurate computation was the main consideration in its hardware implementation. The VLSI architecture consists of two parts: a DWT filter bank and a CWT convolver. The DWT filter bank works together with octave subband decomposition and soft thresholding to iteratively minimize unwanted channel characteristics, enhancing the return signal contained in the output of the ANR operation. The output of the DWT filter bank is then fed into the CWT convolver, which acts as a similarity measurement to identify and localize the targets' motion parameters.

VLSI architectures for 1-D and 2-D DWTs have been used in various applications, especially 2-D DWTs in image compression [4]. However, the idea presented in this paper of designing a digital architecture that implements the CWT directly is novel. To achieve the most area-saving design, the DWT denoising filter bank is implemented using the recursive pyramid algorithm (RPA) [5], while the CWT implementation is based on an FIR filtering structure. The basic theory behind this implementation is presented in the following section.

This project is supported by EPSRC (UK) and MOD (UK) joint project under grant no. GR/T 12262101.

II. WIDEBAND CORRELATION AND CWT OPERATION

Let us consider wideband signals in a single non-directional sonar channel, in which the transmitted signal $\psi(t) \in L^2(\mathbb{R})$, assumed to be of finite duration, propagates through the medium at time $t \in \mathbb{R}$. In the presence of a target, this signal is returned to the receiver, and the total received signal is modeled as

$$\tilde{g}(t) = g(t) + \eta(t) \qquad (1)$$

where $\eta(t)$ is additive zero-mean uncorrelated background noise, and the noise-free echo $g(t)$ at time $t$ can be expressed as [1]

$$g(t) = \sqrt{S}\,\psi(S(t - D)),$$

which is a replica of the transmitted signal, time-scaled by $S$ and time-shifted by $D$ due to the target's location and motion. In correlation processing, the output of a wideband replica correlator for the model (1) can be described mathematically by

WC g ( s,  )  g (t ) s (t   )dt  g , s , 



where  s (t ) 

s (st ) constituting the form of the hypothetical

signal scaled by s is served as a template (or basis function), and the inner product due to s , (t )   s (t  ) is known as the similarity measurement. If the incoming signal receives maximum output signal-to-noise-ratio (SNR) and  (t ) is an additive white Gaussian noise, the correlation process is optimum [6]. As the correlation is calculated in time over the entire signal for each given scale, the largest value of this dot product quantity occurs when the correlated signal is proportional (parallel) to the template one, i.e., they are completely alike but in a different scale. The detection problem due to the correlation process can now be described as the

maximization of the correlation function $WC_g(s,\tau)$ over both parameters simultaneously:

$$\max_{(s,\tau)} \{\|WC_g(s,\tau)\|^2\} = \|WC_g(s^*,\tau^*)\|^2$$

where $s^* = S$, $\tau \ge 0$ and $\tau^* = D$. One of the principal drawbacks of this optimization is that it is computationally expensive, with two decision variables involved, which makes it unsuitable for real-time hardware applications. With the change of variable $s \to 1/s$ and use of the CWT, the echolocation detection problem may instead be solved by seeking the local maximum of the CWT coefficients:

$$\max_{s > 0,\ \tau} \{\|CWT_g(s,\tau)\|^2\} \qquad (2)$$
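As an illustration of how a one-dimensional search over the scale in (2) might proceed, the following is a minimal sketch of plain golden section search for a local maximizer. It is an assumption-laden example rather than the authors' routine: the names `golden_section_max` and `toy_objective` are hypothetical, the objective is a stand-in for the peak CWT energy at scale s, successive parabolic interpolation [7] is omitted, and the bracket [16, 36] is the search range quoted in Section III.

```python
import math

def golden_section_max(f, lo, hi, tol=1e-6, max_iter=200):
    """Locate a local maximizer of f on [lo, hi] by golden section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0    # 1 / golden ratio, about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)                 # left interior point
    d = a + inv_phi * (b - a)                 # right interior point
    fc, fd = f(c), f(d)
    for _ in range(max_iter):
        if abs(b - a) < tol:
            break
        if fc > fd:                           # maximizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                                 # maximizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

def toy_objective(s):
    """Hypothetical unimodal stand-in for the peak CWT energy at scale s."""
    return -(s - 24.0) ** 2

s_star = golden_section_max(toy_objective, 16.0, 36.0)
```

Each iteration reuses one of the two interior evaluations, so only one new objective evaluation (one pass of the filter bank, in the setting of this paper) is needed per step.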

Letting $\psi(t) = \psi(-t)$, which yields $\psi_s(t) = \psi_s(-t)$, and adopting $g(t) = g(l)$ for $t \in [l, l+1]$, the CWT waveform in (2) is approximated as [3]

$$CWT_g(s,\tau) \approx \sum_{l} g(\tau - l)\,h(s,l)$$

where $h(s,l) = \sqrt{s}\,\big[F([(l+1)/s]) - F([l/s])\big]$ and

$$F([x/s]) = \int_{-\infty}^{[x/s]} \psi(t)\,dt.$$

Consider the problem (2) with support $t \in [0, T]$ and an effective support $t \in [0, L]$ (and hence $l \in [0, sL]$) of $\psi(t)$. Applying filtering concepts, the discrete-time solution of the problem consists of breaking the time interval $[0, T]$ into $N = m \times n$ subintervals and approximating the input signal $g = [g_1, \ldots, g_m]$ with $n$ samples in each signal segment $g_i$, $i = 1, \ldots, m$. The CWT coefficients obtained for the input signal $g$ at the scale $s$ are then represented by the output responses of a bank of FIR filters with filter coefficients $h(s,l)$, i.e.,

$$CWT_g(s) = y(s) = [y_1(s), \ldots, y_m(s)] \in \mathbb{R}^N$$

with

$$y_i(s,k) = \sum_{l = \max\{0,\ k-(sL-1)\}}^{\min\{k,\ n-1\}} h(s,l)\,g_i(k-l), \qquad (3)$$

for $k = 0, \ldots, n-1$. It can be seen from (3) that $y_i(s) = \Theta_i h(s)$, where $h(s) = [h(s,0), \ldots, h(s,sL-1)] \in \mathbb{R}^{sL}$ is the vector form of the filter impulse response and $\Theta_i \in \mathbb{R}^{L \times sL}$ is an input Sylvester matrix [1]. For a given sampling rate $f_{sw}$ of $\psi(t)$, $t \in [0, L]$, $h(s,l)$ can be obtained with the index

$$l = [0, sL] \times \frac{f_{sw} - 1}{sL} \qquad (4)$$

A tapped delay line FIR filtering structure for processing the input signal segments $g_i(k)$, $i = 1, \ldots, m$, is depicted in Fig. 1(b), where $z^{-1}$ is the unit delay operator and $L$ is the order of the filter. Owing to the convolution and discrete-time processing in the CWT implementation, the continuous-time problem (2) can be approximated by

$$\max_k \Big\{ \max_{s>0} \|y(s,k)\|^2 \Big\} = \max_k \Big\{ \max_{s>0} \sum_{i=1}^{m} \|y_i(s,k)\|^2 \Big\} \qquad (5)$$

whose corresponding block diagram in parallel form is illustrated in Fig. 1(a), provided that the optimal scale $s^*$ is sought by the inner subproblem in (5), which is a constrained optimization problem with a continuous nonlinear objective function of the parameter $s$. Combining golden section search and successive parabolic interpolation [7], the local maximizer $s^*$ is found, which yields $\tau^*$ in the time domain. The estimated target arrival time is then obtained as $t_f = t + \tau^*$, which yields the estimate of the scale factor in the wideband process, $\hat{S} = t_f / t_i$. Here $t_i \equiv t - \tau_0$ is a reference time whose corresponding time delay $\tau_0 = 2R_0/c$ is assumed to occur at the range $R_0$ (at $t = 0$) away from the receiver. Thus, the signed radial velocity of the target can be obtained from

$$v = c\,(1 - \hat{s})/(1 + \hat{s}). \qquad (6)$$

The sign of $v$ distinguishes a target moving toward the receiver from one moving away from it; the signs are assigned in accordance with the scale defined for the Doppler effect. As a result of (6), the radial position of the target under linear motion is predicted in terms of an initial range $R(t_i)$ and uniform radial velocity $v$ by

$$R(t_f) = R(t_i) + v\,(t_f - t_i).$$

Figure 1. (a) An m-channel, L-tapped delay line FIR filter; (b) a single-channel L-tapped delay line FIR filter

III. HARDWARE IMPLEMENTATION

In this section the hardware implementation of the TME operation described in the hybrid algorithm [2] is presented. The TME operation is composed of two functional blocks: the DWT noise-reduction block and the CWT convolution block. Registers are used to store the partial computation results for iterations within the DWT noise-reduction and CWT convolution blocks. Fig. 2 shows the block diagram of the TME hardware implementation.

Figure 2. Block Diagram of TME Hardware Implementation

As the diagram shows, the input signal is the output of the ANR operation, in which the total signal reflected from the target was prefiltered by the off-line ANFIS operation in the CWT domain in order to minimize spurious returns such as background noise and reverberation from the ocean surface or bottom. The output from ANR is fed into two data buffers, each sized n samples. The samples are then fed in turn into the CWT filters to extract the characteristics of the target. The DWT noise-reduction block is then used to reduce the coefficients related to noise. Following noise reduction, the signals are fed back into the CWT filters for similarity measurement again. After several iterations this process boosts the amplitude of the TME block output to its highest level, so that the target information can be easily extracted. The iterations are carried out on an as-needed basis, and in general fewer than 10 are enough to achieve a successful result.

During processing the input samples are stored in two data buffers: while computation is performed on the data already stored in one buffer, the remaining samples are fed into the second buffer. In this way the input samples are divided into m blocks, and the computational blocks deal with a short run of samples at a time instead of the whole input signal. This ensures that a large storage area is not needed for partial results during the computational iterations.
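The block-wise filtering that this section builds on, i.e. the coefficients h(s, l) from the cumulative integral F, the per-block outputs of (3) and the energy sum over blocks in (5), can be sketched in software as follows. This is a minimal sketch under stated assumptions, not the hardware design: the wavelet `psi` is a hypothetical Gaussian-windowed cosine (the paper uses a Gaussian-weighted LFM signal), the integral is taken from the start of the support, all function names are illustrative, and the summation limits are written for the case sL >= n that the design targets, where (3) reduces to l = 0, ..., k.

```python
import math

L_SUPPORT = 8.0  # effective support [0, L] of the wavelet, as in the paper's example

def psi(t):
    """Assumed mother wavelet (illustrative only): Gaussian-windowed cosine on [0, L]."""
    if t < 0.0 or t > L_SUPPORT:
        return 0.0
    u = t - L_SUPPORT / 2.0
    return math.exp(-0.5 * u * u) * math.cos(2.0 * u)

def cumulative_integral(x, steps_per_unit=200):
    """F(x): integral of psi up to x, by the midpoint rule over the support."""
    x = min(max(x, 0.0), L_SUPPORT)
    steps = max(1, int(math.ceil(x * steps_per_unit)))
    dt = x / steps
    return sum(psi((j + 0.5) * dt) for j in range(steps)) * dt

def filter_coefficients(s):
    """h(s, l) = sqrt(s) * [F((l + 1) / s) - F(l / s)] for l = 0 .. sL - 1."""
    taps = int(round(s * L_SUPPORT))
    f_vals = [cumulative_integral(l / s) for l in range(taps + 1)]
    return [math.sqrt(s) * (f_vals[l + 1] - f_vals[l]) for l in range(taps)]

def block_output(h, g_block):
    """y_i(s, k) for k = 0 .. n - 1: the first n outputs of g_i convolved with h."""
    y = []
    for k in range(len(g_block)):
        l_max = min(k, len(h) - 1)
        y.append(sum(h[l] * g_block[k - l] for l in range(l_max + 1)))
    return y

def peak_over_blocks(h, g, n):
    """Sum |y_i(s, k)|^2 over the blocks i and return the best k with its energy."""
    energy = [0.0] * n
    for start in range(0, len(g), n):
        for k, y_k in enumerate(block_output(h, g[start:start + n])):
            energy[k] += y_k * y_k
    k_best = max(range(n), key=lambda k: energy[k])
    return k_best, energy[k_best]

h = filter_coefficients(4.0)     # 32 taps for s = 4 and L = 8
g = [0.0] * 16
g[3] = 1.0                       # impulse in block 0
g[8 + 3] = 1.0                   # impulse in block 1
k_best, e_best = peak_over_blocks(h, g, n=8)
```

Because each block is processed independently, only n input samples and n partial outputs need to be held at a time, which is the storage argument made above for the two-buffer scheme.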

A. CWT convolver

By feeding input signals into an FIR filter (Fig. 2) whose coefficients represent the characteristics of the dilated mother wavelet, the CWT convolution described in Fig. 1 is achieved at the hardware level. The discrete-time CWT coefficients are computed to obtain a local optimizer s* in order to capture significantly small similarity in the structure of the return signal. Combining golden section search and successive parabolic interpolation, the local maximizer s* is sought in the range of 16 to 36. Following from (4), the number of optimal filter taps can be determined, as well as the corresponding range of tap indices. For example, with an effective support [0, 8] of ψ(t), the maximum number of filter taps would reach 288. In this case it would not be hardware-efficient to use a traditional multiplier-tree or parallel-multiplier algorithm in the implementation. Multiplier reuse was therefore adopted.

In the proposed sonar application a novel, extremely area-saving design is used: only one multiplier and one adder are used to implement an FIR filter with 272 taps. Fig. 4 shows the flow chart of the filter implementation. The filter coefficients are stored in a storage unit. Every cycle a data sample and a filter coefficient are read from their storage units into the multiplier. The output of the multiplier is then loaded into the adder together with the result from the previous cycle. The partial result is then stored in a register for the next loop, until all data and coefficients have been processed. The working frequency of the FIR filter is higher than the sample frequency at which the input data arrive, so that the filter satisfies the word-serial model and operates correctly in real time.

Figure 4. DWT implementation architecture

As shown by (3), only n outputs out of n+sL-1 are kept when n samples are fed into a filter of length sL. The computation time can therefore be reduced by skipping the computations for unnecessary outputs. Furthermore, the number of multiply and add operations for every output is the smaller of n and sL. In this design sL is larger than n, so the computation time is reduced by performing only n iterations in the CWT filter for each output instead of the traditional sL iterations. Also, since the filter coefficients are symmetric, the buffer that stores the coefficients can be halved, which reduces the hardware cost as well. The techniques described above are implemented in our design to achieve a fast and area-efficient implementation of the system.
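The multiplier-reuse scheme described above can be modeled in software as a loop that performs one multiplication and one addition per cycle, reads a symmetric filter from a half-size coefficient store, and produces only the first n outputs. The sketch below is a behavioral model under assumptions, not the authors' RTL: the function name `mac_serial_fir`, the cycle counter and the mirrored indexing of the coefficient store are all illustrative.

```python
def mac_serial_fir(half_coeffs, num_taps, samples):
    """FIR outputs computed with one multiply and one add per cycle.

    half_coeffs holds only the first ceil(num_taps / 2) coefficients of a
    symmetric filter; tap l beyond the stored half is read as tap
    num_taps - 1 - l. Only the first len(samples) outputs are produced.
    Returns (outputs, cycles), where cycles counts multiply-accumulate steps.
    """
    outputs = []
    cycles = 0
    for k in range(len(samples)):
        acc = 0.0                                        # partial-result register
        for l in range(min(k, num_taps - 1) + 1):
            idx = l if l < len(half_coeffs) else num_taps - 1 - l
            product = half_coeffs[idx] * samples[k - l]  # the single multiplier
            acc = acc + product                          # the single adder
            cycles += 1
        outputs.append(acc)
    return outputs, cycles

# Symmetric 6-tap filter [1, 2, 3, 3, 2, 1] stored as its first half only.
outputs, cycles = mac_serial_fir([1.0, 2.0, 3.0], 6, [1.0] * 8)
```

In this model no output ever needs more than min(n, num_taps) cycles, which mirrors the saving noted in the text for the case where the filter is longer than the block.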

B. DWT Denoising filter bank

The output of the CWT is fed into the DWT denoising block as shown in Fig. 2. This block decomposes the input data using the DWT and removes unwanted wavelet coefficients according to a threshold value. The wavelet coefficients are then reconstructed into data using the inverse discrete wavelet transform (IDWT). For the DWT implementation a pair of FIR filters is used, based on the pyramid algorithm developed by Mallat [4]: one low-pass filter for the scaling function, w(n), and one high-pass filter for the wavelet function, h(n). A parallel filter architecture using the RPA is used to build this DWT filter bank [5]. With the RPA, only one computation unit and K(J-1) cells of storage are needed to compute an N-point DWT, where K is the length of the wavelet filter and J is the number of octaves. The outputs of the first octave are computed every other cycle, and all higher-octave outputs are computed between two first-octave output computations. The main components of this DWT implementation are two parallel filters, which compute the low-pass and high-pass outputs, and a data buffer of size K*J that stores the inputs required to compute the J octave outputs. For simulation purposes, the wavelet used is the Daubechies wavelet DB20 and the level of decomposition is 5; thus K = 20 and J = 5 in this application. Fig. 3 gives the block diagram of the DWT implementation. The architecture of the IDWT is almost the same as that of the DWT, sharing the same filter pair but feeding the two filters with different wavelet coefficients.
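The decompose, threshold and reconstruct sequence described above can be illustrated with a short software sketch. To stay self-contained it swaps in the two-tap Haar filter pair in place of the length-20 Daubechies filters used in the paper, and it applies soft thresholding to the detail coefficients only; the function names and the threshold value are assumptions, and the signal length must be divisible by 2 raised to the number of levels.

```python
import math

def soft_threshold(x, thr):
    """Shrink x toward zero by thr; values within [-thr, thr] become 0."""
    if x > thr:
        return x - thr
    if x < -thr:
        return x + thr
    return 0.0

def haar_dwt(signal, levels):
    """Pyramid decomposition: returns (approximation, [detail_1, ..., detail_levels])."""
    r = 1.0 / math.sqrt(2.0)
    approx = list(signal)
    details = []
    for _ in range(levels):
        low = [(approx[i] + approx[i + 1]) * r for i in range(0, len(approx), 2)]
        high = [(approx[i] - approx[i + 1]) * r for i in range(0, len(approx), 2)]
        details.append(high)   # high-pass (wavelet) branch of this octave
        approx = low           # low-pass (scaling) branch feeds the next octave
    return approx, details

def haar_idwt(approx, details):
    """Inverse of haar_dwt: rebuild the signal from the coarsest octave upward."""
    r = 1.0 / math.sqrt(2.0)
    current = list(approx)
    for high in reversed(details):
        rebuilt = []
        for a, d in zip(current, high):
            rebuilt.append((a + d) * r)
            rebuilt.append((a - d) * r)
        current = rebuilt
    return current

def denoise(signal, levels, thr):
    """Decompose, soft-threshold the detail coefficients, and reconstruct."""
    approx, details = haar_dwt(signal, levels)
    shrunk = [[soft_threshold(d, thr) for d in band] for band in details]
    return haar_idwt(approx, shrunk)

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
unchanged = denoise(x, levels=3, thr=0.0)    # zero threshold: perfect reconstruction
flattened = denoise(x, levels=3, thr=100.0)  # huge threshold: only the mean survives
```

The forward and inverse transforms share the same filter pair and differ only in what is fed to them, which is the property the text relies on when reusing the DWT hardware for the IDWT.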

Figure 3. CWT filter architecture

IV. SIMULATION RESULTS AND DISCUSSION

The system described in Section III was integrated and synthesized on a Xilinx Virtex II Pro XC2VP30 FPGA using Xilinx ISE 8.2i and simulated in ModelSim XE 6.1e. To characterize the performance of the system, the 10 ms long Gaussian-weighted LFM signal [1] depicted in Fig. 5(a) is adopted as the transmitted signal. Fig. 5(b) illustrates a returned echo in which the target signal is buried in noise with a signal-to-noise ratio (SNR) of -20 dB. This signal is sampled at 32 kHz over the maximum time range [0, 1] s. The received signal is the actual input of the hybrid system, where it is first preprocessed using the ANFIS in the ANR operation. The output of the ANR part is then used as the input signal for the proposed hardware implementation. Referring to the implementation block diagram shown in Fig. 2, every 128 input samples within 256-sample blocks are fed into the hardware system one by one at a rate of 32 kHz. This results in the set of filter coefficients shown in Fig. 5(c), which is used for the CWT convolver implementation.

Figure 5. (a) Transmit signal; (b) Received signal; (c) Filter coefficients

In Fig. 6 the first signal is the input fed into the hardware system, which is the output of ANR with an SNR of -7.8539 dB after noise reduction. The input samples are represented in two's complement format with 16-bit width. The following graphs in Fig. 6 show the output signal after the first, second and fourth TME iterations. After each iteration the position of the target becomes more apparent: the target signal is extracted from the noise and its amplitude is boosted. However, only 16 bits are used in the hardware to represent the coefficients, and the amplitude would eventually exceed the hardware limit. To prevent this, the output amplitude is deliberately decreased after every other iteration. In the graph, the peak coefficient appears at time 0.6809 s after only 4 iterations, which compares favorably with the theoretical result of 0.6794 s. By detecting the position of the highest coefficient, the motion parameters can be determined using the algorithm outlined in Section II. From the simulation results we conclude that the hardware system is able to work correctly in real-time applications. We have estimated that by using only 25 CWT filters this system can cover a target speed range from -150 knots to +150 knots, which is more than enough for most sonar applications.
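To make the link between the quoted speed range and the scale parameter concrete, the velocity expression (6) of Section II can be inverted to give the scale that corresponds to a given radial velocity. The sketch below is a worked example under assumptions: the sound speed c = 1500 m/s is an assumed nominal value that the paper does not state, the function names are hypothetical, and the sign of v simply follows (6) as written.

```python
KNOT_IN_MPS = 1852.0 / 3600.0   # one knot in metres per second

def radial_velocity(scale, c=1500.0):
    """Signed radial velocity v = c (1 - s) / (1 + s), as in (6)."""
    return c * (1.0 - scale) / (1.0 + scale)

def scale_for_velocity(v, c=1500.0):
    """Inverse of radial_velocity: s = (c - v) / (c + v)."""
    return (c - v) / (c + v)

def predicted_range(r_i, v, t_i, t_f):
    """Linear-motion range prediction R(t_f) = R(t_i) + v (t_f - t_i)."""
    return r_i + v * (t_f - t_i)

v_max = 150.0 * KNOT_IN_MPS                    # about 77 m/s
scale_bounds = (scale_for_velocity(v_max),     # scale at +150 knots
                scale_for_velocity(-v_max))    # scale at -150 knots
```

Under these assumptions the two bounds lie roughly 10 percent on either side of unity, which gives a sense of the scale interval that a bank of CWT filters has to span.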

In this application only one multiplier and one adder are used for the CWT filter. This novel design offers a great advantage in hardware cost compared with traditional multiplier-tree filter designs. However, the working-frequency requirement for the filter is much stricter than for a traditional design. With a sample rate of 32 kHz, the maximum working frequency of the CWT filter is 53 MHz. By carefully mapping the instances onto the FPGA chip, this speed is achieved in our design. If the sample rate is increased, however, more multipliers should be considered in order to keep the hardware timing correct. Because of the area and speed limits of the FPGA, the tradeoff between area and speed should be carefully balanced.
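The timing constraint stated above can be read as a cycle budget: with a single multiplier performing one multiply-accumulate per clock cycle, the clock must supply all the operations needed per input sample before the next sample arrives. The following back-of-envelope sketch uses only the two figures quoted in the text (32 kHz and 53 MHz); the one-operation-per-cycle model and the function names are assumptions, and the paper does not break down how the cycles are spent.

```python
def cycles_per_sample(clock_hz, sample_rate_hz):
    """Clock cycles available between two consecutive input samples."""
    return clock_hz / sample_rate_hz

def required_clock_hz(sample_rate_hz, macs_per_sample):
    """Clock needed if one multiplier performs one MAC per cycle (assumed model)."""
    return sample_rate_hz * macs_per_sample

budget = cycles_per_sample(53e6, 32e3)   # cycles per sample at 53 MHz and 32 kHz
```

Doubling the sample rate halves this budget, which is why the text recommends adding multipliers when the sample rate is increased.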

V. CONCLUSION

A VLSI architecture was developed to speed up the core process of a hybrid algorithm for an underwater active sonar echolocation system. The architecture consists of a DWT decomposition filter bank and a CWT convolver. The DWT filter bank suppresses the noise in the output of the ANR operation, while the CWT convolver measures the target motion. Simulation results have shown the capacity and performance of the VLSI architecture in estimating the motion parameters of a moving target.

Figure 6. Signals after TME Operation

REFERENCES

[1] G. Weiss, "Wavelets and wideband correlation processing," IEEE Signal Processing Magazine, pp. 13-32, 1994.

[2] C. H. Tseng and M. Cole, "A hybrid algorithm based on neuro-fuzzy and wavelet transforms for wideband sonar detection in a reverberation-limited environment," scheduled for publication in EUSIPCO 2007.

[3] J.-S. R. Jang, "ANFIS: adaptive-network-based fuzzy inference systems," IEEE Trans. Sys. Man, Cybernetics, Vol. 23, pp. 665-685, 1993.

[4] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, 1997.

[5] C. Chakrabarti and M. Vishwanath, "Efficient Realizations of the Discrete and Continuous Wavelet Transforms: From Single Chip Implementations to Mappings on SIMD Array Computers," IEEE Transactions on Signal Processing, Vol. 43, No. 3, March 1995.

[6] H. Van Trees, Detection, Estimation, and Modulation Theory, Parts I, II, III, Wiley, New York, 1968.

[7] R. P. Brent, Algorithms for Minimization without Derivatives, Prentice-Hall, Englewood Cliffs, New Jersey, 1973.
