piecewise volterra filters based on the threshold decomposition operator

PIECEWISE VOLTERRA FILTERS BASED ON THE THRESHOLD DECOMPOSITION OPERATOR Edwin A. Heredia Gonzalo R. Arce Department of Electrical Engineering University of Delaware Newark, Delaware 19716, USA e-mail [email protected] or [email protected]

ABSTRACT

In this paper we report our results concerning the study of multivariate functions of threshold-decomposed signals. In particular we show that multilinear tensor forms of the decomposed signal yield a class of lters that we propose to call Piecewise Volterra Filters (PWV). A lter can be viewed as a transformation of RN ! R, where N is the number of lter taps. PWV lters partition RN using a hyper-rectangular lattice, and assign a Volterra lter to each of the partition regions. At the partition boundaries continuity between the multivariate polynomials is preserved resulting in class C 0 piecewise polynomials. PWV lters constitute an ecient alternative for describing some systems rich in hard nonlinear structures, especially since parameter estimation remains a linear problem for PWV's.

1. INTRODUCTION

Nonlinear system description based on polynomial series (Volterra lters) is a well developed theory with a varied number of applications in signal processing [1, 6]. Several features make of this theory a solid conceptual framework, with the most important ones being perhaps: (i) parameter linearity, and (ii) the relation with higher-order spectral analysis. However, polynomial models have some well known drawbacks too. For example, smooth nonlinearities are easily modeled by a polynomial series of a few number of terms, but hard nonlinearities may require very high order polynomials. From a practical point of view, a system can have regions with linear, smooth nonlinear, and hard nonlinear features simultaneously. Therefore, the modeling these types of systems may become a dicult task for standard Volterra lters. For this reason, researchers have been studying alternative representations for nonlinear models with neural networks (NN's) and radial basis functions (RBF's) as the most popular ones [2]. For example, during the last three years alone, more than 15 papers have been published on neuralnetwork channel equalization, with all of them exploiting the universal approximator features of NN's and RBF's [2]. Despite their ability to cope with any type of nonlinearity, these methods are often impractical since they require nonlinear parameter search procedures which result in slow learning algorithms and a performance extremely dependent on initial conditions. Adaptive ltering for example,

becomes a much more dicult task. In this paper we introduce an alternative method for coping with hard nonlinearities while simultaneously maintaining the linearity of the parameter estimation procedure. We show that functionals of the threshold decomposition operator can be used to characterize continuous multivariate piecewise polynomial forms and therefore can be used to postulate piecewise Volterra lters (PWV). These new class of lters partition the input space into rectangular regions and assign a low-order Volterra lter equation to each of the regions. The continuity of the solution is enforced at the boundaries between regions. By choosing properly the partitions, low-order PWV lters can deal eciently with hard nonlinearities while simultaneously, the estimation of coecients of the localized polynomial expansion remains a linear problem.

2. THRESHOLD DECOMPOSITION

Threshold decomposition is an operator that was originally developed in the context of integer numbers for stack ltering [7]. In a previous paper we extended the threshold decomposition operator to the eld of real numbers and used it to develop piecewise linear lters [3] and approximation networks with adaptive piecewise polynomial kernels [4]. In this paper, we show that tensor operations on thresholddecomposed variables lead to a class of piecewise Volterra lters. Consider a set of S strictly increasing real numbers called the thresholds and denoted as  = f1 ; 2 ; . . . ; S g, also let 0 = ?1 and S+1 = 1. Then a real value x can be decomposed into a vector X = [X 0 ; X 1 ; . . . ; X S ]T using the threshold decomposition operator de ned as

X i = L[J (x) \ Ii ]

(1)

where Ii is the semi-open interval Ii = [i ; i+1 ). J (x) is the closed interval [0; x] if x  0, or [x; 0] if x < 0. The function L() is called the signed length function. Let  be the length of an arbitrary interval I, then the signed length function is de ned as  + [ f0g L(I) = ? II 22 R (2) R? Notice that when I has both, positive and negative elements, the signed length function is unde ned. The e ect

of the threshold decomposition operator de ned above is the segmentation of a value into multiple levels. When a sequence x(n) passes through this operator, the result is a multi-level signal as depicted in Fig. 1. Piecewise Volterra lters result from multichannel operations performed on the multilevel signal. λ3 λ2 λ1

is C 0 continuous piecewise bilinear. The functional ijk X i X j X k f (x0 ; x2 ; . . . ; xN ?1 ) = uvw (5) u v w is C 0 continuous piecewise trilinear, and so on for higher orders. In all these expressions, the integers i; j; k are segment indices, that is i; j; k = 0; 1; . . . ; S , while u; v; w are variable indices, that is u; v; w = 0; 1; . . . ; N ? 1, and ; ; ; . . . are tensor coecients 2. The functionals described in the previous proposition partition the input space RN into a set of hyper-rectangular regions Rq using a grid speci ed by the thresholds. For every region Rq , a multilinear (linear, bilinear, etc) form is constructed. At the boundaries between regions, the described functionals have C 0 continuity, that is the derivatives can be discontinuous but not the functional itself. Figure 2 shows an example of a piecewise-linear and a piecewise-bilinear surfaces de ned on R2 . The surfaces were obtained using the tensor functionals de ned above. Two thresholds were used and the coecients were selected arbitrarily. (a) Piecewise-linear component

Figure 1. An example of the threshold decomposition. The set of thresholds  is used to decompose the original real signal x(n) into a multi-component signal. Piecewise Volterra Filters result from multichannel operations performed on the components For the purpose of programming, the threshold decomposition algorithm can be speci ed in a table as follows. Given x and the threshold de ned intervals, X i is given by

Ii is posit. Ii is mixed Ii is negat. x 2 Ii x ? i x ; x  0 x > i+1 i+1 ? i i+1 0 x  i 0 ; ; x 2 Ii ; x x ? i+1 x < 0 x > i+1 ; ; 0 x  i 0 i i ? i+1 where ; denotes the empty set since that particular combination is not possible.

3. PIECEWISE POLYNOMIAL FUNCTIONS

The following propositions describe some analytic features of functions based on the threshold decomposition operator. Their proofs are sketched in the appendices. Proposition 1. Consider N real variables x0 ; x2 ; . . . ; xN ?1 and a unique threshold set  = f1 ; . . . ; S g. Then, each variable can be segmented into a vector of S + 1 elements using the threshold decomposition algorithm. Let the tensor Xui describe the i-th segment of the u-th variable, that is u = 0; 1; . . . N ? 1 and i = 0; 1; . . . ; S , then the multivariate functional f (x0 ; x2 ; . . . ; xN ?1 ) = vi Xvi (3) corresponds to a continuous piecewise linear functional on RN , the functional ij X i X j f (x0 ; x2 ; . . . ; xN ?1 ) = uv (4) u v

20 10 0 −10

10 5

−20 10

0 5

x

0

x

−5

2

−5

1

−10

−10

(b) Piecewise-bilinear component

100

50

0 10 −50

5

10

0 5 0

x1

−5 −5 −10

x 2

−10

Figure 2. (a) Piecewise-linear and (b) piecewisebilinear surfaces obtained using two thresholds and the tensor products described in proposition 1. uvw... be an arbitrary tensor coeProposition 2. Let ijk ... cient of the piecewise polynomial functionals de ned before. Then, in order to avoid redundancy, the piecewise multilinear coecients must satisfy the following properties:

uvw... is not necessarily zero only for index Property 1: ijk ... combinations such that i  j  . . ., and u  v  . . .. Property 2: For a xed variable u = v = w = . . . = q, the qqq... is not necessarily zero only for i = j = . . . 2. tensor ijk ...

4. PIECEWISE VOLTERRA FILTERS

Let s(n) be the input signal and y(n) be the ltered output. The general form for a nonlinear nonrecursive lter based on a window of N observations is given by

y(n) = f [s(n); s(n ? 1); . . . ; s(n ? N + 1)] (6) The functional f () is precisely the expression we intend to characterize. When f () is described by a polynomial series we have a Volterra lter. Consider a unique set of thresholds  and change the variables using xu = s(n ? u) for u = 0; 1; . . . ; N ? 1 Then the previous propositions can be used to construct a piecewise polynomial series expansion, and therefore, a piecewise Volterra lter can be represented as

a=

z1 =

1 (Xui )

(8)

Similar functions can be de ned for higher-order tensors, and thus b = 2 ( ijuv ) z2 = 2 (Xui Xvj ) (9) and so on for higher orders. Using the tensor-vector conversion functions () on the piecewise Volterra lter equation, and de ning explicitly the time n on the time-dependent terms zi , the piecewise Volterra lter equation can be transformed into y(n) = ZT (n) (10) where T = [aT ; bT ; . . .] and ZT (n) = [zT1 (n); zT2 (n); . . .]. From this equation it is evident that the coecients of the piecewise polynomial expansion remain linear and can be estimated using methods such as Wiener ltering (inverting the correlation matrix), Least Squares, or even time adaptive procedures such as LMS or RLS. In this paper we make use of the LS method based on the singular value decomposition expansion (SVD). Let Y = [y(0); y(1); . . . y(L ? 1)]T be the collection of L observed output samples, then de ne the matrix H as the one whose rows are given by ZT (n) for n = 0; 1; . . . L ? 1. The optimum coecient vector  is the solution to the overdetermined problem Y = H, which is

o = H+ Y

As an example of the potential of piecewise Volterra lters, consider the nonlinear system modeling problem illustrated in Fig. 3. The saturation device was in this case simulated using a sigmoidal function with equation y = f20=[1 + e?ax]g ? 10 where a is a constant that de nes the smoothness of the introduced nonlinearity. Small values of a (0:1  0:4) make the system soft nonlinear, while for larger values of a, the system nonlinearity becomes of the hard type. Figure 4 illustrates this behavior. w(n) x(n)

LINEAR SYSTEM

(11)

y(n)

SATURATION DEVICE

SYSTEM MODEL

(7)

subject to the restrictions described in proposition 2. For optimum lter design, the L-point input and output signals s(n) and y(n) are observed. Let 1 () be any function that rearranges the tensor iu into a column vector in such a way that the necessary zeros indicated in proposition 2 are avoided. Then, we have 1 ( ui )

5. APPLICATION EXAMPLE

e(n)

Figure 3. Block diagram describing the nonlinear system modeling problem 10

0.11

8

0.5

6

0.3

4

0.1

2 OUTPUT

uvw X i X j X k + . . . y(n) = ui Xui + ijuv Xui Xvj + ijk u v w

where H+ is the Moore-Penrose pseudo-inverse of the data matrix H and which is obtained eciently from its truncated SVD expansion. Once the vector  has been obtained, the tensor coecients can then be generated rearranging the components of  to its original tensor form.

0 −2 −4 −6 −8 −10 −10

−8

−6

−4

−2

0 INPUT

2

4

6

8

10

Figure 4. The class of sigmoid functions used to describe the saturation device when the constant changes from 0:1 to 1:1 at intervals of 0:2. The input signal x(n) consisted of 200 samples uniformly distributed in the interval [?10; 10]. The signal was then passed through a linear system with transfer function given by T (z) = 0:5 + 1:0z?1 and then saturated using the sigmoidal function. No external noise was included in the experiment to isolate the modeling error. The input and output signals x(n) and y(n) were then used to construct the following models: (i) linear, (ii) second order Volterra, (3) piecewise linear, and (4) a second order piecewise Volterra structure. For the selection of the thresholds, several preliminary trials with 1, 2, and 3 thresholds at di erent locations were performed and it was observed that 2 thresholds at ?2 and 2 were able to consistently minimize the AIC criterion. The

averaged sum of squared errors was calculated for each of these methods from the residual signal for di erent values of of the saturation constant. Figure 5 shows the values obtained. As predicted, for smooth nonlinear behavior, acceptable performance can be obtained for any of the methods, but under a harder nonlinearity, the piecewise Volterra approach provides a better option reducing the averaged sum of squared errors up to 30%. 14

n m

AVERAGED SUM OF SQUARED ERRORS

12

Linear model

which is a translated bilinear functional. For higher orders, the proof is an extension of the one presented here. 2

2nd order Volterra

10

Piecewise Linear Model

7. APPENDIX B: PROOF OF PROPOSITION 2

8

6

Piecewise Volterra

4

2

0 0

0.2

0.4 0.6 SATURATION CONSTANT

0.8

1

Figure 5. Estimated variance of the modeling error for di erent methods. As the constant increases, smooth nonlinearities turn into hard nonlinearities. From top to bottom: Linear, Volterra, Piecewise Linear, Piecewise Volterra. 6. APPENDIX A: PROOF OF PROPOSITION 1

In this section we present an overview of the proof for proposition 1. For a more detailed description the reader is referred to [5]. Consider the univariate function f (x) = i X i . According to equation 1 and the table of section 2, an explicit form for the threshold decomposition operator is



ak (x ? bk ) + ck x 2 Ik (12) Kk otherwise The coecients ak ; bk ; ck and Kk depend on each interval. Therefore, using the tensor summation convention, we have

L[J (x) \ Ii ] =

f (x) = i X i = ak (x ? bk ) + ck +

g (x) be the piecewise linear factors, from equation 12, Pthen ?1 a x + C and they can P be written as: g (x) = Nn=0 R;n n R ?1 b x + C 0 , where the subscript R has g (x) = Nn=0 R;n u R been introduced to explicitly indicate that the hyperplane coecients depend on the hypercube where x is located. After some algebraic manipulation, it is possible to show that the expansion of the product g (x) g (x) is equivalent to the expansion of XX Km;n (xm + dm )(xn + dn ) (14) b(x) =

S X i=0;i6=k

Ki

(13)

which is equivalent to having a linear function whose parameters change from interval to interval, and thus f (x) is piecewise linear on x. Let x represent the N -tuple (x0 ; x1 ; . . . ; xN ?1 ). The previous result can now be extended to the multivariate case. Recall that f1 (x = ui Xui = 0i X0i + 1i X1i +. . . + Ni ?1 X0N ?1 , and consequently, f1 (x) is piecewise linear since it is a combination of piecewise linear functions on each variable. Using the de nition for continuity, it is possible to show that the function L[J (x) \ Ii ] is continuous, and then f (x) and f1 (x) are also continuous. For the bilinear form, equation 4 can be factored as f2 (x) = (iuXui ) (vj Xvj ), that is, f2 (x) is the product of two continuous piecewise linear functions. Let g (x) and

The tensor product is commutative and thus products of the form Xui Xvj and Xvj Xui are essentially the same, and therefore one of the terms is redundant. One way to suppress this u;v;w;... imredundancy is to make the tensor coecient i;j;k; ... mune to index permutations, resulting in symmetric tensors. Another way is to force the \upper triangular" terms be zero, and thus, the only allowed combinations for the indices are: u = 0; . . . ; N ? 1, v = u; . . . ; N ? 1, w = v; . . . ; N ? 1, and so on. A similar rule is valid for the covariant indices. The second approach is the one that we have adopted here, and Property 1 is only another way of stating this rule. 2 With respect to Property 2, in reference [5] we proved that a canonical representation for a univariate piecewise polynomial function is given by: p(x) = ai (X i )[m] , where the exponent [m] indicates that each of the tensor elements is raised to the m-th power. If the variable indices are xed to the same number, that is if we make u = v = w = . . . = q then the term q;q;q;... X i X j X k . . . i;j;k; (15) ... q q q must be equivalent to p(x), therefore, only the main diagonal must have elements that are not necessarily zero. Any term o the diagonal must be zero, which is the redundancy constraint given as Property 2. 2

REFERENCES

[1] J. S. Bendat, Nonlinear System Identi cation From Random Data, Wiley-Interscience, New York, 1990. [2] S. Haykin, Neural Networks, A Comprehensive Foundation, Macmillan, New York, 1994. [3] E. A. Heredia and G. R. Arce, \Piecewise Linear System Modeling Based On Continuous Threshold Decomposition", To appear in the IEEE Trans. on Sig. Process., 1996. [4] E. A. Heredia and G. R. Arce, \Shape-Adaptive Networks Based on Piecewise Polynomial Kernels", Submitted to the IEEE Trans. on Neural Networks, 1995. [5] E. A. Heredia, A Piecewise Polynomial Theory for Nonlinear Signal Processing, PhD Thesis, University of Delaware, Newark, DE, USA, 1996. [6] V. J. Mathews, \Adaptive Polynomial Filters", IEEE Signal Processing Magazine, pp 10-26, July 1991. [7] P. D. Wendt, E. J. Coyle, and N. C. Gallagher, \Stack lters", IEEE Trans. Acoust., Speech, Signal Processing, Vol. 34, No. 4, pp. 898-911, August 1986.