Approximation of Functions by Multivariable Hermite Basis: A Hybrid Method

Bartlomiej Beliczynski

Warsaw University of Technology, Institute of Control and Industrial Electronics, ul. Koszykowa 75, 00-662 Warszawa, Poland
[email protected]

Abstract. In this paper an approximation of multivariable functions by a Hermite basis is presented and discussed. The basis considered here is constructed as a product of one-variable Hermite functions with adjustable scaling parameters. The approximation is calculated via a hybrid method: the expansion coefficients are obtained by explicit, non-search formulae, while the scaling parameters are determined via a search algorithm. An excessive number of Hermite functions is initially calculated. To constitute the approximation basis, only those functions are taken which ensure the fastest error decrease down to a desired level. Working examples are presented, demonstrating a very good generalization property of this method.

Keywords: Function approximation, Neural networks, Orthonormal basis.
1 Introduction
Thanks to their elegance and usefulness, Hermite polynomials and Hermite functions have for many years been attractive in various fields of science and engineering. In the quantum mechanics of harmonic oscillators, ultra-high-band telecommunication channels, ECG data compression and various sorts of approximation tasks they have proved to be useful tools. A set of Hermite functions forming an orthonormal basis is naturally suitable for approximation, classification and data compression tasks. These basis functions are defined over the set of real numbers R and can be calculated recursively. The approximating function coefficients can be determined relatively easily to achieve the best approximation property. Since Hermite functions are eigenfunctions of the Fourier transform, time and frequency spectra are approximated simultaneously. Each subsequent basis function extends the frequency bandwidth within a limited range of well-concentrated energy; see for instance [1]. By introducing a scaling parameter we may control this bandwidth, influencing at the same time the dynamic range of the input argument. As pointed out in [2], the product of time and frequency bandwidths for Hermite functions is the largest over the set of continuous functions. Hermite functions display various geometrical shapes controlled by simple parameters.

A. Dobnikar, U. Lotrič, and B. Šter (Eds.): ICANNGA 2011, Part I, LNCS 6593, pp. 130-139, 2011. © Springer-Verlag Berlin Heidelberg 2011

It was suggested to use Hermite functions as activation functions in
neural schemes. In [3], a so-called "constructive" approximation scheme is used. It is a type of the incremental approximation developed in [4], [5]. Every node in the hidden layer has a different activation function, so intuitively the most appropriate shape can be applied. However, in such an approach the orthogonality of Hermite functions is not really exploited. If one-variable Hermite functions are extended to two variables, the approximation retains the same useful properties and turns out to be very suitable for image compression tasks. For the n-variable case, although the main features are the same, the whole process becomes more complicated. The biggest advantage of approximation by a Hermite basis is that, due to its orthonormality, the approximation does not involve search algorithms. However, for an initial step of approximation one has to consider the time and frequency bandwidths. For the one-variable case, these two bandwidths can be controlled by a single scaling parameter, which can be selected to some extent arbitrarily. It is much more difficult to choose appropriate scaling parameters in the multivariable case. So we suggest using a search algorithm for that, while the expansion coefficients are calculated explicitly via appropriate formulae. Because approximation by an orthonormal basis is numerically very efficient, one can take advantage of that and calculate a larger number of basis functions, then select from them only those which contribute the most to the approximation error decrease. It seems that this basis selection procedure is the main reason for the good generalization property of this method.

This paper is organized as follows. In Section 2, basic facts about approximation needed for later use are recalled. In Section 3, one-variable Hermite functions, as basic components for the multivariable case, are shortly described.
2 Approximation Framework
Some selected facts on function approximation useful for this paper will be recalled. Let us consider the following function:

f_{n+1} = Σ_{i=0}^{n} w_i g_i,    (1)

where g_i ∈ G ⊂ H, H is a Hilbert space H = (H, ||·||), i = 0, ..., n, and w_i ∈ R, i = 0, ..., n. For any function f from a Hilbert space H and a closed (finite-dimensional) subspace G ⊂ H with basis {g_0, ..., g_n} there exists a unique best approximation of f by elements of G [6]. Let us denote it by g_b. Because the error of the best approximation is orthogonal to all elements of the approximation space, f − g_b ⊥ G, the coefficients w_i may be calculated from the following set of linear equations:

⟨g_i, f − g_b⟩ = 0 for i = 0, ..., n,    (2)

where ⟨·,·⟩ denotes the inner product.
Formula (2) can also be written as ⟨g_i, f − Σ_{k=0}^{n} w_k g_k⟩ = ⟨g_i, f⟩ − Σ_{k=0}^{n} w_k ⟨g_i, g_k⟩ = 0 for i = 0, ..., n, or in the matrix form

Γ w = G_f,    (3)

where Γ = [⟨g_i, g_j⟩], i, j = 0, ..., n, w = [w_0, ..., w_n]^T, G_f = [⟨g_0, f⟩, ..., ⟨g_n, f⟩]^T, and "T" denotes transposition. Because there exists a unique best approximation of f in the (n + 1)-dimensional space G with basis {g_0, ..., g_n}, the matrix Γ is nonsingular and w_b = Γ^{−1} G_f. For any basis {g_0, ..., g_n} one can find an orthonormal basis {e_0, ..., e_n}, ⟨e_i, e_j⟩ = 1 when i = j and ⟨e_i, e_j⟩ = 0 when i ≠ j, such that span{g_0, ..., g_n} = span{e_0, ..., e_n}. In such a case Γ is a unit matrix and

w_b = [⟨e_0, f⟩, ⟨e_1, f⟩, ..., ⟨e_n, f⟩]^T.    (4)
Finally, (1) takes the form

f_{n+1} = Σ_{i=0}^{n} ⟨e_i, f⟩ e_i.    (5)
The squared error ||error_{n+1}||² = ⟨f − f_{n+1}, f − f_{n+1}⟩ of the best approximation of a function f in the basis {e_0, ..., e_n} is thus expressible by

||error_{n+1}||² = ||f||² − Σ_{i=0}^{n} w_i².    (6)
In a typically stated approximation problem, a basis of n + 1 functions {e_0, e_1, ..., e_n} is given and we are looking for the expansion coefficients w_i = ⟨e_i, f⟩, i = 0, 1, ..., n. According to formula (6), these expansion coefficients contribute directly to the error decrease, and they can be used to order the basis from the most to the least significant as far as error decrease is concerned.
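The computation in (4)-(6) involves no search once an orthonormal basis is available. As a minimal numerical illustration (using the Euclidean inner product on R³ and a hypothetical target vector, not anything from the paper), the coefficients and the error formula can be sketched as:

```python
# Best approximation of f in span{e0, e1} of an orthonormal basis,
# illustrating formulas (4)-(6) with the Euclidean inner product on R^3.

def inner(x, y):
    """Euclidean inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))

f = [3.0, 4.0, 5.0]                                    # hypothetical target
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]             # orthonormal {e0, e1}

# Expansion coefficients w_i = <e_i, f>, formula (4): no search needed.
w = [inner(e, f) for e in basis]

# Best approximant f_b = sum_i w_i e_i, formula (5).
f_b = [sum(wi * e[k] for wi, e in zip(w, basis)) for k in range(len(f))]

# Squared error ||f||^2 - sum_i w_i^2, formula (6), agrees with the
# directly computed ||f - f_b||^2.
diff = [a - b for a, b in zip(f, f_b)]
assert abs(inner(diff, diff) - (inner(f, f) - sum(wi**2 for wi in w))) < 1e-12
print(w)  # -> [3.0, 4.0]
```

The same pattern carries over verbatim to function spaces once the inner product is replaced by an integral.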
3 One-Variable Hermite Functions
Our multivariable basis for approximation will be composed of one-variable Hermite functions, so we will briefly describe these components. Let us consider the space L²(−∞, +∞) with the inner product defined as ⟨x, y⟩ = ∫_{−∞}^{+∞} x(t) y(t) dt. In this space a sequence of orthonormal functions can be defined as follows (see for instance [6]):

h_0(t), h_1(t), ..., h_n(t), ...    (7)

where

h_n(t) = c_n e^{−t²/2} H_n(t);  H_n(t) = (−1)^n e^{t²} (d^n/dt^n)(e^{−t²});  c_n = 1/(2^n n! √π)^{1/2},    (8)

and H_n(t) is a polynomial.
The polynomials H_n(t) are called Hermite polynomials and the functions h_n(t) Hermite functions. According to (8), the first several Hermite functions can be calculated:

h_0(t) = (1/π^{1/4}) e^{−t²/2};  h_1(t) = (1/(√2 π^{1/4})) e^{−t²/2} 2t;    (9)

h_2(t) = (1/(2√2 π^{1/4})) e^{−t²/2} (4t² − 2);  h_3(t) = (1/(4√3 π^{1/4})) e^{−t²/2} (8t³ − 12t).    (10)
Plots of several Hermite functions are shown in Fig. 1.

Fig. 1. Hermite functions h_0, h_1, h_9
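The recursive computability mentioned above follows from (8) and the classical recurrence H_{n+1}(t) = 2t H_n(t) − 2n H_{n−1}(t): combined with the constants c_n this gives h_{n+1}(t) = √(2/(n+1)) t h_n(t) − √(n/(n+1)) h_{n−1}(t). A short sketch (our plain-Python illustration, not the author's code; the crude Riemann-sum orthonormality check is an addition):

```python
import math

def hermite_functions(n_max, t):
    """Values h_0(t), ..., h_{n_max}(t) of the orthonormal Hermite
    functions, computed by the numerically stable recurrence
    h_{n+1} = sqrt(2/(n+1)) * t * h_n - sqrt(n/(n+1)) * h_{n-1}."""
    h = [math.pi ** -0.25 * math.exp(-t * t / 2.0)]      # h_0, cf. (9)
    if n_max >= 1:
        h.append(math.sqrt(2.0) * t * h[0])              # h_1, cf. (9)
    for n in range(1, n_max):
        h.append(math.sqrt(2.0 / (n + 1)) * t * h[n]
                 - math.sqrt(n / (n + 1)) * h[n - 1])
    return h

# Crude Riemann-sum check of orthonormality on [-10, 10]:
dt = 0.01
vals = [hermite_functions(3, -10.0 + k * dt) for k in range(2001)]
print(sum(v[3] * v[3] for v in vals) * dt)   # <h3, h3>, close to 1
print(sum(v[2] * v[3] for v in vals) * dt)   # <h2, h3>, close to 0
```

Because the Hermite functions decay like e^{−t²/2}, truncating the integration interval at ±10 introduces only a negligible error.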
One can see that increasing the indices of the Hermite functions enlarges their bandwidths in time and frequency. So when approximating a function, it is reasonable to start from basis functions with lower indices and gradually proceed to higher ones. If the approximated function is not located in the range of the Hermite functions displayed in Fig. 1, one can modify the basis (7) by scaling the variable t via a parameter σ ∈ (0, ∞). So if one scales the argument, t := t/σ, in (8) and modifies c_n to ensure orthonormality, then

h_n(t, σ) = c_{n,σ} e^{−t²/(2σ²)} H_n(t/σ), where c_{n,σ} = 1/(σ 2^n n! √π)^{1/2},    (11)

and

h_n(t, σ) = (1/√σ) h_n(t/σ) and h_n(ω, σ) = √σ h_n(σω).    (12)
Note that h_n as defined by (11) is a function of two arguments, whereas h_n as defined by (8) has only one argument. These functions are related by (12). Thus by introducing the scaling parameter σ into (11) one may adjust both the dynamic range of the input argument of h_n(t, σ) and its frequency bandwidth:

t ∈ [−σ√(2n+1), σ√(2n+1)];  ω ∈ [−(1/σ)√(2n+1), (1/σ)√(2n+1)].    (13)

Suppose that a one-variable function f defined over the range of its argument t ∈ [−t_max, t_max] has to be approximated by using Hermite expansions. Assume that the retained angular frequency of the function should be at least ω_r; then, according to (13), the following two conditions should be fulfilled:

σ√(2n+1) ≥ t_max and (1/σ)√(2n+1) ≥ ω_r,    (14)

or

σ ∈ [σ_l, σ_h], where σ_l = t_max/√(2n+1) and σ_h = √(2n+1)/ω_r.    (15)

One would expect that σ_l ≤ σ_h, which is equivalent to

t_max ω_r ≤ 2n + 1.    (16)
In order to preserve orthonormality of the set {h_0(t, σ), h_1(t, σ), ..., h_n(t, σ)}, σ must be chosen the same for all functions h_i(t, σ), i = 0, ..., n. The loss of basis orthonormality due to basis truncation, widely discussed on such occasions, is in many practical cases not crucial [7].
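Conditions (14)-(16) give an explicit admissible interval for σ. A tiny helper might look as follows (the numbers for t_max, ω_r and n are hypothetical illustrative choices, not values from the paper):

```python
import math

def sigma_range(t_max, omega_r, n):
    """Admissible scaling interval [sigma_l, sigma_h] from (15).
    It is non-empty iff t_max * omega_r <= 2n + 1, cf. (16)."""
    b = math.sqrt(2 * n + 1)
    return t_max / b, b / omega_r

# Hypothetical requirements: argument range 3, retained frequency 2, n = 10.
lo, hi = sigma_range(3.0, 2.0, 10)
print(lo < hi)  # True, since 3 * 2 <= 2*10 + 1
```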
4 Multivariable Function Approximation

4.1 Multivariable Hermite Basis
Let the function f to be approximated belong to a Hilbert space, f ∈ H, H = (H, ||·||), and be a function of m variables. Let us denote it explicitly as f(x_1, x_2, ..., x_m). Let a one-variable Hermite function be denoted as

h_i(x_j, σ_j), where j ∈ {1, ..., m} and i ∈ {0, 1, ..., n},    (17)

and let the multivariable basis function h_l(x_1, x_2, ..., x_m, σ_1, σ_2, ..., σ_m) be the following:

h_l(x_1, x_2, ..., x_m, σ_1, σ_2, ..., σ_m) = h_{i_1}(x_1, σ_1) h_{i_2}(x_2, σ_2) ... h_{i_m}(x_m, σ_m),    (18)

where i_1, i_2, ..., i_m ∈ {0, 1, ..., n}. Clearly, for each of the m variables there are n + 1 indices of Hermite functions. This gives (n + 1)^m basis functions in total. They can be enumerated by

l = Σ_{j=1}^{m} i_j (n + 1)^{j−1},    (19)

so l ∈ {0, 1, ..., (n + 1)^m − 1}.
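The enumeration (19) is simply a base-(n+1) encoding of the index tuple (i_1, ..., i_m). A quick sketch (our illustration) verifying that it is a bijection onto {0, ..., (n+1)^m − 1}:

```python
from itertools import product

def flat_index(idx, n):
    """l = sum_j i_j * (n+1)**(j-1), formula (19); idx = (i_1, ..., i_m)."""
    return sum(i * (n + 1) ** j for j, i in enumerate(idx))

n, m = 2, 3                                    # 3 Hermite indices per variable
tuples = list(product(range(n + 1), repeat=m))
ls = sorted(flat_index(t, n) for t in tuples)
print(len(tuples), ls == list(range((n + 1) ** m)))  # 27 True
```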
Writing now x = (x_1, x_2, ..., x_m) and σ = (σ_1, σ_2, ..., σ_m), instead of h_l(x_1, x_2, ..., x_m, σ_1, σ_2, ..., σ_m) we will write in short h_l(x, σ) or h_l. Finally, the multivariable basis is the following:

h_0, h_1, ..., h_{(n+1)^m − 1}.    (20)

One can easily verify that the multivariable basis is orthonormal, i.e. ⟨h_i, h_j⟩ = 1 for i = j and ⟨h_i, h_j⟩ = 0 elsewhere. The approximant f_{(n+1)^m} of f will be expressed as

f_{(n+1)^m}(x, σ) = Σ_{l=0}^{(n+1)^m − 1} w_l h_l(x, σ),    (21)

where

w_l = ⟨h_l, f⟩.    (22)

f_{(n+1)^m} approaches the function f as the number of elements goes to infinity: f = f_∞ = Σ_{l=0}^{∞} w_l h_l. An interesting survey of mathematical research on multivariable polynomials and Hermite interpolation can be found in [8].

4.2 Scaling Parameters
Hermite functions are well localized in frequency and time. If a scaling parameter is introduced, it influences the time and frequency ranges, but in opposite ways (13). If it is chosen too small, a fragment of the function may be poorly approximated. If it is chosen too large, only part of the approximated function's spectrum is preserved. If only a one-variable function is being approximated, the scaling parameter σ can even be chosen intuitively. If, however, several variables are involved, the best choice is more complicated and must be calculated. We suggest the following criterion:

σ_0 = arg min_σ ||f_{(n+1)^m}(x, σ) − f(x)||².
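For illustration only, a one-variable toy version of this criterion can be evaluated as below (a single basis function h_0(t, σ), a Gaussian target and a crude grid in place of a proper search algorithm; all numbers are our assumptions):

```python
import math

def h0(t, sigma):
    """Scaled Hermite function h_0(t, sigma) from (11)."""
    return math.exp(-t * t / (2 * sigma * sigma)) / (math.sqrt(sigma) * math.pi ** 0.25)

dt = 0.01
ts = [-8.0 + dt * k for k in range(1601)]
f = [math.exp(-t * t / 2.0) for t in ts]       # toy target: a Gaussian

def sq_error(sigma):
    # w = <h_0(., sigma), f>; squared error ||f||^2 - w^2, cf. (6).
    w = sum(h0(t, sigma) * ft for t, ft in zip(ts, f)) * dt
    return sum(ft * ft for ft in f) * dt - w * w

# Crude grid search standing in for the iterative minimization of sigma_0.
best_sigma = min((s / 10.0 for s in range(5, 21)), key=sq_error)
print(best_sigma)  # -> 1.0, since f(t) = pi**0.25 * h0(t, 1)
```

In practice any standard scalar or multivariable optimizer could replace the grid, with the projection error recomputed at each candidate σ.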
Usually, in order to get σ_0, a number of iterations is needed.

4.3 Basis Selection
If we approximate an m-variable function and along each variable we use n + 1 orthonormal components, then there will be (n + 1)^m summation terms in (21). For instance, if we approximate a function of 3 variables with 15 Hermite components along each variable, we have 3375 summation terms. One expects that a significant part of all components have a very small, practically negligible influence on the approximation. As is clearly visible from formula (6), the components
associated with large w_i² (or |w_i|) contribute the most to the error decrease. So, taking advantage of the efficiency of approximation by an orthonormal basis, we initially calculate an excessive number of Hermite expansion terms and select only the most significant ones as far as error decrease is concerned. This basis selection can be interpreted as a simple pruning method, a classical neural-network technique improving generalization; see for instance [9].

4.4 Examples
Example 1. Let the function to be approximated be the following:

f(x_1, x_2) = x_1 e^{−x_1² − x_2²}.    (23)

Its plot is presented in Fig. 2. Let us approximate the function in the range [−3, 3]². We take 41 points along each axis, obtaining in total 1681 pairs of (argument, function value) to be processed. Along each axis the number of Hermite components was set to 3, so every one-variable Hermite function could have index 0, 1 or 2. We obtained 3² = 9 Hermite components. The expansion coefficients (weights) were calculated according to (22). The two scaling factors σ_1 and σ_2 were determined via a search-type procedure. Finally we found that σ_1 = σ_2 = 0.7071. The Hermite expansion components (21) were ordered by the squares of their coefficients w_i. The first two components are written in (24):

f_9(x_1, x_2, σ_1, σ_2) = w_1 h_1(x_1, σ_1) h_0(x_2, σ_2) + w_7 h_1(x_1, σ_1) h_2(x_2, σ_2) + ...    (24)

and their expansion coefficients were w_1 = 0.6267 and w_7 = 1.5613·10^{−18}. It is clear that to approximate this function it is sufficient to take only one node. Finally, the result is the following: f_1(x, σ) = 0.6267 h_1(x, σ), or

f_1(x_1, x_2, 0.7071, 0.7071) = 0.6267 h_1(x_1, 0.7071) h_0(x_2, 0.7071).

The h_0 and h_1 functions are calculated by using (12) and (9). The Mean Squared Error (MSE) of the approximation is 5.6·10^{−12}, so the approximant is almost exactly the same as the original. The performance of this approximation is an argument in favour of the good generalization property of this Hermite-function-based approximation. In fact, one can write the following:

f(x_1, x_2) = x_1 e^{−x_1² − x_2²} = (√π/(2√2)) h_1(x_1, 1/√2) h_0(x_2, 1/√2) = 0.6267 h_1(x_1, 0.7071) h_0(x_2, 0.7071),

which means that the generalization from numerical data is almost perfect. We have obtained a function formula which is suitable to be used anywhere, also outside the given region [−3, 3]².
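The ordering-and-truncation step used above (rank the terms by w_i², keep the smallest set reaching a desired error reduction, per (6)) can be sketched as follows; the `fraction` parameter and the sample weights are our illustrative choices:

```python
def select_basis(weights, fraction=0.99):
    """Rank expansion terms by w_i**2, their contribution to the error
    decrease in (6), and keep the smallest set accounting for `fraction`
    of the total sum of squares: a simple pruning rule."""
    ranked = sorted(enumerate(weights), key=lambda p: p[1] ** 2, reverse=True)
    total = sum(w * w for _, w in ranked)
    kept, acc = [], 0.0
    for i, w in ranked:
        kept.append(i)
        acc += w * w
        if acc >= fraction * total:
            break
    return kept

# With one dominant coefficient (as in Example 1) a single term survives.
print(select_basis([0.6267, 1.5613e-18, 3.0e-19]))  # -> [0]
```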
Fig. 2. The original function
A more demanding generalization experiment is the following. For every function value, a noise signal is randomly generated in the range [−0.1, 0.1] and added to the function. The noisy function is presented in Fig. 3.
Fig. 3. Random noise added to the function values, to be used as an input for the approximation algorithm
As in the previous case, only one expansion term was sufficient. Because of the random nature of the experiment, we ran it 5 times, averaging the obtained numbers. As the result, w_1 = 0.6283, σ_1 = 0.7039 and σ_2 = 0.7050 were calculated. These parameters are very close to the originals. The MSE between the original function and the approximation obtained from the noisy function was 1.42·10^{−5}, which seems to be a very good generalization result.

Example 2. In this example the function to be approximated is the following:

f(x_1, x_2, x_3) = x_1 e^{−x_1² − x_2²} sin(x_1 + x_2 + x_3).    (25)

Let us use again the range [−3, 3] for each variable. We take 21 points along each axis, obtaining in total 9261 tuples of arguments and function values to be processed. Along each axis the number of Hermite components was again set to 3, so every one-variable Hermite function could have index 0, 1 or 2. We obtained 3³ = 27 Hermite components. The squares of the expansion coefficients (weights), ordered in nonincreasing order, are plotted in Fig. 4. It is clear from this plot that 14 out of the 27 Hermite expansion terms are sufficient to approximate function (25). The MSE between the original function and the approximated function is on the level of 3.4·10^{−4}. If instead one takes only 10 out of 27, this still ensures 99% of the error reduction. When, similarly to the previous example, noise generated randomly from the range [−0.1, 0.1] was added and the noisy data were used for the function approximation, the difference (MSE) between the original function (25) and the approximant was again on a similar level, 3.6·10^{−4}. This too is a good sign of the generalization ability of this type of Hermite-based approximation.
Fig. 4. Squares of w_i (22) versus l, from the most significant to the least
5 Conclusions
We have presented a hybrid method of multivariable function approximation by a Hermite basis. The basis is composed of one-variable Hermite functions. The scaling parameters are determined via a search algorithm, while the expansion coefficients are calculated explicitly from appropriate formulae. Initially we take an excessive number of expansion terms and select only those which contribute the most to the error decrease. This procedure seems to be the reason for the very good generalization property of the method.
References

1. Beliczynski, B.: Properties of the Hermite activation functions in a neural approximation scheme. In: Beliczynski, B., Dzielinski, A., Iwanowski, M., Ribeiro, B. (eds.) ICANNGA 2007, Part II. LNCS, vol. 4432, pp. 46-54. Springer, Heidelberg (2007)
2. Hlawatsch, F.: Time-Frequency Analysis and Synthesis of Linear Signal Spaces. Kluwer Academic Publishers, Dordrecht (1998)
3. Ma, L., Khorasani, K.: Constructive feedforward neural networks using Hermite polynomial activation functions. IEEE Transactions on Neural Networks 16, 821-833 (2005)
4. Kwok, T., Yeung, D.: Constructive algorithms for structure learning in feedforward neural networks for regression problems. IEEE Transactions on Neural Networks 8(3), 630-645 (1997)
5. Kwok, T., Yeung, D.: Objective functions for training new hidden units in constructive neural networks. IEEE Transactions on Neural Networks 8(5), 1131-1148 (1997)
6. Kreyszig, E.: Introductory Functional Analysis with Applications. J. Wiley, Chichester (1978)
7. Beliczynski, B., Ribeiro, B.: Some enhancement to approximation of one-variable functions by orthonormal basis. Neural Network World 19, 401-412 (2009)
8. Lorentz, R.: Multivariate Hermite interpolation by algebraic polynomials: A survey. Journal of Computational and Applied Mathematics 122, 167-201 (2000)
9. Reed, R.: Pruning algorithms - a survey. IEEE Transactions on Neural Networks 4(5), 740-747 (1993)