Accepted for Publication by IEEE Transactions on Neural Networks.
Approximation Capability to Functions of Several Variables, Nonlinear Functionals and Operators by Radial Basis Function Neural Networks

Tianping Chen¹ and Hong Chen²
Abstract

The purpose of this paper is to explore the representation capability of radial basis function (RBF) neural networks. The main results are: (1) the necessary and sufficient condition for a function of one variable to be qualified as an activation function in an RBF network is that the function is not an even polynomial; (2) the capability of RBF networks to approximate nonlinear functionals and operators is established, using sample data either in the frequency domain or in the time domain, which can be used in system identification by neural networks.

Key words and phrases: approximation theory, neural networks, system identification, radial basis networks, functionals, operators, wavelets.
¹The author is with the Department of Mathematics, Fudan University, Shanghai, P.R. China. His work has been supported in part by the NSF of China, the Shanghai NSF, the "Climbing" Project in China (Grant NSC 92092), and the Doctoral Program Foundation of the Educational Commission in China.
²The author is with VLSI Libraries, Inc., 1836 Cabrillo Ave., Santa Clara, CA 95050, USA.
1 Introduction

There have been several recent works concerning the representation capabilities of multilayer feedforward neural networks. For example, in the past few years, several papers ([1]-[7] and many others) related to this topic have appeared. They all claim that a three-layered neural network with sigmoidal units in the hidden layer can approximate continuous (or other kinds of) functions defined on compact sets in $\mathbb{R}^n$. In many of those papers, the sigmoidal functions are assumed to be continuous or monotone. Recently [9] [10], we pointed out that the boundedness of the sigmoidal function plays the essential role in its being an activation function for the hidden layer; i.e., instead of continuity or monotonicity, the boundedness of the sigmoidal function ensures the network's approximation capability. In addition to sigmoidal functions, many other functions can be used as activation functions of neural networks. For example, in [12], we proved that the necessary and sufficient condition for a function $g \in C(\mathbb{R}^1) \cap S'(\mathbb{R}^1)$ to be an activation function in feedforward neural networks is that the function is not a polynomial (see also [5]).
The above papers are all significant advances towards solving the problem of whether a function is qualified as an activation function in neural networks. However, they only dealt with affine-basis-function (ABF) neural networks, also called multilayer perceptrons (MLP), where the goal is to approximate continuous functions by the family
$$\sum_{i=1}^{N} c_i\, g(y_i \cdot x + \theta_i)$$
where $y_i \in \mathbb{R}^n$, $c_i, \theta_i \in \mathbb{R}^1$, and $y_i \cdot x$ denotes the inner product of $y_i$ and $x$.
Among the various kinds of promising neural networks currently under active research, there is another type called radial-basis-function (RBF) networks [13] (also called localized receptive field networks [16]), in which the activation functions are radially symmetric and produce a localized response to the input stimulus. For a survey, see, for example, [15]. A block diagram of an RBF network is shown in Fig. 1. One of the most commonly used basis functions is the Gaussian kernel. Using the Gaussian basis function, RBF networks are capable of forming an arbitrarily close approximation to any continuous function, as shown in [17] [18] [19]. More generally, the goal here is to approximate functions of a finite number of real variables by
$$\sum_{i=1}^{N} c_i\, g(\lambda_i \|x - x_i\|_{\mathbb{R}^n})$$
where $x \in \mathbb{R}^n$ and $\|x - x_i\|_{\mathbb{R}^n}$ is the distance between $x$ and $x_i$ in $\mathbb{R}^n$. Here, the activation function $g$ is not necessarily Gaussian. In this direction, several results concerning RBF neural networks have been obtained [13] [14]. In [14], Park and Sandberg proved the following important theorem: Let $K: \mathbb{R}^n \to \mathbb{R}$ be a radially symmetric, integrable, bounded function such that $K$ is continuous almost everywhere and $\int_{\mathbb{R}^n} K(x)\,dx \neq 0$; then the family
$$\sum_{i=1}^{N} c_i\, g\!\left(\frac{\|x - x_i\|_{\mathbb{R}^n}}{\sigma_i}\right)$$
is dense in $L^p(\mathbb{R}^n)$, where $g(\|x\|_{\mathbb{R}^n}) = K(x)$. In [23], Park and Sandberg discussed several related results on $L^1$ and $L^2$ approximation. For example, they proved the following very interesting theorem:

Theorem A. Assume that $K: \mathbb{R}^n \to \mathbb{R}$ is a square-integrable function. Then the family
$$\sum_{i=1}^{N} c_i\, g\!\left(\frac{\|x - x_i\|_{\mathbb{R}^n}}{\sigma_i}\right)$$
is dense in $L^2(\mathbb{R}^n)$ if and only if $K$ is pointable.
In [24], we generalized this result and proved

Theorem B. Suppose that $g: \mathbb{R}^+ \to \mathbb{R}^1$ is such that $g(\|x\|_{\mathbb{R}^n}) \in L^2(\mathbb{R}^n)$. Then the family of functions
$$\sum_{i=1}^{N} c_i\, g\!\left(\frac{\|x - x_i\|_{\mathbb{R}^n}}{\sigma_i}\right)$$
is dense in $L^2(\mathbb{R}^n)$.
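To make the form of these families concrete, here is a minimal numerical sketch (our own illustration, not part of the paper) that evaluates $\sum_{i=1}^{N} c_i\, g(\|x - x_i\|_{\mathbb{R}^n}/\sigma_i)$ with a Gaussian $g$; all centers, widths, and coefficients below are arbitrary placeholders.

```python
import numpy as np

def rbf_net(x, centers, sigmas, coeffs, g=lambda t: np.exp(-t**2)):
    """Evaluate sum_i c_i * g(||x - x_i|| / sigma_i) at each row of x."""
    d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)  # (P, N)
    return g(d / sigmas) @ coeffs                                    # (P,)

rng = np.random.default_rng(0)
centers = rng.uniform(-1, 1, size=(10, 2))   # x_i in R^2 (placeholders)
sigmas = np.full(10, 0.5)                    # sigma_i
coeffs = rng.normal(size=10)                 # c_i
print(rbf_net(rng.uniform(-1, 1, (5, 2)), centers, sigmas, coeffs))
```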
It is now natural to ask the following questions: (1) What is the necessary and sufficient condition for a function to be qualified as an activation function in RBF neural networks? (2) How can nonlinear functionals be approximated by RBF neural networks? (3) How can nonlinear operators (e.g., the output of a system) be approximated by RBF neural networks, using sample data in the frequency (phase) domain or in the time (state) domain? The purpose of this paper is to give strong results answering these questions.
This paper is organized as follows. In Section 2, we list our symbols and notations and review some definitions. In Section 3, we show that the necessary and sufficient condition for a continuous function in $S'(\mathbb{R}^1)$ to be qualified as an activation function in RBF networks is that it is not an even polynomial. In Section 4, we show the capability of RBF neural networks to approximate nonlinear functionals and operators on some Banach space, as well as on some compact set $V$ in $C(K)$, where $K$ is a compact set in a Banach space. Furthermore, we establish the capability of neural networks to approximate nonlinear operators from $C(K_1)$ to $C(K_2)$. Approximations using samples in both the frequency domain and the time domain are discussed. Examples are given, which include the use of wavelet coefficients in the approximation. It is also pointed out that the main results in Section 4 can be used in computing the outputs of nonlinear dynamical systems, thus identifying the system. We conclude this paper with Section 5.
2 Notations and Definitions

We list here the main symbols and notations that will be used throughout this paper.

$X$: some Banach space with norm $\|\cdot\|_X$.

$\mathbb{R}^n$: Euclidean space of dimension $n$ with norm $\|\cdot\|_{\mathbb{R}^n}$.

$K$: some compact set in a Banach space.

$C(K)$: Banach space of all continuous functions defined on $K$, with norm $\|f\|_{C(K)} = \max_{x \in K} |f(x)|$.

$V$: some compact set in $C(K)$.

$S(\mathbb{R}^n)$: all Schwartz functions in distribution theory, i.e., all infinitely differentiable functions which are rapidly decreasing at infinity.

$S'(\mathbb{R}^n)$: all the distributions defined on $S(\mathbb{R}^n)$, i.e., all linear continuous functionals defined on $S(\mathbb{R}^n)$.

$C^\infty(\mathbb{R}^n)$: all infinitely differentiable functions defined on $\mathbb{R}^n$.

$C_c^\infty(\mathbb{R}^n)$: all infinitely differentiable functions with compact support in $\mathbb{R}^n$.

We review the following definitions.
Definition 1. A function $\sigma: \mathbb{R}^1 \to \mathbb{R}^1$ is called a (generalized) sigmoidal function if it satisfies
$$\lim_{x \to -\infty} \sigma(x) = 0, \qquad \lim_{x \to +\infty} \sigma(x) = 1. \tag{1}$$
Definition 2. Let $X$ be a Banach space with norm $\|\cdot\|_X$. If there are elements $x_n \in X$, $n = 1, 2, \ldots$, such that for every $x \in X$ there is a unique real number sequence $a_n(x)$ with
$$x = \sum_{n=1}^{\infty} a_n(x)\, x_n,$$
where the series converges in $X$, then $\{x_n\}_{n=1}^{\infty}$ is called a Schauder basis in $X$, and $X$ is called a Banach space with Schauder basis.
Definition 3. Suppose that $X$ is a Banach space. $V \subseteq X$ is called a compact set in $X$ if for every sequence $\{x_n\}_{n=1}^{\infty}$ with all $x_n \in V$, there is a subsequence $\{x_{n_k}\}$ which converges to some element $x \in V$. It is well known that if $V \subseteq X$ is a compact set in $X$, then for any $\epsilon > 0$ there is an $\epsilon$-net $N(\epsilon) = \{x_1, \ldots, x_{n(\epsilon)}\}$ with all $x_i \in V$, $i = 1, \ldots, n(\epsilon)$; i.e., for every $x \in V$ there is some $x_i \in N(\epsilon)$ such that $\|x_i - x\|_X < \epsilon$.
3 Characterization of Continuous Functions as Activation Functions in RBF Networks

In this section, we show that the necessary and sufficient condition for a continuous function to be qualified as an activation function in RBF networks is that it is not an even polynomial, and we prove two approximation theorems for RBF networks. More precisely, we prove
Theorem 1. Suppose that $g \in C(\mathbb{R}^1) \cap S'(\mathbb{R}^1)$, i.e., $g$ is one of those continuous functions such that $\int_{\mathbb{R}^1} g(x)s(x)\,dx$ makes sense for all $s \in S(\mathbb{R}^1)$. Then the family
$$\sum_{i=1}^{N} c_i\, g(\lambda_i \|x - y_i\|_{\mathbb{R}^n})$$
is dense in $C(K)$ if and only if $g$ is not an even polynomial, where $K$ is a compact set in $\mathbb{R}^n$, $y_i \in \mathbb{R}^n$, $c_i, \lambda_i \in \mathbb{R}^1$, $i = 1, \ldots, N$.
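As an informal numerical check of the density asserted by Theorem 1 (an illustration of ours, with all concrete choices assumed), one can fit a continuous target on a compact set by solving a linear least-squares problem for the coefficients $c_i$, with a non-even-polynomial activation such as the Gaussian:

```python
import numpy as np

# Fit f(x) = sin(3x) + |x| on K = [-1, 1] with sum_i c_i g(lambda_i |x - y_i|),
# using the (non-even-polynomial) Gaussian g(t) = exp(-t^2). Centers y_i and the
# shared scale lambda are fixed; only the c_i are obtained by least squares.
g = lambda t: np.exp(-t**2)

x = np.linspace(-1.0, 1.0, 400)   # dense grid on the compact set K
f = np.sin(3 * x) + np.abs(x)     # target continuous function

centers = np.linspace(-1.2, 1.2, 25)   # y_i (placeholders)
lam = 4.0                              # lambda_i = 4 for all i (assumption)

Phi = g(lam * np.abs(x[:, None] - centers[None, :]))   # design matrix
c, *_ = np.linalg.lstsq(Phi, f, rcond=None)            # coefficients c_i

print(f"uniform error on K with N={len(centers)} terms:",
      np.max(np.abs(Phi @ c - f)))
```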
Theorem 2. Suppose that $g \in C(\mathbb{R}^1) \cap S'(\mathbb{R}^1)$ and is not an even polynomial, $K$ is a compact set in $\mathbb{R}^n$, and $V$ is a compact set in $C(K)$. Then for any $\epsilon > 0$, there are a positive integer $N$, $\lambda_i \in \mathbb{R}^1$, $y_i \in \mathbb{R}^n$, $i = 1, \ldots, N$, which are all independent of $f \in V$, and constants $c_i(f)$ depending on $f$, $i = 1, \ldots, N$, such that
$$\Big| f(x) - \sum_{i=1}^{N} c_i(f)\, g(\lambda_i \|x - y_i\|_{\mathbb{R}^n}) \Big| < \epsilon \tag{2}$$
holds for all $x \in K$, $f \in V$. Moreover, every $c_i(f)$ is a continuous functional defined on $V$.
Remark 1. It is worth noting that the $\lambda_i$ and $y_i$ are all independent of $f \in V$ and that the $c_i(f)$ are continuous functionals. This fact will play an important role in the approximation of nonlinear operators by RBF networks.

To prove Theorem 1, we need the following lemma, which is of great significance in itself.
Lemma 1. Suppose that $h(x) \in C(\mathbb{R}^n) \cap S'(\mathbb{R}^n)$. Then the family
$$\sum_{i=1}^{N} c_i\, h(\lambda_i \rho_i(x) + y_i)$$
is dense in $C(K)$ if and only if $h$ is not a polynomial, where the $\rho_i$ are rotations in $\mathbb{R}^n$, $y_i \in \mathbb{R}^n$, $\lambda_i \in \mathbb{R}^1$, $i = 1, \ldots, N$.
Proof. Sufficiency. Suppose that the linear combinations $\sum_{i=1}^{N} c_i h(\lambda_i \rho_i(x) + y_i)$ are not dense in $C(K)$. Then, by the Hahn-Banach extension theorem for linear functionals and the Riesz representation theorem (see [6]), we conclude that there is a bounded signed Borel measure $d\mu$ with $\mathrm{supp}(d\mu) \subseteq K$ and
$$\int_K h(\lambda \rho(x) + y)\, d\mu(x) = 0 \tag{3}$$
for all $\lambda \in \mathbb{R}^1$, $y \in \mathbb{R}^n$ and all rotations $\rho$. Pick any $w \in S(\mathbb{R}^n)$; then
$$\int_{\mathbb{R}^n} w(y)\, dy \int_{\mathbb{R}^n} h(\lambda \rho(x) + y)\, d\mu(x) = 0. \tag{4}$$
Changing the order of integration, we have
$$\int_{\mathbb{R}^n} h(u)\, du \int_{\mathbb{R}^n} w(y)\, d\mu\!\left(\rho^{-1}\Big(\frac{u - y}{\lambda}\Big)\right) = 0, \tag{5}$$
which is equivalent to
$$\big\langle \hat{h}(t),\; \hat{w}(t)\, \widehat{d\mu}(-\lambda \rho(t)) \big\rangle = 0 \tag{6}$$
for all $\lambda \in \mathbb{R}^1$, $\lambda \neq 0$, and all rotations $\rho$, where $\hat{h}$ represents the Fourier transform of $h$ in the sense of distributions.

In order for (6) to make sense, we have to show that $\hat{w}(t)\, \widehat{d\mu}(-\lambda \rho(t)) \in S(\mathbb{R}^n)$. In fact, $\hat{w}(t) \in S(\mathbb{R}^n)$. Moreover, since $\mathrm{supp}(d\mu) \subseteq K$, it is easy to verify that $\widehat{d\mu}(t) = \int e^{-i t \cdot x}\, d\mu(x) \in C^\infty(\mathbb{R}^n)$ and that there are constants $c_k$, $k = 1, 2, \ldots$, such that
$$|\partial^k \widehat{d\mu}(t)| \le c_k. \tag{7}$$
Thus, $\hat{w}(t)\, \widehat{d\mu}(t) \in S(\mathbb{R}^n)$.

Since $d\mu \neq 0$ and $\widehat{d\mu}(t) \in C^\infty(\mathbb{R}^n)$, there are $t_0 \in \mathbb{R}^n$, $t_0 \neq 0$, and a neighborhood $O(t_0, \delta) = \{x : \|x - t_0\|_{\mathbb{R}^n} < \delta\}$ such that $|\widehat{d\mu}(t)| > c > 0$ for all $t \in O(t_0, \delta)$. Pick $t_1 \in \mathbb{R}^n$, $t_1 \neq 0$, arbitrarily. Let $t_0 = \lambda \rho(t_1)$, where $\rho$ is a rotation in $\mathbb{R}^n$. Then $|\widehat{d\mu}(\lambda \rho(t))| > c$ for all $t \in O(t_1, \delta/\lambda)$.

Let $\hat{w} \in C_c^\infty(O(t_1, \delta/\lambda))$. Then $\hat{w}(t)/\widehat{d\mu}(\lambda \rho(t)) \in C_c^\infty(O(t_1, \delta/\lambda))$, because $|\widehat{d\mu}(\lambda \rho(t))| > c$ and $\widehat{d\mu}(\lambda \rho(t)) \in C^\infty(\mathbb{R}^n)$. Therefore,
$$\big\langle \hat{h}(t), \hat{w}(t) \big\rangle = \Big\langle \hat{h}(t),\; \frac{\hat{w}(t)}{\widehat{d\mu}(\lambda \rho(t))}\, \widehat{d\mu}(\lambda \rho(t)) \Big\rangle = 0. \tag{8}$$

The previous argument shows that for any $t \in \mathbb{R}^n$, $t \neq 0$, there is a neighborhood $O(t, \delta)$ such that
$$\big\langle \hat{h}(t), \hat{w}(t) \big\rangle = 0 \tag{9}$$
holds for all $\hat{w}$ with $\mathrm{supp}(\hat{w}) \subseteq O(t, \delta)$, which means that $\mathrm{supp}(\hat{h}) \subseteq \{0\}$. It is well known that a distribution is the Fourier transform of a polynomial if and only if its support is a subset of $\{0\}$ (see [25, Section 7.16]). Thus $h$ is a polynomial.

Necessity. If $h$ is a polynomial of degree $m$, then all the functions
$$\sum_{i=1}^{N} c_i\, h(\lambda_i \rho_i(x) + y_i)$$
are polynomials in $x_1, \ldots, x_n$ of total degree at most $m$, which, of course, are not dense in $C(K)$. Lemma 1 is proved.
Proof of Theorem 1. Suppose that $g \in C(\mathbb{R}^1) \cap S'(\mathbb{R}^1)$. Then $h(x) = g(\|x\|_{\mathbb{R}^n}) \in S'(\mathbb{R}^n) \cap C(\mathbb{R}^n)$ and
$$\sum_{i=1}^{N} c_i\, g(\lambda_i \|x - y_i\|_{\mathbb{R}^n}) = \sum_{i=1}^{N} c_i\, g(\lambda_i \|\rho(x) - \rho(y_i)\|_{\mathbb{R}^n}) = \sum_{i=1}^{N} c_i\, g(\lambda_i \|\rho(x) - z_i\|_{\mathbb{R}^n}) \tag{10}$$
where $z_i = \rho(y_i)$.

From Lemma 1, we see that the family $\sum_{i=1}^{N} c_i\, g(\lambda_i \|x - y_i\|_{\mathbb{R}^n})$ is dense in $C(K)$ if and only if $g(\|x\|_{\mathbb{R}^n})$ is not a polynomial in $\mathbb{R}^n$, which is equivalent to $g$ not being an even polynomial. Theorem 1 is proved.
Lemma 2 [20]. Let $V \subseteq C(K)$. Then $V$ is a compact set in $C(K)$ if and only if (i) $V$ is a closed set in $C(K)$; (ii) there is a constant $M$ such that $\|f\|_{C(K)} \le M$ for all $f \in V$; (iii) for any $\epsilon > 0$, there is $\delta > 0$ such that $|f(x') - f(x'')| < \epsilon$ for all $f \in V$, provided that $x', x'' \in K$ and $\|x' - x''\|_X < \delta$.
We are now ready to prove Theorem 2.
Proof of Theorem 2. Let $h(x) = e^{-\|x\|^2_{\mathbb{R}^n}}$, $h_\delta(x) = \delta^{-n} h(x/\delta)$, and
$$(f * h_\delta)(x) = \int_K f(t)\, h_\delta(x - t)\, dt. \tag{11}$$
It is easy to prove that for any $\epsilon > 0$ there is $\delta > 0$ such that
$$|(f * h_\delta)(x) - f(x)| < \epsilon/3 \tag{12}$$
holds for all $x \in K$ and $f \in V$. Writing $(f * h_\delta)(x)$ approximately as a Riemann sum, we see that there are a positive integer $N$, points $x_j \in \mathbb{R}^n$, and constants $c_j(f)$ depending continuously on $f$, $j = 1, \ldots, N$, such that
$$\Big| \sum_{j=1}^{N} c_j(f)\, e^{-\frac{1}{\delta^2}\|x - x_j\|^2_{\mathbb{R}^n}} - (f * h_\delta)(x) \Big| < \epsilon/3 \tag{13}$$
for all $x \in K$ and $f \in V$.
Now, for every $j = 1, \ldots, N$, by Theorem 1 we can find another integer $N_j$, constants $d_{ij}$, $\lambda_i$, and $y_i^{(j)} \in \mathbb{R}^n$, $i = 1, \ldots, N_j$, such that
$$\Big| \sum_{i=1}^{N_j} d_{ij}\, g(\lambda_i \|x - y_i^{(j)}\|_{\mathbb{R}^n}) - e^{-\frac{1}{\delta^2}\|x - x_j\|^2_{\mathbb{R}^n}} \Big| < \frac{\epsilon}{3L} \tag{14}$$
for all $x \in K$, where $L = \sup_{f \in V} \sum_{j=1}^{N} |c_j(f)|$.

Combining inequalities (12), (13) and (14), we conclude that there are an integer $N$, vectors $y_i \in \mathbb{R}^n$, and $c_i(f), \lambda_i \in \mathbb{R}^1$, $i = 1, \ldots, N$, such that
$$\Big| f(x) - \sum_{i=1}^{N} c_i(f)\, g(\lambda_i \|x - y_i\|_{\mathbb{R}^n}) \Big| < \epsilon \tag{15}$$
for all $x \in K$ and $f \in V$. Moreover, for every $i$, $c_i(f)$ is a continuous functional defined on $V$. Theorem 2 is proved.
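The content of Theorem 2 is that one fixed set of centers and scales serves the whole family, with only the coefficients depending (continuously) on $f$. The following sketch (ours; the family $V$ and all parameters are assumed for demonstration) illustrates this: since the coefficients are obtained by least squares against a fixed dictionary, the map $f \mapsto c(f)$ is linear, hence continuous.

```python
import numpy as np

# One fixed dictionary g(lambda |x - y_i|) serves a whole family of functions;
# only the coefficients c_i(f) change with f, and they depend linearly on f.
g = lambda t: np.exp(-t**2)
x = np.linspace(0.0, 1.0, 300)                 # compact set K = [0, 1]
centers = np.linspace(-0.1, 1.1, 20)           # y_i, independent of f
lam = 6.0                                      # lambda_i, independent of f
Phi = g(lam * np.abs(x[:, None] - centers[None, :]))

# A small sample from a compact family V = {sin(a*pi*x) : a in [1, 2]}.
family = [lambda x, a=a: np.sin(a * np.pi * x) for a in (1.0, 1.5, 2.0)]

for f in family:
    c, *_ = np.linalg.lstsq(Phi, f(x), rcond=None)   # c_i(f): linear in f
    print(f"max error: {np.max(np.abs(Phi @ c - f(x))):.2e}")
```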
4 Approximation to Nonlinear Functionals and Operators by RBF Neural Networks

In this section, we present some results concerning the capability of RBF neural networks to approximate nonlinear functionals and operators defined on some Banach space, which can be used to approximate the outputs of dynamical systems using sample data in either the frequency (phase) domain or the time (state) domain. We first introduce one of our results.

Theorem 3. Suppose that $g \in C(\mathbb{R}^1) \cap S'(\mathbb{R}^1)$ and is not an even polynomial, $X$ is a Banach space with Schauder basis $\{x_n\}_{n=1}^{\infty}$, $K \subseteq X$ is a compact set in $X$, and $f$ is a continuous functional defined on $K$. Then for any $\epsilon > 0$, there are positive integers $M$, $N$, vectors $y_i^M \in \mathbb{R}^M$, and constants $c_i, \lambda_i \neq 0$, $i = 1, \ldots, N$, such that
$$\Big| f(x) - \sum_{i=1}^{N} c_i\, g(\lambda_i \|x^M - y_i^M\|_{\mathbb{R}^M}) \Big| < \epsilon \tag{16}$$
holds for all $x \in K$, where $x^M = (a_1(x), \ldots, a_M(x)) \in \mathbb{R}^M$ and $x = \sum_{n=1}^{\infty} a_n(x)\, x_n$.
To prove Theorem 3, we need the following two lemmas.
Lemma 3 [21]. Suppose that $X$ is a Banach space with Schauder basis $\{x_n\}_{n=1}^{\infty}$. Then $K \subseteq X$ is a compact set in $X$ if and only if the following two conditions are satisfied simultaneously: (1) $K$ is a closed set in $X$; (2) for any $\epsilon > 0$, there is a positive integer $M$ such that
$$\Big\| \sum_{n=M+1}^{\infty} a_n(x)\, x_n \Big\|_X < \epsilon$$
holds for all $x \in K$.

Lemma 4. Suppose that $K$ is a compact set in a Banach space $X$ with Schauder basis $\{x_n\}_{n=1}^{\infty}$. Define $K^n = \{\sum_{i=1}^{n} a_i(x)\, x_i : x \in K\}$ and $\bar{K} = K \cup (\bigcup_{n=1}^{\infty} K^n)$. Then each $K^n$ is a compact set in $X_n$ (the span of $x_1, \ldots, x_n$), and $\bar{K}$ is a compact set in $X$.
Proof. It is easy to verify that $K^n$ is a compact set in $X_n$ (also in $X$), provided that $K$ is a compact set in $X$.

Now, suppose $\{u_n\}_{n=1}^{\infty}$ is a sequence in $\bar{K}$. Then one of the following two cases occurs: (i) there is a subsequence $\{u_{n_k}\}_{k=1}^{\infty}$ of $\{u_n\}_{n=1}^{\infty}$ with all elements in $K$ or in some fixed $K^n$; (ii) there is no such subsequence.

In case (i), there is obviously another subsequence of $\{u_{n_k}\}_{k=1}^{\infty}$ which converges to some element $u$ in $K$ or in $K^n$, because $K$ and $K^n$ are compact sets in $X$. In case (ii), there are a sequence $v_n = \sum_{i=1}^{\infty} a_i(v_n)\, x_i$ in $K$ and integers $M_n$ tending to infinity as $n \to \infty$, such that $u_n = \sum_{i=1}^{M_n} a_i(v_n)\, x_i$. By taking a suitable subsequence, without loss of generality we can assume that $v_n$ converges to some $v \in K$ as $n \to \infty$. By Lemma 3, $u_n - v_n \to 0$ as $n \to \infty$; thus $u_n$ converges to $v$. Combining the two cases, we conclude that $\bar{K}$ is a compact set in $X$. Lemma 4 is proved.
Proof of Theorem 3. By the Tietze extension theorem, we can extend $f$ to a continuous functional $\bar{f}$ defined on $\bar{K}$. Since $\bar{f}$ is a continuous functional defined on the compact set $\bar{K}$, for any $\epsilon > 0$ there is a $\delta > 0$ such that $|\bar{f}(x') - \bar{f}(x'')| < \epsilon/2$ provided that $x', x'' \in \bar{K}$ and $\|x' - x''\|_X < \delta$. By Lemma 3, there is an integer $M$ such that
$$\|x - x^M\|_X < \delta$$
for all $x \in K$, where $x^M = \sum_{n=1}^{M} a_n(x)\, x_n$. Therefore,
$$|\bar{f}(x) - \bar{f}(x^M)| < \epsilon/2 \tag{17}$$
for all $x \in K$.

$K^M$ is homeomorphic to some compact set $K_{\mathbb{R}^M}$ in $\mathbb{R}^M$ by the map
$$\Big\{ \sum_{n=1}^{M} a_n(x)\, x_n : x \in K \Big\} \to \{x^M = (a_1(x), \ldots, a_M(x)) \in \mathbb{R}^M : x \in K\}.$$
Now, $\bar{f}$ is a continuous functional defined on $K_{\mathbb{R}^M}$. Applying Theorem 1, we can find an integer $N$, $y_i^M \in \mathbb{R}^M$, and $c_i, \lambda_i \in \mathbb{R}^1$, $i = 1, \ldots, N$, such that
$$\Big| \bar{f}(x^M) - \sum_{i=1}^{N} c_i\, g(\lambda_i \|x^M - y_i^M\|_{\mathbb{R}^M}) \Big| < \epsilon/2 \tag{18}$$
for all $x^M \in K^M$. Combining (17) and (18), we conclude that
$$\Big| f(x) - \sum_{i=1}^{N} c_i\, g(\lambda_i \|x^M - y_i^M\|_{\mathbb{R}^M}) \Big| < \epsilon \tag{19}$$
for all $x \in K$. Theorem 3 is proved.

To illustrate the applications of Theorem 3, we now give some examples.
Example 1. Let $H$ be a Hilbert space and $K \subseteq H$ a compact set in $H$. As we showed in [22], $K$ can be considered as a compact set in a Hilbert space $H_1$ with countable basis $\{x_n\}_{n=1}^{\infty}$. Thus, Theorem 3 can be applied, and we conclude that every continuous functional on $K$ can be arbitrarily well approximated by the sum
$$\sum_{i=1}^{N} c_i\, g(\lambda_i \|x^M - y_i^M\|_{\mathbb{R}^M}).$$

Example 2. Let $X = L^2[0, 2\pi]$. Then $\{1, \{\cos nx\}_{n=1}^{\infty}, \{\sin nx\}_{n=1}^{\infty}\}$ is a Schauder basis, and the $a_n(x)$ are just the corresponding Fourier coefficients. Thus, we can approximate every nonlinear functional defined on some compact set in $L^2[0, 2\pi]$ by RBF neural networks using sample data in the frequency domain.
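A sketch of how Example 2 might be used in practice (our own illustration; the functional, the training family, and all parameters are assumptions): each signal is reduced to its first $M$ Fourier coefficients, and an RBF network on $\mathbb{R}^M$ is fit to the functional values.

```python
import numpy as np

# Represent u in L^2[0, 2pi] by Fourier coefficients (a_0, a_1..a_m, b_1..b_m),
# then approximate a nonlinear functional F(u) by an RBF net on those features.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2 * np.pi, 512, endpoint=False)
m = 4                                           # harmonics kept; M = 2m + 1

def fourier_coeffs(u):
    a0 = np.mean(u)
    a = [2 * np.mean(u * np.cos(k * t)) for k in range(1, m + 1)]
    b = [2 * np.mean(u * np.sin(k * t)) for k in range(1, m + 1)]
    return np.array([a0, *a, *b])

def F(u):                                       # hypothetical target functional
    return np.sum(u**3) * (t[1] - t[0])         # ~ integral of u^3

# Training family: random trigonometric polynomials (samples from a compact set).
signals = [sum(rng.normal(0, 1 / (k + 1)) * np.cos(k * t) +
               rng.normal(0, 1 / (k + 1)) * np.sin(k * t) for k in range(m + 1))
           for _ in range(200)]
A = np.array([fourier_coeffs(u) for u in signals])  # inputs x^M
y = np.array([F(u) for u in signals])

centers = A[:50]                                # RBF centers y_i^M in R^M
Phi = np.exp(-np.linalg.norm(A[:, None] - centers[None], axis=2)**2)
c, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("train RMS error:", np.sqrt(np.mean((Phi @ c - y)**2)))
```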
Example 3. Let $X = L^2(\mathbb{R}^1)$ and let $\{\psi_{j,k}\}_{j,k=1}^{\infty}$ be wavelets in $L^2(\mathbb{R}^1)$. Then we can approximate continuous functionals defined on a compact set in $L^2(\mathbb{R}^1)$ by RBF neural networks using wavelet coefficients as sample data (also in the frequency domain).
Remark 2. Since in Theorem 3 we only require that $\{x_n\}_{n=1}^{\infty}$ be a Schauder basis (no orthogonality requirement is imposed), we do not require that $\{\psi_{j,k}\}_{j,k=1}^{\infty}$ be orthogonal wavelets. This is a significant advantage, for non-orthogonal wavelets are much easier to construct than orthogonal wavelets.

The following theorem shows the possibility of approximating functionals by RBF neural networks using sample data in the time (or state) domain.
Theorem 4. Suppose that $g \in C(\mathbb{R}^1) \cap S'(\mathbb{R}^1)$ is not an even polynomial, $X$ is a Banach space, $K \subseteq X$ is a compact set, $V$ is a compact set in $C(K)$, and $f$ is a continuous functional defined on $V$. Then for any $\epsilon > 0$, there are positive integers $N$, $M$, points $x_1, \ldots, x_M \in K$, and $\lambda_i, c_i \in \mathbb{R}^1$, $\xi_i = (\xi_{i1}, \ldots, \xi_{iM}) \in \mathbb{R}^M$, $i = 1, \ldots, N$, such that
$$\Big| f(u) - \sum_{i=1}^{N} c_i\, g(\lambda_i \|u_M - \xi_i\|_{\mathbb{R}^M}) \Big| < \epsilon \tag{20}$$
for all $u \in V$, where $u_M = (u(x_1), \ldots, u(x_M)) \in \mathbb{R}^M$.
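The following sketch (ours, with a hypothetical functional and family $V$) illustrates Theorem 4: each $u$ is reduced to the sample vector $u_M = (u(x_1), \ldots, u(x_M))$ at fixed points, and an RBF network on $\mathbb{R}^M$ approximates $f$.

```python
import numpy as np

# Approximate a functional f on V from time-domain samples only: each u in V is
# reduced to u_M = (u(x_1), ..., u(x_M)) at fixed sensor points in K = [0, 1].
rng = np.random.default_rng(1)
xs = np.linspace(0.0, 1.0, 10)                 # fixed sample points x_1..x_M

def f(u):                                      # hypothetical nonlinear functional
    return np.max(u(xs)) - np.min(u(xs))       # oscillation over the samples

V = [lambda x, a=a, b=b: a * np.sin(2 * np.pi * x) + b * x
     for a, b in rng.uniform(-1, 1, size=(150, 2))]

U = np.array([u(xs) for u in V])               # u_M vectors, shape (150, M)
y = np.array([f(u) for u in V])

xi = U[:40]                                    # centers xi_i in R^M
lam = 0.5                                      # shared scale (assumption)
Phi = np.exp(-(lam * np.linalg.norm(U[:, None] - xi[None], axis=2))**2)
c, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("RMS error over V:", np.sqrt(np.mean((Phi @ c - y)**2)))
```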
Proof of Theorem 4. We follow a line similar to that used in [12]. Pick a sequence $\epsilon_1 > \epsilon_2 > \cdots > \epsilon_n \to 0$; then we can find another sequence $\eta_1 > \eta_2 > \cdots > \eta_n \to 0$ such that $|f(u) - f(v)| < \epsilon_k$ whenever $u, v \in V$ and $\|u - v\|_{C(K)} < \eta_k$. Moreover, we can find $\delta_1 > \delta_2 > \cdots > \delta_n \to 0$ such that $|u(x') - u(x'')| < \eta_k$ for all $u \in V$, whenever $x', x'' \in K$ and $\|x' - x''\|_X < \delta_k$.

By induction and rearrangement, we can find a sequence $\{x_i\}_{i=1}^{\infty}$ with each $x_i \in K$, and a sequence of positive integers $n(\delta_1) < n(\delta_2) < \cdots < n(\delta_k) \to \infty$, such that the first $n(\delta_k)$ elements $N(\delta_k) = \{x_1, \ldots, x_{n(\delta_k)}\}$ form a $\delta_k$-net in $K$.
For each $\delta_k$-net, define the functions
$$T_{\delta_k, j}(x) = \begin{cases} 1 - \dfrac{\|x - x_j\|_X}{\delta_k} & \text{if } \|x - x_j\|_X < \delta_k \\ 0 & \text{otherwise} \end{cases} \tag{21}$$
and
$$\tilde{T}_{\delta_k, j}(x) = \frac{T_{\delta_k, j}(x)}{\sum_{j=1}^{n(\delta_k)} T_{\delta_k, j}(x)} \tag{22}$$
for $j = 1, \ldots, n(\delta_k)$. It is easy to verify that $\{\tilde{T}_{\delta_k, j}(x)\}$ is a partition of unity, i.e., for $x \in K$,
$$0 \le \tilde{T}_{\delta_k, j}(x) \le 1, \qquad \sum_{j=1}^{n(\delta_k)} \tilde{T}_{\delta_k, j}(x) \equiv 1, \qquad \tilde{T}_{\delta_k, j}(x) = 0 \text{ if } \|x - x_j\|_X > \delta_k. \tag{23}$$
For each $u \in V$, define a function
$$u_k(x) = \sum_{j=1}^{n(\delta_k)} u(x_j)\, \tilde{T}_{\delta_k, j}(x). \tag{24}$$
Moreover, let $V_k = \{u_k : u \in V\}$ and $\bar{V} = V \cup (\bigcup_{k=1}^{\infty} V_k)$. We then have the following conclusions:
1. For any fixed $k$, $V_k$ is a compact set in a subspace of dimension $n(\delta_k)$ in $C(K)$.

2. For every $u \in V$, there holds
$$\|u - u_k\|_{C(K)} \le \eta_k. \tag{25}$$

3. $\bar{V}$ is a compact set in $C(K)$.

(For a proof of these three propositions, see Lemma 7 of [12] or the Appendix of this paper.)

Now, similar to the proof of Theorem 3, we can extend $f$ to a continuous functional $\bar{f}$ on $\bar{V}$, such that
$$\bar{f}(u) = f(u) \quad \text{if } u \in V. \tag{26}$$
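The construction (21)-(24) and conclusion 2 are concrete enough to compute. The following sketch (our illustration, on the assumed set $K = [0,1]$) builds the normalized bumps of (22) over a $\delta$-net and reconstructs $u$ from its net values as in (24); the uniform error behaves like the modulus of continuity of $u$ at scale $\delta$.

```python
import numpy as np

delta = 0.1
net = np.linspace(0.0, 1.0, 11)                  # delta-net x_1..x_{n(delta)}

def T(x, xj):                                    # (21): triangular bump at xj
    return np.maximum(0.0, 1.0 - np.abs(x - xj) / delta)

def u_k(u_vals, x):                              # (24): sum_j u(x_j) T-tilde_j(x)
    bumps = np.array([T(x, xj) for xj in net])   # shape (n, len(x))
    tilde = bumps / bumps.sum(axis=0)            # (22): normalize to sum to one
    return u_vals @ tilde

x = np.linspace(0.0, 1.0, 500)
u = lambda s: np.sin(3 * s) + s**2               # a sample u in V (assumption)
print("sup |u - u_k|:", np.max(np.abs(u(x) - u_k(u(net), x))))
```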
Now, for any $\epsilon > 0$, we can find an $\eta > 0$ such that $|\bar{f}(u) - \bar{f}(v)| < \epsilon/2$ provided that $u, v \in \bar{V}$ and $\|u - v\|_{C(K)} < \eta$. Let $k$ be fixed such that $\eta_k < \eta$. Then by (25), for every $u \in V$,
$$\|u - u_k\|_{C(K)} \le \eta_k, \tag{27}$$
which implies
$$|\bar{f}(u) - \bar{f}(u_k)| < \epsilon/2 \tag{28}$$
for all $u \in V$.

By the argument used in the proof of Theorem 3, letting $M = n(\delta_k)$, there are an integer $N$, $\lambda_i, c_i \in \mathbb{R}^1$, $\xi_i^M = (\xi_{i1}^M, \ldots, \xi_{iM}^M) \in \mathbb{R}^M$, $i = 1, \ldots, N$, and $M$ points $x_1, \ldots, x_M \in K$ such that
$$\Big| \bar{f}(u_k) - \sum_{i=1}^{N} c_i\, g(\lambda_i \|u_M - \xi_i^M\|_{\mathbb{R}^M}) \Big| < \epsilon/2, \tag{29}$$
which, combined with (28), leads to
$$\Big| f(u) - \sum_{i=1}^{N} c_i\, g(\lambda_i \|u_M - \xi_i^M\|_{\mathbb{R}^M}) \Big| < \epsilon \tag{30}$$
for all $u \in V$. This completes the proof of Theorem 4.

To conclude this section, we construct an RBF neural network to approximate nonlinear operators. More precisely, we prove
Theorem 5. Suppose that $g \in C(\mathbb{R}^1) \cap S'(\mathbb{R}^1)$ and is not an even polynomial, $X$ is a Banach space, $K_1 \subseteq X$ and $K_2 \subseteq \mathbb{R}^n$ are two compact sets in $X$ and $\mathbb{R}^n$ respectively, $V$ is a compact set in $C(K_1)$, and $G$ is a nonlinear continuous operator which maps $V$ into $C(K_2)$. Then for any $\epsilon > 0$, there are positive integers $M, N, m$, constants $c_{ki}, \zeta_k, \lambda_i \in \mathbb{R}^1$, $k = 1, \ldots, N$, $i = 1, \ldots, M$, $m$ points $x_1, \ldots, x_m \in K_1$, and $\omega_1, \ldots, \omega_N \in \mathbb{R}^n$, such that
$$\Big| G(u)(y) - \sum_{k=1}^{N} \sum_{i=1}^{M} c_{ki}\, g(\lambda_i \|u_m - \xi_{ik}^m\|_{\mathbb{R}^m})\, g(\zeta_k \|y - \omega_k\|_{\mathbb{R}^n}) \Big| < \epsilon \tag{31}$$
for all $u \in V$ and $y \in K_2$, where $u_m = (u(x_1), \ldots, u(x_m))$ and $\xi_{ik}^m \in \mathbb{R}^m$, $k = 1, \ldots, N$, $i = 1, \ldots, M$.
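The network in (31) couples one RBF factor reading the sampled input $u_m$ with one reading the query point $y$ through coefficients $c_{ki}$. It can be sketched numerically as follows (our illustration: the operator $G$, the grids and all parameters are assumptions, and the coefficients are fit by two linear least-squares steps rather than by the constructive proof):

```python
import numpy as np

# Model G(u)(y) by sum_k sum_i c_ki g(lam ||u_m - xi_ik||) g(zeta |y - w_k|).
# Toy operator: G(u)(y) = antiderivative of u at y, on K_1 = K_2 = [0, 1].
rng = np.random.default_rng(2)
g = lambda t: np.exp(-t**2)

xs = np.linspace(0.0, 1.0, 15)        # sample points x_1..x_m (m = 15)
ys = xs                                # query points y (same grid here)

def sample_and_target(a, b):
    u = a * np.sin(2 * np.pi * xs) + b                       # u_m values
    Gu = np.concatenate(([0.0],
         np.cumsum((u[1:] + u[:-1]) / 2) * (xs[1] - xs[0]))) # trapezoid integral
    return u, Gu

train = [sample_and_target(a, b) for a, b in rng.uniform(-1, 1, size=(120, 2))]
U = np.array([u for u, _ in train])    # (120, m)
T = np.array([Gu for _, Gu in train])  # (120, len(ys)): targets G(u)(y_j)

xi = U[:10]                            # input-side centers xi_ik (M = 10)
wk = np.linspace(0.0, 1.0, 8)          # output-side centers w_k (N = 8)
B = g(0.5 * np.linalg.norm(U[:, None] - xi[None], axis=2))   # (120, M)
R = g(3.0 * np.abs(ys[:, None] - wk[None, :]))               # (len(ys), N)

Z, *_ = np.linalg.lstsq(B, T, rcond=None)      # T ~ B @ Z, Z is (M, len(ys))
C, *_ = np.linalg.lstsq(R, Z.T, rcond=None)    # Z.T ~ R @ C, C is (N, M): c_ki
pred = B @ C.T @ R.T                           # network output at all (u, y)
print("RMS error:", np.sqrt(np.mean((pred - T)**2)))
```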
Proof of Theorem 5. Since $G$ is a continuous operator which maps the compact set $V$ in $C(K_1)$ into $C(K_2)$, the range $G(V) = \{G(u) : u \in V\}$ is also a compact set in $C(K_2)$. By Theorem 2, for any $\epsilon > 0$, there are a positive integer $N$, $\omega_k \in \mathbb{R}^n$, $\zeta_k \in \mathbb{R}^1$, and $c_k(G(u)) \in \mathbb{R}$, $k = 1, \ldots, N$, such that
$$\Big| G(u)(y) - \sum_{k=1}^{N} c_k(G(u))\, g(\zeta_k \|y - \omega_k\|_{\mathbb{R}^n}) \Big| < \epsilon/2 \tag{32}$$
for all $u \in V$ and $y \in K_2$, where every $c_k(G(u))$, $k = 1, \ldots, N$, is a continuous functional defined on $V$. Carefully examining the proof of Theorem 4, we can find a common $m$, integers $N_k$, constants $c_{ki}$, $\lambda_i$, $\xi_{ik}^m \in \mathbb{R}^m$, $i = 1, \ldots, N_k$, and $m$ points $x_1, \ldots, x_m \in K_1$, with $u_m = (u(x_1), \ldots, u(x_m)) \in \mathbb{R}^m$, such that
$$\Big| c_k(G(u)) - \sum_{i=1}^{N_k} c_{ki}\, g(\lambda_i \|u_m - \xi_{ik}^m\|_{\mathbb{R}^m}) \Big| < \frac{\epsilon}{2L} \tag{33}$$
holds for all $k = 1, \ldots, N$ and $u \in V$, where
$$L = \sum_{k=1}^{N} \sup_{y \in K_2} |g(\zeta_k \|y - \omega_k\|_{\mathbb{R}^n})|. \tag{34}$$
Substituting (33) into (32), we obtain
$$\Big| G(u)(y) - \sum_{k=1}^{N} \sum_{i=1}^{N_k} c_{ki}\, g(\lambda_i \|u_m - \xi_{ik}^m\|_{\mathbb{R}^m})\, g(\zeta_k \|y - \omega_k\|_{\mathbb{R}^n}) \Big| < \epsilon \tag{35}$$
for all $u \in V$ and $y \in K_2$.

Let $M = \max_k \{N_k\}$ and let $c_{ki} = 0$ for $N_k < i \le M$. Thus (35) can be rewritten as
$$\Big| G(u)(y) - \sum_{k=1}^{N} \sum_{i=1}^{M} c_{ki}\, g(\lambda_i \|u_m - \xi_{ik}^m\|_{\mathbb{R}^m})\, g(\zeta_k \|y - \omega_k\|_{\mathbb{R}^n}) \Big| < \epsilon$$
for all $u \in V$ and $y \in K_2$. Theorem 5 is proved.
Remark 3. Theorem 5 shows the capability of RBF networks to approximate nonlinear operators using sample data in the time (or state) domain. Likewise, by Theorem 3, we can construct RBF networks using sample data in the frequency (or phase) domain.
Remark 4. We can also construct neural networks in which affine basis functions are mixed with radial basis functions. For example, we can approximate $G(u)(y)$ by
$$\sum_{k=1}^{N} \sum_{i=1}^{M} c_{ki}\, g(\lambda_i \|u_m - \xi_{ik}^m\|_{\mathbb{R}^m})\, g(\omega_k \cdot y + \zeta_k).$$
The details are omitted here.
Remark 5. In engineering, a system can be viewed as an operator (linear or nonlinear). Theorem 5 thus shows the capability of RBF neural networks in identifying systems, just as Cybenko's theorem shows the capability of ABF neural networks in pattern recognition.
5 Conclusion

In this paper, the problem of approximating functions of several variables, nonlinear functionals and nonlinear operators by radial basis function neural networks is studied. The necessary and sufficient condition for a continuous function to be qualified as an activation function in RBF networks is given. Results on using RBF neural networks to compute the output of dynamical systems from sample data in the frequency domain or the time domain are also given.
Acknowledgements. The authors wish to thank Prof. R.-W. Liu of the University of Notre Dame and Prof. I.W. Sandberg of the University of Texas at Austin for bringing some of the papers in this area to their attention.
References

[1] A. Wieland and R. Leighton, "Geometric Analysis of Neural Network Capacity," in Proc. IEEE First ICNN, Vol. 1, pp. 385-392 (1987).
[2] B. Irie and S. Miyake, "Capacity of Three-layered Perceptrons," in Proc. IEEE ICNN, Vol. 1, pp. 641-648 (1988).
[3] S.M. Carroll and B.W. Dickinson, "Construction of Neural Nets Using the Radon Transform," in Proc. IJCNN, Vol. I, pp. 607-611 (1989).
[4] K. Funahashi, "On the Approximate Realization of Continuous Mappings by Neural Networks," Neural Networks, Vol. 2, pp. 183-192 (1989).
[5] H.N. Mhaskar and C.A. Micchelli, "Approximation by Superposition of Sigmoidal and Radial Functions," Advances in Applied Mathematics, Vol. 13, pp. 350-373 (1992).
[6] G. Cybenko, "Approximation by Superpositions of a Sigmoidal Function," Mathematics of Control, Signals and Systems, Vol. 2, No. 4, pp. 303-314 (1989).
[7] Y. Ito, "Representation of Functions by Superpositions of a Step or Sigmoidal Function and Their Applications to Neural Network Theory," Neural Networks, Vol. 4, pp. 385-394 (1991).
[8] K. Hornik, "Approximation Capabilities of Multilayer Feedforward Networks," Neural Networks, Vol. 4, pp. 251-257 (1991).
[9] T. Chen, H. Chen and R.-W. Liu, "Approximation Capability in $C(\mathbb{R}^n)$ by Multilayer Feedforward Networks and Related Problems," IEEE Trans. on Neural Networks, accepted.
[10] T. Chen, H. Chen and R.-W. Liu, "A Constructive Proof of Cybenko's Approximation Theorem and Its Extensions," in Computing Science and Statistics (LePage and Page, eds.), Proc. of the 22nd Symposium on the Interface (East Lansing, Michigan), pp. 163-168, May 1990.
[11] T. Chen and H. Chen, "Approximation to Continuous Functionals by Neural Networks with Application to Dynamic Systems," IEEE Trans. on Neural Networks (in press).
[12] T. Chen and H. Chen, "Universal Approximation to Nonlinear Operators by Neural Networks with Arbitrary Activation Functions and Its Application to Dynamic Systems," submitted for publication.
[13] R.P. Lippmann, "Pattern Classification Using Neural Networks," IEEE Communications Magazine, Vol. 27, pp. 47-64 (1989).
[14] J. Park and I.W. Sandberg, "Universal Approximation Using Radial-Basis-Function Networks," Neural Computation, Vol. 3, pp. 246-257 (1991).
[15] D. Hush and B. Horne, "Progress in Supervised Neural Networks," IEEE Signal Processing Magazine, Jan. 1993.
[16] J.E. Moody and C.J. Darken, "Fast Learning in Networks of Locally-Tuned Processing Units," Neural Computation, Vol. 1, pp. 281-293 (1989).
[17] F. Girosi and T. Poggio, "Networks and the Best Approximation Property," Artificial Intelligence Lab. Memo 1164, MIT (1989).
[18] E.J. Hartman, J.D. Keeler and J.M. Kowalski, "Layered Neural Networks with Gaussian Hidden Units as Universal Approximators," Neural Computation, Vol. 2, No. 2, pp. 210-215 (1990).
[19] S. Lee and R.M. Kil, "A Gaussian Potential Function Network with Hierarchically Self-Organizing Learning," Neural Networks, Vol. 4, pp. 207-224 (1991).
[20] J. Dieudonné, Foundations of Modern Analysis, Academic Press: New York and London (1969), p. 142.
[21] L.A. Liusternik and V.J. Sobolev, Elements of Functional Analysis (3rd ed., translated from the Russian 2nd ed.), Wiley: New York (1974).
[22] T. Chen, "Approximation to Nonlinear Functionals in Hilbert Space by Superposition of Sigmoidal Functions," Kexue Tongbao (1992).
[23] J. Park and I.W. Sandberg, "Approximation and Radial-Basis-Function Networks," Neural Computation, Vol. 5 (1993).
[24] T. Chen and H. Chen, "$L^2(\mathbb{R}^n)$ Approximation by RBF Neural Networks," Chinese Annals of Mathematics (in press).
[25] W. Rudin, Functional Analysis, McGraw-Hill: New York (1973).
Appendix

Proof of the three propositions in the proof of Theorem 4. We prove the three propositions individually, as follows.

1. For a fixed $k$, let $u_j^{(k)}$, $j = 1, 2, \ldots$, be a sequence in $V_k$, and let $u^{(j)}$ be the corresponding sequence in $V$ such that
$$u_j^{(k)}(x) = \sum_{l=1}^{n(\delta_k)} u^{(j)}(x_l)\, \tilde{T}_{\delta_k, l}(x). \tag{36}$$
Since $V$ is compact, there is a subsequence $u^{(j_l)}(x)$ which converges to some $u \in V$; it then follows that $u_{j_l}^{(k)}(x)$ converges to $u_k(x) \in V_k$, which means that $V_k$ is a compact set in $C(K)$.

2. By the definition and the property of the partition of unity, we have
$$u(x) - u_k(x) = \sum_{j=1}^{n(\delta_k)} [u(x) - u(x_j)]\, \tilde{T}_{\delta_k, j}(x) = \sum_{\|x - x_j\|_X \le \delta_k} [u(x) - u(x_j)]\, \tilde{T}_{\delta_k, j}(x). \tag{37}$$
Consequently,
$$|u(x) - u_k(x)| \le \eta_k \sum_{j=1}^{n(\delta_k)} \tilde{T}_{\delta_k, j}(x) = \eta_k \tag{38}$$
for all $u \in V$ and $x \in K$.

3. Suppose $\{u^j\}_{j=1}^{\infty}$ is a sequence in $\bar{V}$. If there is a subsequence $\{u^{j_l}\}_{l=1}^{\infty}$ of $\{u^j\}_{j=1}^{\infty}$ with all $u^{j_l} \in V$, $l = 1, 2, \ldots$, then, by the fact that $V$ is compact, there is a subsequence of $\{u^{j_l}\}_{l=1}^{\infty}$ which converges to some $u \in V$. Otherwise, to each $u^j$ there corresponds a positive integer $k(j)$ such that $u^j \in V_{k(j)}$. There are two possibilities. (i) We can find infinitely many $j_l$ such that $u^{j_l} \in V_{k_0}$ for some fixed $k_0$. By proposition 1 proved in this Appendix, $V_{k_0}$ is a compact set; hence there is a subsequence of $\{u^{j_l}\}$ which converges to some $v \in V_{k_0}$, i.e., a subsequence of $\{u^j\}$ converging to $v$. (ii) There are sequences $j_1 < j_2 < \cdots \to \infty$ and $k(j_1) < k(j_2) < \cdots \to \infty$ such that $u^{j_l} \in V_{k(j_l)}$. Let $v^{j_l} \in V$ be such that
$$u^{j_l}(x) = \sum_{i=1}^{n(\delta_{k(j_l)})} v^{j_l}(x_i)\, \tilde{T}_{\delta_{k(j_l)}, i}(x). \tag{39}$$
Since $v^{j_l} \in V$ and $V$ is compact, we observe that there is a subsequence of $\{v^{j_l}\}_{l=1}^{\infty}$ which converges to some $v \in V$. By proposition 2 proved in this Appendix, the corresponding subsequence of $\{u^{j_l}\}_{l=1}^{\infty}$ also converges to $v$. Thus the compactness of $\bar{V}$ is proved.
[Block diagram omitted: network inputs feed a layer of kernel nodes, whose responses are combined into the network outputs.]

Figure 1: A Radial Basis Function Network