Appeared in IEEE Transactions on Neural Networks, Nov. issue, 1993.
Approximations of Continuous Functionals by Neural Networks with Application to Dynamical Systems
Tianping Chen and Hong Chen
Abstract: The main concern of this paper is to give several strong results on neural network representation in an explicit form. Under very mild conditions, a functional defined on a compact set in $C[a,b]$ or $L^p[a,b]$, spaces of infinite dimension, can be approximated arbitrarily well by a neural network with one hidden layer. In particular, if $U$ is a compact set in $C[a,b]$, $\sigma$ is a bounded sigmoidal function, and $f$ is a continuous functional defined on $U$, then for all $u \in U$, $f(u)$ can be approximated by $\sum_{i=1}^{N} c_i\,\sigma(\sum_{j=0}^{m}\xi_{i,j}\,u(x_j) + \theta_i)$, where $c_i$, $\xi_{i,j}$, $\theta_i$ are real numbers and $u(x_j)$ is the value of $u$ evaluated at the point $x_j$. These results are a significant development beyond the original works (see [1-9]), where theorems on approximating continuous functions defined on $\mathbb{R}^n$, a space of finite dimension, by neural networks with one hidden layer were given. Finally, all the results are shown to be applicable to the approximation of the output of dynamical systems at any particular time.

Key words: Approximation theory, neural networks, dynamical systems, compact set, convex set, functional.

Tianping Chen is with the Department of Mathematics, Fudan University, Shanghai, P.R. China. Hong Chen is with the Department of Electrical Engineering, University of Notre Dame, Notre Dame, Indiana 46556, USA.
1 Introduction

The problem of approximating a function of several variables by neural networks has been studied by many authors. In 1987, Wieland and Leighton dealt with the capability of networks consisting of one or two hidden layers. Irie and Miyake (1988) obtained an integral representation formula with an integrable kernel fixed beforehand; this representation formula is of a kind that can be realized by a three-layered neural network. In 1989, several papers related to this topic appeared. They all claimed that a three-layered neural network with sigmoidal units in the hidden layer can approximate continuous or other kinds of functions defined on compact sets in $\mathbb{R}^n$, and they used different methods. Carroll and Dickinson used the inverse Radon transformation. Cybenko used a functional analysis method, combining the Hahn-Banach theorem and the Riesz representation theorem; however, his proof is existential. Funahashi approximated Irie and Miyake's integral representation by a finite sum, using a kernel which can be expressed as a difference of two sigmoidal functions. Hornik et al. applied the Stone-Weierstrass theorem, using trigonometric functions; their approximations hold not only in the uniform topology on a compact set, but also in the $L^p$-topology. However, the latter can be attained whenever uniform approximation can be attained on every compact set, because the uniform convergence topology is stronger than the $L^p$-topology. Recently [9], we gave a constructive approach to proving the above result, and proved that instead of the continuity of $\sigma(x)$, a sufficient condition for Cybenko's theorem to hold is the boundedness of $\sigma(x)$. Moreover, if $(f_1(x),\dots,f_q(x))$ is a
continuous map from $[0,1]^n$ to $\mathbb{R}^q$, then for any $\varepsilon > 0$, there exist $N$, $\theta_j \in \mathbb{R}$, $y_j \in \mathbb{R}^n$, $c_{j,k} = c_j(f_k) \in \mathbb{R}$, $j = 1,\dots,N$, $k = 1,\dots,q$, such that

$$\Big|f_k(x) - \sum_{j=1}^{N} c_{j,k}\,\sigma(y_j \cdot x + \theta_j)\Big| < \varepsilon \qquad (1)$$

for all $x \in [0,1]^n$ and $f_1,\dots,f_q$, where $x \cdot y$ is the inner product of $x$ and $y$. This type of approximation theorem is useful in the theory and application of artificial neural networks, since many types of neural networks are formed from compositions and superpositions of one simple nonlinear activation function. A nontrivial but simple class of neural networks are those with one hidden layer; they exactly implement the set of functions given by

$$\sum_{j=1}^{N} c_j\,\sigma(y_j \cdot x + \theta_j). \qquad (2)$$
Approximation of functions by neural networks is not only interesting and meaningful in pure and applied mathematics, but also useful in engineering and the physical sciences, where such approximations have found wide application in areas such as system identification, modeling and realization, signal decomposition and generation, pattern classification, adaptive filtering, etc. Theoretically, the aforementioned result not only settles a long-standing question on the realizability of $C([0,1]^n)$ by a single-hidden-layer feedforward neural network, but also serves as an alternative to Kolmogorov's well-known resolution of Hilbert's 13th Problem. In a famous paper, Kolmogorov proved that for any continuous function defined on $[0,1]^n$, there is the following representation:
$$f(x_1,\dots,x_n) = \sum_{q=0}^{2n} g\big(\lambda_1 \phi_q(x_1) + \dots + \lambda_n \phi_q(x_n)\big), \qquad (3)$$

where $g$ and the $\phi_q(x)$ are functions of a single variable, $0 \le \lambda_i \le 1$, $i = 1,\dots,n$, and $0 \le \phi_q(t) \le 1$, $t \in [0,1]$, $q = 0,1,\dots,2n$. However, the construction of $g$ and $\phi_q$ is very complicated. Cybenko's theorem shows that every continuous function defined on $[0,1]^n$ can be approximated within any prescribed error by a finite linear combination as in (2), where $\sigma$ is a very simple univariate function.

All these works are concerned with approximation of continuous functions defined on a compact set in $\mathbb{R}^n$ (a space of finite dimension). However, in practice we often encounter situations where we need to compute functionals defined on some set of functions (a space of infinite dimension). For example, the output of a dynamical system at any particular time can be viewed as a functional (see Example 1 in Section 3). Thus it is of great importance to discuss the problem of approximating nonlinear functionals by neural networks. This is the main motivation and concern of our paper.

Recently, Sandberg did important work [11] and obtained interesting approximation theorems for discrete-time dynamical systems. Despite the restriction to the discrete-time case, his work began to reveal the possibility and effectiveness of using neural networks (with so-called sigmoidal nonlinearity, as will be introduced shortly, or more general nonlinearity) in approximating dynamical systems. The further treatment of this topic (including the more general continuous-time systems), however, still remains unclear. In particular, we ask: can we give any result in a form as explicit as (2)?
This paper is organized as follows. We first concentrate on approximating continuous functionals by (single-hidden-layer feedforward) neural networks. We obtain several strong results, which are of great interest for research on the representation capability of neural networks. Then we study the approximation of dynamical systems and provide a uniform viewpoint and treatment of both continuous-time and discrete-time systems. All the results presented in the first part of this paper can be readily applied to the approximation of the outputs of dynamical systems (at any particular time). As one example, some of the results in [11] can be obtained from our results in this paper. Thus, our results are a significant generalization of those in [11].
2 Approximations of Continuous Functionals

Let $C[a,b]$ and $L^p[a,b]$ denote the spaces of all continuous functions and $p$-th power integrable functions on $[a,b]$, with norms $\|f\|_{C[a,b]} = \sup_{x \in [a,b]} |f(x)|$ and $\|f\|_{L^p[a,b]} = (\int_a^b |f(x)|^p\,dx)^{1/p}$, respectively. Throughout this paper, unless otherwise specified, we shall always let $1 < p < \infty$. A set $U$ in $C[a,b]$ is called compact if for any sequence $f_n \in U$, there exist a function $f \in U$ and a subsequence $f_{n_k}$ of $f_n$ such that $\|f - f_{n_k}\|_{C[a,b]} \to 0$. Similarly, we can define a compact set in $L^p[a,b]$. If $\sigma(\cdot): \mathbb{R} \to \mathbb{R}$ satisfies

$$\sigma(x) \to \begin{cases} 1 & \text{as } x \to +\infty, \\ 0 & \text{as } x \to -\infty, \end{cases} \qquad (4)$$

then we call $\sigma(\cdot)$ a generalized sigmoidal function. It is worth noting that all monotone increasing sigmoidal functions belong to this class. Moreover, continuity of $\sigma(x)$ is required neither in this definition nor in the later theorems. The main results of this paper are as follows.
2.1 Main Results

Theorem 1. Suppose that $U$ is a compact set in $L^p[a,b]$ ($1 < p < \infty$), $f$ is a continuous functional defined on $U$, and $\sigma(x)$ is a bounded generalized sigmoidal function. Then for any $\varepsilon > 0$, there exist $h > 0$, a positive integer $m$, $m+1$ points $a = x_0 < x_1 < \dots < x_m = b$, $x_j = a + j(b-a)/m$, $j = 0,1,\dots,m$, a positive integer $N$, and constants $c_i$, $\theta_i$, $\xi_{i,j}$, $i = 1,\dots,N$, $j = 0,1,\dots,m$, such that

$$\Big|f(u) - \sum_{i=1}^{N} c_i\,\sigma\Big(\sum_{j=0}^{m}\xi_{i,j}\,\frac{1}{2h}\int_{x_j-h}^{x_j+h} u(t)\,dt + \theta_i\Big)\Big| < \varepsilon \qquad (5)$$

holds for all $u \in U$. Here it is assumed that $u(x) = 0$ if $x \notin [a,b]$.
Theorem 2. Suppose that $U$ is a compact set in $C[a,b]$, $f$ is a continuous functional defined on $U$, and $\sigma(x)$ is a bounded generalized sigmoidal function. Then for any $\varepsilon > 0$, there exist $m+1$ points $a = x_0 < \dots < x_m = b$, a positive integer $N$, and constants $c_i$, $\theta_i$, $\xi_{i,j}$, $i = 1,\dots,N$, $j = 0,1,\dots,m$, such that

$$\Big|f(u) - \sum_{i=1}^{N} c_i\,\sigma\Big(\sum_{j=0}^{m}\xi_{i,j}\,u(x_j) + \theta_i\Big)\Big| < \varepsilon, \qquad \forall u \in U. \qquad (6)$$
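To make the form of (6) concrete, the following sketch (our own illustration, not part of the paper) fits an approximant of exactly this shape to the functional $f(u) = \int_0^1 u(t)\,dt$ over a one-parameter compact family of inputs $u_\alpha(t) = \sin(\alpha t)$. The inner weights $\xi_{i,j}$, $\theta_i$ are drawn at random and only the outer coefficients $c_i$ are fitted by least squares, which is a random-features shortcut rather than the constructive procedure used in the proofs below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample points x_0 < ... < x_m of Theorem 2 on [a, b] = [0, 1].
a, b, m = 0.0, 1.0, 20
x = np.linspace(a, b, m + 1)

# A compact one-parameter family of inputs u_alpha(t) = sin(alpha * t) and the
# target functional f(u) = integral of u over [0, 1] (both our choices).
alphas = np.linspace(1.0, 4.0, 200)
U = np.sin(np.outer(alphas, x))             # row i holds (u(x_0), ..., u(x_m))
f_true = (1.0 - np.cos(alphas)) / alphas    # exact value of f(u_alpha)

# Approximant of the form (6): sum_i c_i sigma(sum_j xi_ij u(x_j) + theta_i).
# Inner weights are random; only the c_i are fitted, by least squares.
N = 100
xi = rng.normal(size=(N, m + 1))
theta = rng.normal(size=N)
sigma = lambda z: 1.0 / (1.0 + np.exp(-z))  # a bounded sigmoidal function
H = sigma(U @ xi.T + theta)                 # hidden-layer outputs, shape (200, N)
c, *_ = np.linalg.lstsq(H, f_true, rcond=None)

err = np.max(np.abs(H @ c - f_true))
print(f"max |f(u) - network(u)| over the family: {err:.2e}")
```

The network sees each input $u$ only through its samples $u(x_0),\dots,u(x_m)$, exactly as (6) prescribes.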
Suppose that $V = \mathbb{R}^q$, $\prod_{k=1}^{n}[a_k,b_k]$ is a rectangle in $\mathbb{R}^n$, and $C_V(\prod_{k=1}^{n}[a_k,b_k])$ stands for the set of all continuous maps $G(x_1,\dots,x_n) = (g_1(x_1,\dots,x_n),\dots,g_q(x_1,\dots,x_n))$ defined on $\prod_{k=1}^{n}[a_k,b_k]$ and taking values in $V$; that is, each $g_l(x_1,\dots,x_n)$ is continuous on $\prod_{k=1}^{n}[a_k,b_k]$, $l = 1,\dots,q$. Moreover, if $G, F \in C_V(\prod_{k=1}^{n}[a_k,b_k])$, define

$$\rho(G,F)_C = \Big(\sum_{l=1}^{q}\|g_l(x_1,\dots,x_n) - f_l(x_1,\dots,x_n)\|_C^2\Big)^{1/2}. \qquad (7)$$

Similarly, let $L^p_V(\prod_{k=1}^{n}[a_k,b_k])$ denote the set of all mappings $(g_1,\dots,g_q)$, where each $g_l(x_1,\dots,x_n)$ is $p$-th power integrable over $\prod_{k=1}^{n}[a_k,b_k]$. If $G, F \in L^p_V(\prod_{k=1}^{n}[a_k,b_k])$, define

$$\rho(G,F)_{L^p} = \Big(\sum_{l=1}^{q}\|g_l(x_1,\dots,x_n) - f_l(x_1,\dots,x_n)\|_{L^p}^2\Big)^{1/2}. \qquad (8)$$
Theorem 3. Suppose that $U$ is a compact set in $L^p_V(\prod_{k=1}^{n}[a_k,b_k])$ ($1 < p < \infty$), $f$ is a continuous functional on $U$, and $\sigma(x)$ is a bounded generalized sigmoidal function. Then for any $\varepsilon > 0$, there exist $(m+1)^n$ points $(x_1^{j_1},\dots,x_n^{j_n})$, $x_k^{j_k} = a_k + j_k(b_k - a_k)/m$, $j_k = 0,1,\dots,m$, $k = 1,\dots,n$, a positive integer $N$, and constants $c_i$, $\theta_i$ and $q(m+1)^n$-vectors $\xi_i$, such that

$$\Big|f(u) - \sum_{i=1}^{N} c_i\,\sigma(\xi_i \cdot \bar{u}_{q,n,m} + \theta_i)\Big| < \varepsilon, \qquad \forall u \in U, \qquad (9)$$

where $\bar{u}_{q,n,m}$ are the $q(m+1)^n$-vectors obtained by replacing each component $u_l(x_1^{j_1},\dots,x_n^{j_n})$ of the vector $u_{q,n,m}$ defined in Theorem 4 by $\big(\frac{1}{2h}\big)^n \int_{x_1^{j_1}-h}^{x_1^{j_1}+h}\cdots\int_{x_n^{j_n}-h}^{x_n^{j_n}+h} u_l(x_1,\dots,x_n)\,dx_1\cdots dx_n$ for some $h > 0$.
Theorem 4. Suppose that $U$ is a compact set in $C_V(\prod_{k=1}^{n}[a_k,b_k])$, $f$ is a continuous functional defined on $U$, and $\sigma(x)$ is a bounded generalized sigmoidal function. Then for any $\varepsilon > 0$, there exist $(m+1)^n$ points $(x_1^{j_1},\dots,x_n^{j_n})$, $x_k^{j_k} = a_k + j_k(b_k - a_k)/m$, $j_k = 0,1,\dots,m$, $k = 1,\dots,n$, a positive integer $N$, and constants $c_i$, $\theta_i$ and $q(m+1)^n$-vectors $\xi_i$, such that

$$\Big|f(u) - \sum_{i=1}^{N} c_i\,\sigma(\xi_i \cdot u_{q,n,m} + \theta_i)\Big| < \varepsilon, \qquad \forall u \in U, \qquad (10)$$

where $u_{q,n,m} = (u_l(x_1^{j_1},\dots,x_n^{j_n}))$, $l = 1,\dots,q$, $j_k = 0,1,\dots,m$, $k = 1,\dots,n$, are $q(m+1)^n$-vectors.
2.2 Remarks

We first explain the significance of these theorems with the following remarks.
Remark 1: Most of the published papers (see [1-9]) discuss the problem of approximating a continuous function defined on some compact subset of $\mathbb{R}^n$ (a space of finite dimension). Instead, this paper discusses the approximation of continuous functionals defined on a compact subset of a space of functions (a space of infinite dimension), which makes the problem much more difficult and complicated than in the finite-dimensional case.
Remark 2: Our theorems above not only settle the question of representation capability of neural networks, but also give an explicit form for the approximant. These results cannot be obtained by the Stone-Weierstrass theorem, which forms the basis of several papers: the Stone-Weierstrass theorem is existential and gives no explicit form for the approximant.
Remark 3: Let $a = t_1 < t_2 < \dots < t_n = b$, $t_i = a + (i-1)\frac{b-a}{n-1}$. For any point $x = (x_1,\dots,x_n) \in [0,1]^n$, define a function $u_x(t)$ as follows:

$$u_x(t) = x_i + \frac{x_{i+1} - x_i}{t_{i+1} - t_i}(t - t_i), \qquad t_i \le t \le t_{i+1}, \quad i = 1,\dots,n-1, \qquad (11)$$

which is a piecewise linear function taking the value $x_j$ at the point $t_j$. Let $U$ be the set of all the functions defined above. It is easy to verify that there is a one-to-one mapping between $U$ and $[0,1]^n$. Moreover, to any continuous function $f(x_1,\dots,x_n)$ defined on $[0,1]^n$ there corresponds a unique functional $f(u)$ defined on $U$, which is a compact set in $C[a,b]$. By Theorem 2, $f(x_1,\dots,x_n) = f(u_x)$ can be approximated by $\sum_i c_i\,\sigma(\sum_{j=1}^{n}\xi_{ij}\,u(t_j) + \theta_i) = \sum_i c_i\,\sigma(\sum_{j=1}^{n}\xi_{ij}\,x_j + \theta_i)$, since $u_x(t_j) = x_j$, which can be rewritten as $\sum_i c_i\,\sigma(\xi_i \cdot x + \theta_i)$, where $\xi_i = (\xi_{i1},\dots,\xi_{in})$ and $x = (x_1,\dots,x_n)$. This argument shows that the results on approximation by neural networks in $\mathbb{R}^n$ can be viewed as a special case of our Theorem 2, in which all functions in $U$ are piecewise linear functions with $n$ knots. In general, however, $U$ can be an arbitrary compact set in $C[a,b]$, which is much more complicated than such piecewise linear functions. It is in dealing with these more general situations that the main contribution of this paper lies.
2.3 Proofs of Main Results

Prior to the proofs, we first give a sketch as a road map. For the case of the $L^p$ space:

Step 1: Find $h > 0$ small enough that $\|\frac{1}{2h}\int_{-h}^{h} u(x+t)\,dt - u(x)\|_{L^p[a,b]}$ is uniformly small for all $u \in U$. If $U$ is a convex compact set in $L^p[a,b]$, then for this fixed $h$, $U_h = \{u_h(x) : u_h(x) = \frac{1}{2h}\int_{-h}^{h} u(x+t)\,dt,\ u \in U\}$ is a convex compact set in both $L^p[a,b]$ and $C[a,b]$.

Step 2: On $U_h$, define a functional $\tilde f$ such that $|f(u) - \tilde f(u_h)|$ is uniformly small for all $u \in U$.

Step 3: Find an integer $m$ and points $a = x_0 < x_1 < \dots < x_m = b$ such that the piecewise linear interpolant $u_{h,m}$ of $u_h$ at the points $x_i$, $i = 0,1,\dots,m$, satisfies $\|u_{h,m} - u_h\|_{C[a,b]}$ uniformly small for all $u_h \in U_h$.

Step 4: Define $\tilde f$ on $U_{h,m} = \{u_{h,m} : u_{h,m}$ is piecewise linear and $u_{h,m}(x_i) = u_h(x_i)\}$ such that $|\tilde f(u_{h,m}) - \tilde f(u_h)|$ is uniformly small for all $u_h \in U_h$.

Step 5: At this point, $\tilde f$ is a functional defined on $U_{h,m}$, which can be viewed as a finite-dimensional space, and the proof is completed by using the result for the finite-dimensional case.

For the case of $C[a,b]$, we just modify the previous procedure. The following lemmas work out the details.
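Steps 1 and 3 above can be visualized numerically. The sketch below (our own illustration; the input function and all numerical values are hypothetical choices) mollifies a discontinuous $u$ by the moving average of half-width $h$ and then interpolates the smoothed result piecewise linearly at $m+1$ equispaced points:

```python
import numpy as np

a, b = 0.0, 1.0
t = np.linspace(a, b, 4001)
dt = t[1] - t[0]
u = np.sign(np.sin(8 * np.pi * t))            # a rough (discontinuous) input

def average(u, t, h):
    """Step 1: u_h(x) = (1/2h) * integral_{x-h}^{x+h} u, with u = 0 outside [a,b]."""
    out = np.empty_like(t)
    for i, x in enumerate(t):
        out[i] = np.sum(u[np.abs(t - x) <= h]) * dt / (2 * h)
    return out

h = 0.01
uh = average(u, t, h)                          # u_h is continuous

m = 200
x_nodes = np.linspace(a, b, m + 1)
uhm = np.interp(t, x_nodes, np.interp(x_nodes, t, uh))   # Step 3: u_{h,m}

lp_err = np.sqrt(np.sum((uh - u) ** 2) * dt)   # L^2 distance of Step 1
c_err = np.max(np.abs(uhm - uh))               # sup-norm distance of Step 3
print(f"||u_h - u||_L2 ~ {lp_err:.3f},  ||u_h,m - u_h||_C ~ {c_err:.3f}")
```

Both errors shrink as $h \to 0$ and $m \to \infty$, mirroring the uniform smallness required in the road map.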
Lemma 1. $U$ is a compact set in $C[a,b]$ if and only if $U$ is a closed set and
(1) all functions in $U$ are uniformly bounded, i.e. there is a constant $M$ such that $|u(x)| \le M$ for all $x \in [a,b]$ and all $u \in U$;
(2) the functions in $U$ are equicontinuous, i.e. for any $\varepsilon > 0$ there exists $\delta > 0$ such that for every pair of points $x', x'' \in [a,b]$ with $|x' - x''| < \delta$, $|u(x') - u(x'')| < \varepsilon$ holds for every $u \in U$.
$U$ is a compact set in $L^p[a,b]$ ($1 < p < \infty$) if and only if $U$ is a closed set and
(1) there is a constant $M$ such that $\|u\|_{L^p[a,b]} \le M$ for all $u \in U$;
(2) for any $\varepsilon > 0$ there exists $h_0 > 0$ such that if $h < h_0$, then

$$\Big\|\frac{1}{2h}\int_{-h}^{h} u(x+t)\,dt - u(x)\Big\|_{L^p[a,b]} < \varepsilon \qquad (12)$$

holds for all $u \in U$, where $u(x) = 0$ if $x \notin [a,b]$.

Proof. See [12].
Let $U^c$ be the convex hull of $U$, that is, $U^c = \mathrm{closure}\{u : u = \sum_{i=1}^{k}\lambda_i u_i,\ u_i \in U,\ 0 \le \lambda_i \le 1,\ \sum_{i=1}^{k}\lambda_i = 1\}$. Obviously, $U^c$ is a compact set whenever $U$ is compact, and on $U^c$ we can define a continuous functional which is an extension of $f$. Therefore, from the very beginning we may assume that $U$ is a compact convex set.

Because $f$ is a continuous functional on the compact set $U$ in $L^p[a,b]$, for any $\varepsilon > 0$ there exists a $\delta > 0$ such that if $\|u_1 - u_2\|_{L^p[a,b]} < \delta$, then $|f(u_1) - f(u_2)| < \varepsilon/6$. By taking $\delta = \min(\varepsilon/6, \delta)$, it can be assumed that $\delta \le \varepsilon/6$. For this fixed $\delta$, by Lemma 1, there exists an $h > 0$ such that

$$\Big\|\frac{1}{2h}\int_{-h}^{h} u(x+t)\,dt - u(x)\Big\|_{L^p[a,b]} < \delta/2, \qquad \forall u \in U. \qquad (13)$$

Define $U_h = \{u_h(x) : u_h(x) = \frac{1}{2h}\int_{-h}^{h} u(x+t)\,dt,\ u \in U\}$. It is easy to verify that $U_h$ is also a convex compact set in $L^p[a,b]$ whenever $U$ is convex and compact. On $U_h$, define a new functional $\tilde f$ as follows: for all $u_h \in U_h$,
$$\tilde f(u_h) = f(v) + \|u_h - v\|_{L^p[a,b]}, \qquad (14)$$

where $v$ is the unique function in $U$ such that

$$\|u_h - v\|_{L^p[a,b]} = \min_{w \in U} \|u_h - w\|_{L^p[a,b]}. \qquad (15)$$
Lemma 2. The functional $\tilde f$ defined by equation (14) makes sense for every $u_h \in U_h$, and is uniformly continuous on $U_h$.
Proof. It is well known that $L^p[a,b]$ ($1 < p < \infty$) is a strictly convex space, i.e. for any $g_1 \ne k g_2$ ($k$ a constant) and $0 < \lambda_1, \lambda_2 < 1$ with $\lambda_1 + \lambda_2 = 1$, we have $\|\lambda_1 g_1 + \lambda_2 g_2\|_{L^p[a,b]} < \lambda_1\|g_1\|_{L^p[a,b]} + \lambda_2\|g_2\|_{L^p[a,b]}$. Moreover, if $U$ is a convex compact set in $L^p[a,b]$, then for $u_h \in U_h$ there is a unique $v$ such that

$$\|u_h - v\|_{L^p[a,b]} = \min_{w \in U} \|u_h - w\|_{L^p[a,b]}, \qquad (16)$$

due to the fact that $L^p[a,b]$ ($1 < p < \infty$) is strictly convex; this shows that $\tilde f$ makes sense on $U_h$.

Now we prove the continuity of $\tilde f$. Suppose $u_{h,n}, u_h \in U_h$ with $\lim_{n\to\infty} \|u_{h,n} - u_h\|_{L^p[a,b]} = 0$; we claim that the corresponding $v_n$ and $v$ defined by (15) satisfy $\lim_{n\to\infty} \|v_n - v\| = 0$. Otherwise, there is a subsequence of $v_n$, say $\{v_{n_k}\}$, which converges to some $v_1 \in L^p[a,b]$ with $v_1 \ne v$. Then, by the definition of $v$ in (15), $\|u_h - v\|_{L^p[a,b]} < \|u_h - v_1\|_{L^p[a,b]}$. Therefore, there exists $\eta > 0$ such that

$$\|u_h - v\|_{L^p[a,b]} + 2\eta < \|u_h - v_1\|_{L^p[a,b]} - \eta. \qquad (17)$$

However, $\|u_{h,n} - u_h\|_{L^p[a,b]} \to 0$ and $\|v_{n_k} - v_1\|_{L^p[a,b]} \to 0$. Therefore, for sufficiently large $k$, we have

$$\|u_{h,n_k} - v\|_{L^p[a,b]} + \eta < \|u_h - v\|_{L^p[a,b]} + 2\eta < \|u_h - v_1\|_{L^p[a,b]} - \eta < \|u_{h,n_k} - v_{n_k}\|_{L^p[a,b]} + \eta, \qquad (18)$$

which means $\|u_{h,n_k} - v\|_{L^p[a,b]} < \|u_{h,n_k} - v_{n_k}\|_{L^p[a,b]}$, contradicting the definition of $v_{n_k}$ by (15). Therefore, the map $u_h \to v$ defined by (15) is a continuous map from $U_h$ to $U$. Furthermore, if $u_{h,n} \to u_h$, then $v_n \to v$, $f(v_n) \to f(v)$, and $\|u_{h,n} - v_n\|_{L^p[a,b]} \to \|u_h - v\|_{L^p[a,b]}$, i.e. $\tilde f$ is a continuous functional defined on $U_h$. Since $U_h$ is compact, $\tilde f$ is uniformly continuous. Lemma 2 is proved.
We now estimate $\tilde f(u_h) - f(u) = f(v) - f(u) + \|u_h - v\|_{L^p[a,b]}$. According to the definitions of $u_h$ and $v$, $\|u_h - u\|_{L^p[a,b]} < \delta/2$ and $\|u_h - v\|_{L^p[a,b]} \le \|u_h - u\|_{L^p[a,b]} < \delta/2$. Consequently, $\|u - v\|_{L^p[a,b]} \le \|u - u_h\|_{L^p[a,b]} + \|u_h - v\|_{L^p[a,b]} < \delta \le \varepsilon/6$, which implies $|f(u) - f(v)| < \varepsilon/6$. Thus

$$|f(u) - \tilde f(u_h)| \le |f(u) - f(v)| + \|u_h - v\|_{L^p[a,b]} < \varepsilon/3. \qquad (19)$$
From now on, instead of $f$, we will discuss the functional $\tilde f$ defined on $U_h$.

For fixed $h$, we claim that $U_h$ is a uniformly bounded and equicontinuous set in $C[a,b]$. In fact,

$$|u_h(x)| \le \frac{1}{2h}\int_{-h}^{h}|u(x+t)|\,dt \le \Big(\frac{1}{2h}\int_{-h}^{h}|u(x+t)|^p\,dt\Big)^{1/p} \le \Big(\frac{1}{2h}\Big)^{1/p}\Big(\int_a^b|u(x)|^p\,dx\Big)^{1/p} \le \Big(\frac{1}{2h}\Big)^{1/p} M. \qquad (20)$$

The second inequality comes from Jensen's inequality, and the last one from Lemma 1. Moreover, for $x' < x''$,

$$|u_h(x') - u_h(x'')| = \Big|\frac{1}{2h}\int_{x'-h}^{x'+h} u(t)\,dt - \frac{1}{2h}\int_{x''-h}^{x''+h} u(t)\,dt\Big| \le \Big|\frac{1}{2h}\int_{x'-h}^{x''-h} u(t)\,dt\Big| + \Big|\frac{1}{2h}\int_{x'+h}^{x''+h} u(t)\,dt\Big| \le \frac{1}{2h}\Big(\int_{x'-h}^{x''-h}|u(t)|^p\,dt\Big)^{1/p}|x''-x'|^{1/q} + \frac{1}{2h}\Big(\int_{x'+h}^{x''+h}|u(t)|^p\,dt\Big)^{1/p}|x''-x'|^{1/q} \le \frac{M}{h}|x''-x'|^{1/q}, \qquad (21)$$

where $(1/p) + (1/q) = 1$, and the second inequality comes from Hölder's inequality. Therefore, $U_h$ is uniformly bounded and equicontinuous; consequently, $U_h$ is a compact set in $C[a,b]$ as well as in $L^p[a,b]$. Now, for any $\varepsilon_1 > 0$, there exist an integer $m$ and $m+1$ points $a = x_0 < x_1 < \dots < x_m = b$, $x_j = a + j(b-a)/m$, $j = 0,1,\dots,m$, such that for all $u_h \in U_h$, if $|x' - x''| < \frac{b-a}{m}$, then $|u_h(x') - u_h(x'')| < \varepsilon_1$. For the fixed $h$ and $m$, associated with each $u_h \in U_h$ we define the function

$$u_{h,m}(x) = u_h(x_j) + \frac{u_h(x_{j+1}) - u_h(x_j)}{x_{j+1} - x_j}(x - x_j), \qquad x_j \le x \le x_{j+1}, \quad j = 0,1,\dots,m-1; \qquad (22)$$

that is, $u_{h,m}(x)$ is piecewise linear and interpolates $u_h(x)$ at the points $x_j$, $j = 0,1,\dots,m$. (It is possible that several $u_h \in U_h$ correspond to one $u_{h,m}(x)$.) For fixed $h$ and $m$, let $U_{h,m} = \{u_{h,m} : u \in U\}$; then it is clear that $U_{h,m}$ is also a convex compact set in $L^p[a,b]$ as well as in $C[a,b]$. Similarly to the arguments made previously, on $U_{h,m}$ we define a new functional
$$\tilde f(u_{h,m}) = \tilde f(v_h) + \|u_{h,m} - v_h\|_{L^p[a,b]}, \qquad (23)$$

where $v_h$ is such that, for each $u_{h,m} \in U_{h,m}$,

$$\|u_{h,m} - v_h\|_{L^p[a,b]} = \min_{w_h \in U_h} \|u_{h,m} - w_h\|_{L^p[a,b]}. \qquad (24)$$
Similarly to the proof of Lemma 2, we can prove

Lemma 3. The functional $\tilde f$ defined on $U_{h,m}$ by (23) is continuous. Moreover, $|\tilde f(u_{h,m}) - \tilde f(u_h)| < \varepsilon/3$ holds for all $u_h \in U_h$.
We need one more lemma.

Lemma 4. For fixed $h$ and $m$, the set $S_{h,m} = \{(u_h(x_0),\dots,u_h(x_m)) : u \in U\}$ is a compact set in $\mathbb{R}^{m+1}$.

Proof. The boundedness of $S_{h,m}$ is a direct consequence of the boundedness of $U_h$ in $L^p[a,b]$. Next, if $(u_{k,h}(x_0),\dots,u_{k,h}(x_m))$, $u_k \in U$, $k = 1,2,\dots$, converges to $(u^0,\dots,u^m)$, then there exists a subsequence $u_{k_j}$ which converges to some $u \in U$, since $U$ is compact. Therefore, $(u_{k_j,h}(x_0),\dots,u_{k_j,h}(x_m))$ converges to $(u_h(x_0),\dots,u_h(x_m))$, which implies that $(u^0,\dots,u^m) = (u_h(x_0),\dots,u_h(x_m))$. Thus $S_{h,m}$ is compact.

Having established these lemmas, we now proceed to prove Theorem 1.
Proof of Theorem 1. According to Lemmas 1-4 and the previous arguments, for any $\varepsilon > 0$, there exist $h > 0$ and a functional $\tilde f$ on $U_h$ such that $|f(u) - \tilde f(u_h)| < \varepsilon/3$ for all $u \in U$. For this fixed $h$, there exist an integer $m$ and a functional $\tilde f$ on $U_{h,m}$ such that $|\tilde f(u_h) - \tilde f(u_{h,m})| < \varepsilon/3$ for all $u \in U$. Now, define a function $g$ on $S_{h,m}$ by

$$g(u_{h,m}(x_0),\dots,u_{h,m}(x_m)) = \tilde f(u_{h,m}). \qquad (25)$$

Because $u_{h,m}$ is piecewise linear, the fact that $(u^k_{h,m}(x_0),\dots,u^k_{h,m}(x_m))$ converges to $(u_{h,m}(x_0),\dots,u_{h,m}(x_m))$ in $S_{h,m}$ implies $\|u^k_{h,m} - u_{h,m}\|_{L^p[a,b]} \to 0$, and thus $\tilde f(u^k_{h,m}) \to \tilde f(u_{h,m})$ as $k \to \infty$, which means $g$ is continuous on $S_{h,m}$.

By the well-known approximation theorem (see [3], [9]), there exist a positive integer $N$ and constants $c_i$, $\theta_i$, $\xi_{i,j}$, $i = 1,\dots,N$, $j = 0,1,\dots,m$, such that

$$\Big|g(u_{h,m}(x_0),\dots,u_{h,m}(x_m)) - \sum_{i=1}^{N} c_i\,\sigma\Big(\sum_{j=0}^{m}\xi_{i,j}\,u_{h,m}(x_j) + \theta_i\Big)\Big| < \varepsilon/3. \qquad (26)$$

Summing up, we obtain

$$\Big|f(u) - \sum_{i=1}^{N} c_i\,\sigma\Big(\sum_{j=0}^{m}\xi_{i,j}\,\frac{1}{2h}\int_{x_j-h}^{x_j+h} u(t)\,dt + \theta_i\Big)\Big| < \varepsilon, \qquad (27)$$

or

$$\Big|f(u) - \sum_{i=1}^{N} c_i\,\sigma(\xi_i \cdot u_h + \theta_i)\Big| < \varepsilon, \qquad (28)$$

where $\xi_i$ is the vector $(\xi_{i,0},\dots,\xi_{i,m})$ and $u_h = \big(\frac{1}{2h}\int_{x_0-h}^{x_0+h} u(t)\,dt,\ \dots,\ \frac{1}{2h}\int_{x_m-h}^{x_m+h} u(t)\,dt\big)$. The proof of Theorem 1 is thus complete.

In order to prove Theorem 2, we need to modify the construction of the functional $\tilde f$, because $C[a,b]$ is not strictly convex. Since $U$ is a compact set in $C[a,b]$, for any $\varepsilon > 0$ there is a positive integer $m$ such that for all $u \in U$ and all $x', x'' \in [a,b]$, if $|x' - x''| < \frac{b-a}{m}$, then $|u(x') - u(x'')| < \varepsilon/2$.
Let $x_j = a + j(b-a)/m$, $j = 0,1,\dots,m$. For this fixed $m$, we define the function

$$u_m(x) = u(x_j) + \frac{u(x_{j+1}) - u(x_j)}{x_{j+1} - x_j}(x - x_j), \qquad x_j \le x \le x_{j+1}, \quad j = 0,1,\dots,m-1. \qquad (29)$$

It is clear that $\|u(x) - u_m(x)\|_{C[a,b]} < \varepsilon$. Let $U_m = \{u_m : u \in U\}$, and on $U_m$ define a functional $\tilde f$ by

$$\tilde f(u_m) = f(v) + \|u_m - v\|_{C[a,b]}, \qquad (30)$$

where $v$ is determined by

$$\|u_m - v\|_{L^p[a,b]} = \min_{w \in U} \|u_m - w\|_{L^p[a,b]} \qquad (31)$$

(here $v$ is the nearest function in $U$ to $u_m$ in $L^p[a,b]$, not in $C[a,b]$).
Lemma 5. Suppose that $U$ is a compact set in $C[a,b]$, $u \in U$, $u_n \in U$, $n = 1,2,\dots$. Then $\|u - u_n\|_{C[a,b]} \to 0$ if and only if $\|u - u_n\|_{L^p[a,b]} \to 0$.

Proof. It is obvious that $\|u - u_n\|_{C[a,b]} \to 0$ implies $\|u - u_n\|_{L^p[a,b]} \to 0$. Now, assuming $\|u - u_n\|_{L^p[a,b]} \to 0$, we prove that $\|u - u_n\|_{C[a,b]} \to 0$. Suppose $\|u - u_n\|_{C[a,b]} \not\to 0$. Since $U$ is compact in $C[a,b]$, there is then a subsequence $u_{n_k}$ which converges in $C[a,b]$ to some $v_1$ with $v_1 \ne u$. Thus $\|u_{n_k} - v_1\|_{L^p[a,b]} \to 0$, which, combined with the assumption that $\|u - u_n\|_{L^p[a,b]} \to 0$, leads to $u = v_1$ a.e., a contradiction. The proof of Lemma 5 is complete.
Lemma 6. The functional $\tilde f$ defined by (30) is continuous on $U_m$.

Proof. If $u_m \in U_m$ and $u_{k,m} \in U_m$, $k = 1,2,\dots$, with $\|u_{k,m} - u_m\|_{C[a,b]} \to 0$, then $\|u_{k,m} - u_m\|_{L^p[a,b]} \to 0$. Consequently, the functions $v_k, v \in U$ determined by (31) corresponding to $u_{k,m}$ and $u_m$, respectively, satisfy $\|v_k - v\|_{L^p[a,b]} \to 0$, and by Lemma 5, $\|v_k - v\|_{C[a,b]} \to 0$. Because $U_m$ is a compact set in $C[a,b]$, for any $\varepsilon > 0$ there is a $\delta > 0$ such that for all $u'_m, u''_m \in U_m$, $\|u'_m - u''_m\|_{C[a,b]} < \delta$ implies $\|v' - v''\|_{C[a,b]} < \varepsilon$. By arguments similar to those used previously, we conclude that $|f(u) - \tilde f(u_m)| < \varepsilon/2$ for all $u \in U$.
Lemma 7. The set $S_m = \{(u_m(x_0),\dots,u_m(x_m)) : u \in U\}$ is a compact set in $\mathbb{R}^{m+1}$.

Proof. The proof is the same as that of Lemma 4.
Having established these lemmas, we proceed to prove Theorem 2.

Proof of Theorem 2. Similarly to the proof of Theorem 1, we define a function $g$ on $S_m$ by

$$g(u(x_0),\dots,u(x_m)) = \tilde f(u_m). \qquad (32)$$

Because $u_m(x)$ is piecewise linear, the fact that $(u_k(x_0),\dots,u_k(x_m))$ converges to $(u(x_0),\dots,u(x_m))$ in $\mathbb{R}^{m+1}$ implies $\|u_{k,m}(x) - u_m(x)\|_{C[a,b]} \to 0$, which means $g$ is a continuous function on $S_m$. Therefore, for any $\varepsilon > 0$, there exist $N$, $c_i$, $\theta_i$, $\xi_{i,j}$, $i = 1,\dots,N$, $j = 0,1,\dots,m$, such that

$$\Big|g(u_m(x_0),\dots,u_m(x_m)) - \sum_{i=1}^{N} c_i\,\sigma\Big(\sum_{j=0}^{m}\xi_{i,j}\,u(x_j) + \theta_i\Big)\Big| < \varepsilon/2. \qquad (33)$$

Combining this with $|f(u) - \tilde f(u_m)| < \varepsilon/2$, we obtain

$$\Big|f(u) - \sum_{i=1}^{N} c_i\,\sigma\Big(\sum_{j=0}^{m}\xi_{i,j}\,u(x_j) + \theta_i\Big)\Big| < \varepsilon, \qquad (34)$$

or

$$\Big|f(u) - \sum_{i=1}^{N} c_i\,\sigma(\xi_i \cdot u_m + \theta_i)\Big| < \varepsilon, \qquad (35)$$

where $\xi_i = (\xi_{i,0},\dots,\xi_{i,m})$ and $u_m = (u(x_0),\dots,u(x_m))$. The proof of Theorem 2 is complete.

The proofs of Theorems 3 and 4 proceed along lines similar to those of Theorems 1 and 2. The only places that need changing are:
(1) Instead of the linear interpolants appearing in (22) and (29), we use multilinear interpolants;
(2) Instead of directly using Cybenko's theorem, we use (1), which is obtained in [9].
Remark 4. In [9], we pointed out that for any function (continuous or discontinuous) $\omega(x)$, if the linear combinations $\sum c_i\,\omega(\lambda_i x + \theta_i)$, where $c_i, \lambda_i, \theta_i \in \mathbb{R}$, are dense in every $C[a,b]$, then the linear combinations $\sum c_i\,\omega(\lambda_i \cdot x + \theta_i)$, where $c_i, \theta_i \in \mathbb{R}$ and $\lambda_i, x \in \mathbb{R}^n$, are dense in $C(K)$ for every compact set $K$ in $\mathbb{R}^n$. Combining this fact with the proofs of our theorems in this paper, we conclude that if $\sum c_i\,\omega(\lambda_i x + \theta_i)$ are dense in every $C[a,b]$, then all our theorems remain valid when the sigmoidal function $\sigma$ is replaced by $\omega$. Therefore, in order to approximate nonlinear functionals using neural networks with a nonlinear activation function $\omega$, all we need to do is prove the denseness of $\sum c_i\,\omega(\lambda_i x + \theta_i)$ in every $C[a,b]$.

Among such functions there are many that occur in approximation theory; for example, Schoenberg cardinal splines, B-splines, wavelets, etc., all satisfy the conditions imposed on $\omega$. Consequently, any of them can be used to replace the generalized sigmoidal functions in our previous theorems. The details are not elaborated here.
3 Application to Dynamical Systems

As an application, we discuss the approximation of dynamical systems. Sandberg [11] made an important contribution to the approximation of discrete-time dynamical systems. Here, we consider the approximation of dynamical systems as the approximation of continuous functionals defined on a compact set (of input functions). By doing so, we are able to significantly generalize Sandberg's result and provide a uniform viewpoint and treatment of both continuous-time and discrete-time systems. The main results obtained earlier in this paper can be readily applied to the approximation of the output of dynamical systems at any particular time (we are not yet approximating the whole system transfer function).

First, we introduce some notation and definitions, which come basically from [11] and [10] with slight variation. Suppose that $X_1$ (or $X_2$) stands for the set of $\mathbb{R}^{q_1}$-valued (or $\mathbb{R}^{q_2}$-valued) functions defined on $\mathbb{R}^n$. A dynamical system $G$ can be viewed as a map from $X_1$ to $X_2$; that is, for all $u \in X_1$, $Gu = v \in X_2$. For $x \in X_1$, define a "windowing" operator $W_{\alpha,a}$ by

$$(W_{\alpha,a}x)(\tau) = \begin{cases} x(\tau) & \text{if } \tau \in \Gamma_{\alpha,a}, \\ 0 & \text{if } \tau \notin \Gamma_{\alpha,a}, \end{cases} \qquad (36)$$

where $\alpha = (\alpha_1,\dots,\alpha_n) \in \mathbb{R}^n$ and $\Gamma_{\alpha,a} = \{r = (r_1,\dots,r_n) \in \mathbb{R}^n : |r_j - \alpha_j| \le a,\ \forall j = 1,\dots,n\}$; that is, $W_{\alpha,a}x$ is a "windowed version" of $x$, with the ($n$-dimensional) window centered at $\alpha$ and of width $2a$. If $U$ is a nonempty set in $X_1$, define $U_{\alpha,a} = \{u|_{\Gamma_{\alpha,a}} : u \in U\}$, where $u|_{\Gamma_{\alpha,a}}$ is the restriction of $u$ to $\Gamma_{\alpha,a}$, that is, $u|_{\Gamma_{\alpha,a}} = W_{\alpha,a}u$.

A map $G$ from $X_1$ to $X_2$ is said to be of approximately finite memory if for all $\varepsilon > 0$ there is an $a > 0$ such that

$$|(Gu)_j(\alpha) - (GW_{\alpha,a}u)_j(\alpha)| < \varepsilon, \qquad j = 1,\dots,q_2, \qquad (37)$$

holds for any $\alpha \in \mathbb{R}^n$, $u \in U$. For each $\tau \in \mathbb{R}^n$, define $T_\tau: X_1 \to X_1$ to be the (shift) operator given by $(T_\tau x)(\alpha) = x(\alpha - \tau)$ for all $\alpha \in \mathbb{R}^n$. A map $G$ from $X_1$ to $X_2$ is shift invariant if $(GT_\tau u)(\alpha) = (Gu)(\alpha - \tau)$ for any pair $\tau \in \mathbb{R}^n$, $\alpha \in \mathbb{R}^n$, and $u \in X_1$.

We assume that the entire set $U$ (the domain of $G$, on which we deal with the approximation problem) satisfies:

(1) If $u \in U$, then $u|_{\Gamma_{\alpha,a}} \in U$ for any $\alpha \in \mathbb{R}^n$, $a > 0$;
(2) for all $\alpha \in \mathbb{R}^n$, $a > 0$, $U_{\alpha,a}$ is a compact set in $C_V(\prod_{k=1}^{n}[\alpha_k - a, \alpha_k + a])$ or in $L^p_V(\prod_{k=1}^{n}[\alpha_k - a, \alpha_k + a])$, where $V$ stands for $\mathbb{R}^{q_1}$;
(3) writing $(Gu)(\alpha) = ((Gu)_1(\alpha),\dots,(Gu)_{q_2}(\alpha))$, each $(Gu)_j(\alpha)$ is a continuous functional defined on $U_{\alpha,a}$, with the corresponding topology of $C_V(\prod_{k=1}^{n}[\alpha_k - a, \alpha_k + a])$ or $L^p_V(\prod_{k=1}^{n}[\alpha_k - a, \alpha_k + a])$.
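For intuition, the following discrete-time sketch (our own construction with $n = 1$; the system and all constants are hypothetical choices) implements the windowing operator $W_{\alpha,a}$ of (36) for the fading-memory system $(Gu)(k) = \sum_{j \ge 0} 2^{-j} u(k-j)$ and checks the approximately-finite-memory property (37):

```python
import numpy as np

def G(u):
    """Fading-memory system (Gu)(k) = sum_{j>=0} 0.5**j * u(k-j)."""
    out = np.zeros_like(u)
    acc = 0.0
    for k in range(len(u)):
        acc = 0.5 * acc + u[k]       # running value of the fading sum
        out[k] = acc
    return out

def window(u, alpha, a):
    """(W_{alpha,a} u)(k) = u(k) if |k - alpha| <= a, else 0  (eq. (36))."""
    k = np.arange(len(u))
    return np.where(np.abs(k - alpha) <= a, u, 0.0)

rng = np.random.default_rng(2)
u = rng.uniform(-1, 1, 400)
alpha = 300
errs = []
for a in (2, 8, 16):
    err = abs(G(u)[alpha] - G(window(u, alpha, a))[alpha])
    errs.append(err)
    print(f"a = {a:2d}:  |(Gu)(alpha) - (G W u)(alpha)| = {err:.2e}")
```

Here the truncation error is bounded by $\sum_{j > a} 2^{-j} = 2^{-a}$, so the window size $a$ needed in (37) can be read off directly from the desired $\varepsilon$.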
Theorem 5. If $U$ and $G$ satisfy assumptions (1)-(3) above, and $G$ is of approximately finite memory, then for any $\varepsilon > 0$, there exist $a > 0$, a positive integer $m$, $(m+1)^n$ points in $\prod_{k=1}^{n}[\alpha_k - a, \alpha_k + a]$, a positive integer $N$, constants $c_i(G,\alpha,a)$ depending on $G$, $\alpha$, $a$ only, constants $\theta_i$, and $q_1(m+1)^n$-vectors $\xi_i$, $i = 1,\dots,N$, such that

$$\Big|(Gu)_j(\alpha) - \sum_{i=1}^{N} c_i(G,\alpha,a)\,\sigma(\xi_i \cdot \bar{u}_{q_1,n,m} + \theta_i)\Big| < \varepsilon, \qquad j = 1,2,\dots,q_2, \qquad (38)$$

or

$$\Big|(Gu)_j(\alpha) - \sum_{i=1}^{N} c_i(G,\alpha,a)\,\sigma(\xi_i \cdot u_{q_1,n,m} + \theta_i)\Big| < \varepsilon, \qquad j = 1,2,\dots,q_2, \qquad (39)$$

where $\bar{u}_{q_1,n,m}$ and $u_{q_1,n,m}$ are the same vectors as defined in Theorem 3 and Theorem 4, and $\sigma(x)$ is any bounded generalized sigmoidal function.

Proof. Because $G$ is of approximately finite memory, for all $u \in U$ we can find $W_{\alpha,a}u \in U_{\alpha,a}$ such that

$$|(Gu)_j(\alpha) - (GW_{\alpha,a}u)_j(\alpha)| < \varepsilon, \qquad j = 1,2,\dots,q_2. \qquad (40)$$

Applying Theorems 3 and 4 to $GW_{\alpha,a}$ yields the desired result.
Remark 5. Let $N$ and $N^+$ denote $\{0,1,\dots\}$ and $\{1,2,\dots\}$, respectively. Let $S$ denote the metric space of all maps from $N$ (or $N^+$) to a compact set $E$ of $\mathbb{R}^p$, with the metric given by

$$\rho(s_a, s_b) = \sup_{t \in N} \|s_a(t) - s_b(t)\|, \qquad (41)$$

and let $R$ denote the collection of all $\mathbb{R}$-valued maps defined on $N$. In [11], under the assumption (denoted A.1) that $G: S \to R$ is causal, time-invariant and of approximately-finite memory, that $\sigma(x)$ is a continuous generalized sigmoidal function, and that $G(\cdot)(t): S \to \mathbb{R}$ is continuous for each $t \in N^+$, the following interesting theorem was obtained.
Theorem A (Sandberg [11]). For $a \in N^+$, let $T_a: S \to \mathbb{R}^{p(a+1)}$ be defined by

$$(T_a s)(t) = [s(t), s(t-1), \dots, s(t-a)]^{\mathrm{tr}}, \qquad s \in S, \qquad (42)$$

where $\mathrm{tr}$ denotes transpose. If condition A.1 above is satisfied, then for any $\varepsilon > 0$, there exist $m$ and $a \in N^+$, real numbers $c_1,\dots,c_m$, $\theta_1,\dots,\theta_m$, and real row vectors $y_1,\dots,y_m$ of order $p(a+1)$, such that

$$\Big|(Gs)(t) - \sum_{l=1}^{m} c_l\,\sigma[y_l(T_a s)(t) + \theta_l]\Big| < \varepsilon, \qquad t \in N, \qquad (43)$$

for all $s \in S$.

We point out that Theorem A is the discrete case of our Theorem 2. In fact, letting

$$(W_{t,a}s)(\tau) = \begin{cases} s(\tau) & \text{if } t-a \le \tau \le t, \\ 0 & \text{otherwise}, \end{cases} \qquad (44)$$

$G(W_{t,a}s)(t)$ can be viewed as a continuous functional defined on the compact set of vectors $[s(t),\dots,s(t-a)]^{\mathrm{tr}}$; hence Theorem A can be obtained from the approximately-finite-memory property and Theorem 2. That is, our result generalizes Sandberg's Theorem A. A graphical representation [11] of the approximation to $(Gs)(t)$ is shown in Fig. 1.

To illustrate the effectiveness of our theorems, we give some more examples.
Example 1: Suppose that the input $u(x)$ and output $s(x) = (Gu)(x)$ of a nonlinear system $G$ are subject to the differential equation

$$\frac{d}{dx}s(x) = g(s(x), u(x), x), \qquad s(a) = s_0, \qquad (45)$$

where $g(v,w,x)$ satisfies a Lipschitz condition with respect to the variables $v$ and $w$, i.e. there is a constant $c > 0$ such that

$$|g(v,w,x) - g(v',w,x)| \le c\,|v - v'|, \qquad (46)$$

$$|g(v,w,x) - g(v,w',x)| \le c\,|w - w'|. \qquad (47)$$

Moreover, we assume that the differential equation has a unique solution for any $u(x) \in C[a,b]$. Under these assumptions, we have

$$(Gu)(x) = s_0 + \int_a^x g((Gu)(t), u(t), t)\,dt. \qquad (48)$$

If we are given two inputs $u_1(x)$ and $u_2(x)$, then for any fixed $d \in [a,b]$ we have

$$|(Gu_1)(d) - (Gu_2)(d)| \le \int_a^d |g((Gu_1)(t), u_1(t), t) - g((Gu_2)(t), u_2(t), t)|\,dt \le c\int_a^d |(Gu_1)(t) - (Gu_2)(t)|\,dt + c\int_a^d |u_1(t) - u_2(t)|\,dt \le c\int_a^d |u_1(t) - u_2(t)|\,dt + c^2\int_a^d\Big(\int_a^v |u_1(t) - u_2(t)|\,dt\Big)e^{c(d-v)}\,dv, \qquad (49)$$

where the last inequality comes from the generalized Gronwall inequality. From inequality (49), we conclude that $(Gu)(d)$ is a continuous functional defined on $C[a,b]$. If the input set $U$ is a compact set in $C[a,b]$ (hence also in $C[a,d]$), then our Theorem 2 shows that the output of the nonlinear system $G$ at a specified time $d$ can be approximated by

$$\sum_{i=1}^{N} c_i\,\sigma\Big(\sum_{j=0}^{m}\xi_{ij}\,u(x_j) + \theta_i\Big), \qquad (50)$$

where $a = x_0 < x_1 < \dots < x_m = d$.
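As a numerical companion to this example (our own sketch; the right-hand side $g(s,u,x) = -s + u$, the input family, and all constants are hypothetical choices, and the outer weights are fitted by least squares rather than by the construction in the proofs), one can approximate the functional $u \mapsto (Gu)(d)$ in the form (50):

```python
import numpy as np

rng = np.random.default_rng(1)

# A system of the form (45) with g(s, u, x) = -s + u and s(0) = 0,
# integrated by forward Euler; the output is read at time d = 1.
a, d, K = 0.0, 1.0, 1000
t = np.linspace(a, d, K + 1)
dt = t[1] - t[0]

def G_at_d(u_vals):
    s = 0.0
    for k in range(K):
        s += dt * (-s + u_vals[k])   # Euler step for ds/dx = -s + u
    return s

# Compact input family u_{p,q}(x) = p*x + q*sin(pi*x), (p, q) in [-1, 1]^2.
params = rng.uniform(-1, 1, size=(300, 2))
m = 10
x = np.linspace(a, d, m + 1)         # sampling points x_j of (50)
samples = np.array([p * x + q * np.sin(np.pi * x) for p, q in params])
targets = np.array([G_at_d(p * t + q * np.sin(np.pi * t)) for p, q in params])

# Network of the form (50): random inner weights, c_i fitted by least squares.
N = 60
xi = rng.normal(size=(N, m + 1))
theta = rng.normal(size=N)
H = 1.0 / (1.0 + np.exp(-(samples @ xi.T + theta)))
c, *_ = np.linalg.lstsq(H, targets, rcond=None)
err = np.max(np.abs(H @ c - targets))
print(f"max approximation error at time d: {err:.2e}")
```

The network never sees the full input trajectory, only the $m+1$ samples $u(x_j)$, which is the content of Theorem 2 applied to this system.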
Example 2: For a Boolean function f : {−1, 1}^n → {−1, 1}, we define

U = { u : u(x) = u(j) + [u(j + 1) − u(j)](x − j), if j ≤ x ≤ j + 1;
      u(j) = 1 or −1, j = 1, …, n − 1; u(n) = 1 or −1 },   (51)

which is a compact set in C[1, n]. Moreover, there is a one-to-one correspondence between {−1, 1}^n and U, and every Boolean function b ∈ B corresponds to a (continuous) functional defined on U; thus b can be approximated by

Σ_{i=1}^{N} c_i σ(Σ_{j=1}^{n} ξ_{ij} u(j) + θ_i).   (52)
In this case, we can take N = 2^n neurons, with the weights of the i-th neuron being (ξ_{i1}, …, ξ_{in}) = (kε_{i1}, …, kε_{in}), where k is a positive real number depending on the threshold values, and (ε_{i1}, …, ε_{in}), i = 1, 2, …, 2^n, are the 2^n distinct elements of {−1, 1}^n. This means that we can use 2^n neurons to identify a Boolean function {−1, 1}^n → {−1, 1}, as illustrated in Fig. 2.
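A direct implementation of this 2^n-neuron construction can be sketched as follows (the particular gain k = 10 and the thresholds θ_i = −k(n − 1) are our own illustrative choices; the point is that ⟨ε_i, u⟩ = n exactly when u = ε_i and is at most n − 2 otherwise, so σ(k⟨ε_i, u⟩ + θ_i) is close to 1 only for u = ε_i):

```python
import numpy as np
from itertools import product

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

def boolean_net(f, n, k=10.0):
    """Build the 2^n-neuron network identifying the Boolean function f."""
    eps = np.array(list(product([-1, 1], repeat=n)), dtype=float)  # the 2^n patterns
    c = np.array([f(tuple(int(e) for e in row)) for row in eps], dtype=float)
    theta = -k * (n - 1)   # same threshold for every neuron
    def net(u):
        u = np.asarray(u, dtype=float)
        # neuron i fires (output near 1) only when u equals pattern eps_i
        return float(c @ sigma(k * (eps @ u) + theta))
    return net

# Example: 3-input parity on {-1, 1}^3
parity = lambda u: u[0] * u[1] * u[2]
net = boolean_net(parity, 3)
for u in product([-1, 1], repeat=3):
    assert np.sign(net(u)) == parity(u)
```

Each input u selects exactly one "winning" neuron, whose outer weight c_i = f(ε_i) then dominates the sum; the contributions of the remaining neurons are exponentially small in k.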
4 Conclusion In this paper, several strong approximation theorems concerning the representation capability of neural networks have been obtained. The "generalized sigmoidal" basis functions that exist in many (nonlinear) neural networks are used to approximate continuous functionals defined on the spaces C[a, b] or Lp[a, b] (1 < p < ∞). As discussed, many other functions, instead of sigmoidal functions, can also be used without loss of validity of the results in this paper. Some applications to dynamical systems have been reported, including showing that an earlier result in [11] can be obtained from our theorems. Applying the method used in this paper, we can also discuss approximations of continuous functionals in more general topological spaces, which will be reported later.
Acknowledgements. The authors wish to express their gratitude to the anonymous reviewers for their valuable comments and suggestions on revising this paper. They also wish to thank Prof. R.-W. Liu of the University of Notre Dame for bringing some of the papers in this area to their attention.
References
[1] A. Wieland and R. Leighton, "Geometric analysis of neural network capacity", IEEE First ICNN, Vol. 1, pp. 385-392 (1987).
[2] B. Irie and S. Miyake, "Capacity of three-layered perceptrons", IEEE ICNN, Vol. 1, pp. 641-648 (1988).
[3] G. Cybenko, "Approximation by superpositions of a sigmoidal function", Mathematics of Control, Signals and Systems, Vol. 2, No. 4, pp. 303-314 (1989).
[4] S. M. Carroll and B. W. Dickinson, "Construction of neural nets using the Radon transform", IJCNN Proc. I, pp. 607-611 (1989).
[5] K. Hornik, M. Stinchcombe and H. White, "Multi-layer feedforward networks are universal approximators", Neural Networks, Vol. 2, pp. 359-366 (1989).
[6] K. Hornik, "Approximation capabilities of multilayer feedforward networks", Neural Networks, Vol. 4, pp. 251-257 (1991).
[7] V. Y. Kreinovich, "Arbitrary nonlinearity is sufficient to represent all functions by neural networks: a theorem", Neural Networks, Vol. 4, pp. 381-383 (1991).
[8] Yoshifusa Ito, "Representation of functions by superpositions of a step or sigmoidal function and their applications to neural network theory", Neural Networks, Vol. 4, pp. 385-394 (1991).
[9] Tianping Chen, Hong Chen and Ruey-wen Liu, "A constructive proof of Cybenko's approximation theorem and its extensions", in Computing Science and Statistics (eds. LePage and Page), Proceedings of the 22nd Symposium on the Interface (East Lansing, Michigan, May 1990), pp. 163-168, Springer-Verlag, ISBN 0-387-97719-8.
[10] I. W. Sandberg, "Representation theory and nonlinear systems", Proc. Int. Conf. Integral Methods in Science and Engineering, Arlington, TX, May 1990.
[11] I. W. Sandberg, "Approximation theorems for discrete-time systems", IEEE Trans. on Circuits and Systems, Vol. 38, No. 5, pp. 564-566, May 1991.
[12] I. P. Natanson, "Theory of Functions of a Real Variable" (Teoria functsiy veshchestvennoy peremennoy), translated from the Russian by Edwin Hewitt, Rev. ed., New York, 1961.
[Figure 1 here. Input s(t) feeds a bank of time delays producing s(t), s(t − 1), …, s(t − a); sigmoidal units σ with weights η and offsets ρ1, …, ρm feed a final sum with coefficients c1, …, cN approximating (Gs)(t).]
Figure 1: Graphical representation of approximation to (Gs)(t)
[Figure 2 here. Inputs u(1), …, u(n) from {−1, 1}^n feed sigmoidal units σ with weights ξij and thresholds θi; a final sum with coefficients c1, …, cN produces the output in {−1, 1}.]
Figure 2: Identification of Boolean functions by a neural network