Reconstruction of chaotic dynamics by on-line EM algorithm
Shin Ishiiy
Masa-aki Sato z
y Nara Institute of Science and Technology 8916-5 Takayama-cho, Ikoma-shi, Nara 630-0101, Japan (TEL) (+81)-743-72-5980
(FAX) (+81)-743-72-5989
(E-mail)
[email protected] z Advanced Telecommunication Research Institute International 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan (TEL) (+81)-774-95-1039
(FAX) (+81)-774-95-1259
(E-mail)
[email protected] to appear in Neural Networks Abstract In this paper, we discuss the reconstruction of chaotic dynamics by using a normalized Gaussian network (NGnet), which is a network of local linear regression units. The NGnet is trained by an on-line EM algorithm in order to learn the vector eld of the chaotic dynamics. We investigate the robustness of our approach to two kinds of noise processes: system noise and observation noise. System noise disturbs the system dynamics itself. Observation noise is added when the state variables are observed. It is shown that the trained NGnet is able to reproduce a chaotic attractor, which approximates the complexity and instability of the original chaotic attractor well, even under various noise conditions. The trained NGnet also shows good prediction performance. When only part of the dynamical variables are observed, the delay embedding method is used. The NGnet is then trained to learn the vector eld in the delay coordinate space. Experiments show that the chaotic dynamics embedded in the delay coordinate space is able to be learned under the two types of noise.
1
Reconstruction of chaotic dynamics 1
2
Introduction
One of the main topics in the eld of nonlinear dynamical systems is estimating dynamical characteristics and future behaviors of an unknown dynamical system from its observation. Especially when the system is chaotic, such inverse problems are dicult. Long-term characteristics of dynamical systems can be estimated by invariant measure [4, 10], Lyapunov exponents [11], correlation dimension [12], unstable periodic orbits [3], and so on. These estimations have been done by mimicing the dynamical system using some approximation methods. Such methods include neural networks [17, 4], Markov chain approximation based on triangulations [10, 11], and piecewise-linear approximators based on triangulations [2, 3]. On the other hand, the estimation of future behaviors is called short-term prediction [8, 28, 5, 13, 1], because the prediction for a long future period is impossible due to the sensitive dependency of chaotic dynamics on the initial condition [14]. This paper deals with both of the long-term dynamical characteristics and the short-term prediction, by using a neural network model and a stochastic learning algorithm. We [21, 22] formerly showed that a recurrent neural network (RNN) could learn continuous time chaotic dynamics from observed data. In these studies, we emphasized that long-time trajectory generation has its own validity. That is, once the learning system acquires the dynamical characteristics of the target chaotic dynamics from the observed data, a long-time trajectory can be generated that goes around the chaotic attractor. Although the time sequence of this trajectory may be dierent from the observed data, the dynamical characteristics of the synthesized trajectory can well approximate those of the observed data. As an application, we showed the possibility of natural sound synthesis by an RNN trained by observed natural sound waveforms. The RNN was able to synthesize a waveform with naturally correlated uctuations which made the sound natural [20]. In this paper, a normalized Gaussian network (NGnet) [18] is employed in order to approximate the discretized vector eld of a chaotic dynamical system. We will show that this new approach is able to learn chaotic dynamics faster and better than an RNN trained by the \back-propagation through time" algorithm [19]. The NGnet is a network of local linear regression units. The model softly partitions the input space by using normalized Gaussian functions, and each local unit linearly approximates the output within its partition. The NGnet is trained by the on-line EM algorithm, which we previously proposed [23]. It has been shown that this on-line EM algorithm is faster than the gradient descent algorithm. Moreover, the on-line EM algorithm is faster and more accurate than the batch EM algorithm [31] especially for a large amount of data [23, 25]. Therefore, it is advantageous to use the online EM algorithm for learning chaotic dynamics because a large amount data is necessary for approximating the vector eld accurately. In the on-line EM algorithm, the positions of the local units can be adjusted according to the input and output data distribution. Unit creation and unit deletion are also performed according to the data distribution. Since a chaotic attractor is usually distributed in a lower dimensional manifold than the input space, the NGnet is suitable for approximating the vector eld in the neighborhood of the attractor. For the short-term prediction, a local polynomial approximation is often used [8]. For a given data point, this method rst searches for a number of nearest neighbor points in the observed data. The target function in the local region is then approximated by using a polynomial function whose parameters are determined using the nearest neighbor points. The estimated
3
Reconstruction of chaotic dynamics
polynomial function has dierent parameters for each observed data point and the number of parameters in this method is proportional to the amount of the observed data. A large amount of data requires much memory and computation time in the prediction process. The NGnet, on the other hand, requires only a moderate amount of parameters even for a large amount of data, because each unit softly partitions the input space according to the data distribution. Therefore, the prediction can be performed quickly in our method. In addition, the learning and the prediction can be done concurrently under real time operation by using the on-line EM algorithm. In real world problems, observed variables often include noise. In our approach, the discretized vector eld is estimated from discretized time derivatives of the observed data. It is well known that the estimation of time derivatives from discretized data is very sensitive to noise, since the derivative operation emphasizes noise components. Hence, we investigate the robustness of our approach to two kinds of noise processes. They are system noise and observation noise. System noise disturbs the system dynamics itself. Observation noise, in contrast, is added when the state variables are observed. In order to measure the performance of our method, the dynamical characteristics of chaotic attractors generated by the NGnet trained under various noise conditions are compared with those of the original chaotic attractor. The results show that our method is able to mimic the chaotic dynamics even under the noise conditions. The trained NGnet also shows good performance for the short-term prediction. In many situations, part of the dynamical variables are often observed. This situation was also investigated in our former study [22], where the RNN was trained to learn the chaotic dynamics with hidden variables. Although the trajectory of the observed variables was reproduced fairly well, it remained unclear whether the dynamics of the trained RNN was equivalent to the original dynamics because of coordinate transformation ambiguity [22]. In order to avoid such problems, we employ a delay coordinate embedding [29, 26] in this study. The NGnet is trained to learn the discretized vector eld in the delay coordinate space. It is shown that the chaotic dynamics is able to be learned with this method in the delay coordinate space under noise conditions. An alternative learning method, in which the NGnet is trained to learn the mapping function in discretized time, is also investigated in this study. The performances of this method are then compared with those of the discretized vector eld method. 2
NGnet and on-line EM algorithm
In this section, we review the on-line EM algorithm for the NGnet, which was proposed in our previous paper [23]. The NGnet [18], which transforms an N -dimensional input vector x to a D-dimensional output vector y, is de ned by ! X G (x) ~ x~ y = W (1a) P G (x) N 1 1 0 0 0 0 2 2 G (x) (2) j6 j exp 0 (x 0 ) 6 (x 0 ) ; (1b) 2 M
i=1
i
i M j =1
i
j
i
i
i
1
i
4
Reconstruction of chaotic dynamics
where M denotes the number of units and the prime (0) denotes a transpose. G (x) is an N dimensional Gaussian function, which has an N -dimensional center vector and an N -by-N covariance matrix 6 . j6 j is the determinant of 6 . W~ (W ; b ) is a D-by-(N + 1) linear regression matrix, where x~0 (x0; 1) and W~ x~ W x + b . The NGnet can be interpreted as a stochastic model, in which a pair of an input and an output, (x; y), is a stochastic event. For each event, a unit index i 2 f1; :::; M g is assumed to be selected, which is regarded as a hidden variable. The stochastic model is de ned by the probability distribution for a triplet (x; y; i), which is called a complete event: # " D+N 0 1 1 0 1 0 0 0 0 ~ P (x; y; ij) = (2) 2 j6 j 2 M exp 0 (x 0 ) 6 (x 0 ) 0 2 2 jy 0 W x~j : (2) ~ j i = 1; :::; M g is the set of model parameters. A pair (x; y) is called an f ; 6 ; ; W Rincomplete event. Since the expectation value of output y for a given input x, i.e., E [y jx] yP (yjx; )dy, is identical to equation (1), the probability distribution (2) provides a stochastic model for the NGnet. From a set of T incomplete events (data) (X; Y ) f(x(t); y(t)) j t = 1; :::; T g, the model parameter of the stochastic model (2) can be determined by the maximum likelihood estimation method. In particular, the EM algorithm [6] can be applied to a stochastic model with hidden variables. The EM algorithm repeats the following E- and M-steps. E (Expectation) step: Let be the present estimator. By using , the posterior probability that the i-th unit is selected for each datum (x(t); y(t)) is calculated by P (x(t); y(t); ij) : (3) P (ijx(t); y (t); ) = P P (x(t); y (t); j j) Using this posterior probability, the expected log-likelihood Q(j; X; Y ) for the complete events is de ned by XX Q(j; X; Y ) = P (ijx(t); y (t); ) log P (x(t); y (t); ij): (4) M (Maximization) step: Since an increase of the expected log-likelihood (4) implies an increase of the log-likelihood for the data set (X; Y ) [6], the expected log-likelihood is maximized with respect to . A solution of the necessity condition @Q=@ = 0 is given [31] by = hxi (T )=h1i (T ) (5a) (5b) 60 = [hxx0i (T )=h1i (T ) 0 (T )0 (T )]0 ~W = hyx~0i (T )[hx~x~0i (T )]0 (5c) h i 1 hjyj i (T ) 0 Tr W~ hx~y0i (T ) =h1i (T ); (5d) = D i
i
i
i
i
i
i
i
i
2
i
D
i
i
i
i
1
i
i
i
i
1
i
2
i
i
2
i
M j =1
T
M
t=1 i=1
i
i
i
i
1
i
i
i
i
i
2
2
i
i
i
i
i
i
1
1
i
where Tr(1) is the matrix trace. h1i denotes a weighted mean with respect to the posterior probability (3) and it is de ned by X hf (x; y)i (T ) 1 f (x(t); y(t))P (ijx(t); y(t); ): (6) i
T
i
T
t=1
5
Reconstruction of chaotic dynamics
The EM algorithm introduced above is based on batch learning [31], namely, the parameters are updated after seeing all of the data. We introduce here an on-line version [23] of the EM algorithm. Let (t) be the estimator after the t-th datum (x(t); y(t)). In this on-line EM algorithm, the weighted mean (6) is replaced by T T Y X
hhf (x; y)iii(T ) (T ) ( (T )
0 @
t=1 s=t+1
T T Y X
(
t=1 s=t+1
(s))f (x(t); y (t))P (ijx(t); y(t); (t 0 1))
(7a)
101
(s))A ;
(7b)
(s) 1 is assumed. Parameter (t) 2 [0; 1] is a discount factor, which is where Q introduced in order to forget the eect of an earlier inaccurate estimator. is a normalization coecient, and it works as an eective learning coecient. For the present datum (x(t); y(t)), the E-step calculates the posterior probability P (t) P (ijx(t); y(t); (t 0 1)) by using equation (3). The M-step calculates the weighted means, hh1ii (t), hhxii (t), hhjyj ii (t), and hhyx~0ii (t), by the following step-wise equation: hhf (x; y)ii (t) = hhf (x; y)ii (t 0 1) + (t) [f (x(t); y(t))P (t) 0 hhf (x; y)ii (t 0 1)] (8a) (t) = (1 + (t)=(t 0 1))0 : (8b) Using the weighted means, the new parameters are obtained by (t) = hhxii (t)=hh1ii (t) (9a) " # 0 ~ ~ (t 0 1)~x(t)~x (t)3 (t 0 1) 3~ (t) = 1 01(t) 3~ (t 0 1) 0 (1=P((tt))03 1) (9b) + P (t)~x0(t)3~ (t 0 1)~x(t) ~ (t) = W~ (t 0 1) + (t)P (t)(y(t) 0 W~ (t 0 1)~x(t))~x0(t)3~ (t) W (9c) h i 1 hhjyj ii (t) 0 Tr W~ (t)hhx~y0ii (t) =hh1ii (t); (9d) (t) = D where 3~ (t) [hhx~x~0ii ]0 . 60 (t) can be obtained from the following relation with 3~ (t). ! 60 (t0) 060 (t) (t) ~3 (t)hh1ii (t) = (10) 00 (t)6 (t) 1 + 0 (t)60 (t) (t) : Due to the introduction of the auxiliary variable 3~ (t), it is not necessary to calculate the inverse covariance matrix, which is needed in equations (5b) and (5c) in the batch EM algorithm. It can be proved [23] that this on-line EM algorithm is equivalent to the stochastic approximation method for nding the maximum likelihood estimator, if the eective learning coecient satis es the following conditions: 1 1 X X (t) < 1: (11) (t) = 1; T s=T +1
i
i
i
2
i
i
i
i
i
i
1
i
i
i
i
i
i
2
2
i
i
1
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
1
i
i
i
i
1
i
i
1
i
1
i
1
i
i
i
2
t=1
t=1
!1 1, condiFor example, when the discount factor (t) is scheduled to approach 1, i.e., (t) 0! tion (11) is satis ed and the on-line EM algorithm locally converges to a maximum likelihood estimator. An example for such scheduling is given by (t) = 1 0 (1 0 (t 0 1)); (12) t
6
Reconstruction of chaotic dynamics
where 0 1. In this paper, we use scheduling (12) with = 0:9993. We also employ dynamic unit manipulation mechanisms in order to eciently allocate the units [23]. The probability P (x(t); y(t) j (t 0 1)) indicates how probable the present model is expected to produce a datum (x(t); y(t)) with the present parameter (t 0 1). If this probability is less than a certain threshold value, a new unit is produced to account for datum (x(t); y(t)). In the production, = x(t) and W~ = (0; y(t)). The covariance matrix 6 is determined based on the distance from the neighboring units. On the other hand, the weighted mean hh1ii (t) indicates how much the i-th unit has been used to account for the data until t. If the mean becomes less than a certain threshold value, this unit is deleted. Our unit manipulation mechanisms above are based on a heuristic model selection framework . Although the threshold value for the unit production is determined by hand, the value does not signi cantly depend on the function to be approximated. The threshold value for the unit deletion is set at a small value in comparison to the reciprocal of the unit number . In order to deal with a singular input distribution, a regularization for 60 (t) is introduced as follows: i h (13a) 60 (t) = hhxx0ii (t) 0 (t)0 (t)hh1ii (t) + hh1 ii (t)I =hh1ii (t) 0 (13b) hh1 ii (t) = hhjxj ii (t) 0 j (t)j hh1ii (t) =N; where I is the N -by-N identity matrix and is a small regularization constant. The corresponding 3~ (t) can be calculated in an on-line manner using a similar equation to (9b). It should be noted that the regularized covariance (13) does not introduce any bias on the estimation of the linear regression matrix, (9c) [23]. We have shown that the on-line EM algorithm with an appropriate scheduling for the discount factor (t) is faster and more accurate than the batch EM algorithm, especially for a large amount of data [23, 25]. Therefore, it is advantageous to use the on-line EM algorithm for approximating the target function from a large amount data. In addition, the on-line EM algorithm is ecient for dynamic environments, i.e., the input and output distribution changes over time [23]. What our on-line EM algorithm is actually doing is the maximization of the discounted loglikelihood for the input-output joint probability [23, 25]. It is dierent from the usual function approximation in two points: (a) many function approximation methods do not care about the input distribution, while our method does; and (b) our method estimates the output variance parameter , which is not used in the function approximation (1). The estimation of the input distribution is useful especially when data appear in a localized region in the input space. This often occurs in approximation problems of dynamical systems, which is considered in this study. In addition, the estimation of the input distribution enables us to allocate the normalized Gaussian units eciently and dynamically. The output variance parameter corresponds to the residual error of the linear tting in the i-th local region. By incorporating this parameter in the estimation, our learning algorithm with the NGnet balances the approximation of the input distribution and the linear regression by the local units. Then, our learning method can obtain the optimal parameter for the function approximation, considering the noise level involved in the training data, the roughness new
new
new
i
1
2
i
i
2
i
1
i
i
2
i
i
i
i
i
2
2
i
1
1
i
N
i
i
N
i
2
i
2
i
1 Bayesian methods will provide a more theoretical model selection framework [24]. 2 We set the threshold at 1004 in experiments in this paper. Typical unit number is
100
1000.
7
Reconstruction of chaotic dynamics
of the target function, and so on. In addition, since the output variance corresponds to the error of the local regression with respect to the training data, the magnitude of noise in the training data can be estimated from it without any prior knowledge. Later we will show an experimental result on the noise level estimation. Furthermore, when of the i-th unit is too large in comparison to the other units, it is considered that the i-th linear regression is not good. The i-th unit can be split into two units in such a case [23], which will provides another unit manipulation mechanism. 2
i
3
Reconstruction from full observation
3.1 Lorenz system
As a chaotic dynamical system, we consider the Lorenz system [14], which is de ned by the following dierential equations: x_ = a(y 0 x) (14a) y_ = 0y + (b 0 z )x (14b) z_ = 0cz + xy; (14c) where x_ denotes the time derivative of the state variable x. The parameters are set at their usual values: a = 10:0; b = 28:0, and c = 8=3. Let X_ = F (X ) be the vector notation of the Lorenz equation (14), where X (x; y; z)0. An orbit of the Lorenz equation is obtained by the fourth order Runge-Kutta method with the integration time step t = 0:001. Figure 1 shows the phase portrait of the Lorenz attractor, when the sampling time step is t = 10 1 t = 0:01. In the following experiments, the sampling time step is xed at t . R
O
R
O
3.2 Discretized vector eld
We [21, 22] formerly showed that an RNN with sigmoidal units was able to generate a chaotic attractor, which had a similar topological structure to the Lorenz attractor, after learning of an orbit of the Lorenz system. In the present study, we employ the NGnet to approximate the discretized vector eld of the dynamical system and show that this new approach is able to learn the chaotic dynamics faster and better than the former method. We assume that the system variable X (t) is observed (sampled) at discrete time steps, i.e., t = n 1 t (n = 0; 1; :::). The discretized vector eld F~ (X ) for the sampling time step t is then de ned by F~ (X ) (8 (t ) 0 X )=t ; (15) where 8 (t ) is the solution of the dynamical equation 8_ (t) = F (8 (t)) under the initial condition 8 (0) = X . The discretized vector eld is dependent on the sampling time step t and is dierent from the original vector eld of the continuous-time system. If the sampling time is xed, however, the discretized vector eld contains all the information of the discretized dynamics whose evolution is described by (16) X ((n + 1) 1 t ) = X (n 1 t ) + t F~ (X (n 1 t )): O
O
X
X
O
O
O
X
X
X
O
O
O
O
O
8
Reconstruction of chaotic dynamics
Alternately, the discretized dynamics can be described by X ((n + 1) 1 t ) = K (X (n 1 t )) 8 1 O (t ); (17) where K is called the discrete mapping. In Section 5, we will compare these two methods. O
O
X (n t )
O
3.3 Learning vector eld
In order to train the NGnet so as to approximate the discretized vector eld (15), the input and the target output are the state vector and the discretized vector eld at that state vector, respectively. Namely, the input is X (n 1 t ) and the target output is (X ((n + 1) 1 t ) 0 X (n 1 t ))=t , for n = 0; 1; ::: When the trained system is running, the discretized vector eld at the current state X (n 1 t ) is approximated by using the trained NGnet. The next time-step state X ((n + 1) 1 t ) is then calculated from equation (16) by using the approximated discretized vector eld F~ . By continuing this process, a trajectory is reproduced. The above approach is dierent from our former approach using an RNN [21, 22]. The RNN approach tried to approximate the original vector eld of the continuous-time system even if the system was discretized. This could be done using the \back-propagation through time" algorithm [19] for the discretized dynamical system. In that approach, the quality for estimating the original vector eld degrades as the sampling time step enlarges. On the other hand, the new approach tries to approximate the discretized vector eld of the discretized system. Therefore, the estimation quality is less dependent on the sampling time step t . Accordingly, our current objective is to approximate the discretized dynamics of the unknown dynamical system rather than to estimate the original continuous-time dynamical system from the discrete sampling. The solid lines in Figures 4(a), 4(b) and 4(c) show the learning process. The abscissa denotes the amount of learned data, which are supplied one after another from an orbit of the Lorenz system. The ordinate in Figure 4(a) denotes the normalized mean squared error (nMSE) of the discretized vector eld. The nMSE is evaluated using 10,000 test data that are randomly sampled on the Lorenz attractor, and it is normalized by the variance of the true vector eld on the Lorenz attractor. The ordinate in Figure 4(b) or 4(c) denotes the log-likelihood discounted with the discount coecient (t) as in equation (7) or the averaged log-likelihood for the 10,000 test data above, respectively. Our on-line EM algorithm maximizes the discounted log-likelihood (Figure 4(b)) for the input-output joint probability [23, 25] and converges to a local maximum of the expected log-likelihood on the attractor. Figures 4(b) and 4(c) show that both the discounted log-likelihood for the training data and the averaged log-likelihood for the test data increase prosperously as the amount of training data increases. The averaged log-likelihood approximates the expected log-likelihood on the attractor. On the other hand, the on-line EM algorithm does not directly minimize the nMSE. This may be the reason for occasional up and down in nMSE learning curves (Fig.4(a)). After 100,000 data are learned, the number of units becomes 135 and the nMSE on the attractor becomes 0.015%. Figure 2 shows the phase portrait of the reproduced dynamics, whose discretized vector eld is given by the trained NGnet, after the 100,000 data have been learned. In our former study [22], 3,000,000 data points were needed to train the RNN with sigmoidal units in order to well approximate the Lorenz dynamics. Namely, weight parameters of the RNN O
O
O
O
O
O
O
9
Reconstruction of chaotic dynamics
were updated 30,000 times and 100 orbit points were presented to the RNN for each weight update. The characteristics of the Lorenz dynamics are better approximated in our new approach than in the RNN method as will be shown below. Therefore, our new approach is faster and more accurate than the former one. This is partly because the NGnet is a local model, and partly because the on-line EM algorithm is faster than the gradient descent algorithm used in the RNN. On the other hand, since the NGnet is a local model, it has a weak extrapolation ability in comparison with global models such as the RNN with sigmoidal units. In the outside region of the attractor, the NGnet does not approximate the vector eld well. The Lorenz attractor has a two-dimensional sheet-like structure consisting of a large number of closely packed sheets [14]. Its fractal dimension is slightly larger than two [12]. Therefore, the local covariance matrix, 6 , becomes nearly singular and the learning process becomes unstable, if the regularization method introduced in Section 2 is not employed. When the regularization method with = 0:01 is used, the largest, the second largest, and the smallest eigenvalues of the covariance matrices become 5:44(mean) 6 4:67(standard deviation), 0:91 6 0:92, and 0:06 6 0:26, respectively, for the 135 units after the learning of the 100,000 data. Our regularization method prevents the smallest eigenvalue from becoming zero. Figure 3 shows the receptive elds of the 135 units. In the gure, we can see that the 135 units cover the whole attractor. Here, the receptive eld of the i-th unit is depicted by an ellipse whose axes correspond to the two principal components of the local covariance matrix 6 . The ellipse center is set at the unit center . i
i
i
3.4 Robustness to noise
In this subsection, the robustness of our approach to noise is examined. We consider two kinds of noise processes: system noise and observation noise. System noise disturbs the system dynamics, namely, the system dynamics X_ = F (X ) is disturbed by the white Gaussian noise (t): X_ = F (X (t)) + (t): (18) The white Gaussian noise (t) satis es E [(t)(s)] = I 1 (t 0 s), where E [1] denotes the expectation value, I is a three-dimensional identity matrix, (1) is the Dirac's delta function, and is the system noise variance. The disturbed state variable X (t) becomes a stochastic process. By applying the Euler method, the stochastic dierential equation (18) can be discretized [15] with the integration time step t as follows: X ((n + 1) 1 t ) = X (n 1 t ) + t 1 F (X (n 1 t )) + t (n 1 t ): (19) The noise whiteness in the continuous system, E [ (t)(s)] = I 1 (t 0 s) becomes E[ (n 1 t ) (m 1 t )] = I 1 =t in the discretized system. Here, is the Kronecker's delta and P R! t ( =t ) 0! the coecient 1 =t guarantees the normalization of the delta function: 1 = R 1 = ds(t 0 s). Since the Euler method is not very precise for the integration of a chaotic dynamical system, we use the following method. The integration of the noiseless equation (14) is done by the Runge-Kutta method with the integration time step t . At each integration 2
S
2
S
R
R
R
R
R
R
R
2
R
R
2
S
S
nm
R
nm
R
m
R
R
nm
R
t
0
10
Reconstruction of chaotic dynamics
time step, the discretized system noise (n 1 t ) t (n 1 t ) is independently added to the system variable: X ((n + 1) 1 t ) = X (n 1 t ) + t 1 F (X (n 1 t )) + (n 1 t ); (20) where F (1) is the eective discretized vector eld obtained by the Runge-Kutta method for the noiseless equation (14) with the integration time step t . From the de nition, (n 1 t ) t 1 (n 1 t ), the discretized system noise satis es E [ (n 1 t ) (m 1 t )] = I 1 t , namely, the variance of the discretized system noise (n 1 t ) is given by t . This method (20) obtains an approximated solution of the stochastic dierential equation (18). On the other hand, when the state variable is observed, observation noise is added: Y (t) = X (t) + (t): (21) In experiments, Y (n 1 t )(n = 0; 1; :::) are supplied to the learning system. At each observation time step, observation noise is assumed to be independently generated according to the Gaussian distribution whose variance is given by (t =2). Then, the eective noise variances in the discretized vector eld for the sampling time step t , i.e., (Y ((n + 1) 1 t ) 0 Y (n 1 t ))=t , become ( =t ) for the system noise and ( =t ) for the observation noise. The magnitudes of these two types of noise q are given in proportion to the standard deviation of the state variable on the attractor, Var(X )=3 = 8:52, which is experimentally obtained from a lot of Lorenz orbits. For example, the variance of system noise is determined as t = t () . In the following experiments, we represent the noise level using the percentage of parameter . Figures 4(a), 4(b) and 4(c) show the learning processes, i.e., the nMSE in 4(a), the discounted log-likelihood for the training data in 4(b) and the average log-likelihood for the test data in 4(c), for various system noise levels: 0% (solid), 10% (dashed), 20% (dash-dotted), and 30% (dotted). No observation noise is added in these experiments. As the noise level becomes large, the nMSE on the attractor increases, the log-likelihood (LL) for both the training data and the test data becomes small, and the amount of data needed to approximate the dynamics becomes large. If a suciently large amount of data is supplied, our method is able to reproduce a chaotic attractor that is similar to the original Lorenz attractor, even when system noise increases up to 40% or observation noise increases up to 70%. In order to measure the similarity between the Lorenz attractor and the reconstructed chaotic attractor, the dynamical characteristics of the attractor are examined. Table I summarizes the characteristics of the attractors reconstructed by the NGnet after the learning of 2,000,000 data in various noise conditions. Twelve conditions are prepared: no noise, 10%, 20%, 30% and 40% system noise, and 10%, 20%, 30%, 40%, 50%, 60% and 70% observation noise. \NU" denotes the number of NGnet units after the learning. As the system/observation noise level increases, the NU becomes large, due to the variation of input Y (n 1 t ) and the target output (Y ((n + 1) 1 t ) 0 Y (n 1 t ))=t . Although the NU depends on the threshold values for the unit production and deletion mechanisms, the comparison in Table I is meaningful, because the threshold values are the same for all of the noise conditions. In general, the NU is likely to gradually enlarge as the number of training data increases. In each condition, the nMSE and the LL on the attractor are evaluated on a xed test data set consisting of 100,000 data points that are randomly sampled from the Lorenz attractor. S
R
R
R
R
R
R
RK
R
S
R
RK
R
R
R
S
S
S
R
R
S
2
R
2
S
S
R
R nm
R
O
O
O
O
2
O O
O
O
2
S
2
O
O
O
O
O
2
R
R S
2
O
O
O
O
Reconstruction of chaotic dynamics
11
The nMSE and the LL around the attractor are evaluated on a xed test data set consisting of 100,000 data points that are randomly sampled from Lorenz trajectories with 20% system noise. The purpose of the latter condition is to see how well the NGnet approximates the vector eld in the neighborhood of the attractor. In general, the nMSE and the LL on the attractor becomes worse as the noise level becomes large. When system noise is added, on the other hand, the nMSE around the attractor becomes smaller than that in the noiseless case. When moderate system/observation noise is added, the LL around the attractor is also improved. Although our on-line EM algorithm maximizes the discounted log-likelihood, the accuracy of the dynamics reproduced by the NGnet depends on the approximated discretized vector eld. In the following experiments, therefore, we often use the nMSE to evaluate the approximation ability for the discretized vector eld. It should be noted that the absolute value of the nMSE and the LL cannot be compared. From the correlation dimension (CD) [12] and the maximum Lyapunov exponent (MLE) [7, 27] in Table I, we can see that the chaotic attractor reproduced by the NGnet well approximates the complexity and the instability of the original Lorenz attractor for noise levels below 40% system noise or 30% observation noise. In our former study [21, 22], the MLE value in the noiseless case was 0.85, which was worse than the present results. When the observation noise is 40% or larger, the CD becomes smaller than the true value and the MLE becomes larger than the true value. Namely, the detailed fractal structure of the chaotic attractor is smeared by the large noise, so that the complexity of the attractor measured by the CD decreases. The increase of the MLE implies that the large noise increases the instability of the obtained dynamics. When observation noise is larger than 75% or system noise is larger than 45%, the orbits produced by the NGnet often stop or go into limit cycles after long chaotic transients. The above results can be interpreted as follows. System noise disturbs the system trajectory and it helps the learning system explore the states around the attractor. Therefore, the data with the system noise contain information about the vector eld around the attractor, although the information includes the noise. When noiseless data are provided, there is no information on the vector eld outside the attractor, but accurate information on the vector eld is available on the attractor. Observation noise also disturbs the observed trajectory. However, the dynamics is not disturbed by the observation noise, so the data with the observation noise contain no information on the vector eld outside the attractor. These facts may be a part of the reason why system noise improves the nMSE around the attractor and observation noise does not. There may be other reasons for the good performance of our method under noise conditions. In the NGnet, the parameters of each unit are determined by the weighted mean (7) in the neighborhood of the unit center. Because of the ergodicity of the chaotic attractor, much data can be found in the neighborhood of any point on the attractor if one observes the system long enough. Therefore, the noise eects can be smoothed by the averaging process. We can also consider a situation where both of the two noise processes occur. Figure 5 shows the reproduced chaotic attractor after learning of 3,000,000 data with both 30% system noise and 30% observation noise. In this case, the nMSE on the attractor becomes 0:178%, which is 12 times as large as that of the noiseless case in Figure 2. Nevertheless, the reproduced chaotic attractor looks similar to the Lorenz attractor, and has similar characteristics, too. Table I contains the characteristics of the reproduced attractor. In the experiments above, the regularization parameter in equation (13a) is xed at 0.01 for every noise condition. In general, the regularization becomes less important in the presence
12
Reconstruction of chaotic dynamics
of large noise, because the noise enlarges the distributed region of the input data. One of the advantages of our method is that the noise level included in the observed data can be estimated without prior knowledge. The output variance parameter determined by (9d) estimates the mean squared error of the linear tting in the i-th local region. Therefore, the global noise level can be estimated from the average of . The estimated noise levels normalized by the vector eld variance in various noise conditions are given in Table II. The results show that the noise levels are successfully estimated by our method. The prediction ability of the trained NGnet in the above experiments is also examined. If one were to need future predictions for only small time steps, a much smaller amount of data would be needed. We here examine the predictive performance of the NGnet trained by a large amount of data in the presence of the two types of noise. Figure 6(a) shows a typical example of a time-series for x(t) predicted by the NGnet, which is trained with no noise, 10% system noise, and 10% observation noise. The prediction is accurate until about t = 5:2 in the noiseless and the 10% system noise cases. In the 10% observation noise case, the prediction starts to deviate at t = 3:5 but continues to follow the true orbit until about t = 5. Figure 6(b) shows the average prediction error. Each line denotes the squared error between the three dimensional state variable predicted by the NGnet and the true Lorenz orbit at each time. The error is averaged over trials starting from 100 dierent initial states and is normalized by the variance of the state variable, . The prediction is accurate until about t = 4:2 on average in the noiseless and the 10% system noise cases. In the 10% observation noise case, the prediction starts to deviate at t = 3:5 but continues to follow the true orbit until about t = 4:2 on average. Even with the perfect model, the prediction error would grow as exp(MLE 1 t) on average due to the instability of the chaotic dynamics. The above prediction times are fairly long compared to the characteristic time, 1=MLE = 1:04, which is determined by the MLE of the Lorenz attractor. One should note that t = 1 corresponds to 100 time steps in these experiments. These experiments show our approach based on the on-line EM algorithm is robust to system and observation noises. The chaotic attractor reproduced by our method well approximates the dynamical characteristics of the original Lorenz attractor even in the presence of the two types of noise. Prediction by the trained NGnet is also good under the two types of noise. The experiments also show that the system noise does not degrade the prediction ability but the observation noise degrades it slightly. In addition, we have found that the system noise can help the learning of the vector eld around the attractor, but can be harmful to the learning when it is too large. 2
i
2
i
2
4
Reconstruction from partial observation
In real problems, part of dynamical variables are often observed. In this section, it is assumed that one of the dynamical variables, e.g., x(t) of the Lorenz system (14), is observed. This situation was also investigated in our former study [22], where the RNN was trained in order to learn the Lorenz dynamics with hidden variables. Although the trajectory of the observed variables was reproduced fairly well, it remained unclear whether the dynamics of the trained RNN was equivalent to the original dynamics, because of coordinate transformation ambiguity
13
Reconstruction of chaotic dynamics
[22]. In order to avoid this problem, we employ a delay coordinate embedding in the present study. 4.1 Time delay embedding
The delay coordinate for variable x(t) is de ned by Z (t) (x(t); x(t 0 ); x(t 0 2 ); :::; x(t 0 (d 0 1) ))0 ; (22) where d and are called the reconstruction dimension and the lag time, respectively [29, 26]. In experiments, we set d = 3 and = n 1 t = 0:15 (, i.e., n = 15), and x(n 1 t ) (n = 0; 1; :::) are provided to the learning system. From this trajectory, the delay coordinate trajectory (22) is obtained. Figure 7 shows the phase portrait of Z (t), where a chaotic attractor can be seen. The lag time has been determined to be the rst minimum of the mutual information between x(t) and x(t0 ) [9]. This prescription guarantees that x(t) and x(t0 ) are independent enough of each other to be useful as coordinates in a time delay vector but not so independent as to have no connection with each other at all. The embedding theorem [29, 26] states that, if the reconstruction dimension d is larger than 2s, where s is the attractor's dimension, possibly fractal, the attractor can be embedded in the delay coordinate, namely, there is a smooth vector eld for the delay coordinate: Z_ = H (Z ); (23) and the vector eld H preserves the topological structure of the original vector eld F [26]. Since the fractal dimension of the Lorenz attractor is slightly larger than two [12], one can safely embed the Lorenz attractor with d 5. However, this is not a necessary dimension for the embedding. With the lag time = 0:15, the Lorenz attractor can be embedded in the delay coordinate for d 3. This can be con rmed by checking the false neighbors in the reconstruction space [16]. When distant points in the original dynamical space are projected to neighboring points in the reconstruction space, these points are called false neighbors. The absence of false neighbors implies that the reconstructed attractor is completely unfolded and is safely embedded in the reconstruction space. If neighbor points in the d-dimensional reconstruction space become far apart in the (d + 1)-dimensional reconstruction space, they are regarded as false neighbors. It has been experimentally con rmed that there are no false neighbors for d 3 with the lag time = 0:15. Therefore, d = 3 is sucient for the embedding of the Lorenz attractor. There are several approaches for making future prediction using the delay coordinate and each method has its own advantages and disadvantages. Conventionally, the delay coordinate is used to predict x(t + ) [8, 28], namely, some function approximator is trained in order to learn mapping from the delay coordinate Z (t) to the future value for the observed variable x(t + ). In this approach, the prediction is done for every time period and only a stroboscopic trajectory fx(t + ); x(t + 2 ); :::g is produced. In order to predict a full trajectory fx(t + t ); x(t + 2t ); :::g, one can use x(t + t ) as the target value for the predictor [30]. In this approach, the initial state should be speci ed by the d (= n (d 0 1) + 1) dimensional vector, fx(0); x(t ); :::; x((d 0 1) )g. This implies that the eective embedding dimension becomes d ( d). If the observation involves noise, the initial state is disturbed by the d -dimensional noise, which may introduce an excessive amount of noise for prediction.
O
O
O
O
O
R
O
R
R
14
Reconstruction of chaotic dynamics
In our approach, the NGnet is trained to learn the discretized vector eld corresponding to the original vector eld H (Z ) in the delay coordinate space. Then, the full trajectory fx(0); x(t ); x(2t ); :::g can be obtained from the d-dimensional initial state Z (0). The embedded vector eld H (Z ) has a global constraint: H (Z (t + )) = H (Z (t)) (k = 1; :::; d 0 1) (24) along the system trajectory. Here, H (1) denotes the k-th element of the vector eld H . In order to impose this constraint explicitly on the vector eld in the learning process, the backpropagation through time algorithm [19] is required, and the on-line EM algorithm cannot be used. Therefore, we discard this constraint in the learning process. Since the noiseless training data satisfy this constraint, one can expect that the trained NGnet approximately satis es this constraint if the NGnet well approximates the observed data. By employing the on-line EM algorithm, the nMSE of the discretized vector eld on the attractor rapidly decreases, as the amount of learned data increases. After learning of 100,000 data, the nMSE becomes 0:016%. Figure 8 shows the phase portrait produced by the trained NGnet. One can see the similarity between Figures 7 and 8. When the trained system is running, the discretized vector eld at the current state Z (n1t ) is approximated by using the trained NGnet. The next time-step value Z ((n + 1) 1 t ) is then calculated by using the approximated discretized vector eld. Although constraint (24) is ignored in this process, we expect that the trained NGnet approximately satis es the constraint. Actually, the correlation coecients of Z (t + ) vs. Z (t), and Z (t + ) vs. Z (t), on the attractor reproduced by the NGnet after the learning of the 100,000 data, are found to be almost equal to 1. O
O
k+1
k
k
O
O
2
4.2 Robustness to noise
1
1
0
In order to consider noise processes in a partial observation case, we assume that system noise disturbs all of the dynamical variables, X (t), and observation noise disturbs the observed variable x(t). Further details of the noise processes are the same as in the full observation case discussed in Section 3.3. In the partial observation case, the two types of noise introduce several diculties to our learning scheme. In the full observation case, system noise helps the learning system explore states around the attractor. In addition, the eect of system noise and observation noise in the discretized vector eld at one observation time is independent of that at another time. Then, the learning of a lot of data from the attractor is expected to work as temporal smoothing for the discretized vector eld so as to remove the two types of noise. In the partial observation case, on the other hand, the noise eect in the embedded vector eld is dependent on that at another time. Namely, it introduces temporally correlated noise. Therefore, the temporal smoothing does not work well, so the learning scheme becomes more sensitive to noise than in the full observation case. Moreover, a delay coordinate trajectory with two types of noise does not satisfy the global constraint (24). Even with these diculties, the NGnet is able to learn the chaotic dynamics in the delay embedding space. If system noise is less than 20% or observation noise is less than 25%, the chaotic attractor can be reproduced by the NGnet after learning of 2,000,000 data. If the noise level becomes larger than these values, the reproduced trajectory converges to a limit-cycle after
15
Reconstruction of chaotic dynamics
a long chaotic transient whose topological structure is similar to that of the Lorenz attractor. Table III summarizes the characteristics of the reproduced chaotic attractors under eight noise conditions. As the noise level increases, the CD becomes smaller than the true value and the MLE becomes larger than the true value. Figure 9 shows the phase portrait of the chaotic attractor produced by the NGnet after learning of 2,000,000 data which include 15% system noise and 15% observation noise. Its complexity seems to be slightly smaller than that of Figure 7. The prediction ability of the trained NGnet using the delay embedding is also examined. Figure 10(a) shows a typical example of a time-series for x(t) predicted by the NGnet, which is trained with no noise, 10% system noise, and 10% observation noise. The prediction is accurate until about t = 5:5 in the noiseless and the 10% system noise cases. In the 10% observation noise case, the prediction starts to deviate at t = 1:5 but continues to follow the true value until about t = 5:5. Figure 10(b) shows the average prediction error. Each line denotes the squared error between the state variables predicted by the NGnet and the true state variables in the delay coordinate. The error is averaged over 100 dierent initial conditions and is normalized by the variance of the state variables in the delay coordinate. The prediction starts to deviate at about t = 1 but closely follows the true value until about t = 3 on average, in all cases. The prediction still follows the true value until about t = 4:5 in the noiseless case, t = 4 in the 10% system noise case, and t = 3:8 in the 10% observation noise case. These prediction times are fairly long compared to the characteristic time, 1=MLE = 1:04, but they are shorter than those in the full observation case. 5
Learning discrete mapping
In this section, we discuss an alternative learning method. The NGnet is trained in order to approximate the discrete mapping for the sampling time step t , i.e., X ((n + 1) 1 t ) = K (X (n 1 t )) (n = 0; 1; 2; :::) [1]. Mathematically, this method is equivalent to the previous method that approximates the discretized vector eld. However, the two methods treat noise eects dierently in numerical calculations. As explained in Section 3.3, the noise variance for the discretized vector eld is ( =t ) for the observation noise. On the other hand, the noise variance for the discrete mapping K is (t 1 =2), which is much smaller than that of the discretized vector eld for small t . The diculty of the discrete map method is that map K is very close to the identity map for small t . Thus, the task of the NGnet is to estimate the small deviation from the identity map which is of order t . Table IV summarizes the characteristics of the reproduced attractor by the NGnet after learning of 2,000,000 data under various noise conditions. Twelve conditions are prepared: no noise, 10%, 20%, 30%, 40% and 50% system noise, and 10%, 20%, 30%, 40%, 50% and 60% observation noise. Although the NGnet approximates the state point at the next observation time, \nMSE" in the table denotes the nMSE of the discretized vector eld like in Table I. Thus, Table IV can be compared easily with the results in Table I. The performances for small noise up to the 30% level are comparable with those of the discretized vector eld (VF) method. The nMSE for the system noise case in the map method is slightly better than that of the VF method. However, the nMSE for the observation noise O
O
2
O
O
O
2
O
O
O
O
O
16
Reconstruction of chaotic dynamics
case in the map method is slightly worse than that of the VF method. The NGnet trained with noiseless data in the map method poorly approximates the true mapping function around the attractor as can be seen in Table IV, and trajectories generated by this NGnet often stop after long chaotic transients. This result may come from the fact that the NGnet in the map method tends to approximate an identity map. The CD and the MLE in Table IV for the noiseless case are calculated from data in the chaotic transient period. There are remarkable dierences for large noise cases. When observation noise increases more than 40%, the MLE in the map method decreases, while that in the VF method increases. In addition, as the noise level increases, the number of units decreases in the map method but increases in the VF method. This can be interpreted as follows. In the map method, the function approximated by each local unit is close to the identity function as stated above. Consequently, the approximated function becomes simpler than in the VF method, so that the partition in charge of each local unit is likely to be larger than that in the VF method. This tendency becomes more prominent when the noise becomes large, because the data are distributed in an enlarged region when there is more noise. This enlargement of the receptive eld acts as the spatial smoothing for the mapping function such that it leads to a decrease in the complexity and instability of the chaotic attractor. On the other hand, the discretized vector eld looks random under the large noise. As a result, the estimated vector eld by the NGnet becomes complex and the number of units increases. Table V shows the average volume of receptive elds and the average output variance for various noise conditions. In order to obtain the average volume of receptive elds, we calculate the weighted mean for the determinant of the input covariance 6 . The value is normalized by . Each \RF" value in Table V denotes the percentage of the cubic-root of the normalized value. Therefore, the table entry indicates the relative length of the receptive elds in comparison to the input standard deviation, . The de nition of the average output deviation is the same as in Table II. From Table V, we can see that the receptive eld is larger in the map method than in the VF method, especially when the noise level is large. On the other hand, the average output variance is much smaller in the map method than in the VF method. This implies that a larger region can be approximated by a linear map in the map method than in the VF method so that the receptive elds become larger in the map method than in the VF method. It should be noted that the comparison of the unit number between the VF method and the map method is meaningful. Although the unit number depends on the threshold values for the unit production and deletion mechanisms, those values are the same between the two methods. Each threshold value corresponds to the probability which does not directly depend on the local linearity of the input-output relationship nor the eective noise involved in the training data. The map method is also applied to the partial observation case. Map K (Z ) satis es the same constraint as (24): K (Z (t + )) = K (Z (t)) (k = 1; :::; d 0 1): (25) Table VI summarizes the characteristics of the attractors reproduced by the NGnet after learning of 2,000,000 data in various noise conditions. Comparing Tables III and VI, the performances in the map method seem to be slightly worse than those in the VF method. As in the i
3
k+1
k
Reconstruction of chaotic dynamics
17
full observation case, the number of units decreases in the map method and increases in the VF method when the noise level increases. 6
Conclusion
In this paper, we have discussed the reconstruction of chaotic dynamics by using the NGnet. The NGnet is trained by the on-line EM algorithm. Although the chaotic dynamics is unstable on its attractor, the discretized vector eld of the chaotic dynamics can be well approximated by the NGnet. By investigating the characteristics of the reproduced attractors, we have found that our method well approximates the complexity and instability of the original chaotic attractor. Furthermore, our scheme is shown to be fairly robust to two types of noise processes, i.e., system and observation noises. The trained NGnet also shows good prediction performance. When only part of the dynamical variables are observed, the delay embedding method is used. The NGnet is trained in order to learn the discretized vector eld in the delay coordinate space. It has been shown that the chaotic dynamics is able to be learned with this method in the delay embedding space under the two types of noise, although the performances are lower than those in the full observation case. We have recently proposed a new embedding method [32] that is more robust to noise than the delay embedding method.
18
Reconstruction of chaotic dynamics Table I
Characteristics of chaotic attractors produced by NGnet after learning of 2,000,000 data under various noise conditions. \NU" denotes the number of units after the learning, whose upper limit is set at 1000. \nMSE" and \LL" denote the normalized mean squared error of the discretized vector eld in percentage and the averaged log-likelihood, respectively. The \on" and \around" denote the nMSE or the LL evaluated on the attractor and around the attractor, respectively. The test data set is the same in all of the noise conditions. The \on" test data set consists of 100,000 data point that are randomly sampled from noiseless Lorenz trajectories, and the \around" test data set consists of 100,000 data point that are randomly sampled from Lorenz trajectories involving 20% system noise. \CD" denotes the correlation dimension averaged over four sets of 50,000 points taken from the reproduced attractor. Its standard deviation is calculated from the deviation among the four sets and the linear tting error for each data set. \MLE" denotes the maximum Lyapunov exponent, which is calculated along a single trajectory consisting of 10,000 points. The Jacobian matrix for a trajectory point is estimated by a secondorder polynomial approximation using 32 data points in the neighborhood of the current data. The Gram-Schmidt orthogonalization is used for the MLE calculation. Its standard deviation is calculated for a period between t = 80 and t = 100. NU nMSE LL CD MLE on around on around True Lorenz 2:054 6 0:006 0:958 6 0:014 Noiseless 183 0.004 0.400 4:04 09:87 2:053 6 0:008 0:940 6 0:028 10% 80 0.007 0.016 05:17 05:81 2:049 6 0:007 0:998 6 0:005 Sys. noise 20% 148 0.010 0.018 06:90 07:23 2:045 6 0:007 0:994 6 0:013 30% 322 0.046 0.048 07:63 07:80 2:025 6 0:011 1:172 6 0:036 40% 569 0.059 0.078 08:14 08:25 1:862 6 0:034 0:966 6 0:029 10% 114 0.024 1.258 05:48 07:60 2:040 6 0:013 1:004 6 0:002 20% 224 0.054 2.539 06:90 08:43 2:027 6 0:015 1:073 6 0:015 30% 391 0.154 3.161 07:69 08:74 1:998 6 0:010 1:659 6 0:043 Obs. noise 40% 606 0.291 3.543 08:28 09:01 1:857 6 0:041 4:181 6 0:187 50% 885 0.421 3.871 08:71 09:26 1:586 6 0:084 4:688 6 0:312 60% 1000 0.385 3.866 09:24 09:66 1:892 6 0:029 3:629 6 0:244 70% 1000 0.530 4.123 09:83 010:18 1:666 6 0:041 2:589 6 0:189 Sys. and Obs. 30% 917 0.178 0.365 08:23 08:41 1:967 6 0:033 1:202 6 0:038
19
Reconstruction of chaotic dynamics Table II
Actual noise levels for the vector eld and the noise levels estimated by the trained NGnet. InP order to calculate the latter value, we obtain the weighted mean of , i.e., hh1ii . The weighted mean is normalized by the vector eld variance, . The table entry denotes the percentage of the square-root of the normalized weighted mean. The noise conditions are the same as those in Table I. Actual (%) Estimated (%) Noiseless 0.0 0.83 10% 13.4 8.9 Sys. noise 20% 26.8 17.9 30% 40.2 27.1 40% 53.6 36.5 10% 13.4 11.8 20% 26.8 22.4 30% 40.2 33.2 Obs. noise 40% 53.6 44.7 50% 67.0 56.6 60% 80.4 64.9 70% 93.7 74.1 2
i
2
M i=1
2
i
i
20
Reconstruction of chaotic dynamics Table III
Characteristics of chaotic attractors in the delay coordinate space reproduced by NGnet after learning of 2,000,000 data under eight noise conditions. De nitions of table entries are the same as those in Table I. The number of units is limited to 1000. NU nMSE LL CD MLE on around on around True embedded 2:021 6 0:007 0:976 6 0:012 Noiseless 387 0.003 2.117 7.67 -16.95 1:923 6 0:063 0:882 6 0:001 5% 292 0.006 0.118 -0.28 -1.45 1:972 6 0:020 0:968 6 0:004 Sys. noise 10% 450 0.020 0.115 -2.38 -2.80 1:940 6 0:024 1:032 6 0:004 15% 813 0.075 0.173 -3.39 -3.60 1:865 6 0:034 1:480 6 0:066 5% 290 0.025 0.733 -0.89 3.43 1:857 6 0:047 2:821 6 0:199 Obs. noise 10% 504 0.060 1.022 -2.53 4.08 1:729 6 0:050 4:208 6 0:132 15% 943 0.146 1.458 -3.36 -4.33 1:732 6 0:040 4:250 6 0:272 20% 1000 0.120 1.584 -4.28 -4.94 1:603 6 0:044 3:997 6 0:200
21
Reconstruction of chaotic dynamics Table IV
Characteristics of chaotic attractors reproduced by NGnet after learning of 2,000,000 data under various noise conditions. The target output for NGnet is state value at the next observation time. De nitions of table entries are the same as those in Table I. z The number of learned data is 3,000,000, because reproduced trajectory after learning 2,000,000 data often goes into a limit-cycle after a long chaotic transient. NU nMSE LL CD MLE on around on around Noiseless 181 0.055 450.4 16.70 -6.20 2:039 6 0:020 1:039 6 0:002 10% 138 0.005 0.012 9.27 8.53 2:048 6 0:010 0:968 6 0:006 20% 118 0.007 0.010 6.71 6.39 2:052 6 0:006 0:895 6 0:011 Sys. noise 30% 88 0.014 0.016 4.86 4.70 2:047 6 0:010 0:914 6 0:004 40% z 78 0.016 0.019 3.59 3.53 2:045 6 0:007 0:909 6 0:002 50% 65 0.036 0.039 2.48 2.46 2:042 6 0:011 0:890 6 0:018 10% 139 0.036 2.111 7.45 5.55 2:034 6 0:011 1:177 6 0:010 20% 96 0.117 2.766 4.70 3.83 2:017 6 0:011 1:293 6 0:048 Obs. noise 30% 84 0.232 3.172 3.18 2.65 1:984 6 0:016 1:094 6 0:022 40% 82 0.412 3.525 2.10 1.73 1:942 6 0:013 0:765 6 0:020 50% 86 0.657 3.874 1.33 1.05 1:878 6 0:038 0:716 6 0:020 60% 76 1.059 4.439 0.52 0.29 1:564 6 0:052 0:855 6 0:011
22
Reconstruction of chaotic dynamics Table V
Average volume of receptive elds (\RF") and average output deviation (\Output dev.") in the vector eld method (VF) and the map method (Map). In order to obtain the average volume of receptive elds, we calculate the weighted mean for the determinant of the input covariance 6 . The value is normalized by . Each table entry of RF denotes the percentage of the cubic-root of the normalized value. The de nition of the average output deviation is the same as in Table II. VF Map RF (%) Output dev. (%) RF (%) Output dev. (%) Noiseless 3.9 0.83 5.1 0.076 10% 11.4 8.9 9.2 0.67 20% 13.9 17.9 15.3 1.34 Sys. noise 30% 15.8 27.1 23.5 2.01 40% 16.3 36.5 31.8 2.67 50% 40.7 3.35 10% 8.1 11.8 9.8 1.20 20% 9.0 22.4 15.6 2.37 30% 9.5 33.2 19.9 3.49 Obs. noise 40% 10.0 44.7 25.2 4.61 50% 9.5 56.6 29.3 5.71 60% 9.8 64.9 35.1 6.79 70% 10.3 74.1 3
i
23
Reconstruction of chaotic dynamics Table VI
Characteristics of chaotic attractors reproduced by NGnet after learning of 2,000,000 data under eight noise conditions. The target output for NGnet is the next point in the delay coordinate. De nitions of table entries are the same as those in Table I. NU nMSE LL CD MLE on around on around Noiseless 183 0.131 18.79 18.07 8.19 1:838 6 0:025 2:445 6 0:159 5% 203 0.009 0.102 12.73 12.00 1:927 6 0:040 0:960 6 0:002 Sys. noise 10% 142 0.017 0.107 9.98 9.69 1:685 6 0:049 0:834 6 0:016 15% 126 0.030 0.122 8.42 8.24 1:268 6 0:066 0:863 6 0:007 20% 114 0.048 0.138 7.23 7.12 1:605 6 0:038 0:599 6 0:029 5% 139 0.038 0.331 11.34 10.36 1:750 6 0:067 2:068 6 0:095 Obs. noise 10% 129 0.093 0.660 9.19 8.53 1:791 6 0:027 1:874 6 0:049 15% 120 0.260 1.158 7.82 7.33 1:657 6 0:067 2:712 6 0:124
Reconstruction of chaotic dynamics
24
References
[1] Abarbanel, H. D. I, Brown, R., Sidorowich, J. J., & Tsimring, L. Sh. (1993). The analysis of observed chaotic data in physical systems. Reviews of Modern Physics, 65, 1331-1392. [2] Allie, S., Mees, A., Judd, K., & Watson, D. (1997). Reconstructing noisy dynamical systems by triangulations. Physical Review E, 55, 87-93. [3] Allie, S., & Mees, A. (1997). Finding periodic points from short time series. Physical Review E, 56, 346-350. [4] Casdagli, M. (1989). Nonlinear prediction of chaotic time series. Physica D, 35, 335-356. [5] Casdagli, M., Eubank, S., Farmer, J. D., & Gibson, J. (1991). State space reconstruction in the presence of noise, Physica D, 51, 52-98. [6] Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of Royal Statistical Society B, 39, 1-22. [7] Eckmann, J.-P., Kamphorst, S. O., Ruelle, D., & Ciliberto, S. (1986). Liapunov exponents from time series. Physical Review A, 34, 4971-4979. [8] Farmer, J. D., & Sidorowich, J. J. (1987). Predicting chaotic time series. Physical Review Letters, 59, 845-848. [9] Fraser, A. M., & Swinney, H. L. (1986). Independent coordinates for strange attractors from mutual information. Physical Review A, 33, 1134-1140. [10] Froyland, G., Judd, K., Mees, A. I., & Watson, D. (1995). Constructing invariant measures from data. International Journal of Bifurcation and Chaos, 5, 1181-1192. [11] Froyland, G., Judd, K., & Mees, A. I. (1995). Estimating of Lyapunov exponents of dynamical systems using a spatial average. Physical Review E, 51, 2844-2855. [12] Grassberger, P., & Procaccia, I. (1983). Characterization of strange attractors. Physical Review Letters, 50, 346-349. [13] Grassberger, P., Schreiber, T., & Scharath, C. (1991). Nonlinear time sequence analysis. International Journal of Bifurcation and Chaos, 1, 521-547. [14] Guckenheimer, J., & Holmes, P. (1983). Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields, New York: Springer-Verlag. [15] Ito, K. (1975). Stochatic dierentials. Applied Mathematical Optimization, 1, 347-381. [16] Kennel, M. B., Brown, R., & Abarbanel, H. D. I. (1992). Determining minimum embedding dimension using a geometrical construction. Physical Review A, 45, 3403-3411. [17] Lapedes, A., & Farber, R. (1987). Nonlinear signal processing using neural networks: prediction and system modelling. Technical Report LA-UR-87, 2662, Los Alamos: Los Alamos National Laboratory.
Reconstruction of chaotic dynamics
25
[18] Moody, J., & Darken, C. J. (1989). Fast learning in networks of locally-tuned processing units. Neural Computation, 1, 281-294. [19] Sato, M. (1990). A learning algorithm to teach spatiotemporal patterns to recurrent neural networks, Biological Cybernetics, 62, 259-263. [20] Sato, M., Joe, K., & Hirahara, T. (1990). APOLONN brings us to the real world. In Proceedings of 1990 International Joint Conference on Neural Networks (pp. 581-587). [21] Sato, M., Murakami, Y., & Joe, K. (1990). Learning chaotic dynamics by recurrent neural networks. In Proceedings of International Conference on Fuzzy Logic & Neural Networks (pp. 601-605). [22] Sato, M., & Murakami, Y. (1991). Learning nonlinear dynamics by recurrent neural networks. In Proceedings of Symposium on Some Problems on the Theory of Dynamical Systems in Applied Science (pp. 49-63), Singapore: World Scienti c. [23] Sato, M., & Ishii, S. (2000). On-line EM algorithm for the normalized Gaussian network. Neural Computation, 12, 407-432. [24] Sato, M. (2000). On-line model selection based on the variational Bayes. Neural Computation, 13(7). [25] Sato, M. (2000). Convergence of on-line EM algorithm. In Proceedings of 7th International Conference on Neural Information Processing (pp. 476-481). [26] Sauer, T., Yorke, J. A., Casdagli, M. (1991). Embedology. Journal of Statistical Physics, 65, 579-616. [27] Shimada, I. & Nagashima, T. (1979). A numerical approach to ergodic problem of dissipative dynamical systems. Progress of Theoretical Physics, 61, 1605-1616. [28] Sugihara, G. & May, R. (1990). Nonlinear forecasting as a way of distinguishing chaos from measurement error in forecasting. Nature, 344, 734-741. [29] Takens, F. (1981). Detecting strange attractors in uid turbulence. In Dynamical Systems and Turbulence, Berlin: Springer-Verlag. [30] Tokuda, I. (1993). Deterministic prediction and speech signals of the Japanese vowel /a/. Master Thesis for University of Tsukuba. [31] Xu, L., Jordan, M. I., & Hinton, G. E. (1995). An alternative model for mixtures of experts. In Advances in Neural Information Processing Systems 7 (pp. 633-640), Cambridge: MIT Press. [32] Yoshida, W., Ishii, S., & Sato, M. (2000). Reconstruction of chaotic dynamics using a noiserobust embedding method. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, I, 181-184.
26
Figures Lorenz attractor
50 40
z(t)
30 20 10 0 40 20
20 10
0 0
−20
−10 −40
y(t)
−20
x(t)
Figure 1
The Lorenz attractor. 10,000 points from a single trajectory are plotted. #data=100,000, noiseless, nMSE=0.015%
50 40
z(t)
30 20 10 0 40 20
20 10
0 0
−20 y(t)
−10 −40
−20
Figure 2
x(t)
The attractor reproduced by NGnet after learning of 100,000 data. The nMSE of the discretized vector eld is 0.015% and the number of units is 135.
27
Figures Receptive fields (No=135)
50 40
z(t)
30 20 10 0 40 20
20 10
0 0
−20
−10 −40
y(t)
−20
x(t)
Figure 3
Receptive elds of the 135 units after the learning of 100,000 data. The ith receptive eld is depicted by an ellipse whose axes correspond to the two principal components of the local covariance matrix 6 . The ellipse center is set at the unit center . i
i
Normalized mean squared error
0.6 0.5 0.4 0.3 0.2 0.1 0 0
2
4 6 No. of data
Figure 4(a)
8
10 5 x 10
28
Figures 4
Weighted log−likelihood
2 0 −2 −4 −6 −8 −10 −12 −14 0
2
4 6 No. of data
8
10 5 x 10
8
10 5 x 10
Figure 4(b)
Log−likelihood for test data
5
0
−5
−10
−15 0
2
4 6 No. of data
Figure 4(c)
Learning curves of the on-line EM algorithm. In training, no noise (solid), 10% system noise (dashed), 20% system noise (dash-dotted), and 30% system noise (dotted) are added to the dynamical variables. (a) The nMSE of the discretized vector eld, which is evaluated using 10,000 test data randomly taken from the Lorenz attractor. (b) The discounted log-likelihood for the training data. Our on-line EM algorithm maximizes the discounted loglikelihood. (c) The averaged log-likelihood, which is evaluated for the same test data set with the nMSE. Figure 4
29
Figures #data=3,000,000, SN=30%, ON=30%, nMSE=0.178%
50 40
z(t)
30 20 10 0 40 20
20 10
0 0
−20
−10 −40
y(t)
−20
x(t)
Figure 5
The attractor reproduced by NGnet after learning of 3,000,000 data with both 30% system noise and 30% observation noise. The nMSE of the discretized vector eld is 0.178% and the number of units is 917. True delay attractor
20
x(t+2*TAU)
10
0
−10
−20 20 10
20 10
0 0
−10 x(t+TAU)
−10 −20
−20
Figure 7
x(t)
The Lorenz attractor embedded to a delay coordinate for = 0:15.
30
Figures 20
x(t)
10
0
−10
−20 0
1
2
3
4
5
6
7
8
(a)
Prediction Error
1
0 0
1
2
3
4
5
Time(t) (b) Figure 6
(a) The time-series predicted by NGnet, which is trained with no noise (dashed), 10% system noise (dash-dotted), and 10% observation noise (dotted). The solid line denotes the true Lorenz time-series. (b) The squared error between the true Lorenz time-series and the time-series predicted by NGnet. The error is averaged over 100 trials and is normalized by the variance of the state variables. The lines correspond to no noise (dashed), 10% system noise (dash-dotted), and 10% observation noise (dotted).
31
Figures #data=100,000, noiseless, nMSE=0.016%
20
x(t+2*TAU)
10
0
−10
−20 20 10
20 10
0 0
−10
−10 −20
x(t+TAU)
−20
x(t)
Figure 8
The attractor in the delay coordinate space reproduced by NGnet after learning of 100,000 data. The nMSE of the discretized vector eld is 0.016% and the number of units is 285. #data=2,000,000, SN=15%, ON=15%, nMSE=0.037%
20
x(t+2*TAU)
10
0
−10
−20 20 10
20 10
0 0
−10 x(t+TAU)
−10 −20
−20
Figure 9
x(t)
The attractor in the delay coordinate space reproduced by NGnet after learning of 2,000,000 noisy data. 15% system noise is added to all dynamical variables, and 15% observation noise is added to the observed variable x(t). The nMSE of the discretized vector eld in the delay coordinate is 0.037% and the number of units is 1000.
32
Figures 20
x(t)
10
0
−10
−20 0
1
2
3
4 5 Time(t)
6
7
8
(a)
Prediction Error
1
0 0
1
2
3
4
5
Time(t) (b) Figure 10
(a) The time-series predicted by NGnet, which is trained with no noise (dashed), 10% system noise (dash-dotted), and 10% observation noise (dotted). The solid line denotes the time-series of the observed variable x(t) for the true Lorenz system. (b) The squared error between the true Z (t) trajectories of the Lorenz system and the trajectories predicted by the NGnet. The error is averaged over 100 trials and is normalized by the variance of the state variable in the delay coordinate. The lines correspond to no noise (dashed), 10% system noise (dash-dotted), and 10% observation noise (dotted).