Neural Networks 21 (2008) 13–27 www.elsevier.com/locate/neunet

Incremental acquisition of multiple nonlinear forward models based on differentiation process of schema model

Tadahiro Taniguchi a, Tetsuo Sawaragi b,∗
a Graduate School of Informatics, Kyoto University, Japan
b Graduate School of Engineering, Kyoto University, Japan

Received 30 November 2005; accepted 23 October 2007

Abstract

We introduce the schema model as an alternative computational model representing multiple internal models. The human central nervous system is believed to obtain multiple forward-inverse models. The schema model enables agents to obtain multiple nonlinear forward models incrementally. This model is based on hypothesis testing theory, whereas most modular learning methods are based on a Bayesian framework. As a specific example, we describe a schema model with a normalized Gaussian network (NGSM). Simulation revealed that NGSM has two advantages over MOSAIC's learning method: NGSM can obtain multiple models incrementally and does not depend on the initial parameters of the forward models.
© 2007 Elsevier Ltd. All rights reserved.

Keywords: Multiple internal models; Schema; Modular learning; MOSAIC

1. Introduction

It has recently come to be believed that the cerebellar cortex obtains internal models of environmental dynamics and/or tool dynamics, and the notion of multiple internal models has become dominant. Wolpert et al. proposed a conceptual model called "modular selection and identification for control (MOSAIC)" (Wolpert, Ghahramani, & Jordan, 1995; Wolpert & Kawato, 1998). Evidence supporting its modular architecture has been found in several fMRI experiments (Imamizu et al., 2003, 2004). Meanwhile, computational accounts of multiple internal models have also been provided (Doya et al., 2000; Kawato, 1999; Wolpert & Kawato, 1998). In the computational approach, the main questions are how the cerebellum obtains multiple internal models and how it utilizes these internal models for various control tasks.

∗ Corresponding address: Department of Precision Engineering, Graduate School of Engineering, Kyoto University, Yoshida Honmachi, Sakyo-ku, 6068501 Kyoto, Japan. Tel.: +81 75 753 5266; fax: +81 75 753 5266. E-mail address: [email protected] (T. Sawaragi).

0893-6080/$ - see front matter © 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.neunet.2007.10.007

1.1. MOSAIC: Computational models for multiple internal models

A series of reports on computational models for obtaining multiple internal models has provided several useful ways of using multiple internal models. Two basic frameworks have been proposed and advanced. One is the multiple paired forward-inverse model (MPFIM) (Wolpert & Kawato, 1998), and the other is multiple model-based reinforcement learning (MMRL) (Doya et al., 2000). In the MPFIM framework, two kinds of internal models, forward and inverse, are obtained. Once appropriate multiple inverse models have been obtained, an agent can realize desired state variables by using them. On the other hand, the MMRL framework makes it possible to calculate optimal controllers for each forward model under the condition that a reward function for each forward model can be locally approximated using a quadratic function. Doya provided an overview explaining how the MMRL architecture is realized in the human cerebellum and basal ganglia (Doya, 1999). These computational directions for using multiple forward models have worked well in several simulation tasks, so they are considered reasonable computational models to some extent.


In most of the computational research on MOSAIC, each forward model has been defined as a linear system (Doya et al., 2000; Haruno, Wolpert, & Kawato, 2001). As we discuss in Section 2, this linear assumption is reasonable. Wolpert et al. proposed an online learning method for multiple forward models based on the gradient descent method and an online switching method based on Bayes estimation (Wolpert et al., 1995). Doya et al. extended this formulation by adding two ideas, spatial locality and temporal continuity, to make the learning method more stable in nonlinear and nonstationary task domains (Doya et al., 2000). However, in nonlinear and nonstationary environments, MOSAIC's method for learning and switching has two shortcomings. One is that most learning runs started from random initial conditions are captured by abundant local minima. The other is that the number of forward models is fixed.

1.2. Bayesian approaches to multiple internal models

There are many other computational models that can represent multiple internal models, for example, switching Kalman filters (Murphy, 1998), the mixture of experts (Jacobs et al., 1991), and the hidden Markov model (Haruno et al., 2001). However, all these approaches are based on a Bayesian framework. In a Bayesian framework, the probability P(m|x, u, y) with which the m-th forward model f_m is selected is calculated by

P(m|x, u, y) = P(x, u, y|m) P(m) / Σ_i P(x, u, y|i) P(i)   (1)

P(x, u, y|m) = (1/√((2π)^n |Σ_m|)) exp(−(1/2) e_m^T Σ_m^{−1} e_m)   (2)

e_m = y − f_m(x, u),   (3)

where x is a state vector, y is an observed output vector, and u is a motor input vector.1 Eq. (2) assumes that the prediction error has a Gaussian distribution. Imamizu et al. took MOSAIC and the mixture of experts into consideration when they studied the brain mechanism of switching multiple internal models (Imamizu et al., 2004). Wolpert and Ghahramani concluded in their review that people use the Bayesian rule to select internal models (Wolpert & Ghahramani, 2000). Among studies on multiple internal models, approaches that are not based on a Bayesian framework have rarely been taken into consideration. However, it has not been proved that our brains use a Bayesian framework when they switch their internal models. If another computational model, not based on the Bayesian theorem, can describe the process by which our brains acquire multiple internal models, then that computational model should be taken into consideration. In this paper, we describe a schema model that is based not on the Bayesian rule, but on hypothesis testing theory. It can acquire multiple nonlinear internal models incrementally.

1 In observable cases, y can be replaced by ẋ.

1.3. Computational expression of consolidation in human motor memory

People learn how to use new tools, and they obtain forward models corresponding to the tools incrementally as distributed motor memory. However, our brains cannot always obtain internal models corresponding to all tasks and tools. If the tasks change frequently without any additional cues, our brains cannot acquire separated internal models (Brashers-Krug et al., 1996; Osu et al., 2004). To illustrate the process of consolidation in human motor memory computationally, a computational model that can generate internal models depending on the intervals of the environmental changes should be proposed. Most Bayesian approaches to internal models do not include incremental acquisition of forward models.

Based on this background, we propose a novel modular learning architecture called the schema model. It selects and generates function approximators based on testing statistical hypotheses through online learning processes. Additionally, we describe a "schema model with a normalized Gaussian network (NGSM)" as a specific example. In Section 2, we discuss the differences between spatial modularity and temporal modularity as background for our proposed model; thereafter, we focus on temporal modularity. In Section 3, we describe the schema model based on hypothesis testing theory. In Section 4, we describe an NGSM as an example of the schema model. Finally, we describe the results of testing the performance of the NGSM in a nonlinear, nonstationary, multiple-forward-model acquisition task.

2. Modular learning architecture for temporal modularity

In many reports, two types of modularity are introduced without any distinction. We call them spatial modularity and temporal modularity. An abstract image of these modularities is shown in Fig. 1. These two types of modularity are not considered to have different origins in many modular learning architectures, including MOSAIC. However, they should be distinguished in the context of human multiple forward internal models.

2.1. Spatial modularity: Decomposition of nonlinear dynamics into several linear dynamics

Nonlinear dynamics is difficult to treat computationally and to control, so nonlinear systems are often spatially decomposed into several linear systems. For example, a 2-DOF arm has nonlinear dynamics, so a modular learning architecture can be utilized to decompose them. This is because linear systems have several features that make the target system easy to control. It is thus often useful to obtain modular linear internal models, i.e. local linear models that approximate the global nonlinear dynamics locally. We call this kind of modularity "spatial modularity". However, in the context of cognitive studies on multiple internal models, this kind of local model is not usually discussed; usually, occasional changes in system parameters are taken into consideration.
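To make the benefit of local linearity concrete, the linear forward/inverse pair developed in Section 2.1.1 (Eqs. (4)-(8)) can be sketched numerically. This is only an illustration: the matrices, the feedback gain K, and the test vectors below are our assumptions, not values from the paper.

```python
import numpy as np

# Sketch of a linear forward model x_dot = Fx x + Fu u (Eq. (7)) and the
# Moore-Penrose inverse model of Eq. (8). All numbers are illustrative.

Fx = np.array([[0.0, 1.0],
               [-2.0, -0.5]])
Fu = np.eye(2)       # input and state dimensions equal, so Fu^# = Fu^-1

def inverse_model(x, x_dot, x_dot_d, K):
    # u = Fu^# (x_dot_d - Fx x - K (x_dot - x_dot_d)), Eq. (8)
    return np.linalg.pinv(Fu) @ (x_dot_d - Fx @ x - K @ (x_dot - x_dot_d))

x = np.array([0.2, -0.1])
x_dot_d = np.array([0.0, 1.0])       # desired state change
x_dot = Fx @ x                       # current state change (u = 0)
u = inverse_model(x, x_dot, x_dot_d, K=np.zeros((2, 2)))

# With a square, invertible Fu and no feedback term, the controller realizes
# the desired state change exactly.
print(np.allclose(Fx @ x + Fu @ u, x_dot_d))   # True
```

When the dimension of U is smaller than that of X, `np.linalg.pinv` yields the least-squares u; when it is larger, it yields the minimum-norm u, which matches the error- and norm-minimizing properties of the MP inverse discussed in Section 2.1.1.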


Fig. 1. Abstract image of hierarchical relationships between two types of modularity. Each curve represents nonlinear dynamics, and each straight line represents linear dynamics. Nonstationary and nonlinear dynamics are hierarchically decomposed into stationary linear dynamics.

On the other hand, from the computational viewpoint, spatial modularity that decomposes target dynamics into linear systems is beneficial in control problems. We next show why elemental local modules should be linear in both MPFIM and MMRL, which are subclasses of the MOSAIC architecture.

2.1.1. MPFIM: Multiple paired forward-inverse model

MPFIM (multiple paired forward-inverse model) is the original MOSAIC architecture proposed by Wolpert and Kawato (1998). This architecture has multiple forward and inverse models. Generally, forward and inverse models for continuous time are written as

ẋ = F(x, u)   (4)

u = I(x, ẋ^d),   (5)

where x (∈ X) is a state vector in state space X (= R^n), u (∈ U) is a control input vector in action space U (= R^m), and ẋ^d represents the desired state change. An appropriate inverse model I corresponding to forward model F is sought to minimize the residual error e between the actual state change ẋ and the desired state change ẋ^d:

e² = ‖ẋ^d − ẋ‖² = ‖ẋ^d − F(x, I(x, ẋ^d))‖².   (6)

Inverse model I has to be estimated so as to minimize e; otherwise, the inverse model could not realize the desired trajectory. As long as forward model F is a linear model, as below, the corresponding inverse model (controller) can be easily obtained:

ẋ = F(x, u) = F_x x + F_u u   (7)

u = I(x, ẋ^d) = F_u^# (ẋ^d − F_x x − K(ẋ − ẋ^d)).   (8)

Here, F_u^# is the Moore–Penrose (MP) inverse matrix of matrix F_u. When the dimensions of the input and output spaces are the same, F_u can be a square matrix and has an inverse matrix F_u^{−1} (for example, see Nakanishi, Farrell, and Schaal (2005)). In other cases, the MP inverse matrix minimizes e² when the dimension of U is smaller than that of X, where e cannot always be equal to zero. On the other hand, when the dimension of U is larger than that of X, there are many inverse models that make e zero. The inverse model presented above is guaranteed to minimize ‖u‖² among these inverse models, owing to the characteristics of the MP inverse matrix. However, the existence of such an optimal inverse model is not guaranteed when F is nonlinear. On top of that, an inverse model sometimes becomes a multiple-valued function under a nonlinear condition. Therefore, for the MPFIM framework, a reasonable strategy seems to be to obtain spatially decomposed, multiple linear models.

2.1.2. MMRL: Multiple model-based reinforcement learning

Doya et al. described a modular reinforcement learning architecture for nonlinear control tasks (Doya et al., 2000). This is another way to exploit obtained multiple internal forward models for control tasks. In MMRL, a key idea is multiple linear quadratic controllers (MLQC). If the local dynamics is linear and reward function r_i is a quadratic function, as below, a linear quadratic controller can be used in the local domain:

r_i(x, u) = −(x − x_i)^T Q_i (x − x_i) − u^T R_i u.   (9)

By solving the Riccati equation, we can obtain the local optimal controller u_i^∗. This is a basic concept in MMRL. Therefore, the condition that elemental forward models are linear is a fundamental assumption in MMRL. We thus need spatial modularity to exploit multiple internal models.

2.1.3. Spatial modularity need not be a fundamental principle of multiple internal models

Spatial modularity is meaningful in control problems, as we mentioned. However, it does not seem to be a unit of human motor memory. When multiple internal models are evaluated through experiments in studies of cognitive neuroscience, what the authors discuss is not this kind of decomposition of a single nonlinear system. If anything, spatial modularity should be handled by nonlinear function approximators whose local models are linear, e.g. RFWR (Nakanishi et al., 2005; Schaal & Atkeson, 1997), LWPR (Atkeson et al., 2000; Vijayakumar & Schaal, 2000), and CAN2 (Kurogi & Ren, 1997). In many realistic systems, the dynamics are nonlinear to some extent. In the next subsection, the other type of modularity is described. We believe this is the modularity that should be regarded as a fundamental principle of multiple internal models.

2.2. Temporal modularity

People have to cope with nonstationary environmental dynamics. For example, when a person starts to use a tool that is different from one used before, he or she must quickly learn the tool's internal model and shift from one model to another. Without the shift, the forward model would have to be overwritten whenever the environmental dynamics changed. If it were not for the modular architecture used for nonstationary environmental dynamics, people would not be able to store environmental models for future use. Therefore, when a person reuses a previously used tool, he or she must either recall the tool's forward dynamics or construct internal models


Fig. 2. Overview of the schema model.

from a zero base. This modularity is closely related to long-term memory. We call this kind of modularity "temporal modularity". Temporal modularity corresponds to the occasional switching of environmental dynamics. From the viewpoint of the computational model, the dynamics is not restricted to a linear system but can be a nonlinear system. Temporal modularity divides the time series of interaction into sets of several stationary nonlinear dynamics. Bearing these things in mind, the modular learning architecture representing the multiple internal models acquired by the human central nervous system should treat temporal modularity rather than spatial modularity. The problem in spatial modularity is to construct a nonlinear function approximator. In contrast, the problem in temporal modularity is to allocate and switch nonlinear function approximators as distributed motor memories. In this paper, we especially focus on temporal modularity. Among studies on MOSAIC, hierarchical MOSAIC addresses temporal modularity adaptively by using a hierarchical organization (Haruno, Wolpert, & Kawato, 2003; Wolpert, Doya, & Kawato, 2003).

3. Schema model

3.1. Schema system

The overall architecture of our schema model is shown in Fig. 2. "Schema" is a term introduced by Piaget (Flavell, 1963) in the context of genetic epistemology. The idea of schema was introduced to the field of human motor skill by Schmidt (1975). The schema system was originally characterized as a dynamic distributed memory architecture that adapts to our environment to enable us to perform in a more natural way; the idea is quite similar to the modern notion of multiple internal models. Through interactions with the environment, the schema system changes itself based on a fundamental mechanism consisting of two successive phases: assimilation and accommodation. According to schema theory, a schema assimilates a new experience into the preexisting structure if the experience is consistent with itself, and it modifies the structure to accommodate itself otherwise. In terms of schemata, assimilation is the phase during which a schema brings the outside world into the inner world, and accommodation is the phase during which a schema modifies itself so as to better describe the outside world. The decision of whether a schema will assimilate the incoming experience depends mainly on the schema's individual characteristics and not on characteristics relative to other schemata. This cycle of adaptation does not always work well because it may encounter something unexpected. When this happens, the schema system can differentiate its structure dynamically to adapt to the unexpected encounter. In the context of symbol organization in autonomous robots, Taniguchi et al. proposed an incremental modular learning architecture that they call the "dual-schemata model" (Taniguchi & Sawaragi, 2003a, 2003b, 2004a, 2004b) based on this idea.

The key difference between most modular learning architectures (Haruno et al., 2001; Jacobs et al., 1991; Murphy, 1998; Tani & Nolfi, 1999; Wolpert & Kawato, 1998) and the schema model (Taniguchi & Sawaragi, 2004b, 2006) described in this paper is how the system selects an adequate module and makes the module assimilate incoming experiences. Most conventional modular learning architectures (Jacobs et al., 1991; Murphy, 1998; Wolpert & Kawato, 1998), including MOSAIC, use a Bayesian framework to calculate the responsibility signal for each module. This means that the modular learning architecture selects the most appropriate module among the existing modules in a relative manner. In contrast, the schema model uses hypothesis testing theory to select an appropriate schema (module). This means that each individual schema decides by itself whether it can assimilate the incoming experience. The specific algorithm is detailed below. Owing to this key difference, the schema model has different features from conventional Bayesian-based modular learning architectures.

1. The schema model achieves incremental acquisition of modules without any additional rules.
2. It is less sensitive to the initial parameters of each forward model than conventional modular learning architectures.

These characteristics of the schema model are evaluated in Section 5. In contrast to the MOSAIC architecture, one of the basic ideas of the schema model is to distinguish spatial modularity and temporal modularity explicitly and to obtain temporal modules incrementally based on hypothesis testing


theory. The schema model itself does not have the capability to decompose a nonlinear system into several linear models. Thus, the multiple temporally decomposed modules, each of which obtains a nonlinear forward model and which cope with nonstationary dynamics as a whole, are called schemata.

3.2. Basic assumptions

In this subsection, we describe how the schema model acquires forward models incrementally to realize temporal modularity. Generally, environmental and tool dynamics are not linear; in the real world, a linear system is an idealization. Forward models should therefore be obtained as nonlinear functions. In the context of temporal modularity, there is no need for elementary modules to be linear. This strategy for incremental temporal decomposition was proposed by Taniguchi et al. in their dual-schemata model (Taniguchi & Sawaragi, 2003a, 2003b, 2004a, 2004b). The key ideas are assimilation, accommodation, and differentiation. In the general case, the target dynamics is described as below. The dynamics is assumed to be nonlinear and nonstationary:2

ẋ = F(x, u, t) + n(t),   (10)

where n(t) ∼ N(0, Σ) is a noise term. A schema has a forward model F̂_m and its variance–covariance matrix Σ̂_m. In modular learning problems, a forward model represents an environmental dynamics. Usually, a different dynamics has a different noise term with a different and unknown variance–covariance matrix. Therefore, the variance–covariance matrix should be estimated through interactions. F̂_m and Σ̂_m have parameter vectors w_m^F and w_m^Σ, respectively, and ·̂ denotes an estimated value. In the simplest cases, Σ̂_m can be defined to be a constant matrix. However, Σ depends on x and u in several realistic cases. In such cases, Σ̂_m is defined as a function outputting a variance–covariance matrix:

x̂˙_m = F̂_m(x, u; w_m^F)   (11)

(e_m e_m^T)^ = Σ̂_m(x, u; w_m^Σ),   (12)

where m (∈ M) is the schema index and e_m is the residual error of the m-th forward model:

e_m = ẋ − x̂˙_m = F(x, u, t) − F̂_m(x, u; w_m^F).   (13)

Both F̂_m and Σ̂_m gradually change through interactions with the environmental dynamics. The objective functions for F̂_m and Σ̂_m are defined below. F̂_m and Σ̂_m are modified to minimize the corresponding objective functions while the schema containing them assimilates the corresponding environmental dynamics:

J_m = Σ_{t∈T_m} ‖ẋ(t) − F̂_m(x(t), u(t); w_m^F)‖²   (14)

J_m^(2) = Σ_{t∈T_m} ‖e_m(t) e_m(t)^T − Σ̂_m(x(t), u(t); w_m^Σ)‖²,   (15)

where T_m is the set of times t at which incoming experiences were assimilated by the m-th schema. These optimization problems are obviously reduced to least squares estimation problems. That is, if we prepare general linear approximators as below, the least squares method guarantees the existence of optimal parameters (w_m^F)^∗ and (w_m^Σ)^∗, where ^∗ means the optimal value:

F̂_m = Σ_i w_mi^F b_mi^F(x, u)   (16)

Σ̂_m = Σ_i w_mi^Σ b_mi^Σ(x, u).   (17)

Using one of several solutions for the online least squares estimation problem, such as the recursive least squares method or the steepest descent method, we can obtain an adequate parameter w. However, the forward models that F̂_m should obtain are nonlinear functions. The b_mi^F and b_mi^Σ are basis functions of F̂_m and Σ̂_m. Standard bases from function approximation, e.g. Gaussian functions, normalized Gaussian functions (Sato & Ishii, 2000), polynomial functions (Okada, Tatani, & Nakamura, 2002), CMAC (Albus et al., 1975), and other basis functions, can be used as b_mi^F and b_mi^Σ. If appropriate basis functions cannot be prepared, several adaptive nonlinear function approximators can be employed, e.g. RBF (Poggio & Girosi, 1990), RFWR (Nakanishi et al., 2005; Schaal & Atkeson, 1997), LWPR (Atkeson et al., 2000; Vijayakumar & Schaal, 2000), and CAN2 (Kurogi & Ren, 1997). In this paper, we focus not on the nonlinear function approximator itself, but on the construction of multiple nonlinear internal models. In other words, we focus on temporal modularity rather than on spatial modularity.

3.3. Switching and creating based on hypothesis testing

Here, we describe the schema model's algorithm for switching and creating schemata. The algorithm corresponds to the assimilation and differentiation dynamics of the schema model (see Fig. 2). First, a subjective error3 R_m is defined as a dimensionless vector:

R_m(t) = L_m^{−1} e_m(t)   (18)

Σ_m = L_m L_m^T,   (19)

where L_m is an upper triangular matrix obtained by Cholesky decomposition (Golub & Van Loan, 1996). Sometimes, we can assume that every element of the residual vector e_m has a normal distribution. Then R_m has a multidimensional standard normal distribution in stationary environments. This assumption provides a specific criterion, based on hypothesis testing theory, for judging whether or not the m-th schema assimilates an observed sample vector.4 When the

2 In this paper, "nonstationary" means occasionally switching dynamics.
3 ‖R‖² is sometimes called the Mahalanobis distance (Mahalanobis, 1936).
4 If the residual vector does not have a normal distribution, the definition of the truth value function μ (see Appendices A and B) should be modified based on the assumed probability distribution. In practical use, χ_c² can be replaced by an appropriate monotonically decreasing function, and α should be adjusted in accordance with the function.


acceptance region is set to be square with a significance parameter α, a schema activity parameter V_m is defined as:

V_m(t) = χ_c²(n_p · ‖R_m(t)‖_∞², n_p),   (20)

where n_p = (1 + p)/(1 − p) and χ_c²(x, n) = 1 − χ²(x, n) (see Appendix A). Here n_p corresponds to a degree of freedom of χ² and is calculated using the persistency parameter p (0 ≤ p ≤ 1), which represents short-term memory. The schema activity corresponds to the p-value in statistical hypothesis testing. When schema activity V_m approaches 1, the target dynamics at t is well represented by the m-th schema. In contrast, if it is nearly zero, the target dynamics at t is assumed to be irrelevant to the m-th schema. When the acceptance region is set to be circular with significance parameter α, V_m is defined as:

V_m(t) = χ_c²(n_p · ‖R_m(t)‖², n_p · n)   (21)

(see Appendix B). Based on the schema activity, a criterion for judging whether or not the schema assimilates the environment is derived:

μ(H_m) = sgn(V_m(t) − α).   (22)

This judgement of acceptance can be fuzzified by making the step function continuous, like a sigmoid function; however, in this paper we discuss only the case where the truth value is binary. If the dynamics does not change, the schema rejects the incoming sample vector with probability α. If α is too big, the schema rejects incoming samples produced by the same dynamics; the schema system then creates a new schema even though there is no change in the environment. In contrast, if α is too small, a schema assimilates every incoming sample and acts as a usual function approximator. Since this criterion allows not only one schema but several schemata to be accepted, we must also define a schema switching mechanism. As described above, the schema model is a modular learning architecture designed to express the incremental acquisition of modules. Therefore, the schema switching rule has to include a differentiation process, which means creating a new schema. We define a schema switching and creating rule using only one equation. To make the equation simple, we define the 0-th schema as a null schema whose activity is always zero. P(m) is the probability that the m-th schema is selected, where ¯· denotes logical negation:

P(m) = P(⋀_{k=0}^{m−1} H̄_k ∧ H_m) = μ(H_m) ∏_{k=0}^{m−1} μ(H̄_k).   (23)

This rule means that the m-th schema will be selected if none of the schemata indexed from the 1st to the (m−1)-th is accepted and the m-th schema is accepted. If none of the existing schemata (∀m ∈ M) is accepted, the schema system creates a new schema and assigns it the index m′ = #(M) + 1. When the new schema is created, its Σ̂_{m′} is set large enough for the schema to assimilate unexpected samples, and F̂_{m′} is set randomly. This algorithm is independent of the specific nonlinear function approximators inside each schema.
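The switching-and-creating rule of Eqs. (18)-(23) can be sketched as follows, using the circular acceptance region of Eq. (21). This is a minimal sketch, not the authors' implementation: the toy forward model, the parameter values, and the use of SciPy's χ² survival function are our assumptions.

```python
import numpy as np
from scipy.stats import chi2

# Sketch of the schema switching/creating rule, Eqs. (18)-(23), with the
# circular acceptance region of Eq. (21). Parameter values are illustrative.

ALPHA = 0.01                      # significance parameter alpha
P = 0.9                           # persistency parameter p (short-term memory)
N_P = (1 + P) / (1 - P)           # n_p = (1 + p) / (1 - p)

class Schema:
    def __init__(self, dim, rng):
        self.dim = dim
        self.w = rng.normal(size=dim)     # forward-model params, set randomly
        self.Sigma = 10.0 * np.eye(dim)   # generous initial covariance

    def predict(self, x):
        return self.w * x                 # toy stand-in for F_hat_m(x, u)

    def activity(self, x, x_dot):
        e = x_dot - self.predict(x)            # residual error, Eq. (13)
        L = np.linalg.cholesky(self.Sigma)     # Sigma_m = L_m L_m^T, Eq. (19)
        R = np.linalg.solve(L, e)              # subjective error R_m, Eq. (18)
        # V_m = chi2_c(n_p * ||R_m||^2, n_p * n), Eq. (21)
        return chi2.sf(N_P * float(R @ R), N_P * self.dim)

def select_or_create(schemata, x, x_dot, rng, dim=2):
    # Eq. (23): select the first schema whose hypothesis is accepted
    # (mu(H_m) = 1, Eq. (22)); if every schema rejects the sample,
    # differentiate, i.e. create a new schema.
    for m, schema in enumerate(schemata):
        if schema.activity(x, x_dot) > ALPHA:
            return m
    schemata.append(Schema(dim, rng))
    return len(schemata) - 1
```

Because each schema tests its own hypothesis against α, acceptance is absolute rather than relative; this is precisely what lets the system create a new schema when every existing schema rejects a sample, instead of forcing the sample onto the least-bad module.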

Fig. 3. Spatial modularity inside a schema.

We call this incremental modular learning architecture, which is based on statistical hypothesis testing theory, the "schema model" (Taniguchi & Sawaragi, 2003a, 2003b, 2004a, 2004b). From the viewpoint of function approximation, it is important that the target dynamics can be represented as a single-valued function during a certain period. Applying a function approximator to samples that came from several different functions, or from a multivalued function, is usually meaningless. To make function approximation meaningful, sample data should be output consistently from a single-valued target function except for white noise: i.e. the target dynamics corresponding to a schema should be a single-valued function. Therefore, what the switching rule in the schema model aims at is to keep consistency by rejecting incoming samples that are inconsistent with a preexisting schema, based on hypothesis testing theory.

4. Schema model with a normalized Gaussian network

This section describes the schema model with a normalized Gaussian network (NGSM). As we pointed out in Section 2.1, forward models, or "modules", should be locally linear to increase their utility. In the MOSAIC architecture, Doya et al. introduced the idea of spatial locality, which determines the prior probability of modules depending on the present state x in state space X (Doya et al., 2000). This idea is similar to locally weighted regression (LWR) (Atkeson, Moore, & Schaal, 1997). In LWR, local models are trained only by spatially local data. When each weighting function is a normalized Gaussian function and each local model is linear, the LWR architecture is called a normalized Gaussian network (NGnet) (Moody & Darken, 1989; Sato & Ishii, 2000; Xu et al., 1995, chap. 7). In other words, a normalized Gaussian network spatially decomposes a nonlinear function into several linear functions by using normalized Gaussian gating functions. NGnet is one of the simplest LWR methods.

In the NGSM architecture, we deal with spatial modularity by using a normalized Gaussian network as a function approximator (Fig. 3).5 Gating function g_mk(x) is defined based on Gaussian function G, where k is the local model index and K is the

5 Other LWR methods can be used in the framework of the schema model, e.g. RFWR, LWPR, and CAN2. In particular, an incremental adaptive nonlinear learning method like RFWR is effective when the dimension of the problem grows (e.g. a 7-DOF arm).


number of gating functions:

g_mk(x) = G(x; c_mk, Σ_mk) / Σ_{l=1}^{K} G(x; c_ml, Σ_ml)   (24)

G(x; c, Σ) = (2π)^{−N/2} |Σ|^{−1/2} exp(−(1/2)(x − c)^T Σ^{−1}(x − c)).   (25)

The c_mk is a central vector, and Σ_mk is a variance–covariance matrix. Forward model F̂_m and its variance–covariance matrix Σ̂_m are decomposed into several spatially local models, f̂_mk and Σ̂_mk, inside each schema:

F̂_m(x, u; w_m^F) = Σ_{k=1}^{K} g_mk(x) f̂_mk(x, u; w_mk^f)   (26)

f̂_mk(x, u) = Â_mk^f (x − c_mk) + B̂_mk^f u,
w_m^F ≡ [w_m1^F, w_m2^F, …, w_mK^F],
w_mk^F = [Â_mk^f, B̂_mk^f].   (27)

In this section, each local model Σ̂_mk is instead defined as a constant matrix. F̂_m and Σ̂_m share the same gating functions and the same central vectors:

Σ̂_m(x, u; w_m^Σ) = Σ_{k=1}^{K} g_mk(x) Σ̂_mk   (28)

w_m^Σ ≡ [Σ̂_m1, Σ̂_m2, …, Σ̂_mK].   (29)

Fig. 4. Pendulum.

While LWR can be considered a novel approach to function approximation if we have a gating function to be fixed, it works as a traditional general linear function approximator. Fˆm (x, u; wmF ) =

K X

  f f gmk (x) Aˆ mk (x − cmk ) + Bˆ mk u

(30)

k=1

=

K h X

f Aˆ mk

f Bˆ mk

i

k=1

=

K X

(31)

k=1 F bmk (x, u) ≡ [gmk (x)(x − ckλ ), gmk (x)u]T .

(32)

These equations address the fact that an NGnet whose gating functions are not adaptive is a simple general linear model with M basis functions. As we described in Section 3.2, general linear models are guaranteed to have optimal parameters. We use the gradient descent method for their basic learning dynamics, following MOSAIC and the mixture-of-experts model. Of course, we could also use the recursive least squares method or some other online learning method to minimize its objective function. When (x, u, x) ˙ are observed, each parameter is updated: τF

F dwmk F = em · (bmk (x, u))T dt

5. Simulation We evaluated our NGSM architecture by simulation. The pendulum swing-up task is often used for various nonlinear control tasks. Doya et al. applied the MMRL architecture to this task (Doya et al., 2000) and obtained good simulation results. However, reapplication of the architecture reveals that its strategy for obtaining forward models often works badly in the nonstationary task domain without appropriately designed initial conditions of the learning modules and metaparameters. We thus focused on the acquisition of multiple forward models in a nonstationary pendulum control task environment. 5.1. Environment

 gmk (x)(x − ckλ ) gmk (x)u

F F wmk bmk (x, u)

dΣˆ mk T Σ − Σˆ m (x, u)) · gmk = (em em (x, u), (34) dt where τ F and τΣ are learning time constants. This results in an architecture that can adaptively create and switch nonlinear forward models and whose local models are linear. τΣ

(33)
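As an illustrative sketch of Eqs. (26)–(33), one schema's NGnet forward model with fixed gating and the gradient update of its local linear experts can be written as follows (all identifiers and the small random initialization are our own assumptions, not the paper's):

```python
import numpy as np

def make_ngnet(centers, widths, n_x, n_u, rng):
    """One schema's forward model: K local linear experts with NG gating."""
    return {
        "c": np.asarray(centers, dtype=float),           # gating centres c_k
        "s": np.asarray(widths, dtype=float),            # gating widths (diagonal)
        "A": rng.normal(scale=0.1, size=(len(centers), n_x, n_x)),  # local A_mk
        "B": rng.normal(scale=0.1, size=(len(centers), n_x, n_u)),  # local B_mk
    }

def gating(model, x):
    """Normalized Gaussian activations g_mk(x) used as weights in Eq. (26)."""
    d = (x - model["c"]) / model["s"]
    a = np.exp(-0.5 * np.sum(d * d, axis=1))
    return a / np.sum(a)

def predict(model, x, u):
    """F_m(x, u) = sum_k g_mk(x) [A_mk (x - c_k) + B_mk u]  (Eqs. 26-27, 30)."""
    g = gating(model, x)
    f = np.einsum("kij,kj->ki", model["A"], x - model["c"]) + model["B"] @ u
    return g @ f

def update(model, x, u, xdot, tau_F=20.0, dt=0.01):
    """One gradient step on the squared prediction error (Eq. 33)."""
    g = gating(model, x)
    e = xdot - predict(model, x, u)                      # prediction error e_m
    for k in range(len(g)):
        model["A"][k] += (dt / tau_F) * np.outer(e, g[k] * (x - model["c"][k]))
        model["B"][k] += (dt / tau_F) * np.outer(e, g[k] * u)
    return e
```

Repeated calls to `update` on observed (x, u, ẋ) triples drive the weighted local models toward the data, which corresponds to the accommodation step of a schema.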

We simulated an agent swinging a single pendulum and sometimes exchanging it for another one after a certain period. We defined three types of pendulums. Fig. 4 gives an overview of the pendulum.

θ̈ = −(g/l) sin(θ) − (μ/(m l²)) θ̇ + (1/(m l²)) u    (35)

pendulum 1 = {l = 2, m = 0.5, μ = 0.1}    (36)

pendulum 2 = {l = 1, m = 1, μ = 0.1}    (37)

pendulum 3 = {l = 0.5, m = 4, μ = 0.1}.    (38)

We set the acceleration of gravity g to 10 and designed a controller for exploration a priori. The agent's controller was defined as a simple coloured-noise generator, an Ornstein–Uhlenbeck process:

du(t)/dt = −c u(t) + k dB_t/dt.    (39)

To explore the pendulum's dynamics and to become skilled in the use of the tools, the agent initially swings the pendulum randomly. We set c = 1 and k = 1, so the variance of this stochastic process was σ_u = 1. We also limited the driving torque u to the range [−T_max, T_max] and set T_max = 10. After acquiring good multiple forward models, the agent obtained a new controller by exploiting the multiple forward models. The usage of these models is discussed elsewhere (Doya et al., 2000; Taniguchi & Sawaragi, 2005), so here we do not describe how an agent obtains controllers. The state space was 3-dimensional, i.e. x = (θ, θ̇, 1). The third component was retained as a constant term in the linear regression, so essentially the state space was 2-dimensional. θ was restricted to [−3π, 3π]; when θ left this region, it was reset to an initial point decided randomly within [−π, π]. We set the gating-function parameters to c_k^λ = (k − 3)π and σ_k^λ = diag(1, 1, 1), where k = 0, …, 6. These settings characterize the basis functions of a normalized Gaussian network as a general function approximator. The learning time constants⁶ were τ_F = τ_Σ = 20 and p = 0.98 (τ_p = dt/(1 − p) = 5; see Eq. (52)). The difference in time constants between modular acquisition and modular selection is important in the general modular learning architecture. The significance parameter α was set to 0.005. The agent used the first pendulum for 2000 s and then exchanged it for another one. After 4000 s, it started using a third pendulum. After learning these three pendulums, the agent selected each pendulum randomly (see Fig. 5).

⁶ In this study, we focus not on the adaptive nonlinear function approximator, but on incremental acquisition of multiple internal models. Therefore, we set adequate gating functions a priori.

Fig. 5. Top: ratio in which each schema was selected in each 10 s timestep; bottom: time course of schema activities. Schema 4 was created by sudden noise that comes in when the system switches, but it was never used after its creation.

5.2. Results

Fig. 5 shows the time course of schema activities (defined in Eq. (20)). Schema activity shows how well a schema suits the environmental dynamics. First, pendulum 1 was presented. While the agent controlled pendulum 1, the first schema assimilated the input–output data (assimilation) and modified itself (accommodation). The forward model F̂_1 became accurate, and the variance–covariance matrix Σ̂_1 converged to that of the noise of the target system during the learning phase. Then, pendulum 1 was switched to pendulum 2. The prediction error became larger than that usually output from pendulum 1, whose usual error was estimated by Σ̂_1. Therefore, observing V_1, which was calculated from the short-term averaged R_m, made the first schema recognize that the target dynamics had changed. Then, the second schema was created. Similar processes occurred in the second and third schemata after that. After the learning phases, pendulum 1 was presented again (t = 6000 s). At that time, R_1 became small, and the first schema was activated again according to Eqs. (20)–(23). Each schema was activated alternately, as described above. The activities were clearly driven by which pendulum was in use. Fig. 5 also shows the results of schema differentiation and which schemata were selected in each timestep. The schemata were clearly differentiated into four parts, each corresponding to a pendulum or to sudden noise. When the system dynamics change, sudden noise that is not modeled by any schema corresponding to a stationary environment comes in. All existing schemata rejected this noise, so a new schema (schema 4) was generated. However, this schema did not correspond to any of the existing stationary environments and was therefore never used after its creation. Fig. 5 shows that the schema model allocated a reasonable number of modules for the temporally changeable environment. The forward models obtained in each timestep on the section (θ̇, u) = (0, 0) are shown in Fig. 6. The local experts obtained within the first schema are shown in Fig. 7.

Fig. 6. Forward models obtained for each schema: dotted curves are the three target dynamics, the middle curve is F̂_m, and the other curves represent F̂_m ± σ̂_m. The horizontal axes represent θ [rad], and the vertical axes θ̈ [rad/s²].
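A minimal sketch of the simulated environment of Eqs. (35)–(39) follows (illustrative only; the function names, the Euler integration scheme, and the timestep are our own assumptions, since the paper does not specify them):

```python
import numpy as np

def pendulum_accel(theta, theta_dot, u, l, m, mu, g=10.0):
    """Eq. (35): angular acceleration of a damped, torque-driven pendulum."""
    return -(g / l) * np.sin(theta) - mu / (m * l**2) * theta_dot + u / (m * l**2)

def simulate(T=10.0, dt=0.01, l=2.0, m=0.5, mu=0.1, c=1.0, k=1.0,
             u_max=10.0, seed=0):
    """Euler simulation driven by Ornstein-Uhlenbeck exploration torque (Eq. 39)."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    theta, theta_dot, u = np.pi, 0.0, 0.0
    traj = np.empty((n, 3))
    for i in range(n):
        # OU process: du = -c u dt + k dB_t, clipped to [-u_max, u_max]
        u += -c * u * dt + k * np.sqrt(dt) * rng.standard_normal()
        u = np.clip(u, -u_max, u_max)
        theta_dot += pendulum_accel(theta, theta_dot, u, l, m, mu) * dt
        theta += theta_dot * dt
        traj[i] = (theta, theta_dot, u)
    return traj
```

The default parameters correspond to pendulum 1; passing the parameters of Eqs. (37)–(38) instead reproduces the environment switches used in the experiments.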

5.3. Comparison with several MOSAIC architectures

Here, we compare NGSM with several modular learning architectures based on MOSAIC. MOSAIC is a representative online modular learning architecture in which an adequate module is selected based on Bayes' rule. It is assumed to be a computational model of human multiple internal models. MOSAIC has various learning algorithms. In these architectures, the global forward prediction is given by a weighted sum of linear prediction models f̂_m. The weights λ_m, termed the responsibility signals, are calculated based on Bayes' rule:

x̂̇ = Σ_{m=1}^{#(M)} λ_m(t) f̂_m(x(t), u(t); w_m^f)    (40)

λ_m(t) = λ̂_m(t) exp(−‖e_m(t)‖²/2σ²) / Σ_{i=1}^{#(M)} λ̂_i(t) exp(−‖e_i(t)‖²/2σ²)    (41)

e_m(t) = ẋ(t) − f̂_m(x, u; w_m^f),    (42)
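Eqs. (40)–(42) amount to a prior-weighted softmax over prediction errors. A minimal sketch (function names are ours, not the paper's):

```python
import numpy as np

def responsibilities(errors, priors, sigma):
    """Eq. (41): responsibility signals lambda_m from errors e_m and priors."""
    errors = np.asarray(errors, dtype=float)   # one error vector per module
    priors = np.asarray(priors, dtype=float)   # prior probabilities lambda_hat_m
    lik = np.exp(-np.sum(errors**2, axis=1) / (2.0 * sigma**2))
    w = priors * lik
    return w / w.sum()

def global_prediction(preds, lam):
    """Eq. (40): global forward prediction as a weighted sum of module outputs."""
    return lam @ np.asarray(preds, dtype=float)
```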


Fig. 7. Local experts f_1k obtained for the first schema: the fine dotted curve is the target dynamics, the broad dotted curve is the obtained F̂_1, and the solid lines are the local experts (f_1k). The top shows the schema at t = 0 and the bottom the schema at t = 9000 s.

where λ̂_m(t) is a prior probability. We examine three algorithms in this section. One was proposed by Doya et al. (2000) (MOSAIC). They defined the prior probability as

λ̂_m^T(t) = Π_{k=1}^t exp(−α^k ‖e_m(t)‖²/σ²) / Z^T(t)    (43)

λ̂_m^S(t) = exp(−(x(t) − c_m)^T M^{−1} (x(t) − c_m)) / Z^S(t)    (44)

λ̂_m(t) = λ̂_m^S(t) λ̂_m^T(t),    (45)

where λ̂_m^T is a temporal prior probability, λ̂_m^S is a spatial prior probability, and Z^T and Z^S are normalizing constants. Another algorithm is Hierarchical MOSAIC (HMOSAIC), proposed by Haruno et al. (2003). HMOSAIC predicts λ̂_m(t) from λ(t − 1) by using an inverse model in a higher level of the hierarchy:

λ̂^H(t) = I_h(λ(t − 1); w_h^I),    (46)

where λ̂^H = {λ̂_1^H, λ̂_2^H, …, λ̂_{#(M)}^H} is the prior probability vector output from the higher level, and λ = {λ_1, λ_2, …, λ_{#(M)}} is the vector of responsibility signals of the lower-level modules. The third algorithm is a MOSAIC whose internal-model units are nonlinear function approximators: as an example, we define MOSAIC with a normalized Gaussian network (MOSAIC(NG)) as a MOSAIC having NGnets as forward models f_m.

Table 1
Modular learning architectures and their prior probabilities

Name                                       Prior probabilities
HMOSAIC                                    λ̂_m = λ̂_m^S λ̂_m^H
HMOSAIC w/o spatial prior probability      λ̂_m = λ̂_m^H
MOSAIC                                     λ̂_m = λ̂_m^S λ̂_m^T
MOSAIC(NG)                                 λ̂_m = λ̂_m^T

We can naturally define a responsibility signal λ_mk in NGSM corresponding to the responsibility signal λ_m in MOSAIC:

λ_mk(t) = g_mk(x) P(m).    (47)
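The spatial prior of Eq. (44) and the combination of Eq. (45) can be sketched as follows (illustrative only; the function names are ours, the temporal prior is supplied by the caller rather than recomputed, and the final normalization plays the role of the constants Z^S and Z^T):

```python
import numpy as np

def spatial_prior(x, centers, M_inv):
    """Eq. (44): normalized Gaussian spatial priors over the modules."""
    d = np.asarray(x) - np.asarray(centers)        # (n_modules, dim)
    q = np.einsum("mi,ij,mj->m", d, M_inv, d)      # (x - c_m)^T M^-1 (x - c_m)
    p = np.exp(-q)
    return p / p.sum()

def combined_prior(spatial, temporal):
    """Eq. (45): elementwise product of spatial and temporal priors, renormalized."""
    w = np.asarray(spatial) * np.asarray(temporal)
    return w / w.sum()
```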

This shows that the computational difference between NGSM and MOSAIC lies in how the responsibility signal is calculated. The next question is how this difference affects the process through which multiple forward models are obtained. Table 1 shows the prior probabilities assumed by each learning architecture. MOSAIC(NG) does not have a spatial prior probability; however, the NGnet itself has spatial gating functions g_mk. Additionally, we use a suffix * to indicate that the initial parameters of the forward models are set to be almost optimal, e.g. MOSAIC*, HMOSAIC*, and MOSAIC(NG)*. The initial parameters of forward models without the suffix * are set to random values. We tested several MOSAIC architectures in the same environment using 28 modules (4 × 7), which equals the number of local linear modules obtained in the NGSM simulation. We introduced spatial locality similar to that in the NGSM simulation to the modules in MOSAIC. We set the spatial prior parameters in MOSAIC to c_m = (m^(7) − 3)π and M_m = diag(1, 1, 1), where m =


Fig. 8. Time courses of smoothed squared prediction errors with each modular learning architecture. Each dotted line represents the standard deviation.

(1, …, 28) and m^(7) ≡ m (mod 7).⁷ MOSAIC(NG) has four modules, each with the same NGnets as NGSM.

⁷ These spatial prior probabilities are almost the same as the parameters acquired in Doya et al. (2000); they are almost optimal values.

Fig. 8 shows the smoothed time course of the averaged prediction errors of six experiments with NGSM, MOSAIC*, HMOSAIC*, and MOSAIC(NG)*. It shows that MOSAIC* and HMOSAIC* could not obtain accurate multiple forward models in nonlinear and nonstationary environments. MOSAIC(NG)* outperformed the other architectures. Although NGSM is not as accurate as MOSAIC(NG)*, it is better than MOSAIC* and HMOSAIC*, which are composed of linear models. The averaged errors in the testing phase (6000 < t < 9000) are shown in Fig. 9 for each pendulum. These errors also show that NGSM and MOSAIC(NG) could cope with a nonlinear and nonstationary environment. This shows that acquiring multiple linear internal models in nonlinear and nonstationary environments is difficult without adequate

initial parameters in MOSAIC and HMOSAIC. Without spatial prior probabilities, it was almost impossible for HMOSAIC to cope with nonlinear and nonstationary environments. Second, we compared the differentiation and adaptation processes of NGSM and MOSAIC. In MOSAIC, we set p = 0.1 and σ = 0.1. We initially set p = 0.98, the same as in the NGSM experiment, but MOSAIC could not obtain properly separated multiple internal models. This was because the MOSAIC architecture manages spatial modularity in the same way as temporal modularity, although the two have different time constants. When σ was set higher, all modules sharing the same spatial prior probability aggregated into the same local forward model. We therefore searched for good metaparameters for this target dynamics. The suitable parameter range was very narrow and depended on the range of ẋ produced by the target dynamics, because the metaparameter σ is not a dimensionless number. In contrast, the metaparameter α of the schema model, which characterizes the schema differentiation process, is easy to tune because


Fig. 9. Averaged prediction errors calculated by each modular learning architecture during the testing phase.

it is a dimensionless number. How large α should be set is independent of the range of ẋ and of other task-dependent parameters. The change in the position gain ([Â]_21 = ∂θ̈/∂θ) of the four prediction models for each architecture is shown in Fig. 10. We focused on the dynamics near (θ, θ̇) = (0, 0); the MOSAIC parameters of f̂_4, f̂_11, f̂_18, and f̂_25 are shown in the figure. The figure shows that NGSM acquired multiple internal models incrementally. MOSAIC also acquired differentiated internal models when adequate initial parameters and σ were given beforehand. Next, we evaluated how well the learned multiple internal models discriminated the pendulums. To quantify the discrimination, we introduce the number of channels M based on the mutual information I(X; Y).

Fig. 10. Differentiation and adaptation processes in NGSM (top) and MOSAIC (bottom).
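The number of channels of Eq. (48) can be estimated directly from paired selection records. A minimal sketch (names are ours; the probabilities are empirical frequencies):

```python
import numpy as np

def number_of_channels(pendulum_ids, module_ids):
    """Eq. (48): M = exp(I(X; Y)) from paired pendulum/module selections."""
    x = np.asarray(pendulum_ids)
    y = np.asarray(module_ids)
    I = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            p_xy = np.mean((x == a) & (y == b))    # joint selection frequency
            if p_xy > 0:
                p_x = np.mean(x == a)
                p_y = np.mean(y == b)
                I += p_xy * np.log(p_xy / (p_x * p_y))
    return np.exp(I)
```

Perfect one-to-one discrimination of three pendulums yields M = 3, while a constant or random module assignment yields M = 1, matching the interpretation given below Eq. (48).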

M = exp(I(X; Y)) = exp( Σ_{x∈X, y∈Y} P(x, y) log [P(x, y) / (P(x)P(y))] ),    (48)

where P(x) is the probability that the x-th pendulum is selected, P(y) is the probability that the y-th module in MOSAIC or the y-th schema in NGSM is selected, and P(x, y) is the probability that the x-th pendulum and the y-th module/schema are selected at the same time. These probabilities were calculated from the experimental results. In the next experiment, the pendulums were switched cyclically every T_c s during 6000 s. If a modular learning architecture discriminated the shifts among the pendulums completely, M became 3 in this case. In contrast, if it could not distinguish the shifts, or if it discriminated completely at random, M became 1. Therefore, the number of channels M represents how many pendulums the modular learning architecture distinguished properly. Six experiments were executed for each condition, and M was calculated. Fig. 11 shows the averaged M with error bars for each condition. NGSM and MOSAIC(NG)* were able to discriminate the pendulums. However, MOSAIC(NG), whose initial parameters were set at random, could not discriminate

Fig. 11. Number of channels that emerged through the interactions with the cyclically switching environment.

the pendulums. In contrast, NGSM, whose initial parameters were also set at random, could discriminate them. It was also difficult for HMOSAIC* to distinguish the pendulums. Additionally, Fig. 11 shows that fewer channels emerged in MOSAIC(NG)* and NGSM when the environment changed more frequently. It has been reported that frequent switching of the target dynamics without any presentation of audio and/or visual cues prevents people from acquiring multiple internal models (Brashers-Krug et al., 1996; Karniel & Mussa-Ivaldi, 2002; Osu


Fig. 13. Desired trajectories xd and the time series produced by NGSM (top) and HMOSAIC* (bottom) using inverse models.

Fig. 12. Controller structure used in the control task.

et al., 2004). Fig. 11 shows that the schema model also has this property.

Finally, the controllers in NGSM and HMOSAIC* were tested. The controller structure is shown in Fig. 12. Once the schema activities or responsibility signals are calculated, an adequate forward model F can be obtained as a weighted sum of the F_m. To track the desired trajectory x_d, the inverse model I is calculated using Eq. (8). The desired trajectory x_d = (x_d1, x_d2) ≡ (x_d1, ẋ_d1) was defined as

x_d1 = { (π/2)(1 + cos(2πt))  if sin(πt) > 0
       { 1/10                 if sin(πt) ≤ 0.    (49)

We set the feedback gain K = [10 10 10]. In this experiment, the pendulum was switched from pendulum 1 to pendulum 2 at t = 150 s. Fig. 13 shows the desired trajectories and the time series produced by NGSM and HMOSAIC*. Both modular learning architectures coped with the environmental change and continued to track the desired trajectories. No significant differences were observed in this experiment, in spite of the differences in prediction errors. The results in Fig. 11 thus show that the schema model is effective at keeping consistency within a schema and at distinguishing several unknown sets of dynamics, while Fig. 8 shows that a tuned MOSAIC(NG) is the most effective at reducing immediate prediction errors.

6. Conclusion

The schema model is an alternative computational model for online acquisition of multiple internal models. As a specific example of a schema model, we described a schema model

with a normalized Gaussian network (NGSM). NGSM retains several essential features of MOSAIC: a module's responsibility signal is determined mainly by the prediction errors of the modules, and the local forward models are linear. NGSM has two novel characteristics. It achieves incremental acquisition of nonlinear forward models based on hypothesis-testing theory rather than on a Bayesian framework. The schema model is not a modular learning architecture that decomposes a nonlinear dynamics into several local linear models, but one that decomposes a nonstationary dynamics into several temporally stationary dynamics. The schema model itself does not have the capability to cope with nonlinear dynamics; in NGSM, a normalized Gaussian network, which is a basic locally weighted regression method, plays the role of decomposing the nonlinear dynamics into linear models. Several experiments showed that decomposing nonstationary and nonlinear systems at the same time is difficult for MOSAIC and HMOSAIC, which contain local linear models. In our experiments, NGSM and MOSAIC(NG) could cope with nonstationary and nonlinear dynamics. These modular learning architectures decompose a nonstationary and nonlinear dynamics into several stationary nonlinear dynamics, i.e. the several pendulums. Additionally, it was shown that the acquisition of multiple internal models in the schema model does not depend on the initial parameters of the forward models, in contrast to MOSAIC(NG) (see Fig. 11). However, the duration over which the variance–covariance matrix S is estimated is critical in the schema model. If the environment changes before S converges, the hypothesis testing will not correctly detect the change in dynamics. To avoid creating unnecessary new schemata, S must initially be set to a large value so that incoming data can be assimilated. If the stationary environments switch rapidly, the schemata will not differentiate, multiple internal models will not be obtained, and the system tries to assimilate the two dynamics into a single schema. Therefore, the length of the presentation of each dynamics is a critical issue. However, how the differentiation process depends on the length of the presentation has not been revealed mathematically; it must be analysed in future work. In the context of motor memory consolidation,


it is difficult for the human central nervous system to obtain multiple internal models in frequently switching environments (Brashers-Krug et al., 1996; Karniel & Mussa-Ivaldi, 2002; Osu et al., 2004). It was shown that the schema model also has this characteristic. The notion of modular learning with a control architecture has been proposed as a computational model of the cerebellum. Although this idea is supported by fMRI-based experiments (Imamizu et al., 2000, 2003), such neural imaging experiments have been unable to identify the specific computational architecture underlying the learning of multiple forward models. While there are several possible computational models for computing multiple internal models, it was shown that the schema model could be one of the alternatives.

Acknowledgments

This work was supported in part by the Centre of Excellence for Research and Education on Complex Functional Mechanical Systems (The 21st Century COE programme of the Ministry of Education, Culture, Sports, Science and Technology, Japan). We thank K. Doya and the anonymous reviewers for helpful discussions and advice.

Appendix A

In this appendix, we derive a criterion for a schema to decide whether it assimilates incoming samples, based on a square acceptance region. When stochastic variables X_i have a standard normal distribution, Z = Σ_{i=1}^n X_i² has a χ² distribution with n degrees of freedom. Therefore, the distribution of the averaged squared subjective errors (1/N) Σ_{i=1}^N R²_m(t_i) can be calculated. However, in the context of online learning, weighted averaged errors are often preferred. We therefore introduce a weighted averaged subjective error R̄²_mi with persistency parameter p (0 ≤ p < 1):

R̄²_mi(t) = (1 − p) Σ_{k=0}^∞ p^k · R²_mi(t − k dt)    (50)

 = (1 − p) · R²_mi(t) + p · R̄²_mi(t − dt)    (51)

 ≃ (1/τ_p) ∫_0^∞ exp(−s/τ_p) R²_mi(t − s) ds,    (52)

where dt is the observation timestep and τ_p = dt/(1 − p). In continuous time, the third equation is obtained by letting dt approach zero. Since this weighted averaged subjective error can be calculated incrementally, as shown above, there is no need to store past errors. This averaging operation corresponds to the concept of temporal continuity in MMRL (Doya et al., 2000). The averaged subjective error R̄²_mi(t) multiplied by (1 + p)/(1 − p) has an approximate χ² distribution with (1 + p)/(1 − p) degrees of freedom.⁸ We denote the χ² cumulative distribution function with n degrees of freedom as χ²(x, n). In our framework, each schema tests the hypothesis (F = F̂_m) with a χ²-test, and it does not assimilate the experience if the hypothesis is rejected.

⁸ The approximation is derived by fitting the first and second moments.

We define H_m as the hypothesis that the current environment does not contradict the m-th schema's prediction, and H_mi as the hypothesis that the current environment does not contradict the m-th schema's i-th dimensional prediction. The relationship between H_m and H_mi is thus

H_m = H_m1 ∧ H_m2 ∧ ··· ∧ H_mn    (53)

H_mi ⟺ F_i = F̂_mi    (54)

 ⟺ R̄²_mi(t) = 0.    (55)

In a hypothesis-testing framework, the acceptance of a hypothesis is decided based on a significance level α. Using a simple step function sgn to express whether or not a hypothesis is accepted at significance level α, we obtain a truth-value function μ whose codomain is a set of truth values:

μ(H_m) = μ(⋀_{i=1}^n H_mi)    (56)

 = min_{i=1..n} μ(H_mi)    (57)

 = min_{i=1..n} sgn(χ_c²(n_p · R̄²_mi(t), n_p) − α)    (58)

 = sgn(min_{i=1..n} χ_c²(n_p · R̄²_mi(t), n_p) − α)    (59)

 = sgn(χ_c²(n_p · max_{i=1..n} R̄²_mi(t), n_p) − α)    (60)

 = sgn(χ_c²(n_p · ‖R̄_m‖²_∞(t), n_p) − α)    (61)

sgn(x) = 1 if x > 0, and 0 otherwise.    (62)

Here, n_p = (1 + p)/(1 − p) and χ_c²(x, n) = 1 − χ²(x, n). We therefore define the schema activity:

V_m(t) = χ_c²(n_p · ‖R̄_m‖²_∞(t), n_p).    (63)

The final result is a criterion for judging whether or not the schema assimilates the environment:

μ(H_m) = sgn(V_m(t) − α).    (64)
The criterion is based on the dimensionless vector R_m, so it does not depend on the scale of the target dynamics or on the amount of its noise, unlike simple thresholding of the prediction error.

Appendix B

In this appendix, we derive a criterion for a schema to decide whether or not it assimilates incoming samples, based on a circular acceptance region. R_m has a multidimensional standard normal distribution, so ‖R_m‖² has a χ² distribution with n degrees of freedom. We therefore introduce a weighted averaged squared subjective error with persistency parameter p (0 ≤ p < 1), in the same way as in Appendix A:

‖R̄_m‖²(t) = (1 − p) Σ_{k=0}^∞ p^k · ‖R_m‖²(t − k dt).    (65)

By the same approximation as in Appendix A, ‖R̄_m‖² multiplied by n_p has a χ² distribution with n_p · n degrees of freedom, where n is the dimension of R_m. In this case, the schema activity is defined as

V_m(t) = χ_c²(n_p · ‖R̄_m‖²(t), n_p · n).    (66)

The final result is a criterion for judging whether or not the schema assimilates the environment:

μ(H_m) = sgn(V_m(t) − α).    (67)
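For illustration, the assimilation test of the appendices can be sketched as follows (a minimal Python sketch under our own naming; the χ² CDF is computed with the standard lower-incomplete-gamma series, since the paper does not prescribe an implementation, and `R2_bar` stands for the largest per-dimension weighted-average squared error of Eqs. (50)–(51)):

```python
import math

def chi2_cdf(x, n):
    """Chi-squared CDF with (possibly non-integer) dof n, via the
    regularized lower incomplete gamma series P(n/2, x/2)."""
    a, z = n / 2.0, x / 2.0
    if z <= 0.0:
        return 0.0
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    total = term
    k = 0
    while term > 1e-15 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return min(total, 1.0)

def update_R2(R2_bar, R2_now, p):
    """Eq. (51): incremental weighted average of squared subjective errors."""
    return (1.0 - p) * R2_now + p * R2_bar

def schema_activity(R2_bar, p):
    """Eq. (63): V_m = chi2_c(n_p * R2_bar, n_p) with n_p = (1+p)/(1-p)."""
    n_p = (1.0 + p) / (1.0 - p)
    return 1.0 - chi2_cdf(n_p * R2_bar, n_p)

def assimilate(R2_bar, p, alpha=0.005):
    """Eq. (64): the schema assimilates the sample iff V_m(t) > alpha."""
    return schema_activity(R2_bar, p) > alpha
```

With p = 0.98 (n_p = 99) and α = 0.005, as in Section 5.1, a sustained averaged squared error around 1 (the expected value under a correct model) is assimilated, while one around 2 is rejected and triggers schema differentiation.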

References

Albus, J. S. (1975). A new approach to manipulator control: The cerebellar model articulation controller (CMAC). Journal of Dynamic Systems, Measurement and Control, 97(3), 220–227.
Atkeson, C. G., et al. (2000). Using humanoid robots to study human behavior. IEEE Intelligent Systems and Their Applications, 15(4), 46–56.
Atkeson, C. G., Moore, A., & Schaal, S. (1997). Locally weighted learning. Artificial Intelligence Review, 11, 11–73.
Brashers-Krug, T., et al. (1996). Consolidation in human motor memory. Nature, 382, 252–255.
Doya, K. (1999). What are the computations of the cerebellum, the basal ganglia, and the cerebral cortex? Neural Networks, 12, 961–974.
Doya, K., et al. (2000). Multiple model-based reinforcement learning. Neural Computation, 14, 1347–1369.
Golub, G. H., & Van Loan, C. F. (1996). Matrix computations (3rd ed.). Johns Hopkins University Press.
Haruno, M., Wolpert, D. M., & Kawato, M. (2001). MOSAIC model for sensorimotor learning and control. Neural Computation, 13, 2201–2220.
Haruno, M., Wolpert, D. M., & Kawato, M. (2003). Hierarchical MOSAIC for movement generation. International Congress Series, 1250, 575–590.
Imamizu, H., et al. (2000). Human cerebellar activity reflecting an acquired internal model of a new tool. Nature, 403, 192–195.
Imamizu, H., et al. (2003). Modular organization of internal models of tools in the human cerebellum. Proceedings of the National Academy of Sciences USA, 100, 5461–5466.
Imamizu, H., et al. (2004). Functional magnetic resonance imaging examination of two modular architectures for switching multiple internal models. The Journal of Neuroscience, 24(5), 1173–1181.
Jacobs, R. A., et al. (1991). Adaptive mixtures of local experts. Neural Computation, 3(1), 79–87.
Flavell, J. H. (1963). The developmental psychology of Jean Piaget. Van Nostrand Reinhold.
Karniel, A., & Mussa-Ivaldi, F. A. (2002).
Does the motor control system use multiple models and context switching to cope with a variable environment? Experimental Brain Research, 143, 520–524.
Kawato, M. (1999). Internal models for motor control and trajectory planning. Current Opinion in Neurobiology, 9, 718–727.
Kurogi, S., & Ren, S. (1997). Competitive associative networks for function approximation and control of plants. In Proc. NOLTA97 (pp. 775–778).


Mahalanobis, P. C. (1936). On the generalized distance in statistics. Proceedings of the National Institute of Sciences India, 2(1), 49–55.
Moody, J., & Darken, C. J. (1989). Fast learning in networks of locally-tuned processing units. Neural Computation, 1, 281–294.
Murphy, K. (1998). Learning switching Kalman-filter models. Compaq Cambridge Research Lab Tech Report 98-10.
Nakanishi, J., Farrell, J., & Schaal, S. (2005). Composite adaptive control with locally weighted statistical learning. Neural Networks, 1, 71–90.
Okada, M., Tatani, K., & Nakamura, Y. (2002). Polynomial design of the nonlinear dynamics for the brain-like information processing of whole body motion. In International conference on robotics & automation.
Osu, R., et al. (2004). Random presentation enables subjects to adapt to two opposing forces on the hand. Nature Neuroscience, 7(2), 111–112.
Poggio, T., & Girosi, F. (1990). Networks for approximation and learning. Proceedings of the IEEE, 78(9), 1481–1497.
Sato, M., & Ishii, S. (2000). On-line EM algorithm for the normalized Gaussian network. Neural Computation, 12, 407–432.
Schaal, S., & Atkeson, C. G. (1997). Constructive incremental learning from only local information. Neural Computation, 10(8), 2047–2084.
Schmidt, R. A. (1975). A schema theory of discrete motor skill learning. Psychological Review, 82(4), 225–260.
Tani, J., & Nolfi, S. (1999). Learning to perceive the world as articulated: An approach for hierarchical learning in sensory-motor systems. Neural Networks, 12, 1131–1141.
Taniguchi, T., & Sawaragi, T. (2003a). An approach of self-organizational learning system of autonomous robots by grounding symbols through interaction with their environment. In SICE annual conference 2003 proceedings (pp. 2259–2264).
Taniguchi, T., & Sawaragi, T. (2003b). Assimilation and accommodation for self-organizational learning of autonomous robots: Proposal of dual-schemata model. In IEEE international symposium on CIRA 2003 proceedings (pp. 277–282).
Taniguchi, T., & Sawaragi, T. (2004a). Design and performance of symbols self-organized within an autonomous agent interacting with varied environments. In IEEE international workshop on RO-MAN proceedings in CD-ROM.
Taniguchi, T., & Sawaragi, T. (2004b). Self-organization of inner symbols for chase: Symbol organization and embodiment. In IEEE international conference on SMC 2004 proceedings in CD-ROM.
Taniguchi, T., & Sawaragi, T. (2005). Adaptive organization of generalized behavioral concepts for autonomous robots: Schema-based modular reinforcement learning. In IEEE international symposium on CIRA 2005 proceedings in CD-ROM.
Taniguchi, T., & Sawaragi, T. (2006). Incremental acquisition of behavioral concepts through social interactions with a caregiver. In Artificial life and robotics (AROB 11th '06) proceedings in CD-ROM.
Vijayakumar, S., & Schaal, S. (2000). Fast and efficient incremental learning for high-dimensional movement systems. In Proceedings of ICRA2000, international conference on robotics and automation (vol. 2, pp. 1894–1899).
Wolpert, D. M., Doya, K., & Kawato, M. (2003). A unifying computational framework for motor control and social interaction. Philosophical Transactions of the Royal Society of London Series B, 358, 593–602.
Wolpert, D. M., & Ghahramani, Z. (2000). Computational principles of movement neuroscience. Nature Neuroscience Supplement, 3, 1212–1217.
Wolpert, D. M., Ghahramani, Z., & Jordan, M. I. (1995). An internal model for sensorimotor integration. Science, 269, 1880–1882.
Wolpert, D. M., & Kawato, M. (1998). Multiple paired forward and inverse models for motor control. Neural Networks, 11, 1317–1329.
Xu, L., et al. (1995). An alternative model for mixtures of experts. In Advances in neural information processing systems (pp. 640–663). Cambridge: MIT Press.