
Journal of Theoretical Biology 309 (2012) 88–95


Tuning positive feedback for signal detection in noisy dynamic environments

Anders Johansson a,c, Kai Ramsch b, Martin Middendorf b, David J.T. Sumpter a,*

a Mathematics Department, Uppsala University, Uppsala, Sweden
b Institut für Informatik, Universität Leipzig, Leipzig, Germany
c Mathematics Department, University of Gävle, Gävle, Sweden

Highlights

- We study algorithms used by decentralised biological systems to track signals.
- We show that these algorithms can detect weak signals in the presence of noise.
- This work has applications in biologically inspired algorithms for optimisation.

Article history: Received 19 December 2011; Received in revised form 15 May 2012; Accepted 22 May 2012; Available online 31 May 2012

Keywords: Self-organisation; Ant foraging; Bandit problems; Biochemical systems

Abstract

Learning from previous actions is a key feature of decision-making. Diverse biological systems, from neuronal assemblies to insect societies, use a combination of positive feedback and forgetting of stored memories to process and respond to input signals. Here we look at how these systems deal with a dynamic two-armed bandit problem of detecting a very weak signal in the presence of a high degree of noise. We show that by tuning the form of positive feedback and the decay rate to appropriate values, a single tracking variable can effectively detect dynamic inputs even in the presence of a large degree of noise. In particular, we show that when tuned appropriately a simple positive feedback algorithm is Fisher efficient, in that it can track changes in a signal on a time of order L(h) = (σ/|h|)², where |h| is the magnitude of the signal and σ the magnitude of the noise.

© 2012 Published by Elsevier Ltd.

1. Introduction

Many biological systems have evolved to detect and respond to noisy signals (Green and Swets, 1966; van Drongelen, 2006; Couzin, 2009; Smith and Ratcliff, 2004; Brandman and Meyer, 2008; Brandman et al., 2005; Sumpter and Pratt, 2009; Marshall et al., 2009). Neuronal assemblies integrate noisy sensory information to make decisions about how to respond to stimuli (Gold and Shadlen, 2007; Feng et al., 2009; Bogacz et al., 2006; Brown et al., 2005; Simen et al., 2006). Ant colonies use pheromones to integrate information about the location of food and other resources (Detrain and Deneubourg, 2008; Bonabeau et al., 1997). These systems accumulate information based on past experience and continuously learn the best action to take. While one can argue whether, for example, biochemical systems can be said to be 'learning' or 'making decisions', the function of these systems is similar at many different levels of biological organization (Trewavas, 2009). Given weak, noisy evidence for one of a number of options these systems attempt to choose the best available option.

* Corresponding author. E-mail address: [email protected] (D.J.T. Sumpter).

0022-5193/$ - see front matter © 2012 Published by Elsevier Ltd.
http://dx.doi.org/10.1016/j.jtbi.2012.05.023

A striking observation about the above systems is the similarities in the mechanisms underlying decision-making (Couzin, 2009; Marshall et al., 2009). The process by which a system learns can be described in terms of one or more tracking variables which encode the system's propensity to choose a particular option. Furthermore, the dynamics of these tracking variables usually involve three key components: (1) an increase in response to the noisy input signal; (2) a sharp increase at a threshold of the input signal or the tracking variable itself; and (3) an exponential decay in the absence of external signal. We now describe three examples with these components and to which the models developed in this paper should apply: ant pheromone trails, neuronal assemblies and biochemical reactions.

Many species of ants when foraging for food leave pheromone trails which track the existence and quality of food at various locations. When offered the choice between n options the amount of pheromone at time step t + 1 leading to option i can be modelled as a difference equation:

C_{t+1}(i) = (1 − λ)C_t(i) + h_i (C_t(i) + K)^α / Σ_{j=1}^{n} (C_t(j) + K)^α   (1)

where λ is the evaporation rate, h_i is the input signal indicating the quality of option i, and α and K determine the shape of the


response (Nicolis and Deneubourg, 1999). When α > 2 the second term in Eq. (1) provides a sigmoidal-like response, whereby if one of the C_t(i) exceeds all the other C_t(j) then it becomes disproportionally likely to be chosen on future time steps. Similar responses, sometimes referred to as quorums, have been observed in the decision-making of a wide range of other group-living animals (Amé et al., 2006; Sumpter et al., 2008; Sumpter and Pratt, 2009). This model has been widely studied and validated against experimental data (Deneubourg et al., 1990; Goss et al., 1989; Dussutour et al., 2005; Bonabeau et al., 1997). Here the variables C_t(i) are deterministic, but a corresponding stochastic model is easily constructed (Nicolis et al., 2003). This model forms the basis for the feedback in the ACO metaheuristic (Dorigo and Stützle, 2004) and other applications in swarm intelligence (Bonabeau et al., 1999).

The leaky competing accumulator model for two competing neuronal assemblies x_1 and x_2 is a stochastic differential equation with the form (Usher and McClelland, 2001; Feng et al., 2009; Brown et al., 2005):

dx_1 = (−λx_1 − βf(x_2) + h_1) dt + σ dW_1   (2)

dx_2 = (−λx_2 − βf(x_1) + h_2) dt + σ dW_2   (3)

where λ is the rate of memory decay, h_i is the input signal, and β is a factor controlling the influence of the competing neuron. The noise is now explicitly represented as standard Brownian motion, or a Wiener process, with standard deviation of size σ. As in Eq. (1), the function f(x) is defined to be sigmoidal, but instead of x_1 having an excitatory effect on itself, x_1 laterally inhibits x_2 and vice-versa. As a result the tracking variable for which h_i is greater becomes disproportionally stronger. For parameter values where decision accuracy is optimised, this model fits experimental data in which monkeys attempt to ascertain the direction of a noisy random-dot stimulus (Feng et al., 2009).

Sigmoidal positive feedback responses are a ubiquitous feature of biochemical systems (De Jong, 2002). Indeed, it is difficult to write down a model of gene regulation or molecular dynamics that does not involve some form of sigmoidal or Hill response function. A basic function of cells during, for example, organism development is to track and respond to signals emitted by other cells (Lewis, 2008). When interacting with each other and their environment, these systems employ quorum-like responses to filter out noise in the signals they receive (Tanouchi et al., 2008; Andrews et al., 2006).

Here we investigate tracking variables for decision-making in a dynamic noisy environment. We first present a dynamic problem of tracking the difference between two variables and propose a learning system based on a tracking variable which encodes an agent's memory. We identify the optimal parameters for this model and show that they depend on the strength of the signal, implying that the system is scale dependent. We extend this result to the case where there are two alternatives given and we wish to determine which gives the highest expected reward at the present point in time. We show how this result may be used to tune positive feedback algorithms to respond optimally in dynamic environments.
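As a concrete illustration, the ant-trail dynamics of Eq. (1) can be sketched in a few lines of Python (our sketch; the parameter values are illustrative, not taken from the paper):

```python
import numpy as np

def pheromone_step(C, h, lam=0.05, alpha=2.0, K=1.0):
    """One update of Eq. (1): evaporation of the pheromone C plus a
    reinforcement whose share is a sigmoidal function of C itself."""
    weights = (C + K) ** alpha
    return (1.0 - lam) * C + h * weights / weights.sum()

C = np.array([0.1, 0.1])   # equal initial pheromone on two options
h = np.array([1.0, 0.8])   # option 0 offers slightly better quality
for _ in range(2000):
    C = pheromone_step(C, h)
print(C)                   # option 0 ends up with more pheromone
```

Note the positive feedback: because the reinforcement share grows with C_t(i)^α, an early advantage is amplified rather than averaged away.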

2. Tracking reward differences

We consider the following setup for decision-making in a noisy environment. On each time step t an agent chooses one of two actions, A_t ∈ {−1, +1}, on the basis of observations of a single signal (all symbols used in the paper are summarised in Table 1). For example, in chemotaxis some bacteria species set tumbling frequency on the basis of noisy information about the nutrient gradient they experience (Andrews et al., 2006; Barkai and


Table 1
List of parameters, variables and symbols used and their interpretation in the tracking systems.

A_t: Action an agent takes at time t
h_t: Hidden signal to be detected at time t
N_t, N_{t,A_t}: Gaussian noise at time t
σ: Standard deviation of Gaussian noise
x_t: Observed signal at time t
h: Constant absolute value used for h_t
L(h): Lower bound for detection of a signal h
y_t: Tracking variable at time t
F(y_t, x_t): Function describing the dynamics of y_{t+1}
λ: Step size of memory decay
β: Strength of non-linearity
y*: Stable fixed point of y_t
λ_op: Operational value of λ
s: Performance of a simulation run
r_t(A_t): Reward an agent obtains from action A_t at time t
V: Constant reward obtained
z_{+1,t}, z_{−1,t}: Pheromone values for action +1 or −1 at time t
β_op: Operational value of β
λ̃: Parameter values for λ maximising the performance s

Fig. 1. The input signal h_t, the observed signal x_t and an idealised y_t for detecting signal differences. The time until the action changes in response to the input signal, L(h_t), is always of the order (σ/h_t)².

Leibler, 1997). For some hidden signal h_t, the observed signal is

x_t = h_t + N_t   (4)

where N_t ~ N(0, σ²) are Gaussian random variables of mean zero and variance σ². We make the problem of learning the correct action difficult by making the magnitude |h_t| small relative to the magnitude of the noise σ. Furthermore, over time h_t changes its sign and we allow the agent no a-priori knowledge of the pattern of these changes. Our aim is to design a system which detects changes in h_t of a particular magnitude as rapidly as possible. Typically, we assume σ to be known and constant. We will also generally assume that the signal h_t is a piecewise constant process as in Fig. 1. These assumptions are not essential to what follows, but they do allow us to simplify the discussion. We note that if h_t is equal to a constant h in the interval [0, t) then the average x̄_t = (1/t) Σ_{s=0}^{t−1} x_s has distribution N(h, σ²/t).

There are proven limits to how fast one can detect a signal subjected to a certain level of noise. The Cramér (1946) and Radhakrishna Rao (1945) inequality and its generalizations, such as the Chapman and Robbins (1951) bound, give a fundamental lower bound on the time necessary to sample in order to detect the sign of h_t. In our scenario the Chapman–Robbins bound shows that in order


to infer, with a small error probability, the sign of h = E(x_t), we always need to observe the process for a time of order L(h), where L(h) = (σ/|h|)². We call the ratio |h|/σ the signal-to-noise ratio since h_t and σ can be identified with the signal and noise amplitudes, respectively. Amari (1998) uses the term Fisher efficient to describe systems that learn within a time given by the Cramér–Rao bounds. In this paper, we use the term Fisher efficient to describe systems that learn within a small constant multiplied by L(h). In the dynamic setting the lower bound of Fisher efficiency implies that we must always expect a delay in our knowledge of the sign of h_t of the order of a small constant multiplied by L(h_t).

When h_t = h does not change in time there is a Fisher efficient and scale-robust methodology for estimating h for all h > 0. Under the classical stochastic approximation paradigm one would take the sign of the average input signal, A_{t+1} = sign(x̄_t), and obtain that A_t → sign(h) almost surely (see, e.g., Borkar, 2008). Calculating the log-likelihood of the hypotheses h > 0 and h < 0 given the sequence sign(x̄_t) provides confidence intervals for this estimate. However, these methods cannot be applied when h_t is changing dynamically. For example, if h_t oscillates symmetrically around zero then the stochastic approximation will, due to a failure to unlearn its previous samples, get locked into the one alternative chosen first. Various extensions of the stochastic approximation paradigm attempt to adjust the time-scale (i.e., the constant step-size λ in (5)) to the case of slowly varying signals (see Kushner and Yin, 1997, Chapter 3 for references).

We now take inspiration from the models of biological systems discussed in the Introduction to construct a means of tracking the signal using a single 'pheromone' or memory variable.
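Before turning to the tracking variable, the meaning of the time scale L(h) = (σ/|h|)² can be checked with a short simulation (our illustration, not code from the paper): averaging the observations for a few multiples of L(h) recovers sign(h) reliably.

```python
import numpy as np

h, sigma = 0.05, 1.0
L = int((sigma / abs(h)) ** 2)        # L(h) = (sigma/|h|)^2 = 400
rng = np.random.default_rng(1)

trials, correct = 1000, 0
for _ in range(trials):
    x = h + sigma * rng.standard_normal(4 * L)   # observe for 4 L(h) steps
    correct += np.sign(x.mean()) == np.sign(h)
print(correct / trials)   # close to 1; with far fewer than L(h) samples
                          # the estimate is near chance
```

Here the mean of 4L(h) samples has standard deviation h/2, so the sign is recovered with high probability, in line with the bound above.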
To avoid being too specific about the type of biological system we refer to, we call this 'pheromone' or memory a tracking variable and denote it by y_t. The variable y_t is updated on the basis of the difference between the rewards, i.e. x_s = h_s + N_s, s < t, and transmits information about the sign of h_t, although not necessarily information about the exact value of h_t. We stipulate that y_t has threshold-like positive feedback on itself, so that when y_t is positive it increases and when y_t is negative it decreases. Specifically,

y_{t+1} = F(y_t, x_t) = (1 − λ)y_t + λ[(exp(βy_t) − exp(−βy_t))/(exp(βy_t) + exp(−βy_t)) + x_t] := (1 − λ)y_t + λ(tanh(βy_t) + x_t)   (5)

This tracking variable is designed to have the same sign as h_t. The term (1 − λ)y_t is the tracking signal decay, the function tanh determines the degree to which a signal is reinforced, and x_t is the signal itself. β is the strength of the non-linearity. The action or choice of the agent can be obtained from the tracking variable simply as A_t = sign(y_t).

We now investigate properties of our proposed tracking variable, y_t. When β > 1 and h = 0 the function F(y, 0) in (5) has two stable fixed points and one unstable fixed point at 0 (see Fig. 2). In this case, the addition of a bias, i.e., h > 0, does not necessarily lead to a switch to the steady state associated with the higher rewards. For example, if β = 1.1 and h < 0.019 then the suboptimal choice remains a stable fixed point when the bias is added. For β ≤ 1 there exists a single stable fixed point y* with F(y*, h) = y*. In order to track a signal h > 0 with maximum effectiveness this fixed point should be as far away as possible from zero. The fixed point is the (in this case unique) solution to

y* = tanh(βy*) + h   (6)

Since tanh(y) is monotone increasing, y* is maximised for β = 1. Furthermore, for all y < y*, F(y, h) is maximised when β = 1. We thus conclude that, for arbitrary h, β = 1 is the only parameter which guarantees a single fixed point, while providing the strongest tracking of that signal. When β = 1 and h > 0 is small we, using a Taylor expansion, find an approximate solution to Eq. (6),

y* ≈ 3^{1/3} h^{1/3}   (7)

The rate of convergence to this fixed point is determined by

F′(y*, h) = 1 − λ tanh²(y*) = 1 − λ(y* − h)² ≈ r := 1 − λ 3^{2/3} h^{2/3}

For y near to y* we can approximate Eq. (5), this time accounting for the noise N_t, with the linear stochastic process:

y_{t+1} − y* = F(y_t, h + N_t) − y* = F(y_t, h) − y* + λN_t ≈ F(y*, h) + r(y_t − y*) − y* + λN_t = r(y_t − y*) + λN_t   (8)

In its linearised form this model is equivalent to the standard Uhlenbeck and Ornstein (1930) process. Fluctuations around y* thus have variance

λ² Var(N_t)/(1 − r²) ≈ λσ²/(2 · 3^{2/3} h^{2/3}) < λσ²/(4 h^{2/3})

Fig. 2. (a) The function tanh(βy) with the stable fixed point at zero for β = 0.5 and two stable and one unstable fixed points for β = 1.5; (b) tanh(y) with a neutral fixed point at zero and the translated function tanh(y) + h with h = 0.1 having fixed point y* = (1 + o(1)) 3^{1/3} h^{1/3}.
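The fixed-point approximation in Eq. (7) is easy to verify numerically. The sketch below (ours) iterates the noise-free version of Eq. (5) with β = 1 and compares the resulting fixed point with 3^{1/3}h^{1/3}:

```python
import numpy as np

def fixed_point(h, lam=0.01, iters=200000):
    """Iterate y <- (1-lam)*y + lam*(tanh(y) + h), the noise-free
    Eq. (5) with beta = 1; converges to the y* solving Eq. (6)."""
    y = 0.5
    for _ in range(iters):
        y = (1 - lam) * y + lam * (np.tanh(y) + h)
    return y

for h in (1e-3, 1e-2, 1e-1):
    print(h, fixed_point(h), (3 * h) ** (1 / 3))   # exact vs Eq. (7)
```

For small h the two columns agree closely; for h = 0.1 the higher-order correction to Eq. (7) is already visible.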


In a purely deterministic system (as in Eq. (8)), r determines the rate of convergence of y_t to y* and in order to detect the signal h we would like r to be as small as possible. Hence λ should be as large as possible. However, in the stochastic system λ also appears in the variance of the process. Too large a value of λ will lead to fluctuations which take us away from y*. In particular, detecting a signal requires the standard deviation of the fluctuations around the steady state y* to be smaller than y*. We want to identify an operational value λ_op such that

λ_op^{1/2} σ / (√2 · 3^{1/3} h^{1/3}) = 3^{−1/3} h^{1/3}   (9)

which gives

λ_op ≈ 2 h^{4/3} / σ²   (10)

as the optimal value for detecting a signal of size h. As an example of how our approximation works in practice we simulate Eq. (5) for varying values of λ, and an input signal for which h_t changes sign every 2L(h) = 2σ²/h², where h is the magnitude of h_t. Since L(h) is the theoretical minimum time to detect a signal, the signal is at very best detectable for the last L(h) time steps before the sign of h_t is changed. Fig. 3 shows y_t for h = 0.0156 and σ = 1. The performance varies greatly depending on the value of λ. When λ is too large, y_t fluctuates strongly and often takes the incorrect action (Fig. 3a). When λ is too small y_t responds slowly to changes in h_t (Fig. 3c). For λ = λ_op = 0.0085 (as prescribed by Eq. (9)), the sign of y_t (and thus A_t) is the same as that of h_t.

Our main mathematical result, which we make more precise in the lemma below, is that we predict that the operational value of λ (denoted by λ_op) should be proportional to h^{4/3}. By operational we mean that the system in (5) is characterised by two properties. First, the fluctuations y_t − y* of the system around the fixed point y* should be small enough to make a typical output close to y*. Second, the system should move, with high probability, from any initial value y_0 ∈ [−1, 1] to one close to y* in a time of order O(L) where L = L(h) = (σ/h)². As discussed above, this is the order of time within which the signal h can be detected at all. From (5) it is clear that, excluding extremely rare events, we can assume that y_t stays within the bounded interval [−1, 1].

Lemma 1. Assume that h_t = h > 0 and that y_0 ∈ [−1, 1]. For every ε > 0 and δ > 0, there is a c = c(δ, ε) > 0, such that

Pr{y_t > (1 − δ)y*} > 1 − ε

if

λ = c σ^{−2} h^{4/3}   (11)

and

t ≥ (1/c) · 1/(λ h^{2/3}) = L(h)/c²   (12)

for a sufficiently small c < 1. This lemma is proved in the Appendix. We make no attempt to compute the best possible bound on c.

To test these predictions against numerical simulations, we simulated Eq. (5) for different levels of a fixed bias |h| and decay rate λ. We made 500 runs for three different levels of |h|, each run consisting of 2L(h) iterations of Eq. (5). Since the theoretical minimum for signal detection is L(h), this gives the algorithm a comparatively


Fig. 3. Simulation of the signal tracking system (Eq. (5)) for |h| = 0.0156 and (a) λ = 0.0338, (b) λ = 0.0085 and (c) λ = 0.0021. The + and − indicate the sign of h for that particular part of the simulation. When sign(y_t) = sign(h_t) the signal is successfully tracked. The sign of h is switched every 2σ²/h² time steps.
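A Fig. 3-style run can be reproduced in outline as follows (our sketch; the seed and bookkeeping are ours, the update is Eq. (5) with β = 1):

```python
import numpy as np

def track(h_mag=0.0156, sigma=1.0, lam=0.0085, beta=1.0,
          n_switches=8, seed=0):
    """Simulate Eq. (5) against a hidden signal that flips sign every
    2*L(h) steps; return the fraction of steps with sign(y) = sign(h)."""
    rng = np.random.default_rng(seed)
    L = int((sigma / h_mag) ** 2)
    y, h, hits, total = 0.0, h_mag, 0, 0
    for _ in range(n_switches):
        for _ in range(2 * L):
            x = h + sigma * rng.standard_normal()     # Eq. (4)
            y = (1 - lam) * y + lam * (np.tanh(beta * y) + x)
            hits += np.sign(y) == np.sign(h)
            total += 1
        h = -h                                        # switch the signal
    return hits / total

# lam near lam_op tracks well; try lam=0.0338 or lam=0.0021 for the
# over- and under-damped behaviour of Fig. 3a and 3c.
print(track())
```

With λ near λ_op the sign of y_t follows the switching signal for most of each period, despite the signal-to-noise ratio of about 1/64.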


short time to detect the signal. The first run had h_t = h and subsequent runs alternated the sign of h_t, i.e., even runs had h_t = −h and odd runs had h_t = h. At the start of a new run y_0 was initialised with the value y_t established at the end of the previous run, so as to simulate dynamic changes in the bias. The performance of the system was measured by the mean performance over the last L(h) steps of each run. Specifically,

s = (1/L(h)) Σ_{t=L(h)+1}^{2L(h)} sign(y_t · h_t)   (13)

measures the performance of a run. Fig. 4(a–c) shows the performance of the first 40 runs as a function of λ. As in Fig. 3, large values of λ produce a sequence of y_t which are oversensitive to noise and thus fluctuate around zero (seen as yellow/green on the right hand side of panels a–c). Very small values of λ produce a situation where y_t fails to detect when h_t changes, so that y_t is correct every second time h_t changes sign, but nearly always incorrect for just a single change (seen as alternating rows of red and blue on the left hand side of the figure). The optimal value of λ is determined by Eq. (9). Here, the signal is correctly identified 75% of the time (Fig. 4d–f).

3. Exploration vs exploitation

In the above setup the agent can observe x_t directly on every time step. In many of the examples discussed in the Introduction, decision-making agents can only sample one of the available options on each time step. This case is relevant to situations where an agent must balance exploration with exploitation. For example, the agent might establish that one of the options is more profitable, but if it does not sample the other one from time to time then it cannot detect changes in relative profitability. Social insect colonies face such challenges in deciding where to forage for food (Seeley, 1995; Bonabeau et al., 1997).

In this case, the agent only has information about the reward chosen on the previous time step. To model this we stipulate that, dependent on its action, the agent obtains the reward:

r_t(A_t) = V + (1/2) A_t h_t + (1/√2) N_{t,A_t}   (14)

where V is a constant. The underlying signal h_t again takes positive, or respectively negative, values, this time corresponding to action +1, or respectively −1, being the most rewarding choice. N_{t,A_t} ~ N(0, σ²) are independent and identically distributed


Fig. 4. Performance of the signal tracking system (Eq. (5)) for varying h and λ. Proportion of time steps that y_t has the same sign as h_t for the first 40 runs of the simulation for varying λ and (a) h = ±0.0884, (b) h = ±0.0156 and (c) h = ±0.001. The value of s (as defined in Eq. (13)) is more red when performance is high, i.e. s near to 1, and blue when performance is low, i.e. s near to 0. The average performance over 500 runs is given in (d) for h = 0.0884, (e) for h = 0.0156 and (f) for h = 0.001. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)


Gaussian random variables of mean zero and variance σ². The agent's goal is to maximise its long-term reward. Since E(N_{t,1}) = E(N_{t,−1}) = 0, the reward is maximised when A_t has the same sign as h_t. We measure agent success with the score, A_t · sign(h_t), which is one whenever the action has the same sign as h_t. The question for the agent is how to choose the actions so as to maximise this score.

To investigate such a setup we let z_{1,t} and z_{−1,t} be the 'pheromone' placed on taking actions 1 and −1 respectively. On each time step we set the probability of taking action 1 to

P(A_{t+1} = 1 | z_{1,t}, z_{−1,t}) = exp(βz_{1,t}) / (exp(βz_{1,t}) + exp(βz_{−1,t}))   (15)

and P(A_{t+1} = −1 | z_{1,t}, z_{−1,t}) = 1 − P(A_{t+1} = 1 | z_{1,t}, z_{−1,t}). The update rules for z_{1,t} and z_{−1,t} are

z_{1,t+1} = (1 − λ)z_{1,t} + λ ((A_{t+1} + 1)/2) (1 + r_{t+1}(1))   (16)

z_{−1,t+1} = (1 − λ)z_{−1,t} + λ ((1 − A_{t+1})/2) (1 + r_{t+1}(−1))   (17)

so that each pheromone is only increased if the corresponding action is taken. Here r_{t+1}(A) refers to the reward of action A in Eq. (14). We can approximate the above system by a one-dimensional system based on the difference y_t = z_{1,t} − z_{−1,t} between the pheromones:

y_{t+1} = (1 − λ)y_t + λ(A_{t+1} + h_t + N_t + A_{t+1} V)   (18)

From this we can use the results in the previous section to set the parameters. First, we note that

E(A_{t+1}) = (exp(βy_t) − 1)/(exp(βy_t) + 1) = tanh((β/2) y_t).

Thus, based on the argument illustrated in Fig. 2, β_op = 2. Second, we note that Eq. (18) is identical to Eq. (5) apart from an additional term A_{t+1}V. Thus provided V in Eq. (14) is small (V ≪ 1), the sampling scheme presented here should work in a manner similar to the system for the difference tracking scenario (i.e., Eq. (5)). We thus predict that for β = 2, λ_op should be proportional to h^{4/3}. To test this prediction, we simulated the single variable tracking system for the sampling scenario with β = 2, V = 0 and λ ∈ {4^{−1}, 4^{−2}, …, 4^{−20}} for 50 runs, each of length 20L(h). We examined the performance:

s = (1/(20L(h))) Σ_{t=1}^{20L(h)} sign(y_t · h_t)

averaged over all simulation runs for differing combinations of λ and h. Fig. 5 shows the values λ̃ which gave the largest s averaged over all 20 · (1/h²) iterations. These observations confirm our prediction for each bias h tested. Furthermore λ̃ is also robust


with respect to changes of the interval of iterations over which performance was evaluated. For instance, selecting only the iterations of a subinterval, e.g.,

s = (1/(10L(h))) Σ_{t=2L(h)+1}^{12L(h)} sign(y_t · h_t)

yields a vertical shift of the λ̃ values in Fig. 5, but the relationship to the bias (λ̃ ∝ h^{4/3}) remains unchanged.

Are there other combinations of λ and β which outperform those for β = 2? Fig. 6 shows the mean performance s for different combinations of β and λ for two biases. Here again we see that β = 2 corresponds to high success rates for suitable values of λ and there is a local maximum close to λ_op ∝ h^{4/3}. However, there exist other maxima for β slightly greater than 2. In particular, β = 2.3 and λ = 2^{−7} (≈ 0.0078) perform best when h = 10^{−2}, and β_op = 2.2 and λ_op = 2^{−9} (≈ 0.0019) perform best when h = 10^{−3}. This pattern is repeated for other values of h (results not shown). These parameter values correspond to the existence of two stable states (as in Fig. 2a). However, since β is not much greater than 2 when h gets closer to 0, the stability of these states is marginal and fluctuations usually allow an escape to the better option. The distribution of decisions shows that the system converges less often to the correct choice for β > 2, but when it does it makes correct choices more often.

There are several ways of improving the performance of the above tracking system if one would like to apply it to optimisation problems. For example, we can set

z_{1,t+1} = (1 − λ)z_{1,t} + λ ((1 + A_{t+1})/2 + r̃_{1,t+1})

z_{−1,t+1} = (1 − λ)z_{−1,t} + λ ((1 − A_{t+1})/2 + r̃_{−1,t+1})

with

r̃_{j,t+1} = r_{j,t+1} if A_{t+1} = j, and r̃_{j,t+1} = r̄_j^k if A_{t+1} ≠ j,

where r̄_j^k is the mean of the last k rewards sampled from decision j. This eliminates a sampling bias whereby the reward of an action degrades simply because it has not been chosen.

In order to concentrate on our method for tracking h_t in the presence of large noise, we have skipped over two problems that will arise in practical applications of signal tracking. First, as mentioned above, the technique depends on having V small; otherwise the agent must determine the magnitude of the two rewards, determined by V in Eq. (14). This can be done by standard methods of gain control, removing the average of r_t from the signal. Second, in setting λ the agent must also estimate

Fig. 5. Optimal decay rate λ̃ of the tracking system for the sampling scenario (Eqs. (15)–(17)) with β = 2 fixed and h ∈ {10^{−4}, 10^{−3.8}, …, 10^{−1}}, simulated for 50 runs of length 20 · (1/h²) each having alternating bias.
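The sampling scheme of Eqs. (15)–(17) can be sketched as below (our illustration, with V = 0, β = 2, a decay rate of our choosing near the h^{4/3} scaling, and run lengths that are ours):

```python
import numpy as np

def sample_run(h_mag=0.1, sigma=1.0, lam=0.04, beta=2.0, V=0.0,
               steps=20000, seed=0):
    """Simulate Eqs. (15)-(17): one action is sampled per step, and only
    the pheromone of the chosen action is reinforced."""
    rng = np.random.default_rng(seed)
    z = np.array([0.0, 0.0])            # z[0]: action -1, z[1]: action +1
    h, hits = h_mag, 0
    L = int((sigma / h_mag) ** 2)
    for t in range(steps):
        p_plus = 1.0 / (1.0 + np.exp(-beta * (z[1] - z[0])))      # Eq. (15)
        A = 1 if rng.random() < p_plus else -1
        r = V + 0.5 * A * h + sigma * rng.standard_normal() / np.sqrt(2)  # Eq. (14)
        z *= (1 - lam)                    # decay of both pheromones
        z[(A + 1) // 2] += lam * (1 + r)  # Eqs. (16)/(17): chosen action only
        hits += A == np.sign(h)
        if (t + 1) % (20 * L) == 0:
            h = -h                        # alternate the bias between runs
    return hits / steps

print(sample_run())   # fraction of steps on which the better action is taken
```

Even though the agent only ever observes the reward of the action it took, the pheromone difference z_{1,t} − z_{−1,t} tracks the sign of the bias, and the better action is chosen more often than chance.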


three alternatives. Even for three alternatives, the system corresponding to the function Fðy,xÞ is not immediately identifiable. Addressing these limitations is not straightforward, but we believe the techniques outlined here are a valuable start. 4=3 The relationship between l and h implies that any fixed value of l will not be optimal for all possible signal strengths, and thus fails to be robust to changes in scale. Biological systems which track signals are likely to be either tuned to particular signal to noise ratios or contain an additional aspect, not captured in previous models, to allow tuning to signal strength. One possible way to allow scale-free tuning to signal strength is to couple different tracking variables, each of which decays on a different time scale. There may be parallels to be drawn here to the idea of biochemical and genetic networks which allow the production of robust patterns in spite of intrinsic noise (Barkai and Leibler, 1997; Brandman et al., 2005). Future work will develop this multiple tracking variable system, in which one particular tracking variable catches the input signal and passes it up to higher-level variables.

0.06

−3 0.04

−4 0.02

−5

Acknowledgements This work was funded by the Human Frontiers Science Program grant RGP51/2007.

0.00

−6 1

2

3

4

5

6

Appendix

β Fig. 6. Fraction of successful choices of the tracking system for the sampling scenario (Eqs. (15)–(17)) having different parameter values b and l simulated for 2 500 runs of length 20  ð1=h Þ each having alternating bias. (a) shows the case h ¼ 102 , (b) the case h ¼ 103 . For b ¼ 2 the optimal decay rate l~ took values 4=3 4=3 102:71 for h ¼ 102 (h  102:67 ), and 104:21 for h ¼ 103 (h ¼ 104 ) respectively.

s (which we have set to 1). This can be addressed by setting s^ t to be the standard deviation of the recent rewards.

Proof of Lemma 1. The typical fluctuations yt yn are determined by the magnitude of the noise s and the rate of contraction to the fixed point. Consider the Taylor-approximation: ðtanhðxh

1=3

1=3

Þxh

2=3 5

þ hÞ=h  113x3 þ Oðh

x Þ

A simple calculation shows that for yo yn and as long as this approximation is valid, i.e., for y ¼ oð1Þ, we have yn F l ðy,h þN t Þ o r  ðyn yÞ þ lNt ,

y oyn

ð19Þ

2=3

4. Discussion

where r r 12lh , say. Thus we can deduce that yn yt is majorised by a discrete linear diffusion process: ut þ 1 ¼ rut þ lNt ,

We have shown how systems used to model pheromone trails, neuronal interactions and biochemical kinetics can be tuned to solve dynamic learning problems. By setting the steepness of the reaction threshold b and the stochastic evaporation or forgetting rate l to appropriate values we can tune these systems to quickly react to a particular input h even in the presence of a large degree of noise, i.e., s b 9h9. The tuning of b has already been appreciated under the rather grand title of tuning to the ‘‘edge of chaos’’ (Bonabeau, 1996, 1997). The tuning of the stochastic forgetting rate l proportional to 4=3 h is a more subtle result. It can only be recovered through the stochastic analysis presented here. The key result is that the optimal l depends on the magnitude of the signal to be detected. Equations similar to (16) and (17) are widely used, for example, ant algorithms, to solve optimisation problems (Dorigo and ¨ Stutzle, 2004). It is notoriously difficult to set parameters of these algorithms to values which allow them to search effectively (Pellegrini et al., 2006, and references therein). The results we discuss here are directly applicable to parameter setting in these situations. In such applications adjustments will have to made to account for aspects, such as, V b1 and the existence of more than

u_{t+1} = r u_t + λ N_t,   u_t ≥ 0,    (20)

with a reflecting barrier at u = 0. Then u_t = |W_t + r^t u_0|, where W_t is Gaussian with zero mean and variance v_t^2 bounded by

λ^2 σ^2/(1 − r^2) ≤ λ σ^2/(4 h^{2/3}) ≤ λ σ^2/(4c h^{2/3})

for c < 1. We can choose c to determine the proportion of time which y_t spends within some distance of y*. Thus, Pr{y_t > (1 − δ) y*} > 1 − ε/2 provided that u_0 = y* − y_0 = O(h^{1/3}) and that c is chosen small enough.

By the Markov property of y_t, it remains to show that for any initial state y_0 = O(1) we have y_t > −h^{1/3} with probability greater than 1 − ε for t = O(L). The reason for this is that y ↦ F(y, h) is a contractive mapping for y < −h^{1/3}. Let ỹ_0 = y_0, define the deterministic process

ỹ_{t+1} = F(ỹ_t, h) = ỹ_t + λ (tanh(ỹ_t) − ỹ_t + h),

and let z_t = y_t − ỹ_t. The convexity of F(y, h) for y < 0 implies that

y_{t+1} = F(ỹ_t + z_t, h) + λ N_t ≥ ỹ_{t+1} + λ N_t + ρ(ỹ_t) z_t,    (21)

where, for y < −h^{1/3},

ρ(y) = ∂F/∂y (y, h) = 1 − λ (1 − 1/cosh^2 y) < r := 1 − (1/3) λ h^{2/3}.
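This contraction bound can be checked numerically. The sketch below uses illustrative parameter values (not from the paper) and the identity 1 − 1/cosh²y = tanh²y.

```python
import numpy as np

# Numerical sanity check of the contraction bound: for y < -h^(1/3),
# rho(y) = 1 - lam*(1 - 1/cosh(y)^2) should stay below r = 1 - (1/3)*lam*h^(2/3).
# Parameter values are illustrative.
lam, h = 0.05, 0.01

r = 1.0 - lam * h**(2.0 / 3.0) / 3.0
ys = np.linspace(-3.0, -h**(1.0 / 3.0), 1000)   # grid of y values below -h^(1/3)
rho = 1.0 - lam * (1.0 - 1.0 / np.cosh(ys)**2)  # slope of the map F at each y

print(rho.max() < r)   # True: F contracts at rate at most r on this range
```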

Since tanh(u) − u + h ≥ −(2/15) u^3 for −1 < u < 0, we can approximate ỹ_t ≈ u(s) with the differential equation

du/ds ≥ −(2/15) u(s)^3,

where s = λ t. The solution

u(s) ≥ −1/sqrt((4/15) s + u(0)^{−2})

shows that for the deterministic system ỹ_t ≥ −h^{1/3}/2 if, say, s ≥ 15 h^{−2/3}, i.e., t ≥ 15/(λ h^{2/3}).

It thus remains to show that Pr{z_t > −h^{1/3}/2} > 1 − ε for some t = L/c^2. But from (21) above it follows that z_t is bounded below by the linear diffusion u_{t+1} = r u_t + λ N_t, u_t < 0, reflected at u_t = 0. Hence z_t > −|W_t|, where W_t is Gaussian with zero mean and variance bounded by λ^2 σ^2/(1 − r^2) ≤ 3 λ σ^2/h^{2/3}. Choosing c small enough therefore gives the sought-for bound on Pr{z_t > −h^{1/3}/2}. □

References

Amari, S.-i., 1998. Natural gradient works efficiently in learning. Neural Comput. 10, 251–276.
Ame, J.M., Halloy, J., Rivault, C., Detrain, C., Deneubourg, J.L., 2006. Collegial decision making based on social amplification leads to optimal group formation. Proc. Natl. Acad. Sci. USA 103, 5835–5840.
Andrews, B.W., Yi, T.M., Iglesias, P.A., 2006. Optimal noise filtering in the chemotactic response of Escherichia coli. PLoS Comput. Biol. 2 (11), e154.
Barkai, N., Leibler, S., 1997. Robustness in simple biochemical networks. Nature 387 (6636), 913–917.
Bogacz, R., Brown, E., Moehlis, J., Holmes, P., Cohen, J.D., 2006. The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. Psychol. Rev. 113 (4), 700–765.
Bonabeau, E., 1996. Phase transitions in instigated collective decision making – comment. Adaptive Behav. 5 (1), 99–105.
Bonabeau, E., 1997. Flexibility at the edge of chaos: a clear example from foraging in ants. Acta Biotheor. 45 (1), 29–50.
Bonabeau, E., Theraulaz, G., Deneubourg, J.L., Aron, S., Camazine, S., 1997. Self-organization in social insects. Trends Ecol. Evol. 12 (5), 188–193.
Bonabeau, E., Dorigo, M., Theraulaz, G., 1999. Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press.
Borkar, V.S., 2008. Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press/Hindustan Book Agency, Cambridge, UK.
Brandman, O., Ferrell, J.E., Li, R., Meyer, T., 2005. Interlinked fast and slow positive feedback loops drive reliable cell decisions. Science 310 (5747), 496–498.
Brandman, O., Meyer, T., 2008. Feedback loops shape cellular signals in space and time. Science 322 (5900), 390–395.


Brown, E., Gao, J., Holmes, P., Bogacz, R., Gilzenrat, M., Cohen, J.D., 2005. Simple neural networks that optimize decisions. Int. J. Bifurcation Chaos 15 (3), 803–826.
Chapman, D.G., Robbins, H., 1951. Minimum variance estimation without regularity assumptions. Ann. Math. Statist. 22, 581–586.
Couzin, I.D., 2009. Collective cognition in animal groups. Trends Cognitive Sci. 13 (1), 36–43.
Cramér, H., 1946. Mathematical Methods of Statistics. Princeton University Press, Princeton, NJ.
De Jong, H., 2002. Modeling and simulation of genetic regulatory systems: a literature review. J. Comput. Biol. 9 (1), 67–103.
Deneubourg, J., Aron, S., Goss, S., Pasteels, J., 1990. The self-organizing exploratory pattern of the Argentine ant. J. Insect Behav. 3, 159–168.
Detrain, C., Deneubourg, J.L., 2008. Collective decision-making and foraging patterns in ants and honeybees. In: Advances in Insect Physiology, vol. 35, pp. 123–173.
Dorigo, M., Stützle, T., 2004. Ant Colony Optimization. The MIT Press.
Dussutour, A., Deneubourg, J.L., Fourcassie, V., 2005. Amplification of individual preferences in a social context: the case of wall-following in ants. Proc. R. Soc. B – Biol. Sci. 272 (1564), 705–714.
Feng, S., Holmes, P., Rorie, A., Newsome, W.T., 2009. Can monkeys choose optimally when faced with noisy stimuli and unequal rewards? PLoS Comput. Biol. 5 (2), e1000284.
Gold, J.I., Shadlen, M.N., 2007. The neural basis of decision making. Annu. Rev. Neurosci. 30, 535–574.
Goss, S., Aron, S., Deneubourg, J.L., Pasteels, J.M., 1989. Self-organized shortcuts in the Argentine ant. Naturwissenschaften 76 (12), 579–581.
Green, D., Swets, J., 1966. Signal Detection Theory and Psychophysics. Wiley, New York.
Kushner, H.J., Yin, G., 1997. Stochastic Approximation Algorithms and Applications. Springer, New York.
Lewis, J., 2008. From signals to patterns: space, time, and mathematics in developmental biology. Science 322 (5900), 399–403.
Marshall, J.A.R., Bogacz, R., Dornhaus, A., Planque, R., Kovacs, T., Franks, N.R., 2009. On optimal decision-making in brains and social insect colonies. J. R. Soc. Interface 6 (40), 1065–1074.
Nicolis, S., Deneubourg, J., 1999. Emerging patterns and food recruitment in ants: an analytical study. J. Theor. Biol. 198, 575–592.
Nicolis, S.C., Detrain, C., Demolin, D., Deneubourg, J.L., 2003. Optimality of collective choices: a stochastic approach. Bull. Math. Biol. 65 (5), 795–808.
Pellegrini, P., Favaretto, D., Moretti, E., 2006. On optimal parameters for ant colony optimization algorithms. In: Ant Colony Optimization and Swarm Intelligence.
Radhakrishna Rao, C., 1945. Information and the accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc. 37, 81–91.
Seeley, T., 1995. The Wisdom of the Hive. Belknap Press of Harvard University Press, Cambridge, Massachusetts.
Simen, P., Cohen, J.D., Holmes, P., 2006. Rapid decision threshold modulation by reward rate in a neural network. Neural Networks 19 (8), 1013–1026.
Smith, P.L., Ratcliff, R., 2004. Psychology and neurobiology of simple decisions. Trends Neurosci. 27 (3), 161–168.
Sumpter, D.J.T., Krause, J., James, R., Couzin, I.D., Ward, A.J.W., 2008. Consensus decision making by fish. Curr. Biol. 18 (22), 1773–1777.
Sumpter, D.J.T., Pratt, S.C., 2009. Quorum responses and consensus decision making. Philos. Trans. R. Soc. B – Biol. Sci. 364 (1518), 743–753.
Tanouchi, Y., Tu, D., Kim, J., You, L., 2008. Noise reduction by diffusional dissipation in a minimal quorum sensing motif. PLoS Comput. Biol. 4 (8), e1000167.
Trewavas, A., 2009. What is plant behaviour? Plant Cell Environ. 32 (6), 606–616.
Uhlenbeck, G.E., Ornstein, L.S., 1930. On the theory of Brownian motion. Phys. Rev. 36, 823–841.
Usher, M., McClelland, J., 2001. The time course of perceptual choice: the leaky, competing accumulator model. Psychol. Rev. 108 (3), 550–592.
van Drongelen, W., 2006. Signal Processing for Neuroscientists: An Introduction to the Analysis of Physiological Signals. Academic Press.