This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TAC.2014.2347212, IEEE Transactions on Automatic Control
Limited circulation. For review only
Dither Re-use in Nash Equilibrium Seeking Ronny J. Kutadinata, William H. Moase, and Chris Manzie Abstract—The Nash equilibrium seeking (NES) scheme consists of a number of decentralised extremum-seeking (ES) “agents”, each controlling an input such that an associated (selfish) cost is regulated to its steady-state Nash equilibrium. A non-local stability result for the NES scheme is provided which allows two agents to use the same dither signal if the effect of each agent on the other’s steady-state cost function is sufficiently weak. Its application to plants with quadratic cost functions is presented as an example. It is then demonstrated in simulation that, by reducing the number of distinct dither signals, the proposed scheme still has acceptable convergence properties, while the design effort is reduced. Index Terms—Adaptive control, extremum-seeking, Nash equilibrium.
I. INTRODUCTION Extremum-seeking (ES) is a non-model-based method of regulating the inputs of a plant to within a small region of the values that, in the steady-state, minimise the system’s output. Typically, ES schemes involve the addition of a small dither to the plant’s input. By observing its effect on the plant’s output, it is possible to estimate the gradient of the steady-state relationship between the plant’s inputs and output. This estimate can then be used in a gradient-based scheme to adapt the plant’s input such that the steady-state output is minimised or maximised. There are many previous ES results in the literature: local stability [1], semi-global practical asymptotic (SPA) stability [2], and many other varieties of ES schemes [3]–[8]. The analysis in [1], [2] decomposes the closed-loop dynamics into three distinct time-scales. The order from the fastest to the slowest is: the plant dynamics, the dither and gradient estimation, and the adaptation of the plant input. However, the majority of the results in ES theory address only single-input single-output (SISO) systems. ES results that consider multi-input single-output (MISO) systems include [9]–[11]. A SPA stability result presented in [11] requires two important properties of a MISO ES scheme: the closed-loop must have the same three time-scale tuning as that discussed in [2]; and the number of distinct frequencies used in the dithers must increase in proportion to the number of plant inputs. As the number of inputs to a given system is increased, it is reasonable to expect that, in order to achieve acceptable convergence properties, the assignment of the frequencies to the dithers becomes an increasingly arduous task. Additionally, slow adaptation may also be caused by the increasing difficulty of the optimisation problem as its dimension grows. One potential solution to these problems involves the use of decentralised ES techniques. Decentralised ES schemes such as those discussed in [12]–[14] feature many of the same characteristics as SISO and MISO ES schemes. However, instead of optimising a single output, groups of inputs are controlled by ES “agents”, each of which has its own output to optimise. Hence, the problem becomes a non-cooperative game, the solution of which is a Nash equilibrium. [15] provides a local stability result for a simple Nash-equilibrium-seeking (NES) scheme that is similar to the SISO one in [2]. It is noteworthy that [15] considers a Manuscript received April 28, 2013; revised March 13, 2014 and July 31, 2014; accepted August 6, 2014. Recommended by Associate Editor Dario Bauso. R. J. Kutadinata, W. H. Moase and C. Manzie are with the Department of Mechanical Engineering, The University of Melbourne, 3010, Victoria, Australia (email:
[email protected];
[email protected];
[email protected]).
1
family of systems where each agent’s cost is influenced by all agents’ inputs but is most strongly dependent upon its own. However, the result still requires each input to the plant to be perturbed with a sinusoidal dither of a unique frequency. Thus, when applied to largescale systems, the NES scheme can be expected to suffer similar limitations to traditional MISO ES approaches. One NES approach that addresses this issue is discussed in [12], which allows dithers to be re-used between agents who do not affect each other’s output/cost. However, such a result cannot be exploited in systems where each agent’s output/cost is potentially dependent upon all of the inputs. This paper considers a simple NES scheme acting on a family of systems where there is a measure on the upper bound of the effect of an input on any output, such that it can be guaranteed that some of the interconnections among these agents are weak. Such systems may arise from urban traffic, sensor and irrigation networks. In a sense, the family of systems considered lies between those considered in [15] and [12]. It is shown that two agents who are sufficiently weakly affecting each other may use the same dither without adversely impacting on stability. By re-using dither, practical benefit can be gained by simplifying the task of assigning frequencies. In [16], the stability of the NES scheme with dither re-use was established. This paper extends [16] by providing: specific conditions which guarantee that a certain amount of dither re-use is possible without compromising SPA stability; and more practical formulae for the application of the scheme to plants with quadratic costs. II. SYSTEM DESCRIPTION A. Plant The plant is described by the dynamical system: x˙ = f (x, u), Nx
y = g(x, u),
(1)
Nu
where x ∈ R and u, y ∈ R . For algebraic simplicity, define S = {1, 2, . . . , Nu }. For each i ∈ S, there is an agent (or player) that is allowed to choose the ith input, ui , based only on its measurement of the ith output, yi . Assumption 1: There exists a function h : RNu → RNx such that x = h(u) is a globally asymptotically stable equilibrium of x˙ = f (x, u), uniformly in u. Let J(u) := g (h(u), u) be the input-to-output steady-state mapping of the system. The goal of each agent is to minimise its steady-state measurement. Assumption 2: There exists a unique Nash equilibrium point u∗ which, for all i ∈ S, satisfies ∂Ji ∗ (u ) = 0, ∂ui
∂ 2 Ji ∗ (u ) > 0. ∂u2i
(2)
The fact that an agent’s output is affected by other agents’ inputs at differing strength is going to be exploited to create a “neighbourhood”. The neighbourhood of agent i consists of all other agents whose input has a sufficiently strong influence on agent i’s output. Let Ni (R) denote the neighbours set of agent i. The number of the neighbouring agents can be increased through the parameter R. Thus, Ni (R1 ) ⊆ Ni (R2 ) for all 0 ≤ R1 ≤ R2 , and Ni (R) = S for sufficiently large R. For many physical systems, the influence of an input decays spatially. In such a case, it is sensible to choose R to be related to a physical distance. Assumption 3: There exists α ∈ K∞ such that for any R ∈ R>0 , there exists γ1 (R) ∈ R>0 such that for all i ∈ S, X ∂Ji ∂Ji ∗ ∗ (u) − (u ) (3) ∂uj ≤ γ1 (R)α(ku − u k). ∂uj j ∈N / i (R)
c 2015 IEEE 0000–0000/00$00.00
Preprint submitted to IEEE Transactions on Automatic Control. Received: August 7, 2014 22:53:51 PST 0018-9286 (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TAC.2014.2347212, IEEE Transactions on Automatic Control
Limited circulation. For review only
2
Assumption 4: For any R, there exists γ2 (R) ∈ R>0 such that X ∂Ji ∗ (4) ∂uj (u ) ≤ γ2 (R) ∀i ∈ S. j ∈N / i (R)
γ1 (R) and γ2 (R) will decrease to zero as R is increased. This decrease will be particularly strong for systems where many of the interconnections between the agents are weak. Remark 1: Assumption 3 is not restrictive since it only requires continuous differentiability of J(u). Also, since u∗ is a constant, the LHS of (4) can always be upper bounded, which implies that Assumption 4 can always be satisfied. Note that throughout this paper, the arguments of γ1 and γ2 are dropped for notational compactness. Remark 2: Assumptions 3 and 4 do not require specific knowledge of the bounding functions γ1 , γ2 and α. They only require the bounding functions to exist. The convergence result for the proposed NES scheme is proven by showing that the NES scheme can be designed such that the behaviour of the full closed-loop system approximates de ui ∂Ji =− (e u + u∗ ), dτ ∂e ui
i∈S
(5)
e and τ are the auxiliary system’s with arbitrary accuracy. Note that u states and time scale respectively. The following assumption provides global asymptotic stability of the origin of (5), which underpins the NES stability result. Assumption 5: There exist (c1 , c2 , c3 , c4 ) ∈ R4>0 and a radially e ∈ RNu , the unbounded V (·) : RNu → R, such that for all u following hold 2
2
c1 α (ke uk) ≤ V (e u) ≤ c2 α (ke uk) ,
(6a)
Nu X ∂V ∂Ji (e u) (e u + u∗ ) ≥ c3 α (ke uk)2 , ∂e u ∂u i i i=1
(6b)
k∇V (e u) k ≤ c4 α (ke uk) .
(6c)
Note that α(·) is from Assumption 3. Assumption 5 also ensures (5) has an asymptotically stable solution close to the origin when subjected to sufficiently small perturbations.
B. NES Scheme with Dither Re-use The proposed Nash equilibrium seeking scheme is given by: ¯ + as(ωt)) , x˙ = f (x, u ¯˙ = −kω diag (s(ωt)) g(x, u ¯ + as(ωt)), u where s(ωt) is both the dither and demodulation signal sin (ω [ω1 t + φ1 ]) sin (ω [ω2 t + φ2 ]) s(ωt) = , .. . sin (ω [ωNu t + φNu ])
(7a) (7b)
(8)
diag(·) is a diagonal matrix where the diagonal elements are given by its arguments, and ωi ∈ Q≥ωmin for all i ∈ S and some ωmin > 0. The dither frequencies inside the neighbourhood of each agent must be distinct, that is for all i ∈ S and j ∈ Ni (R), ωj 6= ωi . This property is a relaxation of the requirement in [15] of unique frequencies for all i ∈ S. The stability of the NES scheme subject to this relaxation is provided in the next section.
III. STABILITY RESULTS A. Reduced System First, a new time scale is defined τ = ωt and the closed loop system (7) can be expressed in this new time scale dx ¯ + as(τ )) = f (x, u (9a) dτ d¯ u ¯ + as(τ )) = −k diag (s(τ )) g(x, u (9b) dτ which is in the standard form for singular perturbation analysis. For small ω, the plant states x will quickly settle to their equilibrium x = h(¯ u + as(τ )) and the reduced system is ω
d¯ ur = −k diag (s(τ )) J(¯ ur + as(τ )). dτ
(10)
B. Periodic Average of Reduced System For small k, a conclusion regarding the stability of the reduced system can be made by investigating the periodic average of the reduced 2π 2π 2π , , . . . , system. Let T be the least common multiple of ω ωNu 1 ω2 which always exists since the dither frequencies are rational multiples of ω. Specifically, T is defined as follows: t T = min t ∈ N, ∀i . (11) 2π/ωi The average system is d¯ uav k =− dτ T
T
Z
diag (s(τ )) J(¯ uav + as(τ )) dτ .
(12)
0
Consider the Taylor series expansion of J(¯ uav + as(τ )): J(¯ uav + as(τ )) = J(¯ uav ) + a∇J(¯ uav )s(τ ) + O(a2 ) (13) T where ∇J = ∇J1 ∇J2 . . . ∇JNu . Substituting (13) into (12), Z k T d¯ uav,i =− sin(ωi τ )Ji (¯ uav ) dτ T 0 X ∂Ji (¯ uav ) sin(ωj τ ) + a sin(ωi τ ) ∂uj j + O(a2 ) dτ .
(14)
The average of the first term is zero. Exploiting orthogonality properties of sinusoidal functions for the second term, Z X ∂Ji 1 T (¯ uav ) sin(ωj τ )dτ sin(ωi τ ) T 0 ∂uj j X 1 Z T ∂Ji = sin(ωi τ ) sin(ωj τ ) (¯ uav )dτ T ∂u j 0 j 1 X ∂Ji = (¯ uav ) 2 j:ω =ω ∂uj j i =
X ∂Ji 1 ∂Ji (¯ uav ) + (¯ uav ) . 2 ∂ui ∂uj j:ω =ω j
(15)
i
j6=i
For notational compactness, define X ∂Ji ∂Ji ∗ Γi (¯ uav ) = (¯ uav ) − (u ) , ∂uj ∂uj j:ω =ω j
(16)
i
j6=i
Λi =
X j:ωj =ωi j6=i
∂Ji ∗ (u ). ∂uj
Preprint submitted to IEEE Transactions on Automatic Control. Received: August 7, 2014 22:53:51 PST 0018-9286 (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
(17)
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TAC.2014.2347212, IEEE Transactions on Automatic Control
Limited circulation. For review only
Now, substitution of (15)–(17) into (14) leads to the result: ka ∂Ji d¯ uav,i =− (¯ uav ) + Γi (¯ uav ) + Λi + O(a) . dτ 2 ∂ui
3
Similarly, after using (6b) from Assumption 5 for the first term and (6c) for the third term of (21), it follows that (18)
α(ke uk)2 1 dV ≤ a(−c3 + γ1 c4 k1k) k dτ 2
From Assumptions 3 and 4, |Γi (¯ uav )| and |Λi | are upper bounded by γ1 and γ2 respectively, and can be decreased by increasing R. Thus, for sufficiently small a, and sufficiently large R, the averaged system can be expected to approximate (5).
0 ≥ a(−c3 + γ1 c4 k1k) C. Stability Analysis of Averaged and Reduced Systems Before introducing the main result, it is useful to show the stability analysis of the averaged (12) and the reduced (10) systems. However, these stability results are only intended as stepping stones for proving the main result, and are of limited use practically. Since stability of the reduced system requires stability of the averaged system, the result for the reduced system is presented after the result for the averaged system. Lemma 1 and Lemma 2 state the stability of the averaged system (12) and the reduced system (10) respectively. Lemma 1: Under Assumptions 2–5, there exists βav ∈ KL such that for any (∆, ν) ∈ R2>0 there exist (a∗ , γ2∗ ) ∈ R2>0 such that for all (a, k, γ1 , γ2 ) ∈ (0, a∗ ] × R>0 × (0, c3 /(c4 k1k)) × (0, γ2∗ ], the solutions of (12) with the initial condition k¯ uav (0) − u∗ k ≤ ∆, will satisfy k¯ uav (τ ) − u∗ k ≤ βav (k¯ uav (0) − u∗ k, kaτ ) + ν,
∀τ ≥ 0.
Proof: The Lyapunov function from Assumption 5 can be used to prove stability of the average system. First define Z 1 T Fav (¯ uav ) = diag (s(τ )) J(¯ uav + as(τ )) dτ , (19) T 0 a ∂Ji (¯ uav ) + Γi (¯ uav ) + Λi . (20) Gi (¯ uav ) = 2 ∂ui Now (12) can be expressed as:
1 dV = − ∇V (e u)T G(e u + u∗ ) k dτ − ∇V (e u)T (Fav (e u + u∗ ) − G(e u + u∗ )) =− −
Nu ∂Ji a X ∂V (e u) (e u + u∗ ) 2 i=1 ∂e ui ∂ui
a 2
Nu X i=1
∂V (e u) (Γi (¯ uav ) + Λi ) ∂e ui
∂e ui
(22)
(e u) (Γi (¯ uav ) + Λi )
Nu X ∂V ≤ (e u ) ∂e ui i=1
X j ∈N / i (R)
Nu X ∂V + (e u ) ∂e ui i=1
Therefore, (26) indicates the region in which dV /dτ is negative, given any (a, γ2 ) and γ1 satisfying c3 . (27) γ1 < c4 k1k Equation (27) can always be satisfied by selecting sufficiently small R. Also note that the terms in the numerator of (26) are of O(γ2 ) and O(a) respectively. uk ≤ c} denote a ball of radius Now let B(c) := {e u ∈ RNu : ke c and let D := {e u ∈ RNu : V (e u) ≤ c2 α(∆)2 }. Using (6a) and e is to leave D realising that B (∆) ⊆ D, V (e u) must increase if u after being initialised in B (∆). Thus, for a sufficiently small a, γ1 and γ2 , V is guaranteed to decrease with time until ke uk ≤ ε, in which case c2 α(ε)2 ≥ c2 α(ke uk)2 ≥ V (e u) ≥ c1 α(ke uk)2
j ∈N / i (R)
∂Ji ∗ (u ) ∂uj
≤ (γ1 α(ke uk) + γ2 ) k1kc4 α(ke uk).
(29)
Lemma 1 follows after noting that decreasing (a, γ1 , γ2 ) makes the e converges arbitrarily small. region to which u Lemma 2: Under Assumptions 2–5, there exists βr ∈ KL such that for any (∆, ν) ∈ R2>0 there exist (a∗ , k∗ , γ2∗ ) ∈ R3>0 such that for all (a, k, γ1 , γ2 ) ∈ (0, a∗ ] × (0, k∗ ] × (0, c3 /(c4 k1k)) × (0, γ2∗ ], the solutions of (10) with the initial condition k¯ ur (0) − u∗ k ≤ ∆, will satisfy ∀τ ≥ 0.
∂p = kFav (q, a) − k diag (s(τ )) J(q + p + as(τ )). ∂τ
(30)
For any bounded domain D0 ∈ RNu containing u∗ , the solution of (30) will be p(q, τ, a, k) = O(k) uniformly in (q, τ ) ∈ D0 × R and small (a, k). Furthermore, there is a w such that p(w, τ, a, k) exists and satisfies ¯r w + p(w, τ, a, k) = u (31) ¯ r satisfying u ¯ r − u∗ ∈ D0 . for sufficiently small (a, k) and any u Differentiating (31) with respect to τ gives !−1 dw ∂p = −k I + Fav (w, a). (32) dτ ∂q q=w
∂Ji ∂Ji ∗ ∗ (e u + u ) − (u ) ∂uj ∂uj X
r c2 ke uk ≤ α−1 α(ε) . c1
(28)
Proof: Let p(q, τ, a, k) represent the T -periodic (in τ ) solution
Note that from (18), kFav (u) − G(u)k is of O(a ). Consider the second term in (22), using Assumptions 3 and 4 and (6c) from Assumption 5
i=1
α(ke uk) 2
of
2
Nu X ∂V
(25)
Rearranging (25) to isolate ke uk leads to the inequality, u + u∗ ) − G(e u + u∗ )k γ2 c4 k1k + ca4 kFav (e −1 ke uk ≥ α c3 − γ1 c4 k1k =: ε. (26)
k¯ ur (τ ) − u∗ k ≤ βr (k¯ ur (0) − u∗ k, kaτ ) + ν,
− ∇V (e u)T (Fav (e u + u∗ ) − G(e u + u∗ ))
−
α(ke uk)2 2
+ (aγ2 c4 k1k + c4 kFav (e u + u∗ ) − G(e u + u∗ )k)
and therefore
d¯ uav = −kG(¯ uav ) − k (Fav (¯ uav ) − G(¯ uav )) (21) dτ e = u ¯ av − u∗ and consider the Lyapunov function V (e Let u u) satisfying Assumption 5.
(24)
α(ke uk) + (aγ2 c4 k1k + c4 kFav (e u + u∗ ) − G(e u + u∗ )k) . 2 It is desired for the right hand side of (24) be negative.
(23)
Note that ∂p/∂q can be made arbitrarily small by decreasing k. Thus, the solution of (32) approaches the behaviour of the averaged
Preprint submitted to IEEE Transactions on Automatic Control. Received: August 7, 2014 22:53:51 PST 0018-9286 (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TAC.2014.2347212, IEEE Transactions on Automatic Control
Limited circulation. For review only
4
system. Hence, using similar arguments to Lemma 1, V (w −u∗ ) can be used to prove convergence of kw − u∗ k to a solution that can be made arbitrarily small by decreasing (a, k, γ1 , γ2 ). Lemma 2 follows ¯ r from (31). after noting that the same conclusion can be taken for u Remark 3: Lemma 1 and 2 are stated in terms of γ1 and γ2 . As previously mentioned, they are not intended to be used in practice since that implies one requires the knowledge of how γ1 and γ2 change with R. The main theorem will address this so that such knowledge is unnecessary.
However, this can only be done if one has more knowledge of the plant. By exploiting inequality (27), the following assumption is produced. Assumption 6: There exists R1 ∈ R≥0 such that γ1 < c3 /(c4 k1k) for all R ∈ [R1 , Rmax ) and γ1 ≥ c3 /(c4 k1k) for all R ∈ (0, R1 ). If one has the luxury of possessing the knowledge to validate Assumption 6, Lemma 1 can be restated as follows. Corollary 1: Under Assumptions 2–6, for any (∆, ν) ∈ R2>0 , there exists a1 ∈ R≥0 such that for any (a, k, R) ∈ (0, a1 ) × R≥0 × [R1 , Rmax ], the solutions of the average system (12) with the initial condition k¯ uav (0) − u∗ k ≤ ∆, will satisfy k¯ uav (τ ) − u∗ k ≤ βav (k¯ uav (0) − u∗ k, kaτ ) + ν,
D. Main Result Before introducing the main stability result, define x − h(u) . z= ¯ − u∗ u
(33)
Theorem 1: Under Assumptions 1–5, there exists β(·, ·) ∈ KL such that for any (∆, ν) ∈ R2>0 there exists (a∗ , k∗ , R∗ ) ∈ R3>0 such that for all (a, k, R) ∈ (0, a∗ ] × (0, k∗ ] × R≥R∗ there is an ω ∗ ∈ R>0 such that for all ω ∈ (0, ω ∗ ], the solutions of (7) with any kz(t0 )k ≤ ∆ satisfy kz(t)k ≤ β (kz(t0 )k, kaω(t − t0 )) + ν,
∀τ ≥ 0.
∗
∀t ≥ t0 .
Proof: The proof follows three steps. First, the average system (12) is shown to be SPA stable uniformly in small (a, γ1 , γ2 ) (see Lemma 1). Secondly, the reduced system (10) is proven to be SPA stable uniformly in (k, a, γ1 , γ2 ) (see Lemma 2). In addition, the stability of the boundary layer system is provided by Assumption 1. Finally, using a similar singular perturbation argument as in [2, Lemma 1], the full closed-loop system can be proven to be SPA stable with respect to (k, a, γ1 , γ2 , ω) uniformly in (k, a, γ1 , γ2 ). Theorem 1 follows after noting that the conclusion can be restated to incorporate R instead of γ1 and γ2 (because R is the actual design parameter) by using Assumption 3 and 4. Remark 4: Theorem 1 provides for a systematic procedure for selecting appropriate ES parameters. As is consistent with previous ES results [1], [2], the issue of gradient estimate bias is addressed by choosing sufficiently small a, ω and k. Similarly R must be chosen to be sufficiently large to achieve an acceptable level of gradient estimate accuracy. For example, one can start with an arbitrary R. Then, if the scheme is unstable or has an unsatisfactory convergence properties, R can be increased until a desired result is achieved. Alternatively, R can potentially be decreased while the system closed loop behaviour remains satisfactory. Remark 5: In some applications, game theoretical approaches are used to decentralise a large-scale optimisation problem, where the goal is to optimise a global cost function. Although in general the Nash equilibrium does not necessarily represent the global optimum of the system, several methods are available in the literature [17]– [19] which enable the Nash equilibrium to coincide with the global optimum. One condition that is common among these works is the fact that the global cost is the sum of each agent’s (local) cost. ∗
E. Existence of R < Rmax Although Theorem 1 has provided stability and tuning guidelines for the scheme, Theorem 1 does not guarantee the existence of R∗ < Rmax , where Rmax is the smallest value of R that yields Ni (Rmax ) = S for at least one of the agents. In other words, the result also means that one can always choose R = Rmax to guarantee convergence. Therefore, it is desirable to establish conditions for which one can guarantee the existence of R∗ < Rmax .
Remark 6: Indeed, to be able to calculate R , more specific knowledge of γ1 is required, which might be difficult to obtain in practice. Yet the lack of this knowledge does not prevent one to use the main theorem. In addition, it is not always a simple task to verify that Assumption 6 is satisfied. The next section provides details on how this may be achieved for plants with quadratic costs, including outlining the knowledge required to validate Assumption 6 and the formula to be used in practice. IV. PLANTS WITH QUADRATIC COSTS It is of value to discuss the results developed in Section III with respect to plants with quadratic cost functions since these are very common in optimisation problems and many cost functions are well-approximated by quadratic functions in a region close to their minimum. Specifically, it is useful to investigate Assumption 5 and Corollary 1 when the cost function is quadratic. As a result, conditions which are more useful to the practitioner are determined, which can be used as a guide when choosing R. Specifically, an upper bound for γ1 is going to be determined. In addition, this section shows that it is relatively easy to satisfy the conditions required by the main result. Dynamics are ignored since they do not significantly affect the analysis. The first step of analysing the plant is to investigate Assumption 5 for quadratic plants, which includes determining the gradient system, finding the Lyapunov function, α(.), c1 , c2 , c3 and c4 . It is then possible to check whether the plant satisfies Assumption 6. The cost functions in this instance can be stated as: 1 Ji = uT Di u + HiT u + li (34) 2 where Di = [dij,k ] is a symmetric matrix and dij,k is the Di matrix entry at row j and column k; dij = dij,1 dij,2 . . . dij,Nu ; Hi = i T h1 hi2 . . . hiN u ; and dij,k , hij , li ∈ R. For the quadratic cost function (34), the overall gradient system can be represented as a linear system, 1 1 d1 h1 2 d2 h22 du = − . u − . := Au + B. (35) dt .. .. u dN Nu
u hN Nu
The gradient system is stable if and only if A is Hurwitz, however, it is uncertain whether this is true in general. To ensure this, some additional conditions that need to be satisfied by the system can be introduced. For example, [13] states that if A is diagonally dominant with negative diagonal entries X i dii,i > 0, |di,j | ≤ |dii,i | ∀i (36) j6=i
then A is Hurwitz based on Gershgorin’s discs theorem [20, Theorem 4.5].
Preprint submitted to IEEE Transactions on Automatic Control. Received: August 7, 2014 22:53:51 PST 0018-9286 (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TAC.2014.2347212, IEEE Transactions on Automatic Control
Limited circulation. For review only e = u − u∗ . Thus, As in previous sections, let u de u = A(e u + u∗ ) + B = Ae u dt
(37)
since Au∗ + B = 0. Assuming A is Hurwitz, the Lyapunov function to show stability of the gradient system can be chosen as V (e u) = eT P u e where P = P T > 0 satisfies the Lyapunov equation AT P + u P A = −Q with any Q = QT > 0. As a result, the α function and all ci ’s in Assumption 5 can now be determined. By selecting α(.) as an identity function, the following ci ’s are selected c1 = λmin (P ) c2 = λmax (P )
(38a)
c3 = λmin (Q)
(38b)
c4 = 2λmax (P )
where λmin (.) and λmax (.) are the minimum and maximum eigenvalues respectively. Now consider (3) in Assumption 3. Define γ1 as X i γ1 = max |dj | . (39) i j ∈N / i (R)
Thus, Assumption 3 is satisfied since X X ∂Ji ∂Ji ∗ (u) − (u ) = ∂uj ∂uj
i e ≤ γ1 ke uk. dj u
In order to eliminate the need to know λmax (A + AT ), Assumption 8 and Gershgorin’s disc theorem are used. Firstly, X λmax (A + AT ) ≤ max −2dii,i + |dii,j + djj,i | i j6=i X ≤ max −2|dii,i | + |dii,i |ρ(|i − j|) + |djj,j |ρ(|i − j|) i j6=i X ≤ max (ˆ ρ − 2)|dii,i | + dmax ρ(|i − j|) i j6=i
≤ (ˆ ρ − 2)dmin + ρˆdmax .
(46)
Taking the absolute value of both sides, |λmax (A + AT )| ≥ (2 − ρˆ)dmin − ρˆdmax .
(47)
From (39), (43) and (47), we have γ1 ≤ ργ (R)
(48) T
|λmax (A + A )| (2 − ρˆ)dmin − ρˆdmax ≤ . 2k1k 2k1k
(49)
Therefore, (42) is satisfied if the following is satisfied (40)
(2 − ρˆ)dmin − ρˆdmax . 2k1k
ργ (R) ≤
j ∈N / i (R)
j ∈N / i (R)
5
(50)
Now it is possible to check whether Assumption 6 is satisfied. To remove the dependency of the result on P , the following inequality [21] is useful 1 λmax (P ) ≤ − . (41) λmax {(A + AT )Q−1 }
In conclusion, by using (50), one can determine whether Assumption 6 is satisfied. Thus, for plants with quadratic costs, (50) is an important equation in determining whether dither re-use is possible.
In addition, Q = I is selected to simplify the analysis. Note that since P > 0 and Q = I, (41) also indicates that λmax (A + AT ) ∈ R≤0 .
While not specifically addressed in the theoretical results, there is some interest in investigating the implication of dither re-use on the transient behaviour of the NES scheme. In addition, an example application of the analysis in Section IV is presented in order to show its use when dealing with a plant with quadratic cost functions. Consider a nonlinear map with first-order lag output dynamics: 0.2y˙ + y = J(u), where u, y ∈ RNu are the input and output respectively, and Nu = 21. The cost function is
γ1 ≤
λmin (Q) −λmax (A + AT ) c3 = ≤ c4 k1k 2λmax (P )k1k 2k1k
(42)
Hence, the upper bound on γ1 largely depends on A. Furthermore, it is obvious that the fulfilment of (42) largely depends on how γ1 changes with R. The following assumptions are going to be useful in that respect. Assumption 7: There exists a known function ργ (.) : R≥0 → R≥0 , which is decreasing to zero with respect to its argument, such that X i max |dj | ≤ ργ (R). (43) i j ∈N / i (R)
V. NUMERICAL EXAMPLE
Ji (u) =
1 T u Di u 2
where Di ∈ RNu ×Nu and for all (i, j, k) ∈ S3 , 10 ≥ dii,i ≥ 8, 0≥
dij,k
(52a) −(j−k)2
≥ −5e
2 2 e−(i−j) + e−(i−k)
∀(j, k) 6= (i, i). Assumption 8: Let dmin , dmax ∈ R>0 such that for all i ∈ S, dii,i ∈ [dmin , dmax ]. Then, there exist a known function ρ(·) : Z≥0 → R≥0 such that for all i, j ∈ S |dii,j | ≤ |dii,i |ρ(|i − j|), and ρˆ := max i
X
j6=i
ρ(|i − j|)
≤
2dmin . dmin + dmax
(44)
(45)
Remark 7: It is important to note that Assumptions 7 and 8 only require bounds on quantities within Di to be known. Precise knowledge of J is not required. In addition, Assumption 8 implies that A is diagonally dominant and, consequently, Hurwitz. Also, it is worth noting that Assumption 8 is a property of the entire plant and is independent of the selection of the neighbourhood Ni (R).
(51)
(52b)
The exact values of the entries of Di ’s are unknown. Due to the exponential decay in (52), the neighbours set can be defined as Ni (R) = {j ∈ S − {i} | |i − j| ≤ dRe} where dRe is the smallest integer larger than R. Hence, Rmax = 10. Also, the Nash equilibrium of this system is located at the origin. 1) Application of the R∗ analysis: Firstly, it is important to find ρ and ρˆ. From (52) and Assumption 8, it is found that: 2 2 ρ(|i − j|) = 0.625e−(i−j) 1 + e−(i−j) , (53) and ρˆ = 0.3145. In addition, since 8 ≤ dii,i ≤ 10 for all i, then dmin = 8 and dmax = 10. Finally, using (43) and (52), ργ (R) can be determined for all R ∈ [0, Rmax ]. Then, the upper bound on ργ can be determined using (50). (2 − ρˆ)dmin − ρˆdmax = 1.1281. 2k1k
Preprint submitted to IEEE Transactions on Automatic Control. Received: August 7, 2014 22:53:51 PST 0018-9286 (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TAC.2014.2347212, IEEE Transactions on Automatic Control
Limited circulation. For review only
6
TABLE I: Calculated ργ bounds R
(0, 1]
(1, 2]
(2, 3]
(3, 4]
ργ
1.5797
0.0991
0.0034
3.2 × 10−5
Comparing ργ (R) given in Table I with the maximum allowable value of ργ of 1.1281, it can be concluded that dither re-use is possible for any R > 1. 2) Simulation result: A proper tuning is essential to ensure that the convergence speed (i.e. the transient performance) of the scheme is suitably fast, or even to ensure that the scheme is stable. In order to focus on the effect of dither re-use, a = ω = 1 for all simulations. Note that although ω is kept constant, the time-scale separation between the NES scheme and the plant can be achieved by choosing sufficiently small ωi . The other design parameters are the adaptation gain k, the neighbourhood parameter R and the agents’ dither frequencies ωi . The main concern in this simulation is the tuning of ωi and k at R = 10 (maximum), R = 5 and R = 2. For each R, ωi and k are optimally determined using the MATLAB Global Optimisation Toolbox with the transient performance numerically improved through many iterations of convergent plant behaviour. As larger R has more degrees of freedom, it is to be expected that the optimised transient performance should degrade as R is decreased. The purpose of the optimisation is to quantify this degradation. However, this is not indicative of the typical or expected deployments, where the opportunity to optimise transient performance in this fashion does not exist. The performance index used in the optimisation is the 10% settling time, averaged over 10 different initial conditions (such that the dependence of the result on the choice of the initial condition is reduced). Also, the 2-norm of all the initial conditions is set to be 50 (i.e. ∆ = 50). Furthermore, R is decided before an optimisation and is not changed during an optimisation. Thus, the R = 5 and the R = 2 cases are an optimisation of 12 variables (11 dither frequencies) and 6 variables (5 dither frequencies) respectively, in contrast to 22 variables when R = 10 (no dither re-use). TABLE II: The relative settling time increase and runtime reduction when R = 10, R = 5 and R = 2 R
Number of distinct
Relative settling
Wall time
frequencies
time increase
reduction
10
21
-
-
5
11
5.17%
58.1%
2
5
27.3%
84.8%
60 R=10 R=5 R=2
50
k¯ uk
40 30 20 10 0
0
10
20
30 t (s)
40
50
60
Fig. 1: Examples of convergence of k¯ uk for each R. The horizontal dashed line indicates the 10% settling time threshold at k¯ uk = 5. For all tested cases, convergence to the intended region is achieved.
Figure 1 shows examples of convergence of the NES scheme for each R. Furthermore, the settling time relative to the scheme with completely independent dithers is compared in Table II, along with the wall time taken to optimally tune the scheme for different dithers. The latter is indicative of the calibration effort required for each scheme. It can be seen that the average transient performance degrades as R is decreased. However, there is significant reduction in the time taken to produce an optimal calibration of the NES parameters, implying this task is now significantly easier. Therefore, a proper amount of dither re-use can allow for decentralised tuning without significantly impacting upon convergence properties of the closed-loop system. R EFERENCES [1] K. B. Ariyur and M. Krsti´c, Real time optimization by extremum seeking control. Hoboken, NJ : Wiley Interscience, 2003. [2] Y. Tan, D. Neˇsi´c, and I. Mareels, “On non-local stability properties of extremum seeking control,” Automatica, vol. 42, no. 6, pp. 889 – 903, 2006. ¨ uner, P. Dix, and B. Ashra, “ABS control using [3] S. Drakunov, U. Ozg¨ optimum search via sliding modes,” IEEE Trans. Contr. Syst. Technol., vol. 3, no. 1, pp. 79–85, Mar. 1995. [4] M. Guay and T. Zhang, “Adaptive extremum seeking control of nonlinear dynamic systems with parametric uncertainties,” Automatica, vol. 39, no. 7, pp. 1283–1293, Jul. 2003. [5] C. Zhang and R. Ord´on˜ ez, “Numerical optimization-based extremumseeking control with application to ABS design,” IEEE Trans. Automat. Contr., vol. 52, no. 3, pp. 454–467, Mar. 2007. [6] C. Manzie and M. Krsti´c, “Extremum seeking with stochastic perturbations,” IEEE Trans. Automat. Contr., vol. 54, no. 3, pp. 580–585, Mar. 2009. [7] W. Moase, C. Manzie, and M. Brear, “Newton-like extremum-seeking for the control of thermoacoustic instability,” IEEE Trans. Automat. Contr., vol. 55, no. 9, pp. 2094 –2105, Sept. 2010. [8] W. H. Moase and C. Manzie, “Semi-global stability analysis of observerbased extremum-seeking for Hammerstein plants,” IEEE Trans. Automat. Contr., vol. 57, no. 7, pp. 1685–1695, 2011. [9] J. Spall, “Multivariate stochastic approximation using a simultaneous perturbation gradient approximation,” IEEE Trans. Automat. Contr., vol. 37, no. 3, pp. 332–341, 1992. [10] K. Ariyur and M. Krsti´c, “Analysis and design of multivariable extremum seeking,” in Proc. Amer. Control Conf., vol. 4, 2002, pp. 2903– 2908. [11] W. Moase, Y. Tan, D. Neˇsi´c, and C. Manzie, “Non-local stability of a multi-variable extremum-seeking scheme,” in Proc. Australian Contr. Conf., Nov. 2011, pp. 38–43. [12] M. Stankovi´c, K. Johansson, and D. Stipanovi´c, “Distributed seeking of Nash equilibria in mobile sensor networks,” in Proc. 49th IEEE Conf. Dec. Contr., Dec. 2010, pp. 5598 –5603. [13] P. Frihauf, M. Krsti´c, and T. Bas¸ar, “Nash equilibrium seeking in noncooperative games,” IEEE Trans. Automat. Contr., vol. 57, no. 5, pp. 1192–1207, 2011. [14] S.-J. Liu and M. Krsti´c, “Stochastic Nash equilibrium seeking for games with general nonlinear payoffs,” SIAM J. Contr. Opt., vol. 49, no. 4, pp. 1659–1679, 2011. [15] P. Frihauf, M. Krsti´c, and T. Bas¸ar, “Nash equilibrium seeking for dynamic systems with non-quadratic payoffs,” in Proc. IFAC World Congr., Aug. 2011, pp. 3605 – 3610. [16] R. Kutadinata, W. Moase, and C. Manzie, “Non-local stability of a Nash equilibrium seeking scheme with dither re-use,” in Proc. 51st IEEE Conf. Dec. Contr., 2012, pp. 6077–6082. [17] B. Avi-Itzhak, B. Golany, and U. G. Rothblum, “Strategic equilibrium versus global optimum for a pair of competing servers,” J. Appl. Probability, vol. 43, no. 4, pp. 1165–1172, 2006. [18] T. Alpcan and L. Pavel, “Nash equilibrium design and optimization,” in Proc. Int. Conf. Game Theory for Networks, May 2009, pp. 164–170. [19] T. Alpcan, L. Pavel, and N. Stefanovic, “A control theoretic approach to noncooperative game design,” in Proc. 48th IEEE Conf. Dec. Contr., Dec. 2009, pp. 8575–8580. [20] D. Monderer and L. S. Shapley, “Potential games,” Games Econ. Behavior, vol. 14, no. 1, pp. 124 – 143, 1996. [21] K. Yasuda and K. Hirai, “Upper and lower bounds on the solution of the algebraic Riccati equation,” IEEE Trans. Automat. Contr., vol. 24, no. 3, pp. 483–487, Jun. 1979.
Preprint submitted to IEEE Transactions on Automatic Control. Received: August 7, 2014 22:53:51 PST 0018-9286 (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.