SIAM J. CONTROL OPTIM. Vol. 49, No. 4, pp. 1659–1679
© 2011 Society for Industrial and Applied Mathematics
STOCHASTIC NASH EQUILIBRIUM SEEKING FOR GAMES WITH GENERAL NONLINEAR PAYOFFS∗

SHU-JUN LIU† AND MIROSLAV KRSTIĆ‡

Abstract. We introduce a multi-input stochastic extremum seeking algorithm to solve the problem of seeking Nash equilibria for a noncooperative game whose N players seek to maximize their individual payoff functions. The payoff functions are general (not necessarily quadratic), and their forms are not known to the players. Our algorithm is a nonmodel-based approach for asymptotic attainment of the Nash equilibria. Different from classical game theory algorithms, where each player employs the knowledge of the functional form of his payoff and the knowledge of the other players' actions, a player employing our algorithm measures only his own payoff values, without knowing the functional form of his or other players' payoff functions. We prove local exponential (in probability) convergence of our algorithms. For nonquadratic payoffs, the convergence is not necessarily perfect but may be biased in proportion to the third derivatives of the payoff functions and the intensity of the stochastic perturbations used in the algorithm. We quantify the size of these residual biases. Compared to the deterministic extremum seeking with sinusoidal perturbation signals, where convergence occurs only if the players use distinct frequencies, in our algorithm each player simply employs an independent ergodic stochastic probing signal in his seeking strategy, which is realistic in noncooperative games. As a special case of an N-player noncooperative game, the problem of standard multivariable optimization (when the players' payoffs coincide) for quadratic maps is also solved using our stochastic extremum seeking algorithm.

Key words. Nash equilibrium, stochastic extremum seeking, stochastic averaging

AMS subject classifications. 60H10, 93E03, 93E15, 93E35

DOI. 10.1137/100811738
∗Received by the editors October 14, 2010; accepted for publication (in revised form) April 15, 2011; published electronically July 21, 2011. This research was supported by the National Natural Science Foundation of China under grant 60704016, by the Excellent Young Teacher Foundation of Southeast University under grant 3207011203, and by grants from the U.S. National Science Foundation and the Office of Naval Research.
http://www.siam.org/journals/sicon/49-4/81173.html
†Department of Mathematics, Southeast University, Nanjing, 210096, China ([email protected]).
‡Department of Mechanical and Aerospace Engineering, University of California, San Diego, La Jolla, CA 92093-0411 ([email protected]).

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

1. Introduction. Seeking Nash equilibria in continuous games is a difficult problem (see [14]). Researchers in fields including mathematics, computer science, economics, and systems engineering have an interest in, and need for, techniques for finding Nash equilibria. Most algorithms designed to achieve convergence to Nash equilibria require modeling information for the game and assume that the players can observe the actions of the other players. Perhaps the first serious algorithm is the one in [28], in which a gradient-type algorithm is studied for convex games. Distributed iterative algorithms for the computation of equilibria in a general class of nonquadratic convex Nash games are designed in [16]. In that algorithm, the agents need not know each other's cost functionals and private information, or the parameters and subjective probability distributions adopted by the others, but they do have to communicate their tentative decisions to each other during each phase of the computation. Fictitious play is a strategy that depends on the actions of the other players, so that a player can devise a best response; a dynamic version of fictitious play and gradient response is developed in [31]. In [38], a synchronous distributed learning algorithm is designed for the coverage optimization of mobile visual sensor networks. In that algorithm, players remember their own actions and utility values from the previous two time steps, and the algorithm is shown to converge in probability to the set of restricted Nash equilibria. Other diverse engineering applications of game theory include the design of communication networks in [22, 1, 3, 29], integrated structures and controls in [27], and distributed consensus protocols in [5, 23, 30]. A comprehensive treatment of static and dynamic noncooperative game theory can be found in [4].

Extremum seeking is a nonmodel-based real-time optimization approach for dynamic problems where only limited knowledge of a system is available. Since the emergence of a proof of its stability [13], extremum seeking has been an active research area, both in applications [11, 17, 21, 24, 25, 37] and in further theoretical developments [2, 7, 34, 35, 36]. Based on the extremum seeking approach with sinusoidal perturbations, Nash equilibrium seeking for noncooperative games with both finitely and infinitely many players is studied in [12]. In [33], Nash games in mobile sensor networks are solved using extremum seeking. Owing to certain advantages of stochastic perturbations over sinusoidal ones, in [19, 20] we investigated the stochastic extremum seeking algorithm for the single-perturbation-input case.

In this work, a multi-input stochastic extremum seeking algorithm is developed for finding Nash equilibria in N-player noncooperative games. First, to analyze the convergence of the algorithm, a multi-input stochastic averaging theory is developed; here multi-input means multiscaled stochastic perturbation inputs. Most of the existing stochastic averaging theory focuses on systems with a single-scaled stochastic perturbation input [6, 8, 15, 18] or on two-time-scale systems with slow dynamics and fast dynamics [32, 10].
There are few results on stochastic averaging for systems with multiscaled stochastic perturbation inputs. For an N-player noncooperative game, each player independently employs stochastic extremum seeking to attain a Nash equilibrium. As in the deterministic case [9], the key feature of our approach is that the players are not required to know the mathematical model of their payoff function or the underlying model of the game; the players need only measure their own payoff values. It is proved that, under certain conditions, the actions of the players converge to a neighborhood of a Nash equilibrium. The convergence result is local in the sense that convergence to any particular Nash equilibrium is assured only for initial conditions in a set around that specific stable Nash equilibrium. Moreover, convergence to a Nash equilibrium is biased in proportion to the third derivatives of the payoff functions and is dependent on the intensity of the stochastic perturbation. Compared to the deterministic case, one advantage of stochastic extremum seeking is that there is no need to choose a different perturbation frequency for each player; each player only needs to choose his own perturbation process independently, which is more realistic in a practical game with adversarial players. Finally, when all players have the same quadratic payoff, the Nash equilibrium seeking problem for an N-player noncooperative game reduces to a standard multiparameter extremum seeking problem. For this special case, we design a stochastic multiparameter extremum seeking algorithm and analyze its convergence.

The paper is organized as follows: we introduce our general problem formulation in section 2, state our algorithm and convergence results in section 3, and present the convergence proof in section 4. We provide a numerical example for a two-player game in section 5. Finally, we state our extremum seeking algorithm for multiparameter quadratic static maps in section 6 and state our stochastic averaging theory for the multi-input case in the appendix.
2. Problem formulation. Consider an N-player noncooperative game in which each player wishes to maximize a payoff function of general nonlinear form. Assume the payoff function of player i is of the form

(2.1)  J_i = h_i(u_i, u_{-i}),
where u_i is player i's action, the action (strategy) space is the whole space \mathbb{R}, u_{-i} = [u_1, \ldots, u_{i-1}, u_{i+1}, \ldots, u_N] represents the actions of the other players, h_i : \mathbb{R}^N \to \mathbb{R} is smooth, and i \in \{1, \ldots, N\}. Our algorithm is based on the following assumptions.

Assumption 2.1. There exists at least one, possibly multiple, isolated stable Nash equilibrium u^* = [u_1^*, \ldots, u_N^*] such that

(2.2)  \frac{\partial h_i}{\partial u_i}(u^*) = 0,

(2.3)  \frac{\partial^2 h_i}{\partial u_i^2}(u^*) < 0

for all i \in \{1, \ldots, N\}.

Assumption 2.2. The matrix

(2.4)  \Xi = \begin{bmatrix}
\frac{\partial^2 h_1}{\partial u_1^2}(u^*) & \frac{\partial^2 h_1}{\partial u_1 \partial u_2}(u^*) & \cdots & \frac{\partial^2 h_1}{\partial u_1 \partial u_N}(u^*) \\
\frac{\partial^2 h_2}{\partial u_1 \partial u_2}(u^*) & \frac{\partial^2 h_2}{\partial u_2^2}(u^*) & \cdots & \frac{\partial^2 h_2}{\partial u_2 \partial u_N}(u^*) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 h_N}{\partial u_1 \partial u_N}(u^*) & \frac{\partial^2 h_N}{\partial u_2 \partial u_N}(u^*) & \cdots & \frac{\partial^2 h_N}{\partial u_N^2}(u^*)
\end{bmatrix}

is strictly diagonally dominant and hence nonsingular. By Assumptions 2.1 and 2.2, \Xi is Hurwitz.

In our scheme, player i has no knowledge of the other players' payoffs h_j (j \ne i) or actions u_j (j \ne i); he can measure only his own payoff h_i. Our objective is to design a stochastic extremum seeking algorithm for each player so as to approximate the Nash equilibrium.

3. Stochastic Nash equilibrium seeking algorithm. In our algorithm, each player independently employs a stochastic seeking strategy to attain the stable Nash equilibrium of the game. Player i implements the following strategy:

(3.1)  u_i(t) = \hat{u}_i(t) + a_i f_i(\eta_i(t)),

(3.2)  \frac{d\hat{u}_i(t)}{dt} = k_i a_i f_i(\eta_i(t)) J_i(t),

where, for i = 1, \ldots, N, a_i > 0 is the perturbation amplitude, k_i > 0 is the adaptive gain, J_i(t) is the measured payoff value for player i, and f_i is a bounded smooth function chosen by player i, e.g., a sine function. The \eta_i(t), i = 1, \ldots, N, are independent time-homogeneous continuous ergodic Markov processes chosen by the players, e.g., the Ornstein–Uhlenbeck (OU) process

(3.3)  \eta_i = \frac{\sqrt{\varepsilon_i}\, q_i}{\varepsilon_i s + 1}[\dot{W}_i], \quad \text{or} \quad \varepsilon_i\, d\eta_i(t) = -\eta_i(t)\, dt + \sqrt{\varepsilon_i}\, q_i\, dW_i(t),
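Before proceeding, the OU probing signal in (3.3) is easy to simulate and sanity-check. Below is a minimal sketch (not from the paper; the step size, horizon, and q = 1 are illustrative choices) that integrates the rescaled process dχ = −χ dt + q dB with an Euler–Maruyama step. The invariant distribution of this process is Gaussian with variance q²/2, so a long trajectory should show a sample variance near q²/2 and a time average of sin²(χ) near ½(1 − e^{−q²}), a quantity that reappears in the convergence analysis as G0(q).

```python
import math
import random

def simulate_ou(q=1.0, dt=0.01, steps=400_000, burn=10_000, seed=0):
    """Euler-Maruyama simulation of d(chi) = -chi dt + q dB, i.e., the OU
    process (3.3) viewed in the rescaled time t/eps_i. Returns the
    time-averages of chi^2 and sin(chi)^2 after a burn-in period."""
    rng = random.Random(seed)
    chi = 0.0
    sq_sum = 0.0
    sin2_sum = 0.0
    n = 0
    sqrt_dt = math.sqrt(dt)
    for k in range(steps):
        chi += -chi * dt + q * sqrt_dt * rng.gauss(0.0, 1.0)
        if k >= burn:
            sq_sum += chi * chi
            sin2_sum += math.sin(chi) ** 2
            n += 1
    return sq_sum / n, sin2_sum / n

if __name__ == "__main__":
    var_hat, sin2_hat = simulate_ou(q=1.0)
    print(var_hat)    # sample variance, close to q^2/2
    print(sin2_hat)   # close to 0.5*(1 - exp(-q^2))
```

With q = 1 the two averages settle near 0.5 and ½(1 − e^{−1}) ≈ 0.316, up to statistical and discretization error.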
where q_i > 0 and W_i(t), i = 1, \ldots, N, are independent 1-dimensional standard Brownian motions on a complete probability space (\Omega, \mathcal{F}, P) with sample space \Omega, \sigma-field \mathcal{F}, and probability measure P. System (3.2) is an ordinary differential equation with stochastic perturbation (see [8]), namely a stochastic ordinary differential equation, and its solution can be defined for each sample path of the perturbation process (\eta_i(t), t \ge 0), which is given by the Itô stochastic differential equation (3.3). Figure 1 depicts a noncooperative game played by two players implementing the stochastic extremum seeking strategy (3.1)–(3.2) to attain a Nash equilibrium.

To analyze the convergence of the algorithm, we denote the error relative to the Nash equilibrium by

(3.4)  \tilde{u}_i(t) = \hat{u}_i(t) - u_i^*.
Then we obtain the error system

(3.5)  \frac{d\tilde{u}_i(t)}{dt} = k_i \rho_i^{(1)}(t)\, h_i\big(u_i^* + \tilde{u}_i + \rho_i^{(1)}(t),\; u_{-i}^* + \tilde{u}_{-i} + \rho_{-i}^{(1)}(t)\big),

where \rho_i^{(1)}(t) = a_i f_i(\eta_i(t)), \rho_{-i}^{(1)}(t) = [a_1 f_1(\eta_1(t)), \ldots, a_{i-1} f_{i-1}(\eta_{i-1}(t)), a_{i+1} f_{i+1}(\eta_{i+1}(t)), \ldots, a_N f_N(\eta_N(t))], u_{-i}^* = [u_1^*, \ldots, u_{i-1}^*, u_{i+1}^*, \ldots, u_N^*], and \tilde{u}_{-i} = [\tilde{u}_1, \ldots, \tilde{u}_{i-1}, \tilde{u}_{i+1}, \ldots, \tilde{u}_N].

If the players choose f_i(x) = \sin x for all i = 1, \ldots, N, and \eta_i as independent OU processes (3.3), we have the following convergence result.

Theorem 3.1. Consider the error system (3.5) for an N-player game under Assumptions 2.1 and 2.2. Then there exists a constant a^* > 0 such that for \max_{1 \le i \le N} a_i \in (0, a^*) there exist constants r > 0, c > 0, \gamma > 0 and a function T(\varepsilon_1) : (0, \varepsilon_0) \to \mathbb{N} such that for any initial condition |\Lambda^{\varepsilon_1}(0)| < r and any \delta > 0,

(3.6)  \lim_{\varepsilon_1 \to 0} \inf\Big\{ t \ge 0 : |\Lambda^{\varepsilon_1}(t)| > c|\Lambda^{\varepsilon_1}(0)| e^{-\gamma t} + \delta + O\big(\max_i a_i^3\big) \Big\} = \infty \quad \text{a.s.}

and

(3.7)  \lim_{\varepsilon_1 \to 0} P\Big\{ |\Lambda^{\varepsilon_1}(t)| \le c|\Lambda^{\varepsilon_1}(0)| e^{-\gamma t} + \delta + O\big(\max_i a_i^3\big) \ \forall t \in [0, T(\varepsilon_1)] \Big\} = 1, \quad \text{with } \lim_{\varepsilon_1 \to 0} T(\varepsilon_1) = \infty,
where

(3.8)  \Lambda^{\varepsilon_1}(t) = \Big[ \tilde{u}_1(t) - \sum_{j=1}^N d_{jj}^1 a_j^2, \; \ldots, \; \tilde{u}_N(t) - \sum_{j=1}^N d_{jj}^N a_j^2 \Big],

(3.9)  \begin{bmatrix} d_{jj}^1 \\ \vdots \\ d_{jj}^{j-1} \\ d_{jj}^j \\ d_{jj}^{j+1} \\ \vdots \\ d_{jj}^N \end{bmatrix}
= -\Xi^{-1} \begin{bmatrix}
\frac{1}{2} G_0(q_j) \frac{\partial^3 h_1}{\partial u_1 \partial u_j^2}(u^*) \\
\vdots \\
\frac{1}{2} G_0(q_j) \frac{\partial^3 h_{j-1}}{\partial u_{j-1} \partial u_j^2}(u^*) \\
\frac{1}{6} \frac{G_1(q_j)}{G_0(q_j)} \frac{\partial^3 h_j}{\partial u_j^3}(u^*) \\
\frac{1}{2} G_0(q_j) \frac{\partial^3 h_{j+1}}{\partial u_j^2 \partial u_{j+1}}(u^*) \\
\vdots \\
\frac{1}{2} G_0(q_j) \frac{\partial^3 h_N}{\partial u_j^2 \partial u_N}(u^*)
\end{bmatrix},

and G_0(q_j) = \frac{1}{2}\big(1 - e^{-q_j^2}\big), G_1(q_j) = \frac{3}{8} - \frac{1}{2} e^{-q_j^2} + \frac{1}{8} e^{-4 q_j^2} = \frac{1}{8}\big(1 - e^{-q_j^2}\big)^2 \big(e^{-2 q_j^2} + 2 e^{-q_j^2} + 3\big).

Several remarks are needed in order to properly interpret Theorem 3.1. From (3.6) and the fact that |\Lambda^{\varepsilon_1}(t)| \ge \max_i |\tilde{u}_i(t) - \sum_{j=1}^N d_{jj}^i a_j^2|, we obtain
\lim_{\varepsilon_1 \to 0} \inf\Big\{ t \ge 0 : \max_i \Big| \tilde{u}_i(t) - \sum_{j=1}^N d_{jj}^i a_j^2 \Big| > c|\Lambda^{\varepsilon_1}(0)| e^{-\gamma t} + \delta + O\big(\max_i a_i^3\big) \Big\} = \infty \quad \text{a.s.}
By taking all the a_i's small, \max_i |\tilde{u}_i(t)| can be made arbitrarily small as t \to \infty. The bias terms \sum_{j=1}^N d_{jj}^i a_j^2 defined by (3.9) appear complicated but have a simple physical interpretation. When the game's payoff functions are not quadratic (not symmetric), extremum seeking algorithms, which employ zero-mean (symmetric) perturbations, produce a bias. According to the formula (3.9), the bias depends
on the third derivatives of the payoff functions, namely, on the level of asymmetry in the payoff surfaces at the Nash equilibrium. In the trivial case of a single player the interpretation is easy: extremum seeking settles on the flatter (more favorable) side of an asymmetric peak. In the case of multiple players the interpretation is more difficult, as each player contributes both to his own bias and to the other players' biases. Though difficult to interpret intuitively in the multiplayer case, the formula (3.9) is useful because it quantifies the biases. The estimate of the region of attraction r can be conservatively taken as independent of the a_i's, for a_i's chosen sufficiently small; this fact can be seen only by going through the proof of the averaging theorem for the specific system (3.5). Hence r is larger than the bias terms, which means that for small a_i's the algorithm reduces the distance to the Nash equilibrium for all initial conditions except those within an O(\max_i a_i^2) neighborhood of the Nash equilibrium. On the other hand, the convergence rate \gamma cannot be taken independently of the a_i's, because the a_i's appear as factors on the entire right-hand side of (3.5); however, by letting the k_i's increase as the a_i's decrease, independence of \gamma from the a_i's can be ensured. In the rare case where the error system (3.5) is globally Lipschitz, we obtain global convergence using the global averaging theorem in [18].

4. Proof of the algorithm convergence. We apply the multi-input stochastic averaging theory presented in the appendix to analyze the error system (3.5). First, we calculate the average system of (3.5). Define \chi_i(t) = \eta_i(\varepsilon_i t) and B_i(t) = \frac{1}{\sqrt{\varepsilon_i}} W_i(\varepsilon_i t). Then by (3.3) we have

(4.1)  d\chi_i(t) = -\chi_i(t)\, dt + q_i\, dB_i(t),
where [B_1(t), \ldots, B_N(t)]^T is an N-dimensional standard Brownian motion on the space (\Omega, \mathcal{F}, P). Thus we can rewrite the error system (3.5) as

(4.2)  \frac{d\tilde{u}_i(t)}{dt} = k_i \rho_i^{(2)}(t/\varepsilon_i)\, h_i\big(u_i^* + \tilde{u}_i + \rho_i^{(2)}(t/\varepsilon_i),\; u_{-i}^* + \tilde{u}_{-i} + \rho_{-i}^{(2)}(t/\varepsilon_{-i})\big),

where \rho_i^{(2)}(t) = a_i \sin(\chi_i(t)) and \rho_{-i}^{(2)}(t/\varepsilon_{-i}) = [a_1 \sin(\chi_1(t/\varepsilon_1)), \ldots, a_{i-1} \sin(\chi_{i-1}(t/\varepsilon_{i-1})), a_{i+1} \sin(\chi_{i+1}(t/\varepsilon_{i+1})), \ldots, a_N \sin(\chi_N(t/\varepsilon_N))]. Denote

(4.3)  \varepsilon_i = \frac{\varepsilon_1}{c_i}, \quad i = 2, \ldots, N,

for some positive real constants c_i, and consider the change of variables

(4.4)  Z_1(t) = \chi_1(t), \; Z_2(t) = \chi_2(c_2 t), \; \ldots, \; Z_N(t) = \chi_N(c_N t).

Then the error system (4.2) can be transformed into one with the single small parameter \varepsilon_1:

(4.5)  \frac{d\tilde{u}_i(t)}{dt} = k_i \rho_i^{(3)}(t/\varepsilon_1)\, h_i\big(u_i^* + \tilde{u}_i + \rho_i^{(3)}(t/\varepsilon_1),\; u_{-i}^* + \tilde{u}_{-i} + \rho_{-i}^{(3)}(t/\varepsilon_1)\big),

where \rho_i^{(3)}(t) = a_i \sin(Z_i(t)) and \rho_{-i}^{(3)}(t/\varepsilon_1) = [a_1 \sin(Z_1(t/\varepsilon_1)), \ldots, a_{i-1} \sin(Z_{i-1}(t/\varepsilon_1)), a_{i+1} \sin(Z_{i+1}(t/\varepsilon_1)), \ldots, a_N \sin(Z_N(t/\varepsilon_1))].
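For completeness, the time-scale change that produced (4.1) is a routine verification using the Brownian scaling W_i(\varepsilon_i t) = \sqrt{\varepsilon_i}\, B_i(t) (in distribution), with B_i again a standard Brownian motion:

```latex
% From (3.3), divide by eps_i, then substitute s = eps_i t, chi_i(t) = eta_i(eps_i t):
\begin{aligned}
d\eta_i(s) &= -\tfrac{1}{\varepsilon_i}\,\eta_i(s)\,ds
             + \tfrac{q_i}{\sqrt{\varepsilon_i}}\,dW_i(s),\\
d\chi_i(t) = d\eta_i(\varepsilon_i t)
           &= -\tfrac{1}{\varepsilon_i}\,\chi_i(t)\,(\varepsilon_i\,dt)
             + \tfrac{q_i}{\sqrt{\varepsilon_i}}\,dW_i(\varepsilon_i t)
            = -\chi_i(t)\,dt + q_i\,dB_i(t).
\end{aligned}
```

So the slow parameter \varepsilon_i only dilates time; the rescaled probing process is a unit-rate OU process independent of \varepsilon_i.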
Since (\chi_i(t), t \ge 0) is ergodic with invariant distribution \mu_i(dx_i) = \frac{1}{\sqrt{\pi} q_i} e^{-x_i^2 / q_i^2}\, dx_i (see [26]), by Lemma A.2 the vector-valued process [Z_1(t), \ldots, Z_N(t)]^T is also ergodic, with invariant distribution \mu_1 \times \cdots \times \mu_N. Thus by (A.4) we have the average error system

(4.6)  \frac{d\tilde{u}_i^{\mathrm{ave}}(t)}{dt} = k_i a_i \int_{\mathbb{R}^N} \sin(x_i)\, h_i\big(u_i^* + \tilde{u}_i^{\mathrm{ave}} + a_i \sin(x_i),\; u_{-i}^* + \tilde{u}_{-i}^{\mathrm{ave}} + a_{-i} \sin(x_{-i})\big)\, \mu_1(dx_1) \times \cdots \times \mu_N(dx_N),

where a_{-i} \sin(x_{-i}) = [a_1 \sin(x_1), \ldots, a_{i-1} \sin(x_{i-1}), a_{i+1} \sin(x_{i+1}), \ldots, a_N \sin(x_N)], and \mu_i is the invariant distribution of the process (\chi_i(t), t \ge 0) or (Z_i(t), t \ge 0). The equilibrium \tilde{u}^e = [\tilde{u}_1^e, \ldots, \tilde{u}_N^e] of (4.6) satisfies

(4.7)  0 = \int_{\mathbb{R}^N} \sin(x_i)\, h_i\big(u_i^* + \tilde{u}_i^e + a_i \sin(x_i),\; u_{-i}^* + \tilde{u}_{-i}^e + a_{-i} \sin(x_{-i})\big)\, \mu_1(dx_1) \times \cdots \times \mu_N(dx_N)

for all i \in \{1, \ldots, N\}. To calculate the equilibrium of the average error system and analyze its stability, we postulate that \tilde{u}^e has the form

(4.8)  \tilde{u}_i^e = \sum_{j=1}^N b_j^i a_j + \sum_{j=1}^N \sum_{k \ge j}^N d_{jk}^i a_j a_k + O\big(\max_i a_i^3\big).

By expanding h_i about u^* in (4.7) and substituting (4.8), the unknown coefficients b_j^i and d_{jk}^i can be determined. The Taylor series expansion of h_i about u^* in (4.7) is

(4.9)  h_i(u_i^* + v_i, u_{-i}^* + v_{-i}) = \sum_{n_1=0}^{\infty} \cdots \sum_{n_N=0}^{\infty} \frac{v_1^{n_1} \cdots v_N^{n_N}}{n_1! \cdots n_N!}\, \frac{\partial^{n_1 + \cdots + n_N} h_i}{\partial u_1^{n_1} \cdots \partial u_N^{n_N}}(u^*),

where v_i = \tilde{u}_i^e + a_i \sin(x_i) and v_{-i} = \tilde{u}_{-i}^e + a_{-i} \sin(x_{-i}). Smoothness of h_i alone does not guarantee that this Taylor series converges; (4.9) only displays the form of the expansion, and in fact we need only the third-order Taylor formula. Since the invariant distribution \mu_i(dx_i) of the OU process (\chi_i(t), t \ge 0) is the Gaussian measure \frac{1}{\sqrt{\pi} q_i} e^{-x_i^2/q_i^2}\, dx_i, we have the moment calculations
(4.10)  \int_{\mathbb{R}} \sin^{2k+1}(x_i)\, \mu_i(dx_i) = \int_{-\infty}^{+\infty} \sin^{2k+1}(x_i)\, \frac{1}{\sqrt{\pi} q_i} e^{-x_i^2/q_i^2}\, dx_i = 0, \quad k = 0, 1, 2, \ldots,

(4.11)  \int_{\mathbb{R}} \sin^2(x_i)\, \mu_i(dx_i) = \frac{1}{2}\big(1 - e^{-q_i^2}\big) = G_0(q_i),

(4.12)  \int_{\mathbb{R}} \sin^4(x_i)\, \mu_i(dx_i) = \frac{3}{8} - \frac{1}{2} e^{-q_i^2} + \frac{1}{8} e^{-4 q_i^2} = G_1(q_i),

(4.13)  \int_{\mathbb{R}^2} \sin(x_i) \sin(x_j)\, \mu_i(dx_i) \times \mu_j(dx_j) = 0,

(4.14)  \int_{\mathbb{R}^2} \sin^2(x_i) \sin(x_j)\, \mu_i(dx_i) \times \mu_j(dx_j) = 0,

(4.15)  \int_{\mathbb{R}^2} \sin^3(x_i) \sin(x_j)\, \mu_i(dx_i) \times \mu_j(dx_j) = 0,

(4.16)  \int_{\mathbb{R}^2} \sin^2(x_i) \sin^2(x_j)\, \mu_i(dx_i) \times \mu_j(dx_j) = \frac{1}{4}\big(1 - e^{-q_i^2}\big)\big(1 - e^{-q_j^2}\big) \triangleq G_2(q_i, q_j),

(4.17)  \int_{\mathbb{R}^3} \sin(x_i) \sin(x_j) \sin(x_k)\, \mu_i(dx_i) \times \mu_j(dx_j) \times \mu_k(dx_k) = 0,

(4.18)  \int_{\mathbb{R}^3} \sin(x_i) \sin^2(x_j) \sin(x_k)\, \mu_i(dx_i) \times \mu_j(dx_j) \times \mu_k(dx_k) = 0,

where in (4.13)–(4.16) i \ne j, and in (4.17)–(4.18) i, j, k are distinct. Based on the above calculations, substituting (4.9) into (4.7) and computing the average of each term gives
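The single-variable moment identities above can be verified by quadrature; below is a minimal sketch (not from the paper; the test point q = 1.2 and the midpoint-rule resolution are arbitrary choices) that checks (4.10) for k = 0, (4.11), and (4.12) against the closed forms G0 and G1.

```python
import math

def gaussian_moment(power, q, lo=-10.0, hi=10.0, n=100_000):
    """Midpoint-rule approximation of E[sin(X)^power], where X has the OU
    invariant density (1 / (sqrt(pi) * q)) * exp(-x^2 / q^2)."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        density = math.exp(-x * x / (q * q)) / (math.sqrt(math.pi) * q)
        total += (math.sin(x) ** power) * density * h
    return total

def G0(q):
    return 0.5 * (1.0 - math.exp(-q * q))

def G1(q):
    return 3.0 / 8.0 - 0.5 * math.exp(-q * q) + math.exp(-4.0 * q * q) / 8.0

if __name__ == "__main__":
    q = 1.2
    print(gaussian_moment(1, q))           # odd moment (4.10): ~0
    print(gaussian_moment(2, q), G0(q))    # (4.11)
    print(gaussian_moment(4, q), G1(q))    # (4.12)
```

The product identities (4.13)–(4.18) then follow from independence: each factors into single-variable moments, and any factor with an odd power of sin vanishes by (4.10).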
(4.19)  0 = a_i^2 G_0(q_i)\, \tilde{u}_i^e \frac{\partial^2 h_i}{\partial u_i^2}(u^*)
      + a_i^2 G_0(q_i) \sum_{j \ne i} \tilde{u}_j^e \frac{\partial^2 h_i}{\partial u_i \partial u_j}(u^*)
      + \Big( \frac{a_i^2}{2} G_0(q_i) (\tilde{u}_i^e)^2 + \frac{a_i^4}{6} G_1(q_i) \Big) \frac{\partial^3 h_i}{\partial u_i^3}(u^*)
      + a_i^2 G_0(q_i)\, \tilde{u}_i^e \sum_{j \ne i} \tilde{u}_j^e \frac{\partial^3 h_i}{\partial u_i^2 \partial u_j}(u^*)
      + \sum_{j \ne i} \Big( \frac{a_i^2}{2} G_0(q_i) (\tilde{u}_j^e)^2 + \frac{a_i^2 a_j^2}{2} G_2(q_i, q_j) \Big) \frac{\partial^3 h_i}{\partial u_i \partial u_j^2}(u^*)
      + \sum_{j \ne i} \sum_{k > j,\, k \ne i} a_i^2 G_0(q_i)\, \tilde{u}_j^e \tilde{u}_k^e \frac{\partial^3 h_i}{\partial u_i \partial u_j \partial u_k}(u^*)
      + O\big(\max_i a_i^5\big),

or, equivalently, dividing by a_i^2 G_0(q_i),

(4.20)  0 = \tilde{u}_i^e \frac{\partial^2 h_i}{\partial u_i^2}(u^*)
      + \sum_{j \ne i} \tilde{u}_j^e \frac{\partial^2 h_i}{\partial u_i \partial u_j}(u^*)
      + \Big( \frac{1}{2} (\tilde{u}_i^e)^2 + \frac{a_i^2}{6} \frac{G_1(q_i)}{G_0(q_i)} \Big) \frac{\partial^3 h_i}{\partial u_i^3}(u^*)
      + \tilde{u}_i^e \sum_{j \ne i} \tilde{u}_j^e \frac{\partial^3 h_i}{\partial u_i^2 \partial u_j}(u^*)
      + \sum_{j \ne i} \Big( \frac{1}{2} (\tilde{u}_j^e)^2 + \frac{a_j^2}{2} G_0(q_j) \Big) \frac{\partial^3 h_i}{\partial u_i \partial u_j^2}(u^*)
      + \sum_{j \ne i} \sum_{k > j,\, k \ne i} \tilde{u}_j^e \tilde{u}_k^e \frac{\partial^3 h_i}{\partial u_i \partial u_j \partial u_k}(u^*)
      + O\big(\max_i a_i^3\big).

Substituting (4.8) into (4.19) and matching first-order powers of a_i gives

(4.21)  \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix} = \Xi \begin{bmatrix} b_i^1 \\ \vdots \\ b_i^N \end{bmatrix}, \quad i = 1, \ldots, N,

which implies that b_j^i = 0 for all i, j, since \Xi is nonsingular by Assumption 2.2. Similarly, matching the second-order terms a_j a_k (j > k) and a_j^2, and substituting b_j^i = 0 to simplify the resulting expressions, yields

(4.22)  \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix} = \Xi \begin{bmatrix} d_{jk}^1 \\ \vdots \\ d_{jk}^N \end{bmatrix}, \quad j, k = 1, \ldots, N, \; j > k,

and

(4.23)  \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix} = \Xi \begin{bmatrix} d_{jj}^1 \\ \vdots \\ d_{jj}^N \end{bmatrix}
+ \begin{bmatrix}
\frac{1}{2} G_0(q_j) \frac{\partial^3 h_1}{\partial u_1 \partial u_j^2}(u^*) \\
\vdots \\
\frac{1}{2} G_0(q_j) \frac{\partial^3 h_{j-1}}{\partial u_{j-1} \partial u_j^2}(u^*) \\
\frac{1}{6} \frac{G_1(q_j)}{G_0(q_j)} \frac{\partial^3 h_j}{\partial u_j^3}(u^*) \\
\frac{1}{2} G_0(q_j) \frac{\partial^3 h_{j+1}}{\partial u_j^2 \partial u_{j+1}}(u^*) \\
\vdots \\
\frac{1}{2} G_0(q_j) \frac{\partial^3 h_N}{\partial u_j^2 \partial u_N}(u^*)
\end{bmatrix}.

Thus d_{jk}^i = 0 for all i and all j \ne k, and d_{jj}^i is given by (3.9).
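Formula (3.9) (equivalently, (4.23)) can be exercised in the simplest setting of a single player (N = 1). Take h(u) = −(u − u*)² + γ(u − u*)³, so that Ξ = h''(u*) = −2, h'''(u*) = 6γ, and the predicted equilibrium bias is ũᵉ ≈ −Ξ⁻¹ · (1/6)(G1(q)/G0(q)) h'''(u*) a² = (γ/2)(G1(q)/G0(q)) a². The sketch below (not from the paper; γ, q, and a are illustrative values) locates the root of the scalar averaged equation (4.7) by bisection on a quadrature and compares it with this prediction.

```python
import math

def G0(q):
    return 0.5 * (1.0 - math.exp(-q * q))

def G1(q):
    return 3.0 / 8.0 - 0.5 * math.exp(-q * q) + math.exp(-4.0 * q * q) / 8.0

def avg_rhs(u_tilde, a, q, gamma, lo=-8.0, hi=8.0, n=8_000):
    """Quadrature of E[sin(x) * h(u* + u_tilde + a sin x)] under the OU
    invariant density, for h(u) = -(u - u*)^2 + gamma*(u - u*)^3.
    (The value of u* drops out: only v = u_tilde + a sin x matters.)"""
    step = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * step
        density = math.exp(-x * x / (q * q)) / (math.sqrt(math.pi) * q)
        v = u_tilde + a * math.sin(x)
        payoff = -v * v + gamma * v ** 3
        total += math.sin(x) * payoff * density * step
    return total

def equilibrium_bias(a=0.1, q=1.0, gamma=0.3):
    """Bisection for the near-zero root of avg_rhs(u_tilde) = 0."""
    lo, hi = -0.01, 0.01
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if avg_rhs(lo, a, q, gamma) * avg_rhs(mid, a, q, gamma) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    a, q, gamma = 0.1, 1.0, 0.3
    bias = equilibrium_bias(a, q, gamma)
    predicted = 0.5 * gamma * (G1(q) / G0(q)) * a ** 2
    print(bias, predicted)   # agree up to O(a^4)
```

As expected from the sign of h''' here (γ > 0 flattens the peak on the positive side), the algorithm's equilibrium is biased toward the flatter side of the peak.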
Therefore, by (4.8), the equilibrium of the average error system (4.6) is

(4.24)  \tilde{u}_i^e = \sum_{j=1}^N d_{jj}^i a_j^2 + O\big(\max_i a_i^3\big).

By the dominated convergence theorem, the Jacobian \Psi^{\mathrm{ave}} = (\psi_{ij})_{N \times N} of the average error system (4.6) at \tilde{u}^e has elements given by

(4.25)  \psi_{ij} = k_i \int_{\mathbb{R}^N} a_i \sin(x_i)\, \frac{\partial h_i}{\partial u_j}\big(u_i^* + \tilde{u}_i^e + a_i \sin(x_i),\; u_{-i}^* + \tilde{u}_{-i}^e + a_{-i} \sin(x_{-i})\big)\, \mu_1(dx_1) \times \cdots \times \mu_N(dx_N)

(4.26)  \phantom{\psi_{ij}} = k_i a_i^2 G_0(q_i)\, \frac{\partial^2 h_i}{\partial u_i \partial u_j}(u^*) + O\big(\max_i a_i^3\big),

and \Psi^{\mathrm{ave}} is Hurwitz by Assumptions 2.1 and 2.2 for sufficiently small a_i, which implies that the equilibrium (4.24) of the average error system (4.6) is locally exponentially stable. By the multi-input averaging theorem in the appendix, the theorem is proved.

5. Numerical example. We consider two players with payoff functions

(5.1)  J_1 = -u_1^3 + 2 u_1 u_2 + u_1^2 - \frac{3}{4} u_1,

(5.2)  J_2 = 2 u_1^2 u_2 - u_2^2.
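The stationary-point and stability claims made for this example below can be verified directly from the payoffs (5.1)–(5.2); the sketch here (not from the paper) checks that the gradients vanish at the two Nash equilibria (0.5, 0.25) and (1.5, 2.25) and applies the trace/determinant characterization of 2×2 Hurwitz matrices to Ξ evaluated at each of them.

```python
def grad(u1, u2):
    """Gradients dJ1/du1 and dJ2/du2 for the payoffs (5.1)-(5.2)."""
    dJ1_du1 = -3.0 * u1 ** 2 + 2.0 * u2 + 2.0 * u1 - 0.75
    dJ2_du2 = 2.0 * u1 ** 2 - 2.0 * u2
    return dJ1_du1, dJ2_du2

def xi_matrix(u1):
    """The matrix Xi of Assumption 2.2 for this game, evaluated at (u1, u1^2)."""
    return [[-6.0 * u1 + 2.0, 2.0],   # d2J1/du1^2,   d2J1/du1 du2
            [4.0 * u1, -2.0]]         # d2J2/du1 du2, d2J2/du2^2

def is_hurwitz_2x2(m):
    """A 2x2 matrix is Hurwitz iff trace < 0 and det > 0."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return tr < 0.0 and det > 0.0

if __name__ == "__main__":
    for (u1, u2) in [(0.5, 0.25), (1.5, 2.25)]:
        g1, g2 = grad(u1, u2)
        print((u1, u2), g1, g2, is_hurwitz_2x2(xi_matrix(u1)))
```

Setting dJ2/du2 = 0 gives u2 = u1², and substituting into dJ1/du1 = 0 leaves u1² − 2u1 + 3/4 = 0, whose roots u1 = 0.5 and u1 = 1.5 produce exactly the two equilibria.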
Since J_1 is not globally concave in u_1, we restrict the action space to A = \{u_1 \ge 1/3, u_2 \ge 1/6\} in order to avoid the existence of maximizing actions at infinity or Nash equilibria at the boundary of the action space. (However, we do not restrict the extremum seeking algorithm to A; such a restriction can be imposed using parameter projection but would complicate our exposition considerably.) The game (J_1, J_2) admits two Nash equilibria: (u_1^{*1}, u_2^{*1}) = (0.5, 0.25) and (u_1^{*2}, u_2^{*2}) = (1.5, 2.25). The corresponding matrices are

\Xi_1 = \begin{bmatrix} -1 & 2 \\ 2 & -2 \end{bmatrix} \quad \text{and} \quad \Xi_2 = \begin{bmatrix} -7 & 2 \\ 6 & -2 \end{bmatrix},

where \Xi_1 is nonsingular but not Hurwitz, while \Xi_2 is nonsingular and Hurwitz; neither matrix is diagonally dominant. From the proof of the algorithm convergence, we know that diagonal dominance is only a sufficient condition for \Xi to be nonsingular and is not required in general. The average error system for this game is
2 d˜ uave 1 (t) 4 = k1 a21 G0 (q1 )(−3˜ uave − 6u∗1 u˜ave + 2˜ uave + 2˜ uave 1 1 2 1 ) − k1 a1 G1 (q1 ), dt
(5.4)
2 d˜ uave 2 (t) 2 2 = k2 a22 G0 (q2 )(−2˜ uave + 2˜ uave + 4u∗1 u ˜ave 2 1 1 ) + 2k2 a1 a2 G2 (q1 , q2 ), dt
where u_1^* can be u_1^{*1} or u_1^{*2}. The equilibria (\tilde{u}_1^e, \tilde{u}_2^e) of this average system are

(5.5)  \tilde{u}_1^e = 1 - u_1^* \pm \sqrt{ (1 - u_1^*)^2 - a_1^2 \Big( \frac{G_1(q_1)}{G_0(q_1)} - 2 G_0(q_1) \Big) },

(5.6)  \tilde{u}_2^e = 2 - 2 u_1^* \pm 2 \sqrt{ (1 - u_1^*)^2 - a_1^2 \Big( \frac{G_1(q_1)}{G_0(q_1)} - 2 G_0(q_1) \Big) } - a_1^2 \frac{G_1(q_1)}{G_0(q_1)} + 3 a_1^2 G_0(q_1),

and their postulated form is

(5.7)  \tilde{u}_1^{e,p} = \frac{1}{2(1 - u_1^*)} \Big( \frac{G_1(q_1)}{G_0(q_1)} - 2 G_0(q_1) \Big) a_1^2 + O\big(\max_i a_i^3\big),

(5.8)  \tilde{u}_2^{e,p} = \Big( \frac{u_1^*}{1 - u_1^*} \frac{G_1(q_1)}{G_0(q_1)} + \frac{1 - 3 u_1^*}{1 - u_1^*} G_0(q_1) \Big) a_1^2 + O\big(\max_i a_i^3\big).

The corresponding Jacobian matrices are
(5.9)  \Psi^{\mathrm{ave}} = \begin{bmatrix} (-6 \tilde{u}_1^e - 6 u_1^* + 2) \gamma_1 & 2 \gamma_1 \\ (2 \tilde{u}_1^e + 4 u_1^*) \gamma_2 & -2 \gamma_2 \end{bmatrix},

where \gamma_i = k_i a_i^2 G_0(q_i), i = 1, 2, and their characteristic equation is \lambda^2 + \alpha_1 \lambda + \alpha_2 = 0, where

(5.10)  \alpha_1 = (6 \tilde{u}_1^e + 6 u_1^* - 2) \gamma_1 + 2 \gamma_2,

(5.11)  \alpha_2 = (2 \tilde{u}_1^e + u_1^* - 1)\, 4 \gamma_1 \gamma_2.
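The exact equilibria (5.5)–(5.6) and the postulated expansions (5.7)–(5.8) can be compared numerically; below is a small sketch (not from the paper; u1* = 1.5, q1 = 1, and a1 = 0.05 are illustrative values, and for u1* = 1.5 the near-zero equilibrium corresponds to the "+" branch of (5.5)).

```python
import math

def G0(q):
    return 0.5 * (1.0 - math.exp(-q * q))

def G1(q):
    return 3.0 / 8.0 - 0.5 * math.exp(-q * q) + math.exp(-4.0 * q * q) / 8.0

def exact_equilibrium(u1_star, a1, q1):
    """Near-zero equilibrium (5.5)-(5.6) of the average system (5.3)-(5.4)."""
    c = G1(q1) / G0(q1) - 2.0 * G0(q1)
    s = math.sqrt((1.0 - u1_star) ** 2 - a1 ** 2 * c)
    u1e = 1.0 - u1_star + s              # '+' branch is the near-zero one for u1* = 1.5
    u2e = (2.0 - 2.0 * u1_star + 2.0 * s
           - a1 ** 2 * G1(q1) / G0(q1) + 3.0 * a1 ** 2 * G0(q1))
    return u1e, u2e

def postulated_equilibrium(u1_star, a1, q1):
    """O(a1^2) expansions (5.7)-(5.8)."""
    u1e = (G1(q1) / G0(q1) - 2.0 * G0(q1)) * a1 ** 2 / (2.0 * (1.0 - u1_star))
    u2e = (u1_star / (1.0 - u1_star) * G1(q1) / G0(q1)
           + (1.0 - 3.0 * u1_star) / (1.0 - u1_star) * G0(q1)) * a1 ** 2
    return u1e, u2e

if __name__ == "__main__":
    exact = exact_equilibrium(1.5, 0.05, 1.0)
    approx = postulated_equilibrium(1.5, 0.05, 1.0)
    print(exact, approx)   # the pairs agree to O(a1^4)
```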
Thus \Psi^{\mathrm{ave}} is Hurwitz if and only if \alpha_1 and \alpha_2 are positive. For sufficiently small a_1, which makes \tilde{u}^e \approx (0, 0), \alpha_1 and \alpha_2 are positive for u_1^* = 1.5, but for u_1^* = 0.5, \alpha_2 is not positive, which is consistent with \Xi_1 not being Hurwitz while \Xi_2 is. Thus (u_1^{*1}, u_2^{*1}) = (0.5, 0.25) is an unstable Nash equilibrium, while (u_1^{*2}, u_2^{*2}) = (1.5, 2.25) is a stable Nash equilibrium. We employ the stochastic multi-input extremum seeking algorithm given in section 3 to attain this stable equilibrium.

The top picture in Figure 2 depicts the evolution of the game in the \tilde{u} plane, initialized at the point (u_1(0), u_2(0)) = (0, 3), i.e., at (\tilde{u}_1(0), \tilde{u}_2(0)) = (-1.5, 0.75). Note that the initial condition is outside of A. This illustrates the point that the region of attraction of the stable Nash equilibrium under the extremum seeking algorithm is not a subset of A but a large subset of \mathbb{R}^2. The parameters are chosen as k_1 = 14, k_2 = 6, a_1 = 0.2, a_2 = 0.02, \varepsilon_1 = 0.01, \varepsilon_2 = 0.8. The bottom two pictures depict the two players' actions while stochastically seeking the Nash equilibrium (u_1^*, u_2^*) = (1.5, 2.25). From Figure 2, the actions of the players converge to a small neighborhood of the stable Nash equilibrium.

In the algorithm, the bounded smooth functions f_i and the excitation processes (\eta_i(t), t \ge 0), i = 1, \ldots, N, can be chosen in other forms. We can replace the bounded excitation signal \sin(\eta_i(t)) = \sin(\chi_i(t/\varepsilon_i)) with the signal G^T \check{\eta}_i(t/\varepsilon_i), where \check{\eta}_i(t) = [\cos(W_i(t)), \sin(W_i(t))]^T is a Brownian motion on the unit circle (see [19]) and G = [g_1, g_2]^T is a constant vector. Figure 3 depicts the evolution of the game in the \tilde{u} plane with Brownian motion on the unit circle as the perturbation. The initial conditions are the same as in the
Fig. 2. Stochastic Nash equilibrium seeking with an OU process perturbation. Top: evolution of the game in the \tilde{u} plane. Bottom: two players' actions.
Fig. 3. Stochastic Nash equilibrium seeking with Brownian motion on the unit circle as perturbation. Top: evolution of the game in the \tilde{u} plane. Bottom: two players' actions.
case of the OU process perturbation. The parameters are chosen as k_1 = 5, k_2 = 9, a_1 = 0.2, a_2 = 0.04, \varepsilon_1 = 0.02, \varepsilon_2 = 0.02. From Figure 3, the actions of the players again converge to a small neighborhood of the stable Nash equilibrium.

In these two simulations, a possibly different high-pass filter for each player's measurement of the payoff is used to improve the asymptotic performance, but such filtering is not essential for achieving stability (see [36]), as can also be seen from the following stochastic multiparameter extremum seeking algorithm.

6. Multiparameter extremum seeking for static maps.

6.1. Stochastic multiparameter extremum seeking algorithm. Let f(\theta) be a function of the form

(6.1)  f(\theta) = f^* + (\theta - \theta^*)^T P (\theta - \theta^*),

where P = (p_{ij})_{l \times l} \in \mathbb{R}^{l \times l} is an unknown symmetric matrix, f^* is an unknown constant, \theta = [\theta_1, \ldots, \theta_l]^T, and \theta^* = [\theta_1^*, \ldots, \theta_l^*]^T. Any C^2(\mathbb{R}^l) function f(\theta) with an extremum at \theta = \theta^* and with \nabla^2 f \ne 0 can be locally approximated by (6.1). Without loss of generality, we assume that the matrix P is positive definite. The objective is to design an algorithm that makes |\theta - \theta^*| as small as possible, so that the output y = f(\theta) is driven to its minimum f^*. This problem is a special case of an N-player noncooperative game in which all the players' payoffs coincide and are given by a quadratic static map; the corresponding matrix \Xi is 2P. Here we do not assume that \Xi, i.e., 2P, is strictly diagonally dominant; for the static map, we can prove that strict diagonal dominance is not necessary.

Denote by \hat{\theta}_j(t) the estimate of the unknown optimal input \theta_j^*, and let

(6.2)  \tilde{\theta}_j(t) = \theta_j^* - \hat{\theta}_j(t)

denote the estimation error. We use a stochastic perturbation to develop a gradient estimate for every parameter. Let

(6.3)  \theta_j(t) = \hat{\theta}_j(t) + a_j \sin(\eta_j(t)),

where a_j > 0 is the perturbation amplitude and (\eta_j(t), t \ge 0) is an OU process as in (3.3). By (6.2) and (6.3), we have

(6.4)  \theta_j(t) - \theta_j^* = a_j \sin(\eta_j(t)) - \tilde{\theta}_j(t).

Substituting (6.4) into (6.1), we have the output

(6.5)  y(t) = f^* + (\theta(t) - \theta^*)^T P (\theta(t) - \theta^*),

where \theta(t) - \theta^* = [a_1 \sin(\eta_1(t)) - \tilde{\theta}_1(t), \ldots, a_l \sin(\eta_l(t)) - \tilde{\theta}_l(t)]^T. We design the parameter update law as follows:

(6.6)  \frac{d\hat{\theta}_j(t)}{dt} = -k_j a_j \sin(\eta_j(t)) \big( y(t) - \xi_j(t) \big),

(6.7)  \frac{d\xi_j(t)}{dt} = -h_j \xi_j(t) + h_j y(t),

(6.8)  \varepsilon_j\, d\eta_j(t) = -\eta_j(t)\, dt + \sqrt{\varepsilon_j}\, q_j\, dW_j(t),
´ SHU-JUN LIU AND MIROSLAV KRSTIC
1672
where hj , kj , j = 1, . . . , l, are scalar design parameters. Different from the extremum seeking algorithm in section 3, where we excluded the standard washout filter of the output signal [13], which is not essential for convergence but helps performance, in this s for each parameter, and the gradient estimation section we use a washout filter s+h j s [y] = y(t) − ξj (t) of this filter. for each parameter is based on the output s+h j 1 Define χj (t) = ηj (εj t) and Bj (t) = √εj Wj (εj t). Then we have dχj (t) = −χj (t)dt + qj dBj (t),
(6.9)
where Bj (t) is a 1-dimensional standard Brownian motion defined on the complete probability space (Ω, F , P ), while [B1 (t), . . . , Bl (t)]T is an l-dimensional independent standard Brownian motion on the same space. Define the output error variable ej (t) = ξj (t) − f ∗ , j = 1, . . . , l. Therefore, it follows from (6.2), (6.5), (6.6), and (6.7) that we have the error dynamics
where B_j(t) is a 1-dimensional standard Brownian motion defined on the complete probability space (\Omega, \mathcal{F}, P), while [B_1(t), \ldots, B_l(t)]^T is an l-dimensional standard Brownian motion on the same space. Define the output error variable e_j(t) = \xi_j(t) - f^*, j = 1, \ldots, l. Then it follows from (6.2), (6.5), (6.6), and (6.7) that we have the error dynamics

(6.10)  \frac{d\tilde{\theta}_j(t)}{dt} = -\frac{d\hat{\theta}_j(t)}{dt} = k_j a_j \sin(\eta_j(t)) \big( (\theta(t) - \theta^*)^T P (\theta(t) - \theta^*) - e_j(t) \big) = k_j a_j \sin(\chi_j(t/\varepsilon_j)) \big( (\theta(t) - \theta^*)^T P (\theta(t) - \theta^*) - e_j(t) \big),

(6.11)  \frac{de_j(t)}{dt} = h_j \big( y(t) - f^* - e_j(t) \big) = h_j \big( (\theta(t) - \theta^*)^T P (\theta(t) - \theta^*) - e_j(t) \big), \quad j = 1, \ldots, l.

Denote \tilde{\theta}(t) = [\tilde{\theta}_1(t), \ldots, \tilde{\theta}_l(t)]^T and e(t) = [e_1(t), \ldots, e_l(t)]^T. Then we have the following result.

Theorem 6.1. Consider the static map (6.1) under the parameter update law (6.6)–(6.8). Then the error system (6.10)–(6.11) is weakly stochastically exponentially stable; i.e., there exist constants r > 0, c > 0, and \gamma > 0 such that for any initial condition |\Lambda_1^{\varepsilon_1}(0)| < r and any \delta > 0,

(6.12)  \lim_{\varepsilon_1 \to 0} \inf\Big\{ t \ge 0 : |\Lambda_1^{\varepsilon_1}(t)| > c|\Lambda_1^{\varepsilon_1}(0)| e^{-\gamma t} + \delta \Big\} = +\infty \quad \text{a.s.}

Moreover, there exists a function T(\varepsilon_1) : (0, \varepsilon_0) \to \mathbb{N} such that

(6.13)  \lim_{\varepsilon_1 \to 0} P\Big\{ \sup_{0 \le t \le T(\varepsilon_1)} \big[ |\Lambda_1^{\varepsilon_1}(t)| - c|\Lambda_1^{\varepsilon_1}(0)| e^{-\gamma t} \big] > \delta \Big\} = 0, \quad \text{with } \lim_{\varepsilon_1 \to 0} T(\varepsilon_1) = \infty,

where \Lambda_1^{\varepsilon_1}(t) = \big( \tilde{\theta}(t)^T, e(t)^T \big) - \big( 0_{1 \times l}^T, \sum_{i=1}^l p_{ii} a_i^2 G_0(q_i) I_1^T \big) and I_1 = [1, 1, \ldots, 1]^T \in \mathbb{R}^l. Furthermore, (6.13) is equivalent to

(6.14)  \lim_{\varepsilon_1 \to 0} P\Big\{ |\Lambda_1^{\varepsilon_1}(t)| \le c|\Lambda_1^{\varepsilon_1}(0)| e^{-\gamma t} + \delta \ \forall t \in [0, T(\varepsilon_1)] \Big\} = 1, \quad \text{with } \lim_{\varepsilon_1 \to 0} T(\varepsilon_1) = \infty.
Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.
STOCHASTIC NASH EQUILIBRIUM SEEKING FOR GAMES
1673
6.2. Convergence analysis. We rewrite the error dynamics (6.10)–(6.11) as

(6.15) dθ̃j(t)/dt = kj aj sin(χj(t/εj)) ( [a1 sin(χ1(t/ε1)) − θ̃1(t), . . . , al sin(χl(t/εl)) − θ̃l(t)] P [a1 sin(χ1(t/ε1)) − θ̃1(t), . . . , al sin(χl(t/εl)) − θ̃l(t)]^T − ej(t) )
       = kj aj sin(χj(t/εj)) ( Σ_{i,k=1}^l pik (ai sin(χi(t/εi)) − θ̃i(t)) (ak sin(χk(t/εk)) − θ̃k(t)) − ej(t) ),

(6.16) dej(t)/dt = hj ( Σ_{i,k=1}^l pik (ai sin(χi(t/εi)) − θ̃i(t)) (ak sin(χk(t/εk)) − θ̃k(t)) − ej(t) ), j = 1, . . . , l.
Now we calculate the average system of the error system. Assume that

(6.17) εi = ε1/ci, i = 2, . . . , l,

for some positive real constants ci. Denote

(6.18) Z1(t) = χ1(t), Z2(t) = χ2(c2 t), . . . , Zl(t) = χl(cl t).
Then the error dynamics become

(6.19) dθ̃j(t)/dt = kj aj sin(Zj(t/ε1)) ( Σ_{i,k=1}^l pik (ai sin(Zi(t/ε1)) − θ̃i(t)) (ak sin(Zk(t/ε1)) − θ̃k(t)) − ej(t) ),

(6.20) dej(t)/dt = hj (y(t) − f* − ej(t)) = hj ( Σ_{i,k=1}^l pik (ai sin(Zi(t/ε1)) − θ̃i(t)) (ak sin(Zk(t/ε1)) − θ̃k(t)) − ej(t) ), j = 1, . . . , l.
It is known that for each j = 1, . . . , l, the stochastic process (χj(t), t ≥ 0) is ergodic with invariant distribution

μj(dxj) = (1/(√π qj)) e^{−xj²/qj²} dxj.

Thus by Lemma A.2, the vector-valued process [Z1(t), Z2(t), . . . , Zl(t)]^T is also ergodic with invariant distribution μ1(dx1) × · · · × μl(dxl).
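As a numerical sanity check (ours, not part of the paper), one can simulate (6.9) with an exact one-step Ornstein–Uhlenbeck discretization and confirm that the empirical law of χj matches the invariant density above; in particular, for χ ∼ N(0, q²/2) one has E[sin²(χ)] = (1 − e^{−q²})/2, the constant G0(q) that appears in the average system below.

```python
import numpy as np

def ou_path(q, dt=1e-3, n_steps=2_000_000, seed=0):
    """Simulate d(chi) = -chi dt + q dB (eq. (6.9)) using the exact
    one-step update chi_{k+1} = e^{-dt} chi_k + sigma_dt * N(0, 1)."""
    rng = np.random.default_rng(seed)
    decay = np.exp(-dt)
    # stationary variance of the OU process is q^2/2, so the one-step noise std is
    sigma_dt = q * np.sqrt((1.0 - decay**2) / 2.0)
    w = rng.standard_normal(n_steps)
    chi = np.empty(n_steps)
    x = 0.0
    for k in range(n_steps):
        x = decay * x + sigma_dt * w[k]
        chi[k] = x
    return chi

q = 1.0
chi = ou_path(q)
var_est = np.var(chi[100_000:])              # should be near q^2/2 = 0.5
m_est = np.mean(np.sin(chi[100_000:]) ** 2)  # should be near (1 - e^{-q^2})/2
print(var_est, m_est, (1 - np.exp(-q**2)) / 2)
```

The burn-in of 100,000 steps discards the transient from the deterministic initial condition χ(0) = 0 before estimating the stationary moments.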
To calculate the average system of system (6.19)–(6.20), we need to consider the terms

(6.21) sin(Zj(t/ε1)) sin(Zi(t/ε1)) sin(Zk(t/ε1)), i ≠ j, j ≠ k, k ≠ i,
(6.22) sin³(Zj(t/ε1)),
(6.23) sin(Zj(t/ε1)) sin²(Zi(t/ε1)), i ≠ j,
(6.24) sin²(Zj(t/ε1)),
(6.25) sin(Zj(t/ε1)) sin(Zi(t/ε1)), i ≠ j.

By the integrals (4.17), (4.10), (4.14), (4.11), and (4.13), we get the following average error system:

(6.26) dθ̃j^ave(t)/dt = −aj² kj (1 − e^{−qj²}) Σ_{i=1}^l pji θ̃i^ave(t),

(6.27) dej^ave(t)/dt = hj ( (1/2) Σ_{i=1}^l pii ai² (1 − e^{−qi²}) + Σ_{i,k=1}^l θ̃i^ave θ̃k^ave − ej^ave(t) ), j = 1, . . . , l.
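The averages used above can also be checked numerically (a sanity check of ours, independent of the integrals (4.10)–(4.17)): draw Zi ∼ N(0, qi²/2) independently and compare Monte Carlo moments with the closed forms, i.e., the odd and cross moments (6.21)–(6.23), (6.25) vanish, while (6.24) averages to (1 − e^{−q²})/2.

```python
import numpy as np

rng = np.random.default_rng(0)
q = 1.0
n = 2_000_000
Zi = rng.normal(0.0, q / np.sqrt(2), n)  # invariant law N(0, q^2/2)
Zj = rng.normal(0.0, q / np.sqrt(2), n)  # independent copy

m_sin3 = np.mean(np.sin(Zj) ** 3)                # (6.22): odd moment -> 0
m_mixed = np.mean(np.sin(Zj) * np.sin(Zi) ** 2)  # (6.23): -> 0 by independence
m_cross = np.mean(np.sin(Zj) * np.sin(Zi))       # (6.25): -> 0
m_sin2 = np.mean(np.sin(Zj) ** 2)                # (6.24): -> (1 - e^{-q^2})/2
exact = (1 - np.exp(-q ** 2)) / 2
print(m_sin3, m_mixed, m_cross, m_sin2, exact)
```

Only the sin² moment survives the averaging, which is why only the θ̃-linear term remains in (6.26).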
In matrix form, the average error system is

(6.28) dθ̃^ave(t)/dt = −ΠP θ̃^ave(t),

(6.29) de^ave(t)/dt = H ( Σ_{i=1}^l pii ai² G0(qi) I1 − e^ave(t) + Q(θ̃^ave(t)) ),

where G0(q) = (1 − e^{−q²})/2,

θ̃^ave(t) = [θ̃1^ave(t), . . . , θ̃l^ave(t)]^T,  e^ave(t) = [e1^ave(t), . . . , el^ave(t)]^T,

Π = diag( a1² k1 (1 − e^{−q1²}), a2² k2 (1 − e^{−q2²}), . . . , al² kl (1 − e^{−ql²}) ),

H = diag(h1, h2, . . . , hl),

Q(θ̃^ave(t)) = θ̃^ave(t)^T 1_{l×l} θ̃^ave(t) I1, with 1_{l×l} the l × l matrix of all ones, and I1 = [1, 1, . . . , 1]^T_{1×l}.
The average error system has the equilibrium (θ̃e^T, ee^T) = (0^T_{l×1}, Σ_{i=1}^l pii ai² G0(qi) I1^T). The corresponding Jacobian at this equilibrium is

(6.30) Ξ1 = [ −ΠP  0 ; 0  −H ].
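As a quick numerical check (ours; the washout-filter gains h1 = h2 = 1 are an assumption, since the section does not report them), one can confirm that Ξ1 in (6.30) is Hurwitz for the parameter values used in the Fig. 4 example:

```python
import numpy as np

# Parameters from the Fig. 4 example; h1 = h2 = 1.0 is our assumption.
P = np.array([[1.0, 1.0], [1.0, 2.0]])
a = np.array([0.8, 0.6])
k = np.array([1.25, 5.0 / 3.0])
q = np.array([1.0, 1.0])
h = np.array([1.0, 1.0])

Pi = np.diag(a**2 * k * (1 - np.exp(-q**2)))  # the diagonal matrix Π
H = np.diag(h)

# Jacobian (6.30) of the average error system at its equilibrium
Xi1 = np.block([[-Pi @ P, np.zeros((2, 2))],
                [np.zeros((2, 2)), -H]])
eigs = np.linalg.eigvals(Xi1)
print(eigs.real)  # all entries negative, so Xi1 is Hurwitz
```

Since Π is a positive diagonal matrix and P is positive definite, ΠP is similar to the positive definite matrix Π^{1/2} P Π^{1/2}, so the negativity of the spectrum seen numerically is exactly what the proof below asserts.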
Since Π and P are positive definite, all eigenvalues of the matrix ΠP are positive; i.e., all eigenvalues of −ΠP are negative. Furthermore, since hi > 0, i = 1, . . . , l, the matrix Ξ1 is Hurwitz, and hence the equilibrium is locally exponentially stable. Thus by Theorem A.3 in the appendix, the convergence results (6.12) and (6.14) hold. The proof is complete.

To quantify the output convergence to the extremum, for any ε1 > 0, define the stopping time

τ^δ_{ε1} = inf{t ≥ 0 : |Λ1^{ε1}(t)| > c|Λ1^{ε1}(0)| e^{−γt} + δ}.

Then by (6.12), we know that lim_{ε1→0} τ^δ_{ε1} = ∞ a.s. and

(6.31) |θ̃(t)| ≤ c|Λ1^{ε1}(0)| e^{−γt} + δ  ∀t ≤ τ^δ_{ε1}.
Denote θ̂(t) = [θ̂1(t), . . . , θ̂l(t)]^T and a sin(η(t)) = [a1 sin(η1(t)), . . . , al sin(ηl(t))]^T. Then y(t) = f(θ* + θ̃(t) + a sin(η(t))), ∇f(θ*) = 0, and

y(t) − f(θ*) = (θ̃(t) + a sin(η(t)))^T Hf(θ*) (θ̃(t) + a sin(η(t))) + O(|θ̃(t) + a sin(η(t))|³),

where Hf is the Hessian matrix of the function f. Thus by (6.31), it holds that

(6.32) |y(t) − f(θ*)| ≤ O(|a|²) + O(δ²) + C |Λ1^{ε1}(0)|² e^{−2γt}  ∀t ≤ τ^δ_{ε1}

for some positive constant C, where |a| = √(a1² + a2² + · · · + al²). Similarly, by (6.14), we have

(6.33) lim_{ε1→0} P{ |y(t) − f(θ*)| ≤ O(|a|²) + O(δ²) + C |Λ1^{ε1}(0)|² e^{−2γt}  ∀t ∈ [0, T(ε1)] } = 1,

where T(ε1) is a deterministic function with lim_{ε1→0} T(ε1) = ∞.

Figure 4 displays the simulation results with f* = 1, (θ1*, θ2*) = (0, 1), and P = [1 1; 1 2] in the static map (6.1); a1 = 0.8, a2 = 0.6, k1 = 1.25, k2 = 5/3, q1 = q2 = 1, ε1 = 0.25, ε2 = 0.01 in the parameter update law (6.6)–(6.8); and initial conditions θ̃1(0) = 1, θ̃2(0) = −1, θ̂1(0) = −1, θ̂2(0) = 2.

7. Conclusion. In this paper, we propose a multi-input stochastic extremum seeking algorithm to solve the problem of seeking Nash equilibria for an N-player noncooperative game. In our algorithm, each player independently implements his seeking strategy using only the value of his own payoff, without any information about the form of his payoff function or the other players' actions. Our convergence result is local, and the convergence error is proportional to the third derivatives of the payoff functions and depends on the intensity of the stochastic perturbations. The advantage of our stochastic algorithm over deterministic ones is that different players are not required to use perturbation signals with distinct frequencies. As a special case of a multiplayer noncooperative game, stochastic multiparameter extremum seeking for quadratic static maps is also investigated.
Fig. 4. Stochastic extremum seeking with an OU process perturbation. Top: the ES output y = f(θ) and the extremum value f* versus time (sec). Bottom: the solutions θ̃1(t) and θ̃2(t) of the error system versus time (sec).
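A minimal simulation sketch of the setup behind Fig. 4 (our reconstruction: the washout gains h1 = h2 = 1, the forward-Euler step, and the gradient-descent sign of the update law are assumptions, since the section does not report them). It integrates the quadratic map, the washout filters ξj, and the parameter update, with OU probing signals ηj(t) = χj(t/εj) generated by an exact one-step discretization.

```python
import numpy as np

# Fig. 4 parameters; h = (1, 1) and dt are our assumptions.
f_star = 1.0
theta_star = np.array([0.0, 1.0])
P = np.array([[1.0, 1.0], [1.0, 2.0]])
a = np.array([0.8, 0.6])
k = np.array([1.25, 5.0 / 3.0])
q = np.array([1.0, 1.0])
eps = np.array([0.25, 0.01])
h = np.array([1.0, 1.0])

def f(theta):  # quadratic static map (6.1): y = f* + (theta - theta*)^T P (theta - theta*)
    d = theta - theta_star
    return f_star + d @ P @ d

rng = np.random.default_rng(1)
dt, T = 0.002, 300.0
n = int(T / dt)
theta_hat = np.array([-1.0, 2.0])  # so that theta_tilde(0) = (1, -1)
xi = np.zeros(2)                   # washout-filter states
eta = np.zeros(2)                  # OU probing signals
# exact one-step OU coefficients for d(eta) = -(1/eps) eta dt + (q/sqrt(eps)) dW
decay = np.exp(-dt / eps)
noise = q * np.sqrt((1.0 - decay**2) / 2.0)

hist = np.empty((n, 2))
for i in range(n):
    eta = decay * eta + noise * rng.standard_normal(2)
    s = a * np.sin(eta)                        # probing perturbation
    y = f(theta_hat + s)                       # measured payoff/output
    theta_hat = theta_hat + dt * (-k * s * (y - xi))  # gradient-descent update
    xi = xi + dt * h * (y - xi)                # washout filter
    hist[i] = theta_hat

final = hist[int(0.8 * n):].mean(axis=0)
print(final)  # close to theta* = (0, 1), up to the residual probing bias
```

Averaging the tail of the trajectory suppresses the bounded probing oscillation a_j sin(η_j) and exposes the convergence of θ̂ to a neighborhood of θ*, as the theory above predicts.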
Appendix. Multi-input stochastic averaging. Consider the system

(A.1) dX(t)/dt = a(X(t), Y1(t/ε1), Y2(t/ε2), . . . , Yl(t/εl)), X(0) = x,

where X(t) ∈ R^n, and Yi(t) ∈ R^{mi}, 1 ≤ i ≤ l, are time-homogeneous continuous Markov processes defined on a complete probability space (Ω, F, P), where Ω is the sample space, F is the σ-field, and P is the probability measure. The initial condition X(0) = x is deterministic. The εi, i = 1, 2, . . . , l, are small parameters in (0, ε0) with fixed ε0 > 0. Let S_{Yi} ⊂ R^{mi} be the living space of the perturbation process (Yi(t), t ≥ 0), and note that S_{Yi} may be a proper (e.g., compact) subset of R^{mi}. Assume that

εi = ε1/ci

for some positive real constants ci. Denote Z1(t) = Y1(t), Z2(t) = Y2(c2 t), . . . , Zl(t) = Yl(cl t). Then (A.1) becomes

(A.2) dX(t)/dt = a(X(t), Z1(t/ε1), Z2(t/ε1), . . . , Zl(t/ε1)), X(0) = x.
Regarding the ergodicity of the processes (Yi(t), t ≥ 0) and (Zi(t), t ≥ 0), we have the following lemma.
Lemma A.1. For i = 1, . . . , l, if the process (Yi(t), t ≥ 0) is ergodic with invariant distribution μi(dxi) (i.e., for any x in the living space of (Yi(t), t ≥ 0), we have ‖Pi(x, t, ·) − μi‖_var → 0 as t → ∞, where Pi(x, t, ·) is the distribution of Yi(t) when Yi(0) = x, and ‖ · ‖_var is the total variation norm), then the process (Zi(t), t ≥ 0) is ergodic with the same invariant distribution μi(dxi).

Proof. Since Z1 ≡ Y1, we need only prove the claim for i = 2, . . . , l. For any i = 2, . . . , l, denote by Qi(zi, t, ·) the distribution of Zi(t) when Zi(0) = Yi(0) = zi. Then, by the definition of Zi(t), we have Qi(zi, t, ·) = Pi(zi, ci t, ·), and thus ‖Qi(zi, t, ·) − μi‖_var = ‖Pi(zi, ci t, ·) − μi‖_var → 0 as t → ∞. The proof is complete.

Denote Z(t) = [Z1(t)^T, Z2(t)^T, . . . , Zl(t)^T]^T. Then for the vector-valued process, we have the following result.

Lemma A.2. If each process (Yi(t), t ≥ 0) is ergodic with invariant distribution μi(dxi), and the processes (Y1(t), t ≥ 0), . . . , (Yl(t), t ≥ 0) are independent, then the process (Z(t), t ≥ 0) is ergodic with invariant distribution μ1(dx1) × · · · × μl(dxl).

Proof. By the independence of {Y1, . . . , Yl}, we can assume that the process (Z(t), t ≥ 0) lives in the product space S_{Y1} × · · · × S_{Yl}. Denote the distribution of Zi(t) when Zi(0) = zi, i = 1, . . . , l, by Qi(zi, t, ·) and the distribution of Z(t) when Z(0) = z = (z1, . . . , zl) by Q(z, t, ·). Then by independence, Q(z, t, ·) = Q1(z1, t, ·) × · · · × Ql(zl, t, ·), and thus by Lemma A.1 we get ‖Q(z, t, ·) − μ1 × · · · × μl‖_var
= ‖Q1(z1, t, ·) × · · · × Ql(zl, t, ·) − μ1 × μ2 × · · · × μl‖_var
≤ ‖Q1(z1, t, ·) × · · · × Ql(zl, t, ·) − μ1 × Q2(z2, t, ·) × · · · × Ql(zl, t, ·)‖_var
+ ‖μ1 × Q2(z2, t, ·) × · · · × Ql(zl, t, ·) − μ1 × μ2 × Q3(z3, t, ·) × · · · × Ql(zl, t, ·)‖_var
+ · · · + ‖μ1 × · · · × μ_{l−1} × Ql(zl, t, ·) − μ1 × · · · × μ_{l−1} × μl‖_var
≤ ‖Q1(z1, t, ·) − μ1‖_var + · · · + ‖Ql(zl, t, ·) − μl‖_var → 0 as t → ∞.

The proof is complete.

Thus we obtain the average system of system (A.2) as

(A.3) dX̄(t)/dt = ā(X̄(t)), X̄(0) = x,

where

(A.4) ā(x) = ∫_{S_{Y1} × · · · × S_{Yl}} a(x, z1, . . . , zl) μ1(dz1) × · · · × μl(dzl).
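To make (A.4) concrete, here is a toy numerical illustration (ours; the vector field below is hypothetical): for a scalar field a(x, z1, z2) = −x + sin²(z1) cos(z2) and OU perturbations with invariant laws μi = N(0, qi²/2), the Monte Carlo average over μ1 × μ2 matches the closed form ā(x) = −x + ((1 − e^{−q1²})/2) e^{−q2²/4}, using E[sin² Z] = (1 − e^{−q²})/2 and E[cos Z] = e^{−q²/4} for Z ∼ N(0, q²/2).

```python
import numpy as np

# Hypothetical vector field a(x, z1, z2) = -x + sin(z1)^2 * cos(z2),
# averaged over mu1 x mu2 with mu_i = N(0, q_i^2 / 2) as in (A.4).
rng = np.random.default_rng(0)
q1, q2 = 1.0, 0.7
n = 1_000_000
z1 = rng.normal(0.0, q1 / np.sqrt(2), n)
z2 = rng.normal(0.0, q2 / np.sqrt(2), n)

def a(x, z1, z2):
    return -x + np.sin(z1) ** 2 * np.cos(z2)

x = 0.3
mc = a(x, z1, z2).mean()                                      # Monte Carlo average
exact = -x + (1 - np.exp(-q1**2)) / 2 * np.exp(-q2**2 / 4)    # closed form
print(mc, exact)
```

Independence of the perturbation channels (Lemma A.2) is what allows the product sampling of z1 and z2 here.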
To obtain the multi-input stochastic averaging theorem, we consider the following assumptions.

Assumption A.1. The vector field a(x, y1, y2, . . . , yl) is a continuous function of (x, y1, y2, . . . , yl), and for any x ∈ R^n, it is a bounded function of y = [y1^T, y2^T, . . . , yl^T]^T. Further, it satisfies the locally Lipschitz condition in x ∈ R^n uniformly in y ∈ S_{Y1} × S_{Y2} × · · · × S_{Yl}; i.e., for any compact subset D ⊂ R^n, there is a constant kD such that for all x1, x2 ∈ D and all y ∈ S_{Y1} × S_{Y2} × · · · × S_{Yl}, |a(x1, y) − a(x2, y)| ≤ kD |x1 − x2|.

Assumption A.2. The perturbation processes (Yi(t), t ≥ 0), i = 1, . . . , l, are ergodic, with respective invariant distributions μi, and independent.

By the same method as in our work [19], [20] on the single-input stochastic averaging theorem, we obtain the following multi-input averaging theorem.
Theorem A.3. Consider system (A.1) under Assumptions A.1 and A.2. If the equilibrium X̄(t) ≡ 0 of the average system (A.3) is locally exponentially stable, then the following statements hold:

(i) The solution of system (A.1) is weakly stochastically exponentially stable under random perturbation; i.e., there exist constants r > 0, c > 0, and γ > 0 such that for any initial condition x ∈ {x̌ ∈ R^n : |x̌| < r} and any δ > 0, the solution of system (A.1) satisfies

(A.5) lim_{ε1→0} inf{t ≥ 0 : |X(t)| > c|x| e^{−γt} + δ} = +∞ a.s.

(ii) Moreover, there exists a function T(ε1) : (0, ε0) → N such that

(A.6) lim_{ε1→0} P{ sup_{0≤t≤T(ε1)} ( |X(t)| − c|x| e^{−γt} ) > δ } = 0 with lim_{ε1→0} T(ε1) = ∞.

Furthermore, (A.6) is equivalent to

(A.7) lim_{ε1→0} P{ |X(t)| ≤ c|x| e^{−γt} + δ  ∀t ∈ [0, T(ε1)] } = 1 with lim_{ε1→0} T(ε1) = ∞.
REFERENCES

[1] E. Altman, T. Başar, and R. Srikant, Nash equilibria for combined flow control and routing in networks: Asymptotic behavior for a large number of users, IEEE Trans. Automat. Control, 47 (2002), pp. 917–930.
[2] K. B. Ariyur and M. Krstić, Real-Time Optimization by Extremum Seeking Control, Wiley-Interscience, Hoboken, NJ, 2003.
[3] T. Başar, Control and game-theoretic tools for communication networks, Appl. Comput. Math., 6 (2007), pp. 104–125.
[4] T. Başar and G. J. Olsder, Dynamic Noncooperative Game Theory, 2nd ed., Classics Appl. Math. 23, SIAM, Philadelphia, 1999.
[5] D. Bauso, L. Giarre, and R. Pesenti, Consensus in noncooperative dynamic games: A multiretailer inventory application, IEEE Trans. Automat. Control, 53 (2008), pp. 998–1003.
[6] G. Blankenship and G. C. Papanicolaou, Stability and control of stochastic systems with wide-band noise disturbances. I, SIAM J. Appl. Math., 34 (1978), pp. 437–476.
[7] J.-Y. Choi, M. Krstić, K. B. Ariyur, and J. S. Lee, Extremum seeking control for discrete time systems, IEEE Trans. Automat. Control, 47 (2002), pp. 318–323.
[8] M. I. Freidlin and A. D. Wentzell, Random Perturbations of Dynamical Systems, Grundlehren Math. Wiss. 260, Springer-Verlag, New York, 1984.
[9] P. Frihauf, M. Krstić, and T. Başar, Nash equilibrium seeking for games with non-quadratic payoffs, in Proceedings of the 2010 IEEE Conference on Decision and Control, 2010, pp. 881–886.
[10] R. Z. Khasminskii and G. Yin, On averaging principles: An asymptotic expansion approach, SIAM J. Math. Anal., 35 (2004), pp. 1534–1560.
[11] R. King, R. Becker, G. Feuerbach, L. Henning, R. Petz, W. Nitsche, O. Lemke, and W. Neise, Adaptive flow control using slope seeking, in Proceedings of the 14th IEEE Mediterranean Conference on Control and Automation, 2006, pp. 1–6.
[12] M. Krstić, P. Frihauf, J. Krieger, and T. Başar, Nash equilibrium seeking with finitely- and infinitely-many players, in Proceedings of the 8th IFAC Symposium on Nonlinear Control Systems, 2010.
[13] M. Krstić and H. H. Wang, Stability of extremum seeking feedback for general nonlinear dynamic systems, Automatica J. IFAC, 36 (2000), pp. 595–601.
[14] B. J. Kubica and A. Wozniak, An interval method for seeking the Nash equilibrium of noncooperative games, Lecture Notes in Comput. Sci. 6068, Springer-Verlag, Berlin, 2010, pp. 446–455.
[15] H. J. Kushner and K. M. Ramachandran, Nearly optimal singular controls for wideband noise driven systems, SIAM J. Control Optim., 26 (1988), pp. 569–591.
[16] S. Li and T. Başar, Distributed algorithms for the computation of noncooperative equilibria, Automatica J. IFAC, 23 (1987), pp. 523–533.
[17] Y. Li, M. A. Rotea, G. T.-C. Chiu, L. G. Mongeau, and I.-S. Paek, Extremum seeking control of a tunable thermoacoustic cooler, IEEE Trans. Control Syst. Tech., 36 (2005), pp. 527–536.
[18] S.-J. Liu and M. Krstić, Continuous-time stochastic averaging on the infinite interval for locally Lipschitz systems, SIAM J. Control Optim., 48 (2010), pp. 3589–3622.
[19] S.-J. Liu and M. Krstić, Stochastic averaging in continuous time and its applications to extremum seeking, IEEE Trans. Automat. Control, 55 (2010), pp. 2235–2250.
[20] S.-J. Liu and M. Krstić, Stochastic source seeking for nonholonomic unicycle, Automatica J. IFAC, 46 (2010), pp. 1443–1453.
[21] L. Luo and E. Schuster, Mixing enhancement in 2D magnetohydrodynamic channel flow by extremum seeking boundary control, in Proceedings of the 2009 American Control Conference, 2009, pp. 1530–1535.
[22] A. B. MacKenzie and S. B. Wicker, Game theory and the design of self-configuring, adaptive wireless networks, IEEE Commun. Mag., 39 (2001), pp. 126–131.
[23] J. R. Marden, G. Arslan, and J. S. Shamma, Cooperative control and potential games, IEEE Trans. Syst. Man Cybernet. Part B: Cybernetics, 39 (2009), pp. 1393–1407.
[24] W. H. Moase, C. Manzie, and M. J. Brear, Newton-like extremum-seeking Part I: Theory, in Proceedings of the 48th IEEE Conference on Decision and Control and 28th Chinese Control Conference, 2009, pp. 3840–3844.
[25] W. H. Moase, C. Manzie, and M. J. Brear, Newton-like extremum-seeking Part II: Simulation and experiments, in Proceedings of the 48th IEEE Conference on Decision and Control and 28th Chinese Control Conference, 2009, pp. 3845–3850.
[26] E. Pardoux and A. Yu. Veretennikov, On the Poisson equation and diffusion approximation. I, Ann. Probab., 29 (2001), pp. 1061–1085.
[27] S. S. Rao, V. B. Venkayya, and N. S. Khot, Game theory approach for the integrated design of structures and controls, AIAA J., 26 (1988), pp. 463–469.
[28] J. B. Rosen, Existence and uniqueness of equilibrium points for concave N-person games, Econometrica, 33 (1965), pp. 520–534.
[29] G. Scutari, D. P. Palomar, and S. Barbarossa, The MIMO iterative waterfilling algorithm, IEEE Trans. Signal Process., 57 (2009), pp. 1917–1935.
[30] E. Semsar-Kazerooni and K. Khorasani, Multi-agent team cooperation: A game theory approach, Automatica J. IFAC, 45 (2009), pp. 2205–2213.
[31] J. S. Shamma and G. Arslan, Dynamic fictitious play, dynamic gradient play, and distributed convergence to Nash equilibria, IEEE Trans. Automat. Control, 53 (2005), pp. 312–327.
[32] A. V. Skorokhod, Asymptotic Methods in the Theory of Stochastic Differential Equations, Transl. Math. Monogr. 78, AMS, Providence, RI, 1989.
[33] M. S. Stanković, K. H. Johansson, and D. M. Stipanović, Distributed seeking of Nash equilibrium in mobile sensor networks, in Proceedings of the 2010 IEEE Conference on Decision and Control, 2010, pp. 5598–5603.
[34] M. S. Stanković and D. M. Stipanović, Stochastic extremum seeking with applications to mobile sensor networks, in Proceedings of the 2009 American Control Conference, 2009, pp. 5622–5627.
[35] M. S. Stanković and D. M. Stipanović, Discrete time extremum seeking by autonomous vehicles in a stochastic environment, in Proceedings of the 48th IEEE Conference on Decision and Control and 28th Chinese Control Conference, 2009, pp. 4541–4546.
[36] Y. Tan, D. Nešić, and I. Mareels, On non-local stability properties of extremum seeking control, Automatica J. IFAC, 42 (2006), pp. 889–903.
[37] W. Wehner and E. Schuster, Stabilization of neoclassical tearing modes in tokamak fusion plasmas via extremum seeking, in Proceedings of the Third IEEE Multi-conference on Systems and Control (MSC 2009), 2009, pp. 853–860.
[38] M. Zhu and S. Martinez, Distributed coverage games for mobile visual sensor networks, SIAM J. Control Optim., submitted.