
arXiv:1302.3416v1 [math.OC] 14 Feb 2013

Centralized Versus Decentralized Team Games of Distributed Stochastic Differential Decision Systems with Noiseless Information Structures-Part II: Applications

Charalambos D. Charalambous and Nasir U. Ahmed

Abstract In this second part of our two-part paper, we invoke the stochastic maximum principle, conditional Hamiltonian and the coupled backward-forward stochastic differential equations of the first part [1] to derive team optimal decentralized strategies for distributed stochastic differential systems with noiseless information structures. We present examples of such team games of nonlinear as well as linear quadratic forms. In some cases we obtain closed form expressions of the optimal decentralized strategies. Through the examples, we illustrate the effect of information signaling among the decision makers in reducing the computational complexity of optimal decentralized decision strategies.

Index Terms. Team Games Optimality, Stochastic Differential Systems, Decentralized, Stochastic Maximum Principle, Applications-Examples.

C.D. Charalambous is with the Department of Electrical and Computer Engineering, University of Cyprus, Nicosia 1678 (E-mail: [email protected]). N.U. Ahmed is with the School of Engineering and Computer Science, and Department of Mathematics, University of Ottawa, Ontario, Canada, K1N 6N5 (E-mail: [email protected]).


I. INTRODUCTION

In the first part [1] of this two-part paper, we derived team and person-by-person optimality conditions for distributed stochastic differential systems with noiseless decentralized information structures. Specifically, we considered distributed (coupled) stochastic differential equations of Itô form driven by Brownian motions, and decision makers acting on decentralized noiseless i) nonanticipative and ii) feedback information structures, and we showed existence of team and person-by-person optimal strategies utilizing relaxed and regular strategies. Then we applied tools from the classical theory of stochastic optimization, with some variations, to derive team and person-by-person optimality conditions [2]–[5].

The first important conclusion drawn from [1] is that the classical theory of stochastic optimization is not limited, in mathematical concepts and procedures, by the centralized assumption upon which it is developed. It is directly applicable to differential systems consisting of multiple decision makers, in which the acquisition of information and its processing is decentralized or shared among several locations, while the decision makers' actions are based on different information structures. The second important conclusion drawn from [1] is that team and person-by-person optimality conditions are given by a Hamiltonian system of equations consisting of a conditional Hamiltonian and coupled forward-backward stochastic differential equations. The work in [1] complements the current body of knowledge on static team game theory [6]–[10] and decentralized decision making [9]–[20], and more recent work in [21]–[26], by introducing optimality conditions for general stochastic nonlinear differential systems.

The main remaining challenge is to determine whether, under the formulation and assumptions introduced in [1], we can derive optimal decentralized strategies for nonlinear and linear distributed stochastic differential systems, understand the computational complexity of these strategies compared to centralized strategies, and determine how this complexity can be reduced by allowing limited signaling among the different decision makers. Therefore, in this second part of the two-part investigation, we apply the optimality conditions derived in the first part to a variety of linear and nonlinear distributed stochastic differential systems with decentralized noiseless information structures to derive optimal strategies. Our investigation leads to the following conclusions.

1) When the dynamics are linear in the decision variables and nonlinear in the state variables,


and the pay-off is quadratic in the decision variable and nonlinear in the state variable, the optimal decentralized strategies are given in terms of conditional expectations with respect to the information structure on which they act;

2) When the dynamics are linear in the state and the decision variables, and the pay-off is quadratic in the state and the decision variables, then the optimal decentralized strategies are computed in closed form, much as in the classical Linear-Quadratic Theory. However, when the pay-off includes coupling between the decision makers, the optimal strategy of any player is also a function of the average value of the optimal strategies of the other players.

3) The computation of the optimal strategies involves the solution of certain equations, which can be formulated and solved via fixed point methods.

4) The computational complexity of the optimal decentralized strategies can be reduced by signaling specific information among the decision makers and/or by considering certain structures for the distributed system and pay-off.

The rest of the paper is organized as follows. In Section II, we introduce the distributed stochastic system with decentralized information structures and the main assumptions, and we state the optimality conditions derived in [1]. In Section III, we apply the optimality conditions to several forms of team games, and we show how the optimal decentralized strategies are computed. For the case of linear differential dynamics and quadratic pay-off we obtain explicit expressions of the optimal decentralized team strategies. The paper is concluded with some comments on possible extensions of our results.

II. TEAM AND PERSON-BY-PERSON OPTIMALITY CONDITIONS

In this section we introduce the mathematical formulation of distributed stochastic systems with decentralized noiseless information structures, and the optimality conditions derived in [1]. The formulation in [1] presupposes a fixed probability space with filtration, $(\Omega, \mathbb{F}, \{\mathcal{F}_{0,t} : t \in [0,T]\}, \mathbb{P})$, satisfying the usual conditions, that is, $(\Omega, \mathbb{F}, \mathbb{P})$ is complete and $\mathcal{F}_{0,0}$ contains all $\mathbb{P}$-null sets in $\mathbb{F}$. All $\sigma$-algebras are assumed complete and right continuous, that is, $\mathcal{F}_{0,t} = \mathcal{F}_{0,t^+} \triangleq \bigcap_{s > t} \mathcal{F}_{0,s}$, $\forall t \in [0,T)$. We use the notation $\mathbb{F}_T \triangleq \{\mathcal{F}_{0,t} : t \in [0,T]\}$ and similarly for the rest of the filtrations.


The minimum principle in [1] is derived utilizing the following spaces. Let $L^2_{\mathbb{F}_T}([0,T], \mathbb{R}^n) \subset L^2(\Omega \times [0,T], d\mathbb{P} \times dt, \mathbb{R}^n) \equiv L^2([0,T], L^2(\Omega, \mathbb{R}^n))$ denote the space of $\mathbb{F}_T$-adapted random processes $\{z(t) : t \in [0,T]\}$ such that

$$\mathbb{E}\int_{[0,T]} |z(t)|^2_{\mathbb{R}^n}\, dt < \infty,$$

which is a sub-Hilbert space of $L^2([0,T], L^2(\Omega, \mathbb{R}^n))$. Similarly, let $L^2_{\mathbb{F}_T}([0,T], \mathcal{L}(\mathbb{R}^m, \mathbb{R}^n)) \subset L^2([0,T], L^2(\Omega, \mathcal{L}(\mathbb{R}^m, \mathbb{R}^n)))$ denote the space of $\mathbb{F}_T$-adapted $n \times m$ matrix valued random processes $\{\Sigma(t) : t \in [0,T]\}$ such that

$$\mathbb{E}\int_{[0,T]} |\Sigma(t)|^2_{\mathcal{L}(\mathbb{R}^m, \mathbb{R}^n)}\, dt \triangleq \mathbb{E}\int_{[0,T]} \mathrm{tr}\big(\Sigma^*(t)\Sigma(t)\big)\, dt < \infty.$$

A. Distributed Stochastic Differential Decision Systems

A stochastic differential decision or control system is called distributed if it consists of an interconnection of at least two subsystems and decision makers, whose actions are based on decentralized information structures. The underlying assumption is that the decision makers are allowed to exchange information on the law or strategy they deploy, but not on their actions. Let $(\Omega, \mathbb{F}, \{\mathcal{F}_{0,t} : t \in [0,T]\}, \mathbb{P})$ denote a fixed complete filtered probability space on which we shall define all processes. At this stage we do not specify how $\{\mathcal{F}_{0,t} : t \in [0,T]\}$ came about, but we require that the Brownian motions are adapted to this filtration.

Admissible Decision Maker Strategies. The Decision Makers (DM) $\{u^i : i \in \mathbb{Z}_N\}$ take values in closed convex subsets of metric spaces $\{(\mathbb{M}^i, d) : i \in \mathbb{Z}_N\}$. Let $\mathcal{G}^i_T \triangleq \{\mathcal{G}^i_{0,t} : t \in [0,T]\} \subset \{\mathcal{F}_{0,t} : t \in [0,T]\}$ denote the information available to DM $i$, $\forall i \in \mathbb{Z}_N$. The admissible set of regular strategies is defined by

$$\mathbb{U}^i_{reg}[0,T] \triangleq \big\{ u^i \in L^2_{\mathcal{G}^i_T}([0,T], \mathbb{R}^{d_i}) : u^i_t \in \mathbb{A}^i \subset \mathbb{R}^{d_i}, \ \text{a.e. } t \in [0,T], \ \mathbb{P}\text{-a.s.} \big\}, \quad \forall i \in \mathbb{Z}_N. \tag{1}$$

Clearly, $\mathbb{U}^i_{reg}[0,T]$ is a closed convex subset of $L^2_{\mathbb{F}_T}([0,T], \mathbb{R}^{d_i})$, for $i = 1, 2, \ldots, N$, and $u^i : [0,T] \times \Omega \to \mathbb{A}^i$, $\{u^i_t : t \in [0,T]\}$ is $\mathcal{G}^i_T$-adapted, $\forall i \in \mathbb{Z}_N$. An $N$ tuple of DM strategies is by definition $(u^1, u^2, \ldots, u^N) \in \mathbb{U}^{(N)}_{reg}[0,T] \triangleq \times_{i=1}^N \mathbb{U}^i_{reg}[0,T]$.

Distributed Stochastic Systems. On the probability space $(\Omega, \mathbb{F}, \{\mathcal{F}_{0,t} : t \in [0,T]\}, \mathbb{P})$ the distributed stochastic system consists of an interconnection of $N$ subsystems, and each subsystem $i$ has state space $\mathbb{R}^{n_i}$, action space $\mathbb{A}^i \subset \mathbb{R}^{d_i}$, an exogenous noise space $\mathbb{R}^{m_i}$, and an initial state $x^i(0) = x^i_0$, identified by the following quantities.

(S1) $x^i(0) = x^i_0$: an $\mathbb{R}^{n_i}$-valued random variable;

(S2) $\{W^i(t) : t \in [0,T]\}$: an $\mathbb{R}^{m_i}$-valued standard Brownian motion which models the exogenous state noise, adapted to $\mathbb{F}_T$, independent of $x^i(0)$.

Each subsystem is described by coupled stochastic differential equations of Itô type as follows.

$$dx^i(t) = f^i(t, x^i(t), u^i_t)dt + \sigma^i(t, x^i(t), u^i_t)dW^i(t) + \sum_{j=1, j \neq i}^{N} f^{ij}(t, x^j(t), u^j_t)dt + \sum_{j=1, j \neq i}^{N} \sigma^{ij}(t, x^j(t), u^j_t)dW^j(t), \quad x^i(0) = x^i_0, \ t \in (0,T], \ i \in \mathbb{Z}_N. \tag{2}$$

Define the augmented vectors by

$$W \triangleq (W^1, W^2, \ldots, W^N) \in \mathbb{R}^m, \quad u \triangleq (u^1, u^2, \ldots, u^N) \in \mathbb{R}^d, \quad x \triangleq (x^1, x^2, \ldots, x^N) \in \mathbb{R}^n.$$

The distributed system is described in compact form by

$$dx(t) = f(t, x(t), u_t)dt + \sigma(t, x(t), u_t)dW(t), \quad x(0) = x_0, \quad t \in (0,T], \tag{3}$$

where $f : [0,T] \times \mathbb{R}^n \times \mathbb{A}^{(N)} \longrightarrow \mathbb{R}^n$ denotes the drift and $\sigma : [0,T] \times \mathbb{R}^n \times \mathbb{A}^{(N)} \longrightarrow \mathcal{L}(\mathbb{R}^m, \mathbb{R}^n)$ the diffusion coefficient.
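Although the analysis in this paper is entirely analytic, the compact system (3) is straightforward to simulate once its coefficients are specified. The following is a minimal Euler-Maruyama sketch, assuming user-supplied placeholder maps f, sigma and a strategy profile u; it only illustrates the model class, not the optimality theory.

```python
import numpy as np

def euler_maruyama(f, sigma, u, x0, T, M, m, rng):
    """Simulate dx = f(t,x,u)dt + sigma(t,x,u)dW on [0,T] with M steps.

    f     : drift, maps (t, x, u) -> R^n
    sigma : diffusion, maps (t, x, u) -> R^{n x m}
    u     : decision profile, maps (t, path_so_far) -> R^d
    """
    dt = T / M
    x = np.empty((M + 1, len(x0)))
    x[0] = x0
    for k in range(M):
        t = k * dt
        uk = u(t, x[:k + 1])                     # strategies read only past data
        dW = rng.normal(scale=np.sqrt(dt), size=m)
        x[k + 1] = x[k] + f(t, x[k], uk) * dt + sigma(t, x[k], uk) @ dW
    return x

# Toy two-subsystem example with linear coupling (placeholder coefficients).
rng = np.random.default_rng(0)
A = np.array([[-1.0, 0.2], [0.3, -0.5]])
f = lambda t, x, u: A @ x + u
sigma = lambda t, x, u: 0.1 * np.eye(2)
u = lambda t, path: -0.5 * path[-1]              # a simple feedback rule for illustration
path = euler_maruyama(f, sigma, u, x0=np.array([1.0, -1.0]), T=1.0, M=200, m=2, rng=rng)
```

The information structure is encoded purely by what the callable u is allowed to read; restricting it to a single subsystem's data reproduces the decentralized structures defined below.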

Pay-off Functional. Given a $u \in \mathbb{U}^{(N)}_{reg}[0,T]$ and (2) we define the reward or performance criterion by

$$J(u) \equiv J(u^1, u^2, \ldots, u^N) \triangleq \mathbb{E}\Big\{ \int_0^T \ell(t, x(t), u_t)\, dt + \varphi(x(T)) \Big\}, \tag{4}$$

where $\ell : [0,T] \times \mathbb{R}^n \times \mathbb{A}^{(N)} \longrightarrow (-\infty, \infty]$ denotes the running cost function and $\varphi : \mathbb{R}^n \longrightarrow (-\infty, \infty]$ the terminal cost function.

B. Team and Person-by-Person Optimality

In this section we give the precise definitions of team and person-by-person optimality for regular strategies.


We consider the following information structures.

(NIS): Nonanticipative Information Structures. $u^i$ is adapted to the filtration $\mathcal{G}^i_T \subset \mathbb{F}_T$ generated by the $\sigma$-algebra of nonlinear nonanticipative measurable functionals of any combination of the subsystems' Brownian motions $\{(W^1(t), W^2(t), \ldots, W^N(t)) : t \in [0,T]\}$, $\forall i \in \mathbb{Z}_N$. This is often called open loop information, and it is the one used in classical stochastic control with centralized full information to derive the maximum principle [27].

(FIS): Feedback Information Structures. $u^i$ is adapted to the filtration $\mathcal{G}^{z^{i,u}}_T$ generated by the $\sigma$-algebra $\mathcal{G}^{z^{i,u}}_{0,t} = \sigma\{z^i(s) : 0 \le s \le t\}$, $t \in [0,T]$, where the observables $z^i$ are nonlinear nonanticipative measurable functionals of any combination of the states, defined by

$$z^i(t) = h^i(t, x), \quad h^i : [0,T] \times C([0,T], \mathbb{R}^n) \longrightarrow \mathbb{R}^{k_i}, \quad i \in \mathbb{Z}_N. \tag{5}$$

Note that the index $u$ emphasizes the fact that feedback strategies depend on $u$. The set of admissible regular feedback strategies is defined by

$$\mathbb{U}^{(N), z^u}_{reg}[0,T] \triangleq \big\{ u \in \mathbb{U}^{(N)}_{reg}[0,T] : u^i_t \ \text{is} \ \mathcal{G}^{z^{i,u}}_{0,t}\text{-measurable}, \ t \in [0,T], \ i = 1, \ldots, N \big\}. \tag{6}$$

Problem 1. (Team Optimality) Given the pay-off functional (4) and constraint (3), the $N$ tuple of strategies $u^o \triangleq (u^{1,o}, u^{2,o}, \ldots, u^{N,o}) \in \mathbb{U}^{(N)}_{reg}[0,T]$ is called nonanticipative team optimal if it satisfies

$$J(u^{1,o}, u^{2,o}, \ldots, u^{N,o}) \le J(u^1, u^2, \ldots, u^N), \quad \forall u = (u^1, u^2, \ldots, u^N) \in \mathbb{U}^{(N)}_{reg}[0,T]. \tag{7}$$

Any $u^o \in \mathbb{U}^{(N)}_{reg}[0,T]$ satisfying (7) is called an optimal decision strategy (or control) and the corresponding $x^o(\cdot) \equiv x(\cdot; u^o(\cdot))$ (satisfying (3)) is called an optimal state process. Similarly, feedback team optimal strategies are defined with respect to $u^o \in \mathbb{U}^{(N), z^u}_{reg}[0,T]$.

An alternative approach to handle such problems with decentralized information structures is to restrict the definition of optimality to the so-called person-by-person equilibrium. Define

$$\tilde{J}(v, u^{-i}) \triangleq J(u^1, u^2, \ldots, u^{i-1}, v, u^{i+1}, \ldots, u^N).$$

Problem 2. (Person-by-Person Optimality) Given the pay-off functional (4) and constraint (3), the $N$ tuple of strategies $u^o \triangleq (u^{1,o}, u^{2,o}, \ldots, u^{N,o}) \in \mathbb{U}^{(N)}_{reg}[0,T]$ is called nonanticipative person-by-person optimal if it satisfies

$$\tilde{J}(u^{i,o}, u^{-i,o}) \le \tilde{J}(u^i, u^{-i,o}), \quad \forall u^i \in \mathbb{U}^i_{reg}[0,T], \ \forall i \in \mathbb{Z}_N. \tag{8}$$

Similarly, feedback person-by-person optimal strategies are defined with respect to $u^o \in \mathbb{U}^{(N), z^u}_{reg}[0,T]$.

Conditions (8) are analogous to the Nash equilibrium strategies of team games consisting of a single pay-off and $N$ DM. The person-by-person optimality condition states that none of the $N$ DM with different information structures can deviate unilaterally from the optimal strategy and gain by doing so.

C. Team and Person-by-Person Optimality Conditions

In this section we first introduce the assumptions on $\{f, \sigma, h, \ell, \varphi\}$ and then we state the optimality conditions derived in [1]. Let $B^\infty_{\mathbb{F}_T}([0,T], L^2(\Omega, \mathbb{R}^n))$ denote the space of $\mathbb{F}_T$-adapted $\mathbb{R}^n$-valued second order random processes endowed with the norm topology $\|\cdot\|$ defined by

$$\| x \|^2 \triangleq \sup_{t \in [0,T]} \mathbb{E}|x(t)|^2_{\mathbb{R}^n}.$$

The main assumptions are stated below.

Assumptions 1. (Main assumptions) $\mathbb{A}^i$ is a closed and convex subset of $\mathbb{R}^{d_i}$, $\forall i \in \mathbb{Z}_N$, $\mathbb{E}|x(0)|_{\mathbb{R}^n} < \infty$, and the maps $\{f, \sigma, \ell, \varphi\}$ satisfy the following conditions.

(A1) $f : [0,T] \times \mathbb{R}^n \times \mathbb{A}^{(N)} \longrightarrow \mathbb{R}^n$ is continuous in $(t, x, u)$ and continuously differentiable with respect to $x, u$;

(A2) $\sigma : [0,T] \times \mathbb{R}^n \times \mathbb{A}^{(N)} \longrightarrow \mathcal{L}(\mathbb{R}^m; \mathbb{R}^n)$ is continuous in $(t, x, u)$ and continuously differentiable with respect to $x, u$;

(A3) The first derivatives $\{f_x, \sigma_x, f_u, \sigma_u\}$ are bounded uniformly on $[0,T] \times \mathbb{R}^n \times \mathbb{A}^{(N)}$;

(A4) $\ell : [0,T] \times \mathbb{R}^n \times \mathbb{A}^{(N)} \longrightarrow (-\infty, \infty]$ is Borel measurable, continuously differentiable with respect to $(x, u)$, $\varphi : \mathbb{R}^n \longrightarrow (-\infty, \infty]$ is continuously differentiable with respect to $x$, $\ell(t, 0, 0)$ is bounded, and there exist $K_1, K_2 > 0$ such that

$$|\ell_x(t, x, u)|_{\mathbb{R}^n} + |\ell_u(t, x, u)|_{\mathbb{R}^d} \le K_1\big(1 + |x|_{\mathbb{R}^n} + |u|_{\mathbb{R}^d}\big), \quad |\varphi_x(x)|_{\mathbb{R}^n} \le K_2\big(1 + |x|_{\mathbb{R}^n}\big).$$

The following lemma states existence of solutions and their continuous dependence on the decision variables.

Lemma 1. Suppose Assumptions 1 hold. Then for any $\mathcal{F}_{0,0}$-measurable initial state $x_0$ having finite second moment, and any $u \in \mathbb{U}^{(N)}_{reg}[0,T]$, the following hold.

(1) System (3) has a unique solution $x \in B^\infty_{\mathbb{F}_T}([0,T], L^2(\Omega, \mathbb{R}^n))$ having a continuous modification, that is, $x \in C([0,T], \mathbb{R}^n)$, $\mathbb{P}$-a.s.

(2) The solution of system (3) is continuously dependent on the control, in the sense that, as $u^{i,\alpha} \stackrel{s}{\longrightarrow} u^{i,o}$ in $\mathbb{U}^i_{reg}[0,T]$, $\forall i \in \mathbb{Z}_N$, then $x^\alpha \longrightarrow x^o$ in $B^\infty_{\mathbb{F}_T}([0,T], L^2(\Omega, \mathbb{R}^n))$.

These statements also hold for feedback strategies $u \in \mathbb{U}^{(N), z^u}_{reg}[0,T]$.

Proof: The proof is identical to that of [4].

Note that the differentiability of $f, \sigma, \ell$ with respect to $u$ can be removed without affecting the results (by considering either needle variations when deriving the maximum principle, or by deriving the maximum principle for relaxed strategies and then specializing it to regular strategies as in [1]). Assumptions 1 are used to derive optimality conditions for stochastic control problems with nonanticipative centralized strategies. However, for stochastic control problems with feedback centralized strategies additional assumptions are required to avoid certain technicalities associated with the derivation of the maximum principle. In [1] we identified these assumptions for decentralized randomized feedback strategies; the main theorems are stated below.

Assumptions 2. The following holds.

(E1) The diffusion coefficient $\sigma$ is restricted to the map $\sigma : [0,T] \times \mathbb{R}^n \longrightarrow \mathcal{L}(\mathbb{R}^n, \mathbb{R}^n)$ (i.e., it is independent of $u$), and $\sigma(\cdot, \cdot)$ and $\sigma^{-1}(\cdot, \cdot)$ are bounded.

Define the $\sigma$-algebras

$$\mathcal{F}^{x(0),W}_{0,t} \triangleq \sigma\{x(0), W(s) : 0 \le s \le t\}, \qquad \mathcal{F}^{x^u}_{0,t} \triangleq \sigma\{x(s) : 0 \le s \le t\}, \qquad \forall t \in [0,T].$$

Under Assumptions 1, 2, if $u \in \mathbb{U}^{(N), z^u}_{reg}[0,T]$ then $\mathcal{F}^{x(0),W}_{0,t} = \mathcal{F}^{x^u}_{0,t}$, $\forall t \in [0,T]$. Thus, for any $u^i \in \mathbb{U}^{z^{i,u}}_{reg}[0,T]$ which is $\mathcal{G}^{z^{i,u}}_T$-adapted there exists a function $\phi^i(\cdot)$, measurable with respect to a sub-$\sigma$-algebra $\mathcal{F}^i_{0,t} \subset \mathcal{F}^{x(0),W}_{0,t}$, such that $u^i_t(\omega) = \phi^i(t, x(0), W(\cdot \wedge t, \omega))$, $\mathbb{P}$-a.s., $\omega \in \Omega$, $\forall t \in [0,T]$, $i = 1, \ldots, N$.


Define all such adapted nonanticipative functions by

$$\mathbb{U}^{i,z}_{reg}[0,T] \triangleq \big\{ u^i \in L^2_{\mathbb{F}^i_T}([0,T], \mathbb{R}^{d_i}) : u^i_t \in \mathbb{U}^{z^{i,u}}_{reg}[0,T] \big\}, \quad \forall i \in \mathbb{Z}_N. \tag{9}$$

Next, we introduce the following additional assumptions.

Assumptions 3. The following hold.

(E2) $\mathbb{U}^{z^{i,u}}_{reg}[0,T]$ is dense in $\mathbb{U}^i_{reg}[0,T]$, $\forall i \in \mathbb{Z}_N$.

Under Assumptions 1 it can be shown that $J(\cdot)$ is continuous in the sense of $\mathbb{U}^{(N)}_{reg}[0,T]$, and by Assumptions 3 we have

$$\inf_{u \in \times_{i=1}^N \mathbb{U}^{z^{i,u}}_{reg}[0,T]} J(u) = \inf_{u \in \times_{i=1}^N \mathbb{U}^i_{reg}[0,T]} J(u).$$

Hence, the necessary conditions for feedback information structures $u \in \mathbb{U}^{(N), z^u}_{reg}[0,T]$ to be optimal are those for which nonanticipative information structures $u \in \mathbb{U}^{(N)}_{reg}[0,T]$ are optimal.

We now show that under Assumptions 1, 2, Assumptions 3 holds.

Theorem 1. Consider Problem 1 under Assumptions 1, 2. Then

$$\inf_{u \in \times_{i=1}^N \mathbb{U}^{z^{i,u}}_{reg}[0,T]} J(u) = \inf_{u \in \times_{i=1}^N \mathbb{U}^i_{reg}[0,T]} J(u).$$

Proof: We follow the procedure in [28]. For any $u^i \in \mathbb{U}^{z^{i,u}}_{reg}[0,T]$ which is $\mathcal{G}^{z^{i,u}}_T$-adapted we can define the set $\mathbb{U}^{i,z}_{reg}[0,T]$, $i = 1, \ldots, N$, via (9). Let $u \in \mathbb{U}^{(N),z}_{reg}[0,T] \triangleq \times_{i=1}^N \mathbb{U}^{i,z}_{reg}[0,T]$ and for $k = \frac{T}{M}$ define

$$u^i_{k,t} \triangleq \begin{cases} u^i_0 \in \mathbb{A}^i, & 0 \le t \le k, \\ \frac{1}{k}\int_{(n-1)k}^{nk} u^i_s\, ds, & nk < t \le (n+1)k, \ n = 1, \ldots, M-1. \end{cases}$$

Moreover, if $R(t) > 0$ then $H(t, x, \psi, Q, u)$ is convex in $(x, u)$.

C. Decentralized Information Structures for LQF

In this section we invoke the minimum principle to compute the optimal strategies for team games of Linear-Quadratic Form (LQF). We consider decentralized strategies based on 1) nonanticipative information structures, and 2) feedback information structures. Without loss of generality we assume the distributed stochastic dynamical decision system consists of an interconnection of two subsystems, each governed by a linear stochastic differential equation with coupling. The generalization to an arbitrary number of interconnected subsystems will be given as a corollary. Consider the distributed dynamics described below.


Subsystem Dynamics 1:

$$dx^1(t) = A_{11}(t)x^1(t)dt + B_{11}(t)u^1_t dt + G_{11}(t)dW^1(t) + A_{12}(t)x^2(t)dt + B_{12}(t)u^2_t dt, \quad x^1(0) = x^1_0, \ t \in (0,T], \tag{86}$$

Subsystem Dynamics 2:

$$dx^2(t) = A_{22}(t)x^2(t)dt + B_{22}(t)u^2_t dt + G_{22}(t)dW^2(t) + A_{21}(t)x^1(t)dt + B_{21}(t)u^1_t dt, \quad x^2(0) = x^2_0, \ t \in (0,T]. \tag{87}$$

For any $t \in [0,T]$ the information structure of $u^1_t$ of subsystem 1 is the $\sigma$-algebra $\mathcal{G}^1_{0,t}$, and the information structure of $u^2_t$ of subsystem 2 is the $\sigma$-algebra $\mathcal{G}^2_{0,t}$. These information structures are defined shortly.

Pay-off Functional:

$$J(u^1, u^2) = \frac{1}{2}\mathbb{E}\Big\{ \int_0^T \Big[ \Big\langle \begin{pmatrix} u^1_t \\ u^2_t \end{pmatrix}, R(t)\begin{pmatrix} u^1_t \\ u^2_t \end{pmatrix} \Big\rangle + \Big\langle \begin{pmatrix} x^1(t) \\ x^2(t) \end{pmatrix}, H(t)\begin{pmatrix} x^1(t) \\ x^2(t) \end{pmatrix} \Big\rangle \Big] dt + \Big\langle \begin{pmatrix} x^1(T) \\ x^2(T) \end{pmatrix}, M(T)\begin{pmatrix} x^1(T) \\ x^2(T) \end{pmatrix} \Big\rangle \Big\}. \tag{88}$$

We assume that the initial condition $x(0)$, the system Brownian motion $\{W(t) : t \in [0,T]\}$, and the observation Brownian motions $\{B^1(t) : t \in [0,T]\}$ and $\{B^2(t) : t \in [0,T]\}$ are mutually independent, and $x(0)$ is Gaussian with $(\mathbb{E}(x(0)), \mathrm{Cov}(x(0))) = (\bar{x}_0, P_0)$. Define the augmented variables by

$$x \triangleq \begin{pmatrix} x^1 \\ x^2 \end{pmatrix}, \quad u \triangleq \begin{pmatrix} u^1 \\ u^2 \end{pmatrix}, \quad \psi \triangleq \begin{pmatrix} \psi^1 \\ \psi^2 \end{pmatrix}, \quad Q \triangleq \begin{pmatrix} Q^1 \\ Q^2 \end{pmatrix}, \quad W \triangleq \begin{pmatrix} W^1 \\ W^2 \end{pmatrix} \tag{89}$$

and matrices by

$$A \triangleq \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \quad B \triangleq \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}, \quad B^{(1)} \triangleq \begin{pmatrix} B_{11} \\ B_{21} \end{pmatrix}, \quad B^{(2)} \triangleq \begin{pmatrix} B_{12} \\ B_{22} \end{pmatrix}, \quad G \triangleq \begin{pmatrix} G_{11} & 0 \\ 0 & G_{22} \end{pmatrix}.$$
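Numerically, the augmentation in (89) and the matrix definitions above amount to block assembly, and the quadratic pay-off (88) can be estimated along sampled paths by Monte Carlo averaging of a per-path quadrature. A minimal sketch with placeholder scalar blocks (all coefficient values are assumptions, not taken from the paper):

```python
import numpy as np

# Placeholder scalar blocks (n1 = n2 = d1 = d2 = 1); with matrix blocks,
# np.block assembles the same augmented structure.
A11, A12, A21, A22 = -1.0, 0.2, 0.3, -0.5
B11, B12, B21, B22 = 1.0, 0.0, 0.0, 1.0
G11, G22 = 0.1, 0.2

A  = np.array([[A11, A12], [A21, A22]])
B1 = np.array([[B11], [B21]])        # B^(1)
B2 = np.array([[B12], [B22]])        # B^(2)
G  = np.array([[G11, 0.0], [0.0, G22]])

def payoff(ts, xs, us, R, H, Mter):
    """Single-path quadrature of the quadratic pay-off (88), sans expectation.

    ts : time grid; xs, us : sampled augmented state/decision trajectories.
    """
    run = np.array([u @ R @ u + x @ H @ x for x, u in zip(xs, us)])
    integral = np.sum(0.5 * (run[1:] + run[:-1]) * np.diff(ts))  # trapezoid rule
    return 0.5 * (integral + xs[-1] @ Mter @ xs[-1])
```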


Let $(x^o(\cdot), \psi^o(\cdot), Q^o(\cdot))$ denote the solutions of the Hamiltonian system corresponding to the optimal control $u^o$; then

$$dx^o(t) = A(t)x^o(t)dt + B(t)u^o_t dt + G(t)dW(t), \quad x^o(0) = x_0, \tag{90}$$

$$d\psi^o(t) = -A^*(t)\psi^o(t)dt - H(t)x^o(t)dt - V_{Q^o}(t)dt + Q^o(t)dW(t), \quad \psi^o(T) = M(T)x^o(T), \tag{91}$$

$$V_{Q^o}(t) = 0, \quad Q^o(t) = \Sigma(t)G(t), \quad \psi^o(t) = \Sigma(t)x^o(t) + \beta^o(t), \tag{92}$$

where $\Sigma(\cdot), \beta^o(\cdot)$ are given by (59), (60) with $s^i, \kappa^i, b, F, E = 0$. The optimal decisions $\{(u^{1,o}_t, u^{2,o}_t) : 0 \le t \le T\}$ are obtained from (48) with $\sigma(t, x, u) = G(t)$, $b = 0$, $F = 0$, $E = 0$, $m = 0$, and they are given by

$$\mathbb{E}\Big\{ H_{u^1}\big(t, x^{1,o}(t), x^{2,o}(t), \psi^{1,o}(t), \psi^{2,o}(t), Q^{1,o}(t), Q^{2,o}(t), u^{1,o}_t, u^{2,o}_t\big) \,\Big|\, \mathcal{G}^1_{0,t} \Big\} = 0, \quad \text{a.e. } t \in [0,T], \ \mathbb{P}\big|_{\mathcal{G}^1_{0,t}}\text{-a.s.}, \tag{93}$$

$$\mathbb{E}\Big\{ H_{u^2}\big(t, x^{1,o}(t), x^{2,o}(t), \psi^{1,o}(t), \psi^{2,o}(t), Q^{1,o}(t), Q^{2,o}(t), u^{1,o}_t, u^{2,o}_t\big) \,\Big|\, \mathcal{G}^2_{0,t} \Big\} = 0, \quad \text{a.e. } t \in [0,T], \ \mathbb{P}\big|_{\mathcal{G}^2_{0,t}}\text{-a.s.} \tag{94}$$

From (93), (94) the optimal decisions are

$$u^{1,o}_t = -R_{11}^{-1}(t)B^{(1),*}(t)\mathbb{E}\big\{ \psi^o(t) \big| \mathcal{G}^1_{0,t} \big\} - R_{11}^{-1}(t)R_{12}(t)\mathbb{E}\big\{ u^{2,o}_t \big| \mathcal{G}^1_{0,t} \big\}, \quad t \in [0,T], \tag{95}$$

$$u^{2,o}_t = -R_{22}^{-1}(t)B^{(2),*}(t)\mathbb{E}\big\{ \psi^o(t) \big| \mathcal{G}^2_{0,t} \big\} - R_{22}^{-1}(t)R_{21}(t)\mathbb{E}\big\{ u^{1,o}_t \big| \mathcal{G}^2_{0,t} \big\}, \quad t \in [0,T]. \tag{96}$$

From the previous expressions we notice the following.

(O6): The optimal strategies (95), (96) illustrate the signaling between $u^1$ and $u^2$, which is facilitated by the coupling in the pay-off via $R(\cdot)$, and the coupling in the state dynamics of $x^1$ and $x^2$ via $\psi^o(t) = \Sigma(t)x^o(t) + \beta^o(t)$. Clearly, $u^{1,o}$ estimates the optimal decision of subsystem 2, $u^{2,o}$, and the adjoint process $\psi^o$ from its observations, and vice-versa. This coupling is simplified if we consider a simplified model of dynamical coupling between subsystems $x^1, x^2$ and/or nested information structures, i.e., $\mathcal{G}^2_{0,t} \subset \mathcal{G}^1_{0,t}$. Moreover, if we consider no coupling through the pay-off, i.e., a diagonal $R(\cdot)$, then the second right hand side terms in (95), (96) will be zero, implying that the signaling between $u^{1,o}, u^{2,o}$ is done via the adjoint process $\psi^o$.

Let $\phi(\cdot)$ be any square integrable and $\mathbb{F}_T$-adapted matrix-valued or scalar-valued process, and define its filtered and predicted versions by

$$\pi^i(\phi)(t) \triangleq \mathbb{E}\big\{ \phi(t) \big| \mathcal{G}^i_{0,t} \big\}, \qquad \pi^i(\phi)(s, t) \triangleq \mathbb{E}\big\{ \phi(s) \big| \mathcal{G}^i_{0,t} \big\}, \qquad t \in [0,T], \ s \ge t, \ i = 1, 2.$$

For any admissible decision $u$ and corresponding $(x(\cdot), \psi(\cdot))$ define their filtered versions with respect to $\mathcal{G}^i_{0,t}$ for $i = 1, 2$ by

$$\pi^i(x)(t) \triangleq \begin{pmatrix} \mathbb{E}\{x^1(t) | \mathcal{G}^i_{0,t}\} \\ \mathbb{E}\{x^2(t) | \mathcal{G}^i_{0,t}\} \end{pmatrix} \equiv \widehat{x}^i(t), \quad \pi^i(\psi)(t) \triangleq \begin{pmatrix} \mathbb{E}\{\psi^1(t) | \mathcal{G}^i_{0,t}\} \\ \mathbb{E}\{\psi^2(t) | \mathcal{G}^i_{0,t}\} \end{pmatrix} \equiv \widehat{\psi}^i(t), \quad \pi^i(u)(t) \triangleq \begin{pmatrix} \mathbb{E}\{u^1_t | \mathcal{G}^i_{0,t}\} \\ \mathbb{E}\{u^2_t | \mathcal{G}^i_{0,t}\} \end{pmatrix} \equiv \widehat{u}^i(t), \quad t \in [0,T], \ i = 1, 2,$$

and their predicted versions by

$$\pi^i(x)(s, t) \triangleq \begin{pmatrix} \mathbb{E}\{x^1(s) | \mathcal{G}^i_{0,t}\} \\ \mathbb{E}\{x^2(s) | \mathcal{G}^i_{0,t}\} \end{pmatrix} \equiv \widehat{x}^i(s, t), \quad \pi^i(\psi)(s, t) \triangleq \begin{pmatrix} \mathbb{E}\{\psi^1(s) | \mathcal{G}^i_{0,t}\} \\ \mathbb{E}\{\psi^2(s) | \mathcal{G}^i_{0,t}\} \end{pmatrix} \equiv \widehat{\psi}^i(s, t), \quad \pi^i(u)(s, t) \triangleq \begin{pmatrix} \mathbb{E}\{u^1_s | \mathcal{G}^i_{0,t}\} \\ \mathbb{E}\{u^2_s | \mathcal{G}^i_{0,t}\} \end{pmatrix} \equiv \widehat{u}^i(s, t), \quad t \in [0,T], \ s \ge t, \ i = 1, 2.$$

From (95), (96) the optimal decisions are

$$u^{1,o}_t \equiv -R_{11}^{-1}(t)B^{(1),*}(t)\pi^1(\psi^o)(t) - R_{11}^{-1}(t)R_{12}(t)\mathbb{E}\big\{ u^{2,o}_t \big| \mathcal{G}^1_{0,t} \big\}, \quad t \in [0,T], \tag{97}$$

$$u^{2,o}_t \equiv -R_{22}^{-1}(t)B^{(2),*}(t)\pi^2(\psi^o)(t) - R_{22}^{-1}(t)R_{21}(t)\mathbb{E}\big\{ u^{1,o}_t \big| \mathcal{G}^2_{0,t} \big\}, \quad t \in [0,T]. \tag{98}$$


The previous optimal decisions require the conditional estimates $\{(\pi^1(\psi^o)(t), \pi^2(\psi^o)(t)) : 0 \le t \le T\}$. These are obtained by taking conditional expectations of (74), giving

$$\pi^i(\psi^o)(t) = \Phi^*(T, t)M(T)\pi^i(x^o)(T, t) + \int_t^T \Phi^*(s, t)H(s)\pi^i(x^o)(s, t)\, ds, \quad t \in [0,T], \ i = 1, 2. \tag{99}$$

Before we proceed further we shall specify the information structures available to the DMs.

Nonanticipative Information Structures. The information structure available to $u^1$ is $\mathcal{G}^1_{0,t} \triangleq \sigma\{W^1(s) : 0 \le s \le t\} \equiv \mathcal{G}^{W^1}_{0,t}$, and the information structure available to $u^2$ is $\mathcal{G}^2_{0,t} \triangleq \sigma\{W^2(s) : 0 \le s \le t\} \equiv \mathcal{G}^{W^2}_{0,t}$. Therefore, denoting by $\pi^{w^i}(\cdot)(\cdot)$ the conditional expectation with respect to $\mathcal{G}^{W^i}_{0,\cdot}$, $i = 1, 2$, for any admissible decision, the filtered versions of $x(\cdot)$ based on these information structures are given by the following stochastic differential equations [30] (Theorem 8.2):

$$d\pi^{w^1}(x)(t) = A(t)\pi^{w^1}(x)(t)dt + B^{(1)}(t)u^1_t dt + B^{(2)}(t)\pi^{w^1}(u^2)(t)dt + G_{11}(t)dW^1(t), \quad \pi^{w^1}(x)(0) = \bar{x}_0, \tag{100}$$

$$d\pi^{w^2}(x)(t) = A(t)\pi^{w^2}(x)(t)dt + B^{(2)}(t)u^2_t dt + B^{(1)}(t)\pi^{w^2}(u^1)(t)dt + G_{22}(t)dW^2(t), \quad \pi^{w^2}(x)(0) = \bar{x}_0. \tag{101}$$

state vector and the actions of subsystem 2 based on its own observations, namely, π w (u2 )(·) and subsystem 2 estimates the augmented state vector and the actions of subsystem 1 based on 2

its own observations, namely, π w (u1 )(·).

For any admissible decision u the predicted versions of x(·) are obtained from (100) and (101) o o n o n n i i Wi Wi Wi , = E π w (x)(s)|G0,t |G0,t as follows. Utilizing the identity π w (x)(s, t) = E E x(s)|G0,s for 0 ≤ t ≤ s ≤ T then

d w1 1 1 1 π (x)(s, t) = A(s)π w (x)(s, t) + B (1) (s)π w (u1 )(s, t) + B (2) (s)π w (u2 )(s, t), ds

t < s ≤ T, (102)

1

1

π w (x)(t, t) = π w (x)(t),

February 15, 2013

t ∈ [0, T ),

(103)


$$\frac{d}{ds}\pi^{w^2}(x)(s, t) = A(s)\pi^{w^2}(x)(s, t) + B^{(2)}(s)\pi^{w^2}(u^2)(s, t) + B^{(1)}(s)\pi^{w^2}(u^1)(s, t), \quad t < s \le T, \tag{104}$$

$$\pi^{w^2}(x)(t, t) = \pi^{w^2}(x)(t), \quad t \in [0,T). \tag{105}$$

Since, for a given admissible policy and observation paths, $\{\pi^{w^1}(x)(s, t) : 0 \le t \le s \le T\}$ is determined from (102) and its current value $\pi^{w^1}(x)(t, t) = \pi^{w^1}(x)(t)$, and $\{\pi^{w^2}(x)(s, t) : 0 \le t \le s \le T\}$ is determined from (104) and its current value $\pi^{w^2}(x)(t, t) = \pi^{w^2}(x)(t)$, then (99) can be expressed via

$$\pi^{w^i}(\psi^o)(t) = K^i(t)\pi^{w^i}(x^o)(t) + r^i(t), \quad t \in [0,T], \ i = 1, 2, \tag{106}$$

where $K^i(\cdot), r^i(\cdot)$ are the operators identified by matching with the representation (99), for $i = 1, 2$. Utilizing (106) in (97) and (98), then

$$u^{1,o}_t \equiv -R_{11}^{-1}(t)B^{(1),*}(t)\big\{ K^1(t)\pi^{w^1}(x^o)(t) + r^1(t) \big\} - R_{11}^{-1}(t)R_{12}(t)\pi^{w^1}(u^{2,o})(t), \quad t \in [0,T], \tag{107}$$

$$u^{2,o}_t \equiv -R_{22}^{-1}(t)B^{(2),*}(t)\big\{ K^2(t)\pi^{w^2}(x^o)(t) + r^2(t) \big\} - R_{22}^{-1}(t)R_{21}(t)\pi^{w^2}(u^{1,o})(t), \quad t \in [0,T]. \tag{108}$$

Let $\{\Psi_{K^i}(t, s) : 0 \le s \le t \le T\}$ denote the transition operator of $A_{K^i}(t) \triangleq \big( A(t) - B^{(i)}(t)R_{ii}^{-1}(t)B^{(i),*}(t)K^i(t) \big)$, for $i = 1, 2$.

Next, we determine $K^i(\cdot), r^i(\cdot)$, $i = 1, 2$. Substituting the previous equations into (102), (103) and (104), (105), then

$$\pi^{w^1}(x^o)(s, t) = \Psi_{K^1}(s, t)\pi^{w^1}(x^o)(t) - \int_t^s \Psi_{K^1}(s, \tau)B^{(1)}(\tau)R_{11}^{-1}(\tau)B^{(1),*}(\tau)r^1(\tau)d\tau - \int_t^s \Psi_{K^1}(s, \tau)B^{(1)}(\tau)R_{11}^{-1}(\tau)R_{12}(\tau)\pi^{w^1}(u^{2,o})(\tau, t)d\tau + \int_t^s \Psi_{K^1}(s, \tau)B^{(2)}(\tau)\pi^{w^1}(u^{2,o})(\tau, t)d\tau, \quad t \le s \le T, \tag{109}$$

$$\pi^{w^2}(x^o)(s, t) = \Psi_{K^2}(s, t)\pi^{w^2}(x^o)(t) - \int_t^s \Psi_{K^2}(s, \tau)B^{(2)}(\tau)R_{22}^{-1}(\tau)B^{(2),*}(\tau)r^2(\tau)d\tau - \int_t^s \Psi_{K^2}(s, \tau)B^{(2)}(\tau)R_{22}^{-1}(\tau)R_{21}(\tau)\pi^{w^2}(u^{1,o})(\tau, t)d\tau + \int_t^s \Psi_{K^2}(s, \tau)B^{(1)}(\tau)\pi^{w^2}(u^{1,o})(\tau, t)d\tau, \quad t \le s \le T. \tag{110}$$


Since $u^{1,o}_t$ is $\mathcal{G}^{W^1}_{0,t}$-measurable and $u^{2,o}_t$ is $\mathcal{G}^{W^2}_{0,t}$-measurable, and $\mathcal{G}^{W^1}_{0,t}$ and $\mathcal{G}^{W^2}_{0,t}$ are independent, then $\pi^{w^1}(u^{2,o})(\tau, t) = \mathbb{E}\{u^{2,o}_\tau\} \equiv \bar{u}^{2,o}(\tau)$ and $\pi^{w^2}(u^{1,o})(\tau, t) = \mathbb{E}\{u^{1,o}_\tau\} \equiv \bar{u}^{1,o}(\tau)$, $0 \le t \le \tau \le T$.

Utilizing the last observation, we show in the next main theorem that the optimal DM strategies are finite dimensional (i.e., given in terms of a finite number of statistics), and that each optimal strategy is a linear function of the augmented state estimate based on its own information and of the average value of the other optimal strategy. The computation of the average optimal strategies can be expressed in fixed point form.

Theorem 3. (Optimal decentralized strategies for LQF) Given an LQF game, the optimal decisions $(u^{1,o}, u^{2,o})$ are given by

$$u^{1,o}_t \equiv -R_{11}^{-1}(t)B^{(1),*}(t)\big\{ K^1(t)\pi^{w^1}(x^o)(t) + r^1(t) \big\} - R_{11}^{-1}(t)R_{12}(t)\bar{u}^{2,o}(t), \quad t \in [0,T], \tag{111}$$

$$u^{2,o}_t \equiv -R_{22}^{-1}(t)B^{(2),*}(t)\big\{ K^2(t)\pi^{w^2}(x^o)(t) + r^2(t) \big\} - R_{22}^{-1}(t)R_{21}(t)\bar{u}^{1,o}(t), \quad t \in [0,T], \tag{112}$$

i

where π w (xo )(·), i = 1, 2 satisfy the linear non-homogeneous stochastic differential equations 1

1

(2) 2,o dπ w (x)(t) =A(t)π w (x)(t)dt + B (1) (t)u1,o t dt + B (t)u (t)dt 1

+ G11 (t)dW 1(t), π w (x)(0) = x¯0 , 2

(113)

2

(1) 1,o dπ w (x)(t) =A(t)π w (x)(t)dt + B (2) (t)u2,o t dt + B (t)u (t)dt 2

+ G22 dW 2(t), π w (x)(0) = x¯0 . (114)   and K i (·), r i (·), xo (·), ui,o(·) , i = 1, 2 are solutions of the ordinary differential equations (115),

(116), (117), (118), (119), (120) below.

$$\dot{K}^i(t) + A^*(t)K^i(t) + K^i(t)A(t) - K^i(t)B^{(i)}(t)R_{ii}^{-1}(t)B^{(i),*}(t)K^i(t) + H(t) = 0, \quad t \in [0,T), \ i = 1, 2, \tag{115}$$

$$K^i(T) = M(T), \quad i = 1, 2, \tag{116}$$


$$\dot{r}^1(t) = -\Big\{ A^*(t) + \Big[ \Phi^*(T, t)M(T)\Psi_{K^1}(T, t) + \int_t^T \Phi^*(s, t)H(s)\Psi_{K^1}(s, t)ds \Big] B^{(1)}(t)R_{11}^{-1}(t)B^{(1),*}(t) \Big\} r^1(t) - \Big[ \int_t^T \Phi^*(s, t)H(s)\Psi_{K^1}(s, t)ds \Big] \big( B^{(2)}(t) - B^{(1)}(t)R_{11}^{-1}(t)R_{12}(t) \big)\bar{u}^{2,o}(t) - \Phi^*(T, t)M(T)\Psi_{K^1}(T, t)\big( B^{(2)}(t) - B^{(1)}(t)R_{11}^{-1}(t)R_{12}(t) \big)\bar{u}^{2,o}(t), \quad t \in [0,T), \quad r^1(T) = 0, \tag{117}$$

$$\dot{r}^2(t) = -\Big\{ A^*(t) + \Big[ \Phi^*(T, t)M(T)\Psi_{K^2}(T, t) + \int_t^T \Phi^*(s, t)H(s)\Psi_{K^2}(s, t)ds \Big] B^{(2)}(t)R_{22}^{-1}(t)B^{(2),*}(t) \Big\} r^2(t) - \Big[ \int_t^T \Phi^*(s, t)H(s)\Psi_{K^2}(s, t)ds \Big] \big( B^{(1)}(t) - B^{(2)}(t)R_{22}^{-1}(t)R_{21}(t) \big)\bar{u}^{1,o}(t) - \Phi^*(T, t)M(T)\Psi_{K^2}(T, t)\big( B^{(1)}(t) - B^{(2)}(t)R_{22}^{-1}(t)R_{21}(t) \big)\bar{u}^{1,o}(t), \quad t \in [0,T), \quad r^2(T) = 0, \tag{118}$$

$$\dot{\bar{x}}^o(t) = A(t)\bar{x}^o(t) + B^{(1)}(t)\bar{u}^{1,o}(t) + B^{(2)}(t)\bar{u}^{2,o}(t), \quad \bar{x}^o(0) = \bar{x}_0, \tag{119}$$

$$\begin{pmatrix} \bar{u}^{1,o}(t) \\ \bar{u}^{2,o}(t) \end{pmatrix} = -\begin{pmatrix} I & R_{11}^{-1}(t)R_{12}(t) \\ R_{22}^{-1}(t)R_{21}(t) & I \end{pmatrix}^{-1} \begin{pmatrix} R_{11}^{-1}(t)B^{(1),*}(t)\big\{ K^1(t)\bar{x}^o(t) + r^1(t) \big\} \\ R_{22}^{-1}(t)B^{(2),*}(t)\big\{ K^2(t)\bar{x}^o(t) + r^2(t) \big\} \end{pmatrix}. \tag{120}$$

Proof: Since $u^{1,o}_t$ is $\mathcal{G}^{W^1}_{0,t}$-measurable and $u^{2,o}_t$ is $\mathcal{G}^{W^2}_{0,t}$-measurable, and $\mathcal{G}^{W^1}_{0,t}$ and $\mathcal{G}^{W^2}_{0,t}$ are independent, then

$$\pi^{w^1}(u^2)(s, t) = \mathbb{E}\big\{ u^2_s \big| \mathcal{G}^{W^1}_{0,t} \big\} = \mathbb{E}\big\{ u^2_s \big\} \equiv \bar{u}^2(s), \quad t \le s \le T, \tag{121}$$

$$\pi^{w^2}(u^1)(s, t) = \mathbb{E}\big\{ u^1_s \big| \mathcal{G}^{W^2}_{0,t} \big\} = \mathbb{E}\big\{ u^1_s \big\} \equiv \bar{u}^1(s), \quad t \le s \le T. \tag{122}$$


Substituting (121), (122) into (109), (110), and then (109), (110) into (99), we have

$$\pi^{w^1}(\psi^o)(t) = \Big\{ \Phi^*(T, t)M(T)\Psi_{K^1}(T, t) + \int_t^T \Phi^*(s, t)H(s)\Psi_{K^1}(s, t)ds \Big\} \pi^{w^1}(x^o)(t) + \Phi^*(T, t)M(T)\int_t^T \Psi_{K^1}(T, \tau)\big( B^{(2)}(\tau) - B^{(1)}(\tau)R_{11}^{-1}(\tau)R_{12}(\tau) \big)\bar{u}^{2,o}(\tau)d\tau + \int_t^T \Phi^*(s, t)H(s)\int_t^s \Psi_{K^1}(s, \tau)\big( B^{(2)}(\tau) - B^{(1)}(\tau)R_{11}^{-1}(\tau)R_{12}(\tau) \big)\bar{u}^{2,o}(\tau)d\tau\, ds - \Phi^*(T, t)M(T)\int_t^T \Psi_{K^1}(T, \tau)B^{(1)}(\tau)R_{11}^{-1}(\tau)B^{(1),*}(\tau)r^1(\tau)d\tau - \int_t^T \Phi^*(s, t)H(s)\int_t^s \Psi_{K^1}(s, \tau)B^{(1)}(\tau)R_{11}^{-1}(\tau)B^{(1),*}(\tau)r^1(\tau)d\tau\, ds, \tag{123}$$

$$\pi^{w^2}(\psi^o)(t) = \Big\{ \Phi^*(T, t)M(T)\Psi_{K^2}(T, t) + \int_t^T \Phi^*(s, t)H(s)\Psi_{K^2}(s, t)ds \Big\} \pi^{w^2}(x^o)(t) + \Phi^*(T, t)M(T)\int_t^T \Psi_{K^2}(T, \tau)\big( B^{(1)}(\tau) - B^{(2)}(\tau)R_{22}^{-1}(\tau)R_{21}(\tau) \big)\bar{u}^{1,o}(\tau)d\tau + \int_t^T \Phi^*(s, t)H(s)\int_t^s \Psi_{K^2}(s, \tau)\big( B^{(1)}(\tau) - B^{(2)}(\tau)R_{22}^{-1}(\tau)R_{21}(\tau) \big)\bar{u}^{1,o}(\tau)d\tau\, ds - \Phi^*(T, t)M(T)\int_t^T \Psi_{K^2}(T, \tau)B^{(2)}(\tau)R_{22}^{-1}(\tau)B^{(2),*}(\tau)r^2(\tau)d\tau - \int_t^T \Phi^*(s, t)H(s)\int_t^s \Psi_{K^2}(s, \tau)B^{(2)}(\tau)R_{22}^{-1}(\tau)B^{(2),*}(\tau)r^2(\tau)d\tau\, ds. \tag{124}$$

Comparing (106) with the previous two equations, the $K^i(\cdot)$, $i = 1, 2$, are identified by the operators

$$K^i(t) = \Phi^*(T, t)M(T)\Psi_{K^i}(T, t) + \int_t^T \Phi^*(s, t)H(s)\Psi_{K^i}(s, t)ds, \quad t \in [0,T], \ i = 1, 2, \tag{125}$$

and the $r^i(\cdot)$, $i = 1, 2$, by the processes

$$r^1(t) = \Phi^*(T, t)M(T)\int_t^T \Psi_{K^1}(T, \tau)\big( B^{(2)}(\tau) - B^{(1)}(\tau)R_{11}^{-1}(\tau)R_{12}(\tau) \big)\bar{u}^{2,o}(\tau)d\tau + \int_t^T \Phi^*(s, t)H(s)\int_t^s \Psi_{K^1}(s, \tau)\big( B^{(2)}(\tau) - B^{(1)}(\tau)R_{11}^{-1}(\tau)R_{12}(\tau) \big)\bar{u}^{2,o}(\tau)d\tau\, ds - \Phi^*(T, t)M(T)\int_t^T \Psi_{K^1}(T, \tau)B^{(1)}(\tau)R_{11}^{-1}(\tau)B^{(1),*}(\tau)r^1(\tau)d\tau - \int_t^T \Phi^*(s, t)H(s)\int_t^s \Psi_{K^1}(s, \tau)B^{(1)}(\tau)R_{11}^{-1}(\tau)B^{(1),*}(\tau)r^1(\tau)d\tau\, ds, \tag{126}$$


$$r^2(t) = \Phi^*(T, t)M(T)\int_t^T \Psi_{K^2}(T, \tau)\big( B^{(1)}(\tau) - B^{(2)}(\tau)R_{22}^{-1}(\tau)R_{21}(\tau) \big)\bar{u}^{1,o}(\tau)d\tau + \int_t^T \Phi^*(s, t)H(s)\int_t^s \Psi_{K^2}(s, \tau)\big( B^{(1)}(\tau) - B^{(2)}(\tau)R_{22}^{-1}(\tau)R_{21}(\tau) \big)\bar{u}^{1,o}(\tau)d\tau\, ds - \Phi^*(T, t)M(T)\int_t^T \Psi_{K^2}(T, \tau)B^{(2)}(\tau)R_{22}^{-1}(\tau)B^{(2),*}(\tau)r^2(\tau)d\tau - \int_t^T \Phi^*(s, t)H(s)\int_t^s \Psi_{K^2}(s, \tau)B^{(2)}(\tau)R_{22}^{-1}(\tau)B^{(2),*}(\tau)r^2(\tau)d\tau\, ds. \tag{127}$$

Differentiating both sides of (125), the operators $K^i(\cdot)$, $i = 1, 2$, satisfy the matrix differential equations (115), (116). Differentiating both sides of (126), (127), the processes $r^i(\cdot)$, $i = 1, 2$, satisfy the differential equations (117), (118). Utilizing (121), (122) we obtain the optimal strategies (111), (112).

Next, we determine $\bar{u}^{i,o}$ for $i = 1, 2$ from (111), (112). Define the averages

$$\bar{x}(t) \triangleq \mathbb{E}\big\{ x(t) \big\} = \mathbb{E}\big\{ \pi^{w^i}(x)(t) \big\}, \quad i = 1, 2. \tag{128}$$

Then $\bar{x}^o(\cdot)$ satisfies the ordinary differential equation (119). Taking the expectation of both sides of (111), (112) we deduce the corresponding equations

$$\bar{u}^{1,o}(t) = -R_{11}^{-1}(t)B^{(1),*}(t)\big\{ K^1(t)\bar{x}^o(t) + r^1(t) \big\} - R_{11}^{-1}(t)R_{12}(t)\bar{u}^{2,o}(t), \quad t \in [0,T], \tag{129}$$

$$\bar{u}^{2,o}(t) = -R_{22}^{-1}(t)B^{(2),*}(t)\big\{ K^2(t)\bar{x}^o(t) + r^2(t) \big\} - R_{22}^{-1}(t)R_{21}(t)\bar{u}^{1,o}(t), \quad t \in [0,T]. \tag{130}$$

The last two equations can be written in the matrix form (120). This completes the derivation.

Hence, the optimal strategies are computed from (111), (112), where the filter equations for $\pi^{w^i}(x^o)(\cdot)$, $i = 1, 2$, satisfy (113), (114), while $\big( K^i(\cdot), r^i(\cdot), \bar{u}^{i,o}(\cdot), \bar{x}^o(\cdot) \big)$, $i = 1, 2$, are computed off-line utilizing the ordinary differential equations (115), (116), (117), (118), (119), (120); a sketch of the off-line computation follows below. Note that the optimal decentralized strategy $u^{1,o}$ given by (111) is a linear function of the state estimate $\pi^{w^1}(x^o)(\cdot)$ and of $\bar{u}^{2,o}(\cdot)$, while the state estimate is governed by (113), corresponding to $u^{2,o}$ replaced by its average value $\bar{u}^{2,o}(\cdot)$, and similarly for $u^{2,o}$. The optimal strategies can be further simplified by considering special structures of interconnected dynamics, such as coupling of the subsystems via the DMs, coupling through the pay-off only, diagonal matrices $R = \mathrm{diag}\{R_{11}, R_{22}\}$, $H = \mathrm{diag}\{H_{11}, H_{22}\}$, $M = \mathrm{diag}\{M_{11}, M_{22}\}$, etc. Further, Theorem 3 can be generalized to an arbitrary number of interconnected system team games. In addition, one may consider feedback information structures, delayed information structures, etc.
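As a numerical illustration of the off-line computation, the following minimal sketch takes the simplifying assumption $r^1 \equiv r^2 \equiv 0$ (removing the backward equations (117), (118) from the loop): the Riccati equations (115), (116) are integrated backward, the averages $(\bar{u}^{1,o}, \bar{u}^{2,o})$ are resolved from (120) by a pointwise linear solve, and the mean state (119) is integrated forward. All matrices are placeholder assumptions; with $r^i \neq 0$ the same loop would alternate with backward integration of (117), (118) in fixed point fashion.

```python
import numpy as np

n, T, M = 2, 1.0, 400
dt = T / M
I = np.eye(1)

# Placeholder time-invariant data (assumptions, not from the paper).
A  = np.array([[-1.0, 0.2], [0.3, -0.5]])
B1 = np.array([[1.0], [0.0]]); B2 = np.array([[0.0], [1.0]])
H  = np.eye(n); Mter = np.eye(n)
R11 = R22 = np.array([[1.0]]); R12 = R21 = np.array([[0.2]])
xbar0 = np.array([1.0, -1.0])

def riccati(Bi, Rii):
    """Backward explicit-Euler sweep of the Riccati equation (115)-(116)."""
    K = np.empty((M + 1, n, n)); K[M] = Mter
    for k in range(M, 0, -1):
        S = Bi @ np.linalg.solve(Rii, Bi.T)
        dK = A.T @ K[k] + K[k] @ A - K[k] @ S @ K[k] + H
        K[k - 1] = K[k] + dK * dt          # integrating from T down to 0
    return K

K1, K2 = riccati(B1, R11), riccati(B2, R22)

# Forward sweep: resolve the averages from (120) (with r^i = 0), then (119).
xbar = np.empty((M + 1, n)); xbar[0] = xbar0
u1bar = np.empty((M, 1)); u2bar = np.empty((M, 1))
coup = np.block([[I, np.linalg.solve(R11, R12)],
                 [np.linalg.solve(R22, R21), I]])
for k in range(M):
    rhs = np.concatenate([np.linalg.solve(R11, B1.T @ (K1[k] @ xbar[k])),
                          np.linalg.solve(R22, B2.T @ (K2[k] @ xbar[k]))])
    u = -np.linalg.solve(coup, rhs)        # pointwise solve of (120)
    u1bar[k], u2bar[k] = u[0], u[1]
    xbar[k + 1] = xbar[k] + (A @ xbar[k] + B1 @ u1bar[k] + B2 @ u2bar[k]) * dt
```

The pointwise inversion in (120) is what resolves the mutual dependence of the two averages; no iteration is needed at this stage because the coupling matrix is invertible whenever the weights $R_{ij}$ are compatible.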

These generalizations or simplifications are stated in the next remark.

Remark 4. (Generalizations and Simplifications)

Generalizations. Theorem 3 is easily generalized to the following arbitrarily coupled dynamics

$$dx^i(t) = A_{ii}(t)x^i(t)dt + B^{(i)}(t)u^i_t dt + G_{ii}(t)dW^i(t) + \sum_{j=1, j \neq i}^N A_{ij}(t)x^j(t)dt + \sum_{j=1, j \neq i}^N B^{(j)}(t)u^j_t dt, \quad x^i(0) = x^i_0, \ t \in (0,T], \ i \in \mathbb{Z}_N, \tag{131}$$

and DMs' information structures

$$u^i_t \ \text{is} \ \mathcal{G}^{W^i}_{0,t}\text{-measurable}, \quad t \in [0,T], \ i \in \mathbb{Z}_N. \tag{132}$$

The optimal strategies are obvious extensions of the ones given in Theorem 3.

Simplifications. Several simpler forms can be deduced from the results of Theorem 3 by assuming any of the following: $R = \mathrm{diag}\{R_{11}, R_{22}\}$, $H = \mathrm{diag}\{H_{11}, H_{22}\}$, $M = \mathrm{diag}\{M_{11}, M_{22}\}$. Moreover, simplified strategies can be derived by assuming nested information structures, that is, $u^1_t$ is $\mathcal{G}^{W^1}_{0,t}$-measurable and $u^2_t$ is $\mathcal{G}^{W^1, W^2}_{0,t}$-measurable.

Delayed Information Structures. The optimality conditions hold for any $\mathcal{G}^i_{0,t}$-measurable DM strategies $u^i$, $i = 1, \ldots, N$. Therefore, one can apply the necessary conditions to DMs' information structures

$$u^i_t \ \text{is} \ \mathcal{G}^{W^i}_{0,t-\epsilon_i}\text{-measurable}, \quad \epsilon_i > 0, \ t \in [0,T], \ i \in \mathbb{Z}_N, \tag{133}$$

or any other information structures of interest, such as delayed sharing.
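As a small illustration of (133), the only change in a strategy implementation is that it may read its own noise path only up to $t - \epsilon_i$. A hedged sketch, with a hypothetical path functional phi supplied by the user:

```python
def delayed_strategy(phi, eps):
    """Wrap a path functional phi so the result reads W^i only up to t - eps,
    i.e. it is G^{W^i}_{0, t-eps}-measurable in the sense of (133)."""
    def u(t, w_path, dt):
        k = max(0, int((t - eps) / dt))   # truncate the visible path at t - eps
        return phi(t, w_path[:k + 1])
    return u
```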

Feedback Information Structures. The previous generalizations/simplifications also apply to feedback information structures $\mathcal{G}^{z^{i,u}}_{0,t}$. Specifically, to derive the corresponding results of Theorem 3, even for the simplest scenario $z^1 = x^1$, $z^2 = x^2$, one has to compute conditional expectations with respect to $\mathcal{G}^{x^{i,u}}_{0,t}$, $i = 1, 2$, and hence one has to invoke nonlinear filtering techniques to determine expressions for the filters $\pi^{x^i}(x)(t) \triangleq \mathbb{E}\{x(t) | \mathcal{G}^{x^{i,u}}_{0,t}\}$, $i = 1, 2$, $\pi^{x^2}(u^1)(t) \triangleq \mathbb{E}\{u^1_t | \mathcal{G}^{x^{2,u}}_{0,t}\}$, $\pi^{x^1}(u^2)(t) \triangleq \mathbb{E}\{u^2_t | \mathcal{G}^{x^{1,u}}_{0,t}\}$ (and predictions of $x(t), u^1_t, u^2_t$). It appears to us that the optimal team laws are the same as those derived for nonanticipative information structures, given by (111), (112), with $\pi^{w^i}(x)(t)$ replaced by $\pi^{x^i}(x)(t)$, $i = 1, 2$, and $\bar{u}^{1,o}(t), \bar{u}^{2,o}(t)$ replaced by $\pi^{x^2}(u^1)(t), \pi^{x^1}(u^2)(t)$. These estimates (filters) may not be described in terms of linear Kalman-type equations driven by the DMs' strategies, governing the conditional means, whose gains are specified by the conditional error covariance equations, independently of the observations. A possible approach to compute these conditional expectations is the identification of a sufficient statistic as in [31]–[33].

Signaling. Given the optimal decentralized strategies of Theorem 3, we can determine the amount of signaling among the DMs required to reduce the computational complexity of the optimal strategies.

IV. CONCLUSIONS AND FUTURE WORK

In this second part of our two-part paper, we invoke the stochastic maximum principle, conditional Hamiltonian and the coupled backward-forward stochastic differential equations of the first part [1] to derive team optimal decentralized strategies for distributed stochastic differential systems with noiseless information structures. We present examples of such team games of nonlinear as well as linear quadratic forms. In some cases we obtain closed form expressions of the optimal decentralized strategies. The methodology is very general, and applicable to several types of information structures such as the ones described under Remark 4. It will be interesting to consider additional types of information structures and compute the optimal decentralized strategies in closed form, to better understand the implications of signaling and the computational complexity of such strategies compared to centralized strategies.

REFERENCES

[1] C. D. Charalambous and N. U. Ahmed, "Centralized versus decentralized team games of distributed stochastic differential decision systems with noiseless information structures-Part I: Theory," Preprint, 2012, draft: October 2012.
[2] J. M. Bismut, "An introductory approach to duality in optimal stochastic control," SIAM Review, vol. 30, pp. 62–78, 1978.
[3] N. U. Ahmed and K. L. Teo, Optimal Control of Distributed Parameter Systems. Elsevier North Holland, New York, Oxford, 1981.
[4] N. U. Ahmed and C. D. Charalambous, "Stochastic minimum principle for partially observed systems subject to continuous and jump diffusion processes and driven by relaxed controls," SIAM Journal on Control and Optimization, 2012, submitted, June 2012.
[5] C. D. Charalambous and J. L. Hibey, "Minimum principle for partially observable nonlinear risk-sensitive control problems using measure-valued decompositions," Stochastics & Stochastic Reports, pp. 247–288, 1996.
[6] J. Marschak, "Elements for a theory of teams," Management Science, vol. 1, no. 2, 1955.
[7] R. Radner, "Team decision problems," The Annals of Mathematical Statistics, vol. 33, no. 3, pp. 857–881, 1962.
[8] J. Marschak and R. Radner, Economic Theory of Teams. New Haven: Yale University Press, 1972.
[9] J. Krainak, J. L. Speyer, and S. I. Marcus, "Static team problems-part I: Sufficient conditions and the exponential cost criterion," IEEE Transactions on Automatic Control, vol. 27, no. 4, pp. 839–848, 1982.
[10] P. R. Wall and J. H. van Schuppen, "A class of team problems with discrete action spaces: Optimality conditions based on multimodularity," SIAM Journal on Control and Optimization, vol. 38, no. 3, pp. 875–892, 2000.
[11] H. S. Witsenhausen, "A counterexample in stochastic optimum control," SIAM Journal on Control and Optimization, vol. 6, no. 1, pp. 131–147, 1968.
[12] ——, "Separation of estimation and control for discrete time systems," in Proceedings of the IEEE, 1971, pp. 1557–1566.
[13] Y.-C. Ho and K.-C. Chu, "Team decision theory and information structures in optimal control problems-part I," IEEE Transactions on Automatic Control, vol. 17, no. 1, pp. 15–22, 1972.
[14] B.-Z. Kurtaran and R. Sivan, "Linear-Quadratic-Gaussian control with one-step-delay sharing pattern," IEEE Transactions on Automatic Control, pp. 571–574, 1974.
[15] N. R. Sandell and M. Athans, "Solution of some nonclassical LQG stochastic decision problems," IEEE Transactions on Automatic Control, vol. 19, no. 2, pp. 108–116, 1974.
[16] B.-Z. Kurtaran, "A concise derivation of the LQG one-step-delay sharing problem solution," IEEE Transactions on Automatic Control, vol. 20, no. 6, pp. 808–810, 1975.
[17] P. Varaiya and J. Walrand, "On delay sharing patterns," IEEE Transactions on Automatic Control, vol. 23, no. 3, pp. 443–445, 1978.
[18] Y. Ho, "Team decision theory and information structures," Proceedings of the IEEE, vol. 68, pp. 644–655, 1980.
[19] A. Bagchi and T. Basar, "Teams decision theory for linear continuous-time systems," IEEE Transactions on Automatic Control, vol. 25, no. 6, pp. 1154–1161, 1980.
[20] J. Krainak, J. L. Speyer, and S. I. Marcus, "Static team problems-part II: Affine control laws, projections, algorithms, and the LEGT problem," IEEE Transactions on Automatic Control, vol. 27, no. 4, pp. 848–859, 1982.
[21] B. Bamieh and P. Voulgaris, "A convex characterization of distributed control problems in spatially invariant systems with communication constraints," Systems and Control Letters, vol. 54, no. 6, pp. 575–583, 2005.
[22] M. Aicardi, F. Davoli, and R. Minciardi, "Decentralized optimal control of Markov chains with a common past information," IEEE Transactions on Automatic Control, vol. 32, no. 11, pp. 1028–1031, 1987.
[23] A. Nayyar, A. Mahajan, and D. Teneketzis, "Optimal control strategies in delayed sharing information structures," IEEE Transactions on Automatic Control, vol. 56, no. 7, pp. 1606–1620, 2011.
[24] J. H. van Schuppen, "Control of distributed stochastic systems-introduction, problems, and approaches," in International Proceedings of the IFAC World Congress, 2011.
[25] L. Lessard and S. Lall, "A state-space solution to the two-player optimal control problems," in Proceedings of the 49th Annual Allerton Conference on Communication, Control and Computing, 2011.
[26] A. Mahajan, N. Martins, M. Rotkowitz, and S. Yuksel, "Information structures in optimal decentralized control," in Proceedings of the 51st Conference on Decision and Control, 2011.
[27] J. Yong and X. Y. Zhou, Stochastic Controls: Hamiltonian Systems and HJB Equations. Springer-Verlag, 1999.
[28] A. Bensoussan, Lectures on Stochastic Control, Lecture Notes in Mathematics. Springer-Verlag, Berlin, 1982.
[29] C. D. Charalambous and N. U. Ahmed, "Centralized versus decentralized team games of distributed stochastic differential decision systems with noiseless information structures-Part II: Applications," Preprint, 2012, draft: October 2012.
[30] R. Liptser and A. Shiryayev, Statistics of Random Processes Vol. 1. Springer-Verlag, New York, 1977.
[31] C. Charalambous, "Partially observable nonlinear risk-sensitive control problems: Dynamic programming and verification theorems," IEEE Transactions on Automatic Control, vol. 42, no. 8, pp. 1130–1138, 1997.
[32] C. Charalambous and R. Elliott, "Certain classes of nonlinear partially observable stochastic optimal control problems with explicit optimal control laws equivalent to LEQG/LQG problems," IEEE Transactions on Automatic Control, vol. 42, no. 4, pp. 482–497, 1997.
[33] C. Charalambous and R. J. Elliott, "Classes of nonlinear partially observable stochastic control problems with explicit optimal control laws," SIAM Journal on Control and Optimization, vol. 36, no. 2, pp. 542–578, 1998.