Separable Optimal Cooperative Control Problems
Jong-Han Kim1 and Sanjay Lall2
Proceedings of the American Control Conference, pp. 5868–5873, 2012
1 J.-H. Kim is with the Department of Aeronautics and Astronautics, Stanford University, Stanford, CA. [email protected]
2 S. Lall is with the Department of Electrical Engineering and the Department of Aeronautics and Astronautics, Stanford University, Stanford, CA. [email protected]
3 This work was supported in part by AFOSR/AFRL contract number FA9550-08-C-0059.
4 The authors thank Laurent Lessard for insightful discussions.

Abstract

We characterize a class of separable problems in multiple player optimal cooperative control. We show that a simple factorization condition unifies many of the previously known computational approaches and enables us to explicitly compute the optimal solutions to a wide range of cooperative control problems.

1 Introduction

We consider linear quadratic optimal control problems of multiple players over cooperation networks. We allow directed information flow over the cooperation graph, and assume that the dynamic interaction between the players is also directed along the cooperation graph.

Recent efforts have provided explicit solutions for a wide range of nested two-player problems, including the state-feedback case [6], the partial output-feedback case [7], and the dynamically decoupled output-feedback case [1]. These problems can be analyzed and solved by the unified approach presented in [1], which shows that all of the above problems satisfy a simple algebraic condition. Very recently, a solution to the general output-feedback two-player problem has been developed in [2], using different techniques. In contrast to these notable achievements for two-player problems, the general multiple-player case has not yet been completely analyzed. For problems with more than two players, although explicit solutions to the state-feedback case have been presented recently [4, 5], little is known about more complicated measurement cases.

This paper extends the results obtained in [1] to more general multiple-player situations, so that more complicated measurement-feedback cases can be analyzed and explicitly solved using a unified method. We present a factorization condition on part of the open-loop transfer functions, under which the n-player optimal control problem reduces to n separately solvable optimization problems. The condition is expressed in terms of the measurement conditions or the dynamical properties of each player, and this defines the allowable measurement profiles for each player. Exploiting the factorization, we express the optimal control in terms of the solutions to a series of Riccati equations whose size grows linearly with the problem size.

2 Preliminaries

2.1 Transfer Functions

We denote by Rp the set of real-rational proper transfer matrices, and by Rsp ⊂ Rp the set of real-rational strictly proper transfer matrices. We use the following packed notation for a state-space realization of a linear system in Rp:

\[
\begin{bmatrix} A & B \\ C & D \end{bmatrix} = C(sI - A)^{-1}B + D.
\]

We use the conventional notation RL∞, RH∞, RL2, and RH2 to refer to specific subsets of Rp, and we define the H2 norm on RH2 as usual. These are very standard, and details can be found in [8], for example.

2.2 Networked Systems

We consider a linear dynamical system of n interacting players labelled by V = {1, 2, . . . , n}. The communication between the players is described by a binary n × n matrix A. We make the following assumptions about A.

A1) Aii = 1 for all i
A2) Aij = 1 and Ajk = 1 implies that Aik = 1
A3) A is lower triangular

The matrix A may be regarded as the adjacency matrix of an associated directed graph, with Aij = 1 meaning that there is an edge from j to i. The assumptions mean that the matrix A may be equivalently viewed as a transitively closed directed acyclic graph with self-loops. Throughout this paper, we focus on communication constraints described by such directed acyclic transitively closed graphs. The acyclicity assumption entails no loss of generality, because any cyclic subgraph can be reduced to a single node. For k ∈ V and an n × n block matrix Q, we define the sets of parents (ancestors) and children (descendants) of k by

\[
\bar{k}(Q) = \{\, j \in V \mid Q_{kj} \neq 0 \,\}, \qquad
\underline{k}(Q) = \{\, j \in V \mid Q_{jk} \neq 0 \,\}.
\]
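As a concrete illustration of these index sets (this example is not part of the original development), the following minimal Python sketch builds a hypothetical three-player chain and computes ancestor and descendant sets; indices are 0-based and the function names are our own.

    import numpy as np

    # Hypothetical 3-player chain 1 -> 2 -> 3, transitively closed with self-loops;
    # A[i, j] = 1 means information flows from player j+1 to player i+1.
    A = np.array([[1, 0, 0],
                  [1, 1, 0],
                  [1, 1, 1]])

    def ancestors(k, Q):
        # players j with Q[k, j] != 0: the nodes that player k receives from
        return {j for j in range(Q.shape[1]) if Q[k, j] != 0}

    def descendants(k, Q):
        # players j with Q[j, k] != 0: the nodes that receive from player k
        return {j for j in range(Q.shape[0]) if Q[j, k] != 0}

    print(ancestors(2, A))    # {0, 1, 2}: the last player sees everyone upstream
    print(descendants(0, A))  # {0, 1, 2}: the first player is seen by everyone downstream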
For convenience, we write $\bar{k} = \bar{k}(A)$ and $\underline{k} = \underline{k}(A)$, and we denote the strict ancestors and the strict descendants of k by $\hat{k} = \bar{k}\setminus\{k\}$ and $\check{k} = \underline{k}\setminus\{k\}$, respectively. Let H be a commutative ring with a unit. The incidence algebra SH is defined to be the set of n × n matrices with the sparsity pattern of A and entries in H, so that

\[
S_H = \{\, X \in H^{n\times n} \mid X_{ij} = 0 \text{ if } A_{ij} = 0 \,\}.
\]

We simply write S when the underlying ring is clear from the context. Note that the incidence algebra S is closed under addition and multiplication. It is also closed under inversion if the inverse exists in the corresponding ring.

2.3 Problem Definition

We will consider dynamics of n players described by

\[
\begin{aligned}
\dot{x}(t) &= Ax(t) + Hw(t) + Bu(t) \\
y(t) &= Mx(t) + Nw(t) \\
z(t) &= Cx(t) + Du(t).
\end{aligned}
\]

Here x and u are the vectors of states and control actions of the n players. The exogenous noises w affecting the n subsystems are independent and identically distributed Gaussian with unit intensity. Here y represents the measurement information and z is a vector to be regulated. The decentralization constraints we will impose are defined by the graph A, and the objective of the optimal cooperative control is to find the optimal sparse controller K ∈ S such that the control law u = Ky minimizes the H2 norm of the closed-loop map from w to z. To make this more explicit, assume A is Hurwitz and define

\[
\begin{bmatrix} T & U \\ V & G \end{bmatrix} =
\begin{bmatrix} A & H & B \\ C & 0 & D \\ M & N & 0 \end{bmatrix}. \tag{1}
\]

The controller is also required to be stabilizing [8], which requires K(I − GK)^{-1} ∈ RH∞.

Problem statement. The n-player H2 cooperative control problem over the communication graph A is

    minimize    || T + U K (I − GK)^{-1} V ||_2^2
    subject to  K ∈ S
                K is stabilizing                                  (2)

2.4 Riccati Equations

The following material is very standard, and may be found in [8], for example. We will say that the Riccati assumptions hold for (A, B, C, D) if

R1) C^T D = 0
R2) D^T D > 0
R3) (A, B) is stabilizable
R4) \begin{bmatrix} A - j\omega I & B \\ C & D \end{bmatrix} is full column rank for all ω ∈ R.

If (A, B, C, D) satisfies the Riccati assumptions, then there exists a unique positive semidefinite X satisfying the Riccati equation

\[
A^T X + XA + C^T C - XB(D^T D)^{-1}B^T X = 0
\]

such that A − B(D^T D)^{-1}B^T X is stable. We define the associated gain by F = −(D^T D)^{-1}B^T X and write

\[
(X, F) = \mathrm{ARE}(A, B, C, D).
\]

We now address the two particular special cases of the H2 model matching problem that we need in this paper.

Proposition 1. Suppose

\[
\begin{bmatrix} T & U \\ V & G \end{bmatrix} =
\begin{bmatrix} A & H & B \\ C & 0 & D \\ I & 0 & 0 \end{bmatrix}
\]

and suppose (A, B, C, D) satisfies the Riccati assumptions. Let (X, F) = ARE(A, B, C, D). Then there exists a unique Q ∈ RH∞ which minimizes ||T + U Q V||_2, given by

\[
Q = \begin{bmatrix} A + BF & BF \\ F & F \end{bmatrix}.
\]

Proposition 2. Suppose

\[
\begin{bmatrix} T & U \\ V & G \end{bmatrix} =
\begin{bmatrix} A & H & B \\ C & 0 & D \\ M & N & 0 \end{bmatrix}
\]

and suppose both (A, B, C, D) and (A^T, M^T, H^T, N^T) satisfy the Riccati assumptions. Let

\[
(X, F) = \mathrm{ARE}(A, B, C, D), \qquad (Y, L^T) = \mathrm{ARE}(A^T, M^T, H^T, N^T).
\]

Then there exists a unique Q ∈ RH∞ which minimizes ||T + U Q V||_2, given by

\[
Q = \begin{bmatrix} A + BF & BF \\ F & F \end{bmatrix}
\begin{bmatrix} A + LM & -L \\ I & 0 \end{bmatrix}.
\]

2.5 System Description

There is no known solution to problem (2) for general A, B, C, D, H, M, N, and we therefore make some assumptions. To this end, make the following definitions.

Definition 3. Define the set Vsf ⊂ V as follows. An element k ∈ V lies in Vsf if

\[
M_{ki} = \begin{cases} I & \text{if } i = k \\ 0 & \text{otherwise} \end{cases}
\]

and N_{kk} = 0. A node k ∈ Vsf is called a state feedback node. Otherwise, k is called an output feedback node.
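Before listing the system assumptions, we note as an aside (not part of the original text) that the operator ARE(A, B, C, D) and the gain F of Section 2.4 are easy to compute numerically. A minimal sketch, assuming scipy is available and using hypothetical data satisfying R1–R3:

    import numpy as np
    from scipy.linalg import solve_continuous_are

    def are(A, B, C, D):
        """Return (X, F) with A'X + XA + C'C - XB(D'D)^{-1}B'X = 0 and
        F = -(D'D)^{-1} B'X, assuming C'D = 0 and D'D > 0 (R1-R2)."""
        X = solve_continuous_are(A, B, C.T @ C, D.T @ D)
        F = -np.linalg.solve(D.T @ D, B.T @ X)
        return X, F

    # Hypothetical single-subsystem data (illustration only)
    A = np.array([[-1.0, 1.0], [0.0, -2.0]])
    B = np.array([[0.0], [1.0]])
    C = np.vstack([np.eye(2), np.zeros((1, 2))])        # penalize the state ...
    D = np.vstack([np.zeros((2, 1)), np.ones((1, 1))])  # ... and the input, with C'D = 0
    X, F = are(A, B, C, D)
    print(np.linalg.eigvals(A + B @ F))  # A + BF is Hurwitz, as the definition requires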
We make the following assumptions about the system, given the graph A.

B1) A, B, M ∈ S
B2) H and N are diagonal
B3) A is Hurwitz
B4) The Riccati assumptions hold for (A, B, C, D)
B5) For all k ∉ Vsf, the Riccati assumptions hold for (A_{kk}^T, M_{kk}^T, H_{kk}^T, N_{kk}^T).

Assumption B1 means that the subsystems can only interact according to the edges of the graph A. Assumption B2 means that the external noise affecting each subsystem is uncorrelated. We make the restrictive assumption B3 to simplify the solution. Even with these assumptions, there is no known explicit minimal state-space solution to the problem in general. We will make one further assumption, to be stated in Lemma 8.

2.6 Youla Parameterization

The specific structures of A, B, H, M, N described by Assumptions B1 and B2 imply that G ∈ S. Now we use the Youla parameterization of the problem, changing variables to Q = K(I − GK)^{-1}. Then Q ∈ S if and only if K ∈ S. In addition, recall that internal stability is equivalent to Q ∈ RH∞. Then we have

    K stabilizing and K ∈ SRp   ⟺   Q ∈ SRH∞,

and this allows a convex parameterization of the minimization problem (2), which is then equivalent to

    minimize    || T + U Q V ||_2^2
    subject to  Q ∈ SRH∞                                          (3)

Given the optimal Q for this problem, the optimal K is given by K = Q(I + GQ)^{-1}.

2.7 Submatrix Notations

Suppose Q is an n × m block matrix, I ⊂ {1, . . . , n}, and J ⊂ {1, . . . , m}. Then Q_{IJ} refers to the submatrix of Q corresponding to the rows with block index in I and the columns with block index in J. For convenience, we use ":" to refer to "all indices". For an n × n block matrix Q, we let R = diag(Q) be the block diagonal matrix defined by R_{ii} = Q_{ii} for all i = 1, . . . , n, and R_{ij} = 0 for all i ≠ j.

3 Factorization Approach

In this section, we present the key ideas. We characterize a class of separable cooperative control problems for which the optimal controllers can be explicitly computed by solving several separate problems. We first give conditions under which the minimization problem (2) can be explicitly solved. As stated in the following lemma, this will allow us to make a new coordinate transformation.

Lemma 4. Suppose that there exists a strictly lower triangular W ∈ SRH∞ that factorizes V as

\[
V = (I + W)\,\mathrm{diag}(V). \tag{4}
\]

Then R is optimal for

    minimize    || T + U R diag(V) ||_2^2
    subject to  R ∈ SRH∞                                          (5)

if and only if Q = R(I + W)^{-1} is optimal for (3).

Proof. Since W is strictly lower triangular, I + W is invertible and its inverse is given by (I + W)^{-1} = Σ_{i=0}^{n−1} (−W)^i ∈ SRH∞. The result then follows from the fact that both R and W lie in SRH∞.

In fact, the minimization problem (5) reduces to n smaller problems, because the kth column of T + U R diag(V) depends only on the kth column of R, and the H2 norm is additive. This is made explicit in the following lemma.

Lemma 5. Suppose R ∈ S. Then

\[
\sum_{k\in V} \bigl\| T_{:k} + U_{:\underline{k}}\, R_{\underline{k}k}\, V_{kk} \bigr\|_2^2
= \bigl\| T + U R\,\mathrm{diag}(V) \bigr\|_2^2 .
\]

Hence, under the above factorization condition, we now need to solve n independent problems, each of the form

    minimize    || T_{:k} + U_{:\underline{k}} Γ(k) V_{kk} ||_2^2
    subject to  Γ(k) ∈ RH∞                                        (6)

for k = 1, . . . , n. Moreover, it turns out that each of these is an unstructured optimization. Note that R is uniquely determined given the R_{\underline{k}k}.

3.1 Conditions for Factorizability

In this section, we elaborate on the factorization condition (4). The measurement profiles under which the factorization holds are explained in detail.

Definition 6. Define the set

    Vdsink = { k ∈ V | \underline{k}(A) = {k} and \underline{k}(M) = {k} }.

An element k ∈ Vdsink is called a disturbance sink. Otherwise, k is called a disturbance nonsink.

In other words, a disturbance sink is a node whose external disturbance is not transferred anywhere else by the open loop dynamics. We note that Vdsink is determined by the dynamic interaction pattern of the system. The condition (4) implies that every column of V is a multiple of the diagonal element in the corresponding column. This leads to the following.

Lemma 7. There exists a strictly lower W ∈ S such that V = (I + W) diag(V) if and only if, for all k ∉ Vdsink, there exists Ω(k) such that
\[
V_{\check{k}k} = \Omega(k)\, V_{kk}. \tag{7}
\]

Proof. This is straightforward from the definitions.

Now we will take a more detailed look at (7). We can state explicitly a set of conditions on the original problem under which the above factorization exists.

Lemma 8. Suppose that k ∈ Vsf for all k ∉ Vdsink. Then there exists a strictly lower triangular W ∈ SRH∞ such that V = (I + W) diag(V). Define

\[
\Omega(k) = \begin{bmatrix} A_{\check{k}\check{k}} & A_{\check{k}k} \\ M_{\check{k}\check{k}} & M_{\check{k}k} \end{bmatrix}. \tag{8}
\]

Then one such W is given by

\[
W_{\check{k}k} = \begin{cases} 0 & \text{if } k \in V_{\mathrm{dsink}} \\ \Omega(k) & \text{otherwise.} \end{cases}
\]

Proof. The following state space realizations for V_{kk} and V_{\check{k}k} can be easily obtained:

\[
V_{kk} = \begin{bmatrix} A_{kk} & H_{kk} \\ M_{kk} & N_{kk} \end{bmatrix}, \qquad
V_{\check{k}k} = \begin{bmatrix} A_{kk} & 0 & H_{kk} \\ A_{\check{k}k} & A_{\check{k}\check{k}} & 0 \\ M_{\check{k}k} & M_{\check{k}\check{k}} & 0 \end{bmatrix}.
\]

Now notice that V_{\check{k}k} factorizes as

\[
V_{\check{k}k} = \begin{bmatrix} A_{\check{k}\check{k}} & A_{\check{k}k} \\ M_{\check{k}\check{k}} & M_{\check{k}k} \end{bmatrix}
\begin{bmatrix} A_{kk} & H_{kk} \\ I & 0 \end{bmatrix}.
\]

Hence, the factorization condition (7) of Lemma 7 is equivalent to

\[
\begin{bmatrix} A_{\check{k}\check{k}} & A_{\check{k}k} \\ M_{\check{k}\check{k}} & M_{\check{k}k} \end{bmatrix}
\begin{bmatrix} A_{kk} & H_{kk} \\ I & 0 \end{bmatrix}
= W_{\check{k}k} \begin{bmatrix} A_{kk} & H_{kk} \\ M_{kk} & N_{kk} \end{bmatrix}.
\]

Now under the assumption, we have for all k ∉ Vdsink that k ∈ Vsf, and hence M_{kk} = I and N_{kk} = 0. Therefore we may choose Ω(k) as in (8) to satisfy condition (7). Lemma 7 then implies the result.

In other words, we impose the following measurement profile so that we may factorize and separate the synthesis problem.

1. If k ∉ Vdsink, we require M_{kk} = I and N_{kk} = 0. This implies that every disturbance nonsink should take state feedback.

2. If k ∈ Vdsink, no constraint on the measurement profile is required. This implies that a disturbance sink can take either state feedback or output feedback.

The main idea in this section was that under the factorization condition on V, the minimization problem (2) can be explicitly solved via the n independent problems in (6). It turns out that the factorization condition unifies many of the previously known results [1, 3, 4, 5, 6, 7].

3.2 Special Problems

We mention some interesting classes of separable problems. The explicit solutions can be obtained from the main results to be presented in Theorem 12.

State feedback. It follows from the above that the problem is separable and explicitly solvable when every player takes state feedback. The optimal controller for this ideal case was reported very recently [3, 5].

Dynamically decoupled output feedback. Consider n dynamically decoupled subsystems where the players communicate along a transitively closed graph and take noisy output feedback. The system is given by

\[
\begin{aligned}
\dot{x}_k &= A_{kk}x_k + H_{kk}w_k + B_{kk}u_k \\
y_k &= M_{kk}x_k + N_{kk}w_k
\end{aligned}
\]

for every k ∈ V. Because V is diagonal, every node in this system is a disturbance sink regardless of the communication graph, and therefore every node is allowed to take noisy output feedback and the problem is separable. In this case, G is diagonal as well. Therefore the fully decentralized control problem (i.e., A = I) allows a convex parametrization and can be explicitly solved.

Open loop decoupled output feedback. We observe that V is still diagonal and every node is a disturbance sink even when the matrix B is not diagonal. This implies the problem is still separable if only the open loop dynamical behavior of each player is decoupled. More precisely, each player k ∈ V in the open loop decoupled output feedback system follows

\[
\begin{aligned}
\dot{x}_k &= A_{kk}x_k + H_{kk}w_k + \sum_{i\in\bar{k}} B_{ki}u_i \\
y_k &= M_{kk}x_k + N_{kk}w_k .
\end{aligned}
\]

4 Changes of Variables

Starting with the initial optimization problem (2), we have so far defined two changes of variables. We now make one further change of variables, which will be useful when specifying the formulae for the optimal controller.

Lemma 9. Suppose G, W, K ∈ S, and W is strictly lower. Define the following sequence of changes of variables:

\[
Q = K(I - GK)^{-1}, \qquad R = Q(I + W), \qquad P = R\,\mathrm{diag}(I + GR)^{-1}. \tag{9}
\]

Then we can invert the change of variables according to

\[
K = P\bigl[ (I + W)\,\mathrm{diag}(I - GP) + GP \bigr]^{-1}.
\]

Proof. The proof is straightforward.

Note that by compressing (9) we have P_{\underline{k}k} = R_{\underline{k}k}(I + G_{k\underline{k}} R_{\underline{k}k})^{-1}, and since P ∈ S these n transfer functions determine P.
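To make the chain of variable changes concrete, the following sketch (ours, not from the paper; it uses hypothetical scalar-block data, and the matrices stand in for transfer-function values at a single frequency, which suffices to check the algebraic identity) verifies numerically that the formula of Lemma 9 inverts the map (9):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 3  # hypothetical number of players, scalar blocks

    def lower(rng, strict=False):
        # random lower-triangular matrix standing in for an element of S
        # evaluated at one frequency
        return np.tril(rng.standard_normal((n, n)), -1 if strict else 0)

    def blockdiag(M):
        # diag(.) operator of Section 2.7 (scalar blocks here)
        return np.diag(np.diag(M))

    I = np.eye(n)
    G = 0.1 * lower(rng)          # open-loop map u -> y, G in S
    K = 0.1 * lower(rng)          # a structured stabilizing controller, K in S
    W = lower(rng, strict=True)   # strictly lower factor as in Lemma 8

    # forward changes of variables (9)
    Q = K @ np.linalg.inv(I - G @ K)
    R = Q @ (I + W)
    P = R @ np.linalg.inv(blockdiag(I + G @ R))

    # inverse map from Lemma 9
    K_back = P @ np.linalg.inv((I + W) @ blockdiag(I - G @ P) + G @ P)
    print(np.allclose(K, K_back))  # True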
5 The Optimal Controller

Compressing (1) leads to the following state space realization:

\[
\begin{bmatrix} T_{:k} & U_{:\underline{k}} \\ V_{kk} & G_{k\underline{k}} \end{bmatrix} =
\begin{bmatrix} A_{\underline{k}\underline{k}} & H_{\underline{k}k} & B_{\underline{k}\underline{k}} \\
C_{:\underline{k}} & 0 & D_{:\underline{k}} \\
M_{k\underline{k}} & N_{kk} & 0 \end{bmatrix},
\]

which can be used in describing the explicit optimal solution of the kth problem in (6). The solutions will be described separately according to the player's measurement profile.

5.1 Nodes with State-Feedback

First, consider the state feedback case for player k.

Lemma 10. Suppose k ∈ Vsf. Let

\[
(X, F) = \mathrm{ARE}(A_{\underline{k}\underline{k}}, B_{\underline{k}\underline{k}}, C_{:\underline{k}}, D_{:\underline{k}}).
\]

Then the Γ that minimizes ||T_{:k} + U_{:\underline{k}} Γ V_{kk}||_2^2 is given by

\[
\Gamma = \begin{bmatrix} A_{\underline{k}\underline{k}} + B_{\underline{k}\underline{k}}F & B_{\underline{k}\underline{k}}F \\ F & F \end{bmatrix} V_0,
\qquad \text{where} \qquad
V_0 = \begin{bmatrix} A_{\check{k}\check{k}} & A_{\check{k}k} \\ 0 & I \\ I & 0 \end{bmatrix}.
\]

Further, let A^F be given by

\[
\begin{bmatrix} A^F_{11} & A^F_{12} \\ A^F_{21} & A^F_{22} \end{bmatrix} =
\begin{bmatrix} A_{kk} & 0 \\ A_{\check{k}k} & A_{\check{k}\check{k}} \end{bmatrix} + B_{\underline{k}\underline{k}}F,
\]

which is partitioned the same way on both sides. Then we have

\[
\Gamma (I + G_{k\underline{k}}\Gamma)^{-1} =
\begin{bmatrix} A^F_{22} & A^F_{21} \\ F\begin{bmatrix} 0 \\ I \end{bmatrix} & F\begin{bmatrix} I \\ 0 \end{bmatrix} \end{bmatrix}.
\]

Proof. Exploiting the surjective map Γ ↦ ΓV_0 and applying Proposition 1 to the following LQR problem

\[
\begin{bmatrix} T_{:k} & U_{:\underline{k}} \\ V_0 V_{kk} & G_{k\underline{k}} \end{bmatrix} =
\begin{bmatrix} A_{\underline{k}\underline{k}} & H_{\underline{k}k} & B_{\underline{k}\underline{k}} \\
C_{:\underline{k}} & 0 & D_{:\underline{k}} \\
I & 0 & 0 \end{bmatrix}
\]

lead to the desired result.

5.2 Nodes with Output-Feedback

Now let us consider the second case, where player k takes noisy output measurements.

Lemma 11. Suppose k ∉ Vsf. Let

\[
(X, F) = \mathrm{ARE}(A_{kk}, B_{kk}, C_{:k}, D_{:k}), \qquad
(Y, L^T) = \mathrm{ARE}(A_{kk}^T, M_{kk}^T, H_{kk}^T, N_{kk}^T).
\]

Then the Γ that minimizes ||T_{:k} + U_{:\underline{k}} Γ V_{kk}||_2^2 is given by

\[
\Gamma = \begin{bmatrix} A_{kk} + B_{kk}F & B_{kk}F \\ F & F \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} A_{kk} + LM_{kk} & -L \\ I & 0 \end{bmatrix}.
\]

Further, we have

\[
\Gamma (I + G_{k\underline{k}}\Gamma)^{-1} =
\begin{bmatrix} A_{kk} + B_{kk}F + LM_{kk} & -L \\ F & 0 \\ 0 & 0 \end{bmatrix}.
\]

Proof. We can show that the Riccati assumptions hold for (A_{kk}^T, M_{kk}^T, H_{kk}^T, N_{kk}^T). Then the specific structure of (Y_0, L_0^T) = ARE(A_{kk}^T, M_{kk}^T, H_{kk}^T, N_{kk}^T) leads to the desired result.

Lemma 10 and Lemma 11 can be interpreted as each player computing some type of local controls for its descendants. Every local control involves an LQR type problem whose size grows linearly with the number of its descendants. If a player takes output feedback, it runs a statistical inference-based estimator.

5.3 Main Result

Now the optimal control for the main problem in (2) is explicitly presented using the above results.

Theorem 12. We make use of the definitions of Section 2, and let T, U, V, G be as in (1). Suppose that assumptions A1–A3 and B1–B5 hold, and in addition that k ∈ Vsf for every k ∉ Vdsink. Define

\[
(X^k, F^k) = \mathrm{ARE}(A_{\underline{k}\underline{k}}, B_{\underline{k}\underline{k}}, C_{:\underline{k}}, D_{:\underline{k}}), \qquad
(Y^k, L^{kT}) = \mathrm{ARE}(A_{kk}^T, M_{kk}^T, H_{kk}^T, N_{kk}^T),
\]
\[
\begin{bmatrix} A^k_{11} & A^k_{12} \\ A^k_{21} & A^k_{22} \end{bmatrix} =
\begin{bmatrix} A_{kk} & 0 \\ A_{\check{k}k} & A_{\check{k}\check{k}} \end{bmatrix} + B_{\underline{k}\underline{k}}F^k, \qquad
\Omega^k = \begin{bmatrix} A_{\check{k}\check{k}} & A_{\check{k}k} \\ M_{\check{k}\check{k}} & M_{\check{k}k} \end{bmatrix},
\]
\[
\Theta^k = \begin{bmatrix} A^k_{22} & A^k_{21} \\ F^k\begin{bmatrix} 0 \\ I \end{bmatrix} & F^k\begin{bmatrix} I \\ 0 \end{bmatrix} \end{bmatrix}, \qquad
\Phi^k = \begin{bmatrix} A_{kk} + B_{kk}F^k + L^k M_{kk} & -L^k \\ F^k & 0 \\ 0 & 0 \end{bmatrix}.
\]

Now let W ∈ S and P ∈ S be given by

\[
W_{\check{k}k} = \begin{cases} 0 & \text{if } k \in V_{\mathrm{dsink}} \\ \Omega^k & \text{otherwise,} \end{cases}
\qquad
P_{\underline{k}k} = \begin{cases} \Phi^k & \text{if } k \notin V_{\mathrm{sf}} \\ \Theta^k & \text{otherwise.} \end{cases}
\]

Then the optimal K for (2) is

\[
K = P\bigl[ (I + W)\,\mathrm{diag}(I - GP) + GP \bigr]^{-1}.
\]

Proof. It follows from Lemmas 4, 5, 8, 10, and 11. The measurement information is transformed to local coordinates by y ↦ [(I + W) diag(I − GP) + GP]^{-1} y, and then the optimal control is obtained by applying the local controls in the new coordinates. All of this is performed by distributed computation by each player, which we discuss in Section 5.5.

5.4 Solutions to Special Cases

For the dynamically decoupled systems described in Section 3.2, the result of Theorem 12 simplifies, since then W = 0. This holds irrespective of the underlying graph, provided it satisfies the required transitive closure assumptions. We further specialize this result to the case A = I. This means that we would like to find the optimal fully decentralized controller, where no communication is allowed.

Corollary 13. We make use of the definitions of Section 2 and let T, U, V, G be as in (1). Suppose that A = I, Vsf = ∅, and assumptions B1–B5 hold. Define

\[
(X^k, F^k) = \mathrm{ARE}(A_{kk}, B_{kk}, C_{:k}, D_{:k}), \qquad
(Y^k, L^{kT}) = \mathrm{ARE}(A_{kk}^T, M_{kk}^T, H_{kk}^T, N_{kk}^T).
\]

Then the optimal diagonal K for (2) is given by

\[
K_{kk} = \begin{bmatrix} A_{kk} + B_{kk}F^k + L^k M_{kk} & -L^k \\ F^k & 0 \end{bmatrix}.
\]

Proof. This is straightforward from Theorem 12.

This implies that the optimal control action is obtained by each player independently solving for its optimal local control, assuming that the other players are deterministic.

5.5 Distributed Run-Time Implementation

We present a short discussion on some computational aspects. Using the notation of Theorem 12, define

\[
\Delta = (I + W)\,\mathrm{diag}(I - GP) - (I - GP).
\]

Then the theorem gives the optimal controller as K = P(I + ∆)^{-1}. Notice that W is strictly lower triangular, and hence so is ∆. This allows the controller to be implemented in a distributed manner as follows.

Define q = (I + ∆)^{-1} y, so that the optimal control action is u = Pq. The key observation is that

\[
q_k = y_k - \sum_{i \in \hat{k}} \Delta_{ki}\, q_i .
\]

This implies that each player k can compute q_k using its own measurement and some information from its strict ancestors. This form also allows the sequential, distributed computation of q_1, . . . , q_n. Once player k obtains the new coordinates q_i for i ∈ \bar{k}, the optimal control u_k can be computed from

\[
u_k = \sum_{i \in \bar{k}} P_{ki}\, q_i .
\]

6 Concluding Notes

We presented a unifying condition for separable n-player optimal cooperative control problems. Under the factorization condition on the noise-to-output transfer function, we showed that the explicit optimal solution can be obtained by solving n separate problems. The condition was interpreted in terms of each player's measurement profile, which consequently enabled us to characterize a class of explicitly solvable cooperative control problems. The optimal controller is obtained as a linear combination of the optimal controls of the separated problems, and the required computation reduces to solving Riccati equations whose size grows linearly with the number of players.

References

[1] J.-H. Kim and S. Lall. A unifying condition for separable two player optimal control problems. In Proc. IEEE Conference on Decision and Control, pages 3818–3823, 2011.

[2] L. Lessard and S. Lall. A state-space solution to the two-player decentralized optimal control problem. In Proc. Allerton Conference on Communication, Control, and Computing, 2011.

[3] P. Shah. A Partial Order Approach to Decentralized Control. PhD thesis, MIT, 2011.

[4] P. Shah and P. A. Parrilo. H2-optimal decentralized control over posets: A state space solution for state-feedback. In Proc. IEEE Conference on Decision and Control, pages 6722–6727, 2010.

[5] J. Swigart. Optimal Controller Synthesis for Decentralized Systems. PhD thesis, Stanford University, 2010.

[6] J. Swigart and S. Lall. Optimal synthesis and explicit state-space solution for a decentralized two-player linear-quadratic regulator. In Proc. IEEE Conference on Decision and Control, pages 132–137, 2010.

[7] J. Swigart and S. Lall. Optimal controller synthesis for a decentralized two-player system with partial output feedback. In Proc. American Control Conference, pages 317–323, 2011.

[8] K. Zhou, J. C. Doyle, and K. Glover. Robust and Optimal Control. Prentice Hall, 1995.