A Numerical Study of Witsenhausen's Counterexample by
Jordan Romvary B.S. in Electrical and Computer Engineering, Rutgers University, 2012 Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering and Computer Science at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2015
© Massachusetts Institute of Technology 2015. All rights reserved.
Author: Signature redacted
Department of Electrical Engineering and Computer Science
May 18, 2015

Certified by: Signature redacted
Pablo Parrilo
Professor of Electrical Engineering
Thesis Supervisor

Accepted by: Signature redacted
Leslie A. Kolodziejski
Chair, Department Committee on Graduate Studies
A Numerical Study of Witsenhausen's Counterexample by Jordan Romvary Submitted to the Department of Electrical Engineering and Computer Science on May 15, 2015, in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering and Computer Science
Abstract

In this thesis, we consider Witsenhausen's Counterexample, a two-stage control system in decentralized stochastic control. In particular, we investigate a specific homogeneous integral equation that arises from the necessary first-order condition for optimality of the Stage I controller in the system. Using finite element (FE) analysis, we develop a numerical framework to study this integral equation and understand the structure of optimal controllers. We then solve the integral equation as a mathematical optimization and use our FE model to numerically compute a nonlinear controller that satisfies the necessary condition approximately at a set of quadrature points.

Thesis Supervisor: Pablo Parrilo
Title: Professor of Electrical Engineering
Acknowledgments

I want to start off by thanking those who have helped me pursue this research as well as those who have helped me in my adjustment to life as an MIT graduate student. To my advisor Pablo Parrilo, I want to thank you for challenging me to think critically and for your patience through my countless research dead ends and MATLAB® code malfunctions.
To Alan Oppenheim and Yury
Polyanskiy, thank you for your support during my first year and for the countless conversations we had while I was searching for a research group to work in. To Terry Orlando, thank you for making time to meet with me, for listening to my concerns, and for helping me navigate my first two years
at MIT. I would also like to thank those in my life who have contributed the most to my academic success as well as my development as a young man. To my parents, I want to thank you for sacrificing having fancy vacations and new cars so that my siblings and I could attend the best schools possible and obtain a thorough and diverse education. To my siblings, Christian, Jonathan, and Victoria, I want to thank you for being my first and closest friends and for challenging me to reach the highest levels as well as for supporting me when I struggled to do so. In addition, I want to thank my girlfriend Aarthy for being there for me the past two years and for giving me the strength to persevere when I questioned my place in graduate school. And to those other countless individuals in the Ashdown Community and elsewhere who have contributed to my success and helped me reach this level at MIT, I thank you wholeheartedly. Finally, I want to acknowledge my Jesuit education at Saint Joseph's Preparatory School for driving me to always question and investigate anything and everything. In particular, I want to acknowledge the teachers and administrators who taught me to be a man for and with others and to always keep in mind that all I do should be for the betterment of society and the well-being of others. Ad maiorem Dei gloriam.
Contents

1 Introduction
  1.1 Team Decision Theory
    1.1.1 Formal Definition
    1.1.2 Static and Dynamic LQG Teams
  1.2 Previous Work
  1.3 Summary of Contributions
  1.4 Organization of the Thesis

2 Background
  2.1 Mathematical Notation
  2.2 Calculus of Variations: Functional Derivatives
  2.3 Numerical Integration: Quadrature Rules and Gauss-Hermite Quadrature
    2.3.1 Gauss-Hermite Quadrature
  2.4 Optimal Transport Theory: The Monge-Kantorovich Problem
  2.5 Properties of Special Functionals
    2.5.1 MMSE Functional
    2.5.2 W2 Functional

3 Witsenhausen's Counterexample
  3.1 Classical Formulation
  3.2 Transport-Theoretic Formulation
    3.2.1 Main Theorems from TTF
  3.3 The "Counterexample" Aspect
    3.3.1 Optimal Linear Controls
    3.3.2 1-Bit Quantization Controls
    3.3.3 Refuting the Conjecture
  3.4 Proofs for Chapter 3
    3.4.1 Proof of Lemma 3.3.1
    3.4.2 Proof of Lemma 3.3.3

4 Finite Element Model for Witsenhausen's Counterexample
  4.1 Necessary Condition for WC
    4.1.1 Derivation
    4.1.2 Discussion
  4.2 Finite Element Model
    4.2.1 Model Parameters
    4.2.2 Formulas for Computing WC Formulas & Equations
    4.2.3 Justification of Rational Basis Functions
    4.2.4 Discussion of "Edge Effects" in MMSE Calculation
  4.3 Numerical Experiments using Finite Element Model
    4.3.1 Accuracy of the FE Model
    4.3.2 Optimizational Framework using FE Model
    4.3.3 Analysis of Results
  4.4 Additional Plots for Chapter 4
  4.5 Derivation of FE Model Approximations for Chapter 4

5 Conclusion
  5.1 Contributions
  5.2 Future Work

A MATLAB® Code
  A.1 Function Arguments
  A.2 Function Specifications
List of Figures

3-1 Classical Formulation of Witsenhausen's Counterexample. An initial random variable X_0 is passed through the Stage I controller C_1 to get U_1. The sum X_0 + U_1 is then passed, with additive uncertainty given by the random variable V, to the Stage II controller C_2 to get U_2. We then take the difference between U_2 and the output of Stage I to get X_2 = (X_0 + U_1) - U_2. The objective of the control system is to minimize the quadratic cost k²U_1² + X_2² for some scalar k.

3-2 Absolute Magnitude (|G[λ]|) of First-Order Condition for the Linear Case. We set k = 0.2 and σ = 5.

3-3 Solutions to (3.21) for k²σ² = 1. Each colored value represents a distinct solution to (3.21) for any designated (k, σ) pair, where we iterate through all such pairs by changing k on the abscissa axis.

3-4 Absolute Magnitude of (3.28) for the 1-Bit Quantization Case. We set k = 0.2 and σ = 5 and use Q = 50 quadrature weights.

4-1 Third-Order Sub-Basis Rational Basis Functions with Δ = 1/2. These two sub-basis functions in turn make up the rational basis functions, which we use to computationally approximate the Stage I controller. Both sub-basis functions are zero outside the interval [0, 1) and, within this interval, are either strictly decreasing or increasing.

4-2 Example of the jth Third-Order Basis Function with Δ = 1/2, a_{j-1} = 0, and a_{j+1} = 1. We note that this jth third-order basis function is zero outside the interval [a_{j-1}, a_{j+1}) and, within this interval, is increasing on [a_{j-1}, a_j] and decreasing on [a_j, a_{j+1}).

4-3 Relative Errors for Approximation of Linear Controller with λ = 0.1. We denote functions approximated using our FE model with a hat, ^.

4-4 Relative Errors for Approximation of 1-Bit Controller with a = 1. We denote functions approximated using our FE model with a hat, ^.

4-5 Absolute Size (||G[f̂]({x_i}_{i=0}^Q)||_2) of First-Order Condition for the Linear Case using FE Model. We set k = 0.2 and σ = 5 and use the values for our FE model from Table 4.2. We see that the overall behavior of ||G[f̂]({x_i}_{i=0}^Q)||_2 follows that of the theoretical |G[λ]| from Figure 3-2, allowing us to verify the accuracy and suitability of the FE model.

4-6 Example of 5-Level Sinusoidal Quantizer with weight values [-10, -4, 0, 3, 15].

4-7 Optimal 5-Level Sinusoidal Quantizer. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 5, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4-8 Necessary Functional Equation (||G[f*]||) for Optimal 5-Level Sinusoidal Quantizer Weight Values. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 5, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4-9 Relative Errors for Approximation of Linear Controller with λ = 0.5. We denote functions approximated using our FE model with a hat, ^.

4-10 Relative Errors for Approximation of Linear Controller with λ = 1. We denote functions approximated using our FE model with a hat, ^.

4-11 Relative Errors for Approximation of Linear Controller with λ = 5. We denote functions approximated using our FE model with a hat, ^.

4-12 Relative Errors for Approximation of 1-Bit Controller with a = 2.8. We denote functions approximated using our FE model with a hat, ^.

4-13 Relative Errors for Approximation of 1-Bit Controller with a = σ = 5. We denote functions approximated using our FE model with a hat, ^.

4-14 Relative Errors for Approximation of 1-Bit Controller with a = σ² = 25. We denote functions approximated using our FE model with a hat, ^.

4-15 Optimal 3-Level Sinusoidal Quantizer. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 3, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4-16 Necessary Functional Equation (||G[f*]||) for Optimal 3-Level Sinusoidal Quantizer Weight Values. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 3, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4-17 Optimal 4-Level Sinusoidal Quantizer. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 4, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4-18 Necessary Functional Equation (||G[f*]||) for Optimal 4-Level Sinusoidal Quantizer Weight Values. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 4, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4-19 Optimal 6-Level Sinusoidal Quantizer. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 6, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4-20 Necessary Functional Equation (||G[f*]||) for Optimal 6-Level Sinusoidal Quantizer Weight Values. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 6, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.
List of Tables

4.1 FE Model Parameter Descriptions. These variables represent the principal components of our FE model, which we use to approximate the Stage I controller in the central problem of this thesis, Witsenhausen's Counterexample.

4.2 FE Model Parameter Values for Model Verification. These values are used to evaluate the accuracy of our FE model approximation for the Stage I controller in the previously worked-out cases of linear controllers and 1-bit quantization controllers.

4.3 FE Model Parameter Values used to find Optimal 5-level Sinusoidal Quantizer. These parameters were used to fully define the optimization problem (4.17).

4.4 Optimal 5-level Sinusoidal Quantizer Weight Values. These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3, R = 5, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4.5 Costs and Necessary Functional Equation Norms (||G[f*]||). These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3 for n = R = 3, n = R = 4, n = R = 5, and n = R = 6 and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4.6 Optimal 3-level Sinusoidal Quantizer Weight Values. These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3, R = 3, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4.7 Optimal 4-level Sinusoidal Quantizer Weight Values. These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3, R = 4, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4.8 Optimal 6-level Sinusoidal Quantizer Weight Values. These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3, R = 6, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4.9 Gauss-Hermite Quadrature Nodes ({x_j}_{j=1}^R) and Weights ({w_j}_{j=1}^R). These values were determined computationally for the case of (k = 0.2, σ = 5).
Chapter 1
Introduction

We begin this thesis by introducing the central problem: Witsenhausen's Counterexample (WC). We shall build up to WC in the context of team decision theory, a variant of the more general decentralized stochastic control. We then discuss previous work on WC as well as our contributions through this thesis, and we end by detailing the organization of the rest of this thesis.
1.1
Team Decision Theory
Team decision theory refers to the stochastic control situation in which each decision maker (DM) has access to different sets of information and acts once. Following closely the approach and illustrative examples of [1], we shall make clear some of the basic concepts regarding the interplay between information structures and notions such as signaling, estimation, and error reduction. We shall detail basic results from the theory, including those dealing with the static linear quadratic Gaussian (LQG) case.¹ We then discuss WC using the terminology and methods introduced in this section and mention how it shows that dynamic LQG teams are fundamentally different from their static counterparts.
1.1.1
Formal Definition
To begin, we define the team problem to consist of five main facets:

¹By static we mean that the underlying information structure does not imply a necessary ordering of the control actions.
a) a q-length random vector ξ = [ξ_1, ..., ξ_q] ∈ Ξ representing all uncertainties in the system, where we let the probability distribution of ξ be denoted as P(ξ);

b) a control action vector u = [u_1, ..., u_K] ∈ U, where u_i represents the action of decision maker i (DM_i), for i = 1, ..., K;

c) an observation vector z = [z_1, ..., z_K] ∈ Z, where

    z_i = η_i[u_{-i}, ξ]    (1.1)

represents the observation available to DM_i, for i = 1, ..., K. We note that {η_i | i = 1, ..., K} is also known as the information structure of the team problem. We further note that the observation of DM_i may depend on the actions of the other agents;²

d) a set of control laws {γ_i : Z_i → U_i | i = 1, ..., K}, where Z_i and U_i refer to the sets of admissible observations and admissible actions, respectively, for DM_i, and where each control action of a particular DM_i is selected according to u_i = γ_i[z_i]. We let Γ_i denote the set of acceptable control laws γ_i for DM_i. We also let γ = [γ_1, ..., γ_K] refer to the strategy of the system of all DMs, and Γ = Γ_1 × ... × Γ_K denote the set of all admissible strategies;

e) a cost function L : U × Ξ → R, where

    L(u, ξ) = L(u_1 = γ_1[η_1(u_{-1}, ξ)], ..., u_K = γ_K[η_K(u_{-K}, ξ)], ξ_1, ..., ξ_q).³    (1.2)

We note that the expectation of L w.r.t. P(ξ) is well-defined as we have fully specified the control laws.

Next, we formally define the team decision problem in its most exact, so-called strategic form. If we introduce the functional J(γ) = E[L(u = γ(η(u, ξ)), ξ)] for some control law vector, or strategy, γ ∈ Γ, then the strategic form of the decision problem is

    min_{γ ∈ Γ} J(γ) = min_{γ ∈ Γ} E[L(u = γ(η(u, ξ)), ξ)].    (1.3)

²We refer to the situations in which (1.1) does not depend on the actions of the other agents, or in which the dependence of the observation on the actions of the other agents is known, as static team decision problems. On the other hand, the situations in which (1.1) does depend on the actions of the other agents are referred to as dynamic team decision problems, and such dynamic teams necessarily require some type of partial ordering of the actions of the agents on which the observation of agent i depends (so-called causality constraints).
³The notation u_{-j} refers to a vector consisting of all the entries of u with the exception of the entry at index j.
We note that this optimization, as opposed to being a parameter optimization, is a functional optimization. Such optimizations are inherently more difficult, so attempting to solve for an optimal strategy via this formulation could be very computationally intensive. Instead, consider the decision maker's point of view. Specifically, for DM_i, let γ_{-i} represent the control laws of all the other DMs except for i. We will treat these control laws as fixed. Then, from DM_i's point of view, the decision problem becomes

    min_{γ_i} J(γ_i, γ_{-i}) = min_{γ_i} E[L({u_i = γ_i(η_i(u, ξ)), γ_{-i}}, ξ)].    (1.4)

We observe that, once we define our information structure η_i as an appropriate Borel-measurable function, our measurement z_i becomes a well-defined random variable. Therefore, we can talk about the conditional distribution of ξ given z_i and, in particular, the conditional expectation E_ξ[·] = E_{z_i}[E_{ξ|z_i}[·]]. Thus, we can further reduce (1.4) to its extensive form:

    min_{γ_i} J(γ_i, γ_{-i}) = min_{γ_i} E_{z_i}[ E_{ξ|z_i}[ L({u_i = γ_i(η_i(u, ξ)), γ_{-i}}, ξ) ] ]
                             = E_{z_i}[ min_{u_i} E_{ξ|z_i}[ L({u_i, γ_{-i}}, ξ) ] ].    (1.5)

Indeed, we can reduce it to the following person-by-person, or semi-strategic, form:

    min_{u_i ∈ U_i} J_i(u_i, z_i; γ_{-i}) = min_{u_i ∈ U_i} E_{ξ|z_i}[ L({u_i, γ_{-i}}, ξ) ], ∀i,    (1.6)

whose solution can be obtained iteratively by "guessing" the right strategy γ and then checking whether the chosen strategy is fixed under the optimization specified in (1.6) for all DMs. We note, however, that (1.3) ⇒ (1.6), but not necessarily vice versa.
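The person-by-person "guess and check" iteration is easy to sketch numerically. The example below is a hypothetical illustration (not the system studied in this thesis): two DMs observe a Gaussian state through independent unit-variance noise, are restricted to linear laws u_i = a_i z_i, and alternately minimize an expected quadratic cost over their own gain while the other's is held fixed, until the strategy is a fixed point of the update:

```python
import numpy as np

# Toy static quadratic team (illustrative assumption, not from the thesis):
#   x ~ N(0, sigma^2), z_i = x + v_i with v_i ~ N(0, 1), u_i = a_i * z_i,
#   J(a1, a2) = E[(x + u1 + u2)^2 + 0.5*u1^2 + 0.5*u2^2].
# With E[z_i^2] = sigma^2 + 1 and E[x z_i] = E[z1 z2] = sigma^2,
# J is an explicit quadratic in (a1, a2).

sigma2 = 25.0
Ezz = sigma2 + 1.0   # E[z_i^2]
Exz = sigma2         # E[x z_i] and E[z1 z2]

def best_response(a_other):
    # Set dJ/da_i = 3*a_i*Ezz + 2*Exz*(1 + a_other) = 0 for the acting DM.
    return -(Exz + a_other * Exz) / (1.5 * Ezz)

# Person-by-person iteration from an arbitrary initial guess.
a1, a2 = 0.0, 0.0
for _ in range(200):
    a1 = best_response(a2)
    a2 = best_response(a1)

# At the fixed point, neither DM can improve by a unilateral change.
assert abs(a1 - best_response(a2)) < 1e-12
print(a1, a2)
```

By symmetry the iteration converges to a1 = a2 = -25/64, and because the cost is a positive definite quadratic in (a1, a2), this person-by-person fixed point is also the globally optimal pair of linear gains.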
1.1.2
Static and Dynamic LQG Teams
Next, we shall discuss some conceptual differences between what is possible in static and dynamic teams.
For the sake of brevity and ease of exposition, we will assume that all the uncertainties
in the problem are Gaussian random variables of arbitrary correlation. We shall further assume that the cost functions are quadratic and that all observation functions (i.e., q) are linear. These stipulations define the linear quadratic Gaussian (LQG) team problem.
First, let us consider the very simple static LQG team as specified below, with X ~ N(0, σ²) being the Gaussian initial state and V_1, V_2 ~ N(0, 1) being the noise measurements:⁴

    L   = (X + U_1 + U_2)² + ½U_1² + ½U_2²
    Z_1 = X + V_1                                (1.7)
    Z_2 = X + V_2
An interpretation of this problem is as follows: Both DM_1 and DM_2 observe the initial state with some additive white Gaussian noise. They then act to bring the state, which was originally X, to some other state X + U_1 + U_2. Their goal is to have their actions cancel out X, while also minimizing the energy of their individual control actions. The order in which they act does not matter, and we can, without loss of generality, assume a lexicographical ordering of their actions, i.e., DM_1 acts before DM_2. In this problem, there is no possibility for cooperation between the two DMs beyond some initial coordination and planning before they receive their respective measurements. It turns out that the optimal solution to this problem is actually given by (1.6), as a result of the following theorem:
Theorem 1.1.1 (Proposition 1, [1]) Let Q, S, and H be matrices such that Q > 0. If L = ½u^T Q u + u^T S ξ and z = Hξ, then the unique optimal control laws are linear and can be solved from (1.6).

At this point, the problem of finding an optimal set of control laws is no more than a simple optimization problem, and the action ordering does not matter. What happens if we consider a situation in which the information available to DM_1 and DM_2 is a little uneven, for example, if we allow DM_2 to know what DM_1 knows and also have a separate measurement on the state of the system after DM_1 acts? Would not such a problem necessitate DM_1 acting before DM_2? To explore the answers to these questions, consider the LQG team as specified below:

    L   = (X + U_1 + U_2)² + ½U_1² + ½U_2²
    Z_1 = Y_1 = X + V_1                          (1.8)
    Z_2 = [Y_1, Y_2], where Y_2 = X + U_1 + V_2

⁴We note here that, because we assumed that both γ_1 and γ_2 are Borel-measurable mappings, their outputs, U_1 and U_2, respectively, are themselves random variables. This is important in the case of dynamic teams because, if η_2 depends on the control action U_1, then U_2 is not well-defined until U_1 is actualized. Hence, there is a necessary partial ordering of their actions, i.e., DM_1 must act before DM_2.
At first glance, this problem appears to be fundamentally different from that of (1.7). It seems to necessitate DM_1 acting first, advancing the state of the system to X_1 = X + U_1, and then DM_2 acting upon both DM_1's initial measurement Z_1 and a new noisy measurement of the system, Y_2. However, if DM_2 knows the control law γ_1 of DM_1, then we can define an equivalent set of measurements for DM_2 as

    Z̃_2 = [Y_1, Y_2 - γ_1(Y_1)] = [X + V_1, X + V_2],    (1.9)

which has the same form as the problem specification in (1.7). In fact, we can also get Z_2 from Z̃_2 with knowledge of γ_1. Hence, from Theorem 1.1.1, we know that the solution of the optimal control laws to (1.8) is linear. We refer to the information structure displayed in (1.8) as that of perfect recall, that is, an information structure in which all agents who act after other agents have access to all the information those previous agents had when they made their decisions. Indeed, if we further treat each successive agent's decision as coming from the same agent (i.e., a one-person team), then we get the following theorem:

Theorem 1.1.2 (Proposition 2, [1]) In one-person LQG teams with perfect recall, the optimal control laws are linear.

Indeed, the information structure of perfect recall for one-person LQG games (which we can interpret our last example to be) is very similar to a more general information structure known as partially nested. Such an information structure can be applied to situations in which there is a time structure involved and a partial ordering of the actions of the agents, i.e., there is some element of sequential control involved. Indeed, we can represent the information available to DM_i when it is its turn to act as

    Z_i = H_i ξ + D_i U,    (1.10)

where H_i and D_i are linear operators (i.e., matrices) that satisfy causality constraints, so that the action of any agent that acts after DM_i is not factored into the observation information Z_i. Such an information structure necessitates a partial ordering of the agent actions because, if η_j depends on U_i, then U_j is not a well-defined random variable until U_i is actualized, meaning that DM_i must act before DM_j (unless DM_j has some side information that can remove the dependence of η_j on the action U_i, akin to (1.9)).
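The measurement transformation in (1.9) can be checked with a few lines of simulation. The sketch below assumes an arbitrary known Stage I law (the linear choice γ_1(y) = 0.5y is purely illustrative) and confirms that DM_2's transformed measurement Y_2 - γ_1(Y_1) equals X + V_2 exactly, with the dependence on U_1 removed:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 5.0
n = 100_000

# Sample the primitive random variables of (1.8).
X = rng.normal(0.0, sigma, n)
V1 = rng.normal(0.0, 1.0, n)
V2 = rng.normal(0.0, 1.0, n)

gamma1 = lambda y: 0.5 * y   # assumed known Stage I law (illustrative choice)

Y1 = X + V1                  # DM_1's measurement
U1 = gamma1(Y1)              # DM_1's action
Y2 = X + U1 + V2             # DM_2's raw measurement depends on U1 ...

Z2_tilde = Y2 - gamma1(Y1)   # ... but the transformed one does not:
assert np.allclose(Z2_tilde, X + V2)
```

The same subtraction works for any measurable γ_1, which is exactly why knowledge of the earlier control law reduces (1.8) to the static form (1.7).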
As such, we have the following theorem:
Theorem 1.1.3 (Proposition 3, [1]) In an LQG team with a partially nested information structure, the optimal control laws are linear.
Indeed, due to the assumed superiority of linear control laws, a natural conjecture would be the following:
Conjecture 1.1.4 In any LQG team, the optimal control laws are linear.
Conjecture 1.1.4 is reasonable given Theorems 1.1.2 and 1.1.3, but consider the following system for some k ∈ R_+:

    L   = (X + U_1 + U_2)² + k²U_1²
    Z_1 = Y_1 = X                                (1.11)
    Z_2 = Y_2 = X + U_1 + V_2
At first glance, this problem stipulation looks very similar to that of (1.8). However, the underlying information structure is NOT the partially nested information structure that was present in that problem. Indeed, while DM_1 has a perfect measurement of the state X, the only information that DM_2 has about the underlying state X is wholly affected by the choice of control action of DM_1. Therefore, there is no equivalence to (1.7) as there was in the case of (1.8), because knowledge of the control law γ_1 does not help in the same way.

We can interpret this problem as follows: DM_1 observes the state of the system X and then performs the action U_1, advancing the state of the system to X_1 = X + U_1. DM_2 then receives a noisy measurement of this state, Z_2 = X + U_1 + V_2, and chooses a control action accordingly. The goal for DM_1 is to try and cancel out X using as little energy as possible in its control action U_1, whereas DM_2 desires to cancel out X + U_1. Now, if DM_1 cancels out X completely (i.e., X + U_1 = 0) and DM_2 accordingly follows U_2 = 0, then L = k²X², whose expectation w.r.t. X remains high. If, on the other hand, DM_1 uses no energy (i.e., U_1 = 0), then DM_2 must choose a U_2 to negate X based on an observation with additive white Gaussian noise (AWGN), which is known as the minimum mean squared error (MMSE) problem in statistical inference and has a relatively high expected cost of L w.r.t. X and V_2. As such, there is an inherent trade-off between the two purposes of DM_1: that of reducing the error directly through its efforts and that of signaling DM_2 by reducing the
uncertainty in the information received by it.⁵

It turns out that the control laws for the above problem were shown by Witsenhausen in [3] to be nonlinear, refuting Conjecture 1.1.4! This system, which we expand upon more in Chapter 3 and which serves as the main focus of this thesis, is known as Witsenhausen's Counterexample (WC). Indeed, one can see the advantage of nonlinearity by considering the control laws specified by

    U_1 = σ sgn(X) - X
    U_2 = σ tanh(σ Z_2),                          (1.13)

where U_2 is the MMSE estimator for X + U_1 under AWGN. In fact, these control laws, on average, outperform any linear control law in a certain regime of k and σ! This is because these control laws, as opposed to linear control laws, attempt to balance the error reduction and signaling aspects of decentralized control, which, along with estimation, are the three pillars of the "tridimensional nature" of decentralized stochastic control. Further information concerning team decision theory and its widespread applications in information theory, economics, and game theory can be found in [1]. The rest of this thesis will focus on WC.
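A quick Monte Carlo experiment makes the advantage of (1.13) concrete. The sketch below is only an illustration (the precise formulation and the optimal linear law appear in Chapter 3): it evaluates the two-stage cost k²U_1² + X_2², with X_2 = (X + U_1) - U_2 as in Figure 3-1, for the sgn/tanh laws of (1.13) against a representative linear strategy (no Stage I effort plus the linear MMSE estimate at Stage II) at the canonical parameters k = 0.2 and σ = 5:

```python
import numpy as np

rng = np.random.default_rng(1)
k, sigma, n = 0.2, 5.0, 500_000

X = rng.normal(0.0, sigma, n)
V = rng.normal(0.0, 1.0, n)

def cost(U1, U2):
    # Two-stage cost with X2 = (X + U1) - U2, as in Figure 3-1.
    return np.mean(k**2 * U1**2 + (X + U1 - U2)**2)

# Nonlinear laws of (1.13): quantize the state to +/- sigma, then MMSE-decode.
U1_q = sigma * np.sign(X) - X
Z2_q = (X + U1_q) + V
cost_quant = cost(U1_q, sigma * np.tanh(sigma * Z2_q))

# A representative linear strategy: U1 = 0, U2 = linear MMSE estimate of X.
U1_l = np.zeros(n)
Z2_l = X + V
cost_lin = cost(U1_l, (sigma**2 / (sigma**2 + 1.0)) * Z2_l)

assert cost_quant < cost_lin   # nonlinear signaling wins at k = 0.2, sigma = 5
print(cost_quant, cost_lin)
```

Here the quantizing first stage spends roughly 0.4 in expected cost moving the state to ±σ, after which Stage II decodes the sign almost perfectly, while the linear benchmark is stuck near the scalar MMSE cost σ²/(σ² + 1) ≈ 0.96.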
1.2
Previous Work
Since its initial publication in 1968, the simple two-stage control system in (1.11) that Witsenhausen analyzed to disprove Conjecture 1.1.4 has attracted much interest in the control theory, information theory, and computer science research communities. Most of the research on WC has focused on two main thrusts. The first involves finding optimal controllers through the use of step functions (which we refer to as quantization schemes) for the canonical case of π(k², σ²) with k = 0.2 and σ = 5 (see Chapter 3). Beginning with Mitter and Sahai's demonstration that quantization schemes will achieve arbitrarily low costs in the regime of very small k (with σ²k² = 1) [4], a lot of work has been conducted on efficiently searching the feasible set of (2n + 1)-bit quantization schemes, most successfully through hierarchical search methods [5] and potential games [6]. However, basic computational difficulties of such schemes have been discussed in [7] and [8], the latter of which showed that a discrete version of WC is NP-complete.

The second thrust uses information theory to develop upper and lower bounds on the cost for optimal controllers. For example, considering a finite-dimensional analog to WC, Grover and Sahai [9, 10] were able to develop control strategies that approximate optimal controllers within a bounded interval. Moreover, Choudhuri and Mitra considered implicit discrete memoryless channels in the case of an asymptotic version of WC in [11] to great effect. In addition, Wu and Verdú recently used optimal transport theory to reformulate WC from a functional optimization to a probability measure optimization [12]. Using this formulation, Wu and Verdú were able to discern more analytical properties of optimal controllers and show that the necessary condition first introduced by Witsenhausen in his original paper had to hold everywhere. We touch more on this in Chapter 3.

⁵A problem formulation from [2] that makes this concept of signaling more apparent is as follows:

    L   = (X + U_1 + U_2)² + ½U_1²
    Z_1 = Y_1 = X                                (1.12)
    Z_2 = Y_2 = U_1

If DM_1 and DM_2 pursue the control laws U_1 = γ_1(Y_1) = εX and U_2 = γ_2(Y_2) = -ε⁻¹U_1, respectively, then they can, on average, almost cancel out the expected cost J(γ_1, γ_2) as ε → 0. However, if DM_1 wished to communicate some information to DM_2, then it could do so at a little additional cost. That is, instead of letting ε → 0, DM_1 could choose some ε_0 > 0 and transmit some information to DM_2 about the size of X via its control law γ_1. This is known as the "transparency of information," which is further explained in [2].
1.3 Summary of Contributions
Our main contribution in this thesis is the design and implementation of a finite element model to be used in the study of the necessary condition of the WC system. As we shall discuss later, the necessary condition presents as a homogeneous integral equation in the Stage I controller, f. In particular, we investigate the case in which all the native random variables are Gaussian and the system specifications, k and σ, satisfy k²σ² = 1. We verify the accuracy of our computational model using analytically derived formulas for the simple linear and 1-bit quantization controllers. We then develop a mathematical optimization framework to solve the necessary condition approximately at certain points of interest. These points of interest turn out to be Gauss-Hermite quadrature abscissas. Using a simple family of five-parameter controllers, we then demonstrate that we can find controllers within this family that approximately satisfy the necessary condition.
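The Gauss-Hermite machinery referenced here is simple to reproduce. Below is a minimal sketch in Python/NumPy (for illustration only; the thesis implementation is in MATLAB, and `gauss_hermite_expectation` is a hypothetical helper name) of how an expectation under a Gaussian law is approximated at quadrature abscissas; with σ = 5, the 3-point scaled abscissas are exactly the ±8.6603 and 0 values reported in Table 4.9.

```python
import numpy as np

def gauss_hermite_expectation(h, sigma, n):
    """Approximate E[h(X)] for X ~ N(0, sigma^2) with n-point Gauss-Hermite quadrature.

    hermgauss targets the weight function exp(-t^2); the change of variables
    x = sqrt(2) * sigma * t maps it to the N(0, sigma^2) density.
    """
    t, u = np.polynomial.hermite.hermgauss(n)
    x = np.sqrt(2.0) * sigma * t     # scaled abscissas
    w = u / np.sqrt(np.pi)           # probability weights (sum to 1)
    return float(np.sum(w * h(x)))

# E[X^2] = sigma^2 = 25 for the canonical WC case; this is exact here since
# n-point Gauss-Hermite quadrature integrates polynomials up to degree 2n - 1.
print(gauss_hermite_expectation(lambda x: x**2, 5.0, 3))  # ≈ 25.0
```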
1.4 Organization of the Thesis
The thesis will proceed as follows: Chapter 2 details some of the general mathematical preliminaries and background that will be required for a sufficient understanding of the content of this thesis.
Chapter 3 introduces WC and discusses the existence of optimal solutions and the non-optimality of linear controllers. Chapter 4 includes a full derivation of the first-order condition for optimality and also introduces our finite element model and the results of numerical experiments obtained using the model. Ultimately, Chapter 5 summarizes our main contributions and suggests future avenues for research that build upon the results of this thesis.
Chapter 2

Background

In this chapter, we will detail and define some of the mathematical terminology and notation we will be using. We will also discuss some of the underlying concepts that serve as the basis for our finite element model and the derivation of the necessary variational equation for WC (Chapter 4).
2.1 Mathematical Notation
To begin, let us discuss the notation and mathematical concepts we will be using in this thesis. We denote the set of real numbers by R, and use Rᵈ to refer to the d-dimensional vector space defined in the usual way with R as the underlying scalar field. Z is the set of integers, and N is the set of non-negative integers (including 0). We denote vectors, those belonging to Rᵈ for some nonzero d ∈ N as well as those belonging to the infinitely long extension of Rᵈ, by non-italicized lower-case letters with "hats" (e.g., â) or by an explicitly defined Greek letter with a "hat" (e.g., ω̂). Constants are represented by italicized letters and are always assumed to be members of the real field, R, unless otherwise stated. In addition, limits are assumed to be defined in the usual sense (so-called strong limits), and we denote elements of a set or a vector by italicized lower-case letters with subscripts (e.g., wᵢ or aⱼ). We denote sets themselves by an explicit representation like {aᵢ}ᵢ₌₀^∞, or simply as {aᵢ} when the bounds of the set are understood from the context. Underlying functions are assumed to be functions from R into R and are always denoted by lower-case letters, unless otherwise stated. Sets of functions or vectors are denoted by non-italicized capital letters and will be defined as they are introduced. Also, derivatives of functions are written with ∂ₓ = ∂/∂x representing a partial derivative and d/dx a "regular" derivative (in the sense of single-variable calculus). We denote derivatives of functions explicitly using these aforementioned representations, though we at times use primes (like f′) when the underlying variable we are differentiating with respect to is understood. Furthermore, we represent functionals between function spaces using the usual functional analysis notation.
For example, if G : A → B is a mapping between function spaces A and B, and if f ↦ g under G, then we write g = G[f]. Also, inner products are defined similarly for function spaces as well as vector spaces. For example, the inner product between two vectors, â and b̂, is defined as ⟨â, b̂⟩ = Σᵢ aᵢbᵢ, and the inner product between two functions, f and g, is defined as ⟨f, g⟩ = ∫ f(x)g(x) dx, unless otherwise indicated.¹ Also, we assume an underlying measurable space (R, B(R)), where R is our observation set and B(R) is the Borel algebra comprising all real Borel sets. From this measurable space, we define the probability laws for all of the independent random variables in our control system. Random variables themselves are denoted by italicized capital letters (e.g., X). In addition, we denote by F(B(R)) the set of all Borel-measurable functions on this space, i.e., F(B(R)) is the set of all real-valued functions f : R → R such that, if E ∈ B(R), then f⁻¹(E) ∈ B(R).
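Both inner products above can be evaluated numerically in the obvious way. A small sketch in Python/NumPy (for illustration only; the variable names are ours):

```python
import numpy as np

# Vector inner product: <a, b> = sum_i a_i * b_i.
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
vec_ip = float(np.dot(a, b))  # 32.0

# Function inner product: <f, g> = integral of f(x) g(x) dx,
# approximated on [0, 1] by the trapezoidal rule.
x = np.linspace(0.0, 1.0, 10001)
fg = x * x**2                                                  # f(x) = x, g(x) = x^2
fun_ip = float(np.sum((fg[:-1] + fg[1:]) * np.diff(x)) / 2.0)  # ≈ 1/4
```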
Finally, for arguments that utilize the term weak, we mean this in the functional analysis sense as follows: Consider the underlying metric space (R, |·|) and the corresponding topology that it induces, (R, τ_{|·|}). We note that, because this metric space is completely separable (i.e., contains a countable dense subset), the Lévy-Prokhorov metric metrizes the notion of weak convergence of measures (or, more simply in the case of probability measures, convergence in distribution). Hence, we say that a sequence of probability measures {Pₙ} converges weakly to a probability measure P (written Pₙ ⇒ P) iff Pₙ converges to P in distribution. Thus, for example, when we say that a particular function φ : P(B(R)) → R is lower semi-continuous, we mean that φ(P) ≤ lim infₙ→∞ φ(Pₙ) whenever Pₙ ⇒ P.
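Convergence in distribution can be observed directly by comparing CDFs. A small sketch (Python, standard library only; `cdf_gap` is our illustrative name) takes Pₙ = N(0, 1 + 1/n) and P = N(0, 1), and shows the sup-distance between the CDFs shrinking as n grows:

```python
import math

def normal_cdf(x, sigma=1.0):
    """CDF of N(0, sigma^2)."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def cdf_gap(n, grid):
    """sup over the grid of |F_n(x) - F(x)|, with P_n = N(0, 1 + 1/n), P = N(0, 1)."""
    sigma_n = math.sqrt(1.0 + 1.0 / n)
    return max(abs(normal_cdf(x, sigma_n) - normal_cdf(x)) for x in grid)

grid = [i / 100.0 for i in range(-500, 501)]
gaps = [cdf_gap(n, grid) for n in (1, 10, 100, 1000)]
# gaps decreases toward 0, i.e., P_n converges to P in distribution.
```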
(4.20)

where conn(x) denotes the cosine-based warping of x used in (4.20), and

    B = min{ aΔ | aΔ > x for all x ∈ {xᵢ}ᵢ₌₁^R, a ∈ N },  with  Δ_B = 2B/(n − 1).

We can then solve (4.17) numerically for various values of n = R.¹⁷ In Figures 4-15, 4-16, 4-17, 4-18, 4-19, and 4-20, as well as in Tables 4.6, 4.7, and 4.8, we demonstrate optimal n-level sinusoidal quantizers for the cases of n = R = 3, n = R = 4, and n = R = 6.¹⁸ Table 4.5 shows their respective values of ||G[τ*]||₂ and J[τ*].

¹⁷We set n = R so as to allow for greater degrees of freedom in solving (4.17).

Optimal Costs & Norms

    n = R    ||G[τ*]||₂                J[τ*]
    3        1.771340660161439e-07     0.961825561638322
    4        1.116564623767737e-07     0.961581394710436
    5        1.474794586097925e-07     0.963530809260363
    6        1.645876831841833e-06     2.075764121916171

Table 4.5: Costs and Necessary Functional Equation Norms (||G[τ*]||). These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3 for n = R = 3, 4, 5, and 6, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

We notice that, just as in the case of n = R = 5, G[τ*] ≈ 0, while the overall costs J[τ*] are still above the optimal linear cost J[λ*] ≈ 0.96; as we discussed earlier, this is neither unexpected nor especially consequential given the nature of the randomness in WC. Also, it should be noted that the optimal functional solution obtained in the n = R = 6 case (as depicted in Figure 4-19) is not non-decreasing, a property that optimal controllers in WC must possess [3]. This lack of monotonicity is reflected in the overall cost of the 6-level sinusoidal quantizer being much greater than the benchmark linear cost: J[τ₆*] = 2.075764121916171 > 0.96 ≈ J[λ*].

4.3.3 Analysis of Results
From these results, we can state concretely a couple of positive conclusions. For one, we have demonstrated that our FE model can be used not only in a verification capacity but also as a tool to make meaningful progress on WC through optimization frameworks and feasible set definitions. For another, we have discovered an additional family of functions besides linear functions that can approximately satisfy the necessary functional equation at three, four, five, and six important quadrature points. In fact, we strongly believe that, if we increase the number of quadrature points we optimize over and/or the number of weights in our feasible set, we can obtain even better approximations for any number of quadrature points.

In addition, we can make some further observations about solutions to the variational equation using Figure 4-8. We note that our minimized solutions (with the exception of τ₆*) naturally tend towards odd symmetry, which agrees with our intuition since all of the independent random variables in π(k², σ²) are symmetric. Indeed, we also notice that our optimal controllers are generally increasing, a characteristic that agrees with Lemma 7 of Witsenhausen [3], which states that optimal controllers must be non-decreasing. Finally, for the specific case of n = R = 5, we note how τ₅,₂*, τ₅,₃*, and τ₅,₄* are all closely grouped around the origin. This makes sense because the origin is where most of the underlying probability of X₀ is concentrated; by sending more of this probability to this region, our controller makes the job of our MMSE estimator in Stage II easier (meaning less estimation error, which, in turn, means less overall cost). This latter behavior is also mirrored in the optimized controllers for n = R = 3 and n = R = 4. All of these observations agree with the rationale employed in previous attempts at solving WC by optimizing over (2n + 1)-bit quantization schemes. Thus, they lend great credence both to the veracity of our model and to its potential as a tool for investigating WC in further detail in other research contexts.

¹⁸Figures 4-15, 4-16, 4-17, 4-18, 4-19, and 4-20 and Tables 4.6, 4.7, and 4.8 are located in Section 4.4.
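The linear benchmark J[λ*] ≈ 0.96 referenced above can be checked in closed form: for a linear Stage I controller f(x) = λx with X₀ ~ N(0, σ²) and unit-variance observation noise, the Stage I cost is k²σ²(λ − 1)² and the Stage II cost is the Gaussian MMSE error λ²σ²/(1 + λ²σ²). A quick sketch in Python/NumPy (for illustration only; the thesis code is MATLAB):

```python
import numpy as np

k, sigma = 0.2, 5.0  # canonical WC case, k^2 * sigma^2 = 1

def linear_cost(lam):
    """Total cost of the linear controller f(x) = lam * x."""
    stage1 = k**2 * sigma**2 * (lam - 1.0)**2
    stage2 = lam**2 * sigma**2 / (1.0 + lam**2 * sigma**2)  # Gaussian MMSE error
    return stage1 + stage2

lams = np.linspace(0.5, 1.5, 100001)
costs = linear_cost(lams)
lam_star = float(lams[np.argmin(costs)])
print(lam_star, float(costs.min()))  # minimum cost ≈ 0.96
```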
4.4 Additional Plots for Chapter 4
Figure 4-9: Relative Errors for Approximation of Linear Controller with λ = 0.5: (a) relative error for f, ER[f, f̂](x); (b) relative error of N₀, ER[N₀, N̂₀](x); (c) relative error of g, ER[g, ĝ](x). We denote functions approximated using our FE model with a hat, ˆ.
Figure 4-10: Relative Errors for Approximation of Linear Controller with λ = 1: (a) relative error for f, ER[f, f̂](x); (b) relative error of N₀, ER[N₀, N̂₀](x); (c) relative error of g, ER[g, ĝ](x). We denote functions approximated using our FE model with a hat, ˆ.
Figure 4-11: Relative Errors for Approximation of Linear Controller with λ = 5: (a) relative error for f, ER[f, f̂](x); (b) relative error of N₀, ER[N₀, N̂₀](x); (c) relative error of g, ER[g, ĝ](x). We denote functions approximated using our FE model with a hat, ˆ.
Figure 4-12: Relative Errors for Approximation of 1-Bit Controller with a = 2.8: (a) relative error for f, ER[f, f̂](x); (b) relative error of N₀, ER[N₀, N̂₀](x); (c) relative error of g, ER[g, ĝ](x). We denote functions approximated using our FE model with a hat, ˆ.
Figure 4-13: Relative Errors for Approximation of 1-Bit Controller with a = 5: (a) relative error for f, ER[f, f̂](x); (b) relative error of N₀, ER[N₀, N̂₀](x); (c) relative error of g, ER[g, ĝ](x). We denote functions approximated using our FE model with a hat, ˆ.
Figure 4-14: Relative Errors for Approximation of 1-Bit Controller with a = σ² = 25: (a) relative error for f, ER[f, f̂](x); (b) relative error of N₀, ER[N₀, N̂₀](x); (c) relative error of g, ER[g, ĝ](x). We denote functions approximated using our FE model with a hat, ˆ.
Optimal Weights

    τ₃,₁*   -0.370552234771981
    τ₃,₂*    0.000071861774708
    τ₃,₃*    0.370689948937737

Table 4.6: Optimal 3-Level Sinusoidal Quantizer Weight Values. These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3, R = 3, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.
Figure 4-15: Optimal 3-Level Sinusoidal Quantizer. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 3, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.
Figure 4-16: Necessary Functional Equation (||G[τ₃*]||) for Optimal 3-Level Sinusoidal Quantizer Weight Values. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 3, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.
Optimal Weights

    τ₄,₁*   -0.498521259836172
    τ₄,₂*   -0.155696146588598
    τ₄,₃*    0.155941726138290
    τ₄,₄*    0.498769790186308

Table 4.7: Optimal 4-Level Sinusoidal Quantizer Weight Values. These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3, R = 4, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.
Figure 4-17: Optimal 4-Level Sinusoidal Quantizer. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 4, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.
Figure 4-18: Necessary Functional Equation (||G[τ₄*]||) for Optimal 4-Level Sinusoidal Quantizer Weight Values. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 4, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.
Optimal Weights

    τ₆,₁*   -4.67326888267048
    τ₆,₂*    6.617177471430276
    τ₆,₃*   -3.621888183288994
    τ₆,₄*    2.585931219917199
    τ₆,₅*   -2.669563529874990
    τ₆,₆*    6.475537965147624

Table 4.8: Optimal 6-Level Sinusoidal Quantizer Weight Values. These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3, R = 6, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.
Figure 4-19: Optimal 6-Level Sinusoidal Quantizer. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 6, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.
Figure 4-20: Necessary Functional Equation (||G[τ₆*]||) for Optimal 6-Level Sinusoidal Quantizer Weight Values. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 6, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.
Gauss-Hermite Quadrature Values

R = 3:
    nodes {xᵢ}:    -8.660254037844387, 0, 8.660254037844387
    weights {wᵢ}:   0.166666666666667, 0.666666666666666, 0.166666666666667

R = 4:
    nodes {xᵢ}:    -11.672071091694885, -3.709818921513630, 3.709818921513630, 11.672071091694885
    weights {wᵢ}:   0.045875854768068, 0.454124145231932, 0.454124145231932, 0.045875854768068

R = 5:
    nodes {xᵢ}:    -14.284850069364028, -6.778130899871329, 0, 6.778130899871329, 14.284850069364028
    weights {wᵢ}:   0.011257411327721, 0.222075922005613, 0.533333333333332, 0.222075922005613, 0.011257411327721

R = 6:
    nodes {xᵢ}:    -16.621287167760595, -9.445879388768555, -3.083532950962971, 3.083532950962971, 9.445879388768555, 16.621287167760595
    weights {wᵢ}:   0.002555784402056, 0.088615746041914, 0.408828469556030, 0.408828469556030, 0.088615746041914, 0.002555784402056

Table 4.9: Gauss-Hermite Quadrature Nodes ({xᵢ}ᵢ₌₁^R) and Weights ({wᵢ}ᵢ₌₁^R). These values were determined computationally for the case of (k = 0.2, σ = 5).
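The entries of Table 4.9 can be regenerated from standard Gauss-Hermite rules by rescaling the abscissas by √2·σ and normalizing the weights by √π. A sketch in Python/NumPy (for illustration only; `scaled_hermgauss` is our name):

```python
import numpy as np

def scaled_hermgauss(n, sigma):
    """Gauss-Hermite nodes/weights rescaled for expectations under N(0, sigma^2)."""
    t, u = np.polynomial.hermite.hermgauss(n)
    return np.sqrt(2.0) * sigma * t, u / np.sqrt(np.pi)

# Reproduce the R = 3 row for X0 ~ N(0, 25):
x3, w3 = scaled_hermgauss(3, 5.0)
print(x3)  # ≈ [-8.660254037844387, 0, 8.660254037844387]
print(w3)  # ≈ [1/6, 2/3, 1/6]
```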
4.5 Derivation of FE Model Approximations for Chapter 4

In this section, we detail our derivations of the FE model formulas and equations that follow from our approximated controller, f[ω̂].
Proof.

a)
\[
N_0[f[\hat{\omega}]](y) = \int_{-\infty}^{\infty} \phi\big(y - f[\hat{\omega}](x)\big)\,dF(x)
\;\approx\; \sum_{i=1}^{Q} w_i\,\phi\big(y - f[\hat{\omega}](x_i)\big)
\;=:\; \hat{N}_0[\hat{\omega}](y).
\]

b)
\[
N_1[f[\hat{\omega}]](y) = \int_{-\infty}^{\infty} f[\hat{\omega}](x)\,\phi\big(y - f[\hat{\omega}](x)\big)\,dF(x)
\;\approx\; \sum_{i=1}^{Q} w_i\,f[\hat{\omega}](x_i)\,\phi\big(y - f[\hat{\omega}](x_i)\big)
\;=:\; \hat{N}_1[\hat{\omega}](y).
\]

c)
\[
g[f[\hat{\omega}]](y) = \frac{N_1[f[\hat{\omega}]](y)}{N_0[f[\hat{\omega}]](y)}
= \frac{1}{N_0[f[\hat{\omega}]](y)}\int_{-\infty}^{\infty} f[\hat{\omega}](x)\,\phi\big(y - f[\hat{\omega}](x)\big)\,dF(x)
\;\approx\; \frac{\hat{N}_1[\hat{\omega}](y)}{\hat{N}_0[\hat{\omega}](y)}
\;=:\; \hat{g}[\hat{\omega}](y).
\]

d)
\[
\eta[f[\hat{\omega}]](y) = -y + g[f[\hat{\omega}]](y)
\;\approx\; -y + \hat{g}[\hat{\omega}](y)
\;=:\; \hat{\eta}[\hat{\omega}](y).
\]

e) Substituting \(y = \tilde{x} + f[\hat{\omega}](x)\) (so that \(y - f[\hat{\omega}](x) = \tilde{x}\)) and applying Gauss-Hermite quadrature to the resulting integral against \(\phi\),
\[
\begin{aligned}
G[f[\hat{\omega}]](x) &= 2k^2\big(f[\hat{\omega}](x) - x\big)
+ \int_{-\infty}^{\infty} \phi\big(y - f[\hat{\omega}](x)\big)\,\eta[f[\hat{\omega}]](y)
\Big[\big(y - f[\hat{\omega}](x)\big)^2 + \eta[f[\hat{\omega}]](y)\big(y - f[\hat{\omega}](x)\big) - 2\Big]\,dy \\
&\approx 2k^2\big(f[\hat{\omega}](x) - x\big)
+ \sum_{j=1}^{Q} \tilde{w}_j\,\hat{\eta}[\hat{\omega}]\big(\tilde{x}_j + f[\hat{\omega}](x)\big)
\Big[\tilde{x}_j^2 + \hat{\eta}[\hat{\omega}]\big(\tilde{x}_j + f[\hat{\omega}](x)\big)\,\tilde{x}_j - 2\Big]
\;=:\; \hat{G}[\hat{\omega}](x).
\end{aligned}
\]

f) Since \(g\) is the MMSE estimator, \(\mathbb{E}\big[(f(X_0) - g(Y))^2\big] = \mathbb{E}\big[f^2(X_0)\big] - \mathbb{E}\big[g^2(Y)\big]\), so
\[
\begin{aligned}
J[f[\hat{\omega}]] &= k^2\,\mathbb{E}\big[(f[\hat{\omega}](X_0) - X_0)^2\big]
+ \mathbb{E}\big[f^2[\hat{\omega}](X_0)\big] - \mathbb{E}\big[g^2[f[\hat{\omega}]](Y)\big] \\
&= k^2 \int_{-\infty}^{\infty} \big(f[\hat{\omega}](x) - x\big)^2\,dF(x)
+ \int_{-\infty}^{\infty} f^2[\hat{\omega}](x)\,dF(x)
- \int_{-\infty}^{\infty} g^2[f[\hat{\omega}]](y)\,N_0[f[\hat{\omega}]](y)\,dy \\
&\approx k^2 \sum_{i=1}^{Q} w_i \big(f[\hat{\omega}](x_i) - x_i\big)^2
+ \sum_{i=1}^{Q} w_i\,f^2[\hat{\omega}](x_i)
- \sum_{i=1}^{Q} w_i \sum_{j=1}^{Q} \tilde{w}_j\,\hat{g}^2[\hat{\omega}]\big(\tilde{x}_j + f[\hat{\omega}](x_i)\big)
\;=:\; \hat{J}[\hat{\omega}],
\end{aligned}
\]
where the last line uses \(\int g^2(y)\,N_0(y)\,dy = \int\!\!\int g^2(y)\,\phi\big(y - f[\hat{\omega}](x)\big)\,dy\,dF(x)\) together with the same substitution as in e).
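Approximations a)–d) above translate directly into code. The following sketch, in Python/NumPy rather than the thesis's MATLAB (`scaled_hermgauss` and `make_estimators` are illustrative names, not the thesis's), builds N̂₀, ĝ, and η̂ from quadrature pairs and checks ĝ against the closed-form MMSE estimator for a linear controller:

```python
import numpy as np

def scaled_hermgauss(n, sigma):
    """Gauss-Hermite nodes/weights for expectations under N(0, sigma^2)."""
    t, u = np.polynomial.hermite.hermgauss(n)
    return np.sqrt(2.0) * sigma * t, u / np.sqrt(np.pi)

def phi(t):
    """Standard normal pdf."""
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)

sigma = 5.0
xq, wq = scaled_hermgauss(64, sigma)  # quadrature pairs for X0 ~ N(0, 25)

def make_estimators(f):
    """Quadrature versions of N0, g, and eta for a Stage I controller f."""
    fx = f(xq)
    N0 = lambda y: float(np.sum(wq * phi(y - fx)))
    N1 = lambda y: float(np.sum(wq * fx * phi(y - fx)))
    g = lambda y: N1(y) / N0(y)
    eta = lambda y: -y + g(y)
    return N0, g, eta

# For f(x) = lam * x, the MMSE estimator is g(y) = lam^2 sigma^2 / (lam^2 sigma^2 + 1) * y.
lam = 0.5
N0, g, eta = make_estimators(lambda x: lam * x)
print(g(1.0))  # ≈ 6.25 / 7.25 ≈ 0.8621
```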
Chapter 5
Conclusion
5.1 Contributions
In this thesis, we examined the necessary condition for optimality in WC. This necessary condition presents itself as a homogeneous integral equation in terms of the Stage I controller, f, and must be solved exactly by any optimal f. We started by presenting WC in full generality in two forms: the classical version, originally put forth by Witsenhausen in [3], and a new variant using optimal transport theory, put forth by Wu and Verdú in [12]. We then proceeded to derive the necessary condition in full using the first-order condition for optimizing functionals from the calculus of variations. Having done this, we developed a computational model to investigate solutions to this necessary condition in the case when all the native random variables are Gaussian, a system specification that we had denoted as π(k², σ²). Using a finite element analysis approach, we specified elements and detailed our basis functions. These basis functions were asymmetric rational basis functions, which we used primarily for their computational bias towards non-decreasing functions as well as for their smoothness. We verified our model against theoretically derived expressions for the cases of both linear controllers and 1-bit quantizing controllers. We then used this finite element model within a mathematical optimization framework to approximately solve the necessary condition at specified Gauss-Hermite quadrature points. We defined a simple family of n-parameter controllers (the n-level sinusoidal quantizers) and demonstrated that we could find controllers that approximately satisfy the necessary condition within this family.
5.2 Future Work
There are a number of directions for future research that the work in this thesis opens up. For one, our successful use of finite element analysis in a control-theoretic setting reinforces an already growing trend within the field and should encourage more analysis of control problems using similar numerical techniques. In addition, our repeated and successful use of Gaussian quadrature in approximating expectations and integrals involving Gaussian probability kernels should signal to other researchers that this simple numerical integration scheme can be used to quickly and accurately test the performance of their designed controllers before stress testing them using more sophisticated methods like Monte Carlo sampling. Moreover, our development of a versatile n-parameter controller family demonstrates that controllers need not be very complicated in order to satisfy the necessary condition. In fact, a very interesting research direction would be to increase the number of weights and generalize our n-parameter controller family to a more sophisticated family of controllers. Additionally, some of the efficient search methods that have been used in both [5] and [6] with great success for the (2n + 1)-bit quantization controllers could be employed within this feasible set. Finally, our work has shown that the necessary condition itself can be treated in a computational manner and is not as daunting an object to study as it might appear. In fact, one could use the general structure of our model to study the effects of using different basis functions and/or unevenly sized elements in an effort to model even more accurately the equations and formulas that arise in WC. For example, one could look into using quadrature abscissa points as the boundary points of the elements instead of the uniformly spaced boundary points we utilized in our model.
Appendix A

MATLAB® Code

In this appendix, we list the main MATLAB® functions that we developed for our computational finite element model.
A.1 Function Arguments
In the functions below, the common arguments used are the following:

* a is a function handle that maps the ordinal position of an element in the FE model (Section 4.2) to its location on the real abscissa axis;
* Bnd refers to the supremum of the bounded interval in which we conduct the computation using the FE model, i.e., Bnd = Kσ (Section 4.2.1);
* den is the denominator of a fractional quantity;
* Del refers to the mesh size of our FE model, Δ (Section 4.2.1);
* f is a function handle representing the Stage I controller, typically composed of our FE approximation via approxRBF and requiring both an argument at which to evaluate it (i.e., x) and a vector of FE weights (i.e., w) (Section 4.2);
* fVec is a vector comprising the values of f when f is evaluated at the abscissa values of the Gauss-Hermite quadrature [X0, W0];
* j refers to the jth element of the FE model, j = 1, ..., M (Section 4.2.1);
* k refers to the scaling factor for the Stage I cost in WC (Section 3.1);
* l indicates the first sub-basis function ψ₁ when l = 1 and the second sub-basis function ψ₂ when l = 2 (Section 4.2.1);
* m is the total number of elements, or number of points at which we place a basis function, which we formally denoted as M (Section 4.2.1);
* num is the numerator of a fractional quantity;
* tauVec is a vector of length n representing the weights of an n-level sinusoidal quantizer (Section 4.3.2);
* w is a vector of length M representing the weights (ω̂), or values at each element's location, of the overall FE model (Section 4.2);
* x is a general variable at which to evaluate a given function, typically representing the initial Stage I random variable realization X₀ (Section 3.1);
* y is a general variable at which to evaluate a given function, typically representing the output;
* [X0, W0] are vectors of length Q (Section 4.2.2) representing the Gauss-Hermite quadrature (Section 2.3) abscissa points, X0 = {xᵢ}ᵢ₌₁^Q, and the probability weights at those points, W0 = {wᵢ}ᵢ₌₁^Q, for the initial Stage I random variable X₀ ~ N(0, σ²) (Section 3.1);
* [X, W] are vectors of length Q (Section 4.2.2) representing the Gauss-Hermite quadrature (Section 2.3) abscissa points, X = {x̃ᵢ}ᵢ₌₁^Q, and the probability weights at those points, W = {w̃ᵢ}ᵢ₌₁^Q, for the additive white Gaussian noise V ~ N(0, 1) (Section 3.1);
* [X1, W1] are vectors of length R (Section 4.3.2) representing the Gauss-Hermite quadrature (Section 2.3) abscissa points, X1 = {xᵢ}ᵢ₌₁^R, and the probability weights at those points, W1 = {wᵢ}ᵢ₌₁^R, at which ||G[ω̂]|| (Section 4.3.2) is evaluated.

A.2 Function Specifications

Next, we shall detail the MATLAB® functions that we developed to obtain the results in this thesis.
alpha.m

This function calculates the approximate value of the right summand in the necessary functional equation G[f](x) (Theorem 4.1.1), using the FE representation of the Stage I controller f via η̂[ω̂] (Section 4.2.2):

    ret = Σ_{j=0}^{Q-1} w̃ⱼ η̂[ω̂](x̃ⱼ + f(x)) (2 x̃ⱼ² + η̂[ω̂](x̃ⱼ + f(x)) x̃ⱼ − 2).

function [ ret ] = alpha( x, w, f, fVec, X0, W0, X, W )
    % Standard normal pdf
    phi = @(t) 1/sqrt(2*pi).*exp(-1/2*t.^2);
    % Noise abscissas shifted by f(x)
    Xfx = X + arrayfun(@(t) f(t,w), x);
    % Approximate eta at the shifted abscissas
    etaVals = arrayfun(@(y) eta(y,w,f,fVec,X0,W0), Xfx);
    % y - f(x) at the quadrature points
    diff = Xfx - arrayfun(@(t) f(t,w), x);
    FuncVals = etaVals.*(2.*diff.^2 + etaVals.*diff - 2);
    ret = W0'*FuncVals;
end
approxRBF.m

This function creates a function handle for a FE model of a continuous function represented by the element weight vector w (Section 4.2.1): val = f[ω̂](x).

function [ val ] = approxRBF( x, w, Del, m, a )
    val = 0;
    curElem = getCurElem(x,m,a);
    if curElem >= 1
        if curElem