Jrl Syst Sci & Complexity (2007) 20: 251–261
INFORMATION CHARACTERIZATION OF COMMUNICATION CHANNELS FOR SYSTEM IDENTIFICATION∗ Le Yi WANG · G. George YIN
Received: 31 December 2006. © 2007 Springer Science + Business Media, LLC

Abstract This paper studies identification of systems in which the system output is quantized, transmitted through a digital communication channel, and observed afterwards. The concept of the CR ratio is introduced to characterize the impact of communication channels on identification. The relationship between the CR ratio and the Shannon channel capacity is discussed. Identification algorithms are further developed for the case in which the channel error probability is unknown.

Key words Communication channel, efficient estimation, system identification.
1 Introduction and Problem Formulation

This paper studies identification of systems in which the system output must be quantized, transmitted through a digital communication channel, and observed afterwards. Communication errors introduce additional uncertainty that influences identification accuracy. This paper aims to characterize communication channels in terms of their impact on the convergence rates of parameter estimators.

To accomplish an information-oriented and algorithm-independent characterization of communication channels, we employ the Cramér-Rao bounds of identification errors, with and without communication channels, which were established in [1]. The concept of the CR ratio is introduced. The relationship between the CR ratio and the Shannon channel capacity is discussed for the basic communication model of discrete symmetric channels (DSC). Identification algorithms are further developed for the case in which the channel error probability is unknown.

The work here is a continuation of [1–5]. System identification with quantized data is investigated under gradient algorithms in [6]. Methods of data quantization and compression are comprehensively discussed in [7, 8]. Some motivating applications for studying system identification with binary-valued sensors can be found in [9, 10]. Integrated analysis of communication channels and control systems is explored in [11–13]. The characterization of channels in terms of CR lower bounds and the identification algorithms for unknown communication channels developed in this paper are new.

Le Yi WANG, Department of Electrical and Computer Engineering, Wayne State University, Detroit, Michigan 48202, USA. Email: [email protected].
G. George YIN, Department of Mathematics, Wayne State University, Detroit, Michigan 48202, USA. Email: [email protected].
∗ This research is supported in part by the National Science Foundation under ECS-0329597, DMS-0603287, and DMS-0624849.
Although the methodology of this paper is general, it can be explained more succinctly in a simple system setting. Consider a gain system $y_k = u_k \theta + d_k$, $k = 1, 2, \cdots$, where $u_k$ is the input, $d_k$ is the disturbance, and the gain $\theta$ is to be identified. Extensions to FIR or rational transfer functions can be readily achieved by the methods of [4] and will not be repeated in this paper. The output $y_k$ is measured by a sensor of $m$ thresholds $-\infty < C_1 < C_2 < \cdots < C_m < \infty$. The sensor is represented by a linear combination of indicator functions
$$ s_k = S(y_k) = \sum_{i=1}^{m+1} i\, s_{ik}, \qquad (1) $$
where $s_{ik} = I_{\{y_k \in (C_{i-1}, C_i]\}}$ with $i = 1, 2, \cdots, m+1$ and
$$ I_{\{y_k \in A\}} = \begin{cases} 1, & \text{if } y_k \in A, \\ 0, & \text{otherwise.} \end{cases} $$
Hence $s_k = i$ for $i = 1, 2, \cdots, m+1$ implies that $y_k \in (C_{i-1}, C_i]$ with $C_0 = -\infty$ and $C_{m+1} = \infty$.

Two scenarios of system configuration, shown in Figure 1, are considered. System identification with quantized sensors is depicted in Figure 1(a), in which the observations on $u_k$ and $s_k$ are used. On the other hand, when the sensor outputs of a system are transmitted through a communication channel and observed after transmission, the system parameters must be estimated by observing $u_k$ and $w_k$, as shown in Figure 1(b).
Figure 1 System configurations: (a) system identification with set-valued observations (input u and disturbance d enter the plant G, whose output y is quantized by the sensor into s); (b) system identification with communication channels (the sensor output s passes through a communication channel with channel uncertainty and is observed as w)
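To make the sensor model (1) concrete, the following is a minimal simulation sketch of the set-valued observations $s_k$ for the gain system. The `quantize` helper, the threshold values, and the Gaussian noise level are illustrative assumptions, not constructs from the paper.

```python
import numpy as np

def quantize(y, C):
    """Sensor (1): return s in {1, ..., m+1} with s = i iff y in (C_{i-1}, C_i],
    where C_0 = -inf and C_{m+1} = +inf."""
    # searchsorted with side="left" maps y = C_i into the interval (C_{i-1}, C_i]
    return np.searchsorted(np.asarray(C), y, side="left") + 1

rng = np.random.default_rng(0)
theta, sigma = 100.0, 25.0           # illustrative true gain and noise level
C = [70.0, 100.0, 130.0]             # illustrative thresholds (m = 3)
u = 1.0                              # constant input, as in Assumption A below
y = theta * u + sigma * rng.standard_normal(1000)
s = quantize(y, C)                   # set-valued observations s_k in {1, ..., 4}
```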
Assumption A Suppose that $\{d_k\}$ is a sequence of i.i.d. (independent and identically distributed) random variables. The cumulative distribution function $F(\cdot)$ of $d_1$ is twice continuously differentiable. The moment generating function of $d_1$ exists.

Choose $u_k$ to be a constant. Without loss of generality, assume $u_k \equiv 1$. Then
$$ y_k = \theta + d_k. \qquad (2) $$
Under Assumption A, $\{y_k\}$ is an i.i.d. sequence with cumulative distribution function $F(\cdot - \theta)$. A sensor of $m$ thresholds $C_1, C_2, \cdots, C_m$ divides the output range into $m+1$ intervals $(-\infty, C_1], (C_1, C_2], \cdots, (C_m, \infty)$. For the system in (2), the probability of $\{s_{ik} = 1\}$ is
$$ p_i = P\{C_{i-1} < y_k \le C_i\} = F(C_i - \theta) - F(C_{i-1} - \theta) := F_i(\theta). \qquad (3) $$
The premise of our approach is that although the sensor threshold Ci only indicates if the output is in (Ci−1 , Ci ], the probability pi may provide more information about the unknown parameter θ.
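As a quick illustration of (3), the sketch below evaluates the interval probabilities $p_i$ for a Gaussian disturbance. The helper name `interval_probs` and all numerical values are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import norm

def interval_probs(theta, C, F):
    """Equation (3): p_i = F(C_i - theta) - F(C_{i-1} - theta),
    with C_0 = -inf and C_{m+1} = +inf."""
    Cext = np.concatenate(([-np.inf], np.asarray(C), [np.inf]))
    return F(Cext[1:] - theta) - F(Cext[:-1] - theta)

p = interval_probs(100.0, [70.0, 100.0, 130.0], norm(scale=25.0).cdf)
# p sums to 1; each p_i carries information about theta through F_i(theta)
```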
Let
$$ h_i(\theta) = \frac{dp_i}{d\theta} = \frac{dF_i(\theta)}{d\theta} = -f(C_i - \theta) + f(C_{i-1} - \theta), \qquad i = 1, 2, \cdots, m+1, \qquad (4) $$
where $f(\cdot)$ is the density function. Then the sensitivity of $\theta$ with respect to $p_i$ is $\frac{d\theta}{dp_i} = \frac{1}{h_i(\theta)}$. Denote $h(\theta) = [h_1(\theta), h_2(\theta), \cdots, h_{m+1}(\theta)]^T$.
2 System Analysis with Known Transmission Channels

2.1 Identification Accuracy

Lemma 1 [1] The Cramér-Rao lower bound for estimating $\theta$ based on observations of $\{s_k\}$ is
$$ \sigma^2_{CR}(N, m, \theta) = \Big[ N \sum_{i=1}^{m+1} \frac{(h_i(\theta))^2}{p_i(\theta)} \Big]^{-1}. $$
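A minimal sketch of the bound in Lemma 1 for a Gaussian disturbance follows; it mirrors the interval-probability computation above. The helper name `crlb` and the numerical values are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def crlb(theta, C, N, pdf, cdf):
    """Cramer-Rao lower bound of Lemma 1 for a sensor with thresholds C."""
    Cext = np.concatenate(([-np.inf], np.asarray(C), [np.inf]))
    p = cdf(Cext[1:] - theta) - cdf(Cext[:-1] - theta)       # probabilities (3)
    f = np.where(np.isfinite(Cext), pdf(Cext - theta), 0.0)  # density, 0 at +/- inf
    h = -f[1:] + f[:-1]                                      # sensitivities (4)
    return 1.0 / (N * np.sum(h**2 / p))

g = norm(scale=25.0)
sigma2_CR = crlb(100.0, [70.0, 100.0, 130.0], N=1000, pdf=g.pdf, cdf=g.cdf)
```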
The expression in Lemma 1 depends only on the function (3) and its derivative (4). Consequently, it is valid for any problem in which the relationship between the unknown parameters and the probabilities can be expressed by a function that has a continuously differentiable inverse. In addition, the findings of [1–5] indicate that by appropriate input design, system identification problems for finite impulse response models, rational models, unknown noise distributions, and Wiener nonlinear systems can all be reduced to a finite number of identification problems for a simple gain system. Consequently, it is a straightforward exercise to extend Lemma 1 to the case of a vector parameter $\theta$ which may contain both system parameters and distribution function variables.

Consider now the scenario of system configuration in which the sensor outputs are not directly measured, but rather are transmitted through a communication channel, as shown in Figure 1(b). With $y_k$ given by (2) and $s_k$ given by (1), when the sensor output $s_k \in \{1, 2, \cdots, m+1\}$ is transmitted through a communication channel, the received sequence $w_k \in \{1, 2, \cdots, m+1\}$ is subject to channel noise and other uncertainties. When the communication channel is time invariant and memoryless, the relationship between $s_k$ and $w_k$ is characterized by the conditional probabilities $\pi_{ij} = P\{w_k = i \mid s_k = j\}$, $i, j = 1, 2, \cdots, m+1$. It follows that with $p_j := P\{s_k = j\}$ for $j = 1, 2, \cdots, m+1$,
$$ r_i = P\{w_k = i\} = \sum_{j=1}^{m+1} P\{w_k = i \mid s_k = j\} P\{s_k = j\} = \sum_{j=1}^{m+1} \pi_{ij} p_j. $$
Let $r = [r_1, r_2, \cdots, r_{m+1}]^T$, $p = [p_1, p_2, \cdots, p_{m+1}]^T$. Then
$$ r = \Pi\, p, \qquad (5) $$
where
$$ \Pi = \begin{bmatrix} \pi_{11} & \cdots & \pi_{1,m+1} \\ \vdots & \ddots & \vdots \\ \pi_{m+1,1} & \cdots & \pi_{m+1,m+1} \end{bmatrix}. \qquad (6) $$
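The channel model (5)–(6), and the inversion used in Remark 1 below, amount to a matrix-vector product and a linear solve. A small sketch with an assumed 2×2 (binary) channel:

```python
import numpy as np

pe = 0.25                              # assumed channel error probability
Pi = np.array([[1 - pe, pe],
               [pe, 1 - pe]])          # columns sum to 1: pi_ij = P{w=i | s=j}
p = np.array([0.6, 0.4])               # sensor-side probabilities P{s=j}
r = Pi @ p                             # (5): receiver-side probabilities P{w=i}
p_recovered = np.linalg.solve(Pi, r)   # p = Pi^{-1} r when Pi is invertible
```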
Assumption B (a) $\Pi$ is invertible. (b) All $p_i$'s are strictly positive.

Remark 1 Assumption B(a) ensures that probability information obtained at the receiver of the communication channel can be used to deduce the probabilities of the sensor thresholds, which are then used to estimate the system parameters. Since $\frac{dp}{dr} = \Pi^{-1}$, the variance of the estimation error depends proportionally on the operator norm of $\Pi^{-1}$. Under Assumption B, (5) yields $p = \Pi^{-1} r$. If $p_i = 0$, the corresponding sensor threshold is not used; such $p_i$ can be eliminated from our consideration and the resulting $p$ will satisfy Assumption B(b).

Let $\widehat h_i(\theta) = \frac{dr_i(\theta)}{d\theta}$ and $\widehat h(\theta) = [\widehat h_1(\theta), \widehat h_2(\theta), \cdots, \widehat h_{m+1}(\theta)]^T$. Then $\widehat h = \frac{dr}{d\theta} = \Pi \frac{dp}{d\theta} = \Pi h$.

Lemma 2 [1] The Cramér-Rao lower bound for estimating $\theta$ with observations on $w_k$ is
$$ \widehat\sigma^2_{CR}(N, m) = \Big[ N \sum_{i=1}^{m+1} \frac{\widehat h_i^2}{r_i} \Big]^{-1}. $$
Define $D_p = \mathrm{diag}(p_1, p_2, \cdots, p_{m+1})$, $D_r = \mathrm{diag}(r_1, r_2, \cdots, r_{m+1})$, $S_p = D_p^{1/2}$, and $S_r = D_r^{1/2}$. Then $\sum_{i=1}^{m+1} h_i^2/p_i = h^T D_p^{-1} h$ and $\sum_{i=1}^{m+1} \widehat h_i^2/r_i = \widehat h^T D_r^{-1} \widehat h = h^T \Pi^T D_r^{-1} \Pi h$. It follows that
$$ \sum_{i=1}^{m+1} \frac{h_i^2}{p_i} - \sum_{i=1}^{m+1} \frac{\widehat h_i^2}{r_i} = h^T (D_p^{-1} - \Pi^T D_r^{-1} \Pi) h = h^T S_p^{-1} \big[I - (S_p \Pi^T S_r^{-1})(S_r^{-1} \Pi S_p)\big] S_p^{-1} h = v^T (I - M^T M) v, $$
where
$$ v = S_p^{-1} h, \qquad M = S_r^{-1} \Pi S_p. \qquad (7) $$
2.2 Monotonicity

Lemma 3 $\gamma(M) = 1$, where $M$ is given by (7) and $\gamma(M)$ is the largest singular value of $M$.

Proof In view of (6), we have $\sum_{i=1}^{m+1} \pi_{ij} = 1$, $j = 1, 2, \cdots, m+1$. Also, by (5), $r = \Pi p$, namely, $r_i = \sum_{j=1}^{m+1} \pi_{ij} p_j$. It is standard (see [13, p.71, Eq.(2.5.8)]) that $\gamma$ is equal to the induced norm
$$ \gamma(S_r^{-1} \Pi S_p) = \sup_{\|x\|_2 = 1} \| S_r^{-1} \Pi S_p x \|_2. $$
For any $x$ with $\|x\|_2^2 = x_1^2 + x_2^2 + \cdots + x_{m+1}^2 = 1$, the elements of $w = S_r^{-1} \Pi S_p x$ can be expressed as
$$ w_i = \frac{\sum_{j=1}^{m+1} \pi_{ij} \sqrt{p_j}\, x_j}{\sqrt{r_i}} = \frac{[\sqrt{p_1 \pi_{i1}}, \cdots, \sqrt{p_{m+1} \pi_{i,m+1}}]\,[\sqrt{\pi_{i1}}\, x_1, \cdots, \sqrt{\pi_{i,m+1}}\, x_{m+1}]^T}{\sqrt{r_i}}. $$
By the Cauchy-Schwarz inequality and (5),
$$ w_i^2 \le \frac{\big(\sum_{j=1}^{m+1} \pi_{ij} p_j\big)\big(\sum_{j=1}^{m+1} \pi_{ij} x_j^2\big)}{r_i} = \sum_{j=1}^{m+1} \pi_{ij} x_j^2, \qquad (8) $$
where the equality holds if and only if there exists a constant $c_i \ne 0$ such that $\pi_{ij} p_j = c_i \pi_{ij} x_j^2$ for $j = 1, 2, \cdots, m+1$. It follows that
$$ \|w\|^2 = \sum_{i=1}^{m+1} w_i^2 \le \sum_{i=1}^{m+1} \sum_{j=1}^{m+1} \pi_{ij} x_j^2 = \sum_{j=1}^{m+1} x_j^2 \sum_{i=1}^{m+1} \pi_{ij} = \sum_{j=1}^{m+1} x_j^2 = 1. \qquad (9) $$
Moreover, for $x_j = \sqrt{p_j}$, $j = 1, 2, \cdots, m+1$, $\|w\|^2 = 1$. This implies that $\gamma(S_r^{-1} \Pi S_p) = 1$.

Theorem 1 Under Assumption B, $\widehat\sigma^2_{CR}(N, m) \ge \sigma^2_{CR}(N, m)$.

Proof Lemma 3 implies that $I - M^T M \ge 0$. As a result, $v^T (I - M^T M) v \ge 0$. Hence, $\sum_{i=1}^{m+1} h_i^2/p_i \ge \sum_{i=1}^{m+1} \widehat h_i^2/r_i$. Consequently,
$$ \widehat\sigma^2_{CR} = \Big[ N \sum_{i=1}^{m+1} \frac{\widehat h_i^2}{r_i} \Big]^{-1} \ge \Big[ N \sum_{i=1}^{m+1} \frac{h_i^2}{p_i} \Big]^{-1} = \sigma^2_{CR}. $$
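Lemma 3 and Theorem 1 are easy to check numerically. The sketch below builds $M$ from a randomly generated column-stochastic $\Pi$ and an assumed $p$, then compares the two CR bounds; all values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
m1 = 4                                    # m + 1 sensor levels
Pi = rng.random((m1, m1))
Pi /= Pi.sum(axis=0)                      # columns sum to 1, as required by (6)
p = rng.dirichlet(np.ones(m1))            # strictly positive p (Assumption B)
r = Pi @ p                                # (5)
Sp, Sr = np.diag(np.sqrt(p)), np.diag(np.sqrt(r))
M = np.linalg.inv(Sr) @ Pi @ Sp           # (7)
print(np.linalg.svd(M, compute_uv=False).max())  # Lemma 3: equals 1

h = rng.standard_normal(m1)
h -= h.mean()                             # any h with 1^T h = 0
h_hat = Pi @ h
N = 1000
sigma2 = 1.0 / (N * np.sum(h**2 / p))             # Lemma 1
sigma2_hat = 1.0 / (N * np.sum(h_hat**2 / r))     # Lemma 2
assert sigma2_hat >= sigma2                       # Theorem 1
```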
Figure 2 demonstrates the convergence rates of parameter estimates that use either the observations at the sensor site (before communication) or those after communication. In this example, the disturbances are Gaussian with zero mean and variance $\sigma^2 = 625$. The true parameter value is $\theta = 100$. The sensor threshold is $C = 100$ (this is in fact the optimal threshold). The communication channel is a discrete symmetric channel (DSC) with error probability $p_e = 0.25$.

Figure 2 Convergence rates of estimates with observations before or after communications (estimation based on observations before communication, estimation based on observations after communication, and the true value $\theta = 100$, over 1000 time steps)
2.3 CR Ratio of Communication Channels

From $\sigma^2_{CR} = \frac{1}{N h^T D_p^{-1} h}$ and $\widehat\sigma^2_{CR} = \frac{1}{N h^T \Pi^T D_r^{-1} \Pi h}$, we define the error ratio of a communication channel by
$$ \eta(p, h, \Pi) = \frac{\widehat\sigma^2_{CR}}{\sigma^2_{CR}} = \frac{h^T D_p^{-1} h}{h^T \Pi^T D_r^{-1} \Pi h}. \qquad (10) $$
Here $h$ depends on the actual function forms, which satisfy only $\mathbf{1}^T h = 0$. Since $h$ is not part of the communication channel, we introduce the following concept to characterize the worst-case impact of a communication channel on identification accuracy.

Definition 1 The CR ratio of a communication channel is defined as the worst-case error ratio
$$ \eta(p, \Pi) = \max_{h \ne 0} \eta(p, h, \Pi) \quad \text{s.t.} \quad \mathbf{1}^T h = 0. \qquad (11) $$

Definition 2 A communication channel is said to be degenerate if all singular values of $M$ are equal to 1.

Remark 2 Theorem 1 implies that $\eta(p, \Pi) \ge 1$. If $M$ is degenerate, then $M^T M = I$. As a result,
$$ \sum_{i=1}^{m+1} \frac{h_i^2}{p_i} - \sum_{i=1}^{m+1} \frac{\widehat h_i^2}{r_i} = v^T (I - M^T M) v = 0, $$
and $\widehat\sigma^2_{CR}(N, m) = \sigma^2_{CR}(N, m)$. This is the case when the channel does not introduce uncertainty and $\eta(p, \Pi) = 1$.

Theorem 2 Under Assumption B, if the channel is not degenerate, then $\eta(p, \Pi) = \gamma^2(M^{-1})$, where $\gamma$ is the largest singular value.

The following basic relationships will be used in the derivation: $r = \Pi p$, $\mathbf{1}^T \Pi = \mathbf{1}^T$, $\mathbf{1}^T \Pi^{-1} = \mathbf{1}^T$, $\mathbf{1}^T D_p = p^T$, and $\mathbf{1}^T D_r = r^T$.

Proof By defining $v = S_r^{-1} \Pi h$, we have $h = \Pi^{-1} S_r v$, $\mathbf{1}^T \Pi^{-1} S_r v = 0$, and
$$ \eta(p, h, \Pi) = \eta_1(p, v) = \frac{v^T S_r \Pi^{-T} S_p^{-1} S_p^{-1} \Pi^{-1} S_r v}{v^T v} = \frac{v^T M^{-T} M^{-1} v}{v^T v}. $$
It follows that (11) can be equivalently expressed as
$$ \eta(p, \Pi) = \max_{v^T v = 1} v^T M^{-T} M^{-1} v \quad \text{s.t.} \quad \mathbf{1}^T \Pi^{-1} S_r v = \mathbf{1}^T S_r v = 0. $$
Apparently, since $\eta(p, \Pi)$ is a conditional maximization, $\eta(p, \Pi) \le \gamma^2(M^{-1})$. On the other hand, since the communication channel is not degenerate, $\gamma^2(M^{-1}) > 1$. Let $\alpha \in \mathbb{R}$ and $v \in \mathbb{R}^{m+1}$ be an eigenvalue/eigenvector pair for
$$ M^{-T} M^{-1} v = \alpha v. \qquad (12) $$
We will show that if $\alpha > 1$, then $v$ satisfies $\mathbf{1}^T S_r v = 0$. This will prove that $\eta(p, \Pi) = \gamma^2(M^{-1}) > 1$. Equation (12) can be expressed as $v = \alpha M M^T v$. This implies, noting $M = S_r^{-1} \Pi S_p$, that $S_r v = \alpha S_r M M^T v = \alpha \Pi D_p \Pi^T S_r^{-1} v$. It follows that
$$ \mathbf{1}^T S_r v = \alpha \mathbf{1}^T \Pi D_p \Pi^T S_r^{-1} v = \alpha p^T \Pi^T S_r^{-1} v = \alpha r^T S_r^{-1} v = \alpha \mathbf{1}^T D_r S_r^{-1} v = \alpha \mathbf{1}^T S_r v. \qquad (13) $$
Since $\alpha > 1$, this implies $\mathbf{1}^T S_r v = 0$, as claimed.

2.4 CR Ratio and Channel Capacity

The CR ratio characterizes the accuracy of information during communication. It is different, in both its meaning and its numerical values, from Shannon's channel capacity $C$, which defines a channel's capability in passing a flow of information. For example, when $\Pi$ is not invertible, the accuracy of information for system identification is lost, but the channel may still allow data to flow through it. In this case, the channel capacity $C > 0$, but the CR ratio $\eta(p, \Pi) = \infty$ for any $p$. However, the CR ratio $\eta(p, \Pi)$ and the mutual information (the channel capacity $C$ is the maximum mutual information) are closely related. This can be seen clearly from the typical BSC (binary symmetric channel), a special type of DMC (discrete memoryless channel). A BSC is characterized by the probability transition matrix
$$ \Pi = \begin{bmatrix} 1 - p_e & p_e \\ p_e & 1 - p_e \end{bmatrix}, \qquad (14) $$
where $p_e$ is the transmission error probability. Both the CR ratio and the mutual information are functions of the input probability $p$. It can be shown that under DSC, the inverse CR ratio is a monotone function of the mutual information.
Since the actual code probability into the channel can be modified by source coding, we introduce the optimal CR ratio as $\eta(\Pi) = \min_p \eta(p, \Pi)$, where $p$ satisfies $p_i > 0$ and $\sum_{i=1}^{m+1} p_i = 1$. The optimal CR ratio indicates the optimal information transfer when the source code is optimally designed. Under DSC, the CR ratio achieves its minimum value when the mutual information achieves the channel capacity. For example, for $p_e = 0.1$, $C = 0.3681$ and $\eta(\Pi) = 1.25$; for $p_e = 0.25$, $C = 0.1308$ and $\eta(\Pi) = 2$; for $p_e = 0.4$, $C = 0.0201$ and $\eta(\Pi) = 5$. In general, if $C = 0$, it can be shown that $\Pi$ is singular and $\eta(\Pi) = \infty$.
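Since (10) is an explicit formula, the error ratio for a given input probability $p$ and sensitivity vector $h$ can be evaluated directly. The sketch below does this for the BSC (14); the chosen $p$ and $h$ are illustrative assumptions, so the printed value is a pointwise $\eta(p, h, \Pi)$, not the optimized $\eta(\Pi)$ quoted above.

```python
import numpy as np

def error_ratio(p, h, Pi):
    """Error ratio (10): (h^T Dp^{-1} h) / (h^T Pi^T Dr^{-1} Pi h)."""
    r = Pi @ p                    # receiver-side probabilities (5)
    num = h @ (h / p)             # h^T Dp^{-1} h
    Ph = Pi @ h
    den = Ph @ (Ph / r)           # h^T Pi^T Dr^{-1} Pi h
    return num / den

pe = 0.25
Pi = np.array([[1 - pe, pe],
               [pe, 1 - pe]])     # BSC transition matrix (14)
p = np.array([0.5, 0.5])          # assumed input probability
h = np.array([1.0, -1.0])         # any h with 1^T h = 0
print(error_ratio(p, h, Pi))
```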
3 Parameter Identification with Unknown Communication Channels

3.1 Basic Relationships

We start with a basic type of communication channel to study the impact of communication channels on system identification. Suppose the transition probability matrix of a memoryless symmetric binary channel is given by (14). However, different from the last section, here the error probability $p_e$ is a constant but unknown. If a sensor is binary valued, then the probability $p = P\{s_k = 1\}$ and $r$ are related by $r = (1 - p_e)p + p_e(1 - p)$. Note that $p = F(C - \theta u)$ when $u$ is a constant input. If $p_e$ is unknown, one cannot identify $\theta$ from this relationship alone. We shall consider two possible solutions.

Note that for this channel, the channel capacity is $C = 1 - h(p_e)$, where $h(x) = -x \log_2 x - (1 - x) \log_2 (1 - x)$. When $p_e = 0.5$, $C = 0$. In this case, $r = (1 - p_e)p + p_e(1 - p) = 0.5$. That is, $r$ does not provide any information on $p$, and in turn, on $\theta$. In the following we assume $p_e < 0.5$.

1) Binary Sensors with Input Design. More information on $\theta$ can be provided by using more input values. Consider a 2-periodic signal taking the values $u_1$ and $u_2$ within one period, $u_1 \ne u_2$. This leads to the relationships
$$ r_i = (1 - p_e) F(C - \theta u_i) + p_e (1 - F(C - \theta u_i)), \qquad i = 1, 2. $$
Eliminating $p_e$ from these equations, we obtain
$$ \widetilde H(\theta, r_1, r_2) = r_1 - r_2 - [F(C - \theta u_1) - F(C - \theta u_2)] + 2[F(C - \theta u_1) r_2 - F(C - \theta u_2) r_1] = 0. \qquad (15) $$

2) Sensors with Multiple Thresholds. More information on $\theta$ can also be provided by using more sensor thresholds. Consider a sensor with two thresholds $C_1$ and $C_2$, and suppose $u \equiv 1$. This leads to the relationships
$$ r_i = (1 - p_e) F(C_i - \theta) + p_e (1 - F(C_i - \theta)), \qquad i = 1, 2. $$
Eliminating $p_e$ from these equations, we redefine $\widetilde H(\cdot)$ as
$$ \widetilde H(\theta, r_1, r_2) = r_1 - r_2 - [F(C_1 - \theta) - F(C_2 - \theta)] + 2[F(C_1 - \theta) r_2 - F(C_2 - \theta) r_1] = 0. \qquad (16) $$
In both cases, the relationship can be generically expressed as
$$ H(\theta, r_1, r_2) = 0. \qquad (17) $$
This will be solved by stochastic approximation algorithms; see [14] and [15] for general references on stochastic approximation. The estimates are computed under a fixed $p_e$. In the last part of this section, we briefly remark on how $p_e$ can alternatively be estimated.
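A small sketch of the input-design relationship (15), assuming a Gaussian noise distribution, is given below; evaluating $\widetilde H$ at candidate $\theta$ values is the basis of the root-finding in Section 3.3. The helper name `H_tilde` and all numerical values are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def H_tilde(theta, r1, r2, C, u1, u2, F):
    """Equation (15): its root in theta identifies the parameter without knowing p_e."""
    F1, F2 = F(C - theta * u1), F(C - theta * u2)
    return r1 - r2 - (F1 - F2) + 2.0 * (F1 * r2 - F2 * r1)

F = norm(scale=25.0).cdf
theta_true, C, u1, u2, pe = 100.0, 100.0, 1.0, 10.0, 0.25
r = [(1 - pe) * F(C - theta_true * u) + pe * (1 - F(C - theta_true * u))
     for u in (u1, u2)]
print(H_tilde(theta_true, r[0], r[1], C, u1, u2, F))  # ~0 at the true parameter
```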
3.2 Parameter Sensitivities

To gain insight into suitable algorithms for solving (17), we plot the function shapes in Figure 3. The true parameter value is $\theta_{true} = 100$, and $C = 100$, $u_1 = 1$, $u_2 = 10$. Then $r_1$ and $r_2$ are calculated from
$$ r_i^{true} = (1 - p_e) F(C - \theta_{true} u_i) + p_e (1 - F(C - \theta_{true} u_i)), \qquad i = 1, 2, $$
where $F$ is a normal distribution function with zero mean and $\sigma^2 = 625$. Under the calculated $r_1^{true}$ and $r_2^{true}$, we plot $\widetilde H(\theta, r_1^{true}, r_2^{true})$ as a function of $\theta$ for three values of the transmission error probability, $p_e = 0.1, 0.25, 0.4$, in Figure 3. One may also observe that for a large transmission error probability, the sensitivity of $\theta$ with respect to the function values near the true parameter increases (the function flattens), rendering the problem more difficult numerically.
Figure 3 The function $\widetilde H(\theta, r_1^{true}, r_2^{true})$ under different transmission error probabilities $p_e = 0.1, 0.25, 0.4$. The sensor threshold is $C = 100$

Figure 4 The function $\widetilde H(\theta, r_1^{true}, r_2^{true})$ under transmission error probability $p_e = 0.25$ and different sensor thresholds $C = 70, 85, 100$
It should be pointed out that threshold selection is an important aspect. To illustrate, we use the same system parameters and input selection, except that the threshold takes the values $C = 70, 85, 100$. Figure 4 shows that near the true parameter, the sensitivity changes significantly with the threshold values.

3.3 Stochastic Approximation Type Algorithms

Assume that for each $\theta$, the noisy observation $\{w_k\} = \{w_k^\theta\}$ is a sequence of stationary random variables with $\theta$-dependent probabilities $P^\theta(\cdot)$. Then for each $\theta$, $r_i = P^\theta\{w_k = 1, u_i \text{ being used}\}$, $i = 1, 2$, are the corresponding probabilities. To proceed, let $r_1$ and $r_2$ be estimated by their corresponding empirical measures $\zeta_N^1$ and $\zeta_N^2$. That is, for each fixed $\theta$, let
$$ \zeta_N^i(\theta) = \frac{1}{N} \sum_{k=1}^{N} I_{\{w_k^\theta = 1,\ u_i\ \text{being used}\}}, \qquad i = 1, 2. \qquad (18) $$
Then the parameter estimate is updated by
$$ \theta_{N+1} = \theta_N - \varepsilon_N H(\theta_N, \zeta_N^1(\theta_N), \zeta_N^2(\theta_N)), \qquad (19) $$
where $\varepsilon_N > 0$ is the step size. Typical selections of $\varepsilon_N$ include a small constant step size or $\varepsilon_N = \beta/N^\alpha$ for some $\beta > 0$ and $0 < \alpha < 1$. In what follows, for definiteness, we work with a sequence of decreasing step sizes satisfying $\varepsilon_N \ge 0$, $\varepsilon_N \to 0$ as $N \to \infty$, and $\sum_j \varepsilon_j = \infty$
throughout the rest of the section. Convergence of the algorithm (19) can be established by the ODE approach: consider the limiting differential equation
$$ \frac{d\theta}{dt} = -H(\theta, r_1(\theta), r_2(\theta)). \qquad (20) $$
Assumption C For each fixed $\theta$, the sequence $\{w_k^\theta\}$ is a stationary $\phi$-mixing sequence with probabilities $P^\theta(\cdot)$ that are continuous with respect to $\theta$. The ODE (20) has a unique asymptotically stable point $\theta_{true}$.

Remark 3 The shapes of the functions in Figure 4 indicate that $\theta_{true}$ is the unique equilibrium point of (20) and that the function satisfies $(\theta - \theta_{true}) H(\theta, r_1(\theta), r_2(\theta)) > 0$ for $\theta \ne \theta_{true}$. The global asymptotic stability of the equilibrium point $\theta_{true}$ of (20) then follows since $V = (\theta - \theta_{true})^2$ is a radially unbounded strict Lyapunov function for the equilibrium:
$$ \frac{dV}{dt} = -2(\theta - \theta_{true}) H(\theta, r_1(\theta), r_2(\theta)) < 0, \qquad \theta \ne \theta_{true}. $$
Thus, Assumption C is verified.

Theorem 3 Under Assumption C, the sequence of iterates $\{\theta_N\}$ defined in (19) is convergent in that $\theta_N \to \theta_{true}$ w.p.1 as $N \to \infty$.

Proof In view of the forms of $H(\cdot)$ given in (15) and (16), it is easily verified that $H(\cdot)$ is continuous with respect to its arguments. Define interpolations of the iterates as
$$ t_N = \sum_{i=1}^{N} \varepsilon_i, \quad \theta^0(t) = \theta_N \ \text{for}\ t \in [t_N, t_{N+1}), \quad \theta^N(t) = \theta^0(t + t_N), \quad m(t) = \max\{N : t_N \le t\}. $$
Then it can be verified that $\{\theta^N(\cdot)\}$ is uniformly bounded and equicontinuous in the extended sense (see [15, p.102]). Thus we can extract a convergent subsequence; do so and still denote the subsequence by $\{\theta^N(\cdot)\}$ for notational simplicity. We proceed to characterize the limit process. To this end, observe that the w.p.1 limit of $\theta^N(\cdot)$ should coincide with the weak limit (in the sense of weak convergence of probability measures) of $\theta^N(\cdot)$, which is easier to obtain. Thus we need only characterize the limit process through its weak limit. Denote the index set $\chi = \{\ell : t \le t_{m_\ell} < t_{m_{\ell+1}-1} \le t + s\}$. Let $\Delta_N$ be a sequence of positive real numbers satisfying $\Delta_N \to 0$, and select an increasing sequence $\{m_\ell(N)\}$ such that $m(t_N + t) = m_1(N) < m_2(N) < \cdots \le m(t_N + t + s) - 1$, and such that for $m(t_N + t) \le m_\ell \le m_{\ell+1} \le m(t_N + t + s) - 1$,
$$ \frac{1}{\Delta_N} \sum_{j=m_\ell(N)}^{m_{\ell+1}(N)-1} \varepsilon_j \to 1 \quad \text{as}\ N \to \infty. $$
In what follows, denote $m_\ell(N)$ by $m_\ell$ for simplicity. For any bounded and continuous function $\rho(\cdot)$, any positive integer $\kappa$, and any $0 < t_i \le t$ with $i \le \kappa$, we have
$$ E\rho(\theta^N(t_i), i \le \kappa)\,[\theta^N(t+s) - \theta^N(t)] = -E\rho(\theta^N(t_i), i \le \kappa) \sum_{j=m(t_N+t)}^{m(t_N+t+s)-1} \varepsilon_j H(\theta_j, \zeta_j^1(\theta_j), \zeta_j^2(\theta_j)) = -E\rho(\theta^N(t_i), i \le \kappa) \sum_{\ell \in \chi} \sum_{j=m_\ell}^{m_{\ell+1}-1} E_{m_\ell} \varepsilon_j H(\theta_j, \zeta_j^1(\theta_j), \zeta_j^2(\theta_j)), \qquad (21) $$
where $E_{m_\ell}$ denotes the conditional expectation with respect to $\mathcal{F}_{m_\ell}$, the $\sigma$-algebra generated by the past data up to $m_\ell$. The continuity of $H(\cdot)$ with respect to its arguments and the weak continuity of $\zeta_j^i(\theta)$ with respect to $\theta$ then yield that the last line of (21) can be
replaced by
$$ -E\rho(\theta^N(t_i), i \le \kappa) \sum_{\ell \in \chi} \sum_{j=m_\ell}^{m_{\ell+1}-1} E_{m_\ell} \varepsilon_j H(\theta_{m_\ell}, \zeta_j^1(\theta_{m_\ell}), \zeta_j^2(\theta_{m_\ell})) = -E\rho(\theta^N(t_i), i \le \kappa) \sum_{\ell \in \chi} \Delta_N \frac{1}{\Delta_N} \sum_{j=m_\ell}^{m_{\ell+1}-1} \varepsilon_j E_{m_\ell} H(\theta^N(u), \zeta_j^1(\theta^N(u)), \zeta_j^2(\theta^N(u))) $$
for $t_{m_\ell} \le u < t_{m_{\ell+1}}$. Approximating $\theta^N(u)$ by a finite-valued process as in [15, p.168 and p.257], and noting that
$$ \frac{1}{m_{\ell+1} - m_\ell} \sum_{j=m_\ell}^{m_{\ell+1}-1} H(\theta, \zeta_j^1(\theta), \zeta_j^2(\theta)) \to H(\theta, r_1(\theta), r_2(\theta)) $$
for each fixed $\theta$, by the well-known Glivenko-Cantelli theorem (since the $\phi$-mixing property implies ergodicity), enables us to conclude that $\theta(\cdot)$, the limit of $\theta^N(\cdot)$, satisfies
$$ E\rho(\theta(t_i), i \le \kappa)\Big[\theta(t+s) - \theta(t) + \int_t^{t+s} H(\theta(u), r_1(\theta(u)), r_2(\theta(u)))\, du\Big] = 0. $$
That is, the limit satisfies the ODE (20). Finally, $\theta_N \to \theta_{true}$ follows from a stability argument. The details are omitted for brevity; the reader is referred to [15] for further information.

Figure 5 shows simulation results of the algorithm (19) under the conditions that the noise is normally distributed with zero mean and variance 625, $C = 100$, $u_1 = 1$, and $u_2 = 10$. The step size is chosen as $\varepsilon_N = 10/N^{0.3}$. The initial estimate is selected as $\theta_0 = 140$.
Figure 5 Parameter estimation with unknown error probability of communication channels: stochastic approximation algorithm with step size $\varepsilon_N = 10/N^{0.3}$; the estimates start at $\theta_0 = 140$ and converge toward $\theta_{true} = 100$ over 1000 time steps
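A minimal simulation sketch of algorithm (19) under the stated settings follows. The channel error probability $p_e$ is used only to generate data, never by the estimator; the strict alternation of $u_1$ and $u_2$ within each period and the per-input normalization of the empirical measures are implementation assumptions.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)

def F(x):
    """Gaussian noise cdf with standard deviation 25 (variance 625)."""
    return 0.5 * (1.0 + erf(x / (25.0 * sqrt(2.0))))

def H_tilde(theta, z1, z2, C, u1, u2):
    """Equation (15) with empirical measures in place of r1, r2."""
    F1, F2 = F(C - theta * u1), F(C - theta * u2)
    return z1 - z2 - (F1 - F2) + 2.0 * (F1 * z2 - F2 * z1)

theta_true, C, u1, u2, pe = 100.0, 100.0, 1.0, 10.0, 0.25
theta, counts = 140.0, np.zeros(2)
for N in range(1, 1001):
    for i, u in enumerate((u1, u2)):             # 2-periodic input
        s = float(theta_true * u + 25.0 * rng.standard_normal() <= C)  # binary sensor
        w = s if rng.random() > pe else 1.0 - s  # BSC flips with probability pe
        counts[i] += w
    z1, z2 = counts / N                          # empirical frequencies of w = 1, cf. (18)
    theta -= (10.0 / N**0.3) * H_tilde(theta, z1, z2, C, u1, u2)  # update (19)
print(theta)  # approaches theta_true = 100, cf. Figure 5
```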
Remark 4 Here the estimation is obtained by use of empirical distributions of the process related to ri (θ). An alternative approach is to estimate Π , the transition probability matrix of the communication channel, using a Wonham filtering technique. Due to the page limitation, the details are omitted; references on Wonham filtering can be found in [16, Chapter 8].
4 Conclusions

Communication channels have a significant impact on identification accuracy. This paper introduces the concept of CR ratios to investigate this impact from a complexity viewpoint: the CR lower bounds increase when communication errors increase. The relation of the CR ratio to the Shannon channel capacity is discussed. Algorithms are presented to identify system
parameters when the error probability of communication channels is unknown. Generalizations of these findings to general quantization schemes and other communication channel models remain open issues at present.

References

[1] L. Y. Wang and G. Yin, Asymptotically efficient parameter estimation using quantized output observations, Automatica, to appear.
[2] L. Y. Wang, G. Yin, and J. F. Zhang, System identification using quantized data, in SYSID 2006, Newcastle, Australia, March 2006.
[3] L. Y. Wang, J. F. Zhang, and G. Yin, System identification using binary sensors, IEEE Trans. Automat. Control, 2003, 48: 1892–1907.
[4] L. Y. Wang, G. Yin, and J. F. Zhang, Joint identification of plant rational models and noise distribution functions using binary-valued observations, Automatica, 2006, 42: 535–547.
[5] Y. L. Zhao, L. Y. Wang, G. Yin, and J. F. Zhang, Identification of Wiener models with binary valued output observations, Automatica, to appear.
[6] T. Wigren, Adaptive filtering using quantized output measurements, IEEE Trans. Signal Processing, 1998, 46: 3423–3426.
[7] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, 1992.
[8] K. Sayood, Introduction to Data Compression (2nd Ed.), Morgan Kaufmann, 2000.
[9] E. R. Caianiello and A. de Luca, Decision equation for binary systems: application to neural behavior, Kybernetik, 1966, 3: 33–40.
[10] J. Sun, Y. Kim, and L. Y. Wang, HEGO signal processing and strategy adaptation for improved performance in lean burn engines with a lean NOx trap, Int. J. of Adaptive Control and Signal Processing, 2004, 18(2): 145–166.
[11] X. Liu and A. Goldsmith, Wireless communication tradeoffs in distributed control, in 42nd IEEE Conference on Decision and Control, 2003, 1: 688–694.
[12] A. M. Sayeed, A signal modeling framework for integrated design of sensor networks, in IEEE Workshop on Statistical Signal Processing, 28 Sept.–1 Oct., 2003.
[13] L. Xiao, M. Johansson, H. Hindi, S. Boyd, and A. Goldsmith, Joint optimization of communication rates and linear systems, IEEE Trans. Automatic Control, 2003, 48(1): 148–153.
[14] H.-F. Chen, Stochastic Approximation and Its Applications, Kluwer Academic, Dordrecht, Netherlands, 2002.
[15] H. J. Kushner and G. Yin, Stochastic Approximation and Recursive Algorithms and Applications (2nd Ed.), Springer-Verlag, New York, 2003.
[16] G. Yin and Q. Zhang, Discrete-time Markov Chains: Two-Time-Scale Methods and Applications, Springer, New York, 2005.