Jrl Syst Sci & Complexity (2007) 20: 251–261
INFORMATION CHARACTERIZATION OF COMMUNICATION CHANNELS FOR SYSTEM IDENTIFICATION∗ Le Yi WANG · G. George YIN
Received: 31 December 2006. © 2007 Springer Science + Business Media, LLC

Abstract This paper studies identification of systems in which the system output is quantized, transmitted through a digital communication channel, and observed afterwards. The concept of the CR ratio is introduced to characterize the impact of communication channels on identification. The relationship between the CR ratio and the Shannon channel capacity is discussed. Identification algorithms are further developed for the case in which the channel error probability is unknown.

Key words Communication channel, efficient estimation, system identification.
1 Introduction and Problem Formulation

This paper studies identification of systems in which the system output must be quantized, transmitted through a digital communication channel, and observed afterwards. Communication errors introduce additional uncertainty that influences identification accuracy. This paper aims to characterize communication channels in terms of their impact on the convergence rates of parameter estimators.

To accomplish an information-oriented and algorithm-independent characterization of communication channels, we employ the Cramér-Rao bounds of identification errors, with and without communication channels, which were established in [1]. The concept of the CR ratio is introduced. The relationship between the CR ratio and the Shannon channel capacity is discussed for the basic communication model of discrete symmetric channels (DSC). Identification algorithms are further developed for the case in which the channel error probability is unknown.

The work here is a continuation of [1–5]. System identification with quantized data is investigated under gradient algorithms in [6]. Methods of data quantization and compression are comprehensively discussed in [7, 8]. Some motivating applications for studying system identification with binary-valued sensors can be found in [9, 10]. Integrated analysis of communication channels and control systems is explored in [11–13]. The characterization of channels in terms of CR lower bounds and the identification algorithms for unknown communication channels developed in this paper are new.

Le Yi WANG, Department of Electrical and Computer Engineering, Wayne State University, Detroit, Michigan 48202, USA. Email: [email protected].
G. George YIN, Department of Mathematics, Wayne State University, Detroit, Michigan 48202, USA. Email: [email protected].
∗ This research is supported in part by the National Science Foundation under ECS-0329597, DMS-0603287, and DMS-0624849.
Although the methodology of this paper is general, it can be explained more succinctly in a simple system setting. Consider a gain system $y_k = u_k \theta + d_k$, $k = 1, 2, \cdots$, where $u_k$ is the input, $d_k$ is the disturbance, and the gain $\theta$ is to be identified. Extensions to FIR or rational transfer functions can be readily achieved by the methods of [4] and will not be repeated in this paper. The output $y_k$ is measured by a sensor of $m$ thresholds $-\infty < C_1 < C_2 < \cdots < C_m < \infty$. The sensor is represented by a linear combination of indicator functions
$$ s_k = S(y_k) = \sum_{i=1}^{m+1} i\, s_{ik}, \qquad (1) $$
where $s_{ik} = I_{\{y_k \in (C_{i-1}, C_i]\}}$ with $i = 1, 2, \cdots, m+1$ and
$$ I_{\{y_k \in A\}} = \begin{cases} 1, & \text{if } y_k \in A, \\ 0, & \text{otherwise.} \end{cases} $$
Hence $s_k = i$ for $i = 1, 2, \cdots, m+1$ implies that $y_k \in (C_{i-1}, C_i]$ with $C_0 = -\infty$ and $C_{m+1} = \infty$.

Two scenarios of system configuration, shown in Figure 1, are considered. System identification with quantized sensors is depicted in Figure 1(a), in which the observations on $u_k$ and $s_k$ are used. On the other hand, when the sensor outputs of a system are transmitted through a communication channel and observed after transmission, the system parameters must be estimated by observing $u_k$ and $w_k$, as shown in Figure 1(b).
Figure 1 System configurations: (a) system identification with set-valued observations (input u and disturbance d enter the plant G, whose output y is quantized by the sensor into s); (b) system identification with communication channels (the sensor output s passes through a communication channel with channel uncertainty and is observed as w)
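To make the sensor model (1) concrete, the following is a minimal simulation sketch of the set-valued observations $s_k$ for the gain system. The `quantize` helper, the threshold values, and the Gaussian noise level are illustrative assumptions, not constructs from the paper.

```python
import numpy as np

def quantize(y, C):
    """Sensor (1): return s in {1, ..., m+1} with s = i iff y in (C_{i-1}, C_i],
    where C_0 = -inf and C_{m+1} = +inf."""
    # searchsorted with side="left" maps y = C_i into the interval (C_{i-1}, C_i]
    return np.searchsorted(np.asarray(C), y, side="left") + 1

rng = np.random.default_rng(0)
theta, sigma = 100.0, 25.0           # illustrative true gain and noise level
C = [70.0, 100.0, 130.0]             # illustrative thresholds (m = 3)
u = 1.0                              # constant input, as in Assumption A below
y = theta * u + sigma * rng.standard_normal(1000)
s = quantize(y, C)                   # set-valued observations s_k in {1, ..., 4}
```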
Assumption A Suppose that $\{d_k\}$ is a sequence of i.i.d. (independent and identically distributed) random variables. The cumulative distribution function $F(\cdot)$ of $d_1$ is twice continuously differentiable. The moment generating function of $d_1$ exists.

Choose $u_k$ to be a constant. Without loss of generality, assume $u_k \equiv 1$. Then
$$ y_k = \theta + d_k. \qquad (2) $$
Under Assumption A, $\{y_k\}$ is an i.i.d. sequence with cumulative distribution function $F(\cdot - \theta)$. A sensor of $m$ thresholds $C_1, C_2, \cdots, C_m$ divides the output range into $m+1$ intervals $(-\infty, C_1], (C_1, C_2], \cdots, (C_m, \infty)$. For the system in (2), the probability of $\{s_{ik} = 1\}$ is
$$ p_i = P\{C_{i-1} < y_k \le C_i\} = F(C_i - \theta) - F(C_{i-1} - \theta) := F_i(\theta). \qquad (3) $$
The premise of our approach is that although the sensor threshold Ci only indicates if the output is in (Ci−1 , Ci ], the probability pi may provide more information about the unknown parameter θ.
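As a quick illustration of (3), the sketch below evaluates the interval probabilities $p_i$ for a Gaussian disturbance. The helper name `interval_probs` and all numerical values are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import norm

def interval_probs(theta, C, F):
    """Equation (3): p_i = F(C_i - theta) - F(C_{i-1} - theta),
    with C_0 = -inf and C_{m+1} = +inf."""
    Cext = np.concatenate(([-np.inf], np.asarray(C), [np.inf]))
    return F(Cext[1:] - theta) - F(Cext[:-1] - theta)

p = interval_probs(100.0, [70.0, 100.0, 130.0], norm(scale=25.0).cdf)
# p sums to 1; each p_i carries information about theta through F_i(theta)
```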
Let
$$ h_i(\theta) = \frac{dp_i}{d\theta} = \frac{dF_i(\theta)}{d\theta} = -f(C_i - \theta) + f(C_{i-1} - \theta), \qquad i = 1, 2, \cdots, m+1, \qquad (4) $$
where $f(\cdot)$ is the density function. Then the sensitivity of $\theta$ with respect to $p_i$ is $\frac{d\theta}{dp_i} = \frac{1}{h_i(\theta)}$. Denote $h(\theta) = [h_1(\theta), h_2(\theta), \cdots, h_{m+1}(\theta)]^T$.
2 System Analysis with Known Transmission Channels

2.1 Identification Accuracy

Lemma 1 [1] The Cramér-Rao lower bound for estimating $\theta$ based on observations of $\{s_k\}$ is
$$ \sigma^2_{CR}(N, m, \theta) = \Big[ N \sum_{i=1}^{m+1} \frac{(h_i(\theta))^2}{p_i(\theta)} \Big]^{-1}. $$
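A minimal sketch of the bound in Lemma 1 for a Gaussian disturbance follows; it mirrors the interval-probability computation above. The helper name `crlb` and the numerical values are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def crlb(theta, C, N, pdf, cdf):
    """Cramer-Rao lower bound of Lemma 1 for a sensor with thresholds C."""
    Cext = np.concatenate(([-np.inf], np.asarray(C), [np.inf]))
    p = cdf(Cext[1:] - theta) - cdf(Cext[:-1] - theta)       # probabilities (3)
    f = np.where(np.isfinite(Cext), pdf(Cext - theta), 0.0)  # density, 0 at +/- inf
    h = -f[1:] + f[:-1]                                      # sensitivities (4)
    return 1.0 / (N * np.sum(h**2 / p))

g = norm(scale=25.0)
sigma2_CR = crlb(100.0, [70.0, 100.0, 130.0], N=1000, pdf=g.pdf, cdf=g.cdf)
```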
The expression in Lemma 1 depends only on the function (3) and its derivative (4). Consequently, it is valid for any problem in which the relationship between the unknown parameters and the probabilities can be expressed by a function that has a continuously differentiable inverse. In addition, the findings of [1–5] indicate that by appropriate input design, system identification problems for finite impulse response models, rational models, unknown noise distributions, and Wiener nonlinear systems can all be reduced to a finite number of identification problems for a simple gain system. Consequently, it is a straightforward exercise to extend Lemma 1 to the case of a vector parameter $\theta$ which may contain both system parameters and distribution function variables.

Consider now the scenario of system configuration in which the sensor outputs are not directly measured, but rather are transmitted through a communication channel, as shown in Figure 1(b). With $y_k$ given by (2) and $s_k$ given by (1), when the sensor output $s_k \in \{1, 2, \cdots, m+1\}$ is transmitted through a communication channel, the received sequence $w_k \in \{1, 2, \cdots, m+1\}$ is subject to channel noise and other uncertainties. When the communication channel is time invariant and memoryless, the relationship between $s_k$ and $w_k$ is characterized by the conditional probabilities $\pi_{ij} = P\{w_k = i \mid s_k = j\}$, $i, j = 1, 2, \cdots, m+1$. It follows that with $p_j := P\{s_k = j\}$ for $j = 1, 2, \cdots, m+1$,
$$ r_i = P\{w_k = i\} = \sum_{j=1}^{m+1} P\{w_k = i \mid s_k = j\} P\{s_k = j\} = \sum_{j=1}^{m+1} \pi_{ij} p_j. $$
Let $r = [r_1, r_2, \cdots, r_{m+1}]^T$, $p = [p_1, p_2, \cdots, p_{m+1}]^T$. Then
$$ r = \Pi\, p, \qquad (5) $$
where
$$ \Pi = \begin{bmatrix} \pi_{11} & \cdots & \pi_{1,m+1} \\ \vdots & \ddots & \vdots \\ \pi_{m+1,1} & \cdots & \pi_{m+1,m+1} \end{bmatrix}. \qquad (6) $$
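The channel model (5)–(6), and the inversion used in Remark 1 below, amount to a matrix-vector product and a linear solve. A small sketch with an assumed 2×2 (binary) channel:

```python
import numpy as np

pe = 0.25                              # assumed channel error probability
Pi = np.array([[1 - pe, pe],
               [pe, 1 - pe]])          # columns sum to 1: pi_ij = P{w=i | s=j}
p = np.array([0.6, 0.4])               # sensor-side probabilities P{s=j}
r = Pi @ p                             # (5): receiver-side probabilities P{w=i}
p_recovered = np.linalg.solve(Pi, r)   # p = Pi^{-1} r when Pi is invertible
```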
Assumption B (a) $\Pi$ is invertible. (b) All $p_i$'s are strictly positive.

Remark 1 Assumption B(a) ensures that probability information obtained at the receiver of the communication channel can be used to deduce the probabilities of the sensor thresholds, which are then used to estimate the system parameters. Since $\frac{dp}{dr} = \Pi^{-1}$, the variance of the estimation error depends proportionally on the operator norm of $\Pi^{-1}$. Under Assumption B, (5) yields $p = \Pi^{-1} r$. If $p_i = 0$, the corresponding sensor threshold is not used; such $p_i$ can be eliminated from our consideration and the resulting $p$ will satisfy Assumption B(b).

Let $\widehat h_i(\theta) = \frac{dr_i(\theta)}{d\theta}$ and $\widehat h(\theta) = [\widehat h_1(\theta), \widehat h_2(\theta), \cdots, \widehat h_{m+1}(\theta)]^T$. Then $\widehat h = \frac{dr}{d\theta} = \Pi \frac{dp}{d\theta} = \Pi h$.

Lemma 2 [1] The Cramér-Rao lower bound for estimating $\theta$ with observations on $w_k$ is
$$ \widehat\sigma^2_{CR}(N, m) = \Big[ N \sum_{i=1}^{m+1} \frac{\widehat h_i^2}{r_i} \Big]^{-1}. $$
Define $D_p = \mathrm{diag}(p_1, p_2, \cdots, p_{m+1})$, $D_r = \mathrm{diag}(r_1, r_2, \cdots, r_{m+1})$, $S_p = D_p^{1/2}$, and $S_r = D_r^{1/2}$. Then $\sum_{i=1}^{m+1} h_i^2/p_i = h^T D_p^{-1} h$ and $\sum_{i=1}^{m+1} \widehat h_i^2/r_i = \widehat h^T D_r^{-1} \widehat h = h^T \Pi^T D_r^{-1} \Pi h$. It follows that
$$ \sum_{i=1}^{m+1} \frac{h_i^2}{p_i} - \sum_{i=1}^{m+1} \frac{\widehat h_i^2}{r_i} = h^T (D_p^{-1} - \Pi^T D_r^{-1} \Pi) h = h^T S_p^{-1} \big[I - (S_p \Pi^T S_r^{-1})(S_r^{-1} \Pi S_p)\big] S_p^{-1} h = v^T (I - M^T M) v, $$
where
$$ v = S_p^{-1} h, \qquad M = S_r^{-1} \Pi S_p. \qquad (7) $$
2.2 Monotonicity

Lemma 3 $\gamma(M) = 1$, where $M$ is given by (7) and $\gamma(M)$ is the largest singular value of $M$.

Proof In view of (6), we have $\sum_{i=1}^{m+1} \pi_{ij} = 1$, $j = 1, 2, \cdots, m+1$. Also, by (5), $r = \Pi p$, namely, $r_i = \sum_{j=1}^{m+1} \pi_{ij} p_j$. It is standard (see [13, p.71, Eq.(2.5.8)]) that $\gamma$ is equal to the induced norm
$$ \gamma(S_r^{-1} \Pi S_p) = \sup_{\|x\|_2 = 1} \| S_r^{-1} \Pi S_p x \|_2. $$
For any $x$ with $\|x\|_2^2 = x_1^2 + x_2^2 + \cdots + x_{m+1}^2 = 1$, the elements of $w = S_r^{-1} \Pi S_p x$ can be expressed as
$$ w_i = \frac{\sum_{j=1}^{m+1} \pi_{ij} \sqrt{p_j}\, x_j}{\sqrt{r_i}} = \frac{[\sqrt{p_1 \pi_{i1}}, \cdots, \sqrt{p_{m+1} \pi_{i,m+1}}]\,[\sqrt{\pi_{i1}}\, x_1, \cdots, \sqrt{\pi_{i,m+1}}\, x_{m+1}]^T}{\sqrt{r_i}}. $$
By the Cauchy-Schwarz inequality and (5),
$$ w_i^2 \le \frac{\big(\sum_{j=1}^{m+1} \pi_{ij} p_j\big)\big(\sum_{j=1}^{m+1} \pi_{ij} x_j^2\big)}{r_i} = \sum_{j=1}^{m+1} \pi_{ij} x_j^2, \qquad (8) $$
where the equality holds if and only if there exists a constant $c_i \ne 0$ such that $\pi_{ij} p_j = c_i \pi_{ij} x_j^2$ for $j = 1, 2, \cdots, m+1$. It follows that
$$ \|w\|^2 = \sum_{i=1}^{m+1} w_i^2 \le \sum_{i=1}^{m+1} \sum_{j=1}^{m+1} \pi_{ij} x_j^2 = \sum_{j=1}^{m+1} x_j^2 \sum_{i=1}^{m+1} \pi_{ij} = \sum_{j=1}^{m+1} x_j^2 = 1. \qquad (9) $$
Moreover, for $x_j = \sqrt{p_j}$, $j = 1, 2, \cdots, m+1$, $\|w\|^2 = 1$. This implies that $\gamma(S_r^{-1} \Pi S_p) = 1$.

Theorem 1 Under Assumption B, $\widehat\sigma^2_{CR}(N, m) \ge \sigma^2_{CR}(N, m)$.

Proof Lemma 3 implies that $I - M^T M \ge 0$. As a result, $v^T (I - M^T M) v \ge 0$. Hence, $\sum_{i=1}^{m+1} h_i^2/p_i \ge \sum_{i=1}^{m+1} \widehat h_i^2/r_i$. Consequently,
$$ \widehat\sigma^2_{CR} = \Big[ N \sum_{i=1}^{m+1} \frac{\widehat h_i^2}{r_i} \Big]^{-1} \ge \Big[ N \sum_{i=1}^{m+1} \frac{h_i^2}{p_i} \Big]^{-1} = \sigma^2_{CR}. $$
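Lemma 3 and Theorem 1 are easy to check numerically. The sketch below builds $M$ from a randomly generated column-stochastic $\Pi$ and an assumed $p$, then compares the two CR bounds; all values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
m1 = 4                                    # m + 1 sensor levels
Pi = rng.random((m1, m1))
Pi /= Pi.sum(axis=0)                      # columns sum to 1, as required by (6)
p = rng.dirichlet(np.ones(m1))            # strictly positive p (Assumption B)
r = Pi @ p                                # (5)
Sp, Sr = np.diag(np.sqrt(p)), np.diag(np.sqrt(r))
M = np.linalg.inv(Sr) @ Pi @ Sp           # (7)
print(np.linalg.svd(M, compute_uv=False).max())  # Lemma 3: equals 1

h = rng.standard_normal(m1)
h -= h.mean()                             # any h with 1^T h = 0
h_hat = Pi @ h
N = 1000
sigma2 = 1.0 / (N * np.sum(h**2 / p))             # Lemma 1
sigma2_hat = 1.0 / (N * np.sum(h_hat**2 / r))     # Lemma 2
assert sigma2_hat >= sigma2                       # Theorem 1
```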
Figure 2 demonstrates the convergence rates of parameter estimates that use either the observations at the sensor site (before communication) or those after communication. In this example, the disturbances are Gaussian with zero mean and variance $\sigma^2 = 625$. The true parameter value is $\theta = 100$. The sensor threshold is $C = 100$ (this is in fact the optimal threshold). The communication channel is a discrete symmetric channel (DSC) with error probability $p_e = 0.25$.

Figure 2 Convergence rates of estimates with observations before or after communications (estimation based on observations before communication, estimation based on observations after communication, and the true value $\theta = 100$, over 1000 time steps)
2.3 CR Ratio of Communication Channels

From $\sigma^2_{CR} = \frac{1}{N h^T D_p^{-1} h}$ and $\widehat\sigma^2_{CR} = \frac{1}{N h^T \Pi^T D_r^{-1} \Pi h}$, we define the error ratio of a communication channel by
$$ \eta(p, h, \Pi) = \frac{\widehat\sigma^2_{CR}}{\sigma^2_{CR}} = \frac{h^T D_p^{-1} h}{h^T \Pi^T D_r^{-1} \Pi h}. \qquad (10) $$
Here $h$ depends on the actual function forms, which satisfy only $\mathbf{1}^T h = 0$. Since $h$ is not part of the communication channel, we introduce the following concept to characterize the worst-case impact of a communication channel on identification accuracy.

Definition 1 The CR ratio of a communication channel is defined as the worst-case error ratio
$$ \eta(p, \Pi) = \max_{h \ne 0} \eta(p, h, \Pi) \quad \text{s.t.} \quad \mathbf{1}^T h = 0. \qquad (11) $$

Definition 2 A communication channel is said to be degenerate if all singular values of $M$ are equal to 1.

Remark 2 Theorem 1 implies that $\eta(p, \Pi) \ge 1$. If $M$ is degenerate, then $M^T M = I$. As a result,
$$ \sum_{i=1}^{m+1} \frac{h_i^2}{p_i} - \sum_{i=1}^{m+1} \frac{\widehat h_i^2}{r_i} = v^T (I - M^T M) v = 0, $$
and $\widehat\sigma^2_{CR}(N, m) = \sigma^2_{CR}(N, m)$. This is the case when the channel does not introduce uncertainty and $\eta(p, \Pi) = 1$.

Theorem 2 Under Assumption B, if the channel is not degenerate, then $\eta(p, \Pi) = \gamma^2(M^{-1})$, where $\gamma$ is the largest singular value.

The following basic relationships will be used in the derivation: $r = \Pi p$, $\mathbf{1}^T \Pi = \mathbf{1}^T$, $\mathbf{1}^T \Pi^{-1} = \mathbf{1}^T$, $\mathbf{1}^T D_p = p^T$, and $\mathbf{1}^T D_r = r^T$.

Proof By defining $v = S_r^{-1} \Pi h$, we have $h = \Pi^{-1} S_r v$, $\mathbf{1}^T \Pi^{-1} S_r v = 0$, and
$$ \eta(p, h, \Pi) = \eta_1(p, v) = \frac{v^T S_r \Pi^{-T} S_p^{-1} S_p^{-1} \Pi^{-1} S_r v}{v^T v} = \frac{v^T M^{-T} M^{-1} v}{v^T v}. $$
It follows that (11) can be equivalently expressed as
$$ \eta(p, \Pi) = \max_{v^T v = 1} v^T M^{-T} M^{-1} v \quad \text{s.t.} \quad \mathbf{1}^T \Pi^{-1} S_r v = \mathbf{1}^T S_r v = 0. $$
Apparently, since $\eta(p, \Pi)$ is a conditional maximization, $\eta(p, \Pi) \le \gamma^2(M^{-1})$. On the other hand, since the communication channel is not degenerate, $\gamma^2(M^{-1}) > 1$. Let $\alpha \in \mathbb{R}$ and $v \in \mathbb{R}^{m+1}$ be an eigenvalue/eigenvector pair for
$$ M^{-T} M^{-1} v = \alpha v. \qquad (12) $$
We will show that if $\alpha > 1$, then $v$ satisfies $\mathbf{1}^T S_r v = 0$. This will prove that $\eta(p, \Pi) = \gamma^2(M^{-1}) > 1$. Equation (12) can be expressed as $v = \alpha M M^T v$. This implies, noting $M = S_r^{-1} \Pi S_p$, that $S_r v = \alpha S_r M M^T v = \alpha \Pi D_p \Pi^T S_r^{-1} v$. It follows that
$$ \mathbf{1}^T S_r v = \alpha \mathbf{1}^T \Pi D_p \Pi^T S_r^{-1} v = \alpha p^T \Pi^T S_r^{-1} v = \alpha r^T S_r^{-1} v = \alpha \mathbf{1}^T D_r S_r^{-1} v = \alpha \mathbf{1}^T S_r v. \qquad (13) $$
Since $\alpha > 1$, this implies $\mathbf{1}^T S_r v = 0$, as claimed.

2.4 CR Ratio and Channel Capacity

The CR ratio characterizes the accuracy of information during communication. It is different, in both its meaning and its numerical values, from Shannon's channel capacity $C$, which defines a channel's capability in passing a flow of information. For example, when $\Pi$ is not invertible, the accuracy of information for system identification is lost, but the channel may still allow data to flow through it. In this case, the channel capacity $C > 0$, but the CR ratio $\eta(p, \Pi) = \infty$ for any $p$. However, the CR ratio $\eta(p, \Pi)$ and the mutual information (the channel capacity $C$ is the maximum mutual information) are closely related. This can be seen clearly from the typical BSC (binary symmetric channel), a special type of DMC (discrete memoryless channel). A BSC is characterized by the probability transition matrix
$$ \Pi = \begin{bmatrix} 1 - p_e & p_e \\ p_e & 1 - p_e \end{bmatrix}, \qquad (14) $$
where $p_e$ is the transmission error probability. Both the CR ratio and the mutual information are functions of the input probability $p$. It can be shown that under DSC, the inverse CR ratio is a monotone function of the mutual information.
Since the actual code probability into the channel can be modified by source coding, we introduce the optimal CR ratio as $\eta(\Pi) = \min_p \eta(p, \Pi)$, where $p$ satisfies $p_i > 0$ and $\sum_{i=1}^{m+1} p_i = 1$. The optimal CR ratio indicates the optimal information transfer when the source code is optimally designed. Under DSC, the CR ratio achieves its minimum value when the mutual information achieves the channel capacity. For example, for $p_e = 0.1$, $C = 0.3681$ and $\eta(\Pi) = 1.25$; for $p_e = 0.25$, $C = 0.1308$ and $\eta(\Pi) = 2$; for $p_e = 0.4$, $C = 0.0201$ and $\eta(\Pi) = 5$. In general, if $C = 0$, it can be shown that $\Pi$ is singular and $\eta(\Pi) = \infty$.
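Since (10) is an explicit formula, the error ratio for a given input probability $p$ and sensitivity vector $h$ can be evaluated directly. The sketch below does this for the BSC (14); the chosen $p$ and $h$ are illustrative assumptions, so the printed value is a pointwise $\eta(p, h, \Pi)$, not the optimized $\eta(\Pi)$ quoted above.

```python
import numpy as np

def error_ratio(p, h, Pi):
    """Error ratio (10): (h^T Dp^{-1} h) / (h^T Pi^T Dr^{-1} Pi h)."""
    r = Pi @ p                    # receiver-side probabilities (5)
    num = h @ (h / p)             # h^T Dp^{-1} h
    Ph = Pi @ h
    den = Ph @ (Ph / r)           # h^T Pi^T Dr^{-1} Pi h
    return num / den

pe = 0.25
Pi = np.array([[1 - pe, pe],
               [pe, 1 - pe]])     # BSC transition matrix (14)
p = np.array([0.5, 0.5])          # assumed input probability
h = np.array([1.0, -1.0])         # any h with 1^T h = 0
print(error_ratio(p, h, Pi))
```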
3 Parameter Identification with Unknown Communication Channels

3.1 Basic Relationships

We start with a basic type of communication channel to study the impact of communication channels on system identification. Suppose the transition probability matrix of a memoryless symmetric binary channel is given by (14). However, different from the last section, here the error probability $p_e$ is a constant but unknown. If a sensor is binary valued, then the probability $p = P\{s_k = 1\}$ and $r$ are related by $r = (1 - p_e)p + p_e(1 - p)$. Note that $p = F(C - \theta u)$ when $u$ is a constant input. If $p_e$ is unknown, one cannot identify $\theta$ from this relationship alone. We shall consider two possible solutions.

Note that for this channel, the channel capacity is $C = 1 - h(p_e)$, where $h(x) = -x \log_2 x - (1 - x) \log_2 (1 - x)$. When $p_e = 0.5$, $C = 0$. In this case, $r = (1 - p_e)p + p_e(1 - p) = 0.5$. That is, $r$ does not provide any information on $p$, and in turn, on $\theta$. In the following we assume $p_e < 0.5$.

1) Binary Sensors with Input Design. More information on $\theta$ can be provided by using more input values. Consider a 2-periodic signal taking the values $u_1$ and $u_2$ within one period, $u_1 \ne u_2$. This leads to the relationships
$$ r_i = (1 - p_e) F(C - \theta u_i) + p_e (1 - F(C - \theta u_i)), \qquad i = 1, 2. $$
Eliminating $p_e$ from these equations, we obtain
$$ \widetilde H(\theta, r_1, r_2) = r_1 - r_2 - [F(C - \theta u_1) - F(C - \theta u_2)] + 2[F(C - \theta u_1) r_2 - F(C - \theta u_2) r_1] = 0. \qquad (15) $$

2) Sensors with Multiple Thresholds. More information on $\theta$ can also be provided by using more sensor thresholds. Consider a sensor with two thresholds $C_1$ and $C_2$, and suppose $u \equiv 1$. This leads to the relationships
$$ r_i = (1 - p_e) F(C_i - \theta) + p_e (1 - F(C_i - \theta)), \qquad i = 1, 2. $$
Eliminating $p_e$ from these equations, we redefine $\widetilde H(\cdot)$ as
$$ \widetilde H(\theta, r_1, r_2) = r_1 - r_2 - [F(C_1 - \theta) - F(C_2 - \theta)] + 2[F(C_1 - \theta) r_2 - F(C_2 - \theta) r_1] = 0. \qquad (16) $$
In both cases, the relationship can be generically expressed as
$$ H(\theta, r_1, r_2) = 0. \qquad (17) $$
This will be solved by stochastic approximation algorithms; see [14] and [15] for general references on stochastic approximation. The estimates are computed under a fixed $p_e$. In the last part of this section, we briefly remark on how $p_e$ can alternatively be estimated.
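A small sketch of the input-design relationship (15), assuming a Gaussian noise distribution, is given below; evaluating $\widetilde H$ at candidate $\theta$ values is the basis of the root-finding in Section 3.3. The helper name `H_tilde` and all numerical values are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def H_tilde(theta, r1, r2, C, u1, u2, F):
    """Equation (15): its root in theta identifies the parameter without knowing p_e."""
    F1, F2 = F(C - theta * u1), F(C - theta * u2)
    return r1 - r2 - (F1 - F2) + 2.0 * (F1 * r2 - F2 * r1)

F = norm(scale=25.0).cdf
theta_true, C, u1, u2, pe = 100.0, 100.0, 1.0, 10.0, 0.25
r = [(1 - pe) * F(C - theta_true * u) + pe * (1 - F(C - theta_true * u))
     for u in (u1, u2)]
print(H_tilde(theta_true, r[0], r[1], C, u1, u2, F))  # ~0 at the true parameter
```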
3.2 Parameter Sensitivities

To gain insight into suitable algorithms for solving (17), we plot the function shapes in Figure 3. The true parameter value is $\theta_{true} = 100$, and $C = 100$, $u_1 = 1$, $u_2 = 10$. Then $r_1$ and $r_2$ are calculated from
$$ r_i^{true} = (1 - p_e) F(C - \theta_{true} u_i) + p_e (1 - F(C - \theta_{true} u_i)), \qquad i = 1, 2, $$
where $F$ is a normal distribution function with zero mean and $\sigma^2 = 625$. Under the calculated $r_1^{true}$ and $r_2^{true}$, we plot $\widetilde H(\theta, r_1^{true}, r_2^{true})$ as a function of $\theta$ for three values of the transmission error probability, $p_e = 0.1, 0.25, 0.4$, in Figure 3. One may also observe that for a large transmission error probability, the sensitivity of $\theta$ with respect to the function values near the true parameter increases (the function flattens), rendering the problem more difficult numerically.
Figure 3 The function $\widetilde H(\theta, r_1^{true}, r_2^{true})$ under different transmission error probabilities $p_e = 0.1, 0.25, 0.4$. The sensor threshold is $C = 100$

Figure 4 The function $\widetilde H(\theta, r_1^{true}, r_2^{true})$ under transmission error probability $p_e = 0.25$ and different sensor thresholds $C = 70, 85, 100$
It should be pointed out that threshold selection is an important aspect. To illustrate, we use the same system parameters and input selection, except that the threshold takes the values $C = 70, 85, 100$. Figure 4 shows that near the true parameter, the sensitivity changes significantly with the threshold values.

3.3 Stochastic Approximation Type Algorithms

Assume that for each $\theta$, the noisy observation $\{w_k\} = \{w_k^\theta\}$ is a sequence of stationary random variables with $\theta$-dependent probabilities $P^\theta(\cdot)$. Then for each $\theta$, $r_i = P^\theta\{w_k = 1, u_i \text{ being used}\}$, $i = 1, 2$, are the corresponding probabilities. To proceed, let $r_1$ and $r_2$ be estimated by their corresponding empirical measures $\zeta_N^1$ and $\zeta_N^2$. That is, for each fixed $\theta$, let
$$ \zeta_N^i(\theta) = \frac{1}{N} \sum_{k=1}^{N} I_{\{w_k^\theta = 1,\ u_i\ \text{being used}\}}, \qquad i = 1, 2. \qquad (18) $$
Then the parameter estimate is updated by
$$ \theta_{N+1} = \theta_N - \varepsilon_N H(\theta_N, \zeta_N^1(\theta_N), \zeta_N^2(\theta_N)), \qquad (19) $$
where $\varepsilon_N > 0$ is the step size. Typical selections of $\varepsilon_N$ include a small constant step size or $\varepsilon_N = \beta/N^\alpha$ for some $\beta > 0$ and $0 < \alpha < 1$. In what follows, for definiteness, we work with a sequence of decreasing step sizes satisfying $\varepsilon_N \ge 0$, $\varepsilon_N \to 0$ as $N \to \infty$, and $\sum_j \varepsilon_j = \infty$
throughout the rest of the section. Convergence of the algorithm (19) can be established by the ODE approach: consider the limiting differential equation
$$ \frac{d\theta}{dt} = -H(\theta, r_1(\theta), r_2(\theta)). \qquad (20) $$
Assumption C For each fixed $\theta$, the sequence $\{w_k^\theta\}$ is a stationary $\phi$-mixing sequence with probabilities $P^\theta(\cdot)$ that are continuous with respect to $\theta$. The ODE (20) has a unique asymptotically stable point $\theta_{true}$.

Remark 3 The shapes of the functions in Figure 4 indicate that $\theta_{true}$ is the unique equilibrium point of (20) and that the function satisfies $(\theta - \theta_{true}) H(\theta, r_1(\theta), r_2(\theta)) > 0$ for $\theta \ne \theta_{true}$. The global asymptotic stability of the equilibrium point $\theta_{true}$ of (20) then follows since $V = (\theta - \theta_{true})^2$ is a radially unbounded strict Lyapunov function for the equilibrium:
$$ \frac{dV}{dt} = -2(\theta - \theta_{true}) H(\theta, r_1(\theta), r_2(\theta)) < 0, \qquad \theta \ne \theta_{true}. $$
Thus, Assumption C is verified.

Theorem 3 Under Assumption C, the sequence of iterates $\{\theta_N\}$ defined in (19) is convergent in that $\theta_N \to \theta_{true}$ w.p.1 as $N \to \infty$.

Proof In view of the forms of $H(\cdot)$ given in (15) and (16), it is easily verified that $H(\cdot)$ is continuous with respect to its arguments. Define interpolations of the iterates as
$$ t_N = \sum_{i=1}^{N} \varepsilon_i, \quad \theta^0(t) = \theta_N \ \text{for}\ t \in [t_N, t_{N+1}), \quad \theta^N(t) = \theta^0(t + t_N), \quad m(t) = \max\{N : t_N \le t\}. $$
Then it can be verified that $\{\theta^N(\cdot)\}$ is uniformly bounded and equicontinuous in the extended sense (see [15, p.102]). Thus we can extract a convergent subsequence; do so and still denote the subsequence by $\{\theta^N(\cdot)\}$ for notational simplicity. We proceed to characterize the limit process. To this end, observe that the w.p.1 limit of $\theta^N(\cdot)$ should coincide with the weak limit (in the sense of weak convergence of probability measures) of $\theta^N(\cdot)$, which is easier to obtain. Thus we need only characterize the limit process through its weak limit. Denote the index set $\chi = \{\ell : t \le t_{m_\ell} < t_{m_{\ell+1}-1} \le t + s\}$. Let $\Delta_N$ be a sequence of positive real numbers satisfying $\Delta_N \to 0$, and select an increasing sequence $\{m_\ell(N)\}$ such that $m(t_N + t) = m_1(N) < m_2(N) < \cdots \le m(t_N + t + s) - 1$, and such that for $m(t_N + t) \le m_\ell \le m_{\ell+1} \le m(t_N + t + s) - 1$,
$$ \frac{1}{\Delta_N} \sum_{j=m_\ell(N)}^{m_{\ell+1}(N)-1} \varepsilon_j \to 1 \quad \text{as}\ N \to \infty. $$
In what follows, denote $m_\ell(N)$ by $m_\ell$ for simplicity. For any bounded and continuous function $\rho(\cdot)$, any positive integer $\kappa$, and any $0 < t_i \le t$ with $i \le \kappa$, we have
$$ E\rho(\theta^N(t_i), i \le \kappa)\,[\theta^N(t+s) - \theta^N(t)] = -E\rho(\theta^N(t_i), i \le \kappa) \sum_{j=m(t_N+t)}^{m(t_N+t+s)-1} \varepsilon_j H(\theta_j, \zeta_j^1(\theta_j), \zeta_j^2(\theta_j)) = -E\rho(\theta^N(t_i), i \le \kappa) \sum_{\ell \in \chi} \sum_{j=m_\ell}^{m_{\ell+1}-1} E_{m_\ell} \varepsilon_j H(\theta_j, \zeta_j^1(\theta_j), \zeta_j^2(\theta_j)), \qquad (21) $$
where $E_{m_\ell}$ denotes the conditional expectation with respect to $\mathcal{F}_{m_\ell}$, the $\sigma$-algebra generated by the past data up to $m_\ell$. The continuity of $H(\cdot)$ with respect to its arguments and the weak continuity of $\zeta_j^i(\theta)$ with respect to $\theta$ then yield that the last line of (21) can be
replaced by
$$ -E\rho(\theta^N(t_i), i \le \kappa) \sum_{\ell \in \chi} \sum_{j=m_\ell}^{m_{\ell+1}-1} E_{m_\ell} \varepsilon_j H(\theta_{m_\ell}, \zeta_j^1(\theta_{m_\ell}), \zeta_j^2(\theta_{m_\ell})) = -E\rho(\theta^N(t_i), i \le \kappa) \sum_{\ell \in \chi} \Delta_N \frac{1}{\Delta_N} \sum_{j=m_\ell}^{m_{\ell+1}-1} \varepsilon_j E_{m_\ell} H(\theta^N(u), \zeta_j^1(\theta^N(u)), \zeta_j^2(\theta^N(u))) $$
for $t_{m_\ell} \le u < t_{m_{\ell+1}}$. Approximating $\theta^N(u)$ by a finite-valued process as in [15, p.168 and p.257], and noting that
$$ \frac{1}{m_{\ell+1} - m_\ell} \sum_{j=m_\ell}^{m_{\ell+1}-1} H(\theta, \zeta_j^1(\theta), \zeta_j^2(\theta)) \to H(\theta, r_1(\theta), r_2(\theta)) $$
for each fixed $\theta$, by the well-known Glivenko-Cantelli theorem (since the $\phi$-mixing property implies ergodicity), enables us to conclude that $\theta(\cdot)$, the limit of $\theta^N(\cdot)$, satisfies
$$ E\rho(\theta(t_i), i \le \kappa)\Big[\theta(t+s) - \theta(t) + \int_t^{t+s} H(\theta(u), r_1(\theta(u)), r_2(\theta(u)))\, du\Big] = 0. $$
That is, the limit satisfies the ODE (20). Finally, $\theta_N \to \theta_{true}$ follows from a stability argument. The details are omitted for brevity; the reader is referred to [15] for further information.

Figure 5 shows simulation results of the algorithm (19) under the conditions that the noise is normally distributed with zero mean and variance 625, $C = 100$, $u_1 = 1$, and $u_2 = 10$. The step size is chosen as $\varepsilon_N = 10/N^{0.3}$. The initial estimate is selected as $\theta_0 = 140$.
Figure 5 Parameter estimation with unknown error probability of communication channels: stochastic approximation algorithm with step size $\varepsilon_N = 10/N^{0.3}$; the estimates start at $\theta_0 = 140$ and converge toward $\theta_{true} = 100$ over 1000 time steps
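A minimal simulation sketch of algorithm (19) under the stated settings follows. The channel error probability $p_e$ is used only to generate data, never by the estimator; the strict alternation of $u_1$ and $u_2$ within each period and the per-input normalization of the empirical measures are implementation assumptions.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)

def F(x):
    """Gaussian noise cdf with standard deviation 25 (variance 625)."""
    return 0.5 * (1.0 + erf(x / (25.0 * sqrt(2.0))))

def H_tilde(theta, z1, z2, C, u1, u2):
    """Equation (15) with empirical measures in place of r1, r2."""
    F1, F2 = F(C - theta * u1), F(C - theta * u2)
    return z1 - z2 - (F1 - F2) + 2.0 * (F1 * z2 - F2 * z1)

theta_true, C, u1, u2, pe = 100.0, 100.0, 1.0, 10.0, 0.25
theta, counts = 140.0, np.zeros(2)
for N in range(1, 1001):
    for i, u in enumerate((u1, u2)):             # 2-periodic input
        s = float(theta_true * u + 25.0 * rng.standard_normal() <= C)  # binary sensor
        w = s if rng.random() > pe else 1.0 - s  # BSC flips with probability pe
        counts[i] += w
    z1, z2 = counts / N                          # empirical frequencies of w = 1, cf. (18)
    theta -= (10.0 / N**0.3) * H_tilde(theta, z1, z2, C, u1, u2)  # update (19)
print(theta)  # approaches theta_true = 100, cf. Figure 5
```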
Remark 4 Here the estimation is obtained by use of empirical distributions of the process related to ri (θ). An alternative approach is to estimate Π , the transition probability matrix of the communication channel, using a Wonham filtering technique. Due to the page limitation, the details are omitted; references on Wonham filtering can be found in [16, Chapter 8].
4 Conclusions

Communication channels have a significant impact on identification accuracy. This paper introduces the concept of CR ratios to investigate this impact from a complexity viewpoint: the CR lower bounds increase when communication errors increase. The relation of the CR ratio to the Shannon channel capacity is discussed. Algorithms are presented to identify system
parameters when the error probability of communication channels is unknown. Generalizations of these findings to general quantization schemes and other communication channel models remain open issues at present.

References

[1] L. Y. Wang and G. Yin, Asymptotically efficient parameter estimation using quantized output observations, Automatica, to appear.
[2] L. Y. Wang, G. Yin, and J. F. Zhang, System identification using quantized data, in SYSID 2006, Newcastle, Australia, March 2006.
[3] L. Y. Wang, J. F. Zhang, and G. Yin, System identification using binary sensors, IEEE Trans. Automat. Control, 2003, 48: 1892–1907.
[4] L. Y. Wang, G. Yin, and J. F. Zhang, Joint identification of plant rational models and noise distribution functions using binary-valued observations, Automatica, 2006, 42: 535–547.
[5] Y. L. Zhao, L. Y. Wang, G. Yin, and J. F. Zhang, Identification of Wiener models with binary valued output observations, Automatica, to appear.
[6] T. Wigren, Adaptive filtering using quantized output measurements, IEEE Trans. Signal Processing, 1998, 46: 3423–3426.
[7] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, 1992.
[8] K. Sayood, Introduction to Data Compression (2nd Ed.), Morgan Kaufmann, 2000.
[9] E. R. Caianiello and A. de Luca, Decision equation for binary systems: application to neural behavior, Kybernetik, 1966, 3: 33–40.
[10] J. Sun, Y. Kim, and L. Y. Wang, HEGO signal processing and strategy adaptation for improved performance in lean burn engines with a lean NOx trap, Int. J. of Adaptive Control and Signal Processing, 2004, 18(2): 145–166.
[11] X. Liu and A. Goldsmith, Wireless communication tradeoffs in distributed control, in 42nd IEEE Conference on Decision and Control, 2003, 1: 688–694.
[12] A. M. Sayeed, A signal modeling framework for integrated design of sensor networks, in IEEE Workshop on Statistical Signal Processing, 28 Sept.–1 Oct., 2003.
[13] L. Xiao, M. Johansson, H. Hindi, S. Boyd, and A. Goldsmith, Joint optimization of communication rates and linear systems, IEEE Trans. Automatic Control, 2003, 48(1): 148–153.
[14] H.-F. Chen, Stochastic Approximation and Its Applications, Kluwer Academic, Dordrecht, Netherlands, 2002.
[15] H. J. Kushner and G. Yin, Stochastic Approximation and Recursive Algorithms and Applications (2nd Ed.), Springer-Verlag, New York, 2003.
[16] G. Yin and Q. Zhang, Discrete-time Markov Chains: Two-Time-Scale Methods and Applications, Springer, New York, 2005.