This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSP.2014.2345635, IEEE Transactions on Signal Processing
Convergence Analysis of the Variance in Gaussian Belief Propagation

Qinliang Su and Yik-Chung Wu
Abstract: It is known that Gaussian belief propagation (BP) is a low-complexity algorithm for (approximately) computing the marginal distributions of a high-dimensional Gaussian distribution. However, in loopy factor graphs, it is important to determine whether Gaussian BP converges. In general, the convergence conditions for Gaussian BP variances and means are not necessarily the same, and this paper focuses on the convergence condition of the Gaussian BP variances. In particular, by describing the message-passing process of Gaussian BP as a set of updating functions, the necessary and sufficient convergence condition of the Gaussian BP variances is derived under both synchronous and asynchronous schedulings, with the converged variances proved to be independent of the initialization as long as it is chosen from the proposed set. The necessary and sufficient convergence condition is further expressed in the form of a semi-definite programming (SDP) optimization problem, and thus can be verified more efficiently than the existing convergence condition based on the computation tree. The relationship between the proposed convergence condition and the existing one based on the computation tree is also established analytically. Numerical examples are presented to corroborate the established theories.
Index Terms Gaussian belief propagation, loopy belief propagation, graphical model, factor graph, message passing, sum-product algorithm, convergence.
Copyright (c) 2014 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to
[email protected]. This work was supported in part by the HKU seed Funding Programme, Project No. 201311159066. The authors are with the Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, (e-mail:
[email protected],
[email protected]).
July 31, 2014
DRAFT
1053-587X (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
I. INTRODUCTION

In signal processing and machine learning, many problems ultimately reduce to computing the marginal mean and variance of an individual random variable from a high-dimensional joint Gaussian probability density function (PDF). A direct way of marginalization involves computing the inverse of the precision matrix in the joint Gaussian PDF. The inverse operation is computationally expensive for a high-dimensional matrix, and it introduces heavy communication overhead when carried out in distributed scenarios. By representing the joint PDF with a factor graph [1], Gaussian BP provides an alternative way to (approximately) calculate the marginal mean and variance of each individual random variable by passing messages between neighboring nodes in the factor graph [2]–[5]. It is known that if the factor graph has a tree structure, the belief means and variances calculated with Gaussian BP converge to the true marginal means and variances, respectively [1]. For a factor graph with loops, if the belief means and variances in Gaussian BP converge, the true marginal means and approximate marginal variances are obtained [6]. Recently, a novel message-passing algorithm was proposed in [7] for inference in Gaussian graphical models by choosing a special set of nodes to break the loops in the graph. Although this algorithm computes the true marginal means and improves the accuracy of the variances, it still needs Gaussian BP as an underlying inference algorithm. Thus, in this paper, we focus on the message-passing algorithm of Gaussian BP only.

With the ability to provide the true marginal means upon convergence, Gaussian BP has been successfully applied to low-complexity detection and estimation problems arising in communication systems [8]–[10], fast solvers for large sparse linear systems [11], [12], sparse Bayesian learning [13], estimation in Gaussian graphical models [14], etc.
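As a concrete illustration of the direct route just described, the sketch below marginalizes a small Gaussian by explicit matrix inversion in plain Python. The 3-variable precision matrix P and coefficient vector h are illustrative assumptions, not values taken from this paper.

```python
# Direct marginalization of a Gaussian: mu = P^{-1} h, sigma_i^2 = [P^{-1}]_ii.
# The 3x3 precision matrix below is a hypothetical example (positive definite).

def inverse(A):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]          # augment [A | I]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]       # partial pivoting
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]             # right half is A^{-1}

P = [[3.0, -1.0, -1.0],
     [-1.0, 3.0, -1.0],
     [-1.0, -1.0, 3.0]]      # precision matrix (hypothetical)
h = [1.0, 1.0, 1.0]          # linear coefficient vector (hypothetical)

Pinv = inverse(P)
mu = [sum(Pinv[i][k] * h[k] for k in range(3)) for i in range(3)]
var = [Pinv[i][i] for i in range(3)]
print(mu)    # each marginal mean is approx. 1.0 for this symmetric example
print(var)   # each marginal variance is approx. 0.5
```

The O(N^3) inversion is exactly the computation (and, in distributed settings, the communication) that Gaussian BP is intended to avoid.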
In addition, the distributed nature inherited from message-passing algorithms is particularly attractive for applications requiring distributed implementation, such as distributed beamforming [15], inter-cell interference mitigation [16], distributed synchronization and localization in wireless sensor networks [17]–[19], distributed energy-efficient self-deployment in mobile sensor networks [20], distributed rate control in ad hoc networks [21], and distributed network utility maximization [22]. Moreover, Gaussian BP has also been exploited to provide approximate marginal variances for large-scale sparse Bayesian learning in a computationally efficient way [23]. However, Gaussian BP only works under the prerequisite that the belief means and variances
calculated from the updating messages do converge. So far, several sufficient convergence conditions have been proposed, which guarantee that the means and variances converge simultaneously [6], [24], [25]. In general, however, the belief means and variances do not necessarily converge under the same condition. It is reported in [24] that if the variances converge, convergence of the means can always be observed when suitable damping is imposed. Thus, it is important to ensure that the variances of Gaussian BP converge in the first place. In the pioneering work [24], the message-passing procedure of Gaussian BP in a loopy factor graph is equivalently translated into one in a loop-free computation tree [26]. Based on the computation tree, an almost necessary and sufficient convergence condition for the variances is derived. It is proved that if the spectral radius of the infinite-dimensional precision matrix of the computation tree is smaller than one, the variances converge; and if this spectral radius is larger than one, Gaussian BP becomes ill-posed. However, this condition is only 'almost necessary and sufficient', since it does not cover the scenario in which the spectral radius is equal to one. More crucially, this convergence condition requires evaluating the spectral radius of an infinite-dimensional matrix, which is almost impossible to calculate in practice.

In this paper, using the fact that the messages in Gaussian BP can be represented by linear and quadratic parameters, the message-passing process of Gaussian BP is described as a set of updating functions of these parameters. Based on the monotonically non-decreasing and concave properties of the updating functions, the necessary and sufficient convergence condition of the messages' quadratic parameters is derived first.
Then, using the relation between the BP messages' quadratic parameters and the belief variances, the necessary and sufficient convergence condition of the belief variances is developed under both synchronous and asynchronous schedulings. The initialization set under which the variances are guaranteed to converge is proposed as well. The convergence condition derived in this paper is proved to be equivalent to a semi-definite programming (SDP) problem, and thus can be easily verified. Furthermore, the relationship between the proposed convergence condition and the existing one based on the computation tree is established. Our result fills in the missing part of the convergence condition in [24] for the case in which the spectral radius of the infinite-dimensional matrix is equal to one.

The rest of this paper is organized as follows. Gaussian BP is reviewed in Section II. Section III analyzes the updating process of Gaussian BP messages, followed by the necessary and sufficient convergence condition of the quadratic parameters in the messages in Section IV. The
derivation of the necessary and sufficient convergence condition of the belief variances is presented in Section V. The relationship between the condition proposed in this paper and the existing one based on the computation tree is established in Section VI. Numerical examples are presented in Section VII, followed by conclusions in Section VIII.

The following notations are used throughout this paper. For two vectors, $x_1 \ge x_2$ and $x_1 > x_2$ mean that the inequalities hold in all corresponding elements. Notations $\lambda(G)$ and $\lambda_{\max}(G)$ represent any eigenvalue and the maximal eigenvalue of matrix $G$, respectively. Notation $\rho(G)$ denotes the spectral radius of $G$. Notations $[\cdot]_i$ and $[\cdot]_{ij}$ represent the $i$-th element of a vector and the $(i,j)$-th element of a matrix, respectively.

II. GAUSSIAN BELIEF PROPAGATION

In general, a Gaussian PDF can be written as
$$ f(x) \propto \exp\Big\{ -\frac{1}{2} x^T P x + h^T x \Big\}, \tag{1} $$
where $x = [x_1, x_2, \cdots, x_N]^T$ with $N$ being the number of random variables; $P \succ 0$ is the precision matrix with $p_{ij}$ being its $(i,j)$-th element; and $h = [h_1, h_2, \cdots, h_N]^T$ is the linear coefficient vector. In many signal processing applications, we want to find the marginalized PDF $p(x_i) = \int f(x)\, dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_N$. For a Gaussian $f(x)$, this can be done by completing the square inside the exponent of (1) as $f(x) \propto \exp\{-\frac{1}{2}(x - P^{-1}h)^T P (x - P^{-1}h)\}$, and then the marginalized PDF is obtained as
$$ p(x_i) \propto \exp\Big\{ -\frac{(x_i - \mu_i)^2}{2\sigma_i^2} \Big\}, \tag{2} $$
with the mean $\mu_i = [P^{-1}h]_i$ and the variance $\sigma_i^2 = [P^{-1}]_{ii}$. However, the required matrix inverse is computationally expensive for a large-dimensional $P$, and it introduces heavy communication overhead to the network when the information $p_{ij}$ is located distributedly.

On the other hand, the Gaussian PDF in (1) can be expanded as $f(x) \propto \prod_{i=1}^{N} f_i(x_i) \prod_{j=1}^{N} \prod_{k=j+1}^{N} f_{jk}(x_j, x_k)$, where $f_i(x_i) = \exp\{-\frac{p_{ii}}{2}x_i^2 + h_i x_i\}$ and $f_{jk}(x_j, x_k) = \exp\{-p_{jk} x_j x_k\}$. Based on this expansion, a factor graph (assumed throughout this paper to be connected) can be constructed by connecting each variable $x_i$ with its associated factors $f_i(x_i)$ and $f_{ij}(x_i, x_j)$. It is known that Gaussian BP can be applied on this factor graph by passing
messages among neighboring nodes to obtain the true marginal mean and approximate marginal variance for each variable. In Gaussian BP, the departing and arriving messages corresponding to variables $i$ and $j$ are updated as
$$ m^d_{i\to j}(x_i, t) \propto f_i(x_i) \prod_{k\in N(i)\setminus j} m^a_{k\to i}(x_i, t), \tag{3} $$
$$ m^a_{i\to j}(x_j, t+1) \propto \int f_{ij}(x_i, x_j)\, m^d_{i\to j}(x_i, t)\, dx_i, \tag{4} $$
where $(i,j)\in\mathcal E$, with $\mathcal E \triangleq \{(i,j) \mid p_{ij} \ne 0, \text{ for } i,j \in \mathcal V\}$ being the set of index pairs of any two connected variables and $\mathcal V \triangleq \{1,2,\cdots,N\}$ being the set of indices of all variables; $t$ is the time index; $N(i)$ is the set of indices of the neighboring variable nodes of node $i$, and $N(i)\setminus j$ is the set $N(i)$ excluding node $j$. After obtaining the messages $m^a_{k\to i}(x_i, t)$, the belief at variable node $i$ is equal to
$$ b_i(x_i, t) \propto f_i(x_i) \prod_{k\in N(i)} m^a_{k\to i}(x_i, t). \tag{5} $$
It has been proved that $b_i(x_i,t)$ always converges to $p(x_i)$ in (2) exactly if the factor graph is of tree structure [1], [6]. For a loopy factor graph, if $b_i(x_i,t)$ converges, the mean of $b_i(x_i,t)$ will converge to the true mean $\mu_i$, while the converged variance is an approximation of $\sigma_i^2$ [6].

Assume the arriving message is in the Gaussian form $m^a_{i\to j}(x_j,t) \propto \exp\{-\frac{v^a_{i\to j}(t)}{2}x_j^2 + \beta^a_{i\to j}(t) x_j\}$, where $v^a_{i\to j}(t)$ and $\beta^a_{i\to j}(t)$ are the arriving precision and arriving linear coefficient, respectively. Inserting it into (3), we obtain $m^d_{i\to j}(x_i,t) \propto \exp\{-\frac{v^d_{i\to j}(t)}{2}x_i^2 + \beta^d_{i\to j}(t) x_i\}$, where
$$ v^d_{i\to j}(t) = p_{ii} + \sum_{k\in N(i)\setminus j} v^a_{k\to i}(t), \tag{6} $$
$$ \beta^d_{i\to j}(t) = h_i + \sum_{k\in N(i)\setminus j} \beta^a_{k\to i}(t) \tag{7} $$
are the departing precision and linear coefficient, respectively. Furthermore, substituting the departing message $m^d_{i\to j}(x_i,t)$ into (4), we obtain
$$ m^a_{i\to j}(x_j, t+1) \propto \exp\Big\{ \frac{p_{ij}^2\, x_j^2}{2 v^d_{i\to j}(t)} - \frac{p_{ij}\,\beta^d_{i\to j}(t)}{v^d_{i\to j}(t)}\, x_j \Big\} \times \int \exp\Big\{ -\frac{v^d_{i\to j}(t)}{2}\Big( x_i - \frac{\beta^d_{i\to j}(t) - p_{ij} x_j}{v^d_{i\to j}(t)} \Big)^2 \Big\}\, dx_i. \tag{8} $$
If $v^d_{i\to j}(t) > 0$, the integration equals a constant, and thus $m^a_{i\to j}(x_j, t+1) \propto \exp\{ \frac{p_{ij}^2}{2 v^d_{i\to j}(t)} x_j^2 - \frac{p_{ij}\beta^d_{i\to j}(t)}{v^d_{i\to j}(t)} x_j \}$. Therefore, $v^a_{i\to j}(t+1)$ and $\beta^a_{i\to j}(t+1)$ are updated as
$$ v^a_{i\to j}(t+1) = \begin{cases} -\dfrac{p_{ij}^2}{v^d_{i\to j}(t)}, & \text{if } v^d_{i\to j}(t) > 0 \\ \text{not defined}, & \text{otherwise}, \end{cases} \tag{9} $$
$$ \beta^a_{i\to j}(t+1) = \begin{cases} -\dfrac{p_{ij}\,\beta^d_{i\to j}(t)}{v^d_{i\to j}(t)}, & \text{if } v^d_{i\to j}(t) > 0 \\ \text{not defined}, & \text{otherwise}, \end{cases} \tag{10} $$
where 'not defined' arises since, in the case $v^d_{i\to j}(t) \le 0$, the integration in (8) becomes infinite and the message $m^a_{i\to j}(x_j, t+1)$ loses its Gaussian form. After obtaining $v^a_{k\to i}(t)$, the
variance of the belief at each iteration is computed as
$$ \sigma_i^2(t) = \frac{1}{p_{ii} + \sum_{k\in N(i)} v^a_{k\to i}(t)}. \tag{11} $$
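For concreteness, the updates (6) and (9) and the belief-variance formula (11) can be exercised at a single node. This is a minimal sketch; the values p_ii = 3, p_ij = -1 and the incoming precisions are illustrative assumptions, not values from the paper.

```python
# One step of the Gaussian BP parameter updates (6), (9) and the belief
# variance (11) for a single node; all numbers are illustrative assumptions.

def departing_precision(p_ii, incoming):
    # eq. (6): v^d_{i->j}(t) = p_ii + sum of arriving precisions, excluding j
    return p_ii + sum(incoming)

def arriving_precision(p_ij, v_dep):
    # eq. (9): defined only while the departing precision stays positive
    if v_dep > 0:
        return -p_ij ** 2 / v_dep
    return None  # 'not defined' branch: the message loses its Gaussian form

def belief_variance(p_ii, incoming_all):
    # eq. (11): uses the arriving precisions from ALL neighbours of node i
    return 1.0 / (p_ii + sum(incoming_all))

p_ii, p_ij = 3.0, -1.0
v_in = {'k1': -0.3, 'k2': -0.35}           # v^a_{k->i}(t) from two neighbours

# message i -> k2: the sum runs over N(i)\{k2}, i.e. only k1 contributes
v_dep = departing_precision(p_ii, [v_in['k1']])
v_arr = arriving_precision(p_ij, v_dep)
print(v_dep)                                # 2.7 > 0, message stays Gaussian
print(v_arr)                                # -1/2.7, approx -0.3704
print(belief_variance(p_ii, v_in.values()))  # 1/2.35, approx 0.4255
```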
From (9) and (10), it can be seen that the messages of Gaussian BP maintain the Gaussian form only when $v^d_{i\to j}(t) > 0$ for all $t \ge 0$. Therefore, we have the following lemma.

Lemma 1. The messages of Gaussian BP are always in Gaussian form if and only if $v^d_{i\to j}(t) > 0$ for all $t \ge 0$.

III. ANALYSIS OF THE MESSAGE-PASSING PROCESS

In this section, we first analyze the updating process of $v^a_{i\to j}(t)$, and then derive the condition that guarantees the BP messages remain in Gaussian form.

A. Updating Function of $v^a_{i\to j}(t)$ and Its Properties
Under the Gaussian form of messages, substituting (6) into (9) gives
$$ v^a_{i\to j}(t+1) = -\frac{p_{ij}^2}{p_{ii} + \sum_{k\in N(i)\setminus j} v^a_{k\to i}(t)}. \tag{12} $$
By writing (12) in vector form, we obtain
$$ v^a(t+1) = g(v^a(t)), \tag{13} $$
where $g(\cdot)$ is a vector-valued function containing components $g_{ij}(\cdot)$ with $(i,j)\in\mathcal E$ arranged in ascending order first on $j$ and then on $i$, and $g_{ij}(\cdot)$ is defined as
$$ g_{ij}(w) \triangleq -\frac{p_{ij}^2}{p_{ii} + \sum_{k\in N(i)\setminus j} w_{ki}}; \tag{14} $$
7 a va (t) and w are vectors containing elements vi→j (t) and wij , respectively, both with (i, j) ∈ E
arranged in ascending order first on j and then on i. Moreover, define the set { } ∑ W , w pii + wki > 0, ∀ (i, j) ∈ E .
(15)
k∈N (i)\j
Now, we have the following proposition about g(·) and W. Proposition 1. The following claims hold: P1) For any w1 ∈ W, if w1 ≤ w2 , then w2 ∈ W; P2) For any w1 ∈ W, if w1 ≤ w2 , then g(w1 ) ≤ g(w2 ); P3) gij (w) is a concave function with respect to w ∈ W. Proof: Consider two vectors w ¯ and w ˆ in W, which contain elements w¯ki and wˆki with (k, i) ∈ E arranged in ascending order first on k and then on i. For any w ¯ ∈ W, according to ∑ the definition of W in (15), we have pii + k∈N (i)\j w¯ki > 0. Then, if w ¯ ≤ w, ˆ it can be easily ∑ seen that pii + k∈N (i)\j wˆki > 0 as well. Thus, we have w ˆ ∈ W. The first-order derivative of gij (w) with respect to wki for k ∈ N (i)\j is computed to be p2ij ∂gij > 0. (16) = ∑ ∂wki (pii + k∈N (i)\j wki )2 Thus, gij (w) is a continuous and strictly increasing function with respect to the components wki for k ∈ N (i)\j with w ∈ W. Hence, we have if w1 ≤ w2 , then g(w1 ) ≤ g(w2 ). p2
To prove the third property, consider the simple function − piiij+r under the domain of r > −pii p2
2p2
ij first. It can be easily derived that the second order derivative of − piiij+r is − (pii +r) 3 < 0, where the
p2
inequality holds due to pii + r > 0. Thus, − piiij+r is a concave function with respect to r > −pii . p2
p2
Since gij (w) = − pii +∑ ij is the composition of the concave function − piiij+r and the k∈N (i)\j wki ∑ affine mapping k∈N (i)\j wki , the composition function gij (w) is also concave with respect to } { ∑ ∑ w pii + k∈N (i)\j wki > 0, ∀ (i, j) ∈ E as k∈N (i)\j wki > −pii [28, p. 79]. Due to W , ∑ defined in (15), then w ∈ W implies that k∈N (i)\j wki > −pii . Thus, we can infer that gij (w) is a concave function with respect to w ∈ W. Notice that convexity is also exploited in [29] to derive the convergence condition for a general min-sum message-passing algorithm, which reduces to Gaussian BP when it is applied to the particular quadratic function
1 T x Px 2
− hT x. The convexity in [29] means that the quadratic
function can be decomposed into summation of convex functions, which is different from the convexity in the updating function gij (w) here. July 31, 2014
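As a numerical illustration of the synchronous iteration (13) with the updating function (14), the sketch below runs the variance messages of Gaussian BP on a hypothetical 3-variable cycle with p_ii = 3 and p_ij = -1 (assumed values, not from the paper). Starting from v^a(0) = 0, the sequence is monotonically non-increasing, consistent with the monotonicity property P2).

```python
# Synchronous iteration v^a(t+1) = g(v^a(t)) of (13), with g_ij from (14),
# on an illustrative 3-variable cycle; p_ii = 3, p_ij = -1 are assumptions.

N = 3
p_diag = [3.0] * N
p_off = -1.0
edges = [(i, j) for i in range(N) for j in range(N) if i != j]

def g(v):
    """One sweep of eq. (14) over all directed edges (i -> j)."""
    new = {}
    for (i, j) in edges:
        s = p_diag[i] + sum(v[(k, i)] for k in range(N) if k not in (i, j))
        assert s > 0, "v^d must stay positive to keep the Gaussian form (Lemma 1)"
        new[(i, j)] = -p_off ** 2 / s
    return new

v = {e: 0.0 for e in edges}        # initialization v^a(0) = 0
for _ in range(100):
    v_next = g(v)
    assert all(v_next[e] <= v[e] + 1e-15 for e in edges)  # non-increasing
    v = v_next

sigma2 = 1.0 / (p_diag[0] + sum(v[(k, 0)] for k in range(1, N)))  # eq. (11)
print(v[(1, 0)])   # approx -0.381966 = (-3 + sqrt(5)) / 2, the fixed point
print(sigma2)      # approx 0.447214
```

Here the fixed point satisfies v = -1/(3 + v), i.e. v = (-3 + sqrt(5))/2, and direct inversion of this particular P gives a true marginal variance of 0.5, so the converged BP variance of about 0.447 is only an approximation, as noted in Section II.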
B. Condition to Maintain the Gaussian Form of Messages

First, define the following set (the simpler symbol $\mathcal S$ is reserved for later use):
$$ \mathcal S_1 \triangleq \{ w \in \mathcal W \mid w \le g(w) \}. \tag{17} $$
With the notations $g^{(t)}(w) \triangleq g\big(g^{(t-1)}(w)\big)$ and $g^{(0)}(w) \triangleq w$, the following proposition can be established.

Proposition 2. The set $\mathcal S_1$ has the following properties:
P4) $\mathcal S_1$ is a convex set;
P5) If $s\in\mathcal S_1$, then $s < 0$;
P6) If $s\in\mathcal S_1$, then $g^{(t)}(s)\in\mathcal S_1$ and $g^{(t)}(s) \le g^{(t+1)}(s)$ for all $t\ge 0$.

Proof: Since $g_{ij}(w)$ is a concave function for $w\in\mathcal W$ according to P3), $g_{ij}(w) - w_{ij}$ is also concave for $w\in\mathcal W$. The set of $w\in\mathcal W$ satisfying $g_{ij}(w) - w_{ij} \ge 0$ is a convex set [28]. Thus, $\mathcal S_1 = \{w \mid w \le g(w) \text{ with } w\in\mathcal W\} = \{w\in\mathcal W \mid g_{ij}(w) - w_{ij} \ge 0 \ \text{for all } (i,j)\in\mathcal E\}$ is also a convex set.

If $s\in\mathcal S_1$, we have $s \le g(s)$ and $s\in\mathcal W$. According to the definition of $\mathcal W$ in (15), $p_{ii} + \sum_{k\in N(i)\setminus j} s_{ki} > 0$. Putting this fact into the definition of $g_{ij}(\cdot)$ in (14), it is obvious that $g(s) < 0$. Therefore, we have the relation $s \le g(s) < 0$ for all $s\in\mathcal S_1$.

Finally, if $s\in\mathcal S_1$, we have $s \le g(s)$ and $s\in\mathcal W$. Hence, $g(s)\in\mathcal W$ according to P1). Applying $g(\cdot)$ to both sides of $s \le g(s)$ and using P2), we obtain $g(s) \le g^{(2)}(s)$. Furthermore, since $g(s)\in\mathcal W$ and $g(s) \le g^{(2)}(s)$, we also have $g(s)\in\mathcal S_1$. By induction, we can prove in general that $g^{(t)}(s)\in\mathcal S_1$ and $g^{(t)}(s) \le g^{(t+1)}(s)$ for all $t\ge 0$.

Now, we give the following theorem.

Theorem 1. There exists at least one initialization $v^a(0)$ such that the messages of Gaussian BP are maintained in Gaussian form for all $t\ge 0$ if and only if $\mathcal S_1 \ne \emptyset$.

Proof: Sufficient Condition:
If $\mathcal S_1 \ne \emptyset$, by choosing an initialization $v^a(0) \in \mathcal S_1$ and applying P6), it can be inferred that $v^a(t) \in \mathcal S_1$ for all $t \ge 0$. Moreover, according to the definitions of $\mathcal W$ in (15) and $\mathcal S_1$ in (17), we have $\mathcal S_1 \subseteq \mathcal W$, and thereby $v^a(t) \in \mathcal W$ for all $t \ge 0$, or equivalently $p_{ii} + \sum_{k\in N(i)\setminus j} v^a_{k\to i}(t) > 0$ for all $(i,j)\in\mathcal E$ according to the definition of the set $\mathcal W$ in (15). Since $v^d_{i\to j}(t) = p_{ii} + \sum_{k\in N(i)\setminus j} v^a_{k\to i}(t)$ as given in (6), we obtain $v^d_{i\to j}(t) > 0$ for all $(i,j)\in\mathcal E$ and $t \ge 0$. According to Lemma 1, it can be inferred that the messages are maintained in Gaussian form for all $t \ge 0$.

Necessary Condition: We prove the necessary condition by contradiction. When $\mathcal S_1 = \emptyset$, suppose that there exists a $\bar v^a(0)$ such that the messages are maintained in Gaussian form for all $t \ge 0$. According to Lemma 1, we can infer that $\bar v^d_{i\to j}(t) = p_{ii} + \sum_{k\in N(i)\setminus j} \bar v^a_{k\to i}(t) > 0$ for all $(i,j)\in\mathcal E$, or equivalently $\bar v^a(t) \triangleq g^{(t)}(\bar v^a(0)) \in \mathcal W$ for all $t \ge 0$ from the definition of $\mathcal W$ in (15). Now, choose another initialization $v^a(0)$ satisfying both $\bar v^a(0) \le v^a(0)$ and $v^a(0) \ge 0$. Due to $\bar v^a(0) \in \mathcal W$ and $\bar v^a(0) \le v^a(0)$, by using P2), we have $g(\bar v^a(0)) \le g(v^a(0))$, that is, $\bar v^a(1) \le v^a(1)$. Furthermore, substituting $v^a(0) \ge 0$ into (14) gives $v^a(1) \le 0$, and thereby $v^a(1) \le v^a(0)$. Combining with $\bar v^a(1) \le v^a(1)$ leads to $\bar v^a(1) \le v^a(1) \le v^a(0)$. Due to the assumption $\bar v^a(1) \in \mathcal W$, by applying P2) to $\bar v^a(1) \le v^a(1) \le v^a(0)$, we obtain $\bar v^a(2) \le v^a(2) \le v^a(1)$. Combining with the fact $v^a(1) \le 0$ proved above, we have $\bar v^a(2) \le v^a(2) \le v^a(1) \le 0$. By induction, it can be derived that
$$ \bar v^a(t+1) \le v^a(t+1) \le v^a(t) \le 0, \quad \forall\, t \ge 1. \tag{18} $$
Then, from the definition of $\mathcal W$ in (15), the assumption $\bar v^a(t) \in \mathcal W$ is equivalent to $p_{ii} + \sum_{k\in N(i)\setminus j} \bar v^a_{k\to i}(t) > 0$ for all $(i,j)\in\mathcal E$, where $\bar v^a_{k\to i}(t)$ are the elements of $\bar v^a(t)$ with $(k,i)\in\mathcal E$ arranged in the same order as $v^a(t)$. By rearranging the terms in $p_{ii} + \sum_{k\in N(i)\setminus j} \bar v^a_{k\to i}(t) > 0$, we obtain $\bar v^a_{\gamma\to i}(t) > -p_{ii} - \sum_{k\in N(i)\setminus j,\gamma} \bar v^a_{k\to i}(t)$. Applying $\bar v^a(t) \le 0$ shown in (18) to this inequality, it can be inferred that $\bar v^a_{\gamma\to i}(t) > -p_{ii} - \sum_{k\in N(i)\setminus j,\gamma} \bar v^a_{k\to i}(t) \ge -p_{ii}$, that is, $\bar v^a_{\gamma\to i}(t) > -p_{ii}$ for all $(\gamma,i)\in\mathcal E$. Combining with $v^a(t) \ge \bar v^a(t)$ shown in (18), for all $(i,j)\in\mathcal E$, we have
$$ v^a_{i\to j}(t) > -p_{jj}. \tag{19} $$
It can be seen from (18) and (19) that $v^a(t)$ is a monotonically non-increasing and lower bounded sequence. Thus, $v^a(t)$ must converge to a vector $v^{a*}$, that is, $v^{a*} = g(v^{a*})$. On the other hand, substituting (12) into (19) gives $-\frac{p_{ij}^2}{p_{ii} + \sum_{k\in N(i)\setminus j} v^a_{k\to i}(t)} > -p_{jj}$. Due to $\bar v^a(t) \in \mathcal W$, it can be inferred from P1) and (18) that $v^a(t) \in \mathcal W$, or equivalently $p_{ii} + \sum_{k\in N(i)\setminus j} v^a_{k\to i}(t) > 0$.
Together with the fact $p_{jj} > 0$, we obtain $p_{ii} + \sum_{k\in N(i)\setminus j} v^a_{k\to i}(t) > \frac{p_{ij}^2}{p_{jj}}$. Since $v^a(t)$ converges to $v^{a*}$, taking the limit on both sides of the inequality gives $p_{ii} + \sum_{k\in N(i)\setminus j} v^{a*}_{k\to i} \ge \frac{p_{ij}^2}{p_{jj}}$. Hence, $p_{ii} + \sum_{k\in N(i)\setminus j} v^{a*}_{k\to i} > 0$. From the definition of $\mathcal W$ in (15), we have $v^{a*} \in \mathcal W$. Combining this with $v^{a*} = g(v^{a*})$, according to the definition of $\mathcal S_1$ in (17), it is clear that $v^{a*} \in \mathcal S_1$. This contradicts the prerequisite $\mathcal S_1 = \emptyset$. Therefore, $\mathcal S_1$ cannot be empty.

IV. NECESSARY AND SUFFICIENT CONVERGENCE CONDITION OF MESSAGE PARAMETER $v^a(t)$

According to Theorem 1, the Gaussian form of the messages may only be maintained for some $v^a(0)$. Thus, the choice of initialization $v^a(0)$ also plays an important role in the convergence of $v^a(t)$.

A. Synchronous Scheduling

In this section, we investigate the convergence condition of $v^a(t)$ under synchronous scheduling, which means that $v^a(t)$ is updated as $v^a(t+1) = g(v^a(t))$. The necessary and sufficient convergence condition under the nonnegative initialization set is derived first. Then, the convergence condition is proved to hold under a more general initialization set.

Lemma 2. $v^a(t)$ converges to the same point $v^{a*} \in \mathcal S_1$ for all $v^a(0) \ge 0$ if and only if $\mathcal S_1 \ne \emptyset$.

Proof: See Appendix A.

Obviously, the initialization $v^a(0) = 0$ used by the convergence condition in [24] is contained in the proposed initialization set $v^a(0) \ge 0$ of Lemma 2. In the following, we show that the initialization set can be further expanded to
$$ \mathcal A \triangleq \{w \ge 0\} \cup \{w \ge w_0 \mid w_0 \in \mathrm{int}(\mathcal S_1)\} \cup \Big\{ w \ge w_0 \,\Big|\, w_0 \in \mathcal S_1 \text{ and } w_0 = \lim_{t\to\infty} g^{(t)}(0) \Big\}, \tag{20} $$
where $\mathrm{int}(\mathcal S_1)$ denotes the interior of $\mathcal S_1$. Notice that the set $\mathcal A$ consists of the union of three parts.
• The first part $\{w \ge 0\}$ corresponds to the initialization set given in Lemma 2.
• For the case $\mathcal S_1 \ne \emptyset$ but $\mathrm{int}(\mathcal S_1) = \emptyset$, the second part is empty. Also, according to Lemma 2, we have $\lim_{t\to\infty} g^{(t)}(0) = v^{a*} \in \mathcal S_1$. Thus, the third part of the set $\mathcal A$ reduces to $\{w \ge v^{a*}\}$. Since $v^{a*} \in \mathcal S_1$ implies $v^{a*} < 0$ by P5), we have $\{w \ge 0\} \subsetneq \{w \ge v^{a*}\}$.
• For the case $\mathrm{int}(\mathcal S_1) \ne \emptyset$, according to (63), we have $w_0 \le v^{a*}$ for any $w_0 \in \mathrm{int}(\mathcal S_1)$. This implies $\{w \ge v^{a*}\} \subseteq \{w \ge w_0 \mid w_0 \in \mathrm{int}(\mathcal S_1)\}$. Obviously, in this case, we also have $\{w \ge 0\} \subsetneq \{w \ge w_0 \mid w_0 \in \mathrm{int}(\mathcal S_1)\}$.
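The membership test for the set S1 in (17) and the initialization independence asserted by Lemma 2 and Theorem 2 can be checked numerically. The sketch below again assumes a hypothetical 3-variable cycle with p_ii = 3 and p_ij = -1 (illustrative values, not from the paper).

```python
# S1-membership test (17) and initialization-independence check, on an
# illustrative 3-variable cycle; p_ii = 3, p_ij = -1 are assumed numbers.

N = 3
P_II, P_IJ = 3.0, -1.0
edges = [(i, j) for i in range(N) for j in range(N) if i != j]

def g(v):
    # eq. (14) applied edge-by-edge
    return {(i, j): -P_IJ ** 2 / (P_II + sum(v[(k, i)]
            for k in range(N) if k not in (i, j)))
            for (i, j) in edges}

def in_S1(w):
    # w in W: every departing precision positive; and w <= g(w) elementwise
    for (i, j) in edges:
        if P_II + sum(w[(k, i)] for k in range(N) if k not in (i, j)) <= 0:
            return False
    gw = g(w)
    return all(w[e] <= gw[e] for e in edges)

print(in_S1({e: -0.5 for e in edges}))   # True: a point of int(S1)
print(in_S1({e: -3.5 for e in edges}))   # False: violates the positivity of W

def run(v, iters=200):
    for _ in range(iters):
        v = g(v)
    return v

v_zero = run({e: 0.0 for e in edges})    # first part of A: w >= 0
v_s1 = run({e: -0.5 for e in edges})     # second part: w0 in int(S1)
gap = max(abs(v_zero[e] - v_s1[e]) for e in edges)
print(gap < 1e-10)                        # True: both reach the same limit
```

Both the initialization v^a(0) = 0 (first part of A) and a point of int(S1) reach the same fixed point, as Theorem 2 predicts.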
Therefore, if $\mathcal S_1 \ne \emptyset$, the set $\mathcal A$ is always larger than the set $\{w \ge 0\}$.

Theorem 2. $v^a(t)$ converges to the same point $v^{a*}$ for all $v^a(0) \in \mathcal A$ if and only if $\mathcal S_1 \ne \emptyset$.

Proof: The first part $\{w \ge 0\}$ of $\mathcal A$ corresponds to the case covered in Lemma 2, and thus has already been established. To prove sufficiency, we consider the case $\mathrm{int}(\mathcal S_1) \ne \emptyset$ and the case $\mathrm{int}(\mathcal S_1) = \emptyset$ but $\mathcal S_1 \ne \emptyset$, respectively.

For the case $\mathrm{int}(\mathcal S_1) \ne \emptyset$, it is known from (20) that $\mathcal A = \{w \ge w_0 \mid w_0 \in \mathrm{int}(\mathcal S_1)\}$. To prove the theorem, we first prove that $\bar v^a(t)$ converges to the same point $v^{a*}$ for all $\bar v^a(0) \in \mathrm{int}(\mathcal S_1)$. Then, notice that any $v^a(0) \in \{w \ge w_0 \mid w_0 \in \mathrm{int}(\mathcal S_1)\}$ can be upper bounded by a point from $\{w \ge 0\}$ and lower bounded by a point from $\mathrm{int}(\mathcal S_1)$. Because both the upper bound and the lower bound converge to $v^{a*}$, we can easily extend the convergence to the much larger set $v^a(0) \in \{w \ge w_0 \mid w_0 \in \mathrm{int}(\mathcal S_1)\}$.

First, we prove that $\bar v^a(t)$ converges to $v^{a*}$ for all $\bar v^a(0) \in \mathrm{int}(\mathcal S_1)$. To do this, we establish the following three facts.

1) For any $\bar v^a(0) \in \mathrm{int}(\mathcal S_1)$, according to P6), we have $\bar v^a(t) \le \bar v^a(t+1)$ and $\bar v^a(t) \in \mathcal S_1$. With P5), we also have $\bar v^a(t) < 0$. It can be seen that $\bar v^a(t)$ is a monotonically non-decreasing and upper bounded sequence, so $\bar v^a(t)$ converges to a fixed point, denoted $\bar v^{a*}$. On the other hand, let $v^a(0) \ge 0$ correspond to a point in the first part of $\mathcal A$; due to $\bar v^a(0) < 0$, we have $\bar v^a(0) < v^a(0)$. By using P2), applying $g(\cdot)$ to $\bar v^a(0) < v^a(0)$ $t$ times gives $\bar v^a(t) \le v^a(t)$. Since $\bar v^a(t)$ and $v^a(t)$ converge to $\bar v^{a*}$ and $v^{a*}$, respectively, taking the limit on both sides of $\bar v^a(t) \le v^a(t)$ gives $\bar v^{a*} \le v^{a*}$. Due to $\bar v^a(0) \in \mathrm{int}(\mathcal S_1)$, the definition of $\mathcal S_1$ in (17) implies that $\bar v^a(0) < g(\bar v^a(0))$, that is, $\bar v^a(0) < \bar v^a(1)$. Combining with $\bar v^a(t) \le \bar v^a(t+1)$ established above, we obtain $\bar v^a(0) < \bar v^{a*}$. Together with the fact $\bar v^{a*} \le v^{a*}$, we obtain
$$ \bar v^a(0) < \bar v^{a*} \le v^{a*}. \tag{21} $$

2) For a $\bar v^a(0) \in \mathrm{int}(\mathcal S_1)$, construct a vector
$$ \hat v^a(0) = \lambda \bar v^a(0) + (1-\lambda) v^{a*}, \tag{22} $$
where $\lambda \in (0,1)$. Since $g_{ij}(w)$ is a concave function over $w \in \mathcal S_1$ and $\{\bar v^a(0), v^{a*}\} \in \mathcal S_1$, we have $g(\lambda \bar v^a(0) + (1-\lambda) v^{a*}) \ge \lambda g(\bar v^a(0)) + (1-\lambda) g(v^{a*})$, or equivalently $g(\lambda \bar v^a(0) + (1-\lambda) v^{a*}) \ge \lambda \bar v^a(1) + (1-\lambda) v^{a*}$. According to P2), applying $g(\cdot)$ to this inequality gives
$g^{(2)}(\lambda \bar v^a(0) + (1-\lambda) v^{a*}) \ge g(\lambda \bar v^a(1) + (1-\lambda) v^{a*})$. Because of the concavity of $g_{ij}(w)$, it is known that $g(\lambda \bar v^a(1) + (1-\lambda) v^{a*}) \ge \lambda \bar v^a(2) + (1-\lambda) v^{a*}$. Thus, we have $g^{(2)}(\lambda \bar v^a(0) + (1-\lambda) v^{a*}) \ge \lambda \bar v^a(2) + (1-\lambda) v^{a*}$. By induction, it can be inferred that $g^{(t)}(\lambda \bar v^a(0) + (1-\lambda) v^{a*}) \ge \lambda \bar v^a(t) + (1-\lambda) v^{a*}$ for all $t \ge 0$. With $\hat v^a(0) = \lambda \bar v^a(0) + (1-\lambda) v^{a*}$ given by (22), we obtain
$$ \hat v^a(t) \ge \lambda \bar v^a(t) + (1-\lambda) v^{a*}, \tag{23} $$
where $\hat v^a(t) \triangleq g^{(t)}(\hat v^a(0))$. Since $\{\bar v^a(0), v^{a*}\} \in \mathcal S_1$ and $\mathcal S_1$ is a convex set as indicated in P4), we have $\hat v^a(0) = \lambda \bar v^a(0) + (1-\lambda) v^{a*} \in \mathcal S_1$. According to P6), $\hat v^a(t) \triangleq g^{(t)}(\hat v^a(0))$ is a monotonically non-decreasing sequence. Furthermore, according to P5), $\hat v^a(t)$ is also upper bounded by 0. Thus, $\hat v^a(t)$ converges to a fixed point, denoted $\hat v^{a*}$. Taking limits on both sides of (23) gives
$$ \hat v^{a*} \ge \lambda \bar v^{a*} + (1-\lambda) v^{a*} = \bar v^{a*} + (1-\lambda)(v^{a*} - \bar v^{a*}). \tag{24} $$

3) From $\bar v^a(0) < \bar v^{a*} \le v^{a*}$ given by (21), we have $\bar v^{a*} - \bar v^a(0) > 0$ and $v^{a*} - \bar v^a(0) > 0$. If $\lambda \in (0,1)$ is close enough to 1, the relation $(1-\lambda)(v^{a*} - \bar v^a(0)) < \bar v^{a*} - \bar v^a(0)$ always holds, that is,
$$ \exists\, \lambda \in (0,1) \ \text{s.t.}\ (1-\lambda)(v^{a*} - \bar v^a(0)) < \bar v^{a*} - \bar v^a(0). \tag{25} $$
Rearranging the terms in (25) gives $\exists\, \lambda \in (0,1)$ s.t. $\lambda \bar v^a(0) + (1-\lambda) v^{a*} < \bar v^{a*}$. Due to $\lambda \bar v^a(0) + (1-\lambda) v^{a*} = \hat v^a(0)$ given in (22), (25) thus implies that there always exists a $\lambda \in (0,1)$ such that $\hat v^a(0) < \bar v^{a*}$. Now, applying $g(\cdot)$ to $\hat v^a(0) < \bar v^{a*}$ $t$ times, from P2), we obtain $g^{(t)}(\hat v^a(0)) \le g^{(t)}(\bar v^{a*})$, or equivalently $\hat v^a(t) \le \bar v^{a*}$. Taking the limit on both sides of $\hat v^a(t) \le \bar v^{a*}$, it can be inferred that
$$ \exists\, \lambda \in (0,1) \ \text{s.t.}\ \hat v^{a*} \le \bar v^{a*}. \tag{26} $$
Now, we make use of (21), (24) and (26) to prove that the convergence limit $\bar v^{a*}$ is identical to the convergence limit $v^{a*}$. Combining (21) and (26) gives
$$ \exists\, \lambda \in (0,1) \ \text{s.t.}\ \hat v^{a*} \le \bar v^{a*} \le v^{a*}. \tag{27} $$
From the second inequality of (27), we have $(1-\lambda)(v^{a*} - \bar v^{a*}) \ge 0$. Substituting this relation into (24), we can infer that $\hat v^{a*} \ge \bar v^{a*}$. Comparing this result to the first inequality of (27), we
DRAFT
1053-587X (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
obtain $\hat{\mathbf{v}}^{a*} = \bar{\mathbf{v}}^{a*}$. Due to $\hat{\mathbf{v}}^{a*} = \bar{\mathbf{v}}^{a*}$, (24) becomes $(1-\lambda)(\mathbf{v}^{a*} - \bar{\mathbf{v}}^{a*}) \le \mathbf{0}$. For $\lambda \in (0,1)$, this relation is equivalent to $\mathbf{v}^{a*} \le \bar{\mathbf{v}}^{a*}$. Combining it with the second inequality in (27), we obtain

$\bar{\mathbf{v}}^{a*} = \mathbf{v}^{a*}. \qquad (28)$
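The uniqueness argument above can be illustrated numerically: the synchronous iteration $\mathbf{v}^a(t+1) = g(\mathbf{v}^a(t))$ reaches the same fixed point from any initialization in $\mathcal{A}$. The sketch below runs the iteration from the all-zero vector and from a strictly positive vector; the $3\times 3$ diagonally dominant precision matrix is a hypothetical choice for illustration, not an example from the paper.

```python
# Sketch: synchronous variance-message iteration v(t+1) = g(v(t)) started
# from two different initializations in A. The precision matrix P is a
# hypothetical, diagonally dominant example chosen only for illustration.
P = [[3.0, 1.0, 1.0],
     [1.0, 3.0, 1.0],
     [1.0, 1.0, 3.0]]
N = len(P)
edges = [(i, j) for i in range(N) for j in range(N)
         if i != j and P[i][j] != 0.0]

def g(v):
    """One synchronous sweep of (12): v_{i->j} = -p_ij^2 / (p_ii + sum_{k in N(i)\\j} v_{k->i})."""
    new = {}
    for (i, j) in edges:
        s = P[i][i] + sum(v[(k, m)] for (k, m) in edges if m == i and k != j)
        new[(i, j)] = -P[i][j] ** 2 / s
    return new

def run(v, iters=200):
    for _ in range(iters):
        v = g(v)
    return v

v_from_zero = run({e: 0.0 for e in edges})   # v^a(0) = 0 lies in A
v_from_pos  = run({e: 0.5 for e in edges})   # any v^a(0) >= 0 also lies in A
same = all(abs(v_from_zero[e] - v_from_pos[e]) < 1e-9 for e in edges)
print(same)  # True: both initializations reach the same fixed point
```

For this symmetric example every converged message solves $w = -1/(3+w)$, i.e., $w = (\sqrt{5}-3)/2 \approx -0.382$.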
Next, we extend the initialization set to the whole set $\mathcal{A} = \{\mathbf{w} \ge \mathbf{w}_0 \,|\, \mathbf{w}_0 \in \mathrm{int}(\mathcal{S}_1)\}$. For any $\bar{\mathbf{v}}^a(0) \in \mathcal{A}$, we can always find a $\mathbf{v}_l^a(0) \in \mathrm{int}(\mathcal{S}_1)$ and a $\mathbf{v}_u^a(0) \ge \mathbf{0}$ such that $\mathbf{v}_l^a(0) \le \bar{\mathbf{v}}^a(0) \le \mathbf{v}_u^a(0)$. According to P2), applying $g(\cdot)$ to this inequality repeatedly gives $g^{(t)}(\mathbf{v}_l^a(0)) \le g^{(t)}(\bar{\mathbf{v}}^a(0)) \le g^{(t)}(\mathbf{v}_u^a(0))$. Since $\mathbf{v}_l^a(t) = g^{(t)}(\mathbf{v}_l^a(0))$ and $\mathbf{v}_u^a(t) = g^{(t)}(\mathbf{v}_u^a(0))$ both converge to $\mathbf{v}^{a*}$, $\bar{\mathbf{v}}^a(t)$ converges to $\mathbf{v}^{a*}$, too.

Now, consider the case $\mathrm{int}(\mathcal{S}_1) = \emptyset$ but $\mathcal{S}_1 \neq \emptyset$. In this case, according to the discussion after (20), $\mathcal{A} = \{\mathbf{w} \ge \mathbf{v}^{a*}\}$. For any $\bar{\mathbf{v}}^a(0) \in \mathcal{A}$, we can always find a $\mathbf{v}_u^a(0) \in \{\mathbf{w} \ge \mathbf{0}\}$ such that $\mathbf{v}^{a*} \le \bar{\mathbf{v}}^a(0) \le \mathbf{v}_u^a(0)$. Applying $g(\cdot)$ to this inequality for $t$ times, from P2), we obtain $\mathbf{v}^{a*} \le g^{(t)}(\bar{\mathbf{v}}^a(0)) \le g^{(t)}(\mathbf{v}_u^a(0))$. Since $\mathbf{v}_u^a(t)$ converges to $\mathbf{v}^{a*}$, $\bar{\mathbf{v}}^a(t)$ also converges to $\mathbf{v}^{a*}$.

On the other hand, if $\mathcal{S}_1 = \emptyset$, then according to Theorem 1 the messages of Gaussian BP passed in the factor graph lose the Gaussian form, and hence the message parameters $\mathbf{v}^a(t)$ cannot converge in this case.

B. Asynchronous Scheduling

In this section, convergence conditions under asynchronous scheduling schemes will be investigated. Modified from the synchronous updating equation in (12), the arriving precision under asynchronous scheduling is updated as

$v^a_{i\to j}(t+1) = -\dfrac{p_{ij}^2}{p_{ii} + \sum_{k \in N(i)\setminus j} v^a_{k\to i}(\tau^a_{k\to i}(t))}, \qquad \forall\, t \in \mathcal{T}^a_{i\to j}, \qquad (29)$

where $\mathcal{T}^a_{i\to j}$ is the set of time instants at which $v^a_{i\to j}(t)$ is updated, and $0 \le \tau^a_{k\to i}(t) \le t$ is the last updated time instant of $v^a_{k\to i}(t)$ at time $t$. In this paper, we only consider the totally asynchronous scheduling defined as follows.

Definition 1. (Totally Asynchronous Scheduling) [27]: The sets $\mathcal{T}^a_{i\to j}$ are infinite, and $\tau^a_{i\to j}(t)$ satisfies $0 \le \tau^a_{i\to j}(t) \le t$ and $\lim_{t\to\infty} \tau^a_{i\to j}(t) = \infty$ for all $(i,j) \in \mathcal{E}$.
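Definition 1 admits many concrete schedulings. One admissible instance updates a single randomly chosen directed edge per time instant while always reading the most recently written neighbor values (i.e., $\tau^a_{k\to i}(t) = t$); with probability one every set $\mathcal{T}^a_{i\to j}$ is then infinite. The sketch below, using a hypothetical $3\times 3$ precision matrix chosen only for illustration, shows this asynchronous realization reaching the synchronous fixed point.

```python
import random

# Sketch: a totally asynchronous realization of (29): one directed edge is
# updated per time instant, reading the latest neighbor values
# (tau_{k->i}(t) = t). P is a hypothetical precision matrix for illustration.
P = [[3.0, 1.0, 1.0],
     [1.0, 3.0, 1.0],
     [1.0, 1.0, 3.0]]
N = len(P)
edges = [(i, j) for i in range(N) for j in range(N) if i != j and P[i][j] != 0.0]
v = {e: 0.0 for e in edges}              # v^a(0) = 0 lies in A

random.seed(0)
for t in range(5000):
    i, j = random.choice(edges)          # each edge is updated infinitely often w.p. 1
    denom = P[i][i] + sum(v[(k, m)] for (k, m) in edges if m == i and k != j)
    v[(i, j)] = -P[i][j] ** 2 / denom

# For this symmetric example the synchronous fixed point solves w = -1/(3 + w).
w_star = (5 ** 0.5 - 3) / 2
gap = max(abs(v[e] - w_star) for e in edges)
print(gap < 1e-6)  # True: the asynchronous schedule converges to the same point
```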
In order to prove the convergence of $\mathbf{v}^a(t)$ under the asynchronous scheduling given by Definition 1, we need to find a sequence of nonempty sets $\{\mathcal{U}(m)\}$ for $m \ge 0$ such that the following two sufficient conditions are satisfied [27]:
• Synchronous Convergence Condition: the sequence of sets satisfies 1) $\mathcal{U}(m+1) \subseteq \mathcal{U}(m)$; 2) $g(\mathbf{w}) \in \mathcal{U}(m+1)$ for all $\mathbf{w} \in \mathcal{U}(m)$; 3) $\mathcal{U}(m)$ converges to a fixed point;
• Box Condition: $\mathcal{U}(m)$ can be written as the product of subsets of individual components $\mathcal{U}_{i\to j}(m)$ for all $(i,j) \in \mathcal{E}$.

While the formal proof of asynchronous convergence under these conditions is provided in [27], we give a brief intuitive explanation below. Suppose that when $t \ge t_m$, $v^a_{i\to j}(t) \in \mathcal{U}_{i\to j}(m)$ for all $(i,j) \in \mathcal{E}$, or, expressed using the Box Condition, $\mathbf{v}^a(t) \in \mathcal{U}(m)$ for $t \ge t_m$. Now, consider the updating of an individual component $v^a_{i\to j}(t)$. Due to $\lim_{t\to\infty}\tau^a_{k\to i}(t) = \infty$, we can always find a $t^{(i\to j)}_{m+1} > t_m$ such that when $t \ge t^{(i\to j)}_{m+1}$, we have $\tau^a_{k\to i}(t) \ge t_m$ and thereby $v^a_{k\to i}(\tau^a_{k\to i}(t)) \in \mathcal{U}_{k\to i}(m)$ for all $k \in N(i)\setminus j$. Putting $v^a_{k\to i}(\tau^a_{k\to i}(t))$ into the function $g_{ij}(\cdot)$ in (29) and applying condition 2) of the Synchronous Convergence Condition, we have $v^a_{i\to j}(t+1) \in \mathcal{U}_{i\to j}(m+1)$ for $t \ge t^{(i\to j)}_{m+1}$. Thus, by choosing $t_{m+1} = \max_{(i,j)\in\mathcal{E}} t^{(i\to j)}_{m+1}$, we have that when $t \ge t_{m+1}$, $v^a_{i\to j}(t) \in \mathcal{U}_{i\to j}(m+1)$ for all $(i,j) \in \mathcal{E}$. Further, due to conditions 1) and 3) of the Synchronous Convergence Condition, we know that $\mathbf{v}^a(t)$ will gradually converge to a fixed point.

Theorem 3. $\mathbf{v}^a(t)$ converges for any $\mathbf{v}^a(0) \in \mathcal{A}$ and all choices of schedulings if and only if $\mathcal{S}_1 \neq \emptyset$.

Proof: We first prove the sufficient condition. Since $\mathcal{S}_1 \neq \emptyset$, as given in the proof of Theorem 2, for any $\mathbf{v}^a(0) \in \mathcal{A}$ there always exist an element $\bar{\mathbf{v}}^a(0) \in \mathrm{int}(\mathcal{S}_1) \cup \{\mathbf{v}^{a*}\}$ and a $\hat{\mathbf{v}}^a(0) \ge \mathbf{0}$ such that $\bar{\mathbf{v}}^a(0) \le \mathbf{v}^a(0) \le \hat{\mathbf{v}}^a(0)$. Construct a sequence of sets as

$\mathcal{U}(m) = \{\mathbf{w} \,|\, \bar{\mathbf{v}}^a(m) \le \mathbf{w} \le \hat{\mathbf{v}}^a(m)\}. \qquad (30)$

Since $\bar{\mathbf{v}}^a(0) \in \mathrm{int}(\mathcal{S}_1) \cup \{\mathbf{v}^{a*}\}$, we have $\bar{\mathbf{v}}^a(m) \le \bar{\mathbf{v}}^a(m+1)$ according to P6). Further, for any $\hat{\mathbf{v}}^a(0) \ge \mathbf{0}$, putting $\hat{\mathbf{v}}^a(0)$ into (14), we have $\hat{\mathbf{v}}^a(1) < \mathbf{0}$, and thereby $\hat{\mathbf{v}}^a(1) < \hat{\mathbf{v}}^a(0)$. Since $\hat{\mathbf{v}}^a(m)$ converges according to Theorem 2, $\hat{\mathbf{v}}^a(m) \in \mathcal{W}$ for all $m \ge 0$. Thus, according to P2), by applying $g(\cdot)$ to $\hat{\mathbf{v}}^a(1) < \hat{\mathbf{v}}^a(0)$ and by induction, we have $\hat{\mathbf{v}}^a(m+1) \le \hat{\mathbf{v}}^a(m)$. By exploiting $\bar{\mathbf{v}}^a(m) \le \bar{\mathbf{v}}^a(m+1)$, $\hat{\mathbf{v}}^a(m+1) \le \hat{\mathbf{v}}^a(m)$ and the definition of $\mathcal{U}(m)$ in (30), we have

$\mathcal{U}(m+1) \subseteq \mathcal{U}(m). \qquad (31)$
Now, for any $\mathbf{w} \in \mathcal{U}(m)$, applying $g(\cdot)$ to $\bar{\mathbf{v}}^a(m) \le \mathbf{w} \le \hat{\mathbf{v}}^a(m)$ gives $\bar{\mathbf{v}}^a(m+1) \le g(\mathbf{w}) \le \hat{\mathbf{v}}^a(m+1)$. Thus, from the definition of $\mathcal{U}(m)$ in (30), we have the relation

$g(\mathbf{w}) \in \mathcal{U}(m+1), \quad \forall\, \mathbf{w} \in \mathcal{U}(m). \qquad (32)$

Furthermore, according to Theorem 2, we have $\lim_{m\to\infty}\bar{\mathbf{v}}^a(m) = \lim_{m\to\infty}\hat{\mathbf{v}}^a(m) = \mathbf{v}^{a*}$, hence $\mathcal{U}(m)$ will converge to the fixed point $\mathbf{v}^{a*}$. Thus, the set $\mathcal{U}(m)$ satisfies the Synchronous Convergence Condition [27]. Furthermore, according to the definition of $\mathcal{U}(m)$ in (30), $\mathbf{v}^a(m) \in \mathcal{U}(m)$ means $\bar{v}^a_{i\to j}(m) \le v^a_{i\to j}(m) \le \hat{v}^a_{i\to j}(m)$ for all $(i,j) \in \mathcal{E}$, that is, $v^a_{i\to j}(m) \in \mathcal{U}_{i\to j}(m)$ with $\mathcal{U}_{i\to j}(m) = \{w_{ij} \,|\, \bar{v}^a_{i\to j}(m) \le w_{ij} \le \hat{v}^a_{i\to j}(m)\}$. Thus, $\mathcal{U}(m)$ can be represented in the form of a product of subsets of individual components $\mathcal{U}_{i\to j}(m)$ for all $(i,j) \in \mathcal{E}$, and thereby the Box Condition is satisfied. Hence, the sufficient condition is proved.

On the other hand, if $\mathcal{S}_1 = \emptyset$, then according to Theorem 1 the messages of Gaussian BP passed in the factor graph lose the Gaussian form, and hence the message parameters $\mathbf{v}^a(t)$ cannot converge in this case.

From Theorems 2 and 3, it can be seen that the convergence condition of $\mathbf{v}^a(t)$ under asynchronous scheduling is the same as that under synchronous scheduling.

V. NECESSARY AND SUFFICIENT CONVERGENCE CONDITION OF BELIEF VARIANCE $\sigma_i^2(t)$

Although the convergence conditions for $\mathbf{v}^a(t)$ are derived in the last section, what we are really interested in is the convergence condition for the belief variance $\sigma_i^2(t)$. According to (11), the variance $\sigma_i^2(t)$ and $\mathbf{v}^a(t)$ are related by

$\sigma_i^2(t) = \dfrac{1}{p_{ii} + \sum_{k\in N(i)} v^a_{k\to i}(t)}, \quad \text{or equivalently} \quad \dfrac{1}{\sigma_i^2(t)} = p_{ii} + \sum_{k\in N(i)} v^a_{k\to i}(t).$
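The relation above turns converged arriving precisions directly into belief variances. On a tree-structured model BP is exact, so the recovered variances must match the diagonal of $\mathbf{P}^{-1}$; the 3-node chain below is a toy example used only to illustrate this.

```python
# Sketch: recovering belief variances via sigma_i^2 = 1/(p_ii + sum_k v_{k->i})
# after the variance messages converge. The 3-node chain (a tree) is a toy
# example; on trees BP is exact, so the result equals diag(P^{-1}).
P = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]
N = len(P)
edges = [(i, j) for i in range(N) for j in range(N) if i != j and P[i][j] != 0.0]
v = {e: 0.0 for e in edges}
for _ in range(100):                      # synchronous sweeps of (12)
    v = {(i, j): -P[i][j] ** 2 /
         (P[i][i] + sum(v[(k, m)] for (k, m) in edges if m == i and k != j))
         for (i, j) in edges}

sigma2 = [1.0 / (P[i][i] + sum(v[(k, m)] for (k, m) in edges if m == i))
          for i in range(N)]
print(sigma2)  # ~[0.75, 1.0, 0.75], which is diag(P^{-1}) for this chain
```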
A. Convergence Condition

In order to investigate the convergence condition of $\sigma_i^2(t)$, we present the following lemma first.

Lemma 3. If $\mathcal{S}_1 \neq \emptyset$ and $\mathbf{v}^a(0) \in \mathcal{A}$, then $\lim_{t\to\infty} 1/\sigma_i^2(t) > 0$ for all $i \in \mathcal{V}$ or $\lim_{t\to\infty} 1/\sigma_i^2(t) = 0$ for all $i \in \mathcal{V}$.

Proof: In the proof, we first consider the special case of a tree-structured factor graph as well as the impact of initializations. Then, for the case of a loopy factor graph, we prove that there exists at least one node $\gamma \in \mathcal{V}$ such that $\lim_{t\to\infty} 1/\sigma_\gamma^2(t) \ge 0$. At last, we demonstrate that either $\lim_{t\to\infty} 1/\sigma_\gamma^2(t) > 0$ for all $\gamma \in \mathcal{V}$ or $\lim_{t\to\infty} 1/\sigma_\gamma^2(t) = 0$ for all $\gamma \in \mathcal{V}$ will hold.

First, if the factor graph is of tree structure, according to the properties of BP in a tree-structured factor graph, it is known that $\lim_{t\to\infty}\sigma_\gamma^2(t) = [\mathbf{P}^{-1}]_{\gamma\gamma}$ [1]. Due to $\mathbf{P}^{-1} \succ 0$, we have $\lim_{t\to\infty}\sigma_\gamma^2(t) = [\mathbf{P}^{-1}]_{\gamma\gamma} > 0$, and thereby $\lim_{t\to\infty} 1/\sigma_\gamma^2(t) > 0$ for all $\gamma \in \mathcal{V}$. In the following, we focus on the factor
graph containing loops.

If $\mathcal{S}_1 \neq \emptyset$, according to Theorem 2, $\mathbf{v}^a(t)$ converges to the same point for any initialization $\mathbf{v}^a(0) \in \mathcal{A}$. Due to $1/\sigma_i^2(t) = p_{ii} + \sum_{k\in N(i)} v^a_{k\to i}(t)$, the limit $\lim_{t\to\infty} 1/\sigma_i^2(t)$ always exists and is identical for all $\mathbf{v}^a(0) \in \mathcal{A}$. Therefore, we only need to consider a specific initialization, e.g., $\mathbf{v}^a(0) = \mathbf{0}$.

Second, we prove $\lim_{t\to\infty} 1/\sigma_\gamma^2(t) \ge 0$ for at least one node $\gamma \in \mathcal{V}$. Due to the factor graph containing loops, we can always find a node $\gamma$ such that $|N(\gamma)| \ge 2$, where $|\cdot|$ denotes the cardinality of a set. Denote nodes $i, j$ as two neighbors of node $\gamma$, that is, $i, j \in N(\gamma)$. Since $\mathbf{v}^a(0) = \mathbf{0}$, from (63), we have $\mathbf{v}^{a*} \le \mathbf{v}^a(t)$. Applying this relation to $1/\sigma_\gamma^2(t) = p_{\gamma\gamma} + \sum_{k\in N(\gamma)\setminus i} v^a_{k\to\gamma}(t) + v^a_{i\to\gamma}(t)$ gives

$\dfrac{1}{\sigma_\gamma^2(t)} \ge p_{\gamma\gamma} + \sum_{k\in N(\gamma)\setminus i} v^a_{k\to\gamma}(t) + v^{a*}_{i\to\gamma}. \qquad (33)$

By further using $\mathbf{v}^{a*} \le \mathbf{v}^a(t)$, we can easily obtain

$p_{\gamma\gamma} + \sum_{k\in N(\gamma)\setminus i} v^a_{k\to\gamma}(t) + v^{a*}_{i\to\gamma} = p_{\gamma\gamma} + \sum_{k\in N(\gamma)\setminus\{i,j\}} v^a_{k\to\gamma}(t) + v^{a*}_{i\to\gamma} + v^a_{j\to\gamma}(t) \ge p_{\gamma\gamma} + \sum_{k\in N(\gamma)\setminus j} v^{a*}_{k\to\gamma} + v^a_{j\to\gamma}(t). \qquad (34)$

It can be further shown that

$p_{\gamma\gamma} + \sum_{k\in N(\gamma)\setminus j} v^{a*}_{k\to\gamma} + v^a_{j\to\gamma}(t) \overset{(a)}{=} \dfrac{p_{\gamma\gamma} + \sum_{k\in N(\gamma)\setminus j} v^{a*}_{k\to\gamma}}{p_{jj} + \sum_{k\in N(j)\setminus\gamma} v^a_{k\to j}(t-1)}\Big(p_{jj} + \sum_{k\in N(j)\setminus\gamma} v^a_{k\to j}(t-1) - \dfrac{p_{\gamma j}^2}{p_{\gamma\gamma} + \sum_{k\in N(\gamma)\setminus j} v^{a*}_{k\to\gamma}}\Big) \overset{(b)}{=} \dfrac{p_{\gamma\gamma} + \sum_{k\in N(\gamma)\setminus j} v^{a*}_{k\to\gamma}}{p_{jj} + \sum_{k\in N(j)\setminus\gamma} v^a_{k\to j}(t-1)}\Big(p_{jj} + \sum_{k\in N(j)\setminus\gamma} v^a_{k\to j}(t-1) + v^{a*}_{\gamma\to j}\Big), \qquad (35)$

where equality (a) holds because $\big(p_{jj} + \sum_{k\in N(j)\setminus\gamma} v^a_{k\to j}(t-1)\big)\, v^a_{j\to\gamma}(t) = -p_{\gamma j}^2$ from (12), and equality (b) is obtained by using the limiting form of (12), $v^{a*}_{\gamma\to j} = -p_{\gamma j}^2/\big(p_{\gamma\gamma} + \sum_{k\in N(\gamma)\setminus j} v^{a*}_{k\to\gamma}\big)$. Combining
(34) and (35) gives

$p_{\gamma\gamma} + \sum_{k\in N(\gamma)\setminus i} v^a_{k\to\gamma}(t) + v^{a*}_{i\to\gamma} \ge \dfrac{p_{\gamma\gamma} + \sum_{k\in N(\gamma)\setminus j} v^{a*}_{k\to\gamma}}{p_{jj} + \sum_{k\in N(j)\setminus\gamma} v^a_{k\to j}(t-1)}\Big(p_{jj} + \sum_{k\in N(j)\setminus\gamma} v^a_{k\to j}(t-1) + v^{a*}_{\gamma\to j}\Big). \qquad (36)$

Due to $\mathbf{v}^{a*} \in \mathcal{S}_1$ from Lemma 2, we have $\mathbf{v}^{a*} \in \mathcal{W}$. From the definition of $\mathcal{W}$ in (15), for all $(\nu_0, \nu_1) \in \mathcal{E}$, we have

$p_{\nu_0\nu_0} + \sum_{k\in N(\nu_0)\setminus\nu_1} v^{a*}_{k\to\nu_0} > 0. \qquad (37)$

By using (37) and $\mathbf{v}^a(t) \ge \mathbf{v}^{a*}$ implied by (63), we can infer $p_{\nu_0\nu_0} + \sum_{k\in N(\nu_0)\setminus\nu_1} v^a_{k\to\nu_0}(t) > 0$ for all $(\nu_0,\nu_1) \in \mathcal{E}$, and thereby

$\dfrac{p_{\nu_0\nu_0} + \sum_{k\in N(\nu_0)\setminus\nu_1} v^{a*}_{k\to\nu_0}}{p_{\nu_1\nu_1} + \sum_{k\in N(\nu_1)\setminus\nu_0} v^a_{k\to\nu_1}(t)} > 0. \qquad (38)$
Since the factor graph contains loops, there always exists a walk $(j_0, j_1, \cdots, j_{t-1}, j_t, i)$ with $j_t = \gamma$ such that $(j_{\ell-1}, j_\ell) \in \mathcal{E}$ and $|N(j_\ell)| \ge 2$ for $\ell = 0, 1, \cdots, t$. Then, using (36) repeatedly on such a walk gives

$p_{\gamma\gamma} + \sum_{k\in N(\gamma)\setminus i} v^a_{k\to\gamma}(t) + v^{a*}_{i\to\gamma} \ge \Big(p_{j_0 j_0} + \sum_{k\in N(j_0)\setminus j_1} v^a_{k\to j_0}(0) + v^{a*}_{j_1\to j_0}\Big) \prod_{\ell=0}^{t-1} \dfrac{p_{j_{\ell+1} j_{\ell+1}} + \sum_{k\in N(j_{\ell+1})\setminus j_\ell} v^{a*}_{k\to j_{\ell+1}}}{p_{j_\ell j_\ell} + \sum_{k\in N(j_\ell)\setminus j_{\ell+1}} v^a_{k\to j_\ell}(\ell)} = \Big(p_{j_0 j_0} + v^{a*}_{j_1\to j_0}\Big) \prod_{\ell=0}^{t-1} \dfrac{p_{j_{\ell+1} j_{\ell+1}} + \sum_{k\in N(j_{\ell+1})\setminus j_\ell} v^{a*}_{k\to j_{\ell+1}}}{p_{j_\ell j_\ell} + \sum_{k\in N(j_\ell)\setminus j_{\ell+1}} v^a_{k\to j_\ell}(\ell)}, \qquad (39)$

where the last equality holds due to $\mathbf{v}^a(0) = \mathbf{0}$. Due to $|N(j_0)| \ge 2$, there exists a node $\nu$ such that $j_1 \in N(j_0)\setminus\nu$. By exploiting the fact that $\mathbf{v}^{a*} < \mathbf{0}$ due to $\mathbf{v}^{a*} \in \mathcal{S}_1$, we obtain $p_{j_0 j_0} + v^{a*}_{j_1\to j_0} \ge p_{j_0 j_0} + \sum_{k\in N(j_0)\setminus\nu} v^{a*}_{k\to j_0}$. Combining with (37), we can infer that

$p_{j_0 j_0} + v^{a*}_{j_1\to j_0} > 0. \qquad (40)$

Putting (38) and (40) into (39), we obtain $p_{\gamma\gamma} + \sum_{k\in N(\gamma)\setminus i} v^a_{k\to\gamma}(t) + v^{a*}_{i\to\gamma} > 0$. Applying this result to (33) gives $1/\sigma_\gamma^2(t) > 0$ for all $t \ge 0$, and thereby $\lim_{t\to\infty} 1/\sigma_\gamma^2(t) \ge 0$.

At last, we will prove that $\lim_{t\to\infty} 1/\sigma_\gamma^2(t) > 0$ for all $\gamma \in \mathcal{V}$ or $\lim_{t\to\infty} 1/\sigma_\gamma^2(t) = 0$ for all $\gamma \in \mathcal{V}$. Notice that the relation in (35) holds for any $(\gamma, j) \in \mathcal{E}$. Thus, taking the limit on both sides of (35) and recognizing that $\lim_{t\to\infty} 1/\sigma_\gamma^2(t) = p_{\gamma\gamma} + \sum_{k\in N(\gamma)} v^{a*}_{k\to\gamma}$ and $\lim_{t\to\infty} 1/\sigma_j^2(t) = p_{jj} + \sum_{k\in N(j)} v^{a*}_{k\to j}$, we obtain

$\lim_{t\to\infty} \dfrac{1}{\sigma_\gamma^2(t)} = \dfrac{p_{\gamma\gamma} + \sum_{k\in N(\gamma)\setminus j} v^{a*}_{k\to\gamma}}{p_{jj} + \sum_{k\in N(j)\setminus\gamma} v^{a*}_{k\to j}} \cdot \lim_{t\to\infty} \dfrac{1}{\sigma_j^2(t)}. \qquad (41)$
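The balance relation (41) can be checked numerically at the fixed point: for any edge $(\gamma, j)$, the two limits $1/\sigma_\gamma^2$ and $1/\sigma_j^2$ differ exactly by the displayed ratio. The sketch below iterates the variance messages to convergence on a hypothetical loopy 3-node model (chosen so that the iteration converges) and verifies the relation on every edge.

```python
# Sketch: numerical check of (41) on a loopy model. P is a hypothetical
# precision matrix chosen so that the variance messages converge.
P = [[4.0, 1.0, 1.0],
     [1.0, 3.0, 1.0],
     [1.0, 1.0, 5.0]]
N = len(P)
edges = [(i, j) for i in range(N) for j in range(N) if i != j and P[i][j] != 0.0]
v = {e: 0.0 for e in edges}
for _ in range(500):                      # iterate (12) to the fixed point
    v = {(i, j): -P[i][j] ** 2 /
         (P[i][i] + sum(v[(k, m)] for (k, m) in edges if m == i and k != j))
         for (i, j) in edges}

def inv_sigma2(i):                        # 1/sigma_i^2 = p_ii + sum of arriving precisions
    return P[i][i] + sum(v[(k, m)] for (k, m) in edges if m == i)

ok = True
for (gamma, j) in edges:
    num = P[gamma][gamma] + sum(v[(k, m)] for (k, m) in edges if m == gamma and k != j)
    den = P[j][j] + sum(v[(k, m)] for (k, m) in edges if m == j and k != gamma)
    ok = ok and abs(inv_sigma2(gamma) - (num / den) * inv_sigma2(j)) < 1e-9
print(ok)  # True: (41) holds on every edge at the fixed point
```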
Since $p_{\gamma\gamma} + \sum_{k\in N(\gamma)\setminus j} v^{a*}_{k\to\gamma} > 0$ for all $(\gamma, j) \in \mathcal{E}$ as given in (37), according to (41), if $\lim_{t\to\infty} 1/\sigma_\gamma^2(t) > 0$, then $\lim_{t\to\infty} 1/\sigma_j^2(t) > 0$ for all nodes $j \in N(\gamma)$, and vice versa. Similarly, if $\lim_{t\to\infty} 1/\sigma_\gamma^2(t) = 0$, then $\lim_{t\to\infty} 1/\sigma_j^2(t) = 0$ for all nodes $j \in N(\gamma)$, and vice versa. Recall that we have proved there exists a node $\gamma$ with $\lim_{t\to\infty} 1/\sigma_\gamma^2(t) \ge 0$. Therefore, for a fully connected factor graph, either $\lim_{t\to\infty} 1/\sigma_i^2(t) > 0$ for all $i \in \mathcal{V}$ or $\lim_{t\to\infty} 1/\sigma_i^2(t) = 0$ for all $i \in \mathcal{V}$.

Obviously, if $\lim_{t\to\infty} 1/\sigma_i^2(t) \neq 0$, then $\lim_{t\to\infty}\sigma_i^2(t)$ exists, and hence the convergence condition of $\sigma_i^2(t)$ is the same as that of $1/\sigma_i^2(t)$. However, Lemma 3 reveals that the scenario $\lim_{t\to\infty} 1/\sigma_i^2(t) = 0$ cannot be excluded. Thus, we have the following theorem.

Theorem 4. $\sigma_i^2(t)$ with $i \in \mathcal{V}$ converges to the same positive value for all $\mathbf{v}^a(0) \in \mathcal{A}$ under synchronous and asynchronous schedulings if and only if $\mathcal{S} \neq \emptyset$, where

$\mathcal{S} \triangleq \Big\{\mathbf{w} \,\Big|\, \mathbf{w} \in \mathcal{S}_1 \ \text{and}\ p_{ii} + \sum_{k\in N(i)} w_{ki} > 0, \ \forall\, i \in \mathcal{V}\Big\}. \qquad (42)$
Proof: First, we prove that if $\mathcal{S} \neq \emptyset$, then $\sigma_i^2(t)$ converges for any $\mathbf{v}^a(0) \in \mathcal{A}$. From the definition of $\mathcal{S}$ in (42), if $\mathcal{S} \neq \emptyset$, then $\mathcal{S}_1 \neq \emptyset$. According to Theorems 2 and 3, $\mathbf{v}^a(t)$ converges to $\mathbf{v}^{a*}$ for all $\mathbf{v}^a(0) \in \mathcal{A}$ under both synchronous and asynchronous schedulings. Thus, $1/\sigma_i^2(t) = p_{ii} + \sum_{k\in N(i)} v^a_{k\to i}(t)$ will converge to $p_{ii} + \sum_{k\in N(i)} v^{a*}_{k\to i}$. Due to $\mathcal{S} \neq \emptyset$, there exists a $\mathbf{w}$ such that $\mathbf{w} \in \mathcal{S}_1$ and $p_{ii} + \sum_{k\in N(i)} w_{ki} > 0$ for all $i \in \mathcal{V}$. Due to $\mathbf{w} \in \mathcal{S}_1$, it can be inferred from (63) that $\mathbf{w} \le \mathbf{v}^{a*}$. Putting this result into $p_{ii} + \sum_{k\in N(i)} w_{ki} > 0$, we obtain $p_{ii} + \sum_{k\in N(i)} v^{a*}_{k\to i} > 0$, and its inverse exists. Thus, if $\mathcal{S} \neq \emptyset$, the variance $\sigma_i^2(t) = 1/\big(p_{ii} + \sum_{k\in N(i)} v^a_{k\to i}(t)\big)$ converges to $1/\big(p_{ii} + \sum_{k\in N(i)} v^{a*}_{k\to i}\big) > 0$.

Next, we prove by contradiction that if $\sigma_i^2(t)$ converges for $\mathbf{v}^a(0) \in \mathcal{A}$, then $\mathcal{S} \neq \emptyset$. Suppose that $\mathcal{S} = \emptyset$. This assumption contains two cases: 1) $\mathcal{S}_1 = \emptyset$; 2) $\mathcal{S}_1 \neq \emptyset$ and $p_{ii} + \sum_{k\in N(i)} w_{ki} \le 0$ for some $i \in \mathcal{V}$. First, consider the case $\mathcal{S}_1 = \emptyset$. According to Theorem 1, the messages of Gaussian BP passed in the factor graph cannot maintain the Gaussian form, hence $\mathbf{v}^a(t)$ becomes undefined. Therefore, the variance $\sigma_i^2(t)$ cannot converge in this case, which contradicts the prerequisite that $\sigma_i^2(t)$ converges. Second, consider the case $\mathcal{S}_1 \neq \emptyset$. According to Theorems 2 and 3, $\mathbf{v}^a(t)$ converges to $\mathbf{v}^{a*} \in \mathcal{S}_1$ for any $\mathbf{v}^a(0) \in \mathcal{A}$. According to Lemma 3, we further have $\lim_{t\to\infty} 1/\sigma_i^2(t) = p_{ii} + \sum_{k\in N(i)} v^{a*}_{k\to i} > 0$ for all $i \in \mathcal{V}$ or $\lim_{t\to\infty} 1/\sigma_i^2(t) = p_{ii} + \sum_{k\in N(i)} v^{a*}_{k\to i} = 0$ for all $i \in \mathcal{V}$. If $p_{ii} + \sum_{k\in N(i)} v^{a*}_{k\to i} > 0$ for all $i \in \mathcal{V}$, combining with $\mathbf{v}^{a*} \in \mathcal{S}_1$, we can infer from the definition of $\mathcal{S}$ in (42) that $\mathbf{v}^{a*} \in \mathcal{S}$, which contradicts the assumption $\mathcal{S} = \emptyset$. On the other hand, if $p_{ii} + \sum_{k\in N(i)} v^{a*}_{k\to i} = 0$ for all $i \in \mathcal{V}$, obviously the variance $\sigma_i^2(t) = 1/\big(p_{ii} + \sum_{k\in N(i)} v^a_{k\to i}(t)\big)$ cannot converge, which contradicts the prerequisite that $\sigma_i^2(t)$ converges. In summary, we have proved that if $\sigma_i^2(t)$ converges for all $\mathbf{v}^a(0) \in \mathcal{A}$, then $\mathcal{S} = \emptyset$ cannot hold, or equivalently we must have $\mathcal{S} \neq \emptyset$.

Remark 1: Due to the belief mean $\mu_i(t) = \sigma_i^2(t)\big(h_i + \sum_{k\in N(i)} \beta^a_{k\to i}(t)\big)$, if the convergence condition of the belief variance $\sigma_i^2(t)$ is already known, the convergence of $\mu_i(t)$ is determined by that of the message parameters $\beta^a_{k\to i}(t)$. As pointed out in [24], $\beta^a_{k\to j}(t)$ is updated in accordance with
a set of linear equations upon the convergence of belief variances. Thus, after deriving the convergence condition of $\sigma_i^2(t)$, we can then apply the theory of linear equations to analyze the convergence condition of the belief mean $\mu_i(t)$. However, the convergence condition of the belief mean $\mu_i(t)$ would be lengthy to develop and will be investigated in the future.

B. Verification Using Semi-definite Programming

Next, we show that whether $\mathcal{S}$ is empty can be determined by solving the following SDP problem [30]:

$\min_{\mathbf{w},\,\alpha} \ \alpha \qquad (43)$

$\text{s.t.}\ \begin{bmatrix} p_{ii} + \sum_{k\in N(i)\setminus j} w_{ki} & p_{ij} \\ p_{ij} & -w_{ij} \end{bmatrix} \succeq 0, \ \forall\,(i,j) \in \mathcal{E}; \qquad \alpha + p_{ii} + \sum_{k\in N(i)} w_{ki} \ge 0, \ \forall\, i \in \mathcal{V}.$
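Each edge constraint in (43) is a $2\times 2$ linear matrix inequality, and a symmetric $2\times 2$ matrix is PSD iff its diagonal entries and determinant are nonnegative. This gives a solver-free way to test whether a given candidate $(\mathbf{w}, \alpha)$ is feasible with $\alpha < 0$, which by Theorem 5 below certifies $\mathcal{S} \neq \emptyset$. In the sketch below both the precision matrix and the candidate $\mathbf{w}$ are illustrative choices, not values from the paper.

```python
# Sketch: testing a candidate w against the constraints of (43) without an
# SDP solver. A symmetric 2x2 block [[a, b], [b, c]] is PSD iff a >= 0,
# c >= 0 and a*c - b^2 >= 0. P and w below are illustrative choices.
P = [[3.0, 1.0, 1.0],
     [1.0, 3.0, 1.0],
     [1.0, 1.0, 3.0]]
N = len(P)
edges = [(i, j) for i in range(N) for j in range(N) if i != j and P[i][j] != 0.0]
w = {e: -0.5 for e in edges}             # candidate certificate for S != {}

def psd2(a, b, c):
    """PSD test for the symmetric 2x2 matrix [[a, b], [b, c]]."""
    return a >= 0.0 and c >= 0.0 and a * c - b * b >= 0.0

edge_ok = all(
    psd2(P[i][i] + sum(w[(k, m)] for (k, m) in edges if m == i and k != j),
         P[i][j],
         -w[(i, j)])
    for (i, j) in edges)
alpha = max(-(P[i][i] + sum(w[(k, m)] for (k, m) in edges if m == i))
            for i in range(N))           # smallest alpha meeting the node constraints
certified = edge_ok and alpha < 0.0
print(certified)  # True: this w is feasible with alpha < 0, so S != {}
```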
The SDP problem (43) can be solved efficiently by existing software, such as CVX [31] and SeDuMi [32].

Theorem 5. $\mathcal{S} \neq \emptyset$ if and only if the optimal solution $\alpha^*$ of (43) satisfies $\alpha^* < 0$.

Proof: First, notice that the SDP problem in (43) is equivalent to the following optimization problem:

$\min_{\mathbf{w},\,\alpha} \ \alpha \qquad (44)$

$\text{s.t.}\ w_{ij} - g_{ij}(\mathbf{w}) \le 0, \ \forall\,(i,j) \in \mathcal{E}; \qquad -p_{ii} - \sum_{k\in N(i)\setminus j} w_{ki} \le 0, \ \forall\,(i,j) \in \mathcal{E}; \qquad -p_{ii} - \sum_{k\in N(i)} w_{ki} \le \alpha, \ \forall\, i \in \mathcal{V}.$

If $\mathcal{S} \neq \emptyset$, according to the definition of $\mathcal{S}$ in (42), there must exist a $\mathbf{w}$ such that $\mathbf{w} \le g(\mathbf{w})$, $\mathbf{w} \in \mathcal{W}$, and $p_{ii} + \sum_{k\in N(i)} w_{ki} > 0$ for all $i \in \mathcal{V}$. Obviously, these three conditions are equivalent to $w_{ij} - g_{ij}(\mathbf{w}) \le 0$ and $-p_{ii} - \sum_{k\in N(i)\setminus j} w_{ki} < 0$ for all $(i,j) \in \mathcal{E}$, together with $-p_{ii} - \sum_{k\in N(i)} w_{ki} < 0$ for all $i \in \mathcal{V}$. Thus, by defining $\alpha = \max_{i\in\mathcal{V}}\big(-p_{ii} - \sum_{k\in N(i)} w_{ki}\big)$, it can be seen that $(\mathbf{w}, \alpha)$ satisfies the constraints in (44). Due to $-p_{ii} - \sum_{k\in N(i)} w_{ki} < 0$, we have $\alpha < 0$. Since $(\mathbf{w}, \alpha)$ is a feasible solution of the minimization problem in (44), the optimal value of (44) cannot be greater than $\alpha$, hence it can be inferred that $\alpha^* < 0$.

Next, we prove the reverse also holds. If $(\mathbf{w}^*, \alpha^*)$ is the optimal solution of (44) with $\alpha^* < 0$, then $(\mathbf{w}^*, \alpha^*)$ must satisfy the constraints in (44), and thus the following three conditions hold: 1) $w^*_{ij} - g_{ij}(\mathbf{w}^*) \le 0$; 2) $-p_{ii} - \sum_{k\in N(i)\setminus j} w^*_{ki} \le 0$; 3) $-p_{ii} - \sum_{k\in N(i)} w^*_{ki} \le \alpha^*$. For the second constraint, if $-p_{ii} - \sum_{k\in N(i)\setminus j} w^*_{ki} = 0$, the function $g_{ij}(\mathbf{w}^*) = -p_{ij}^2/\big(p_{ii} + \sum_{k\in N(i)\setminus j} w^*_{ki}\big)$ becomes undefined, thus equality will never happen. Hence, we always have $-p_{ii} - \sum_{k\in N(i)\setminus j} w^*_{ki} < 0$. For the third constraint, due to $\alpha^* < 0$, it can be inferred that $-p_{ii} - \sum_{k\in N(i)} w^*_{ki} < 0$. Now, comparing $w^*_{ij} - g_{ij}(\mathbf{w}^*) \le 0$, $-p_{ii} - \sum_{k\in N(i)\setminus j} w^*_{ki} < 0$ and $-p_{ii} - \sum_{k\in N(i)} w^*_{ki} < 0$ with the definition of the set $\mathcal{S}$ in (42), we have $\mathbf{w}^* \in \mathcal{S}$, and hence $\mathcal{S} \neq \emptyset$.

Remark 2: Using the alternating direction method of multipliers (ADMM) [27], the SDP problem in (43) can be reformulated into $N$ locally connected low-dimensional SDP subproblems, and thus can be solved distributively. Not only does this avoid the gathering of information at a central processing unit, but the complexity is also reduced from $O\big(\big(\sum_{i=1}^N |N(i)|\big)^4\big)$ of directly solving the SDP to $O\big(\sum_{i=1}^N |N(i)|^4\big)$ per iteration when using the ADMM technique. Furthermore, since what we are really interested in is whether $\mathcal{S} \neq \emptyset$, ADMM can stop its updating immediately once an intermediate vector is found to be within $\mathcal{S}$. Thus, the required number of iterations can be reduced significantly. Because the derivation of ADMM is well-documented
in [27], [33], we do not give the details here. It should be emphasized that despite the low complexity of ADMM at each iteration, due to the unknown number of required iterations, it cannot be proved that the overall complexity is always lower than that of direct matrix inversion, $O(N^3)$.

Remark 3: As discussed after (20), we know that $\mathcal{A} = \{\mathbf{w} \ge \mathbf{w}_0 \,|\, \mathbf{w}_0 \in \mathrm{int}(\mathcal{S}_1)\}$ if $\mathrm{int}(\mathcal{S}_1) \neq \emptyset$. In particular, with a proof similar to that of Theorem 5, it can be shown that if the SDP

$\min_{\mathbf{w},\,\alpha} \ \alpha \qquad (45)$

$\text{s.t.}\ \begin{bmatrix} p_{ii} + \sum_{k\in N(i)\setminus j} w_{ki} & p_{ij} \\ p_{ij} & \alpha - w_{ij} \end{bmatrix} \succeq 0, \ \forall\,(i,j) \in \mathcal{E}$
has a solution $\alpha^* < 0$, then the optimal $\mathbf{w}^*$ must be a point in $\mathrm{int}(\mathcal{S}_1)$.

VI. RELATIONSHIP WITH THE CONDITION BASED ON COMPUTATION TREE

In this section, we will establish the relation between the proposed convergence condition and the one proposed in [24], which is the best currently available. For the convergence condition in [24], the original PDF in (1) is first transformed into the normalized form

$\tilde{f}(\tilde{\mathbf{x}}) \propto \exp\Big\{-\dfrac{1}{2}\tilde{\mathbf{x}}^T \tilde{\mathbf{P}} \tilde{\mathbf{x}} + \tilde{\mathbf{h}}^T \tilde{\mathbf{x}}\Big\}, \qquad (46)$

where $\tilde{\mathbf{x}} = \mathrm{diag}^{\frac{1}{2}}(\mathbf{P})\,\mathbf{x}$, $\tilde{\mathbf{P}} = \mathrm{diag}^{-\frac{1}{2}}(\mathbf{P})\,\mathbf{P}\,\mathrm{diag}^{-\frac{1}{2}}(\mathbf{P})$ and $\tilde{\mathbf{h}} = \mathrm{diag}^{-\frac{1}{2}}(\mathbf{P})\,\mathbf{h}$, with $\mathrm{diag}(\mathbf{P})$ denoting the diagonal matrix obtained by retaining only the diagonal elements of $\mathbf{P}$. Then, Gaussian BP is carried out with $\tilde{\mathbf{P}}$ and $\tilde{\mathbf{h}}$ using the updating equations in (6), (7), (9) and (10). Under the normalized $\tilde{\mathbf{P}}$ and $\tilde{\mathbf{h}}$, we denote the corresponding message parameters as $\tilde{\mathbf{v}}^a(t)$ and $\tilde{\boldsymbol{\beta}}^a(t)$. With $\tilde{\mathbf{v}}^a(t)$, the corresponding belief variance can be calculated using (11) as

$\tilde{\sigma}_i^2(t) = \dfrac{1}{\tilde{p}_{ii} + \sum_{k\in N(i)} \tilde{v}^a_{k\to i}(t)}.$

The variance of the original PDF $f(\mathbf{x})$ can be recovered as $\frac{1}{p_{ii}}\tilde{\sigma}_i^2(t)$. Moreover, under the normalized $\tilde{\mathbf{P}}$, the sets $\tilde{\mathcal{W}}$, $\tilde{\mathcal{S}}_1$, $\tilde{\mathcal{A}}$ and $\tilde{\mathcal{S}}$ can also be defined similarly to (15), (17), (20) and (42), respectively.

By noticing that the message-passing process of Gaussian BP in the factor graph can be translated exactly into message passing in a computation tree, [24] proposed a convergence condition for the belief variance $\tilde{\sigma}_i^2(t)$ in terms of the computation tree. A computation tree $T_{\gamma,\Delta}$ is constructed by choosing one of the variable nodes $\gamma$ as the root node and $N(\gamma)$ as its direct descendants, connected through the factor nodes $\tilde{f}_{\gamma,\nu}(\tilde{x}_\gamma, \tilde{x}_\nu) = \exp\{-\tilde{p}_{\gamma\nu}\tilde{x}_\gamma \tilde{x}_\nu\}$, where $\nu$ is the index of any descendant. For each descendant $\nu$, the neighbors of $\nu$ excluding its parent node are connected as next-level descendants, also connected through the corresponding factor nodes. The process is repeated until the computation tree $T_{\gamma,\Delta}$ has $\Delta + 1$ layers of variable nodes (including the root node). The computation tree is completed by further connecting a factor node $\tilde{f}_i(\tilde{x}_i) = \exp\{-\frac{1}{2}\tilde{x}_i^2 + \tilde{h}_i \tilde{x}_i\}$ to each variable node $\tilde{x}_i$ for $i \in \mathcal{V}$. Furthermore, a computation subtree $T_s(\gamma,\nu,\Delta)$ can be obtained by cutting the branch of $x_\nu$ in the first layer from $T(\gamma,\Delta)$. An example of the computation tree $T(1,4)$ as well as the computation subtree $T_s(1,4,4)$ corresponding to a fully connected factor graph with four variables is illustrated in Fig. 1.

Fig. 1: Illustration of computation tree T(1, 4) and computation sub-tree Ts(1, 4, 4) within the dashed line.

By assigning to each variable node a new index $i = 1, 2, \cdots, |T_{\gamma,\Delta}|$, $T_{\gamma,\Delta}$ can be viewed as a factor graph for a new PDF, where $|T_{\gamma,\Delta}|$ denotes the number of variable nodes in $T_{\gamma,\Delta}$. The new PDF represented by $T_{\gamma,\Delta}$ is

$\tilde{f}_{\gamma,\Delta}(\tilde{\mathbf{x}}_{\gamma,\Delta}) \propto \exp\Big\{-\dfrac{1}{2}\tilde{\mathbf{x}}_{\gamma,\Delta}^T \tilde{\mathbf{P}}_{\gamma,\Delta} \tilde{\mathbf{x}}_{\gamma,\Delta} + \tilde{\mathbf{h}}_{\gamma,\Delta}^T \tilde{\mathbf{x}}_{\gamma,\Delta}\Big\}, \qquad (47)$

where $\tilde{\mathbf{x}}_{\gamma,\Delta}$ is a $|T_{\gamma,\Delta}| \times 1$ vector, and $\tilde{\mathbf{P}}_{\gamma,\Delta}$ and $\tilde{\mathbf{h}}_{\gamma,\Delta}$ are the corresponding precision matrix and linear coefficient vector, respectively. According to the equivalence between the message-passing process in the original factor graph and that in the computation tree [6], [24], under the prerequisites of $\tilde{\mathbf{P}}_{\gamma,\Delta} \succ 0$ and $\tilde{\mathbf{v}}^a(0) = \mathbf{0}$, it is known that

$\tilde{\sigma}_\gamma^2(\Delta) = [\tilde{\mathbf{P}}^{-1}_{\gamma,\Delta}]_{11}. \qquad (48)$
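Equation (48) can be verified by hand for small $\Delta$. For a fully connected three-variable model, $T(1,1)$ consists of root $\tilde{x}_1$ with descendants $\tilde{x}_2$ and $\tilde{x}_3$, so $\tilde{\mathbf{P}}_{1,1}$ is $3\times 3$ and $[\tilde{\mathbf{P}}^{-1}_{1,1}]_{11}$ follows from a Schur complement of the (identity) descendant block; one BP sweep from $\tilde{\mathbf{v}}^a(0) = \mathbf{0}$ must reproduce it. The precision matrix below is an illustrative choice.

```python
# Sketch: checking (48) for Delta = 1 on a fully connected 3-variable model.
# P is a hypothetical precision matrix; (46) normalizes it to unit diagonal.
P = [[3.0, 1.0, 1.0],
     [1.0, 3.0, 1.0],
     [1.0, 1.0, 3.0]]
N = len(P)
r = [[P[i][j] / (P[i][i] * P[j][j]) ** 0.5 for j in range(N)] for i in range(N)]

# Left side of (48): the computation tree T(1,1) has precision matrix
# [[1, r12, r13], [r12, 1, 0], [r13, 0, 1]], whose (1,1) entry of the inverse
# is 1 / (1 - r12^2 - r13^2) by the Schur complement.
inv_11 = 1.0 / (1.0 - r[0][1] ** 2 - r[0][2] ** 2)

# Right side: one sweep of (12) from v(0) = 0 gives v_{k->1}(1) = -r_1k^2,
# hence sigma_1^2(1) = 1 / (1 + v_{2->1}(1) + v_{3->1}(1)).
sigma2_1 = 1.0 / (1.0 + (-r[0][1] ** 2) + (-r[0][2] ** 2))
print(abs(inv_11 - sigma2_1) < 1e-12)  # True: both sides of (48) agree
```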
˜ γ\ν,∆ Similarly, the computation sub-tree Ts (γ, ν, ∆) can also be viewed as a factor graph with P being the corresponding symmetric precision matrix in the PDF. Furthermore, if (48) holds, we also have 2 ˜ −1 ]11 , σ ˜γ\ν (∆) = [P γ\ν,∆ 2 where σ ˜γ\ν (∆) ,
(49)
˜ γ,∆ − In [24, Proposition 25], it is proved that if lim ρ(P ∆→∞ ˜ γ,∆ − I) > 1, Gaussian BP I) < 1, the variance σ ˜γ2 (t) converges for v ˜a (0) = 0; but if lim ρ(P ∆→∞ ˜ γ,∆ − I) always exist and are identical for becomes ill-posed. Notice that the limit lim ρ(P 1 ∑ . a p˜γγ + k∈N (γ)\ν v˜k→γ (∆)
∆→∞
all γ ∈ V [24, Lemma 24]. The following lemma and theorem reveal the relation between the proposed convergence condition and that based on computation tree. ˜ γ,∆ ≻ 0 for all γ ∈ V and ∆ ≥ 0, then v Lemma 4. If P ˜a (t) converges to v ˜a∗ ∈ S˜1 for v ˜a (0) = 0. Proof: See Appendix B. Now, we give the following theorem. Theorem 6. S˜ ̸= ∅ if and only if either of the following two conditions holds: ˜ γ,∆ − I) < 1; 1) lim ρ(P ∆→∞ ∑ ˜ γ,∆ − I) = 1, p˜γγ + 2) lim ρ(P ∆→∞
k∈N (γ)
a∗ ˜ ν,∆ ≻ 0 for all ν ∈ V v˜k→γ ̸= 0 with v ˜a (0) = 0, and P
and ∆ ≥ 0. Proof: Sufficient Condition: ˜ γ,∆ − I) < 1, then S˜ ̸= ∅. With the monotonically increasing First, we prove that if lim ρ(P ∆→∞ ˜ γ,∆ − I) [24, Lemma 23] and the assumption that its limit is smaller than 1, we property of ρ(P ˜ γ,∆ − I) ≤ ρu with ρu , lim ρ(P ˜ γ,∆ − I) ∈ (0, 1). Expressing in terms of eigenvalues, have ρ(P ∆→∞ ˜ γ,∆ ) − 1 ≤ ρu , or equivalently we have −ρu ≤ λ(P ˜ γ,∆ ) ≤ 1 + ρu . 1 − ρu ≤ λ(P { } −1 1 ˜ ˜ For a symmetric Pγ,∆ , the eigenvalues of Pγ,∆ are λ(P˜ ) , and (50) is equivalent to
(50)
γ,∆
1 ˜ −1 ) ≤ 1 . ≤ λ(P γ,∆ 1 + ρu 1 − ρu
July 31, 2014
(51)
DRAFT
1053-587X (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Due to $0 < \rho_u < 1$, it can be obtained from (51) that $\lambda(\tilde{\mathbf{P}}^{-1}_{\gamma,\Delta}) > 0$, that is,

$\tilde{\mathbf{P}}^{-1}_{\gamma,\Delta} \succ 0, \quad \forall\,\gamma \in \mathcal{V}\ \text{and}\ \Delta \ge 0. \qquad (52)$

Notice that a diagonal element of a symmetric matrix is always smaller than or equal to the matrix's maximum eigenvalue [34], that is, $[\tilde{\mathbf{P}}^{-1}_{\gamma,\Delta}]_{ii} \le \lambda_{\max}(\tilde{\mathbf{P}}^{-1}_{\gamma,\Delta})$. Combining this with the fact that the diagonal elements of a positive definite matrix are positive, we obtain $0 < [\tilde{\mathbf{P}}^{-1}_{\gamma,\Delta}]_{ii} \le \lambda_{\max}(\tilde{\mathbf{P}}^{-1}_{\gamma,\Delta})$. Since the upper bound of (51) applies to all eigenvalues, we obtain

$0 < [\tilde{\mathbf{P}}^{-1}_{\gamma,\Delta}]_{ii} \le \dfrac{1}{1-\rho_u}. \qquad (53)$

Due to $[\tilde{\mathbf{P}}^{-1}_{\gamma,\Delta}]_{11} = \tilde{\sigma}_\gamma^2(\Delta)$ from (48) and $\tilde{\sigma}_\gamma^2(\Delta) = 1/\big(\tilde{p}_{\gamma\gamma} + \sum_{k\in N(\gamma)} \tilde{v}^a_{k\to\gamma}(\Delta)\big)$, it can be inferred from (53) that $\tilde{p}_{\gamma\gamma} + \sum_{k\in N(\gamma)} \tilde{v}^a_{k\to\gamma}(\Delta) \ge 1 - \rho_u$ for all $\Delta \ge 0$. Due to $\tilde{\mathbf{P}}^{-1}_{\gamma,\Delta} \succ 0$ from (52), we have $\tilde{\mathbf{P}}_{\gamma,\Delta} \succ 0$, and according to Lemma 4, $\tilde{\mathbf{v}}^a(t)$ always converges to $\tilde{\mathbf{v}}^{a*} \in \tilde{\mathcal{S}}_1$. Thus, it can be inferred that $\tilde{p}_{\gamma\gamma} + \sum_{k\in N(\gamma)} \tilde{v}^{a*}_{k\to\gamma} \ge 1 - \rho_u$. With $\rho_u < 1$, we obtain

$\tilde{p}_{\gamma\gamma} + \sum_{k\in N(\gamma)} \tilde{v}^{a*}_{k\to\gamma} > 0, \quad \forall\,\gamma \in \mathcal{V}. \qquad (54)$

Combining (54) with $\tilde{\mathbf{v}}^{a*} \in \tilde{\mathcal{S}}_1$, it can be inferred that $\tilde{\mathbf{v}}^{a*} \in \tilde{\mathcal{S}}$ by the definition, thus $\tilde{\mathcal{S}} \neq \emptyset$.

Next, we prove the second scenario. Due to $\tilde{\mathbf{P}}_{\gamma,\Delta} \succ 0$, according to Lemma 4, $\tilde{\mathbf{v}}^a(t)$ converges to $\tilde{\mathbf{v}}^{a*} \in \tilde{\mathcal{S}}_1$, and thereby $\tilde{\mathcal{S}}_1 \neq \emptyset$ and $\tilde{p}_{\gamma\gamma} + \sum_{k\in N(\gamma)} \tilde{v}^{a*}_{k\to\gamma}$ exists. Due to $\tilde{\mathcal{S}}_1 \neq \emptyset$, according to Lemma 3, it is known that $\tilde{p}_{\gamma\gamma} + \sum_{k\in N(\gamma)} \tilde{v}^{a*}_{k\to\gamma} > 0$ for all $\gamma \in \mathcal{V}$ or $\tilde{p}_{\gamma\gamma} + \sum_{k\in N(\gamma)} \tilde{v}^{a*}_{k\to\gamma} = 0$ for all $\gamma \in \mathcal{V}$. From the prerequisite $\tilde{p}_{\gamma\gamma} + \sum_{k\in N(\gamma)} \tilde{v}^{a*}_{k\to\gamma} \neq 0$, we can infer that $\tilde{p}_{\gamma\gamma} + \sum_{k\in N(\gamma)} \tilde{v}^{a*}_{k\to\gamma} > 0$ holds for all $\gamma \in \mathcal{V}$. Combining with $\tilde{\mathbf{v}}^{a*} \in \tilde{\mathcal{S}}_1$, we can see that $\tilde{\mathbf{v}}^{a*} \in \tilde{\mathcal{S}}$, thus $\tilde{\mathcal{S}} \neq \emptyset$.

Necessary Condition: In the following, we will first prove that the precision matrix of a computation tree can always be written in a special structure by re-indexing the nodes in the tree. Based on this special structure, the necessity is then proved by induction.

First, notice that changing the indexing scheme in the computation tree does not affect the positive definiteness of the corresponding precision matrix $\tilde{\mathbf{P}}_{\gamma,\Delta}$. So, we consider an indexing scheme
such that the precision matrix $\tilde{P}_{\gamma,\Delta+1}$ for $\Delta \geq 0$ can be represented in the form

$$
\tilde{P}_{\gamma,\Delta+1} =
\begin{bmatrix}
\tilde{p}_{\gamma\gamma} & \tilde{a}^T_{k_1\setminus\gamma,\Delta} & \tilde{a}^T_{k_2\setminus\gamma,\Delta} & \cdots & \tilde{a}^T_{k_{|N(\gamma)|}\setminus\gamma,\Delta} \\
\tilde{a}_{k_1\setminus\gamma,\Delta} & \tilde{P}_{k_1\setminus\gamma,\Delta} & 0 & \cdots & 0 \\
\tilde{a}_{k_2\setminus\gamma,\Delta} & 0 & \ddots & \ddots & \vdots \\
\vdots & \vdots & \ddots & \ddots & 0 \\
\tilde{a}_{k_{|N(\gamma)|}\setminus\gamma,\Delta} & 0 & \cdots & 0 & \tilde{P}_{k_{|N(\gamma)|}\setminus\gamma,\Delta}
\end{bmatrix}, \qquad (55)
$$

where $k_i \in N(\gamma)$ for $i = 1, 2, \cdots, |N(\gamma)|$, and $\tilde{a}_{k_i\setminus\gamma,\Delta} = [\tilde{p}_{k_i\gamma}, 0, \cdots, 0]^T$ is a vector of length $|T_s(k_i,\gamma,\Delta)|$. Notice that the $(\Delta+1)$-order computation tree $T(\gamma,\Delta+1)$ consists of the root node $[\tilde{x}]_\gamma$ and a set of $\Delta$-order computation sub-trees $T_s(k_i,\gamma,\Delta)$, each with root node $k_i$. The lower-right block-diagonal structure of (55) is obtained by assigning consecutive indices to the nodes inside each sub-tree $T_s(k_i,\gamma,\Delta)$. Moreover, since there is only one connection from node $\gamma$ to the root node $k_i$ of each sub-tree $T_s(k_i,\gamma,\Delta)$, the vector $\tilde{a}_{k_i\setminus\gamma,\Delta}$ contains only one nonzero element $\tilde{p}_{k_i\gamma}$. Then, by assigning the smallest index to the root node of each $T_s(k_i,\gamma,\Delta)$, this nonzero element must be located at the first position. Therefore, the precision matrix $\tilde{P}_{\gamma,\Delta+1}$ can be represented in the form of (55).

When $\Delta = 0$, obviously, $\tilde{P}_{\gamma,\Delta} = \tilde{p}_{\gamma\gamma} > 0$. Suppose $\tilde{P}_{\gamma,\Delta} \succ 0$ for some $\Delta \geq 0$; we need to prove $\tilde{P}_{\gamma,\Delta+1} \succ 0$. From (55), it is clear that $\tilde{P}_{\gamma,\Delta+1} \succ 0$ if and only if the following two conditions are satisfied [34, p. 472]:

$$
\mathrm{Bdiag}\big(\tilde{P}_{k_1\setminus\gamma,\Delta}, \cdots, \tilde{P}_{k_{|N(\gamma)|}\setminus\gamma,\Delta}\big) \succ 0, \qquad (56)
$$
$$
\tilde{p}_{\gamma\gamma} - \sum_{k\in N(\gamma)} \tilde{a}^T_{k\setminus\gamma,\Delta}\, \tilde{P}^{-1}_{k\setminus\gamma,\Delta}\, \tilde{a}_{k\setminus\gamma,\Delta} > 0, \qquad (57)
$$

where $\mathrm{Bdiag}(\cdot)$ denotes the block-diagonal matrix with its arguments located along the main diagonal. Since $\tilde{P}_{k,\Delta} \succ 0$ for all $k \in V$ by assumption, its sub-matrices satisfy $\tilde{P}_{k\setminus\gamma,\Delta} \succ 0$ for all $(k,\gamma) \in E$; thus the first condition (56) holds. On the other hand, for the second condition (57), we write

$$
\tilde{p}_{\gamma\gamma} - \sum_{k\in N(\gamma)} \tilde{a}^T_{k\setminus\gamma,\Delta}\, \tilde{P}^{-1}_{k\setminus\gamma,\Delta}\, \tilde{a}_{k\setminus\gamma,\Delta}
\overset{a}{=} \tilde{p}_{\gamma\gamma} - \sum_{k\in N(\gamma)} \tilde{p}^2_{k\gamma}\, \tilde{\sigma}^2_{k\setminus\gamma}(\Delta)
\overset{b}{=} \tilde{p}_{\gamma\gamma} + \sum_{k\in N(\gamma)} \tilde{v}^a_{k\to\gamma}(\Delta+1), \qquad (58)
$$

where the equality $\overset{a}{=}$ holds since $\tilde{a}_{k\setminus\gamma,\Delta}$ has only one nonzero element $\tilde{p}_{k\gamma}$ in the first position and $[\tilde{P}^{-1}_{k\setminus\gamma,\Delta}]_{11} = \tilde{\sigma}^2_{k\setminus\gamma}(\Delta)$ from (49); and $\overset{b}{=}$ holds due to $-\tilde{p}^2_{k\gamma}\, \tilde{\sigma}^2_{k\setminus\gamma}(\Delta) = -\tilde{p}^2_{k\gamma} \big/ \big(\tilde{p}_{kk} + \sum_{j\in N(k)\setminus\gamma} \tilde{v}^a_{j\to k}(\Delta)\big) = \tilde{v}^a_{k\to\gamma}(\Delta+1)$ given in (12). If $\tilde{S} \neq \emptyset$, according to Theorem 4, we have

$$
\tilde{p}_{\gamma\gamma} + \sum_{k\in N(\gamma)} \tilde{v}^{a*}_{k\to\gamma} > 0. \qquad (59)
$$

Since $\tilde{S} \subseteq \tilde{S}_1$, $\tilde{S} \neq \emptyset$ also implies $\tilde{S}_1 \neq \emptyset$. According to (63), it can be inferred that $\tilde{\mathbf{v}}^a(t) \geq \tilde{\mathbf{v}}^{a*}$. Combining this with (59), we obtain $\tilde{p}_{\gamma\gamma} + \sum_{k\in N(\gamma)} \tilde{v}^a_{k\to\gamma}(t) > 0$. Substituting this result into (58), it can be inferred that the second condition (57) holds as well. Thus, we have $\tilde{P}_{\gamma,\Delta} \succ 0$ for all $\gamma \in V$ and $\Delta > 0$. Furthermore, from (59), it is obvious that $\tilde{p}_{\gamma\gamma} + \sum_{k\in N(\gamma)} \tilde{v}^{a*}_{k\to\gamma} \neq 0$.

From $\tilde{P}_{\gamma,\Delta} \succ 0$, we obtain $I - (I - \tilde{P}_{\gamma,\Delta}) \succ 0$, and hence $1 - \lambda_{\max}(I - \tilde{P}_{\gamma,\Delta}) > 0$. Since $\tilde{P}_{\gamma,\Delta}$ represents a tree-structured factor graph, it is proved in [24, Proposition 15] that $\lambda_{\max}(I - \tilde{P}_{\gamma,\Delta}) = \rho(\tilde{P}_{\gamma,\Delta} - I)$. Therefore, $\rho(\tilde{P}_{\gamma,\Delta} - I) < 1$, and thereby

$$
\lim_{\Delta\to\infty} \rho(\tilde{P}_{\gamma,\Delta} - I) \leq 1. \qquad (60)
$$

Finally, if $\lim_{\Delta\to\infty} \rho(\tilde{P}_{\gamma,\Delta} - I) < 1$, due to $\tilde{P}^{-1}_{\gamma,\Delta} \succ 0$ from (52), we have $\tilde{P}_{\gamma,\Delta} \succ 0$. Together with (54), it can be inferred that under the prerequisite $\lim_{\Delta\to\infty} \rho(\tilde{P}_{\gamma,\Delta} - I) < 1$, the conditions $\tilde{P}_{\gamma,\Delta} \succ 0$ and $\tilde{p}_{\gamma\gamma} + \sum_{k\in N(\gamma)} \tilde{v}^{a*}_{k\to\gamma} \neq 0$ are automatically satisfied. Therefore, if $\tilde{S} \neq \emptyset$, either $\lim_{\Delta\to\infty} \rho(\tilde{P}_{\gamma,\Delta} - I) < 1$, or $\lim_{\Delta\to\infty} \rho(\tilde{P}_{\gamma,\Delta} - I) = 1$ with $\tilde{p}_{\gamma\gamma} + \sum_{k\in N(\gamma)} \tilde{v}^{a*}_{k\to\gamma} \neq 0$ and $\tilde{P}_{\gamma,\Delta} \succ 0$.
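The recursive block structure in (55) can be made concrete with a small sketch that assembles a computation-tree precision matrix from an ordinary precision matrix. This is our own illustration, not the paper's code (the function name and depth convention are assumptions): the root takes the smallest index of its block, each sub-tree occupies a consecutive diagonal block, and the root couples to each sub-tree only through a single entry in the sub-tree's first position.

```python
import numpy as np

def tree_precision(P, root, parent, depth):
    """Assemble the computation-tree precision matrix in the block form of (55).

    The (sub-)tree root takes index 0 of its block; each neighbor's sub-tree
    (excluding the parent) is placed as a consecutive diagonal block, coupled
    to the root only via P[root, k] at the block's first row/column.
    """
    if depth == 0:
        return np.array([[P[root, root]]])
    n = P.shape[0]
    children = [k for k in range(n)
                if k != root and k != parent and P[root, k] != 0]
    subs = [tree_precision(P, k, root, depth - 1) for k in children]
    size = 1 + sum(S.shape[0] for S in subs)
    M = np.zeros((size, size))
    M[0, 0] = P[root, root]
    off = 1
    for k, S in zip(children, subs):
        m = S.shape[0]
        M[off:off + m, off:off + m] = S        # sub-tree block
        M[0, off] = M[off, 0] = P[root, k]     # single coupling, first position
        off += m
    return M
```

For example, for a 3-cycle with unit diagonal and off-diagonal 0.3, the depth-$\Delta$ computation tree unrolls the cycle into a path of $2\Delta + 1$ nodes, and the assembled matrix stays symmetric and positive definite, in line with the induction step above.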
From Theorems 4 and 6, it can be seen that the variance $\sigma^2_\gamma(t)$ converges if $\lim_{\Delta\to\infty} \rho(\tilde{P}_{\gamma,\Delta} - I) < 1$ and diverges if $\lim_{\Delta\to\infty} \rho(\tilde{P}_{\gamma,\Delta} - I) > 1$, which is consistent with the results in [24]. Moreover, it can be seen from Theorem 6 that the condition $\lim_{\Delta\to\infty} \rho(\tilde{P}_{\gamma,\Delta} - I) = 1$ alone is not sufficient to determine the convergence of the variance $\tilde{\sigma}^2_\gamma(t)$. This fills the gap left in [24] for the scenario $\lim_{\Delta\to\infty} \rho(\tilde{P}_{\gamma,\Delta} - I) = 1$. Albeit with conclusions similar to those of [24], we emphasize that the criterion $\lim_{\Delta\to\infty} \rho(\tilde{P}_{\gamma,\Delta} - I) < 1$ proposed in [24] is not easy to check in practice due to the infinite dimension of the computation tree, while our condition $\tilde{S} \neq \emptyset$ can be verified by solving the SDP problem given in Theorem 5. Moreover, the initialization is expanded from the single choice $\tilde{\mathbf{v}}^a(0) = \mathbf{0}$ in [24] to the much larger set $\tilde{\mathbf{v}}^a(0) \in \tilde{\mathcal{A}}$ in this paper. This flexibility in the choice of initialization is useful for accelerating the convergence of the variance $\sigma^2_i(t)$ when the initialization is chosen close to the convergent point.
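To make the variance recursion discussed above concrete, the synchronous update can be sketched in a few lines. This is our own minimal illustration, not the paper's code (function and variable names are assumptions): messages start from the zero initialization, are updated in the form of (12), and the belief variances are read off from the message sums. On a tree-structured graph, the result matches the exact marginal variances.

```python
import numpy as np

def gbp_variances(P, T=200, tol=1e-10):
    """Synchronous Gaussian BP variance recursion (sketch of the update (12)).

    v[i, j] holds the message parameter v^a_{i->j}(t); non-edges stay zero.
    The belief variance at node i is 1 / (p_ii + sum_k v^a_{k->i}).
    """
    n = P.shape[0]
    v = np.zeros((n, n))                       # v^a(0) = 0 initialization
    edge = (P != 0) & ~np.eye(n, dtype=bool)
    for _ in range(T):
        col_sum = v.sum(axis=0)                # sum_k v_{k->i} for each i
        v_new = np.zeros_like(v)
        for i in range(n):
            for j in range(n):
                if edge[i, j]:
                    # v_{i->j}(t+1) = -p_ij^2 / (p_ii + sum_{k in N(i)\j} v_{k->i}(t))
                    v_new[i, j] = -P[i, j] ** 2 / (P[i, i] + col_sum[i] - v[j, i])
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    return 1.0 / (np.diag(P) + v.sum(axis=0))  # belief variances
```

For instance, on the chain precision matrix $P = \mathrm{tridiag}(-1, 2, -1)$ of size 3 (a tree), the returned variances coincide with the diagonal of $P^{-1}$.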
VII. NUMERICAL EXAMPLES

In this section, numerical experiments are presented to corroborate the theories in this paper. The examples are based on $20 \times 20$ precision matrices $P$ constructed as

$$
p_{ij} =
\begin{cases}
1, & \text{if } i = j, \\
\zeta \cdot \theta_{\mathrm{mod}(i+j,10)+1}, & \text{if } i \neq j,
\end{cases} \qquad (61)
$$

where $\zeta$ is a coefficient indicating the correlation strength among variables, and $\theta_k$ is the $k$-th element of the vector $\theta = [0.13, 0.10, 0.71, -0.05, 0, 0.12, 0.07, 0.11, -0.02, -0.03]^T$. Varying the correlation strength $\zeta$ induces a series of matrices, and the positive definiteness $P \succ 0$ required by a valid PDF is guaranteed when $\zeta < 0.5978$.

Fig. 2 illustrates how the optimal solution $\alpha^*$ of (43) varies with the correlation strength $\zeta$. It can be seen that the optimal solution $\alpha^*$ always exists and the condition $\alpha^* < 0$ holds for all $\zeta \leq 0.5859$, while no feasible solution of the SDP problem (43) exists when $\zeta > 0.5859$. According to Theorem 4, this means that if $\zeta \leq 0.5859$, the variance $\sigma^2_i(t)$ with $i \in V$ converges to the same point for all initializations $\mathbf{v}^a(0) \in \mathcal{A}$ under both synchronous and asynchronous schedulings. On the other hand, if $\zeta > 0.5859$, the variance $\sigma^2_i(t)$ cannot converge.

To verify the convergence of belief variances for $\zeta \leq 0.5859$, Fig. 3 shows how the variance $\sigma^2_1(t)$ of the first variable evolves as a function of $t$ when $\zeta = 0.5858$, which is slightly smaller than 0.5859. It can be observed that the variance $\sigma^2_1(t)$ converges to the same value under both synchronous and asynchronous schedulings and under the different initializations $\mathbf{v}^a(0) = \mathbf{0}$ and $\mathbf{v}^a(0) = \mathbf{w}^*$, where $\mathbf{w}^*$ is the optimal solution of (45). For the asynchronous case, a scheduling in which each message has a 30% chance of not being updated at each iteration is considered. On the other hand, Fig. 4 verifies the divergence of the variance $\sigma^2_1(t)$ when $\zeta = 0.5860$, which is slightly larger than 0.5859. In this figure, synchronous scheduling and the same initializations as in Fig. 3 are used. It can be seen that $\sigma^2_1(t)$ fluctuates as the iterations proceed and shows no sign of convergence.

VIII. CONCLUSIONS

In this paper, the necessary and sufficient convergence condition for the variances of Gaussian BP was developed under both synchronous and asynchronous schedulings.
The initialization set of the proposed condition is much larger than the usual choice of a single point of zero. It is proved that
Fig. 2: The value of $\alpha^*$ under different correlation strengths $\zeta$ (data marker at $\zeta = 0.5859$, $\alpha^* = -0.2844$; vertical line at $\zeta = 0.5978$).
Fig. 3: Illustration of the convergence of the variance $\sigma^2_1(t)$ ($\zeta = 0.5858$) under different schedulings and initializations, with $\mathbf{w}^*$ being the optimal solution of (45).
the convergence condition can be verified efficiently by solving an SDP problem. Furthermore, the relationship between the convergence condition proposed in this paper and the existing one based on the computation tree was established. This relationship fills in a missing piece of the result in [24] for the case where the spectral radius of the computation tree equals one. Numerical examples were presented to verify the proposed convergence conditions.
Fig. 4: Illustration of the divergence of the variance $\sigma^2_1(t)$ ($\zeta = 0.5860$), with $\mathbf{w}^*$ being the optimal solution of (45).
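The construction in (61) can be sketched as follows. This is our own reading of (61), with the paper's 1-based indices mapped to 0-based arrays (the exact indexing convention is an assumption): the matrix is symmetric because $p_{ij}$ depends on $i + j$ only, and by Gershgorin's theorem it is strictly diagonally dominant, hence positive definite, for small $\zeta$.

```python
import numpy as np

# theta from (61); theta[k-1] holds the 1-based element theta_k
theta = np.array([0.13, 0.10, 0.71, -0.05, 0.0,
                  0.12, 0.07, 0.11, -0.02, -0.03])

def make_precision(zeta, n=20):
    """Build the n x n precision matrix of (61): unit diagonal and
    p_ij = zeta * theta_{mod(i+j,10)+1} off the diagonal (i, j 1-based)."""
    P = np.eye(n)
    for i in range(1, n + 1):          # 1-based indices as in the paper
        for j in range(1, n + 1):
            if i != j:
                # theta_{mod(i+j,10)+1} in 0-based storage is theta[(i+j) % 10]
                P[i - 1, j - 1] = zeta * theta[(i + j) % 10]
    return P
```

Scanning $\zeta$ while checking `np.linalg.eigvalsh(make_precision(zeta)).min() > 0` reproduces the kind of positive-definiteness threshold reported in the text (the paper gives $\zeta < 0.5978$ for its construction).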
APPENDIX A
PROOF OF LEMMA 2

First, we prove that $\mathbf{v}^a(t)$ converges for any $\mathbf{v}^a(0) \geq \mathbf{0}$ given $S_1 \neq \emptyset$. For any $\mathbf{w} \in S_1$, according to P5), we have $\mathbf{w} < \mathbf{0}$. Thus, for any $\mathbf{v}^a(0) \geq \mathbf{0}$, the relation $\mathbf{w} \leq \mathbf{v}^a(0)$ always holds. Notice that $\mathbf{w} \in \mathcal{W}$ due to $\mathbf{w} \in S_1$ and $S_1 \subseteq \mathcal{W}$. Applying P2) to $\mathbf{w} \leq \mathbf{v}^a(0)$, we obtain $g(\mathbf{w}) \leq \mathbf{v}^a(1)$. Combining this with $\mathbf{w} \leq g(\mathbf{w})$ from P6) gives $\mathbf{w} \leq \mathbf{v}^a(1)$. On the other hand, substituting $\mathbf{v}^a(0) \geq \mathbf{0}$ into (14) gives

$$
\mathbf{v}^a(1) < \mathbf{0}. \qquad (62)
$$

Due to $\mathbf{v}^a(0) \geq \mathbf{0}$, it follows that $\mathbf{v}^a(1) \leq \mathbf{v}^a(0)$. Combining this with $\mathbf{w} \leq \mathbf{v}^a(1)$ gives $\mathbf{w} \leq \mathbf{v}^a(1) \leq \mathbf{v}^a(0)$. Applying $g(\cdot)$ to $\mathbf{w} \leq \mathbf{v}^a(1) < \mathbf{v}^a(0)$, it can be inferred from P2) that $g(\mathbf{w}) \leq \mathbf{v}^a(2) < \mathbf{v}^a(1)$. Together with $\mathbf{w} \leq g(\mathbf{w})$ as claimed by P6), we obtain $\mathbf{w} \leq \mathbf{v}^a(2) < \mathbf{v}^a(1)$. By induction, we can infer that

$$
\mathbf{w} \leq \mathbf{v}^a(t+1) \leq \mathbf{v}^a(t). \qquad (63)
$$

It can be seen from (63) that $\mathbf{v}^a(t)$ is a monotonically non-increasing and lower-bounded sequence; thus it converges.

Next, we prove that $\mathbf{v}^a(t)$ converges to the same point for all $\mathbf{v}^a(0) \geq \mathbf{0}$. For any $\mathbf{v}^a(0) \geq \mathbf{0}$, according to P2), applying $g(\cdot)$ on both sides of $\mathbf{0} \leq \mathbf{v}^a(0)$ gives $g(\mathbf{0}) \leq g(\mathbf{v}^a(0)) = \mathbf{v}^a(1)$.
Combining this relation with (62) leads to $g(\mathbf{0}) \leq \mathbf{v}^a(1) \leq \mathbf{0}$. Applying $g(\cdot)$ to this inequality $t$ more times, it can be obtained from P2) that $g^{(t+1)}(\mathbf{0}) \leq \mathbf{v}^a(t+1) \leq g^{(t)}(\mathbf{0})$ for all $t \geq 0$. Denoting $\lim_{t\to\infty} g^{(t)}(\mathbf{0}) = \mathbf{v}^{a*}$, it is obvious that $\mathbf{v}^a(t)$ also converges to $\mathbf{v}^{a*}$.

Finally, since $\mathbf{v}^{a*}$ is a fixed point of $g(\cdot)$, we have $\mathbf{v}^{a*} = g(\mathbf{v}^{a*})$. From (63), we obtain $\mathbf{w} \leq \mathbf{v}^{a*}$. Due to $\mathbf{w} \in S_1$, we have $\mathbf{w} \in \mathcal{W}$ by definition. Applying P1), it can be inferred that $\mathbf{v}^{a*} \in \mathcal{W}$. Combining this with the fact $\mathbf{v}^{a*} = g(\mathbf{v}^{a*})$, according to the definition of $S_1$ in (17), we obtain $\mathbf{v}^{a*} \in S_1$. On the other hand, if $S_1 = \emptyset$, according to Theorem 1, the messages of Gaussian BP passed in the factor graph cannot be maintained in Gaussian form; thus the message parameters $\mathbf{v}^a(t)$ cannot converge in this case.

APPENDIX B
PROOF OF LEMMA 4

Due to $\tilde{P}_{\gamma,\Delta} \succ 0$, we have $\tilde{P}^{-1}_{\gamma,\Delta} \succ 0$, and hence the diagonal elements $[\tilde{P}^{-1}_{\gamma,\Delta}]_{ii} > 0$ [34]. With $\tilde{\sigma}^2_\gamma(\Delta) = [\tilde{P}^{-1}_{\gamma,\Delta}]_{11}$ and $\frac{1}{\tilde{\sigma}^2_\gamma(t)} = \tilde{p}_{\gamma\gamma} + \sum_{k\in N(\gamma)} \tilde{v}^a_{k\to\gamma}(t)$, we have

$$
\tilde{p}_{\gamma\gamma} + \sum_{k\in N(\gamma)\setminus\nu} \tilde{v}^a_{k\to\gamma}(t) > -\tilde{v}^a_{\nu\to\gamma}(t). \qquad (64)
$$
Next, we prove that $\tilde{\mathbf{v}}^a(t) \in \tilde{\mathcal{W}}$ for all $t \geq 0$. Obviously, $\tilde{\mathbf{v}}^a(0) = \mathbf{0} \in \tilde{\mathcal{W}}$. Suppose $\tilde{\mathbf{v}}^a(t) \in \tilde{\mathcal{W}}$ holds for some $t \geq 0$; according to the definition of $\tilde{\mathcal{W}}$ in (15), this is equivalent to $\tilde{p}_{\nu\nu} + \sum_{k\in N(\nu)\setminus\gamma} \tilde{v}^a_{k\to\nu}(t) > 0$ for all $(\nu,\gamma) \in E$. Substituting this inequality into (14) gives $\tilde{v}^a_{\nu\to\gamma}(t+1) < 0$. Then, applying $\tilde{v}^a_{\nu\to\gamma}(t+1) < 0$ to (64) gives $\tilde{p}_{\gamma\gamma} + \sum_{k\in N(\gamma)\setminus\nu} \tilde{v}^a_{k\to\gamma}(t+1) > 0$ for all $(\nu,\gamma) \in E$, or equivalently $\tilde{\mathbf{v}}^a(t+1) \in \tilde{\mathcal{W}}$. Thus, we have $\tilde{\mathbf{v}}^a(t) \in \tilde{\mathcal{W}}$ for all $t \geq 0$.

Finally, substituting $\tilde{\mathbf{v}}^a(0) = \mathbf{0}$ into (13) gives $\tilde{\mathbf{v}}^a(1) < \mathbf{0}$, and thereby $\tilde{\mathbf{v}}^a(1) < \tilde{\mathbf{v}}^a(0)$. From $\tilde{\mathbf{v}}^a(t) \in \tilde{\mathcal{W}}$ and P2), applying $g(\cdot)$ on both sides of $\tilde{\mathbf{v}}^a(1) < \tilde{\mathbf{v}}^a(0)$ for $t$ times gives

$$
\tilde{\mathbf{v}}^a(t+1) \leq \tilde{\mathbf{v}}^a(t). \qquad (65)
$$

From $\tilde{\mathbf{v}}^a(0) = \mathbf{0}$ and (65), we have $\tilde{\mathbf{v}}^a(t) \leq \mathbf{0}$. Putting this result into (64) gives $\tilde{v}^a_{\nu\to\gamma}(t) > -\tilde{p}_{\gamma\gamma}$ for all $(\nu,\gamma) \in E$. Together with (65), it can be seen that $\tilde{\mathbf{v}}^a(t)$ is a monotonically non-increasing and lower-bounded sequence. Thus, $\tilde{\mathbf{v}}^a(t)$ converges to a fixed point $\tilde{\mathbf{v}}^{a*}$ with $\tilde{\mathbf{v}}^{a*} = g(\tilde{\mathbf{v}}^{a*})$. Moreover, from $\tilde{\mathbf{v}}^a(1) < \mathbf{0}$ and (65), we have $\tilde{\mathbf{v}}^{a*} < \mathbf{0}$. Putting this result into the limiting form of (64) gives $\tilde{p}_{\gamma\gamma} + \sum_{k\in N(\gamma)\setminus\nu} \tilde{v}^{a*}_{k\to\gamma} > 0$ for all $(\nu,\gamma) \in E$, or equivalently $\tilde{\mathbf{v}}^{a*} \in \tilde{\mathcal{W}}$. Combining this with $\tilde{\mathbf{v}}^{a*} = g(\tilde{\mathbf{v}}^{a*})$, we obtain $\tilde{\mathbf{v}}^{a*} \in \tilde{S}_1$ by the definition of $\tilde{S}_1$.
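The monotone behavior established in this proof, a zero-initialized message sequence that is elementwise non-increasing and lower-bounded per (65), with belief precisions remaining positive, can be observed numerically. The following is our own sketch (the loop mirrors the update form of (12); names are assumptions, not the paper's code):

```python
import numpy as np

def message_history(P, T=60):
    """Iterate the variance messages from v^a(0) = 0 and record each iterate.

    v[i, j] holds v^a_{i->j}(t); entries for non-edges stay zero.
    """
    n = P.shape[0]
    edge = (P != 0) & ~np.eye(n, dtype=bool)
    v = np.zeros((n, n))
    history = [v.copy()]
    for _ in range(T):
        col_sum = v.sum(axis=0)                # sum_k v_{k->i} for each i
        v_new = np.zeros_like(v)
        for i in range(n):
            for j in range(n):
                if edge[i, j]:
                    # v_{i->j}(t+1) = -p_ij^2 / (p_ii + sum_{k in N(i)\j} v_{k->i}(t))
                    v_new[i, j] = -P[i, j] ** 2 / (P[i, i] + col_sum[i] - v[j, i])
        v = v_new
        history.append(v.copy())
    return history
```

On a loopy example such as a 3-cycle with unit diagonal and off-diagonal 0.3, every iterate decreases elementwise toward a fixed point while the belief precisions $\tilde{p}_{\gamma\gamma} + \sum_{k} \tilde{v}^a_{k\to\gamma}$ stay positive, which is exactly the monotone-convergence picture used in the proof.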
REFERENCES

[1] F. R. Kschischang, B. J. Frey, and H. A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Trans. Inf. Theory, vol. 47, pp. 498-519, 2001.
[2] H. A. Loeliger, "An introduction to factor graphs," IEEE Signal Process. Mag., vol. 21, pp. 28-41, 2004.
[3] H. A. Loeliger, J. Dauwels, H. Junli, S. Korl, P. Li, and F. R. Kschischang, "The factor graph approach to model-based signal processing," Proc. IEEE, vol. 95, no. 6, pp. 1295-1322, June 2007.
[4] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, Aug. 2006.
[5] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
[6] Y. Weiss and W. T. Freeman, "Correctness of belief propagation in Gaussian graphical models of arbitrary topology," Neural Comput., vol. 13, pp. 2173-2200, 2001.
[7] Y. Liu, V. Chandrasekaran, A. Anandkumar, and A. S. Willsky, "Feedback message passing for inference in Gaussian graphical models," IEEE Trans. Signal Process., vol. 60, no. 8, pp. 4135-4150, 2012.
[8] A. Montanari, B. Prabhakar, and D. Tse, "Belief propagation based multiuser detection," in Proc. IEEE Information Theory Workshop, Punta del Este, Uruguay, Mar. 2006.
[9] Q. Guo and D. Huang, "EM-based joint channel estimation and detection for frequency selective channels using Gaussian message passing," IEEE Trans. Signal Process., vol. 59, pp. 4030-4035, 2011.
[10] Q. Guo and P. Li, "LMMSE turbo equalization based on factor graphs," IEEE J. Sel. Areas Commun., vol. 26, pp. 311-319, 2008.
[11] Y. El-Kurdi, W. J. Gross, and D. Giannacopoulos, "Efficient implementation of Gaussian belief propagation solver for large sparse diagonally dominant linear systems," IEEE Trans. Magn., vol. 48, no. 2, pp. 471-474, Feb. 2012.
[12] O. Shental, P. H. Siegel, J. K. Wolf, D. Bickson, and D. Dolev, "Gaussian belief propagation solver for systems of linear equations," in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2008, pp. 1863-1867.
[13] X. Tan and J. Li, "Computationally efficient sparse Bayesian learning via belief propagation," IEEE Trans. Signal Process., vol. 58, pp. 2010-2021, 2010.
[14] V. Chandrasekaran, J. K. Johnson, and A. S. Willsky, "Estimation in Gaussian graphical models using tractable subgraphs: a walk-sum analysis," IEEE Trans. Signal Process., vol. 56, pp. 1916-1930, 2008.
[15] B. L. Ng, J. S. Evans, S. V. Hanly, and D. Aktas, "Distributed downlink beamforming with cooperative base stations," IEEE Trans. Inf. Theory, vol. 54, pp. 5491-5499, 2008.
[16] F. Lehmann, "Iterative mitigation of intercell interference in cellular networks based on Gaussian belief propagation," IEEE Trans. Veh. Technol., vol. 61, pp. 2544-2558, 2012.
[17] M. Leng and Y. C. Wu, "Distributed clock synchronization for wireless sensor networks using belief propagation," IEEE Trans. Signal Process., vol. 59, no. 11, pp. 5404-5414, Nov. 2011.
[18] A. Ahmad, D. Zennaro, E. Serpedin, and L. Vangelista, "A factor graph approach to clock offset estimation in wireless sensor networks," IEEE Trans. Inf. Theory, vol. 58, no. 7, pp. 4244-4260, July 2012.
[19] M. Leng, W. P. Tay, and T. Q. S. Quek, "Cooperative and distributed localization for wireless sensor networks in multipath environments," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 3125-3128.
[20] Y. Song, B. Wang, Z. Shi, K. Pattipati, and S. Gupta, "Distributed algorithms for energy-efficient even self-deployment in mobile sensor networks," IEEE Trans. Mobile Comput., 2013, pp. 1-14.
[21] G. Zhang, W. Xu, and Y. Wang, "Fast distributed rate control algorithm with QoS support in Ad Hoc networks," in Proc. IEEE Global Telecommunications Conf. (GLOBECOM), 2010.
[22] D. Dolev, A. Zymnis, S. P. Boyd, D. Bickson, and Y. Tock, "Distributed large scale network utility maximization," in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2009, pp. 829-833.
[23] M. W. Seeger and D. P. Wipf, "Variational Bayesian inference techniques," IEEE Signal Process. Mag., vol. 27, no. 6, pp. 81-91, Nov. 2010.
[24] D. M. Malioutov, J. K. Johnson, and A. S. Willsky, "Walk-sums and belief propagation in Gaussian graphical models," J. Mach. Learn. Res., vol. 7, no. 1, pp. 2031-2064, 2006.
[25] C. C. Moallemi and B. Van Roy, "Convergence of min-sum message passing for quadratic optimization," IEEE Trans. Inf. Theory, vol. 55, pp. 2413-2423, 2009.
[26] S. Tatikonda and M. I. Jordan, "Loopy belief propagation and Gibbs measures," in Proc. 18th Annu. Conf. Uncertainty Artif. Intell. (UAI-02), San Francisco, CA, Aug. 2002, pp. 493-500.
[27] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Englewood Cliffs, NJ: Prentice-Hall, 1989.
[28] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[29] C. C. Moallemi and B. Van Roy, "Convergence of min-sum message passing for convex optimization," IEEE Trans. Inf. Theory, vol. 56, no. 4, pp. 2041-2050, 2010.
[30] L. Vandenberghe and S. Boyd, "Semidefinite programming," SIAM Rev., vol. 38, no. 1, pp. 49-95, Mar. 1996.
[31] M. Grant, S. Boyd, and Y. Ye, CVX: Matlab software for disciplined convex programming. [Online]. Available: http://www.stanford.edu/~boyd/cvx/
[32] J. F. Sturm, "Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones," Optim. Methods Softw., vol. 11-12, pp. 625-653, 1999.
[33] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1-122, 2011.
[34] R. A. Horn and C. R. Johnson, Matrix Analysis. New York: Cambridge Univ. Press, 1985.
Qinliang Su received the B.Eng. degree from Chongqing University in 2007 and the M.Eng. degree from Zhejiang University in 2010 (both with honors). He is currently pursuing the Ph.D. degree with the Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong. His current research interests include signal processing for wireless communications and networks; statistical inference and learning; graphical models; distributed and parallel computation; and convex optimization theories.
Yik-Chung Wu received the B.Eng. (EEE) degree in 1998 and the M.Phil. degree in 2001 from The University of Hong Kong (HKU). He received the Croucher Foundation scholarship in 2002 to pursue the Ph.D. degree at Texas A&M University, College Station, and graduated in 2005. From August 2005 to August 2006, he was with Thomson Corporate Research, Princeton, NJ, as a Member of Technical Staff. Since September 2006, he has been with HKU, currently as an Associate Professor. He was a visiting scholar at Princeton University in summer 2011. His research interests are in the general areas of signal processing and communication systems, and in particular distributed signal processing and communications; optimization theories for communication systems; estimation and detection theories in transceiver design; and smart grid. Dr. Wu served as an Editor for IEEE Communications Letters and is currently an Editor for IEEE Transactions on Communications and the Journal of Communications and Networks.