Nonparametric Probability Density Estimation for Sensor Networks Using Quantized Measurements†

Aleksandar Dogandžić and Benhong Zhang
ECpE Department, Iowa State University, 3119 Coover Hall, Ames, IA 50011
email: {ald, zhangbh}@iastate.edu

Abstract— We develop a nonparametric method for estimating the probability density function (pdf) describing the physical phenomenon measured by a sensor network. The measurements are collected by sensor-processor elements (nodes) deployed in the region of interest; the nodes quantize these measurements and transmit only one bit per observation to a fusion center. We model the measurement pdf as a Gaussian mixture and develop a Fisher scoring algorithm for computing the maximum likelihood (ML) estimates of the unknown mixture probabilities. We also estimate the number of mixture components as well as their means and standard deviation. Numerical simulations demonstrate the performance of the proposed method.
I. Introduction

In sensor-network applications, parametric statistical models for phenomena observed by sensor-processor elements (nodes) are often unknown or difficult to find.¹ Furthermore, as the demand for sensor networks increases, one cannot expect that parametric inference approaches will always be economical or feasible [1]. When parametric statistical models exist for the measured phenomenon, their utility is questionable if the network contains compromised nodes working to disrupt its operation [11], [12]. Consequently, it is of interest to develop nonparametric estimation and detection techniques for sensor-network environments [1]-[10]. An overview of nonparametric methods developed so far for sensor-network environments is given in [1]. Nonparametric estimation of the mean-signal level using quantized data is considered in [3]-[6], whereas change estimation and detection methods are developed in [7]. Distributed kernel-based learning and decentralized detection are studied in [1] and [8]-[10]. In [2], a Bayesian nonparametric method is developed for distributed self-localization of sensor networks. A distributed expectation-maximization (EM) algorithm is developed in [13] for parametric probability density function (pdf) estimation of the physical phenomenon measured by the network. In [2], [8]-[10], and [13], it is assumed that the nodes communicate high-resolution measurements and that quantization effects can be
neglected. In this paper, we focus on nonparametric estimation of the pdf describing spatial measurements collected by the nodes within the network. To save energy and radio-frequency (RF)
† This work was supported by the National Science Foundation under Grant CCF-0545571.
¹ This is true, for example, in large-scale sensor networks operating in time-varying environments [3]-[5].
1-4244-1037-1/07/$25.00 ©2007 IEEE.
communication bandwidth, the nodes coarsely quantize their measurements and convey the quantized information to a fusion center. Here, we assume that each node transmits a single bit per observation to the fusion center. We then model the pdf of the unquantized measurements (termed the measurement pdf) as a Gaussian mixture and estimate the mixture parameters from the quantized data. To the best of our knowledge, this is the first work on nonparametric density estimation from heavily quantized data. The estimation of the entire measurement pdf provides additional information compared with the nonparametric approaches in e.g. [3]-[10]. For example, a multimodal estimate of this pdf may indicate that the nodes within the region of interest sense several distinct phenomena. Furthermore, even if the nodes sense a single phenomenon, obtaining a nonparametric pdf estimate may be useful for discovering and combating compromised nodes. We introduce our nonparametric measurement model in Section II and derive the maximum likelihood (ML) estimator of the mixture probabilities in Section III. In Section IV, we develop calibration approaches for selecting the number and means of the mixture components (Section IV-A) and their standard deviation, i.e. the smoothing parameter (Section IV-B). Numerical examples are presented in Section V, and concluding remarks are given in Section VI.
II. Measurement Model

Assume that the region of interest contains N active nodes whose measurements y_n, n = 1, 2, ..., N are modeled as independent, identically distributed (i.i.d.) with unknown pdf f_Y(·). To conserve energy and RF bandwidth, each node n quantizes its local observation y_n before conveying it to the fusion center. Hence, the measurements y_n collected at the nodes are not available at the fusion center. As in [6], we assume that each node n compares its y_n with a single threshold T_n and transmits a binary value

b_n = { 1,  y_n > T_n
      { 0,  otherwise        (1)
error-free to the fusion center. The number of active nodes N corresponds to the number of bits received by the fusion center. In addition to facilitating energy and RF-bandwidth efficiency, this quantization scheme greatly simplifies analog-to-digital conversion, reducing it to signal-level comparison [6]. Our goal is to estimate the unknown measurement pdf fy(.) from
the binary values b_n, n = 1, 2, ..., N. As in [6], we assume that this pdf is non-zero on a finite support (Y_MIN, Y_MAX), which is always satisfied in practice for well-chosen values of Y_MIN and Y_MAX.² We assume that the nodes n ∈ {1, 2, ..., N} select random thresholds T_n i.i.d. from a probability mass function (pmf) π(T) with values ranging within the support (Y_MIN, Y_MAX).³ Here, we select π(T) as the discrete uniform pmf over the set {Y_MIN + Δ, Y_MIN + 2Δ, ..., Y_MAX − 2Δ, Y_MAX − Δ} containing

T = (Y_MAX − Y_MIN)/Δ − 1   (2)
elements. Clearly, if additional knowledge about f_Y(·) is available, it should be incorporated into the choice of π(T). A cheap node-manufacturing process may yield random thresholds as well; in this case, the random thresholds are a result of the manufacturing process (where we need to ensure only that the range of interest is properly covered) rather than being selected by the nodes. Motivated by the standard kernel density estimation (KDE) approach, we approximate the pdf f_Y(·) using a K-component distribution mixture:

f_{Y,K,h,v}(y; p) = Σ_{k=1}^{K} p_k (1/h) κ((y − v_k)/h)   (3)

where
• v_k, k = 1, 2, ..., K and h are the means and standard deviation of the mixture components whose shape is determined by the pdf κ(·); in nonparametric statistics, κ(·) and h are referred to as the kernel function and smoothing parameter (respectively) [14]-[16];
• p_k, k = 1, 2, ..., K are the unknown mixture probabilities forming

p = [p_1, p_2, ..., p_K]^T   (4)

and ^T denotes a transpose. In the KDE approach, which requires availability of the unquantized measurements y_n, n = 1, 2, ..., N, f_Y(·) is approximated using an N-component distribution mixture of the form (3) with means v_{KDE,k} = y_k and mixture probabilities p_{KDE,k} = 1/N, k = 1, 2, ..., N. (For completeness of presentation, the KDE approach is briefly described in the Appendix; it is a standard part of modern textbooks on nonparametric statistics [14]-[16].) Kernel-based pdf approximations provide a theoretical basis for many nonparametric regression schemes (e.g. the Nadaraya-Watson estimator [16, Ch. 5.1], see also [1, eq. (2)]); such regression schemes have been applied to sensor-network data modeling and node localization in e.g. [1] and [8]-[9]. In this paper, we choose the commonly used standard-normal kernel function [14, Ch. 3.3.2], [15, Ch. 4.2], [16, Ch. 3.1.2]:

κ(x) = N(x; 0, 1)   (5)

where N(x; μ, σ²) denotes the Gaussian pdf of a random variable x with mean μ and variance σ². The choice of κ(·) does not significantly affect the performance of our proposed method (provided that "reasonable" kernels are selected, such as those in [14, Table 3.1], [15, Fig. 4.10], or [16, Table 3.1]), but the choice of the smoothing parameter h is crucial, see Section V. The selection of the smoothing parameter implies a tradeoff between systematic and random errors (i.e. bias and variance) [14]; this important insight is true for all nonparametric density estimation methods. Using the density approximation (3), we derive the probability that node n transmits b_n = 1 to the fusion center:

P[b_n = 1] = P[y_n > T_n]   (6a)
           = ∫_{T_n}^{∞} f_{Y,K,h,v}(y; p) dy   (6b)
           = Σ_{k=1}^{K} p_k Φ((v_k − T_n)/h)   (6c)
           = a_n^T p   (6d)

where

a_n = [a_{n,1}, a_{n,2}, ..., a_{n,K}]^T,   a_{n,k} = Φ((v_k − T_n)/h)

and Φ(·) denotes the cumulative distribution function (cdf) of the standard normal random variable. Here, (6b) and (6c) follow by using (3) and (5), respectively. More complex quantization schemes can be easily incorporated into the above measurement model. We first develop an ML method for estimating the mixture probabilities p at the fusion center for specified K, v_k, k = 1, 2, ..., K, and h (Section III), and then discuss calibration with the goal of determining K, v_k, k = 1, 2, ..., K, and h (Section IV). The ML estimation of p developed in Section III is needed to implement the calibration algorithm in Section IV.

² For example, Y_MIN and Y_MAX may be determined using the sensor measurement-range specifications.
³ After choosing the thresholds, the nodes inform the fusion center about their choices, which is done in the network setup stage.
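As a rough numerical sketch of this measurement model, the following Python snippet builds the threshold grid underlying (2), forms the one-bit data of (1), and evaluates the probabilities a_n^T p of (6). The specific values of N, Δ, the component means v_k, the smoothing parameter h, and the measurement distribution are illustrative choices, not values from the paper:

```python
import math
import numpy as np

def phi_cdf(x):
    # Standard normal cdf Phi(x), used in (6c)-(6d).
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def threshold_grid(y_min, y_max, delta):
    # Threshold support {Y_MIN+Delta, ..., Y_MAX-Delta}; its size matches
    # T = (Y_MAX - Y_MIN)/Delta - 1 from (2).
    return [y_min + j * delta for j in range(1, round((y_max - y_min) / delta))]

def quantize(y, thresholds):
    # One-bit quantization (1): b_n = 1 if y_n > T_n, else 0.
    return [1 if yn > tn else 0 for yn, tn in zip(y, thresholds)]

def mixture_cdf_weights(v, h, thresholds):
    # a_{n,k} = Phi((v_k - T_n)/h), so that P[b_n = 1] = a_n^T p as in (6d).
    return np.array([[phi_cdf((vk - tn) / h) for vk in v] for tn in thresholds])

rng = np.random.default_rng(0)
grid = threshold_grid(-5.0, 5.0, 0.5)   # illustrative Delta, not the paper's
T = rng.choice(grid, size=200)          # i.i.d. thresholds for N = 200 nodes
y = rng.normal(0.0, 1.0, size=200)      # illustrative measurement pdf
b = quantize(y, T)
A = mixture_cdf_weights(v=[-2.0, 0.0, 2.0], h=0.5, thresholds=T)
p = np.full(3, 1.0 / 3.0)
prob_one = A @ p                        # P[b_n = 1] for each node n
```

The matrix A collects the vectors a_n row by row; it is the only quantity the fusion center needs (besides the bits b_n) to evaluate the likelihood of Section III.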
III. ML Estimation of the Mixture Probabilities

We derive a Fisher scoring iteration for computing the ML estimates of the mixture probabilities p. In this section, we treat the number of mixture components K, the mixture-component means v_k, k = 1, 2, ..., K, and the standard deviation h as known. Their estimation is performed in the calibration stage outlined in Section IV.⁴ Without loss of generality, we select the node indices n so that the thresholds T_n are sorted in a non-decreasing order:

T_1 ≤ T_2 ≤ ··· ≤ T_N.   (7)
Now, define V_MIN = min{v_1, v_2, ..., v_K} and V_MAX = max{v_1, v_2, ..., v_K}, and denote by n_MIN the index of the node having the smallest threshold that is greater than V_MIN − 3h and by n_MAX the index of the node having the largest threshold that is smaller than V_MAX + 3h. If the mixture-component means and standard deviation are chosen correctly, then the nodes with indices n ∈ {n_MIN, n_MIN + 1, ..., n_MAX} convey relevant information about the measurement pdf. We construct the log-likelihood function for the quantized data:

ℓ(θ) = Σ_{n=n_MIN}^{n_MAX} { b_n ln[a_n^T p(θ)] + (1 − b_n) ln[1 − a_n^T p(θ)] }   (8)

to be maximized with respect to the unknown parameters

θ = [θ_1, θ_2, ..., θ_K]^T   (9a)

which are related to the mixture-probability vector p as follows:

p(θ) = [p_1(θ_1), p_2(θ_2), ..., p_K(θ_K)]^T   (9b)

p_k(θ_k) = exp(θ_k) / Σ_{l=1}^{K} exp(θ_l),   k = 1, 2, ..., K.   (9c)

Here, θ_k, k = 1, 2, ..., K are allowed to range over the entire real line; furthermore, the constraint that the mixture probabilities sum to one is automatically satisfied, see (9c). Hence, using this parametrization allows us to formulate our ML estimation as an unconstrained optimization problem. Since the elements of p are probabilities, directly employing the "natural parametrization" p would require constrained optimization to maximize the likelihood. Differentiating (8) with respect to θ yields the score vector

s(θ) = ∂ℓ(θ)/∂θ = Σ_{n=n_MIN}^{n_MAX} { b_n/(a_n^T p(θ)) − (1 − b_n)/(1 − a_n^T p(θ)) } [P(θ) − p(θ) p(θ)^T] a_n   (10)

where we have used the identity

∂p(θ)/∂θ^T = P(θ) − p(θ) p(θ)^T   (11a)

with

P(θ) = diag{p_1(θ_1), p_2(θ_2), ..., p_K(θ_K)}.   (11b)

Differentiating (10) with respect to θ^T and taking the expectation with respect to the quantized data b_n, n = 1, 2, ..., N yields:

I(θ) = −E[∂²ℓ(θ)/(∂θ ∂θ^T)] = Σ_{n=n_MIN}^{n_MAX} { [P(θ) − p(θ) p(θ)^T] a_n a_n^T [P(θ) − p(θ) p(θ)^T] } / { a_n^T p(θ) [1 − a_n^T p(θ)] }.   (12)

We now utilize the score vector (10) and derivative matrix (12) to construct the (damped) Fisher scoring iteration for maximizing (8) with respect to θ:⁵

θ^(t+1) = θ^(t) + δ_t [I(θ^(t)) + ε I_K]^{−1} s(θ^(t))   (13)

where the damping factor δ_t is chosen to ensure that the log-likelihood (8) increases, ε > 0 is a small diagonal-loading factor, and I_K denotes the identity matrix of size K. Note that the parameter vector θ is not identifiable; for example, adding a constant to all elements of θ yields the same mixture probabilities p(θ). Consequently, I(θ) is singular and needs to be "regularized" by the loading term ε in the Fisher scoring iteration (13). We initialize (13) with θ^(0) equal to the K × 1 vector of zeros:

θ^(0) = 0_{K×1}.   (14)

Denote by θ^(∞) an estimate of θ obtained upon convergence of (13). Then, the ML estimate of the mixture probability vector is unique (even though θ^(∞) is not) and given by

p̂ = [p̂_1, p̂_2, ..., p̂_K]^T = p(θ^(∞)).   (15)

In the following section, we propose a calibration method for selecting the number of mixture components, their means, and standard deviation.

⁴ Typically, these quantities do not vary rapidly and hence do not need to be frequently updated as we monitor the phenomenon of interest over time.
⁵ Fisher scoring algorithms for ML estimation are discussed in, e.g., [17, Ch. 9.2].

IV. Calibration

We develop a scheme for estimating the number of mixture components and mixture-component means (Section IV-A) as well as the smoothing parameter h (Section IV-B).

A. Selection of the Number and Means of the Mixture Components

Throughout this section, we set the smoothing parameter h to a small value h = h_0. Let us start with the following choice of the mixture-component means:

{v_{0,k}, k = 1, 2, ..., K_0} = { (T_n + T_{n+1})/2 : b_n = 1, b_{n+1} = 0, n ∈ {0, 1, 2, ..., N} }   (16)

where we have defined b_0 = 1, b_{N+1} = 0, T_0 = Y_MIN, and T_{N+1} = Y_MAX. Recall that the thresholds T_n are sorted in a non-decreasing order [see (7)] and that b_n = 1 and b_{n+1} = 0 imply y_n > T_n and y_{n+1} ≤ T_{n+1}. Here K_0, our initial choice for the number of mixture components, is simply the number of (b_n, b_{n+1}) = (1, 0) patterns (jumps) in the sequence b_0, b_1, ..., b_N, b_{N+1}, indicating evidence of the presence of probability mass of f_Y(·) in the neighborhood of T_n and T_{n+1}. For this choice, we obtain the ML estimate p̂_0 = p_0(θ̂_0^(∞)) = [p̂_{0,1}, p̂_{0,2}, ..., p̂_{0,K_0}]^T of the mixture-probability vector using the Fisher scoring algorithm in (13) or a gradient-type search [in the case where K_0 is large and inverting I(θ^(t)) + ε I_{K_0} becomes computationally challenging]. Observe that small h = h_0 (i.e. h_0 shrinking towards zero), (16), and p̂_0 are approximately the ML estimates of the smoothing parameter and mixture-component means and probabilities: they jointly maximize the log-likelihood function in (8). Unfortunately, substituting these ML estimates into (3) leads to a poor estimate of f_Y(·), consisting of Dirac impulses at the mixture-component means. However, their use for calibration is justified by the fact that Σ_{k=1}^{K_0} p̂_{0,k} Φ((y − v_{0,k})/h_0) is a good estimate of the measurement cdf. This estimate
is piecewise constant since the terms Φ((y − v_{0,k})/h_0) approach u(y − v_{0,k}) as h_0 shrinks towards zero, where u(·) denotes the unit-step function. This is analogous to shrinking the KDE smoothing parameter towards zero in the classical unquantized-measurement scenario, which maximizes the nonparametric likelihood and yields the piecewise-constant empirical distribution function [15, Def. 2.1]. Now, we reduce the number of mixture components by keeping only those indices k that satisfy

p̂_{0,k} > 1/(10 K_0)   (17)

(corresponding to non-negligible ML mixture probability estimates) and determine K, V_MIN, V_MAX, n_MIN, and n_MAX accordingly. Clearly, the estimate of the measurement cdf obtained after the above "pruning" [with h = h_0 and K mixture components selected using (17), followed by the iteration in (13)] will be close to the cdf estimate based on all K_0 mixture components. Applying the pruning rule (17) leads to a compressed representation of the measurement pdf approximation that can be efficiently communicated to destinations other than the fusion center. Employing this compressed representation may reduce the pdf estimation error (compared with the pdf estimate that utilizes all K_0 mixture components). It also leads to significant computational savings in the smoothing-parameter selection method, described in the following section.

Fig. 1. Bimodal pdf describing the phenomenon observed by the network, for q = 0.25, μ_g = −μ_u = 2, and σ_g² = (2/3)².

B. Cross-Validation (CV) Based Selection of the Smoothing Parameter

We now assume that K and v_k, k = 1, 2, ..., K are known (obtained using the calibration scheme in Section IV-A) and focus on selecting the smoothing parameter h. Hence, in this section, we emphasize the dependence of a_n = a_n(h) and n_MIN = n_MIN(h), n_MAX = n_MAX(h) on h. Let us fix h and denote by p̂_{−k}(h) the ML estimate of the (K − 1) × 1 mixture-probability vector [p_1, p_2, ..., p_{k−1}, p_{k+1}, ..., p_K]^T obtained using the mixture components with indices 1, 2, ..., k − 1, k + 1, ..., K (i.e. excluding the kth mixture component):

p̂_{−k}(h) = p_{−k}(θ̂_{−k}(h))   (18a)

θ̂_{−k}(h) = argmax_{θ_{−k}} Σ_{n=n_MIN(h)}^{n_MAX(h)} { b_n ln[a_{n,−k}(h)^T p_{−k}(θ_{−k})] + (1 − b_n) ln[1 − a_{n,−k}(h)^T p_{−k}(θ_{−k})] }   (18b)

with θ̂_{−k}(h) obtained using Fisher scoring iterations, as in (13). To estimate the smoothing parameter h, we minimize the following cross-validation cost function:

CV(h) = Σ_{k=1}^{K} D_{−k}(h)   (19)

with respect to h, where D_{−k}(h), shown in (20) at the bottom of the page, approximates the Kullback-Leibler distance (up to an additive constant) between the pmf of b_1, b_2, ..., b_N and the estimate of this pmf computed using the mixture components with indices 1, 2, ..., k − 1, k + 1, ..., K.
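The unconstrained parametrization (9c) and the damped Fisher scoring iteration (13) can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the step-halving damping rule, the iteration count, and the clipping constants are our own choices, and the inputs A (rows a_n^T) and b are assumed to be restricted to the relevant node indices already:

```python
import numpy as np

def softmax(theta):
    # (9c): p_k(theta_k) = exp(theta_k) / sum_l exp(theta_l).
    e = np.exp(theta - np.max(theta))
    return e / e.sum()

def log_lik(theta, A, b):
    # (8): sum_n b_n ln(a_n^T p) + (1 - b_n) ln(1 - a_n^T p).
    q = np.clip(A @ softmax(theta), 1e-12, 1.0 - 1e-12)
    return float(np.sum(b * np.log(q) + (1 - b) * np.log(1 - q)))

def fisher_scoring(A, b, eps=1e-6, n_iter=100):
    K = A.shape[1]
    theta = np.zeros(K)                      # initialization (14)
    for _ in range(n_iter):
        p = softmax(theta)
        J = np.diag(p) - np.outer(p, p)      # dp/dtheta^T, see (11a)-(11b)
        q = np.clip(A @ p, 1e-12, 1.0 - 1e-12)
        s = J @ (A.T @ (b / q - (1 - b) / (1 - q)))   # score vector (10)
        w = 1.0 / (q * (1 - q))
        info = J @ (A.T * w) @ A @ J         # Fisher information (12)
        step = np.linalg.solve(info + eps * np.eye(K), s)   # loaded step (13)
        delta = 1.0                          # damping factor delta_t
        while log_lik(theta + delta * step, A, b) < log_lik(theta, A, b) and delta > 1e-8:
            delta *= 0.5                     # halve until (8) does not decrease
        theta = theta + delta * step
    return softmax(theta)                    # unique p-hat, cf. (15)
```

Note that the diagonal loading eps keeps the singular matrix I(θ) invertible, exactly as discussed below (13); the returned probabilities are nonnegative and sum to one by construction of (9c).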
V. Numerical Examples

To assess the performance of the proposed methods, we consider a region of interest containing N active nodes with random thresholds T_n generated from the discrete uniform pmf π(T) described by Y_MIN = −5, Y_MAX = 5, and T = 210, see also (2). The measurements y_n at the nodes were generated i.i.d. from the following bimodal pdf (depicted in Fig. 1 for q = 0.25, μ_g = −μ_u = 2, and σ_g² = (2/3)²):

f_Y(y) = q · uniform(y; μ_u − 1, μ_u + 1) + (1 − q) · N(y; μ_g, σ_g²)   (21)

and quantized using the random thresholds (as described above), yielding b_n, n = 1, 2, ..., N. [Here, uniform(y; a, b) denotes the pdf of a uniform random variable y over the support (a, b).] We then applied the calibration scheme in Section IV with h_0 = (Y_MAX − Y_MIN)/(10N) and determined K using (17). Finally, for a given h, we computed the ML estimates p̂ of the mixture probabilities using the Fisher scoring algorithm in Section III. All Fisher scoring iterations were initialized with θ^(0) set to zero, see (14). Our performance metric is the mean integrated square error (MISE), defined as (see e.g. [14, Ch. 3.12] and [16, Ch. 2]):

MISE = E[ ∫_{Y_MIN}^{Y_MAX} [f̂_Y(y) − f_Y(y)]² dy ]   (22)

(Here, the expectation is taken with respect to the quantized data b_n, n = 1, ..., N and f̂_Y(y) denotes the estimate of f_Y(y) whose performance we evaluate.) In all nonparametric estimation approaches, both the bias and variance components
D_{−k}(h) = −[1/(n_MAX(h) − n_MIN(h) + 1)] Σ_{n=n_MIN(h)}^{n_MAX(h)} { b_n ln[a_{n,−k}(h)^T p̂_{−k}(h)] + (1 − b_n) ln[1 − a_{n,−k}(h)^T p̂_{−k}(h)] }   (20a)

a_{n,−k}(h) = [a_{n,1}(h), a_{n,2}(h), ..., a_{n,k−1}(h), a_{n,k+1}(h), ..., a_{n,K}(h)]^T   (20b)
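The leave-one-component-out cost (19)-(20) can be sketched as follows. The estimates p̂_{−k}(h) are supplied by the caller (in practice, from the Fisher scoring of Section III); treating D_{−k} as a negative average log-likelihood is our reading of the sign convention in (20a):

```python
import numpy as np

def d_minus_k(A, b, p_mk, k):
    # (20a): negative average log-likelihood with component k removed.
    # Deleting column k of A (whose rows are a_n^T) gives a_{n,-k} of (20b).
    A_mk = np.delete(A, k, axis=1)
    q = np.clip(A_mk @ p_mk, 1e-12, 1.0 - 1e-12)
    return float(-np.mean(b * np.log(q) + (1 - b) * np.log(1 - q)))

def cv_cost(A, b, p_mk_list):
    # (19): CV(h) = sum_k D_{-k}(h); p_mk_list[k] is the ML estimate p-hat_{-k}(h).
    return sum(d_minus_k(A, b, p_mk_list[k], k) for k in range(A.shape[1]))
```

Evaluating cv_cost on a grid of candidate h values (each h changing A and the fitted p̂_{−k}) and picking the minimizer implements the smoothing-parameter selection of Section IV-B.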
Fig. 2. MISE of our nonparametric measurement pdf estimator for fixed h as a function of h, of our nonparametric pdf estimator with h estimated using CV, and of the KDE method for unquantized measurements, assuming q = 0.25, μ_g = −μ_u = 2, σ_g² = (2/3)², and (top) N = 500 and (bottom) N = 100 active nodes.
Fig. 3. MISE of our nonparametric measurement pdf estimator for fixed h as a function of h, of our nonparametric pdf estimator with h estimated using CV, and of the KDE method for unquantized measurements, assuming q = 0.25, μ_g = −μ_u = 1, σ_g² = (2/3)², and (top) N = 500 and (bottom) N = 100 active nodes.
contribute to this MISE (i.e. the bias cannot be neglected, unlike in typical parametric scenarios). We compare the proposed approach with the KDE method, which utilizes the unquantized measurements y_n from the sensors to estimate their pdf. The MISE performance of nonparametric methods depends critically on the level of smoothing which, in our case, is controlled by the smoothing parameter h. Hence, in Figs. 2-5, we present the MISE of our nonparametric measurement pdf estimator as a function of h, where we have varied the number of active nodes N as well as q, μ_u, μ_g, and σ_g². We estimated the MISE using 100 independent realizations of the quantized data, where calibration was performed in each of the 100 trials. The Fisher scoring algorithms converged in 15 iterations (on average). Figs. 2-5 also show the MISE of our nonparametric pdf estimator with h selected using the CV approach in Section IV-B and the MISE of the KDE method. Even though our approach requires only a single-bit transmission per node, its MISE performance is comparable to that of the KDE method, which utilizes unquantized measurements.
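The integrated square error inside the expectation (22) can be approximated on a finite grid; averaging it over independent Monte Carlo realizations of the quantized data then estimates the MISE. A minimal sketch (trapezoidal rule; the grid size is an arbitrary choice):

```python
import numpy as np

def ise(f_hat, f_true, y_min, y_max, num=2001):
    # Integrated square error: integral over (y_min, y_max) of
    # [f_hat(y) - f_true(y)]^2 dy, approximated by the trapezoidal rule.
    y = np.linspace(y_min, y_max, num)
    d2 = (f_hat(y) - f_true(y)) ** 2
    dy = y[1] - y[0]
    return float(np.sum((d2[:-1] + d2[1:]) * 0.5 * dy))
```

Calling ise once per realization and averaging the results over the trials yields the Monte Carlo MISE estimate used in Figs. 2-5.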
VI. Concluding Remarks

We derived a nonparametric method for estimating the probability density function of the spatial measurements observed by a sensor network from quantized data collected at a fusion center. Further research will focus on: finding a better (simpler and more accurate) calibration scheme for selecting the smoothing parameter h; analyzing the impact of communication errors between the nodes and the fusion center; utilizing more complex quantization schemes, such as those developed in the context of distributed source coding with side information [18]; and developing nonparametric hypothesis tests that provide reliable inference about the observed phenomenon, comparable to that achieved using exact parametric models.

Appendix. Kernel Density Estimation

If the "raw" unquantized measurements y_1, y_2, ..., y_N are available at the fusion center, we estimate the pdf f_Y(·) using the classical kernel density estimator:

f̂_{KDE,h_y}(y) = (1/(N h_y)) Σ_{n=1}^{N} κ((y − y_n)/h_y)   (A.1)

with the standard normal kernel [see (5)] and the KDE smoothing parameter h_y selected using the normal reference rule [14, Ch. 3.4], [15, p. 135]:

h_y = 1.06 · min{ s, (y_(⌊0.75N⌋+1) − y_(⌊0.25N⌋+1))/1.34 } · N^(−1/5)   (A.2)
Fig. 4. MISE of our nonparametric measurement pdf estimator for fixed h as a function of h, of our nonparametric pdf estimator with h estimated using CV, and of the KDE method for unquantized measurements, assuming q = 0.50, μ_g = −μ_u = 2, σ_g² = (2/3)², and (top) N = 500 and (bottom) N = 100 active nodes.
where y_(1) ≤ y_(2) ≤ ··· ≤ y_(N) denote the measurements y_1, y_2, ..., y_N arranged in ascending order and s² is their sample variance.
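The kernel density estimator (A.1) with the bandwidth rule (A.2) can be sketched as follows; a minimal implementation, where the order-statistic indexing follows the formula above up to integer rounding:

```python
import math
import numpy as np

def normal_reference_bandwidth(y):
    # (A.2): h_y = 1.06 * min{ s, (y_(0.75N) - y_(0.25N))/1.34 } * N^(-1/5),
    # with s the sample standard deviation and y_(i) the ordered sample.
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    s = y.std(ddof=1)
    iqr = y[int(0.75 * n)] - y[int(0.25 * n)]
    return 1.06 * min(s, iqr / 1.34) * n ** (-0.2)

def kde(y_eval, data, h):
    # (A.1): f_hat(y) = (1/(N h)) sum_n kappa((y - y_n)/h) with the
    # standard normal kernel (5).
    z = (np.asarray(y_eval)[:, None] - np.asarray(data)[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(data) * h * math.sqrt(2 * math.pi))
```

This is the unquantized-measurement baseline against which the MISE curves in Figs. 2-5 compare the proposed one-bit method.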