Fixed Rank Kriging for Cellular Coverage Analysis - Semantic Scholar

Report 3 Downloads 12 Views
1

Fixed Rank Kriging for Cellular Coverage Analysis arXiv:1505.07062v1 [cs.OH] 26 May 2015

Hajer Braham, Sana Ben Jemaa, Gersende Fort, Eric Moulines and Berna Sayrac

Abstract Coverage planning and optimization is one of the most crucial tasks for a radio network operator. Efficient coverage optimization requires accurate coverage estimation which relies on geo-located field measurements. These measurements are gathered today during highly expensive drive tests and will be reported in the near future by users equipments thanks to the 3GPP MDT feature (still costly in terms of battery consumption and signaling overhead). In both cases, predicting the coverage on a location where no measurements are available remains a key and challenging task. This paper describes a powerful tool that gives an accurate coverage prediction on the whole area of interest, i.e. a coverage map, by spatially interpolating geo-located measurements using Kriging technique. The paper focuses on the reduction of the computational complexity of the kriging algorithm by applying Fixed Rank Kriging (FRK). The performance evaluation of the FRK algorithm both on simulated measurements and real field measurements shows a good trade-off between prediction efficiency and computational complexity. In order to go a step further towards operational application of the proposed algorithm, a scenario with multiple cells is studied. Simulation results show a good performance in terms of coverage prediction and detection of best serving cell. Keywords Wireless Network, Coverage Map, Radio Environment Map, Spatial Statistics, Fixed Rank Kriging.

I.

I NTRODUCTION

Coverage planning and optimization is one of the most crucial tasks for a radio network operator. Efficient coverage optimization requires accurate coverage estimation which relies on H. Braham is with Orange Labs research center, Issy-Les-Moulineaux, France and T´el´ecom ParisTech, Paris, France. S. Ben Jemaa and B. Sayrac are with Orange Labs research center, Issy-Les-Moulineaux, France. G. Fort and E. Moulines are with LTCI T´el´ecom ParisTech & CNRS, Paris, France.

2

geo-located field measurements. These measurements are gathered today during highly expensive drive tests (DT) and will be reported in the near future by users equipments thanks to the 3GPP Minimization of Drive Tests (MDT) feature standardized since Release 9 [1]. The reporting of radio measurements together with the best possible geo-location is then automatically reported to the network by the user equipment. Thanks to recent advances in UE geo-localization, in particular, the integration of Global Positioning System (GPS) in the new generation mobile equipments, UE geo-location is rather accurate. Hence, with MDT, network operator has at his disposal a rich source of information that provides a greater insight into the end-user perceived quality of service and a better knowledge of the radio environment. The collection and exploitation of location aware radio information was first introduced as a part of the cognitive radio paradigm, under the environment awareness component of the Cognitive Radio (CR) cycle [2]. Radio Environmental Maps concept was first introduced by Zhao [3] as a database that stores geo-located radio environmental information mainly for opportunistic spectrum access. The REM concept was then extended to an entity that not only stores geo-located radio information but also post processes this information in order to build a complete map. The missing information, namely the considered radio metric in locations where no measurements are available, is then predicted by interpolating the geo-located measurements [4]–[6]. The REM was then studied in the framework of ETSI standardization group ETSI RRS as a tool for the exploitation of geo-located radio measurements for the radio resource management of mobile wireless networks. A technical report dedicated to the definition of use cases for building and exploitation of REM gives the following definition [7]: ”The Radio Environment Map (REM) defines a set of network entities and associated protocols that trigger, perform, store and process geo-located radio measurements (received signal strength, interference levels, QoS measurements [...] ) and network performance indicators. Such measurements are typically performed by user equipments, network entities or dedicated sensors.” In this report, several use cases for REM exploitation in radio resource management are described, coverage and capacity optimization and interference management especially for the introduction of a new technology. Inspired by the geo-statistics area, Kriging technique was applied to REM construction, mainly for coverage prediction and analysis in radio mobile networks [8]–[11]. The computational complexity of the considered algorithm is an important bottleneck of this technique especially

3

as the number of measurements by mobile terminals with the MDT feature will be more and more important. Reducing the complexity of the REM construction algorithm is the main focus of our work. In order to reduce the computational complexity of the kriging algorithm we apply Fixed Rank Kriging (FRK) introduced by Cressie in [12]. FRK offers a trade-off between prediction efficiency and computational complexity. We first introduced the application of FRK to radio coverage prediction in [13] and evaluated its performance. In [14] we applied the FRK algorithm to real drive test measurements in a rural environment. In both papers we evaluated the performance of the algorithm on a single cell with omni-directional antenna. The contribution of this paper can be summarized in the following: •

We introduce the FRK algorithm and explain how we adapt this algorithm to radio coverage data. In particular, we propose an enhancement of the calibration of the FRK algorithm by using the Maximum Likelihood Estimator (MLE) rather than the method of moments proposed in [12].



We evaluate the performances of the proposed algorithm both on simulated and real data measurements. In particular, we go a step further towards operational application of the REM prediction algorithm by considering a multiple cell scenario. In this scenario, we introduce in the model the directivity of the antennas and evaluate both the coverage prediction and the good detection of the best server.

The paper is organized as follows: in Section II we first review propagation models existing in the literature and then introduce a new statistical model based on FRK. Section III focuses on calibration methods for this model: the original method is given and its applicability is discussed; as an alternative, a calibration by Maximum Likelihood is proposed and is solved by using the Expectation-Maximization algorithm. Section IV and V respectively present the numerical analysis in the single cell and multi-cell cases. Section VI summarizes the main conclusions. II.

R ADIO E NVIRONMENT M AP PREDICTION MODELS

A. Review of radio propagation models A radio propagation model describes a relation between the signal strength, the location of the transmitter and the location of the receiver. There are in the literature two different approaches for this description which are respectively derived using analytical and empirical methods [15]. The analytical approach is based on fundamental principals of the radio propagation concept. The

4

empirical one introduces a statistical model and uses a set of measurements to fit this model. The advantage of this second approach is to calibrate the model through actual field measurements. Denote by Z(s) the received power at the receiver end located at s ∈ R2 , expressed in

dBm. The path-loss model, see e.g. [15] (also called in the literature the log-distance model) is

among the analytical approaches. It describes Z(s) as a logarithmically decreasing function of the distance ds between the transmitter antenna location and the receiver location s: Z(s) = p0 − 10κ log10 (ds ),

s ∈ R2 ;

(1)

p0 is the transmitted power in dBm and κ is the path loss exponent. When using this formula to predict the REM, p0 is considered as known since it is one of the antenna characteristic, and κ depends on the propagation environment. For example, κ is in the order of 2 in free space propagation and it is larger when considering an environment with obstacles (see e.g. [15], [16]). The model in Eq. (1) does not take into account the fact that two mobile stations equally distant from the base station, may have different environment characteristics. To tackle this bottleneck, empirical approaches for REM based on a statistical modelisation of a shadowing effect have been introduced. The log-normal shadowing model consists in setting (see [17]) Z(s) = p0 − 10κ log10 (ds ) + ν(s),

s ∈ R2 ,

(2)

where (ν(s))s , introduced to capture de shadowing effect, is a zero-mean Gaussian distributed random variable with variance σ 2 (note that the terminology “log-normal” comes from the fact that the shadowing term expressed in dBm is normally distributed). With this model, the REM ˆ prediction at location s is Z(s) = p0 − 10κ log10 (ds ); p0 , κ and σ are estimated from measured data. The usual technique is the maximum likelihood estimator (note that for the estimation of p0 , κ, in this Gaussian case, the maximum likelihood estimator is the least-square estimator). Both the models (1) and (2) are large-scale propagation models: they do not consider the small fluctuations of the received power due to the local environment. The correlated shadowing model captures these small-scale variations: Z(s) = p0 − 10κ log10 (ds ) + ν(s),

s ∈ R2 ,

(3)

where (ν(s))s is a zero mean Gaussian process with a parametric spatial covariance function

5

(C(s, s0 ))s,s0 . This model implies that two signals Z(s), Z(s0 ) at different locations s, s0 are correlated, with covariance equal to C(s, s0 ). The REM prediction formula based on the model ˆ (3) is known in the literature as the Kriging (see e.g. [18]): the prediction Z(s) is the conditional expectation of Z(s) given the measurements. In this Gaussian shadowing model, this quantity is a linear combination of the observations involving a computational cost O(N 3 ) where N is the number of measurements (see [18, Eq. (3.2.12)] for the prediction formula). Here again, the prediction necessitates the calibration of the parameters. This model is applied to REM interpolation in [10], [19], [20] and this technique has proved to perform accurate prediction performances; different calibration techniques were proposed: by least square, by maximum likelihood [18], [20] or through a Bayesian approach [10], [18]. The models above assume that the received power depends only on the distance between the transmitting antenna and the receiver. Nevertheless, the real case can be more complex: usually operators deploy directional antennas which radiate greater power in some directions. This kind of antenna increases performance on transmitter and receiver especially to cover long ranges. As consequence, the received power depends also on the direction of reception (i.e. the angle between the antenna and the receiver). To fit the model to this new constraint, several papers ¯ proposed to modify the model (2) by adding a term G(s) depending on the mobile location s and modeling the antenna gain (see e.g. [21], [22]): ¯ Z(s) = p0 − 10κ log10 (ds ) + G(s) + ν(s),

s ∈ R2 .

(4)

¯ are proposed, depending on the antenna used for the transmission (for Different gain functions G ¯ depends example, a polar antenna, a sectorial antenna, . . .); see e.g. [22]–[24]. The function G on parameters which are usually considered known. In this paper, we will extend this general model (4) by considering a more general spatial noise ν(s), more precisely the same noise as ¯ to depend on unknown parameters to be in model (3). In addition, we will allow the function G calibrated from the observations. B. A new REM model based on Fixed Rank Kriging We introduce a new model for REM, adapted from the Fixed Rank Kriging model (FRK) introduced by [12] in the geostatistics literature. As discussed in Section II-B and illustrated in Sections IV and V, this model is a good trade-off between computational complexity and

6

prediction accuracy. Z(s) is assumed of the form ¯ Z(s) = p0 − 10κ log10 ds + G(s) + S(s)T η,

s ∈ R2 ,

(5)

where S : R2 → Rr collects r deterministic spatial basis functions and η is a Rr -valued

zero mean Gaussian vector with covariance matrix K. (.)T denotes the transposition and by

convention, the vectors are column-vectors. p0 − 10κ log10 ds − G(s) describes the large scale spatial variation (the trend) and the spatial process (S(s)T η)s is a smooth small-scale spatial

variation. In practice, the number of basis functions r and the basis functions S are chosen by the user (see [12, Section 4] and Section IV-B1 below for a discussion on how to choose these quantities). This REM model is an empirical model, that is the REM prediction is based on measurements: we have N observations Y = (Y (s1 ), . . . , Y (sN ))T of the field process (Z(s))s at locations s1 , · · · , sN , modeled as follows Y (s) = Z (s) + σ ε (s) ,

s ∈ R2 .

(6)

(ε(s))s is assumed to be a zero mean standard Gaussian process, it incorporates the uncertainties of the measurement technique. We assume (η(s))s and (ε(s))s are two independent processes so that the covariance matrix of Y is given by Σ = σ 2 I N + S ? KS T? ,

(7)

where S ? is the N × r matrix S ? = (S(s1 ), . . . , S(sN ))T , and I N denotes the N × N identity matrix.

¯ We consider a specific parametric form for the function G(s): it is assumed to be known ¯ up to a scaling factor ς: we set G(s) = ςG(s). In the case of omni-directional antenna, G is the null function; we give an example of function G for directional antenna in Section V. This model implies that the conditional distribution of (Z(s))s given the observation vector Y is a Gaussian process. Its expectation and covariance functions are respectively given by (see Lemma 2 Appendix A) s 7→ [1, −10 log10 ds , G(s)] α + S(s)T KS T? Σ −1 (Y − T α),

(8)

7

where

(s, s0 ) 7→ S T (s)KS(s0 ) − S(s)T KS T? Σ −1 S ? KS(s0 ),     1 −10 log10 ds1 G(s1 )   p0   ..   .. ..  T = . α = κ. . .      ς 1 −10 log10 dsN G(sN )

(9)

ˆ We use the mean value (8) as the estimator Z(s) for the unknown quantity Z(s). Note that the estimation of (Z(s1 ), . . . , Z(sN ))T is not Y since at locations where we have measurements, the prediction technique (8) acts as a denoising algorithm. This prediction formula involves the inversion of the matrix Σ. By using standard matrix formula (see e.g. [25, Section 1.5 , Eq. (18)]) we have  −1 T Σ −1 = σ −2 I N − σ −2 S ? σ 2 K −1 + S T? S ? S? .

(10)

The key property of this fixed rank kriging model is that it only requires the inversion of r × r

matrices. Therefore, the computational cost for the REM prediction is O(r2 N ) which is a drastic reduction when compared to classical kriging in situations when N is large as it is the case in our framework. The prediction formula also requires the knowledge of (α, σ 2 , K). The goal of the following section is to address the calibration of these parameters. III.

C ALIBRATION OF THE F IXED R ANK K RIGING MODEL

This section is devoted to estimate the parameters (α, σ 2 , K) in the REM model based on FRK. We first expose the method described in the original paper devoted to the FRK model [12]. We also provide a rigorous proof of some weaknesses of this estimation technique pointed out in [26] through numerical experiments. We then propose a second method which is more robust. A. Calibration by a method of moments When proposing the FRK method, Cressie and Johannesson also propose a calibration of the parameters (see [12, eq. (2.8) and Section 3.3.]). α is estimated by the weighted least c of (σ 2 , K) which yields an estimation Σ b of Σ squares estimator: given an estimation (ˆ σ 2 , K) b −1 T )−1 T T Σ b −1 Y . ˆ WLS = (T T Σ (see Eq. (7)), we have α σ 2 and K are estimated by a method of moments. To that goal, the N observations are

replaced with M “pseudo-observations” located at some locations s01 , · · · , s0M and obtained by

8

aggregating the observations in a neighborhood of each location s0l . M , chosen by the user, b M is then associated to these satisfies r < M Q(θ (l) ; θ (l) ) .

(15)

The E- and M-steps are repeated until convergence, which in practice may mean when the difference between kθ (l) − θ (l+1) k changes by an arbitrarily small amount determined by the user [28, Chapter 3]. The derivation of EM in our framework yields ˜ is given by Lemma 1: The EM function θ 7→ Q(θ; θ)

1 1 N ln(σ 2 ) − ln(det(K(υ))) − 2 kY − T αk2 (16) 2 2 2σ  T  h i h i S? S? 1 1 −1 T T ˜ ˜ + K (υ) E ηη |Y ; θ + (Y − T α) S E η|Y ; θ , − Tr ? 2 σ2 σ2 h i h i ˜ and E ηη T |Y ; θ ˜ are detailed in Appendix C. Proposition 2 The calculation of E η|Y ; θ ˜ =− Q(θ; θ)

details the update formula of the parameters (α, σ 2 ). Its proof is in Appendix D. Proposition 2: Given θ (l) , the parameters α and σ 2 are updated as follows:

−1 T   α(l+1) = T T T T Y − S ? E η|Y ; θ (l) ,

    1 2

Y − T α(l+1) 2 + 1 Tr(S T? S ? E ηη T |Y ; θ (l) ) − 2 (Y − T α(l+1) )T S ? E η|Y ; θ (l) . σ(l+1) = N N N

2 We have for any υ, Q(α(l+1) , σ(l+1) , υ; θ (l) ) ≥ Q(θ (l) ; θ (l) ).

The update of υ is specific to each parametric model for K. Upon noting that the first order derivative of υ = (υ1 , · · · , υp ) 7→ Q(α, σ 2 , υ; θ (l) ) w.r.t. υk is given by      T  −1 1 ∂K(υ) ∂K(υ) 1 −1 −1 − Tr K (υ) + Tr K (υ)E ηη |Y ; θ (l) K (υ) 2 ∂υk 2 ∂υk

(17)

11

(see Appendix D), υ(l+1) can be defined as the unique root of this gradient if it is a global maximum. Another strategy is to perform one iteration of a Newton-Raphson algorithm starting from υ(l) with a step size chosen in order to satisfy the EM condition (15). See e.g. [28, Section 4.14] for EM combined with Newton-Raphson procedures. This strategy is used to update one of the parameters in the following example of matrix K. Several examples of structured covariance matrix K can be chosen. Here in the radio cellular context, the shadowing (random) term can be modeled as a zero-mean Gaussian random variable with an exponential correlation model [30]. Thus, we choose the matrix K to be an exponential covariance matrix, defined as ˜ K(β, φ) = β −1 K(φ) ,

0

!

si − s0j ˜ i,j (φ) = exp − , with K exp(φ)

(18)

where s0i − s0j is the Euclidean distance between the two locations s0i and s0j (related to the basis functions, see Section IV-B1). 1/β and exp(φ) are respectively the variance of ηl , 1 ≤ l ≤ r; and

a rate of decay of the correlation (the choice of the parametrization exp(φ) avoids the introduction of a constraint of sign when calibrating φ). We therefore have υ = (β, φ) ∈ R+ ? × R.

For this specific parametric matrix (18), the parameter (β, φ) are updated as follows 2 , the parameters β, φ are updated as follows Proposition 3: Given θ (l) , α(l+1) , σ(l+1)

β(l+1) φ(l+1) = φ(l) +



 T  −1 ˜ = r/Tr K (l) E ηη |Y ; θ (l)

 −1    a(l)  ˜ −1 E ηη T |Y ; θ (l) − I r K ˜ ˜ Tr β(l+1) K (l) (l) ∆ ◦ K (l) H(l)

˜ (l) is a shorthand notation for K(φ ˜ (l) ), ∆ is the r × r matrix with entries (ks0 − s0 k)ij , where K i j

◦ denotes the Hadamard product and

 −1     ˜ ˜ β(l+1) K ˜ l E ηη T |Y ; θ (l) − I r H(l) = −Tr K ∆ ◦ K l  −1     ˜ ˜ l βK ˜ l E ηη T |Y ; θ (l) − I r + exp(−φ(l) )Tr K ∆ ◦ ∆ ◦ K l   2   T  −1 ˜ ˜l ˜ l E ηη |Y ; θ (l) + exp(−φ(l) )Tr K ∆◦K I r − 2β K . l

a(l) ∈ (0, 1) is chosen so that Q(θ (l+1) ; θ (l) ) ≥ Q(θ (l) ; θ (l) ). The proof is given in Appendix E.

12

IV.

A PPLICATIONS TO CELLULAR COVERAGE MAPPING

A. The data sets The performance of our prediction approach is illustrated on two data sets, corresponding to different types of application scenarios. The first dataset consists of simulated measurements generated with a very accurate planning tool, which uses a sophisticated ray-tracing propagation model developed for operational network planning [31]. The data are considered as realistic radio measurements of the ground-truth on the coverage situation over the area of interest. The collected data corresponds to the received pilot power in the Long Term Evolution (LTE) system in an urban scenario located in the Southwest of Paris (France); the collected data are the Reference Signal Received Power (RSRP) values. The environment is covered by a macro-cell with an omni-directional antenna. These measurements are located on a 1000 m×1000m surface, regularly spaced on a cartesian grid consisting of 5m ×5m squares; this yields a total of 40401 measurements (see Figure 3a, where the antenna

location is (595 416m, 2 425 341m)). In order to model the noise measurements, a zero mean Gaussian noise with variance equal to 3dB is added to these data. This yields what we called in Section II-B the process {Y (s), s ∈ D}, where D ⊂ R2 .

The second dataset corresponds to real measurements reported from Drive Tests (DT) done by

Orange France teams, in a rural area located in southwestern France. The Base Station (BS) is about 30 m height and covers an area of 22km×10km. 7800 measurements have been realized in the 800 MHz frequency band using a typical mobile phone equipment connected to a software tool for data acquisition. The DT data are collected using a vehicle driving on roads in the studied area; the locations of these measurements are shown on Figure 4a - note that they are along the roads and the antenna is located at (408 238m, 1 864 600m). We will refer to this data set as the rural data set. For the validation of our approach we consider a split of each data set into a learning set and a test set. The parameters are learned using the data in the learning set, by using the calibration technique described in Section III-B. The performances are then evaluated using the data in the test set. In order to make this analysis more robust to the choice of the learning and test sets, we perform a k-fold cross validation [32]. Here, we choose k = 5 for both the urban and rural data sets, with a uniform data sampling of the subsets (typical values for k are in the range 3

13

to 10 [27, see Section 5.3.]). Therefore, at each step of this cross-validation procedure, we have a learning set consisting of 80% of the available measurements both for the urban and the rural data sets (making learning sets of 32000 and 6000 locations for the urban and rural data sets respectively). Some of the numerical results are given for a single split of the data set into the learning and the test set; other ones are averaged values over the k splits. B. Implementation of the FRK prediction algorithm 1) Choice of the basis functions S: The basis functions s 7→ S(s) = (S1 (s), . . . , Sr (s)) and

their number r both control the complexity and the accuracy of the FRK prediction technique.

Following the suggestions in [12], we choose the l-th basis function s 7→ Sl (s) as a symmetric function centered at locations s0l : Sl is a bi-square function defined as      1 − (ks − s0 k /τ )2 2 , if ks − s0 k 6 τ , l l Sl (s) =  0, otherwise .

(19)

The parameter τ controls the function expansion. Figure 1a shows a two-dimensional view of

such a function Sl . With this choice of basis functions, we have from (5) (recall that G(s) = 0 since the antenna is omni-directional) Z(s) = p0 − 10κ log10 ds +

r X l=1

ηl



ks − s0l k2 1− τ2

2

1ks−s0l k≤τ

where η = (η1 , · · · , ηr ) is a zero mean Gaussian process with covariance matrix K. In the

numerical applications below, the centers of the basis functions s0l and their number r are chosen as follows: rmax functions are located on a Cartesian grid where the elements are unit squares of size τ × τ covering the whole geographic area of interest; then we remove a function Sl if

none of the N observation locations s1 , · · · , sN are in a τ -neighborhood of the center s0l . Thus, there remain r basis functions. Note that the supports of the functions Sl and Sk overlap if and only if s0k is in the 8-neighborhood of s0l . On Figures 1b and 1c, we show the locations of the N observations (red circle) and the locations of the r basis function centers (blue crosses) for two different data sets. In Figure 1b, τ = 100m and r = rmax while in Figure 1c τ = 250m, rmax = 2660 and r = 467.

1870000 1866000

2425400

1862000

0.2

2425000

0.8 0.6 0.4

0.4

0.0

0.6

0.2

1.0 0.8

0.0

2425800

1.0

14

0.0 0.2

0.4

0.6

0.8

1.0

595000 595200 595400 595600 595800

(a) Bi-square function in 2D view

400000

(b) Urban scenario, 1km ×1 km area

405000

410000

415000

420000

(c) Rural scenario, 22km ×10km area

Fig. 1. [left] Bi-square function. [center, right] locations of the N observations (red circles) and locations of the r centers s0l of the basis functions (blue crosses).

2) EM implementation: The theory on EM convergence does not imply any conditions on the initial value θ (0) [29]; the limiting points of the EM sequences are the stationary points of the log-likelihood of the observations Y . We did not observe that the initialization of the parameters play a role on the limiting value of our EM runs. A natural initial value for α is the −1 T Ordinary Least Square estimator given by α(0) = T T T T Y . We choose φ(0) large enough ˜ (0) ) looks like the identity matrix; in practice, we choose τ / exp(φ) in so that the matrix K(φ

4.6 4.4 4.2

Evolution of parameter φ

3.8

4.0

0.16 0.14 0.12 0.10

3.6

0.08

Evolution of parameter β

0.18

4.8

0.20

the order of 5. Finally, we compute the empirical variance V of the components of the residual

0

20

40

60 Iterations

(a)

80

100

0

20

40

60

80

100

Iterations

(b)

Fig. 2. Evolution of the EM algorithm for the estimation of the parameters β and φ (case : Urban data set with r = 676 and N = 32000) −1 2 vector Y − T α(0) and choose β(0) + σ(0) = V. Roughly speaking, we start from a model with

uncorrelated shadowing term. The algorithm is stopped when θ (l) − θ (l−1) < 10−5 over 100

15

successive iterations. In Figures 2a and 2b we show the EM output when estimating the parameters β and φ; we have similar plots for the other parameters α and σ 2 and they are omitted. In Table I, we report the estimated parameters at the iteration j of the EM algorithm for the urban scenario and for different values of j; in this example, EM was stopped after 109 iterations. It can be seen from the expression of the log-likelihood (14) of the observations Y that the MLE b −1 T )−1 T T Σ b −1 Y ; we checked numerically that the parameters obtained at ˆ = (T T Σ satisfies α convergence of our EM runs satisfy this equation. TABLE I.

(U RBAN DATA SET ) EM ITERATES WHEN τ = 50M , r = 676 AND N = 32000

EM Iteration (j) 1 109

2 σ(j) α(j) β(j) φ(j) 32.542 39.606 3.220 0.200 4.907 24.484 3.105 2.967 0.091 3.632

C. Prediction Error Analysis In order to evaluate the prediction accuracy, we compare the measurements Y (s) to the predicted values Yˆ (s) from the model (6). We consider the locations s in the test set T . The

model (6) implies that the conditional expectation of Y (s) given Y at such locations s is equal

to the conditional expectation of Z(s) given Y (see Lemma 2 in Appendix A). Therefore, for ˆ − Y (s) where Z(s) ˆ any s ∈ T , the error (with sign) is Yˆ (s) − Y (s) = Z(s) is given by (8).

We now evaluate the Root Mean Square Error (RMSE) which is one of the commonly used

prediction error indicators (see e.g. [33]), defined as "

2 1 Xˆ RMSE = Y (s) − Y (s) |T | s∈T

# 21

,

(20)

where |T | denotes the number of observations in the test set T . The RMSE is computed for each of the k successive test sets in the cross-validation analysis. In Table II, we report the

mean value of the RMSE over the k partitions and its standard deviation in parenthesis. We compare different strategies for modeling the observations (Y (s))s and for the calibration of the model. We first consider the log-normal shadowing model (see (2)) when the parameters p0 , κ, σ 2 are estimated by MLE; the results are reported in the first column of Table II. Then,

16

we consider the FRK model (see section II-B) when the parameters are estimated by MLE (see Sections III-B and IV-B2) for two different values of the number of basis functions r. We compare two strategies for the estimation of Y (s): the first one is the strategy described above (the results are reported in columns “Kriging” of Table II) and the second one consists in estimating Y (s) by the path-loss part p0 − 10κ log10 ds which is also E[Y (s)] (the results are reported in the last column; here the number of basis functions is r = 676 for the urban

data set and r = 1050 for the rural data set). All these comparisons are done for the urban and the rural data set. This table shows that the FRK model with kriging prediction improves on TABLE II.

M EAN RMSE

IN D B AND STANDARD DEVIATION IN PARENTHESIS , FOR THE URBAN DATA SET ( TOP ) AND THE RURAL DATA SET ( BOTTOM )

Log-normal model path-loss prediction RMSE (dB)

6.898 (± 0.036)

RMSE (dB)

9.68 (± 0.22)

FRK model Kriging path-loss prediction r = 121 r = 676 5.648 4.848 6.988 (± 0.014) (± 0.066) (± 0.0288) r = 170 r = 1050 5.32 3.64 9.33 (± 0.22) (± 0.18) (± 0.23)

other strategies. For the rural scenario, FRK with kriging prediction yields a considerably low RMSE (in the order of 3-5 dB) when compared to both of the path-loss approaches which have

−80

−80

−110

595400

595800

(m)

(a) Measurements of the field (Z(s))s in an additive noise. Fig. 3.

Urban data set

−130 −140 595000

595400

595800

(m)

ˆ (b) Estimated field (Z(s)) s on a regular grid.

0.5

2425000

−140 595000

1.0

−120 2425000

2425000

−120

(m)

−100

2425400

2425400

(m)

(m)

2425400

−100

−130

1.5

−90

−90

−110

2425800

2425800

2425800

a RMSE in the order of 9 dB. For the urban data set, we have similar results. We conclude

595000

595400

595800

(m)

(c) Standard deviation at each location.

17

this numerical section by showing the coverage maps obtained by applying the FRK method described in Section II-B. The model is fitted by using N = 32000 observations for the urban ˆ data set and N = 6000 for the rural data set. The predicted map (Z(s)) s is estimated by the ˆ conditional expectation (8). In Figure 3b (resp. Figure 4b), the value Z(s) is displayed for the urban data set (resp. rural data set) at locations s in the data set (that is locations on a regular grid for the urban data set and locations along roads for the rural one). For the rural data set, we ˆ also display on Figure 4c the values Z(s) for s on a regular grid. It is easily seen from Figure 3b ˆ that our model is not only a path-loss model (i.e. Z(s) is not only of the form p0 − 10κ log10 ds )

and allows spatial inhomogeneity. When the location s is far from any center s0l of the basis

functions - which, due to the way these centers are chosen, (see section IV-B1), that s is far from any observation location si - then it is expected than the prediction can not benefit from the spatial correlation information. This intuition is confirmed by the prediction formula (8) which ˆ states that Z(s) = p0 − 10κ log10 ds when ks − s0 k > τ for any l (since S(s) = 0); note that l

there are many such locations s in the rural data set (see the estimated field on Figure 4c). In order to illustrate the accuracy of our prediction technique, we also compute for each location s the conditional standard deviation of Z(s) given the observations Y (this quantity is given by (9)). We report these values on Figure 3c for the urban data set: the standard deviation is relatively small (not larger than 0.7) compared to the values of the estimated field. We notice a

−120

400000

410000 (m)

(a) Measurements of the field (Z(s))s in an additive noise. Fig. 4.

Rural data set

1870000 (m)

−120

400000

410000 (m)

−80

1866000

1870000

−100

−60

−100

1862000

−100

−80

1862000

(m)

1866000

−80

−60

1866000

−60

1862000

(m)

1870000

similar accuracy for the rural data set but due to space limitations we do not report results.

−120

400000

410000 (m)

ˆ ˆ (b) Estimated field (Z(s)) s at some (c) Estimated field (Z(s))s on a regular locations. grid

18

V.

REM FOR COVERAGE ANALYSIS IN MULTICELLULAR NETWORK

A Mobile Equipment (ME) measures the received power for (potentially) several Base Stations (BS) in order to choose the best serving one: the ME is always searching for the cell providing the strongest received pilot power, this process is called the cell selection. In LTE, cell selection is applied by comparing the instant measured RSRP from all potential cells and choosing the cell providing the highest RSRP value [34]. Operators do not have the whole information on the coverage map and have only a partial map that contains coverage information at some known locations. In this section, we propose a new technique for REM reconstruction in this new setting: in Section IV, we considered situations with a single BS covering the geographic area while in this section, there are many BSs over the considered geographic area. We adapt the model described in Section II in order to address this multi-cell scenario; in section V-B, we introduce the new model and the associated calibration and prediction techniques. We then apply this method on a simulated data set described in Section V-A. A. The data set The data are provided by a very accurate Orange planning tool mentioned in section IV-A [31].This planning tool calculates RSRP values in a sub-urban environment shown in Figure 5a, consisting of 12 antennas grouped into 4 sites of 3 directional antennas. The inter-site distance is bigger than 1km. The antennas are tri-sectored. The RSRP values are computed over a regular grid of size 25m×25m over a 12.4 km2 geographic area, which results in a total of 20 008 measured locations; and it is realized over a 2.6GHz frequency band. The planning tool applies the cellselection described above and at each location on the regular grid, it returns both a RSRP value and the ID of the serving cell. We report these data on Figure 5b: the left plot displays the RSRP values and the right one shows the cell ID (the values are integers in the range {1, · · · , 12}). B. Coverage map construction In this section, we propose a new model for the multi-cell scenario which is based on the FRK model described in section II-B. The idea is to cluster the measurements with respect to the BS/cell ID, introduce a FRK model for each cluster, calibrate and predict each model independently and then combine the REM predictions in order to obtain a single predicted REM. Roughly speaking, in this new model, it is claimed that spatial correlation between two locations

19

−60

2416000 2414000

−70 −80 −90

765000

766000

2413000

2413000

(m)

2415000

−50

2415000

−40

2414000

(m)

2416000

−30

767000

(m)

(a)

765000

766000

767000

(m)

(b)

(c)

Fig. 5. Base Stations localizations (a) vis collected data set: for each locations on a regular grid, (b) the RSRP map (c) the id of the best serving cell taking values in {1, · · · , 12}.

attached to distinct BSs is relatively small and can be neglected. The antennas used for each BS in our scenario are tri-sectored. Therefore, we propose to use the model introduced in (5) with ¯ the received power at location s from the i-th BS, denoted by Zi (s), is given by a non null G:   p − 10κ ln (d ) + ς G (s) + S(s)T η if s ∈ D 0,i i 10 s,i i i i i Zi (s) =  0 otherwise

(21)

where Di ⊆ R2 , p0,i is the transmitted power of the i-th BS, κi is the path loss exponent corresponding to the i-th BS, ds,i is the distance from s to the i-th BS. ςi Gi (s) is the antenna

gain relative to the mobile location s and antenna location sBSi . η i is a Gaussian variable with zero mean and covariance matrix K i . We can choose Di 6= R2 to model geographic area which are not covered by the i-th BS.

Since modeling a real antenna with precise value of the main and side-lobes is difficult and deviates from the focus of our study, we propose to use an approximated antenna pattern proposed in the 3GPP standard [35]. Here we consider only the horizontal gain since we are using 2D model. The approximated antenna gain is denoted as Gi (s) and it is given by "

Gi (s) = − min 12



ψs,i ψ3dB

2

#

, Am ,

(22)

where ψs,i is the angle between ME location, s, and the i-th BS azimuth. Here ψ3dB denotes the angle at which the antenna efficiency is 50% and Am is the maximum antenna gain. For a

20

tri-sectorial antenna, the parameter ψ3dB is usually taken equal to 65◦ and Am = 30dB. Assume we have observations Yi (s) which are noisy measurements of Zi (s): Yi (s) = Zi (s) + σi εi (s) where {i (s), s ∈ R2 } is a zero mean standard Gaussian process, independent of η i . The index i refers to the base station index. If we carry out Ni observations in one column vector

Y i = (Yi (s1,i ), · · · , Yi (sNi ,i ))T , we define Y i = T i αi + (S ? )i η i + εi where η i and εi are the same used in section II-B. In fact, the random term of our model is not modified (or affected), we keep the same definition of covariance function and prediction equations. Since the antenna gain depends on the position of the ME, we need to modify only the deterministic part where T and α becomes equal to     1 −10 ln10 (ds1 ,i ) G(ψs1 ,i )   p0,i   ..    .. .. T i = .  , and αi =  κi  . . .     ςi 1 −10 ln10 (dsNi ,i ) G(ψsNi ,i )

(23)

The parameters p0,i , κi , σi , ςi and K i are unknown and are estimated from Ni observations Y i = (Yi (s1,i ), · · · , Yi (sNi ,i )) from the cluster i by applying the EM technique described in Section III-B (see also IV-B for the implementation). In Figure 6a we show for one of the

antennas the variation of s 7→ p0,i − 10κi ln10 ds,i + ςs,i . We notice that the T i αi is bigger

−100

764500

765500

766500

767500

(m)

(a) s 7→ T i αi for a given BS.

Fig. 6.

764500

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ●locations ● ● ● ● ● ● ● ● ● ● ● Observations ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Initial Basis functions ● ● ● ● ● ● ● ● ● ● ● ● ● ●grid ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Final locations ● ● ● Basis ● ● ● ● functions ● ● ● ● ● ● ● ● ● ● ● ● ●

765000

765500

766000

766500

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

767000

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

767500

(m)

−20

(m)

2416000 (m)

● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

2413000 2414000 2415000 2416000

2413000

−80

2414000

−60

2413000

2415000

−40

2415000

−20

2414000

(m)

2416000

2417000

in the direction of the antenna spread which is expected. Now, applying the prediction for-

−40 −60 −80 −100 −120 −140 765000

766000

767000

(m)

(b) For a given BS: the associated ob- (c) For a given BS: estimated field servations Y i (red crosses) and the loca- (Zˆi (s))s , for s ∈ Di (the black point tions of the basis functions (black dots). are locations not included in Di ).

Different analysis aspect with respect to one BS of the 12 given BSs.

mula (8) with the parameters replaced by their estimation, we obtain a REM reconstruction {Zˆi (s), s ∈ R2 and i ∈ {1, . . . , 12}} for each BS; due to the cell selection mechanism described

21

above for ME, we propose to define the final REM as follows: ˆ Z(s) = argmax Zˆi (s)

at each location s

i s.t. s∈Di

where Zˆi (s) = E[Zi (s)|Y i ] (for s ∈ Di ). In Figure 6c, we display {Zˆi (s), s ∈ Di } for one BS.

Here we choose Di as the locations which are in the direction of the i-th antenna radiation and

at a maximal distance of 0.5km from the nearest observed location (the correlation distance is assumed to be less than 500m) C. Numerical results We first split the dataset into a learning set and a test set: 16 000 locations are drawn uniformly (over the 20 008 available measurements) to form the learning set. Based on the BS/cell ID, these 16 000 points are clustered into 12 subsets of size between 1000 and 3500. In Figure 6b we display the subsets associated to three BSs. We observe that the observations for each sub-model are not uniformly distributed over the geographical area of interest. Therefore, it is not relevant to choose a set of basis functions for each sub-model. We propose the same initial basis functions for the 12 sub-models (with τ = 150 and rmax = 588). For each sub-model, some of the basis functions are canceled as described in Section IV-B1; we get finally an irregular grid location of basis functions which is specific to each sub-model (see the blue circles and black dots in Figure 6b). We use the same basis functions as proposed in the previous examples (see Eq. (19)). For performance evaluation, we consider an omni-directional antenna model (similar to the one in section IV). We compare the predicted cell ID for each location s (that is the index j such ˆ that Z(s) = Zˆj (s)) with the real one. We obtain an error rate of 53 % over the locations s in the test set. When we consider the domain clustering introduced in (21) and keep the assumption on antenna omni-directionality, the error rate on cell ID selection is 31.23% over the test set locations. Finally, we consider the directional model together with the same domain restriction Di . The error rate is drastically decreased to 12.64%. The global error on the whole area (test and learning sets) is 2.53%.

VI.

C ONCLUSION

In this work we have studied the performance of the Fixed Rank Kriging (FRK) algorithm applied to coverage analysis in cellular networks. This method has a good potential when

22

performing prediction using massive data sets (order of thousands and higher) as it offers a good trade-off between prediction quality and computational complexity compared to classical Kriging techniques. This study has been performed using field-like measurements obtained from an accurate planning tool and real field measurements obtained from drive tests. In addition we have adapted the model to a more practical application. In this scenario, we used fieldlike measurements over multiple cells with directive antennas. Simulation results show a good performance in terms of coverage prediction and detection of best serving cell. Our ongoing research focuses on extending the model to take into account the location uncertainty and study its impact on the prediction performances. A PPENDIX A S OME RESULTS ON G AUSSIAN PROCESSES Let a real-valued stochastic process {Z(s), s ∈ S} where S is an index set. It is a Gaussian

process (GP) with mean function {µ(s), s ∈ S} and covariance function {C(s, s0 ), s, s0 ∈ S} iff

for any n and any distinct values s1 , . . . , sn ∈ S, the random vector (Z(s1 ), . . . , Z(sn )) is a Gaussian random vector with mean vector (µ(s1 ), . . . , µ(sn )) and covariance matrix {C(si , sj ), 1 ≤ i, j ≤ n}. A standard GP is a Gaussian process with null mean function and covariance function

given by C(s, s0 ) = δs,s0 . δs,s0 is the Kronecker delta function defined by δs,s0 = 0 if s 6= s0 and 1 otherwise.

Lemma 2: Let {Z(s), s ∈ S} be a GP with mean function {µ(s), s ∈ S} and covariance

function {C(s, s0 ), s, s0 ∈ S}, and fix n distinct points s1 , · · · , sn ∈ S. Denote by µ and C

the n × 1 mean vector and the n × n covariance matrix of the vector (Z(s1 ), . . . , Z(sn )). Let

{Y (s), s ∈ S} be a process defined for any s ∈ S by Y (s) = Z(s) + σε(s), where σ > 0 and (ε(s))s is a standard GP independent of {Z(s), s ∈ S}. Then

(i) {Y (s), s ∈ S} is a GP with mean function {µ(s), s ∈ S} and covariance function {C(s, s0 )+ σ 2 δs,s0 , s, s0 ∈ S}.

(ii) For any s ∈ S, the vector (Z(s), Y ) is a Gaussian vector. Conditionally to Y , the expectation of Z(s) is E [Z(s)|Y ] = µ(s) + C(s, .)T (C + σ 2 I) −1

variance matrix is cov(Z(s)|Y ) = C(s, s) − C(s, .)T (C + σ 2 I) (C(s, s1 ), . . . , C(s, sk ))T .

(iii) For any s ∈ S, E [Y (s)|Y ] = E [Z(s)|Y ],

−1

(Y − µ) and its co-

C(s, .) where C(s, .) =

23

Proof: (i) Let n and s1 , · · · , sn ∈ S be fixed. Set Y = (Y (s1 ), · · · , Y (sn )). We prove that

for any λ = (λ1 , · · · , λn ) ∈ Rn , λT Y is a Gaussian random variable with expectation λT µ and

variance λT (C + σ 2 I)λ. Pn Pn Pn λT Y = i=1 λi Z(si ) + σ i=1 λi ε(si ). Since {Z(s), s ∈ S} is a GP, i=1 λi Z(si ) is a gaussian random variable with mean λT µ and covariance λT Cλ. Since the r.v. {ε(s), s ∈ S} are P P independent, σ ni=1 λi ε(si ) is a centered Gaussian random variable with variance σ 2 ni=1 λ2i .

Finally, the processes are independent, which implies that λT Y is a Gaussian random variable,

with mean and variance equal resp. to the sum of the means and to the sum of the variances. (ii) Let A = (A1,1 , · · · , A1,n ) be a 1×n deterministic matrix. As a preliminary result, we prove

that (Z(s) + AY , Y ) is a Gaussian random vector. For any λ = (λ1 , · · · , λn+1 ) ∈ Rn+1 , we P have (Z(s) + AY , Y ) λ = −σλ1 ε(s) + λ1 Y (s) + ni=1 (λi+1 + λ1 A1,i ) Y (si ). Since {Y (s), s ∈ P S} is a GP, the random variable λ1 Y (s) + ni=1 (λi+1 + λ1 A1,i ) Y (si ) is Gaussian; and since ε(s) is a Gaussian random variable independent of {Y (s), s ∈ S} then the linear combination

(Z(s) + AY , Y ) λ is Gaussian. This establishes that (Z(s) + AY , Y ) is a Gaussian random vector. When applied with A equal to the null vector, we have the first claim. −1

Choose A = −C(s, .)T (σ 2 I + C)

and set Z 0 (s) = Z(s)+AY . Then E[Z 0 (s)] = µ(s)+Aµ

and its variance is given by

Var(Z 0 (s)) = Var(Z(s)) + AVar(Y )AT + 2Acov(Y , Z(s)) −1 = C(s, s) − C(s, .)T σ 2 I + C C(s, .)

where we used that cov (Z(s), Y ) = cov (Y (s), Y ) − σcov (ε(s), Y ) = C(s, ·)T since ε(s) and

Y are independent and E[ε(s)] = 0. Let us prove that Z 0 (s) and Y are independent. Since they are jointly Gaussian, we prove that they are uncorrelated:  cov(Z 0 (s), Y ) = cov(Z(s), Y ) + Acov(Y , Y ) = C(s, .)T + A σ 2 I + C = 0.

The independence of Z 0 (s) and Y implies

E(Z(s)|Y ) = E(Z 0 (s)|Y ) − AE(Y |Y ) = E[Z 0 (s)] − AY = µ(s) − A(Y − µ). In addition, this independence also implies

24

cov(Z(s)|Y ) = cov(Z 0 (s) − AY |Y ) = cov(Z 0 (s)|Y ) = Var(Z 0 (s)), which concludes the proof. (iii) E [Z(s)|Y ] = E [Y (s)|Y ] + σE [ε(s)|Y ]. Since ε(s) and Y are independent and ε(s) is centered, E [ε(s)|Y ] = E [ε(s)] = 0. A PPENDIX B P ROOF OF P ROPOSITION 1 We recall some notations introduced in [12, Appendix A], which will be useful for the proof of Proposition 1. Let Wlj be the weight associated to the observation Y (sj ) in the neighborhood of the bin center s0l (see [12] for the expression of these non negative weights). We set W j = (Wj1 , . . . , WjN )T . To the vector of residual D = (D(s1 ), · · · , D(sN ))T defined by D = Y − T (T T T )−1 T T Y ,

an aggregated vector of residuals D = (D(s01 ), · · · , D(s0M ))T is associated through PN Wli D(si ) W Tl D 0 = D(sl ) = i=1 ; PN W Tl 1N i=1 Wli

b M is a M × M matrix defined by (see [12, Eq. (A.2)]) 1N is the N × 1 vector of ones. Then Σ   V (s0 ), l = k, l b Σ M (l, k) = (24)  D(s0 ) D(s0 ), l 6= k, l k where V (s0l ) =

PN

i=1

Wli D(si )2 /(W Tl 1N ).

Proof of Proposition 1: (i) Let µ = (µ1 , · · · , µM ) ∈ RM . From (24), bMµ = µT Σ

M X l=1

!2

µl D(s0l )

+

M X l=1

M  X  µ2l V (s0l ) − D(s0l )2 ≥ µ2l V (s0l ) − D(s0l )2 . l=1

b M µ ≥ 0. The Jensen’s inequality implies that V (s0l ) ≥ D(s0l )2 for any l thus showing that µT Σ

Note also that this term is positive for any non null vector µ iff VD (s0l ) − D(s0l )2 > 0 for any l. b M is a covariance matrix, there exists an orthogonal M × M matrix U and a (ii) Since Σ

25

b M = U ΛU T . We have diagonal M × M matrix Λ with diagonal entries (λi )i such that Σ M   X Tr (I M − QQT )U ΛU T = Tr U T (I M − QQT )U Λ = B ii λi , i=1

where B = U T (I M − QQT )U . Assume that B ii ≥ 0 for any i. Then    T T Tr (I M − QQ )U ΛU ≥ inf λj Tr(B). j:B jj >0

 Since Tr(B) = Tr((I M − QQT )U U T ) = Tr(I M − QQT ), we have σ ˆ 2 ≥ inf j:B jj >0 λj .

Let us prove that B ii ≥ 0 for any i: for µ ∈ RM , µT Bµ = kU µk2 − kQT (U µ)k2 and this

term is nonnegative since QT (U µ) is the orthogonal projection of (U µ) on the column space of Q (or equivalently, of S M ). This equality also shows that T {j : B jj > 0} = {j : ∃v ∈ Vj , kvk2 > kQT vk2 } = {j : ∃v ∈ Vj , k(S ⊥ M ) vk > 0} .

c is (iii) Since S M is a full rank matrix, R is invertible. Therefore, from (12), it is trivial that K bM − σ positive definite iff QT (Σ ˆ 2 I M )Q is positive definite. Since QT Q = I r , we have for any µ ∈ Rr , µ 6= 0,

bMQ − σ b M Q)µ > σ µT (QT Σ ˆ 2 I r )µ > 0 ⇐⇒ µT (QT Σ ˆ 2 kµk2 .

b M is positive definite iff for any l, W l Remark.: It can be seen from the proof of (i) that Σ

are at least two non null components (say il , jl ) such that D(sil ) 6= D(sjl ). A PPENDIX C P ROOF OF L EMMA 1

Since (η(s))s and (ε(s))s are independent, the conditional distribution of Y given η is a Gaussian distribution N (T α + S ? η, σ 2 I N ). The conditional log-likelihood is therefore given by, up to an additive constant, ln Pr(Y |η; θ) = −

1 N ln(σ 2 ) − 2 kY − T α − S ? ηk2 . 2 2σ

(25)

In addition, the distribution of η is N (0, K(υ)) so that the log-likelihood is given by, up to an additive constant,

26

1 1 ln Pr(η; θ) = − ln(det(K(υ))) − η T K −1 (υ)η . 2 2

(26)

From (25) and (26), the Bayes formula implies that, up to an additive constant,

1 1 N ln(σ 2 ) − ln(det(K(υ))) − 2 kY − T αk2 2 2 2σ  T  1 1 T S? S? −1 T + 2 (Y − T α) S ? η − η + K (υ) η . (27) σ 2 σ2 h i h i ˜ and E η T Aη|Y ; θ ˜ for some deterministic matrix A. From It remains to compute E η|Y ; θ ln Pr(Y , η; θ) = −

(27), observe that the conditional distribution of η given Y is a Gaussian distribution with mean h i −1 T ˜ = ST S? + σ ˜ , E η|Y ; θ ˜ 2 K −1 (˜ υ) S ? (Y − T α) ?

and covariance matrix

−1 h iT  S T S ? ? −1 ˜ − E η|Y ; θ ˜ E η|Y ; θ ˜ = + K (˜ υ) . E ηη |Y ; θ σ ˜2 h

T

i

h

i

The proof is concluded by using the matrix expectation property (see e.g. [36]),     E η T Aη = Tr(AE (η − E[η])(η − E[η])T ) + E[η]T AE[η],

(28)

for any symmetric deterministic matrix A.

A PPENDIX D P ROOF OF P ROPOSITION 2 a) Update of the parameter α: We choose α(l+1) such that 2 2 , υ(l) ; θ (l) ) ≥ Q(α, σ(l) , υ(l) ; θ (l) ) . ∀α , Q(α(l+1) , σ(l)

(29)

It is easily seen from (16) that for any σ 2 , υ, α 7→ Q(α, σ 2 , υ; θ (l) ) is a quadratic function with leading term −αT T T T α/(2σ 2 ). Therefore, it possesses an unique maximum which is given by the root of the first order derivative of Q. The expression of α(l+1) follows from the equation ∂α Q(θ; θ (l) ) =

  1 T 1 T (Y − T α) − 2 T T S ? E η|Y ; θ (l) . 2 σ σ

(30)

27 2 b) Update of the parameter σ 2 : We choose σ(l+1) such that 2 ∀σ 2 , υ, υ 0 , Q(α(l+1) , σ(l+1) , υ; θ (l) ) ≥ Q(α(l+1) , σ 2 , υ 0 ; θ (l) ) .

(31)

It is easily seen from (16) that for any (α, υ), σ 2 7→ Q(α, σ 2 , υ; θ (l) ) is given, up to an additive constant by

σ 2 7→ −

 1  N ln σ 2 − 2 E kY − T α − S ? ηk2 |Y ; θ (l) . 2 2σ

The first order derivative of this function is −

 N 1  2 + E kY − T α − S ηk |Y ; θ , ? (l) 2σ 2 2σ 4

2 2 2 showing that the function increases on (0, σ(l+1) ) and decreases on (σ(l+1) , +∞) where σ(l+1) is

the root of the first order derivative. c) Update of the parameter υ: It is easily seen from (16) that for any (α, σ 2 ), υ 7→

Q(α, σ 2 , υ; θ (l) ) is given, up to an additive constant by

h i 1  −1 1 T ˜ . − ln(det(K(υ))) − Tr K (υ) E ηη |Y ; θ 2 2

The expression of the gradient easily follows from the chain rule ∂J(K(υ)) = Tr ∂υk



∂J(K(υ)) ∂K

T

∂K(υ) ∂υk

!

and the following results, where A, B and X are matrices (see e.g. [37, Eqs.(49) and (63),Chapter 2]): T ∂ det(X) = det(X) X −1 , ∂X

T ∂Tr(AX −1 B) = − X −1 BAX −1 , ∂X A PPENDIX E

P ROOF OF P ROPOSITION 3 d) Update of β: From (17), for any (α, σ 2 , φ), the first order derivative of β 7→ Q(α, σ 2 , β, φ; θ (l) )

is given by

 T  r 1 ˜ −1 − Tr K(φ) E ηη |Y ; θ (l) . 2β 2

(32)

28

   −1 ˜ The root of the gradient (32) is r/Tr K(φ) E ηη T |Y ; θ (l) , which is positive upon noting  h i  T  −1 T ˜ −1 ˜ that Tr K(φ) E ηη |Y ; θ (l) = E η K(φ) η|Y ; θ (l) . A trivial study of the gradient (32) shows that this root is the maximum of the function β 7→ Q(α, σ 2 , β, φ; θ (l) ). Defining β(l+1)

as the root of the gradient with φ = φ(l) yields Q(α(l+1) , σ (l+1) , β(l+1) , φ(l) ; θ (l) ) ≥ Q(θ (l) ; θ (l) ).

e) Update of φ: It is not trivial to find a (local) maximum of φ 7→ Q(α(l+1) , σ (l+1) , β(l+1) , φ; θ (l) ).

We therefore update φ by one step of a Newton-Raphson algorithm [28]. This gives ∂φ Q(θ, θ (l) ) θ=α ,σ2 ,β ,φ  (l+1) (l+1) (l+1) (l) , φ(l+1) = φ(l) − a(l) 2 ∂φ Q(θ, θ (l) ) θ=α ,σ2 ,β ,φ  (l+1)

(l+1)

(l+1)

(33)

(l)

where 0 < a(l) 6 1. We have

˜ ∂K 1 ˜ = ∆ ◦ K, ∂φ exp(φ)

˜ ˜ ˜ ∂ 2K ∂K 1 ∂K = − + ∆ ◦ ∆ ◦ . ∂ 2φ ∂φ exp(φ) ∂φ

The expression of the first order derivative now follows from (17); and the expression of the second order derivative follows from   2  T  1 −1 ∂ K −1 2 (K E ηη |Y ; θ (l) − I r ) ∂φ Q(θ, θ (l) ) = Tr K 2 ∂ 2φ    T  1 −1 ∂K −1 ∂K −1 + Tr K K (I r − 2K E ηη |Y ; θ (l) ) . 2 ∂φ ∂φ The step size a(l) is chosen so that Q(θ (l+1) ; θ (l) ) ≥ Q(θ (l) ; θ (l) ). ACKNOWLEDGMENT The authors would like to acknowledge Emmanuel De Wailly and Jean-Francois Morlier for their help in data acquisition. R EFERENCES [1]

3GPP TR 36.805 v1.3.0 1, “Study on minimization of drive-tests in next generation networks; (release 9),” 3rd Generation Partnership Project, Tech. Rep., 2009. [Online]. Available: http://www.3gpp.org/ftp/Specs/html-info/23141.htm

[2]

J. Mitola, “Cognitive radio: An integrated agent architecture for software defined radio,” (Doctoral dissertation), Royal Institute of Technology (KTH), Stockholm, Sweden, 2000.

[3]

Y. Zhao, J. Gaeddert, K. K. Bae, and J. H. Reed, “Radio environment map enabled situation-aware cognitive radio learning algorithms,” Software Defined Radio Forum (SDRF) technical conference, 2006.

29

[4]

A. Alaya-Feki, S. Ben Jemaa, B. Sayrac, P. Houze, and E. Moulines, “Informed spectrum usage in cognitive radio networks: Interference cartography,” IEEE Conference on Personal, Indoor and Mobile Radio Communications (PIMRC), pp. 1–5, 2008.

[5]

A. Alaya-Feki, B. Sayrac, S. Ben Jemaa, and E. Moulines, “Interference cartography for hierarchical dynamic spectrum access,” IEEE Conference on New Frontiers in Dynamic Spectrum Access Networks (DySPAN), pp. 1–5, 2008.

[6]

J. Riihijarvi, P. Mahonen, M. Wellens, and M. Gordziel, “Characterization and modelling of spectrum for dynamic spectrum access with spatial statistics and random fields,” IEEE Conference on Personal, Indoor and Mobile Radio Communications (PIMRC), pp. 1–6, 2008.

[7]

ETSI TR 102 947 V1.1.1, “Reconfigurable Radio Systems (RRS); Use Cases for building and exploitation of Radio Environment Maps (REMs) for intra-operator scenarios ,” European Telecommunications Standards Institute, Tech. Rep., 2013. [Online]. Available: http://www.etsi.org/deliver/etsi tr/102900 102999/102947/01.01.01 60/tr 102947v010101p.pdf

[8]

B. Sayrac, J. Riihij¨arvi, P. M¨ah¨onen, S. Ben Jemaa, E. Moulines, and S. Grimoud, “Improving coverage estimation for cellular networks with spatial bayesian prediction based on measurements,” Proceedings of the ACM SIGCOMM workshop on Cellular networks: operations, challenges, and future design, pp. 43–48, 2012.

[9]

A. Galindo-Serrano, B. Sayrac, S. Ben Jemaa, J. Riihijarvi, and P. Mahonen, “Automated coverage hole detection for cellular networks using radio environment maps,” IEEE Conference on Modeling and Optimization in Mobile, Ad Hoc & Wireless Networks (WiOpt), pp. 35–40, 2013.

[10]

B. Sayrac, A. Galindo-Serrano, S. B. Jemaa, J. Riihij¨arvi, and P. M¨ah¨onen, “Bayesian spatial interpolation as an emerging cognitive radio application for coverage analysis in cellular networks,” Transactions on Emerging Telecommunications Technologies, 2013.

[11]

S. Grimoud, S. Ben Jemaa, B. Sayrac, and E. Moulines, “A REM enabled soft frequency reuse scheme,” IEEE GLOBECOM Workshops (GC Wkshps), pp. 819–823, 2010.

[12]

N. Cressie and G. Johannesson, “Fixed rank kriging for very large spatial data sets,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 70, no. 1, pp. 209–226, 2008.

[13]

H. Braham, S. Ben Jemaa, B. Sayrac, G. Fort, and E. Moulines, “Low complexity spatial interpolation for cellular coverage analysis,” IEEE Conference on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), 2014.

[14]

——, “Coverage mapping using spatial interpolation with field measurements,” IEEE Conference on Personal, Indoor and Mobile Radio Communications (PIMRC), 2014.

[15]

T. S. Rappaport et al., Wireless communications: principles and practice.

prentice hall PTR New Jersey, 1996, vol. 2.

[16]

C. Phillips, D. Sicker, and D. Grunwald, “Bounding the error of path loss models,” IEEE Conference on New Frontiers in Dynamic Spectrum Access Networks (DySPAN), pp. 71–82, 2011.

[17]

E. Perahia, D. C. Cox, and S. Ho, “Shadow fading cross correlation between base stations,” IEEE Conference on Vehicular Technology Conference (VTC), vol. 1, pp. 313–317, 2001.

[18]

N. A. Cressie, “Statistics for spatial data,” 1993.

[19]

C. Phillips, M. Ton, D. Sicker, and D. Grunwald, “Practical radio environment mapping with geostatistics,” IEEE Conference on Dynamic Spectrum Access Networks (DySPAN), pp. 422–433, 2012.

[20]

J. Riihijarvi and P. Mahonen, “Estimating wireless network properties with spatial statistics and models,” IEEE Conference on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt), pp. 331–336, 2012.

30

[21]

E. Anderson, C. Phillips, D. Sicker, and D. Grunwald, “Modeling environmental effects on directionality in wireless networks,” Mathematical and Computer Modelling, vol. 53, no. 11, pp. 2078–2092, 2011.

[22]

R. Ramanathan, “On the performance of ad hoc networks with beamforming antennas,” Proceedings of the ACM international symposium on Mobile ad hoc networking & computing, pp. 95–105, 2001.

[23]

Y. Chen, Z. Zhang, and V. K. Dubey, “Effect of antenna directivity on angular power distribution at mobile terminal in urban macrocells: A geometric channel modeling approach,” Wireless Personal Communications, vol. 43, no. 2, pp. 389–409, 2007.

[24]

E. Anderson, G. Yee, C. Phillips, D. Sicker, and D. Grunwald, “The impact of directional antenna models on simulation accuracy,” IEEE Conference on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks, WiOPT, pp. 1–7, 2009.

[25]

H. V. Henderson and S. R. Searle, “On deriving the inverse of a sum of matrices,” Siam Review, vol. 23, no. 1, pp. 53–60, 1981.

[26]

M. Katzfuss and N. Cressie, “Tutorial on Fixed Rank Kriging (FRK) of CO2 data,” Department of Statistics Technical Report No. 858, The Ohio State University, Columbus, OH, Tech. Rep., 2011.

[27]

C. Rasmussen and C. Williams, Gaussien Processes for Machine Learning.

[28]

J. M. Geoffrey and K. Thriyambakam, The EM algorithm and extensions.

[29]

C. Wu, “On the convergence properties of the EM algorithm,” The Annals of statistics, vol. 11, no. 1, pp. 95–103, 1983.

[30]

MIT Press, 2006. Wiley-Interscience, 2007, vol. 382.

M. Gudmundson, “Correlation model for shadow fading in mobile radio systems,” Electronics letters, vol. 27, no. 23, pp. 2145–2146, 1991.

[31]

“Asset tool,” http://www.aircominternational.com/Products/Planning/asset.aspx.

[32]

R. Kohavi et al., “A study of cross-validation and bootstrap for accuracy estimation and model selection,” IJCAI, vol. 14, no. 2, pp. 1137–1145, 1995.

[33]

J. Li and G. Australia, A review of spatial interpolation methods for environmental scientists.

Geoscience Australia

Canberra, 2008, vol. 137. [34]

ETSI TS 136 304 V8.6.0, “Evolved universal terrestrial radio access (E-UTRA); user equipment (UE) procedures in idle mode,” 3rd Generation Partnership Project, Tech. Rep., 2009. [Online]. Available: http: //www.3gpp.org/ftp/Specs/html-info/23141.htm

[35]

3GPP TSG-RAN WG4 meeting # 62 R4-120908, Proposal to modeling active antennas, Katherein, Feb 2012.

[36]

G. A. Seber and A. J. Lee, Linear regression analysis.

[37]

K. B. Petersen and M. Pedersen, “The matrix cookbook,” Technical University of Denmark, Tech. Rep., 2012.

John Wiley & Sons, 2012, vol. 936.