A Grid-Based Algorithm for On-Device GSM Positioning

Petteri Nurmi, Sourav Bhattacharya, Joonas Kukkonen
Helsinki Institute for Information Technology HIIT
PO Box 68, FI-00014 University of Helsinki, Finland
[email protected]

ABSTRACT

We propose a grid-based GSM positioning algorithm that can be deployed entirely on mobile devices. The algorithm uses Gaussian distributions to model signal intensity variations within each grid cell. Position estimates are calculated by combining a probabilistic centroid algorithm with particle filtering. In addition to presenting the positioning algorithm, we describe methods that can be used to create, update and maintain radio maps on a mobile device. We have implemented the positioning algorithm on Nokia S60 and Nokia N900 devices and we evaluate the algorithm using a combination of offline and real world tests. The results indicate that the accuracy of our method is comparable to state-of-the-art methods, while at the same time having significantly smaller storage requirements.

Author Keywords
Positioning, GSM, fingerprinting, particle filtering, mobile computing, energy efficiency.

ACM Classification Keywords
C.2.4 Computer Communication Networks: Distributed Systems; H.4.m Information Systems Applications: Miscellaneous.

General Terms
Algorithms, Experimentation, Measurement.

INTRODUCTION

The proliferation of smartphones with positioning capabilities, combined with improvements in mobile user interfaces and easier access to custom applications, has resulted in a new surge of interest in location-based services. Location-based services are finally expected to achieve their true potential, and revenues from location-based services are expected to exceed $10 billion by 2014 [7]. Modern smartphones already support navigation services, point-of-interest and friend finding services, and other application domains such as location-based gaming are rapidly gaining in popularity.

Accurate and ubiquitous localization is one of the key prerequisites for realizing the potential of location-based services. Contemporary smartphones readily support GSM and WiFi, and an increasing number of phones contain integrated GPS receivers. However, integrated GPS receivers and WiFi radios suffer from excessive power consumption, which makes them unattractive for localization purposes [10]. GSM does not suffer from this drawback, which makes accurate GSM localization desirable for location-based services. Current solutions for GSM localization typically suffer from poor positioning accuracy or require detailed radio maps with information about local GSM signal intensity variations. Moreover, the current approaches that rely on radio map information typically provide no support for constructing and maintaining the radio map directly on the client device.

This paper presents a novel on-device GSM localization algorithm for mobile devices that provides good positioning accuracy while at the same time optimizing the size of the radio map that is required for positioning. Contrary to existing approaches, our algorithm supports constructing and updating radio maps directly on the device. The locally constructed radio maps can be shared with other devices, which makes collaborative calibration efforts possible. The algorithm relies on a grid representation of the world and it learns a signal intensity model that captures GSM signal variations within each grid cell. The positioning of the client device is then accomplished using a probabilistic centroid algorithm that is combined with a grid-based particle filter. We evaluate the algorithm using a combination of offline and field experiments. The median accuracy of the algorithm is around 150 meters in all experiments, even though the experiments were conducted considering only GSM signal information from a single cell.

RELATED WORK


GSM Fingerprinting

Laitinen et al. [11] were among the first to use GSM fingerprinting for outdoor positioning. Using fingerprints that contain the identifier and signal strength of the 6 most powerful GSM cells, they were able to achieve 90-percentile accuracies of 90 and 190 meters within urban and suburban areas in Helsinki, Finland. The Place Lab project achieved accuracies between 100 and 200 meters within the greater Seattle metropolitan area using a sensor model that was coupled with a Bayesian particle filter [12]. The most extensive investigation of GSM positioning was conducted by Chen et al. [2], who considered, among other things, the accuracy of three different techniques (centroid, fingerprinting, and signal propagation modeling based on Gaussian processes) in different kinds of environments (urban, suburban, sparse suburban). The results indicated that fingerprinting was the most accurate within downtown areas, with a median error of around 100 meters. However, fingerprinting was sensitive to the device that was used to collect the fingerprints, whereas Gaussian processes had better generalization performance.

GSM fingerprinting has also been successfully used for indoor positioning. Otsason et al. [14] use wide signal-strength fingerprints that contain information about the 6 strongest GSM cells and readings for up to 29 additional GSM channels. The median accuracy of the system is between 3 and 6 meters within large multi-floor environments. The SkyLoc system focuses on determining the current floor of a mobile user within tall multi-floor buildings [21]. SkyLoc also utilizes wide signal-strength fingerprints, but feature selection techniques are used to select only a small subset of highly relevant radio sources for fingerprinting. The resulting system is able to determine the correct floor around 70% of the time and is within two floors in 97% of the cases. While SkyLoc focused on floor detection, the same authors have also shown that using feature selection techniques can improve positioning accuracy [20].

Other GSM Positioning Techniques

The simplest way to estimate the position of a client is to use the coordinates of the base station to which the device is currently connected. The accuracy of this method is relatively poor and depends, e.g., on cell size, cell density and environment characteristics. Trevisani and Vitaletti have shown that the accuracy of cell identifier positioning is several hundred meters within densely populated areas and several kilometers within sparsely populated areas. A variation of the cell identifier method is to estimate the location of the handset as a weighted average of several base stations [12].

In geometric positioning, the client or the network measures various signal characteristics and uses these to estimate the distance and angle of the handset from a known reference point. The final estimate can then be calculated using basic geometry. Geometric methods include methods based on time advance, angle-of-arrival, time-of-arrival or time-difference-of-arrival measurements; see, e.g., Drane et al. [3]. Network-based positioning is typically based on geometric methods.

The final technique we consider is radio propagation modeling. Radio propagation modeling uses knowledge about wave propagation and environment characteristics to estimate the location of the handset with the help of one or more known reference points. Propagation models can be deterministic, i.e., have fixed parameter values that have been verified from empirical data, or they can use machine learning techniques to learn parametric models for different access points from empirical data. Roos et al. [15] used probabilistic inference to estimate the parameters, whereas Wu et al. [23] used support vector regression. Monte Carlo localization is an extension of the probabilistic method that considers temporal dependencies between locations and measurements [19]. Finally, the most popular propagation approach is to use Gaussian processes. Instead of fitting a specific model, Gaussian processes are able to learn a continuous estimator (typically a weighted combination of Gaussian functions) for the propagation model from empirical data [17, 4].

GRID-BASED GSM POSITIONING

World Model

The first step in our algorithm is to create a discrete representation of the world. The use of a discrete representation can facilitate calibration efforts [9] and help to balance between positioning accuracy and radio map size [5]. In our case, we represent the world as a grid that consists of d × d patches. The parameter d specifies the granularity of the grid and determines the trade-off between positioning accuracy and radio map storage requirements. The optimal value of d depends on the accuracy of the calibration data and it should not be smaller than the average accuracy of the positioning technology that was used to collect the calibration data. Currently we use d = 20 meters, which was selected based on GPS accuracy and radio map storage size considerations. An alternative is to use a topological map consisting, e.g., of continuous street segments (see, e.g., [13]). The problem with this approach is that it requires storing the topological map on the device, whereas the grid representation can be used worldwide without additional storage requirements.

We assume location measurements are represented using a geodesic reference system, e.g., WGS84 as used by GPS. Geodesic reference systems model the earth's surface as an ellipsoid and locations near the surface of the ellipsoid are expressed using a (latitude, longitude) pair. To create a grid representation of the world, we need a mapping that associates each (latitude, longitude) pair uniquely with a grid cell that is specified by a (row, column) pair. The mapping of a (latitude, longitude) measurement into a (row, column) pair is illustrated in Fig. 1(a). The mapping algorithm first calculates the geodesic distance of the measurement from the point that is located on the same latitude on the prime meridian (geodesic distances are calculated using a variant of Vincenty's algorithm for solving inverse geodesic problems [22]). The column index of the grid cell is then the (floor of the) resulting distance divided by the size of the grid cell (i.e., the value of d). The row index is resolved similarly, i.e., we calculate the distance of the measurement from the point at the same longitude on the equator and divide the resulting distance by the grid size. When the latitude or longitude value is negative, the corresponding grid index is negated. The motivation for using coordinate-wise distances along the prime meridian and the equator is that this ensures the size of the grid cells remains the same everywhere in the world.

Figure 1. Illustrating the processes of mapping coordinates to grid indexes and vice versa: (a) mapping a given (latitude, longitude) pair into a (row, column) index; (b) mapping a given (row, column) index into a (latitude, longitude) value.

The positioning phase requires that we are able to determine the coordinates of a grid cell. We accomplish this by estimating the coordinates of each corner point and using the mean of the corner points as the coordinates of the grid cell. Fig. 1(b) illustrates the process of mapping a grid index into a (latitude, longitude) pair. The mapping algorithm initializes the coordinate estimates to the point (0, 0) and iteratively refines the estimates by moving along the latitude and longitude axes. The distance to move along an axis corresponds to the difference between the original distance and the coordinate-wise distance evaluated at the current estimate. This process is continued until the differences in coordinate-wise distances become sufficiently small. Pseudo-code of the mapping algorithm is given in Alg. 1. The description assumes points are located in the north-eastern hemisphere; coordinates for other hemispheres can be resolved by negating the latitude or longitude values of negative grid indexes. The function InvGeodetic calculates the (inverse) geodesic distance between two (latitude, longitude) pairs, and the function DirGeodetic returns the (latitude, longitude) coordinates that correspond to traveling along the surface of the ellipsoid from a predefined starting point for a given distance in the direction specified by an azimuth parameter.

Signal Intensity Model

Once a grid representation of the world has been created, the next step is to construct a sensor model that captures signal intensity variations within each grid cell.

Algorithm 1 Calculate the longitude and latitude of the grid corner that is closest to (0, 0).
Require: Grid column and row as integers; integer d describing the width of each grid cell in meters.
  xdist = |col| * d
  ydist = |row| * d
  xdif = xdist
  ydif = ydist
  Lat = 0, Lon = 0
  while abs(xdif) > 0.01 OR abs(ydif) > 0.01 do
    [Lon, Lat] = DirGeodetic(Lon, Lat, ydif, 0)
    [Lon, Lat] = DirGeodetic(Lon, Lat, xdif, 90)
    X = InvGeodetic(0, Lat, Lon, Lat)
    Y = InvGeodetic(Lon, 0, Lon, Lat)
    xdif = xdist - X
    ydif = ydist - Y
  end while
  return [Lon, Lat]
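To make the mapping concrete, the grid conversion could be implemented along the following lines in Python. This is a minimal illustrative sketch, not the deployed implementation: it uses pyproj's geodesic routines as a stand-in for the Vincenty-based InvGeodetic/DirGeodetic functions, and the function and variable names are chosen for readability.

# Illustrative sketch of the grid mapping (not the deployed code); pyproj stands in
# for the Vincenty-based geodesic routines referenced in the text.
from math import floor
from pyproj import Geod

GEOD = Geod(ellps="WGS84")   # geodesic computations on the WGS84 ellipsoid
D = 20.0                     # grid cell size in meters

def inv_geodetic(lon1, lat1, lon2, lat2):
    """Geodesic distance in meters between two (longitude, latitude) points."""
    _, _, dist = GEOD.inv(lon1, lat1, lon2, lat2)
    return dist

def dir_geodetic(lon, lat, dist, azimuth):
    """Point reached by traveling dist meters from (lon, lat) along the given azimuth."""
    new_lon, new_lat, _ = GEOD.fwd(lon, lat, azimuth, dist)
    return new_lon, new_lat

def coords_to_cell(lat, lon, d=D):
    """Map a (latitude, longitude) pair to a (row, column) grid index."""
    col = floor(inv_geodetic(0.0, lat, lon, lat) / d)   # distance to the prime meridian
    row = floor(inv_geodetic(lon, 0.0, lon, lat) / d)   # distance to the equator
    return (-row if lat < 0 else row), (-col if lon < 0 else col)

def cell_to_coords(row, col, d=D, tol=0.01):
    """Map a (row, column) index to the (latitude, longitude) of the grid corner
    closest to (0, 0), following the iterative refinement of Alg. 1."""
    xdist, ydist = abs(col) * d, abs(row) * d
    xdif, ydif = xdist, ydist
    lon, lat = 0.0, 0.0
    while abs(xdif) > tol or abs(ydif) > tol:
        lon, lat = dir_geodetic(lon, lat, ydif, 0.0)    # move north
        lon, lat = dir_geodetic(lon, lat, xdif, 90.0)   # move east
        xdif = xdist - inv_geodetic(0.0, lat, lon, lat)
        ydif = ydist - inv_geodetic(lon, 0.0, lon, lat)
    return (lat if row >= 0 else -lat), (lon if col >= 0 else -lon)

A measurement can be mapped with coords_to_cell(lat, lon) and the corner coordinates of the resulting cell recovered with cell_to_coords(row, col).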

We model the variations using a Gaussian distribution that is determined by a (grid cell, GSM cell) pair. The probability of observing GSM cell j having signal intensity s in grid cell i is given by

p(i \mid j, s) = \frac{1}{\sqrt{2\pi\sigma_{i,j}^{2}}} \exp\left( -\frac{(s - \mu_{i,j})^{2}}{2\sigma_{i,j}^{2}} \right),   (1)

where \mu_{i,j} and \sigma_{i,j} denote the mean and standard deviation of the corresponding Gaussian distribution. These parameters can be estimated from calibration data or they can be learned directly on the device. When the number of signal strength measurements for a particular (grid cell, GSM cell) pair is small, the estimate of the standard deviation is often close to or equal to zero. This can result in over-fitting and decrease positioning accuracy. To avoid over-fitting, we enforce a minimum constraint on the standard deviation and require that the value of \sigma_{i,j} is greater than or equal to 5.0. The decision to model local signal intensity variations using Gaussian distributions is motivated by the WiFi positioning literature, which has shown that this approach can provide a good trade-off between radio map size and positioning accuracy [5, 16]. Another benefit of using Gaussian distributions is that they provide closed form solutions to certain equations that we use for updating and maintaining radio maps directly on the client device; see the on-device positioning section.
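For illustration, the per-pair likelihood of Eq. 1 together with the minimum standard deviation constraint can be written as a few lines of Python (a sketch with our own naming, not the actual implementation):

# Sketch of the signal intensity model (Eq. 1) with the minimum sigma constraint.
from math import exp, pi, sqrt

SIGMA_MIN = 5.0  # floor on the standard deviation, used to avoid over-fitting

def signal_likelihood(s, mu, sigma):
    """p(i | j, s): Gaussian likelihood of observing GSM cell j with signal
    intensity s in grid cell i, given the cell pair's parameters (mu, sigma)."""
    sigma = max(sigma, SIGMA_MIN)
    return exp(-((s - mu) ** 2) / (2.0 * sigma ** 2)) / sqrt(2.0 * pi * sigma ** 2)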

Position Estimation

Let s = (s_1, ..., s_k) denote a measurement that contains signal intensities for k GSM cells. The probability of observing measurement s in grid cell i is given by

p(i \mid s) = \prod_{q=1}^{k} p(i \mid j_q, s_q),   (2)

where we have made the simplifying assumption that signal strength measurements for different GSM cells are independently and identically distributed. If any of the GSM cells has not been previously observed, the corresponding probability is zero and p(i | s) also becomes zero. To support partially matching fingerprints, we substitute p(i | j, s) with a small constant (10^{-16} in the current implementation) when GSM cell j has not been observed in grid cell i. If none of the GSM cells has been observed in the grid cell, we set p(i | s) to zero. A simple way to position the client is to use Equation 2 to calculate the probability of each grid cell and to use the resulting probabilities to calculate a weighted average of the grid cells with non-zero probability.

Often we want to track the position of the client continuously. In these situations, information about previous measurements can be used to reduce estimation errors and to smooth the estimated trajectories. A popular way to implement tracking is to use particle filtering, a sequential Monte Carlo technique that uses a discrete set of weighted samples, referred to as particles, to maintain a probability distribution for the location estimate over time [6, 19]. In our case, we use a grid-based particle filter that assigns each particle to one of the grid cells. The resulting filter maintains a probability distribution over the grid cells and the final position is estimated using a weighted average of the cells with non-zero probability; see Alg. 2. The particle filter is initialized when tracking of the client is started. During initialization, we generate a set of N particles {x_z, w_z}_{z=1}^{N} and sample the initial positions of the particles using the probabilities

p(x_z = i) \sim C^{-1} p(i \mid s) + \lfloor \mathcal{N}(0, \epsilon)/d \rfloor.   (3)

Here p(i | s) is the prior probability of the grid cell i conditioned on the first signal strength measurement s, which is given by Eq. 2. The variable C = \sum_{q=1}^{M} p(i_q \mid s) is a normalizing constant that ensures the probabilities for the M grid cells sum to unity.
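To illustrate how Eq. 2, the fallback constant for unseen GSM cells, and the simple weighted-average estimate fit together, a non-tracking position estimate could be sketched as follows (the radio map layout and function names are assumptions made for the example):

# Sketch of Eq. 2 with the fallback constant and the probabilistic centroid estimate.
from math import exp, pi, sqrt

UNSEEN_PROB = 1e-16  # substitute for p(i | j, s) when GSM cell j is unseen in grid cell i
SIGMA_MIN = 5.0

def cell_probability(radio_map, cell, fingerprint):
    """p(i | s) for one grid cell. fingerprint maps GSM cell id -> signal intensity;
    radio_map maps (grid cell, GSM cell id) -> (mu, sigma)."""
    prob, seen = 1.0, False
    for gsm_id, s in fingerprint.items():
        params = radio_map.get((cell, gsm_id))
        if params is None:
            prob *= UNSEEN_PROB
            continue
        seen = True
        mu, sigma = params[0], max(params[1], SIGMA_MIN)
        prob *= exp(-((s - mu) ** 2) / (2.0 * sigma ** 2)) / sqrt(2.0 * pi * sigma ** 2)
    return prob if seen else 0.0  # no previously observed GSM cell -> zero probability

def centroid_estimate(radio_map, cells, fingerprint):
    """Weighted average of the grid cells with non-zero probability (no tracking)."""
    probs = {c: cell_probability(radio_map, c, fingerprint) for c in cells}
    total = sum(probs.values())
    if total == 0.0:
        return None
    row = sum(p * c[0] for c, p in probs.items()) / total
    col = sum(p * c[1] for c, p in probs.items()) / total
    return row, col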

Algorithm 2 EstimateLocation
Input: N, the number of particles to use; s, the current fingerprint; Particles, the current set of particles.
  ε = N/3
  Calculate p(i|s), ∀i ∈ grids                         ▹ Eq. 1 and Eq. 2
  if Particles = ∅ then                                ▹ Point estimate
    x_z ∼ C^{-1} p(i|s) + ⌊N(0, ε)/d⌋                   ▹ Sample initial set of particles using Eq. 3
  else                                                 ▹ Tracking
    x_z = x_z + ⌊N(0, ε)/d⌋                             ▹ Scatter particles using random movement
  end if
  p(x_z|s) = Σ_{q=1}^{M} p(i_q|s) exp(−‖x_z − i_q‖/2)   ▹ Calculate probability of particles using Eq. 4
  if Particles ≠ ∅ then
    x_z ∼ p(x_z|s)                                      ▹ Resample particles
  end if
  K = Σ_z p(x_z|s)
  w_z = K^{-1} p(x_z|s)                                 ▹ Calculate weights of particles and normalize
  L(t) = Σ_{z=1}^{N} w_z x_z / Σ_{z=1}^{N} w_z           ▹ Estimate location using Eq. 6

Finally, N(0, ε) is a Gaussian error term that is used to model measurement errors. The error is given in meters and we use the floor operator together with the grid size parameter to discretize the error into grid cells. The error term essentially smooths some of the probabilities p(i | s) into neighboring cells. The standard deviation ε of the error distribution controls the magnitude of the measurement error. Currently we use ε = N/3, which was selected based on experiments with a positioning scheme that does not implement tracking.

Once we have sampled the initial positions of the particles, we calculate the probability p(x_z | s) of each particle z. We assume that each grid cell with a non-zero probability contributes to the probability and that the magnitude of the contribution depends on the distance between the grid cell and the particle. Specifically, we set

p(x_z \mid s) = \sum_{q=1}^{M} p(i_q \mid s) \exp\left( \frac{-\lVert x_z - i_q \rVert}{2} \right),   (4)

where p(i_q | s) is given by Eq. 2 and \lVert x_z - i_q \rVert is the grid (or Manhattan) distance between the location of particle x_z and the grid cell i_q. Accordingly, Eq. 4 essentially distributes the probabilities of grid cells that have non-zero probability to the cell where the particle is currently located.

When tracking the client, the first step is to use a movement model to predict movement and to sample a new position for each particle. We currently use a Gaussian movement model that assumes that the probability of a particle moving to a particular grid cell depends on the distance between the current location of the particle and the location of the grid cell. More specifically, we sample the new position of the particle using

x_z \sim x_z + \lfloor \mathcal{N}(0, \epsilon)/d \rfloor.   (5)

An alternative is to use signal intensity fluctuations to estimate motion and mobility modes [8, 18]. However, accurate motion predictions typically require information from multiple GSM cells, whereas certain platforms only allow accessing information from a single GSM cell. For this reason we decided to use a simple motion model as a proof-of-concept; see the discussion section. After each particle has been assigned a new position, the next step is to use Eq. 4 to calculate a weight for each particle. Since the motion model that we use allows uniform movement in all directions, some of the particles might have moved far from the grid cells that have a non-zero probability. Consequently, they will have a relatively small weight. To make the particle distribution more consistent with the evidence, the next step is to resample the particles proportionally to their weights. Finally, once the set of particles has been resampled, the weights of the particles are normalized.

The particles maintain a probability distribution over the possible location of a mobile client. At a given time t, we can use the particle distribution to provide a point estimate L(t) of the client's position by calculating a weighted centroid of the particle distribution, i.e.,

L(t) = \frac{\sum_{z=1}^{N} w_z x_z}{\sum_{z=1}^{N} w_z}.   (6)

The resulting estimate consists of a (row, column) pair, which can be translated into a (latitude, longitude) pair using Alg. 1 and simple interpolation. The time and memory requirements of the particle filter depend on the number of particles that are used to represent the probability distribution [6]. In our case, the number of grid cells that have a non-zero probability also influences the time complexity. Currently we use 1000 particles for running the positioning on a Nokia N95 mobile phone; see the discussion section.
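The tracking step of Alg. 2 (Eqs. 3-6) can be rendered compactly in Python as in the sketch below. The sketch assumes the cell probabilities p(i|s) have already been computed (e.g., with the previous sketch), interprets the discretized noise term as being applied per grid axis, and uses ε = N/3 and d = 20 m as in the text; it is meant to illustrate the flow rather than reproduce the on-device code.

# Sketch of one step of the grid-based particle filter (Alg. 2, Eqs. 3-6).
import random
from math import exp

def pf_step(cell_probs, particles, n=1000, d=20.0):
    """cell_probs maps (row, col) -> p(i | s) for the cells with non-zero probability;
    particles is the current list of (row, col) positions, empty on the first call.
    Returns the new particle set and the (row, col) location estimate."""
    eps = n / 3.0                                           # measurement error std (meters)
    total = sum(cell_probs.values())
    if total == 0.0:
        return particles, None
    probs = {c: p / total for c, p in cell_probs.items()}   # normalize with C

    def jitter(cell):
        # Discretized Gaussian noise floor(N(0, eps) / d), applied per axis.
        return (cell[0] + int(random.gauss(0.0, eps) // d),
                cell[1] + int(random.gauss(0.0, eps) // d))

    tracking = bool(particles)
    if not tracking:                                        # initialization (Eq. 3)
        seeds = random.choices(list(probs), weights=list(probs.values()), k=n)
        particles = [jitter(c) for c in seeds]
    else:                                                   # random movement model (Eq. 5)
        particles = [jitter(x) for x in particles]

    def particle_prob(x):                                   # Eq. 4, Manhattan grid distance
        return sum(p * exp(-(abs(x[0] - c[0]) + abs(x[1] - c[1])) / 2.0)
                   for c, p in probs.items())

    weights = [particle_prob(x) for x in particles]
    if tracking:                                            # resample proportionally to weights
        particles = random.choices(particles, weights=weights, k=n)
        weights = [particle_prob(x) for x in particles]
    k_norm = sum(weights)                                   # normalize the weights
    row = sum(w * x[0] for w, x in zip(weights, particles)) / k_norm   # Eq. 6
    col = sum(w * x[1] for w, x in zip(weights, particles)) / k_norm
    return particles, (row, col)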

ON-DEVICE POSITIONING

One of the main challenges with fingerprinting algorithms is related to collecting and maintaining radio map information [8, 9]. In GSM fingerprinting, an additional challenge is related to accessing the radio map information. As the positioning algorithm should optimally work everywhere in the world, the size of the radio maps easily becomes infeasible. Accessing radio map information from a server is also infeasible due to the high power consumption of Internet connectivity [10]. In this section we discuss techniques that can be used to create, manage and update radio maps directly on the client device. We have implemented the proposed techniques, and the positioning algorithm described in the previous section, on Nokia Maemo devices (Nokia N900) and on Nokia S60 devices (N95, E61i) using Python and Python for S60.

Radio Map Storage

Our positioning infrastructure is based on a client-server architecture. Each client is responsible for maintaining a local radio map that it can use for positioning itself, and the server is responsible for storing and maintaining a global view of the radio map. The server is also responsible for aggregating radio map updates from clients and for propagating changes in radio maps to client devices. The clients store radio maps locally in a file. The file is organized using a two-level indexing scheme. The first level uses GSM cell identifiers (i.e., tuples of country code, network code, area code and cell identifier) as the index keys, whereas the second level uses grid indexes (i.e., (row, column) pairs) as the index keys. For each (grid cell, GSM cell) pair we store a parameter vector that contains the current parameters of the signal intensity distribution, i.e., the mean, the standard deviation and the number of measurements. We also store a parameter vector that contains the most recent parameter values that were deployed on the device. The latter vector is used exclusively for radio map synchronization purposes. When the client observes a GSM cell it has not seen previously, it contacts the server and downloads the available radio maps. Radio maps can also be installed manually by downloading a suitable file, or, if the client has access to GPS information, the radio map can be learned directly on the device. In addition to downloading radio maps for new GSM cells, the client periodically contacts the server to check for radio map updates. Currently the clients check for updates once per day.
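As an illustration of the two-level indexing scheme, the local radio map could be organized as nested dictionaries keyed first by the GSM cell identifier and then by the grid index, with both the live parameters and the last deployed parameters stored per entry. The field names below are illustrative, not the actual file format.

# Sketch of the two-level radio map index (field names are illustrative).
from dataclasses import dataclass
from typing import Dict, Tuple

GsmId = Tuple[int, int, int, int]   # (country code, network code, area code, cell id)
GridIdx = Tuple[int, int]           # (row, column)

@dataclass
class CellParams:
    mean: float = 0.0                                    # mu_{i,j}
    std: float = 0.0                                     # sigma_{i,j}
    count: int = 0                                       # number of measurements
    deployed: Tuple[float, float, int] = (0.0, 0.0, 0)   # last parameters synced with the server

# First level: GSM cell identifier; second level: grid index.
RadioMap = Dict[GsmId, Dict[GridIdx, CellParams]]

def lookup(radio_map: RadioMap, gsm_id: GsmId, grid_idx: GridIdx) -> CellParams:
    """Return the parameters for a (GSM cell, grid cell) pair, creating an empty entry on demand."""
    return radio_map.setdefault(gsm_id, {}).setdefault(grid_idx, CellParams())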

Updating Local Radio Maps

Signal strength measurements are inherently noisy and a large number of factors can cause variations in the measurements. For example, measurements from the same location tend to vary according to the time of day, and differences between mobile devices can cause variations in the measured signal intensities [2]. Accordingly, radio maps constructed from calibration data necessarily contain inaccuracies, and measurements collected on one type of device are not necessarily well-suited for other types of devices.

A potential way to overcome limitations in the calibration data is to allow the mobile clients to collect new signal strength measurements and to use these measurements to adapt the radio map parameters locally. In our case, the mean and variance parameters of the underlying Gaussian distributions can be estimated in an online fashion, which makes it possible to update or even collect radio maps directly on the device without the need to store signal measurements on the device. Another advantage of this approach is that the radio maps can adapt to changes in the signal environment, e.g., when new GSM cells are introduced or when the transmission power of a base station is adjusted.

Estimating radio map parameters in an online fashion requires maintaining summary statistics for each grid and GSM cell pair (i, j). Specifically, we need to store the number of signal strength measurements that have been collected for the pair (i, j), and we need to maintain time-dependent estimates of the signal strength measurements' mean \mu_{i,j}(t) and sum of squares S_{i,j}(t) values. These estimates can be calculated using [1]:

\mu_{i,j}(t + 1) = \mu_{i,j}(t) + \alpha \left( s_{i,j} - \mu_{i,j}(t) \right)   (7)

S_{i,j}(t + 1) = S_{i,j}(t) + \beta \left[ s_{i,j} - \mu_{i,j}(t + 1) \right]^{2}.   (8)

Here \alpha = \max\{1/t, c\} and \beta = (t - 1)/t are step size parameters, and c is a predefined constant that controls the minimum rate of updating. Currently we use c = 0.01. The standard deviation of the signal strength measurements can be derived from Eq. 8 using:

\sigma_{i,j}(t) = \max\left\{ 5, \sqrt{\frac{S_{i,j}(t)}{t - 1}} \right\}.   (9)

Similarly to the training phase, in Eq. 9 we have enforced a minimum constraint on the standard deviation to avoid overfitting. When the deployed radio map contains parameter values for (i, j), we use the existing values to initialize the estimators (i.e., \mu_{i,j}(0) = \mu_{i,j} and S_{i,j}(0) = (n_{i,j} - 1)\sigma_{i,j}^{2}). Otherwise, the estimators are initialized to zero.

Updating radio maps locally requires the client to have access to valid GPS measurements. However, integrated GPS receivers tend to have high power consumption, which means that regular collection of measurements on the client device is not feasible. Determining when to collect GPS measurements is out of scope for this paper; instead, we assume that the radio map updates are carried out opportunistically whenever valid GPS measurements are available.
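The online update of Eqs. 7-9 amounts to a few arithmetic operations per measurement, as in the following sketch (interpreting t as the number of measurements after the update; the function name and calling convention are ours):

# Sketch of the online parameter update (Eqs. 7-9).
from math import sqrt

C_MIN = 0.01      # minimum update rate c
SIGMA_MIN = 5.0   # minimum standard deviation

def update_params(mu, s_sq, count, measurement):
    """One online update of the running mean mu and sum of squares s_sq for a
    (grid cell, GSM cell) pair; count is the number of measurements seen so far."""
    t = count + 1
    alpha = max(1.0 / t, C_MIN)
    beta = (t - 1.0) / t
    mu_new = mu + alpha * (measurement - mu)                 # Eq. 7
    s_sq_new = s_sq + beta * (measurement - mu_new) ** 2     # Eq. 8
    sigma = SIGMA_MIN if t < 2 else max(SIGMA_MIN, sqrt(s_sq_new / (t - 1.0)))  # Eq. 9
    return mu_new, s_sq_new, t, sigma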

Synchronizing Radio Maps

Local updates to the radio map parameters can cause the signal intensity distributions to diverge from the distributions originally deployed on the device. When a local distribution has diverged significantly, the new parameters should be propagated to other devices. Instead of sending updates to the server after each local update, we use a statistical criterion to monitor for changes in the signal intensity distribution and trigger an update when the changes are sufficiently significant.

We measure the difference in the signal intensity distribution using the joint Kullback-Leibler (KL) divergence. Let p and q denote arbitrary probability distributions; the KL-divergence between p and q is given by:

D_{KL}(p \,\|\, q) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q(x)} \, dx.   (10)

The joint KL-divergence is defined as the sum of the KL-divergence between p and q and the KL-divergence between q and p, i.e.,

D(p \,\|\, q) = D_{KL}(p \,\|\, q) + D_{KL}(q \,\|\, p).   (11)

When both distributions p and q are Gaussian, the joint KL-divergence can be evaluated using:

D(p \,\|\, q) = \frac{\sigma_p^2}{\sigma_q^2} + \frac{\sigma_q^2}{\sigma_p^2} + (\mu_p - \mu_q)^2 \left( \frac{1}{\sigma_p^2} + \frac{1}{\sigma_q^2} \right) - 2.   (12)

We send an update with the new parameter values to the server whenever the value of the KL-divergence exceeds a predefined threshold λ. In the current implementation we use λ = 3. We also require that at least 10 new measurements have been obtained for a given set of parameters before parameter updates are sent to the server. This requirement ensures that the variance of the parameter estimates is sufficiently small for evaluating the significance of the changes in the signal intensity distribution. Requiring a minimum number of calibration measurements also prevents excessive data transmissions and helps to improve battery life.

Multiple clients can send updates to the server regarding the same set of parameter values. Consequently, the server needs a way to integrate the different updates in a consistent manner. Whenever the server receives a parameter update, we aggregate the parameters using a weighted average where the weights correspond to the differences in sample counts. Specifically, let N^0 denote the number of samples that were used to calculate the first estimate stored on the client device and let N^C denote the number of samples when the parameter update was triggered. Respectively, let N^S denote the number of samples stored on the server side. As the weight for the client side parameters we use W_C = N^C - N^0, and we use W_S = N^S - N^0 as the weight for the server side parameters. For a given pair (i, j), the new parameters are then given by:

\mu^{new} = \frac{W_S \mu^S + W_C \mu^C}{W_S + W_C}   (13)

\sigma^{new} = \frac{W_S \sigma^S + W_C \sigma^C}{W_S + W_C}   (14)

N^{new} = N^S + N^C - N^0,   (15)

where the indexes (i, j) have been omitted for brevity. The updated parameter values are stored on the server side and they are also sent back to the client that triggered the update. In addition to using the parameter updates for enforcing consistency in parameter values, the client devices periodically query the server for the most recent parameter values. If the server returns new parameter values, the client aggregates the new values with its local updates similarly to the server, i.e., using a weighted average of the different parameter values.
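The synchronization logic can be summarized with a short sketch: the joint KL-divergence of Eq. 12 decides when a client pushes its parameters (λ = 3 and at least 10 new measurements), and Eqs. 13-15 describe how the server merges a client update with its own state. The function names are ours and the transport layer is omitted.

# Sketch of the radio map synchronization checks (Eqs. 12-15).

KL_THRESHOLD = 3.0        # lambda
MIN_NEW_MEASUREMENTS = 10

def joint_kl(mu_p, sigma_p, mu_q, sigma_q):
    """Joint (symmetric) KL-divergence between two Gaussians (Eq. 12)."""
    return (sigma_p**2 / sigma_q**2 + sigma_q**2 / sigma_p**2
            + (mu_p - mu_q) ** 2 * (1.0 / sigma_p**2 + 1.0 / sigma_q**2) - 2.0)

def should_sync(mu_local, sigma_local, mu_deployed, sigma_deployed, new_measurements):
    """Trigger a server update when the local distribution has drifted enough."""
    if new_measurements < MIN_NEW_MEASUREMENTS:
        return False
    return joint_kl(mu_local, sigma_local, mu_deployed, sigma_deployed) > KL_THRESHOLD

def merge_on_server(mu_s, sigma_s, n_s, mu_c, sigma_c, n_c, n_0):
    """Server-side aggregation of a client update (Eqs. 13-15); n_0 is the sample
    count when the client last received the parameters."""
    w_s, w_c = n_s - n_0, n_c - n_0
    mu_new = (w_s * mu_s + w_c * mu_c) / (w_s + w_c)
    sigma_new = (w_s * sigma_s + w_c * sigma_c) / (w_s + w_c)
    n_new = n_s + n_c - n_0
    return mu_new, sigma_new, n_new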

EVALUATION

We have evaluated the proposed algorithm using a combination of offline experiments and real world tests. The offline experiments focus on characterizing the accuracy of the positioning algorithm, whereas the real world experiments focus on evaluating the effects of the on-device radio map management and on understanding how the positioning error evolves over time.

Offline Evaluation

To characterize the positioning accuracy of our algorithm, we have conducted a series of offline experiments using data that was collected from two residential neighborhoods in Helsinki, Finland, a city with around 600,000 inhabitants. The neighborhoods were selected to have different topological characteristics and GSM cell densities. Specifically, we selected a residential area with high traffic and cell density and another residential area with lower traffic and cell density. Data was mainly collected by driving around the streets of the selected areas. In a few cases we had to collect data by walking a particular street segment. During data collection, three phones were used to collect calibration data and one phone was simultaneously collecting test data. Each phone was running the BeTelGeuse mobile platform [10], which was configured to log GSM data every 10 seconds and to stream GPS data continuously. The GSM data consisted of the cell identifier of the current cell tower and the observed signal strength. The data was stored to a memory card on the phone and later uploaded to a spatial database. In the upload phase we removed measurements that had no GSM signal or inaccurate GPS information (fewer than four satellites or an HDOP value greater than six).

Table 1 shows the positioning accuracy of the grid-based positioning algorithm for the two different neighborhoods. The table also contains the positioning accuracy of the algorithm without the particle filter, and we have included the accuracy of a k-nearest fingerprinting algorithm [2]. Contrary to our algorithm, the k-nearest algorithm does not aggregate signal intensity measurements, but includes all signal intensity measurements in the radio map. The location of the client is then estimated as a weighted average of the k best-matching fingerprints in the radio map. As the weights we have used the inverse of the signal strength distance between a test point and a point stored as part of the radio map.

(Our algorithm is not restricted to using information from a single GSM cell, but technical limitations prevent us from considering multiple cells in the experiments. Specifically, Nokia S60 and N900 devices only allow accessing the identifier and signal strength of the current cell tower. Considering more than one cell typically improves positioning accuracy [20], which means our accuracy results are somewhat conservative.)

Even though we have used only a single GSM cell for estimating the location of the client, the median error of the grid-based algorithm is within 150 meters and most of the time the error is within 500 meters of the actual location. The use of particle filtering slightly decreases the average case positioning error; more importantly, it makes the algorithm more robust to errors and provides a better worst case error. The k-nearest algorithm has slightly better average case performance, which is simply due to the fact that the k-nearest algorithm uses individual points for calculating the location estimates, whereas we use discrete patches that cover a larger area. The main advantages of our method over the k-nearest algorithm are that our algorithm requires little storage space and that it has good generalization performance, while at the same time maintaining comparable positioning accuracy.

Sensitivity to Calibration Data Density

As part of the offline evaluation we have investigated the trade-off between the spatial density of the calibration trace and the positioning error. We conducted the experiments by simulating a sparser calibration trace where points were selectively removed at regular intervals. Specifically, we considered calibration traces that contained only 25%, 50% or 75% of the measurements and compared the resulting positioning accuracy against using the entire calibration trace. This procedure essentially corresponds to simulating devices with slower scanning rates. Tables 2 and 3 characterize the positioning error of the algorithm when the amount of calibration data is varied. The results are relatively stable and the overall differences in the positioning accuracies are relatively small. Most fluctuations can be associated with random effects that are caused by the random movement of particles in the positioning algorithm and by the selection of points that are removed from the calibration data. The results thus suggest that another benefit of the grid representation is that it makes it possible to achieve good positioning accuracy even with relatively sparse calibration data.

Field Experiment: Evolution of Positioning Error

In addition to the offline experiments, we have conducted a field experiment that focused on characterizing the evolution of the positioning error over time. In the field experiment, five users were given an N900 mobile phone and a Bluetooth GPS receiver. The mobile phone was running the on-device positioning algorithm with an empty radio map. The server's radio map was also cleared before the experiment was started. During the experiment the GPS receiver was polled every minute. If the GPS measurement was valid, the current radio map was used to estimate the position of the client and we calculated the positioning error by comparing the estimate to the corresponding GPS measurement. The GPS measurement was then used to update the radio map parameters in the manner described previously.

             High Density Residential Area          Low Density Residential Area
Percentile   50%      90%      99%      100%        50%      90%      99%      100%
Grid + PF    139.68   399.21   474.19   480.46      107.63   328.58   705.45   768.98
Grid         174.91   366.40   692.49   709.99      116.31   312.19   516.72   795.74
Full Idx     112.21   306.29   518.48   795.74      116.61   312.33   516.77   795.74

Table 1. Positioning accuracy (error percentiles in meters) of the grid-based algorithm with and without the particle filter, and of the k-nearest fingerprinting algorithm that uses the full radio map (Full Idx).

             % of calibration data
Percentile   25%       50%       75%       100%
0            7.22      5.16      7.42      6.45
25           71.86     52.96     63.12     78.29
50           155.50    107.61    149.02    139.68
75           272.51    192.35    261.78    270.78
90           408.28    257.62    403.18    399.21
95           440.87    402.04    435.42    417.61
99           536.75    464.69    460.30    474.19
100          545.55    579.37    465.27    480.46

Table 2. Variation of percentile error (in meters) with the amount of calibration data for the residential area with high cell density (Dataset 1).

             % of calibration data
Percentile   25%       50%       75%       100%
0            5.69      6.63      8.68      3.71
25           33.46     29.67     32.79     38.66
50           142.69    122.62    107.61    107.63
75           264.78    235.90    230.22    208.69
90           381.17    361.41    325.75    328.58
95           474.17    452.23    450.31    478.71
99           599.08    694.78    692.20    705.45
100          742.12    767.7     769.11    768.98

Table 3. Variation of percentile error (in meters) with the amount of calibration data for the residential area with low cell density (Dataset 2).


Figure 2. Evolution of positioning error over time.

If the radio map parameters had changed significantly, the client synchronized the radio map parameters with the server. We initially conducted a ten day experiment, during which 283 location updates were sent to the server. These updates were related to 173 different (GSM cell, grid cell) pairs, i.e., on average there were 1.6 updates per pair. We have since continued collecting data and the results also include measurements from our later experiments.

Fig. 2 characterizes the evolution of the positioning error over time. The figure has been constructed by calculating the different percentiles for the five participants from all data collected until that point in time and averaging the resulting values over the participants. When positioning is started for the first time, the position estimates are biased towards the location where the user is and the resulting errors are relatively small. Each time one of the participants travels to a new area, the positioning errors increase. As more data is collected from these areas, the error values start to decrease until one of the participants again travels to a new area. As an example, the large increase in the 90 and 95 percentile results at around 400 hours is caused by one of the participants traveling by train to another city.

The errors in the field evaluation differ from the offline evaluation. First, the median errors are somewhat smaller, which is due to the fact that the local radio maps contain many measurements that were collected by the users themselves. As a consequence, the calibration traces are necessarily biased towards the trajectories along which the users move and the resulting median errors are small. The 90 and 95 percentile errors, on the other hand, are significantly larger, which is a consequence of starting the experiment with empty radio maps. The field experiment also contains measurements from areas with poor GSM coverage, which significantly increases these errors.

Cross-Device Generalization Performance

GSM signal measurements vary across devices, which means that measurements collected on one type of device cannot necessarily be used to position another type of device. Results from an earlier evaluation of GSM positioning algorithms by Chen et al. [2] have indicated that the accuracy of GSM fingerprinting can decrease by up to 300% when the test data and training data are collected on different devices. It is therefore important to evaluate how well our algorithm performs when the training and test data are collected on different devices. To evaluate the effect of collecting training data on one device and testing on another device, we repeated the offline experiments using the data collected in the field experiments as the calibration trace.

Percentile                       0       50       90       95       99       100
Cross-Device                     7.60    189.90   322.40   460.64   547.20   548.96
Same Device                      6.45    139.68   399.21   417.61   474.19   480.46
Cross-Device / Same Device (%)   118%    136%     81%      110%     115%     114%

Table 4. Cross-device positioning accuracy (error percentiles in meters) and comparison to using the same device for collecting the testing and training data.

The two datasets were collected almost one year apart and using different kinds of devices: the training data was collected using Nokia N900 devices and the test data was collected using Nokia E61i devices. To verify that the two datasets contain significant signal intensity variations, we compared the mean signal intensity values for each (GSM cell, grid cell) pair that appeared in both datasets. The average difference in the mean signal intensity values was 9.75 dBm and the standard deviation of the differences was 5.81 dBm. A Wilcoxon signed rank test was used to verify that the differences in the signal intensity values were indeed statistically significant (z = 3.8; p < .01).

Table 4 shows the positioning error when training and testing data are collected on different devices. The table also compares the positioning accuracy to the case where the same device is used to collect both training and test data. The degradation in positioning accuracy in the cross-device case is relatively small and most of the time the difference is within 15%. The worst case positioning error is somewhat higher in the cross-device case, which is mainly due to the fact that the test data contained some measurements for areas that were not in the training data. Consequently, the results suggest that our algorithm has sufficient generalization performance for using radio maps created on one type of device to estimate the position of another type of device.

Radio Map Storage Size

As the final evaluation step we investigated the storage requirements of the algorithm. The storage space mainly depends on the number of (GSM cell, grid cell) pairs that the client has seen, and the space requirements grow linearly with the number of pairs that have been observed. The additional storage costs are related to the indexing scheme, which requires extra storage if a GSM cell or a grid cell has not been seen previously. The radio maps of the persons who participated in the field trial were between 7.75 kilobytes and 33.37 kilobytes. The smallest radio map contained measurements for 20 GSM cells and 108 grid cells, whereas the largest radio map contained measurements for 97 GSM cells and 508 grid cells. The largest radio map contains measurements from an area that covers approximately 80 km². A radio map that stores measurements for at least one GSM cell in each grid cell of this area would require a minimum of 4.5 MB of storage space. However, since we only store measurements from areas where the user has moved, the resulting radio map remains compact. As the field experiments were conducted during everyday situations, the results suggest that storing radio maps that cover a person's everyday activities has negligible storage requirements and that the storage requirements of the algorithm are acceptable for practical purposes. In comparison, for the k-nearest algorithm a radio map containing all measurements that the participants collected requires 215.78 kilobytes, whereas storing a radio map that covers all GSM and grid cells that the participants observed requires only 64.9 kilobytes of storage, i.e., around 70% less space.

DISCUSSION AND FUTURE WORK

One of the main limitations of our algorithm relates to the number of particles that we use for position estimation. Currently we use 1000 particles, as this is the maximum number we can use while achieving near real-time performance. The running time of the algorithm mainly depends on the number of particles, although the number of grid cells that are found for a particular GSM cell also impacts the running time. By further optimizing the code, it should be possible to run the algorithm in real-time with around 5000 particles on Nokia N900 devices [6]. However, offline evaluations using a higher number of particles have not resulted in significant improvements in positioning accuracy. Another possibility is to dynamically adjust the number of particles, e.g., using KLD adaptation. Exploring alternative particle filtering techniques is part of our future work.

Another limitation of our algorithm is related to the motion model that we use in the particle filtering phase. The random movement model was selected as a proof-of-concept and we believe that the accuracy of the algorithm can be improved by employing more detailed motion models. This view is also supported by the results, as a detailed analysis of positioning errors revealed that the largest errors resulted from periods where the user was moving by car, train or bus. Improving the motion model is part of our future work.

ACKNOWLEDGMENTS

The work was supported in part by the ICT program of the European Community, under the PASCAL2 network of excellence, ICT-216886-PASCAL2. The publication only reflects the authors' views. The authors are grateful to Wray Buntine for help with the particle filter, and to Taneli Vähäkangas, Yiyun Shen, Taru Itäpelto and Mikael Andersson for helping with the data collection and the field evaluation.

REFERENCES

1. T. F. Chan, G. H. Golub, and R. J. LeVeque. Algorithms for computing the sample variance: Analysis and recommendations. The American Statistician, 37(3):242-247, 1983.
2. M. Y. Chen, T. Sohn, D. Chmelev, D. Hähnel, J. Hightower, J. Hughes, A. LaMarca, F. Potter, I. E. Smith, and A. Varshavsky. Practical metropolitan-scale positioning for GSM phones. In Proceedings of the 8th International Conference on Ubiquitous Computing (UbiComp), volume 4206 of Lecture Notes in Computer Science, pages 225-242. Springer, 2006.
3. C. Drane, M. Macnaughtan, and C. Scott. Positioning GSM telephones. IEEE Communications Magazine, 36(4):46-54, 59, 1998.
4. B. Ferris, D. Hähnel, and D. Fox. Gaussian processes for signal strength-based location estimation. In Robotics: Science and Systems. The MIT Press, 2006.
5. A. Haeberlen, E. Flannery, A. M. Ladd, A. Rudys, D. S. Wallach, and L. E. Kavraki. Practical robust localization over large-scale 802.11 wireless networks. In Proceedings of the 10th Annual International Conference on Mobile Computing and Networking (MobiCom), pages 70-84. ACM, 2004.
6. J. Hightower and G. Borriello. Particle filters for location estimation in ubiquitous computing: A case study. In Proceedings of the 6th International Conference on Ubiquitous Computing (UbiComp), pages 88-106. Springer, 2006.
7. Juniper Research. Mobile location based services: Applications, forecasts & opportunities 2009-2014. Research report, March 2010.
8. J. Krumm and E. Horvitz. LOCADIO: Inferring motion and location from Wi-Fi signal strengths. In Proceedings of the 1st International Conference on Mobile and Ubiquitous Systems (MobiQuitous), pages 4-14. IEEE, 2004.
9. J. Krumm and J. Platt. Minimizing calibration effort for an indoor 802.11 device location measurement system. Technical Report MSR-TR-2003-82, Microsoft Research, Seattle, WA, 2003.
10. J. Kukkonen, E. Lagerspetz, P. Nurmi, and M. Andersson. BeTelGeuse: A platform for gathering and processing situational data. IEEE Pervasive Computing, 8(2):49-56, 2009.
11. H. Laitinen, J. Lähteenmäki, and T. Nordström. Database correlation method for GSM location. In Proceedings of the 53rd IEEE Vehicular Technology Conference (VTC). IEEE, 2001.
12. A. LaMarca, Y. Chawathe, S. Consolvo, J. Hightower, I. E. Smith, J. Scott, T. Sohn, J. Howard, J. Hughes, F. Potter, J. Tabert, P. Powledge, G. Borriello, and B. N. Schilit. Place Lab: Device positioning using radio beacons in the wild. In Proceedings of the 3rd International Conference on Pervasive Computing (PERVASIVE), volume 3468, pages 116-133. Springer, 2005.
13. L. Liao, D. Fox, and H. Kautz. Extracting places and activities from GPS traces using hierarchical conditional random fields. International Journal of Robotics Research, 26(1):119-134, 2007.
14. V. Otsason, A. Varshavsky, A. LaMarca, and E. de Lara. Accurate GSM indoor localization. In Proceedings of the 7th International Conference on Ubiquitous Computing (UbiComp), volume 3660 of Lecture Notes in Computer Science, pages 141-158. Springer, 2005.
15. T. Roos, P. Myllymäki, and H. Tirri. A statistical modeling approach to location estimation. IEEE Transactions on Mobile Computing, 1(1):59-69, 2002.
16. T. Roos, P. Myllymäki, H. Tirri, P. Misikangas, and J. Sievänen. A probabilistic approach to WLAN user location estimation. International Journal of Wireless Information Networks, 9(3):155-164, 2002.
17. A. Schwaighofer, M. Grigoras, V. Tresp, and C. Hoffmann. GPPS: A Gaussian process positioning system for cellular networks. In Advances in Neural Information Processing Systems 16. MIT Press, 2003.
18. T. Sohn, A. Varshavsky, A. LaMarca, M. Y. Chen, T. Choudhury, I. Smith, S. Consolvo, J. Hightower, W. G. Griswold, and E. de Lara. Mobility detection using everyday GSM traces. In Proceedings of the 8th International Conference on Ubiquitous Computing (UbiComp), pages 212-224, 2006.
19. S. Thrun, D. Fox, W. Burgard, and F. Dellaert. Robust Monte Carlo localization for mobile robots. Artificial Intelligence, 128:99-141, 2001.
20. A. Varshavsky, E. de Lara, J. Hightower, A. LaMarca, and V. Otsason. GSM indoor localization. Pervasive and Mobile Computing, 3:698-720, 2007.
21. A. Varshavsky, A. LaMarca, J. Hightower, and E. de Lara. The SkyLoc floor localization system. In Proceedings of the 5th Annual IEEE International Conference on Pervasive Computing and Communications (PerCom), pages 125-134. IEEE, 2007.
22. T. Vincenty. Direct and inverse solutions of geodesics on the ellipsoid with application of nested equations. Survey Review, 23(176):88-93, 1975.
23. Z. Wu, C. Li, J. K.-Y. Ng, and K. R. Leung. Location estimation via support vector regression. IEEE Transactions on Mobile Computing, 6(3):311-321, 2007.