Dwelling in the canyons: Dwelling detection in Urban Environments Using GPS, Wi-Fi, and Geolocation

Niels Brouwers a,∗, Matthias Woehrle a

a Embedded Software Group, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands

Abstract

A fundamental part of studying human mobility is to detect dwelling. When we dwell we are not necessarily stationary, but move around in a confined area. Most of our significant places are indoors, which hampers detection using GPS. In this work, we discuss three different sensor sources used for dwelling detection in urban environments: GPS, Wi-Fi and geolocation. Our study is based on data collected on mobile phones in cities of various sizes in four European countries. Based on this data, we compare several methods (i) for classifying whether a user was dwelling and (ii) for determining dwelling locations.

Keywords: Mobile phones, dwelling, urban environment

∗ Corresponding author. Email addresses: [email protected] (Niels Brouwers), [email protected] (Matthias Woehrle)

Preprint submitted to Elsevier, December 14, 2011

1. Introduction

Understanding human mobility in urban environments is crucial for many application areas such as traffic prediction, city planning, and determining social interactions. Human mobility has therefore been widely studied empirically in the social sciences, e.g., [1, 2]. Understanding mobility has two components: (i) understanding how we move, i.e., determining transportation modes [3], and (ii) understanding where we stop, i.e., determining the (important) points of interest (POIs) in our lives [2]. To determine whether we stop at a POI, we need to distinguish whether we are dwelling at a location, e.g., at home, in a local park or at a supermarket, or whether we are mobile. Dwelling means that the user is in a locally constrained environment, yet not necessarily still or stationary. The focus of this paper is to determine whether users are dwelling based on traces collected on their mobile phones.

In previous work, empirical data has often been very coarse-grained, e.g., GSM cell tower information [4] or merely cellphone call data [1, 2]. However, modern smartphones provide a wealth of sensor data including GPS and Wi-Fi connectivity. Social sciences can benefit from these additional “sensors” by increasing the fidelity of models of human mobility. As with any sensor technology, the measured information is subject to noise, uncertainty and availability issues. Additionally, sensing, e.g., using the GPS chip, consumes energy. This may negatively impact user experience by draining the battery; hence, the use of sensors needs to be carefully examined. Moreover, the sensing rate is a major factor in power consumption.

Various sensor modalities have been employed for distinguishing between dwelling and mobility, most notably accelerometers [5, 3], GPS [3], and signal strength readings from Wi-Fi and GSM cell towers [4, 6, 7]. However, these works have rarely considered geolocation services as a sensor for user location. Geolocation services provide a location estimate based on scanned Wi-Fi fingerprints. While geolocation is not available everywhere, it is a prime sensor candidate in urban regions where access points (APs) are densely deployed [8].

This work presents a comparative study of three different sensors and their quality with respect to determining whether a user was dwelling, and where. The sensors we consider are GPS, the “raw” Wi-Fi scan information about surrounding APs, and a geolocation service based on Wi-Fi data. The contributions of this article can be summarized as follows:

1. We compare data from three sensors for determining whether users were mobile or dwelling, and the corresponding POIs, based on off-line analysis of mobile phone measurement traces.
2. We survey 4 different methods for classification and 14 POI extraction strategies and study their relative performance on sensor data collected in urban environments in four European countries.
3. We study the effects of subsampling the traces in order to investigate the effect of sensing rates on detection performance.
4. We present idiosyncrasies of the sensor types and show that the different sensors can also be used in a complementary fashion.

In the following, Section 2 relates our work to previous work. Section 3 discusses the data we collected and details the sensors and the corresponding features that we utilize. In Section 4 we present an evaluation of dwelling and POI detection on the collected data. We conclude in Section 5.

2. Related Work

Several researchers have proposed systems to detect mobility by monitoring the signal strength of beacons received from fixed network infrastructure such as GSM cell towers and Wi-Fi APs. As a user moves around, the set of Wi-Fi APs her mobile phone can overhear and the received signal strength (RSS) of the corresponding beacons from these stations change over time. Moreover, the RSS of individual Wi-Fi APs tends to fluctuate more when the receiving device is in motion.


Sohn et al. [4] apply these principles to GSM cell tower information and propose a classifier based on seven features. The classifier distinguishes between three mobility states (stationary, walking, and driving) and achieves an overall accuracy of 85%. Muthukrishnan et al. [6] show how to determine whether a user is in motion using similar features, but computed on information from Wi-Fi scans. Mun et al. [7] use a combination of GSM and Wi-Fi to differentiate between the same three states as Sohn et al. [4], achieving a classification rate of 88%.

Other sensors commonly found on mobile phones are accelerometers and GPS. These have been used by Reddy et al. [3] to build a classifier with an average accuracy of 93.6%. Using speed as indicated by the GPS receiver in combination with features extracted from the accelerometer data, the proposed classifier can additionally recognize biking and running.

In this paper we classify dwelling using simple decision tree models based on features extracted from GPS, Wi-Fi, and geolocation information. However, temporal modeling may be beneficial, e.g., using hidden Markov models [3] or conditional random fields [9]. Therefore, we also investigate several hidden Markov models (similar to [3]) in Sec. 4.1.

Users move from one location to another; at many locations they spend a considerable amount of time. Extracting these significant locations, i.e., POIs, can be done by analyzing time-annotated location traces. For example, the algorithm of Ashbrook et al. [10] uses the fact that the GPS signal is lost indoors and detects these cut-off points in the trace. Clustering is then used to gather such points into POIs. However, relying on GPS signal loss will miss many important landmarks, such as outdoor locations and indoor locations where GPS is still available, and will generate false positives in urban canyons.

Other approaches have focused on finding spatially and temporally constrained clusters in GPS traces [11, 12], but these assume clusters to be circular point clouds. Both DBSCAN [13] and DJ-Cluster [14] are clustering algorithms that do not assume point clouds to be circular. Finally, mean-shift [15] is a mode-seeking algorithm that is used for clustering. We implemented all of these algorithms and evaluated them on our sensor data using different strategies.

Nurmi et al. [16] explore statistical methods, in particular Markov chain Monte Carlo, for detecting spatial clusters based on location information. Clusters are modeled as multivariate normal distributions. The described approach suffers from false positives, as every data point, whether it is part of a location or stems from travel between locations, must be part of a cluster. We explore a similar approach with mean-shift clustering, which also searches for the modes of a distribution.

Finally, SensLoc [17] integrates motion and place detection using accelerometers, Wi-Fi access point scanning, and GPS to deliver a highly accurate system for location recognition and path tracking. However, Wi-Fi scan results are used only for recognizing previously visited POIs and detecting entrance and departure events, while localization relies entirely on GPS. In contrast, we perform a comparative study of different sensor sources, features and algorithms for the actual localization of POIs.
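To make the density-based clustering mentioned above more concrete, the following is a minimal sketch of the textbook DBSCAN procedure. It is not the implementation evaluated in this paper: the function names, the use of planar coordinates in metres, and the parameter values in the usage note are assumptions made purely for illustration.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    xi, yi = points[i]
    return [j for j, (x, y) in enumerate(points)
            if math.hypot(x - xi, y - yi) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    points: list of (x, y) tuples in a planar coordinate system
            (assumed here to be metres after a map projection).
    """
    NOISE = -1
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not core
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep growing
    return labels
```

With two tight groups of points and one isolated point, for example `dbscan(points, eps=1.5, min_pts=3)`, the two groups receive distinct cluster ids and the isolated point is labelled as noise. Because a cluster grows through chains of neighbouring core points, its shape is not restricted to a circle, which is the property the text refers to.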


3. Approach

We collected an extensive data set from the mobile phones of seven users. We first describe the sensor sources that we sampled and then characterize the data set.

3.1. Sensors

We consider three different sensors: GPS, Wi-Fi and geolocation. We extract several features from each sensor source, which we describe in the following.

3.1.1. GPS

Most, if not all, modern smartphones come equipped with GPS sensors. These provide accurate measurements of both position and speed in outdoor locations, but signal quality is reduced or completely lost in indoor environments. Moreover, phone users tend to keep GPS turned off when not in use to avoid battery drain. When the GPS signal is available, however, it tends to be a very good candidate for differentiating between dwelling and mobility [3]. We extract the following GPS features: (i) the measured speed provided directly by the GPS, (ii) the speed calculated from the distance between GPS locations, (iii) the difference between calculated and measured speed, (iv) a boolean that indicates whether the GPS had a fix, (v) the number of GPS satellites available for a given measurement, and (vi) the number of location samples around the current location within a specific radius r.

3.1.2. Wi-Fi

Continuous scanning for Wi-Fi APs has been used in context-aware computing to detect user mobility. This method is attractive because it can be performed on-line and in real time, both desirable qualities for this class of applications. Wi-Fi scan results, also called fingerprints, consist of a list of APs and their corresponding RSS, where signal strength is measured in dBm. Two commonly used functions for measuring the similarity between two fingerprint vectors $\vec{f}_1, \vec{f}_2$ are the cosine similarity C and the Tanimoto coefficient T shown below:

\[ C(\vec{f}_1, \vec{f}_2) = \frac{\vec{f}_1 \cdot \vec{f}_2}{\|\vec{f}_1\| \, \|\vec{f}_2\|} \tag{1} \]

\[ T(\vec{f}_1, \vec{f}_2) = \frac{\vec{f}_1 \cdot \vec{f}_2}{\|\vec{f}_1\|^2 + \|\vec{f}_2\|^2 - \vec{f}_1 \cdot \vec{f}_2} \tag{2} \]

In order to use these measures for fingerprints, we need to map raw RSS values to a relative measure $s \in [0, 1]$ called the relative strength value. It is computed as

\[ s = \frac{RSS - RSS_{min}}{RSS_{max} - RSS_{min}}, \]

where $RSS_{min}$ and $RSS_{max}$ are the lower and upper signal strength bounds. Since $\|\vec{f}\|^2 = \vec{f} \cdot \vec{f}$, we only need to define the dot product of two fingerprints to be able to use equations (1) and (2). We calculate this product by multiplying the relative strength values of the Wi-Fi APs the two fingerprints have in common, and taking the sum. Another way of measuring the strength of an AP is its response rate [18], i.e., the fraction expressing how often a given AP was found in a given time window.

Several classification features have been discussed in the literature, of which we have selected the following Wi-Fi features: (i) the Euclidean distance of relative signal strength values, (ii) the number of Wi-Fi APs in the fingerprint (scan result), (iii) the Jaccard index as a measure of similarity between consecutive fingerprints, (iv) the Tanimoto coefficient and (v) the cosine similarity applied to signal strength, (vi) the sum of squares of differences in AP response rate, and (vii) the Tanimoto coefficient applied to AP response rate.

In the context of clustering, it is important to note that the mean of a set of fingerprints cannot always be mapped back onto a geographical location, and therefore the operation does not have a spatial meaning. Consider a set of two fingerprints, each with a single entry for a different access point with a relative strength value of s = 1. The mean of this hypothetical set would contain both access points with a signal strength of s = 0.5 each. However, if these APs are several kilometers apart, no location on earth exists where such a fingerprint could be measured. In other words, the set of measurable fingerprints is not closed under the mean operation.

3.1.3. Geolocation

An alternative to using Wi-Fi scan results directly is to pass them to a localization service such as Google’s geolocation API [19] or Skyhook Wireless’ localization service [20]. These services use large databases of location-annotated Wi-Fi fingerprints to compute a user location based on Wi-Fi scan results. In this way the Wi-Fi chip can act as a “poor man’s” GPS, providing estimates of user location.
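Returning briefly to the fingerprint similarity measures of Section 3.1.2, the sketch below shows one way equations (1) and (2), the relative strength mapping, and the Jaccard index could be computed. It is an illustration only: the dBm bounds, the clamping of out-of-range values, the dictionary representation of a scan, and all function names are assumptions and are not taken from the paper.

```python
import math

# Assumed signal strength bounds in dBm (illustrative, not from the paper).
RSS_MIN, RSS_MAX = -100.0, -30.0


def relative_strength(rss, rss_min=RSS_MIN, rss_max=RSS_MAX):
    """Map a raw RSS value (dBm) to s in [0, 1]; out-of-range values are clamped."""
    s = (rss - rss_min) / (rss_max - rss_min)
    return min(1.0, max(0.0, s))


def to_relative(scan):
    """Convert a scan {ap_id: rss_dbm} into a fingerprint {ap_id: s}."""
    return {ap: relative_strength(rss) for ap, rss in scan.items()}


def dot(f1, f2):
    """Sum of products of relative strengths over the APs both fingerprints share."""
    return sum(s * f2[ap] for ap, s in f1.items() if ap in f2)


def cosine(f1, f2):
    """Equation (1), using ||f||^2 = f . f."""
    denom = math.sqrt(dot(f1, f1) * dot(f2, f2))
    return dot(f1, f2) / denom if denom > 0 else 0.0


def tanimoto(f1, f2):
    """Equation (2)."""
    d = dot(f1, f2)
    denom = dot(f1, f1) + dot(f2, f2) - d
    return d / denom if denom > 0 else 0.0


def jaccard(f1, f2):
    """Overlap of the two AP sets, ignoring signal strength."""
    a, b = set(f1), set(f2)
    return len(a & b) / len(a | b) if (a | b) else 0.0


a = to_relative({"ap1": -30, "ap2": -65})
b = to_relative({"ap2": -65, "ap3": -100})
print(cosine(a, b), tanimoto(a, b), jaccard(a, b))
```

Two scans that share no AP yield a dot product of zero, so both similarity measures fall to zero regardless of how strong the individual signals are.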
Google trains its database using a background service built into Android devices that reports GPS coordinates and Wi-Fi scan results to its servers at regular intervals. Our results indicate that Google’s geolocation provides good accuracy and broad coverage. Because of this and the open nature of the API, we chose this service for our experiments. We extract the following geolocation features: (i) the speed calculated from location distances, and (ii) the number of location samples around the current location within a specific radius $r_{loc}$. Note that these features correspond to those based on GPS locations.

3.1.4. Feature extraction

We extract the complete set of features for each of the sensor sources. The feature extraction has three parameters: (i) a window size over which each feature is computed, (ii) an RSSI threshold below which Wi-Fi APs are ignored, and (iii) the density parameter $r_{loc}$ that is used for computing the number of location samples around the current location for geolocation and GPS. Features are extracted for each sample. Since our collection application samples at a fine granularity, with a 2 s sample interval for Wi-Fi and 1 s for GPS, we may also look at the impact of subsampling the sensor sources. Subsampling reduces energy consumption, geolocation overhead, and the amount of data to be stored. Hence, subsampling allows us to trade off implementation costs against the fidelity of our classification and clustering approaches. Subsampling is done by removing elements from a trace such that the time between consecutive data points is increased to the desired sampling interval. The feature extraction is then performed on this subsampled trace.

Users   Phone types   Traces   Samples    POIs (total)   POIs (unique)   Unique indoors   Unique outdoors
7       5             142      229,417    284            128             90               38

Table 1: Overview of the collected data from four European countries. Note that we selected traces for variety in activity and points of interest.

3.2. Data collection

We collected data traces from various locations in over a dozen cities in The Netherlands, Germany, Denmark, and Switzerland using a custom Android application. Seven users collected GPS and Wi-Fi data on ZTE Blade, Sony Xperia X10 Mini, Samsung Galaxy S, Samsung Galaxy Ace and Motorola Defy phones. The users are knowledge workers at a university and were asked to log and annotate parts of their day where they traveled to some POI in their lives. During the collection process users manually annotated traces with their activity (walking, dwelling, ...) using the Android application. Since the emphasis is on detecting dwelling and POIs, users visited favorite locations such as supermarkets, bars, tram stops, university buildings, and homes. Dwelling was explained to users as ‘staying at a certain place for some minutes’. This means waiting for the bus is dwelling, but waiting at a traffic light is not. Users additionally annotated their POIs. The granularity of these annotations was buildings, i.e., street addresses, for both indoor and outdoor POIs. In the following we discuss our data collection approach and the sensor data idiosyncrasies we identified in the collected data traces.

3.2.1. Data overview

Our collection software scanned for Wi-Fi APs at 2 second intervals; for each scan we recorded the returned list of APs and their signal strengths, along with the most recent GPS measurement. Note that the GPS hardware actually samples at 1 Hz.
While we generally use 2s intervals for feature extraction, we use the higher GPS granularity for POI extraction. The raw traces were sanitized by removing GPS outliers as well as Wi-Fi beacons from locally administered APs [17] in order to rely on fixed APs only. We then obtained geolocation data by passing the Wi-Fi scan results to the Google geolocation service [19]. The returned result is a location and an estimate of its accuracy. Querying can be done either on-line from the phone if an Internet connection is available, or off-line at a central server.
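The subsampling step of Section 3.1.4 and the two location-based features shared by GPS and geolocation (calculated speed and the number of nearby samples) can be sketched as follows. This is a minimal illustration under stated assumptions: the `(t, lat, lon)` tuple layout, the haversine distance, the function names, and counting neighbours over the whole list passed in (rather than over a feature window) are all choices made here, not details given in the paper.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude pairs."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def subsample(trace, interval_s):
    """Drop samples so that kept samples are at least interval_s seconds apart.

    trace: time-sorted list of (t_seconds, lat, lon) tuples (assumed layout).
    """
    kept = []
    for sample in trace:
        if not kept or sample[0] - kept[-1][0] >= interval_s:
            kept.append(sample)
    return kept


def calculated_speed(trace):
    """Speed in m/s between consecutive samples; the first sample gets 0.0."""
    speeds = [0.0] if trace else []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(trace, trace[1:]):
        dt = t1 - t0
        speeds.append(haversine_m(la0, lo0, la1, lo1) / dt if dt > 0 else 0.0)
    return speeds


def density(trace, radius_m):
    """For each sample, count the other samples within radius_m (O(n^2) sketch)."""
    return [
        sum(1 for j, q in enumerate(trace)
            if j != i and haversine_m(p[1], p[2], q[1], q[2]) <= radius_m)
        for i, p in enumerate(trace)
    ]
```

A typical use would be `sub = subsample(trace, 4)` followed by `calculated_speed(sub)` and `density(sub, 50)`, mirroring the order described in the text: features are computed on the subsampled trace, not on the original one.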


Activity    Samples (count)   Samples (fraction, %)   Wi-Fi coverage (%)   GPS coverage (%)
Dwelling    155,050           67.6                    95.6                 49.2
Walking     52,345            22.8                    93.1                 88.3
Driving     10,664            4.6                     72.1                 83.7
Running     3,384             1.5                     81.0                 92.8
Cycling     3,234             1.4                     95.3                 74.6
Train       2,696             1.2                     77.9                 65.3
Tram        1,931             0.8                     87.2                 87.7
Subway      113               < 0.1                   25.7                 1.8

Table 2: Wi-Fi and GPS coverage for different activities.
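Per-activity coverage figures of the kind shown in Table 2 could be derived from annotated samples roughly as sketched below. The paper does not spell out its exact definition of coverage, so this sketch assumes that a sample counts as Wi-Fi-covered when its scan contains at least one AP and as GPS-covered when the GPS had a fix; the dictionary keys and the function name are hypothetical.

```python
from collections import defaultdict


def coverage_by_activity(samples):
    """Return {activity: (count, wifi_pct, gps_pct)}.

    Each sample is a dict with the hypothetical keys
    'activity' (str), 'aps' (list of visible APs) and 'gps_fix' (bool).
    """
    counts = defaultdict(lambda: [0, 0, 0])  # total, with Wi-Fi, with GPS fix
    for s in samples:
        c = counts[s["activity"]]
        c[0] += 1
        c[1] += 1 if s["aps"] else 0
        c[2] += 1 if s["gps_fix"] else 0
    return {
        act: (n, 100.0 * wifi / n, 100.0 * gps / n)
        for act, (n, wifi, gps) in counts.items()
    }


samples = [
    {"activity": "dwelling", "aps": ["ap1"], "gps_fix": False},
    {"activity": "dwelling", "aps": ["ap1", "ap2"], "gps_fix": True},
    {"activity": "walking", "aps": [], "gps_fix": True},
]
print(coverage_by_activity(samples))
```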

Table 1 summarizes the data we collected over the course of four months. We selected 142 traces, containing a total of 229,417 measurement samples, collected in cities of different sizes (from ≈ 13,000 to about 600,000 inhabitants) across Western Europe. These traces include 284 dwelling locations, of which 128 are unique, i.e., some locations, like the users’ homes and favorite cafés, are visited multiple times. Of these unique locations, 90 are indoors and 38 are outdoors. Users annotated locations post facto by providing a textual description (including whether it was indoors or outdoors) as well as longitude and latitude for each POI¹.

¹ Longitude and latitude were determined using Google Maps [21].

3.2.2. Coverage

We characterize the coverage of GPS and Wi-Fi based on our extensive data set. Note that Wi-Fi coverage also influences the geolocation results. Table 2 summarizes the GPS and Wi-Fi coverage for the different activities found in our data set. Most (≈ 2/3) of the time the users were dwelling. Wi-Fi coverage at these POIs is very good, and better than GPS coverage. GPS is particularly hampered indoors, yet still available. We detail coverage at POIs below. For walking and cycling in urban environments, coverage is good. Coverage while running, in particular for Wi-Fi, is lower, as there is limited (or no) coverage in parks and forests. Coverage in transportation is reasonable and only deteriorates when exiting densely populated areas. However, there is almost no coverage in subways, with even Wi-Fi having a mere 25.6%; even this number is only as high as it is because stations and platforms are included for all transportation activities. Note that the number of samples collected for some of the mobile activities is rather low. However, in this work we only differentiate between mobile and dwelling, and as such do not consider the differences among the mobile activities.

Figure 1 depicts the coverage at the unique POIs. Note that GPS coverage is hampered at indoor locations, yet still often available. Moreover, GPS coverage

[Figure 1: Coverage (%) of Wi-Fi and GPS location services for indoor and outdoor locations. Two panels (Indoors, Outdoors), each showing the coverage percentage (0 to 100) for WiFi and GPS.]

P[Scanned