Comparing the Spatial Characteristics of Corresponding Cyber and Physical Communities: A Case Study

Xu Lu1, Arie Croitoru1, Jacek Radzikowski1, Andrew Crooks2, Anthony Stefanidis1
1 Center for Geospatial Intelligence and Department of Geography and Geoinformation Science, George Mason University
2 Department of Computational Social Science, George Mason University
{xlu5, acroitor, jradziko, acrooks2, astefani}@gmu.edu

ABSTRACT
The proliferation of social media over the past few years is presenting us with unique opportunities to sample opinions and interests at spatial and temporal resolutions previously unheard of. In order to make best use of this information though, we need a better understanding of the degree to which the cyber community that is observed through them can serve as a proxy for the corresponding physical community. In this paper we make a contribution towards this issue by presenting a case study in which we compare spatial characteristics of a community both in the physical and cyber spaces. The key findings of our analysis relate to the selection of an appropriate level of spatial aggregation for analyzing social media content, and to the effect of distance from the point of interest on the level of participation.

Categories and Subject Descriptors
H.2.8 [Database Applications]: Spatial databases and GIS; J.4 [Computer Applications]: Social and Behavioral Sciences

General Terms
Measurement, Experimentation, Human Factors, Theory, Verification

Keywords
Social media, Geography, Community, Twitter

1. INTRODUCTION
Fostered by Web 2.0 and corresponding technological advancements, social media have become massively popular during the last decade. By providing venues for the general public to express their comments and opinions they enable both the dissemination of information and the formation of social networks in cyberspace (Kwak et al., 2010). Furthermore, an increasingly sizeable portion of such content is geolocated, either in the form of precise coordinates of the location from where these feeds were contributed, or as toponyms of these locations, thus enabling new types of data mining processes for a variety of applications. For example, Twitter data can be analyzed to assess the impact area of earthquakes (Crooks et al., 2013) or to predict flu outbreaks (Aramaki et al., 2011). Similarly, Flickr data have been analyzed to detect a variety of events (Chen and Roy, 2009), and they have even been studied in the context of supporting land-use classification operations (Leung and Newsam, 2012).

As people act as hybrid sensors by posting in social media, we are presented with a unique opportunity to sample their opinions, reports, and expressed interests at spatial and temporal resolutions previously unheard of (Stefanidis et al., 2013). However, despite the above-mentioned successes we still face significant challenges in transforming social media observations into real world knowledge (Rost et al., 2013). For example, despite efforts to predict election outcomes using social media content, our ability to do so reliably is still very limited (Metaxas et al., 2011). A major reason for this inability to fully exploit the geolocated content of social media feeds stems from our limited understanding of the degree to which the cyber community that is observed through them can serve as a proxy for the corresponding physical community. In this paper we make a contribution towards this issue by presenting a case study in which we compare spatial characteristics of a community both in the physical and cyber spaces. Through this analysis we aim to gain insights on similarities and differences between them. It is important to note that in this paper we use the term community rather loosely, to refer to a cohesive set of individuals who share a common aspect (e.g. some association with an institution, as will be the case here).

The rest of the paper is organized as follows: in Section 2 we describe our case study datasets. In Section 3 we present our experimental results, followed by concluding remarks in Section 4.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ACM SIGSPATIAL LBSN’13, November 5, 2013, Orlando, FL, USA (c)2013 ACM ISBN 978-1-4503-2533-2/13/10...$15.00

2. CASE STUDY DATASETS
The objective of our case study is to identify a community in the physical space and compare it to its closest cyberspace counterpart. In order to select a case study that is both sufficiently large and spatially distributed to support meaningful analysis, and sufficiently distinct so that it can be identified in social media content, we chose the community of interest formed around a large higher education institution, namely George Mason University (GMU). Furthermore, as studies have indicated that there exists a strong relationship between the alumni community and the branding presence of a University (McAlexander et al., 2006), we can reasonably argue that the alumni community in physical space can serve as a close counterpart to the cyber community for our study. Accordingly, we chose the alumni community of GMU as a representative sample of the spatial distribution of that community in physical space. As its cyberspace counterpart we chose the online community formed in Twitter as part of a discussion about the same University. Given these datasets, our task is to compare these two communities in terms of their spatial patterns in order to gain insight on how observed patterns in geolocated social media compare to the corresponding community in physical space.

2.1 Physical Space Community: Alumni Dataset
In order to establish the spatial footprint of our physical community we collected anonymized address data for GMU alumni. Accordingly, for the 154,140 alumni that the University has graduated since 1961 we collected their current home address zip codes, which are spread across 9,822 different zip code areas throughout the US. As the US Census Bureau lists approximately 43,000 different zip codes in the US, our alumni are spread across 22.8% of all these zip code areas. The spatial distribution of these alumni is shown in Figure 1. The clusters of states in the background (shown in different colors) correspond to the ten Federal Standard Regions.

Figure 1. Distribution of GMU alumni addresses in the US.

2.2 Cyber Space Community: Twitter Dataset
The corresponding cyberspace social media community was generated by collecting Twitter data discussing GMU over a period of eleven months (August 1st, 2012 to July 3rd, 2013). The data were collected and processed using our GeoSocial Gauge system prototype (Croitoru et al., 2012; 2013). By accessing Twitter’s application programming interface (API) and its one percent streaming content we collected 151,900 tweets discussing GMU during that period. From among them, 70,600 were geolocated within the US. Location information was obtained either by harvesting tweets tagged with precise coordinates (as is usually the case when tweets are posted from a GPS-enabled mobile device), or by identifying descriptive toponyms in tweets, which can be geolocated using a standard gazetteer (Yahoo! Geocoder in this case). The spatial distribution of these geolocated tweets is shown in Figure 2.

In the following section we present specific metrics comparing the spatial characteristics of these two communities.

3. RESULTS
The key findings of our analysis relate to the selection of an appropriate level of spatial aggregation for analyzing social media content, and to the effect of distance from the point of interest on participation. They are presented in the following subsections.

3.1 Level of Spatial Aggregation
A key issue when analyzing the effect of spatial distribution is to select the appropriate level of spatial aggregation in order to infer meaningful information from the dataset (see, for example, work on the modifiable areal unit problem (Openshaw, 1983)). In order to explore this problem in our study we have aggregated our data at two different levels: zip code and state. In Figure 3 we show a log-log scatter plot of alumni versus tweets spatially aggregated at the zip code level, while in Figure 4 we show a similar scatter plot aggregated at the state level.
Figure 3. A log-log plot of alumni (horizontal axis) versus Twitter traffic (vertical axis) spatially aggregated at the zip code level.
Figure 4. A log-log plot of alumni (horizontal axis) versus Twitter traffic (vertical axis) spatially aggregated at the state level.
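Figures 3 and 4 plot the same two quantities, alumni and tweet counts, at two levels of spatial aggregation. The following is a minimal sketch of how such a comparison might be computed, not the procedure used in this study: the record fields (`zip`, `state`, `alumni`, `tweets`) and the toy numbers are invented for illustration, and the Pearson coefficient is computed here on raw counts, which is an assumption since the paper does not state whether raw or log-transformed counts were used.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def correlation_at_level(records, key):
    """Sum alumni and tweets per spatial unit named by `key`, then correlate."""
    totals = defaultdict(lambda: [0, 0])
    for rec in records:
        totals[rec[key]][0] += rec["alumni"]
        totals[rec[key]][1] += rec["tweets"]
    alumni = [a for a, _ in totals.values()]
    tweets = [t for _, t in totals.values()]
    return pearson(alumni, tweets)


# Invented toy records: within each state the counts are mismatched across
# zip codes, but the state totals line up.
records = [
    {"zip": "A1", "state": "A", "alumni": 10, "tweets": 1},
    {"zip": "A2", "state": "A", "alumni": 1, "tweets": 12},
    {"zip": "B1", "state": "B", "alumni": 4, "tweets": 0},
    {"zip": "B2", "state": "B", "alumni": 0, "tweets": 5},
    {"zip": "C1", "state": "C", "alumni": 1, "tweets": 0},
    {"zip": "C2", "state": "C", "alumni": 0, "tweets": 1},
]

r_zip = correlation_at_level(records, "zip")
r_state = correlation_at_level(records, "state")
```

In this toy case the zip-level coefficient is negative while the state-level coefficient is close to 1, which illustrates how location noise that cancels within a coarser unit can hide a relationship at the finer one.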
Figure 2. Spatial distribution of tweets discussing GMU in the period 8/12 – 7/13.
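Section 2.2 notes that tweet locations come either from precise coordinates or from toponyms resolved through a gazetteer. The sketch below illustrates that two-step fallback under stated assumptions: the tweet fields `coordinates` and `place_text` are hypothetical names, and `TOY_GAZETTEER` is an invented in-memory stand-in with approximate coordinates, not the Yahoo! Geocoder or the GeoSocial Gauge implementation.

```python
from typing import Optional, Tuple

LatLon = Tuple[float, float]

# Hypothetical stand-in for a gazetteer: toponym -> approximate center of the
# named area. A real gazetteer service returns candidates for ambiguous names.
TOY_GAZETTEER = {
    "new york, ny": (40.7128, -74.0060),
    "fairfax, va": (38.8462, -77.3064),
}


def locate(tweet: dict) -> Tuple[Optional[LatLon], str]:
    """Return ((lat, lon), source); source is 'coordinates', 'toponym' or 'none'."""
    coords = tweet.get("coordinates")
    if coords is not None:
        # Precise coordinates, e.g. from a GPS-enabled device, take priority.
        return (coords[0], coords[1]), "coordinates"
    toponym = (tweet.get("place_text") or "").strip().lower()
    if toponym in TOY_GAZETTEER:
        # The result is the center of the named area, not the poster's position.
        return TOY_GAZETTEER[toponym], "toponym"
    return None, "none"
```

Keeping the `source` label alongside each location makes it possible to repeat an analysis on coordinate-tagged tweets only.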
A visual comparison of these two maps (Figures 1 and 2) suggests that these two communities have comparable spatial distributions.

It is easy to observe that Figure 3 does not exhibit any discernible pattern of correlation, whereas Figure 4 does. More specifically, the correlation coefficient for the dataset analyzed at the zip code level (Figure 3) is only 0.34, whereas the same analysis at the state level (Figure 4) yields 0.97, indicating a very strong correlation between cyber community activity and physical community population. This suggests that although the data can be collected at very fine resolutions, even at the individual location level (e.g. precise coordinates), their analysis is better performed at a coarser spatial resolution (e.g. state in this case) in order to better correlate these observations with their real world equivalent. Potentially, this can be attributed to two primary types of artifacts resulting from the geolocation process:

• Toponym-induced artifacts: geolocating tweets whose spatial footprint is available as a toponym (rather than precise coordinates) may result in two different types of errors. Firstly, the geolocation assigned to a toponym by the gazetteer is typically the geographic center of the area associated with this toponym, and not necessarily the correct zip code. This tends to inflate the presence of certain zip codes (especially ones at the geographic center of populous geographical areas) in our datasets. Secondly, toponyms are often non-unique, and this may lead to errors when attempting to resolve a location given an ambiguous toponym.

• User-induced artifacts: as user activity is not regulated by any means, it is not uncommon to have users with a disproportionate level of participation in the datasets, generating massive numbers of contributions. This may bias our analysis by inflating the presence of the corresponding zip codes relative to the rest.

The effects of these artifacts are shown in Table 1, where we list the top 10 most active zip codes for our dataset. Zip codes 10007, 46123 and 94930 are of particular interest, as they reflect these two types of artifacts. For zip code 10007 we have a toponym-induced error, as it is the zip code corresponding to the geographical center of New York, NY. For 94930 we have a classic gazetteer-induced error, as Fairfax is associated with Fairfax, CA rather than Fairfax, VA, which is the location of GMU. For zip code 46123 we have a particular user who is contributing very large numbers of tweets about GMU, thus inflating that zip code’s presence in our datasets.

Table 1. The top 10 most active zip codes in our dataset.

Zip    Location        Alumni  Tweets  Tweets per alumni
10007  New York, NY    1       1531    1531
08505  Bordentown, NJ  1       291     291
33133  Miami, FL       2       484     242
46123  Avon, IN        3       638     212.67
94930  Fairfax, CA     2       396     198
11767  Nesconset, NY   2       345     172.5
60602  Chicago, IL     1       164     164
33122  Miami, FL       1       157     157
23219  Richmond, VA    8       989     123.63
77002  Houston, TX     1       113     113

The aggregation at a state level minimizes the effects of these artifacts, thus allowing more meaningful patterns to emerge when analyzing the corresponding datasets. In order to further assess the effects of these artifacts we also plotted in Figure 5 the equivalent of Figure 3 but using only the tweets that had precise geolocation information in the form of coordinates. We see that even in this case, when toponyms are not used and the source of potential artifacts is eliminated, there is no discernible correlation (the correlation coefficient is only 0.41, just slightly higher than the earlier 0.34). This further suggests that the emergence of patterns in the state level analysis is not due to toponym-induced artifacts, but rather due to the nature of the data, which renders an aggregation at the state level more suitable for their study. We further studied the behavior of our data when aggregating them to the Federal Standard Region level, as identified in Figures 1 and 2, and the correlation coefficient improved only slightly (0.99, compared to 0.97), which suggests that the state-level analysis offers sufficiently meaningful results while still preserving a reasonable level of spatial granularity.

Figure 5. A log-log plot of alumni (horizontal axis) versus Twitter traffic (vertical axis) aggregated at the zip code level but using only tweets with precise coordinates associated with them.

3.2 Distance from Point of Interest
When we analyze the distributions of alumni and tweets as a function of distance from Fairfax, VA (the location of GMU) we see that they follow comparable patterns (Figure 6). The small bump at a distance of approximately 2,000 miles in both tweets and alumni corresponds to the transition from the Rocky Mountain region to the Pacific States. The patterns in Figure 6 indicate a distance decay effect, with the numbers dropping quickly and steadily as we move away from the point of interest. However, when we analyze the level of participation of different communities as a function of distance we get a different picture.
Figure 6. Left: Number of alumni (vertical axis) versus distance from Fairfax (horizontal axis). Right: Number of geolocated tweets (vertical axis) as a function of distance from Fairfax (horizontal axis).
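The curves in Figure 6 require a distance from Fairfax for every location and a tally of counts by distance. A minimal sketch of one way to do this follows, assuming great-circle (haversine) distances and fixed-width bins; the paper does not state which distance measure or bin width was used, the `GMU` coordinates are approximate, and all counts in the demo are invented. The last helper divides binned tweets by binned alumni to obtain a tweets-per-alumnus ratio of the kind listed in Table 1.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8   # mean Earth radius
GMU = (38.8316, -77.3119)     # approximate Fairfax campus location (assumption)


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))


def bin_by_distance(points, origin=GMU, bin_width=500.0):
    """Sum counts of (lat, lon, count) points into bins keyed by bin start (miles)."""
    bins = {}
    for lat, lon, count in points:
        start = int(haversine_miles(origin, (lat, lon)) // bin_width) * bin_width
        bins[start] = bins.get(start, 0) + count
    return bins


def tweets_per_alumnus(tweet_bins, alumni_bins):
    """Ratio of tweets to alumni for every bin that contains at least one alumnus."""
    return {b: tweet_bins.get(b, 0) / n for b, n in alumni_bins.items() if n > 0}


# Invented demo counts at two locations: the campus itself and Los Angeles, CA.
LOS_ANGELES = (34.0522, -118.2437)
alumni_bins = bin_by_distance([(*GMU, 8), (*LOS_ANGELES, 1)])
tweet_bins = bin_by_distance([(*GMU, 16), (*LOS_ANGELES, 5)])
ratio_bins = tweets_per_alumnus(tweet_bins, alumni_bins)
```

Bins without alumni are skipped in the ratio because the quotient is undefined there; how such locations were treated in the study is not stated.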
We define the level of participation as the ratio of tweets per alumnus at different locations. In Table 1, for example, in addition to the numbers of tweets and alumni we have also listed this ratio. This allows us, for example, to see that Bordentown, NJ has a larger participation index (291 tweets per alumnus) than Houston, TX (113). When we plot this participation index as a function of distance from Fairfax, VA in Figure 7 we observe a very different pattern from the distance decay observed in Figure 6. As a matter of fact, the graph suggests that there exists a reverse participation decay function, as participation is overall increasing with distance (instead of decreasing). This is a very interesting finding, as it indicates not only that Tobler’s first law does not apply to the participation index, but that its reverse does: enthusiasm (and participation) is higher as we move away from the actual point of interest.

Figure 7. A log-log graph of tweets per alumnus (vertical axis) as a function of distance from Fairfax (horizontal axis).

4. CONCLUSIONS AND OUTLOOK
Gaining a better understanding of the degree to which a cyber community that is observed through social media content analysis can serve as a proxy for the corresponding physical community will significantly improve our ability to analyze such data and extract meaningful and reliable information from them. Towards this goal, our short paper contributes a case study in which we compared spatial characteristics of a community both in the physical and cyber spaces.

The key findings of our analysis relate to the selection of an appropriate level of spatial aggregation for analyzing social media content, and to the effect of distance from the point of interest on the level of participation. More specifically, our data suggest that despite the fact that geolocated social media content may be collected at very fine spatial resolution, their analysis is better performed at a coarser spatial resolution (e.g. state in this case) in order to better relate these observations to their real world equivalent.

We also analyzed the rate of participation as a function of distance and observed that our datasets imply that their relation is, surprisingly, the opposite of Toblerian, as the rate of participation appears to increase with distance. Of course in this case we are monitoring an interaction that is formed around a topic (GMU) that has a physical presence and a corresponding point of interest. In contrast, other communities may be formed around topics (e.g. sociocultural topics) that have no associated physical location to be considered as their epicenter. We plan to extend our work to such datasets in order to further advance our understanding of the spatial characteristics of corresponding cyber and physical communities.

5. REFERENCES
[1] Aramaki, E., Maskawa, S., & Morita, M. 2011. Twitter Catches the Flu: Detecting Influenza Epidemics using Twitter. In Proceedings of the ACL Conference on Empirical Methods in Natural Language Processing, 1568-1576.
[2] Chen, L., & Roy, A. 2009. Event Detection from Flickr Data through Wavelet-based Spatial Analysis. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, 523-532.
[3] Croitoru, A., Crooks, A., Radzikowski, J., & Stefanidis, A. 2013. GeoSocial Gauge: A System Prototype for Knowledge Discovery from Social Media. International Journal of Geographical Information Science, (in press).
[4] Croitoru, A., Stefanidis, A., Radzikowski, J., Crooks, A., Stahl, J., & Wayant, N. 2012. Towards a Collaborative GeoSocial Analysis Workbench. COM.Geo ’12, Washington, DC.
[5] Crooks, A., Croitoru, A., Stefanidis, A., & Radzikowski, J. 2013. #Earthquake: Twitter as a Distributed Sensor System. Transactions in GIS, 17(1), 124-147.
[6] Gao, H., Barbier, G., & Goolsby, R. 2011. Harnessing the Crowdsourcing Power of Social Media for Disaster Relief. Intelligent Systems, IEEE, 26(3), 10-14.
[7] Kwak, H., Lee, C., Park, H., & Moon, S. 2010. What is Twitter, a social network or a news media? In Proceedings of the 19th ACM International Conference on World Wide Web, 591-600.
[8] Leung, D., & Newsam, S. 2012. Can Off-The-Shelf Object Detectors Be Used To Extract Geographic Information From Geo-Referenced Social Multimedia? In Proceedings of the 5th ACM SIGSPATIAL International Workshop on Location-Based Social Networks, 12-15.
[9] McAlexander, J. H., Koenig, H. F., & Schouten, J. W. 2006. Building relationships of brand community in higher education: a strategic framework for university advancement. International Journal of Educational Advancement, 6(2), 107-118.
[10] Metaxas, P. T., Mustafaraj, E., & Gayo-Avello, D. 2011. How (not) to predict elections. In Privacy, Security, Risk and Trust (PASSAT), 2011 IEEE Third International Conference on and 2011 IEEE Third International Conference on Social Computing (SocialCom), 165-171.
[11] Openshaw, S. 1983. The modifiable areal unit problem (Vol. 38). Norwich: Geo Books.
[12] Rost, M., Barkhuus, L., Cramer, H., & Brown, B. 2013. Representation and communication: Challenges in interpreting large social media datasets. In Proceedings of the 2013 ACM Conference on Computer Supported Cooperative Work, 357-362.
[13] Stefanidis, A., Crooks, A., & Radzikowski, J. 2013. Harvesting ambient geospatial information from social media feeds. GeoJournal, 78(2), 319-338.