Point Patterns (Point Data)

Lecture #11

Nov. 18, 2011
Spatial Data Analysis - Spatial Statistics
• Spatial Statistics
• Point Pattern Analysis
• Area Pattern Analysis

Exam: ** rulers and coloured pens - pencil AND pens!

Spatial Pattern and Relationships
• geographers study settlement patterns, land-use patterns, drainage patterns, etc.
• ‘pattern’ implies some form of spatial regularity, which is taken as a sign of a regular ‘process’ at work
• we may also be interested in the attributes (e.g., tree species type) that are attached to points
• the spatial arrangement or distribution of objects/events/cases (represented by points or areas) is of interest, yet is often difficult to describe qualitatively
• *** today we are talking about how to describe patterns and relationships quantitatively
• so, how can we distinguish these patterns statistically, so that we can conclude that one is “significantly more clustered” and another is “significantly more dispersed” without knowing anything else about them?
• we can test each pattern against a random point pattern:
   > too clustered to have occurred by chance
   > too dispersed to have occurred by chance
   > in either case, the pattern is significantly different from random

Point Patterns (Point Data)
• Ideal World:

Spatially Continuous Phenomena
• these data can also be represented by point locations
• a continuous measurement (e.g., soil nutrient concentration) is attached to each point, and this measurement could, in principle, be taken at any other location
• the question is not whether there is a pattern in the locations; they are simply the points at which sample measurements were taken


• our interest is in understanding the pattern in the values at these locations
• we can perhaps use this understanding to predict values of that variable at other locations

Types of Distribution
• Three general patterns:
   > RANDOM: any point is equally likely to occur at any location, and the position of any point is not affected by the position of any other point. There is no apparent ordering of the distribution
   > UNIFORM, REGULAR, or DISPERSED: every point is as far from all of its neighbours as possible
   > CLUSTERED: many points are concentrated close together, and there are large areas that contain very few, if any, points

Two Primary Approaches
• POINT DENSITY approach using QUADRAT ANALYSIS: based on observing the frequency distribution or density of points within a set of grid squares (density)
   1. Variance-to-mean ratio approach
   2. Frequency distribution comparison approach
• POINT INTERACTION approach using NEAREST NEIGHBOUR ANALYSIS: based on distances of points one from another (interactions)
• *** it is not just what is happening in one place, but its relationship with what is happening in another place
   > we know that things closer together often share more commonalities than things farther apart

Quadrat Analysis (QA): VMR
• QA examines the frequency of points occurring in various parts of the study area
• a uniform grid is laid over the study area and the number of points per quadrat is determined
• treat each quadrat as an observation and count the number of points within it, to create the variable x
• the frequency count (the number of points occurring within each quadrat) is recorded
• geographers always select a portion of an area to study, but this can have a negative impact, as the selected area can exclude an important factor and throw off the statistics

Quadrat Analysis (QA)
• the variance of the dataset (the quadrat counts) is subsequently calculated
• the variance-mean ratio index (VMR) is then used to standardize the degree of variability in cell frequencies relative to the average cell frequency


• a random distribution would indicate that the variance and mean are the same
• therefore, we would expect a variance-mean ratio around 1
• values other than 1 would indicate a non-random distribution (below 1 suggests a dispersed pattern, above 1 a clustered pattern)
Diagram: the variance is 0 in the dispersed pattern, 2 in the random pattern, and 17 in the clustered pattern; the mean is always 2 (so VMR = 0, 1, and 8.5)

Limitations of QA
• results often depend on quadrat size and orientation
• if the quadrats are too small, they may contain only a couple of points
• if they are too large, they may contain too many points
• an alternative is to test different sizes (or orientations) to determine the effect of each on the results
• it is a measure of dispersion, and not really pattern, because it is based primarily on the density of points, and not on their arrangement in relation to one another. For example, QA cannot distinguish between these two obviously different patterns
• it results in a single measure for the entire distribution, so variations within the region are not recognized (there could be clustering locally in some areas, but not overall). For example, the overall pattern here is dispersed, but there are some local clusters

Nearest Neighbour Analysis
• developed by Clark and Evans (1954) for field work in botany
• compares the observed distances between nearest points with the distances that would be expected on the basis of chance
• the ratio of these two statistics is used to generate an index value:
   > observed mean nearest neighbour distance
   > expected mean nearest neighbour distance based on a random distribution

Interpreting the NNI
• the index compares the average distance from each point to its closest neighbour with the distance that would be expected on the basis of chance
• if the observed average distance is the same as the mean random distance, then the ratio will be 1.0
• if the observed distance is smaller than the mean random distance, the NNI is less than 1.0 (clustered)
• if the observed distance is larger than the mean random distance, the NNI is greater than 1.0 (dispersed)
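The two indices described above can be sketched on the same point data. This is a minimal illustration, not the lecture's own procedure: the 10 x 10 study area, the 5 x 5 quadrat grid, the three synthetic patterns, and the function names are all assumptions made for the example. The variance is taken as the sample variance (dividing by m - 1), and the expected nearest neighbour distance under randomness uses the Clark and Evans form 0.5 / sqrt(n / area).

```python
import math
import random
from statistics import mean, variance


def quadrat_counts(points, width, height, nx, ny):
    """Count the points in each cell of an nx-by-ny grid over the study area.

    Points are assumed to lie inside [0, width] x [0, height].
    """
    counts = [0] * (nx * ny)
    for x, y in points:
        # Clamp so a point exactly on the far edge falls in the last cell.
        col = min(int(x / width * nx), nx - 1)
        row = min(int(y / height * ny), ny - 1)
        counts[row * nx + col] += 1
    return counts


def vmr(counts):
    """Variance-mean ratio of quadrat counts (sample variance: an assumption)."""
    return variance(counts) / mean(counts)


def nni(points, area):
    """Nearest neighbour index: observed / expected mean nearest distance."""
    n = len(points)
    nearest = []
    for i, (xi, yi) in enumerate(points):
        nearest.append(min(math.hypot(xi - xj, yi - yj)
                           for j, (xj, yj) in enumerate(points) if j != i))
    observed = mean(nearest)
    expected = 0.5 / math.sqrt(n / area)  # mean distance for a random pattern
    return observed / expected


# Hypothetical data: 100 points per pattern in a 10 x 10 study area.
random.seed(0)
side = 10.0
dispersed = [(x + 0.5, y + 0.5) for x in range(10) for y in range(10)]
random_pts = [(random.uniform(0, side), random.uniform(0, side))
              for _ in range(100)]
clustered = [(random.gauss(c, 0.3), random.gauss(c, 0.3))
             for c in (3.0, 7.0) for _ in range(50)]

patterns = {"dispersed": dispersed, "random": random_pts, "clustered": clustered}
for name, pts in patterns.items():
    v = vmr(quadrat_counts(pts, side, side, 5, 5))
    r = nni(pts, side * side)
    print(f"{name:10s} VMR={v:.2f} NNI={r:.2f}")
```

The regular lattice gives a VMR of 0 and an NNI of 2.0; the clustered pattern gives a VMR well above 1 and an NNI well below 1; the random pattern should land near 1 on both, although the exact values depend on the seed and on the quadrat size chosen.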


Advantages of NNA over QA
• the quadrat size problem is overcome
• NNA takes distance into account

Problems associated with NNA
• results are related to the size of the entire study area (its boundary)
• must consider how to measure the boundary (arbitrary or some natural boundary)
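The boundary problem noted above can be seen in a small sketch: the same points produce a different index when a different study area is declared. The lattice of 25 points, the two areas, and the function name are hypothetical choices for illustration; the expected distance again assumes the Clark and Evans form 0.5 / sqrt(n / area).

```python
import math


def nearest_neighbour_index(points, area):
    """Observed mean nearest neighbour distance divided by the random expectation."""
    n = len(points)
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    observed = total / n
    expected = 0.5 / math.sqrt(n / area)  # depends on the declared area
    return observed / expected


# Hypothetical lattice: 25 points spaced one unit apart.
points = [(float(x), float(y)) for x in range(5) for y in range(5)]

tight_area = 5.0 * 5.0    # boundary drawn closely around the points
wide_area = 20.0 * 20.0   # same points inside a much larger study area

print(nearest_neighbour_index(points, tight_area))  # 2.0, reads as dispersed
print(nearest_neighbour_index(points, wide_area))   # 0.5, reads as clustered
```

Nothing about the points changed between the two calls; only the area did. This is why the choice of boundary has to be justified before the index is interpreted.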

Area Patterns

Spatial Autocorrelation (SA)
• first law of geography: “everything is related to everything else, but near things are more related than distant things” - Waldo Tobler (AKA Tobler’s first law of geography)
• also called spatial association or spatial dependence
• examples: rent, income, plant species type, weather, etc.
• many of the things we studied as points can also be studied as areas
• SA is the spatial equivalent of correlation (e.g., Pearson’s r)
• looking at one variable and its relationship to itself
• e.g., do my neighbours have a similar income to mine?
• it measures the extent to which the occurrence of an event in an areal unit constrains, or makes more probable, the occurrence of an event in a neighbouring areal unit (or grid cell)
• spatial dependence suggests that many statistical tools and inferences may be inappropriate
• Three general possibilities:
1) Positive Autocorrelation
   • nearby locations are likely to be similar to one another
2) Negative Autocorrelation
   • observations from nearby locations are likely to be different from one another
3) Zero Autocorrelation

   • no spatial effect is discernible, and observations seem to vary randomly through space
• spatial autocorrelation (SA): the degree to which a variable is correlated with itself through space
• the measure chosen depends on the level of measurement of your data
• High SA (positive, clustered), Low SA (negative, dispersed), No SA (random)

Why is SA important?
• many statistics are based on the assumption that the values of observations in each sample are independent of one another
• positive spatial autocorrelation may violate this if the samples were taken from nearby areas
• goals of spatial autocorrelation analysis:
   > to measure the strength of spatial autocorrelation in an area
   > to test the assumption of independence or randomness

Measures of Spatial Autocorrelation (Join Count Statistic)
• uses nominal-scale (categorical) data for areas of two types (e.g., political units, ecological boundaries)
• these could be used to represent many different types of geographical data
• join count statistics can be used to study:
   > electoral data (Liberal vs. other)
   > the spatial arrangement of arable vs. non-arable land
   > parishes exhibiting population growth or decline
*** the name is not “joint counting”; it is the JOIN COUNT STATISTIC ***

Join Count Statistic
• map of binary data (0, 1) or (Black, White); this may require reclassification of your data into two categories (0, 1)
• count the number of times a join corresponds to similarity or dissimilarity
   > J_WW: cluster of cases
   > J_BW: dispersion
   > J_BB: cluster of non-cases
• compare observed with expected joins
• example: urban versus rural regions
• what you do is create an adjacency matrix
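The counting step can be sketched for a small binary grid. This is an illustrative example under stated assumptions: cells coded 1 are treated as W (cases) and cells coded 0 as B (non-cases), following the labels in these notes, and "adjacent" is taken to mean sharing an edge (rook adjacency) on a regular grid instead of reading joins from an explicit adjacency matrix. The function name and both example grids are hypothetical. Only the observed counts are produced; comparing them with expected counts is not shown.

```python
def join_counts(grid):
    """Count WW, BB and BW joins between edge-sharing cells of a 0/1 grid.

    Assumption: 1 = W (case), 0 = B (non-case).
    """
    rows, cols = len(grid), len(grid[0])
    counts = {"WW": 0, "BB": 0, "BW": 0}
    for r in range(rows):
        for c in range(cols):
            # Look only right and down so that each join is counted once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    a, b = grid[r][c], grid[rr][cc]
                    if a != b:
                        counts["BW"] += 1
                    elif a == 1:
                        counts["WW"] += 1
                    else:
                        counts["BB"] += 1
    return counts


# Hypothetical maps: cases grouped in one corner vs. a checkerboard.
clustered = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]

print(join_counts(clustered))     # {'WW': 4, 'BB': 16, 'BW': 4}
print(join_counts(checkerboard))  # {'WW': 0, 'BB': 0, 'BW': 24}
```

Both grids have 24 joins in total. The grouped map has few unlike (BW) joins, which points toward positive autocorrelation, while the checkerboard has only unlike joins, which points toward negative autocorrelation.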


Mapping Heterogeneity
• initially, each pixel is assigned a spatial autocorrelation value
Ex) homogeneity vs. heterogeneity classes
