Visualizing Spatial Autocorrelation

12. Visualizing Spatial Autocorrelation

Moran's I is the slope in a regression of the spatial lag $\sum_{j} w_{ij} z_{j}$ on $z_{i}$

The spatial lag $\sum_{j} w_{ij} z_{j}$ is the dependent variable in this regression; with row-standardized weights it is the weighted average of neighboring values
In the Moran scatter plot, the x-axis is the value at each location and the y-axis is its spatial lag
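A minimal sketch of this regression in plain Python, assuming row-standardized weights (each row of $w$ sums to 1) and values expressed as deviations from the mean. The toy data, the weights matrix, and the helper names `spatial_lag` and `ols_slope` are illustrative assumptions, not part of the original notes.

```python
def spatial_lag(z, w):
    """Return the spatial lag sum_j w_ij * z_j for every location i."""
    return [sum(w_ij * z_j for w_ij, z_j in zip(row, z)) for row in w]


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Toy data (assumed): four locations on a line, neighbors are adjacent cells.
values = [1.0, 2.0, 4.0, 5.0]
mean = sum(values) / len(values)
z = [v - mean for v in values]            # deviations from the mean

# Row-standardized weights: each row sums to 1.
w = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]

lag = spatial_lag(z, w)                   # y-axis of the Moran scatter plot
moran_slope = ols_slope(z, lag)           # slope of lag on z

# Direct formula for Moran's I, for comparison with the slope.
n = len(z)
s0 = sum(sum(row) for row in w)
moran_direct = (n / s0) * sum(z[i] * lag[i] for i in range(n)) / sum(zi ** 2 for zi in z)
```

With these toy numbers both `moran_slope` and `moran_direct` come out to 0.5. Without row standardization the slope and Moran's I differ by the factor $n / S_0$, where $S_0$ is the sum of all weights.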

The plot is read by its four quadrants, defined relative to the mean
Upper right and lower left are positive spatial autocorrelation
- Clusters of like values
- Locations are similar to their neighbors
Lower right and upper left are negative spatial autocorrelation
- Spatial outliers
- Locations are different from their neighbors
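The quadrant reading can be sketched as a small classifier. It assumes both the value and its spatial lag are expressed as deviations from the mean, so the quadrant boundaries are the zero axes; the labels HH, LL, HL, LH and the sample points are illustrative choices.

```python
def moran_quadrant(z_i, lag_i):
    """Label a point by its quadrant in the Moran scatter plot.

    Both inputs are assumed to be deviations from the mean.
    """
    if z_i >= 0 and lag_i >= 0:
        return "HH"   # upper right: high value, high neighbors (cluster)
    if z_i < 0 and lag_i < 0:
        return "LL"   # lower left: low value, low neighbors (cluster)
    if z_i >= 0 and lag_i < 0:
        return "HL"   # lower right: high value, low neighbors (outlier)
    return "LH"       # upper left: low value, high neighbors (outlier)


# Hypothetical points, one per quadrant.
z = [2.0, -1.5, 1.0, -0.5]
lag = [1.0, -0.5, -2.0, 1.5]
labels = [moran_quadrant(a, b) for a, b in zip(z, lag)]

# Positive spatial autocorrelation: HH and LL; negative: HL and LH.
positive = [lab in ("HH", "LL") for lab in labels]
```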

Use local regression (LOWESS) as a nonlinear smoother of the Moran scatter plot
The smoothed curve can reveal structural breaks in global spatial autocorrelation
- Areas of high and low (or no) spatial autocorrelation
- A form of spatial heterogeneity
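The idea can be sketched with a simplified local linear smoother. This is a single pass with tricube weights, not full LOWESS (it omits the robustness iterations), and the function name, the `frac` parameter, and the kinked toy data are assumptions made for illustration. A change in the local slope along the x-axis is what a structural break would look like.

```python
import math


def local_linear_smooth(x, y, frac=0.5):
    """Tricube-weighted local line at each x; returns (fitted, slopes).

    frac is the share of points used in each local fit. Single pass,
    without the robustness iterations of full LOWESS.
    """
    n = len(x)
    k = max(3, math.ceil(frac * n))           # points per local fit
    fitted, slopes = [], []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[min(k, n) - 1] or 1e-12     # bandwidth: k-th nearest distance
        wts = [max(0.0, 1 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        sw = sum(wts)
        mx = sum(w * xi for w, xi in zip(wts, x)) / sw
        my = sum(w * yi for w, yi in zip(wts, y)) / sw
        sxx = sum(w * (xi - mx) ** 2 for w, xi in zip(wts, x))
        sxy = sum(w * (xi - mx) * (yi - my) for w, xi, yi in zip(wts, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
        slopes.append(slope)
    return fitted, slopes


# Toy Moran scatter plot with a kink at zero (assumed data):
# strong autocorrelation for low values, weak for high values.
z = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
lag = [0.8 * zi if zi < 0 else 0.1 * zi for zi in z]

fitted, slopes = local_linear_smooth(z, lag, frac=0.5)
```

Here the local slope is close to 0.8 at the left end and close to 0.1 at the right end, while a single global regression line would average the two regimes.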

The Moran scatter plot is built on cross-product statistics of pairs of observations. A non-parametric alternative looks at how those cross-products vary with distance.

Calculate the standardized cross-product (auto-covariance) of each pair of observations and plot it against the distance between them

$$\rho_{ij} = \rho(z_{i}, z_{j}) = \frac{\hat{z}_{i} \hat{z}_{j}}{\frac{1}{n} \sum_{k} (z_{k} - \bar{z})^{2}}$$

$\hat{z}_{i} = z_{i} - \bar{z}$ : deviation from the mean $\bar{z}$
$\frac{n (n - 1)}{2}$ individual values of $\rho_{ij}$ (one per unique pair among $n$ elements)

Fit the function $\rho_{ij} = g(d_{ij})$
- Use a kernel estimator / local regression
  - The result depends on the choice of kernel function and bandwidth
  - Values of the estimated $g(d_{ij})$ do not necessarily result in a valid variance-covariance matrix
- The distance at which the curve first hits 0 indicates how far the spatial interaction reaches; beyond it the curve fluctuates around 0, which is essentially noise
Problems
- As distance increases, the number of pairs of observations decreases rapidly
- These few “high-leverage” points may distort the whole pattern
- Solution: cut off the distances at a chosen maximum
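A rough sketch of the procedure above, using binned averages (a uniform kernel) as the simplest stand-in for the kernel estimator. The function name, the `bin_width` and `max_dist` parameters, and the transect data are assumptions; `max_dist` plays the role of the distance cut-off.

```python
import math
from itertools import combinations


def pairwise_correlogram(coords, values, bin_width, max_dist):
    """Average the standardized cross-products rho_ij within distance bins.

    Returns (mean distance, mean rho, number of pairs) per non-empty bin.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]              # deviations from the mean
    var = sum(d * d for d in dev) / n             # denominator of rho_ij
    n_bins = math.ceil(max_dist / bin_width)
    rho_sum = [0.0] * n_bins
    dist_sum = [0.0] * n_bins
    count = [0] * n_bins
    for i, j in combinations(range(n), 2):        # n(n-1)/2 unique pairs
        d = math.dist(coords[i], coords[j])
        if d >= max_dist:                         # cut-off: drop sparse long-distance pairs
            continue
        b = int(d // bin_width)
        rho_sum[b] += dev[i] * dev[j] / var
        dist_sum[b] += d
        count[b] += 1
    return [
        (dist_sum[b] / count[b], rho_sum[b] / count[b], count[b])
        for b in range(n_bins)
        if count[b] > 0
    ]


# Toy transect (assumed data): six points one unit apart with a smooth trend.
coords = [(float(i), 0.0) for i in range(6)]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
result = pairwise_correlogram(coords, values, bin_width=1.0, max_dist=4.0)
```

On this transect the mean cross-product is positive at distance 1 and negative at distance 3, and the pair count drops from 5 to 3 over the same span, which is the thinning that motivates the cut-off.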

Plot geographical distance on the x-axis and attribute distance on the y-axis

$$d_{ij} = \left[(x_{i} - x_{j})^{2} + (y_{i} - y_{j})^{2}\right]^{1/2}$$

$$v_{ij} = \left[\sum_{k} (z_{ki} - z_{kj})^{2}\right]^{1/2}$$

Too many points ($\frac{n (n - 1)}{2}$ pairs)
- Smooth the scatter plot
Tobler’s law: attribute distance should increase with geographical distance
Because $v_{ij}$ sums over the variables $k$, the attribute distance can also be calculated from multiple variables
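The two distances can be computed for every unique pair as in the following sketch. The function name and the three sample locations with two attributes each are illustrative assumptions; `math.dist` gives the Euclidean distance in both coordinate space and attribute space.

```python
import math
from itertools import combinations


def distance_pairs(coords, attrs):
    """Return (d_ij, v_ij) for every unique pair of locations."""
    pairs = []
    for i, j in combinations(range(len(coords)), 2):
        d_ij = math.dist(coords[i], coords[j])   # geographical distance
        v_ij = math.dist(attrs[i], attrs[j])     # attribute distance over k variables
        pairs.append((d_ij, v_ij))
    return pairs


# Hypothetical data: three locations, two attributes per location.
coords = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
attrs = [(1.0, 10.0), (1.0, 13.0), (5.0, 13.0)]

pairs = distance_pairs(coords, attrs)   # points of the distance scatter plot
```

Each tuple is one point of the scatter plot, with $d_{ij}$ on the x-axis and $v_{ij}$ on the y-axis.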

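Semi-variance $\gamma(s_{1}, s_{2})$ is half the average squared difference between the values at points $s_{1}$ and $s_{2}$. It is defined as

$$\gamma(s_{1}, s_{2}) = \frac{\sum_{v} \left(z(s_{1}) - z(s_{2})\right)^{2}}{2 V}$$

where $z(s)$ is the value at point $s$ and the sum runs over the $V$ pairs of points being averaged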
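A small sketch of the empirical semi-variance, assuming the simplest possible setting: a regularly spaced transect, so that all pairs `(i, i + lag)` share the same separation. The function name and the data are made up for illustration.

```python
def semivariance(values, lag):
    """Half the average squared difference over all pairs `lag` steps apart."""
    sq_diffs = [(values[i] - values[i + lag]) ** 2 for i in range(len(values) - lag)]
    n_pairs = len(sq_diffs)                 # V in the formula
    return sum(sq_diffs) / (2 * n_pairs)


# Assumed transect of six equally spaced observations.
values = [2.0, 3.0, 5.0, 4.0, 7.0, 8.0]
gammas = {lag: semivariance(values, lag) for lag in (1, 2, 3)}
```

For this transect the semi-variance grows with the lag, meaning nearby observations are more alike than distant ones.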

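Fit the function $\gamma(s_{1}, s_{2}) = g(h)$
- $h$ represents the geographical distance
- Common variogram models, with nugget $n$, sill $s$, and range $r$:

The exponential variogram model

$$\gamma(h) = (s - n) \left(1 - \exp\left(-\frac{h}{r a}\right)\right) + n \, 1_{(0, \infty)}(h)$$

The spherical variogram model

$$\gamma(h) = (s - n) \left(\left(\frac{3 h}{2 r} - \frac{h^{3}}{2 r^{3}}\right) 1_{(0, r)}(h) + 1_{[r, \infty)}(h)\right) + n \, 1_{(0, \infty)}(h)$$

The Gaussian variogram model

$$\gamma(h) = (s - n) \left(1 - \exp\left(-\frac{h^{2}}{r^{2} a}\right)\right) + n \, 1_{(0, \infty)}(h)$$

Here $1_{A}(h)$ is the indicator function, equal to 1 when $h$ lies in $A$ and 0 otherwise, and $a$ is a scaling constant for the range whose value varies between references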

Nugget $n$ : The height of the variogram's jump at the origin. Because of measurement error, or spatial variation at distances smaller than the sampling interval, repeated measurements at the same location may differ.
Sill $s$ : The limit of the variogram as the lag distance tends to infinity.
Range $r$ : The distance at which the difference between the variogram and the sill becomes negligible. It indicates how far spatial autocorrelation extends.
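The three model formulas can be written out as plain functions to see how nugget, sill, and range shape the curve. This is a sketch under stated assumptions: the function and parameter names are illustrative, `rng` stands for the range $r$, and the default `a = 1/3` is one convention among several, chosen here only so the example runs.

```python
import math


def exponential_variogram(h, nugget, sill, rng, a=1 / 3):
    """Exponential model; a = 1/3 is an assumed convention."""
    if h <= 0:
        return 0.0                      # indicator 1_(0, inf) is 0 at h = 0
    return (sill - nugget) * (1 - math.exp(-h / (rng * a))) + nugget


def spherical_variogram(h, nugget, sill, rng):
    """Spherical model; reaches the sill exactly at h = rng."""
    if h <= 0:
        return 0.0
    if h >= rng:
        return sill
    return (sill - nugget) * (1.5 * h / rng - 0.5 * (h / rng) ** 3) + nugget


def gaussian_variogram(h, nugget, sill, rng, a=1 / 3):
    """Gaussian model; a = 1/3 is an assumed convention."""
    if h <= 0:
        return 0.0
    return (sill - nugget) * (1 - math.exp(-(h ** 2) / (rng ** 2 * a))) + nugget


# Hypothetical parameters: nugget 1, sill 5, range 10.
lags = [0.0, 2.5, 5.0, 10.0, 20.0]
exp_curve = [exponential_variogram(h, 1.0, 5.0, 10.0) for h in lags]
sph_curve = [spherical_variogram(h, 1.0, 5.0, 10.0) for h in lags]
gau_curve = [gaussian_variogram(h, 1.0, 5.0, 10.0) for h in lags]
```

All three curves are 0 at $h = 0$, jump to roughly the nugget just above zero, and level off at the sill. Only the spherical model reaches the sill exactly at the range; the other two approach it asymptotically.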