9. Exploratory Data Analysis#
9.1. Basic Statistical Graphs#
9.1.1. Univariate Distribution#
Histogram
Bin / bandwidth
Box plot
9.1.2. Bivariate Distribution#
Scatter plot
Lowess Smoother
Local Regression
Slope based on a subset of observations
For each \(x_i\),\(y_i\), fit based on (x,y) with x in neighborhood of \(x_i\)
Choice of bandwidth
Short
bandwidth
yields spiky curveWide
bandwidth
yields smoother curve
9.2. Spatial Heterogeneity#
Structural breaks
in the data with a spatial imprintDifferent distributions in different subregions a. Different
mean
,median
Change in bivariate linear relationship
Structural beak in slope (
Chow Test
)
9.3. Tools for Spatial Heterogeneity Analysis (Bivariate)#
9.3.1. Averages Chart#
Test on difference in mean
Selected and Unselected
Spatial selection
Test is
F-statistic
from dummy variable regressionTest the coefficient of dummy variable is significant or not
9.3.2. Brushing the Scatter Plot#
A brush is a selection shape
Two slopes:
selected
andunselected
As the brush moves, the slopes are recalculated in a dynamic way =
dynamic brushing
The matching observations in other windows are also selection
Dynamic brushing and linking
Chow Test on Homogeneity of slopes
Hypothesis test on equality of slopes
Overall regression slope
Slope for selected
Slope for unselected
Link map brushing with Chow Test
Insight into spatial heterogeneity
9.4. Multivariate EDA#
9.4.1. Objective#
Represent multi-dimensional data in two dimensions i. Dimension reduction ii. Projection
Discover Structure, interaction, patterns
Exploratory methods do not explain, it suggest
hypotheses
9.5. Methods for multivariate EDA#
9.5.1. Bubble Chart#
e.g. kids in family vs public assistants number vs high rent (sd)
Enhanced Scatter Plot
Size of bubble : third variable
Color of bubble: Fourth variable
Explore interaction among variables
Explore structural breaks
9.5.2. 3D Scatter Plot#
Points in a 3D data cube
Two-Dimensional analysis on side panels
Issues of perspective
Zooming, Rotating
Brushing the 3D data Cube
9.5.3. Parallel coordinate plot(PCP)#
Variables
One parallel line for each variable
Outliers are far from the main pack
Observations
A line connecting points on parallels
The line is the counterpart of a point in the multidimensional data cube
9.5.4. Conditional Plots#
Interpretation of Conditional Plots
Micro plots are similar, no effect of conditioning variables
Micro plots are different, effect of conditioning variables