Class 9 Lecture Demonstration Lab

Spring 2024 | UENV3200, UURB3210

Published

March 28, 2024

Mapping Vulnerability + Spatial Statistics

Mapping Vulnerability Model

Part I: Explore the SVI vulnerability dataset + Histogram and Statistics Tool

Note

The direct download of the CDC SVI dataset has been reprojected to an equidistant projected coordinate system (USA_Contiguous_Equidistant_Conic) in order to run distance-based statistical tests in this demonstration lab.
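If you are starting from the direct CDC download yourself, the reprojection can be scripted. Below is a minimal arcpy sketch, assuming hypothetical file paths and that WKID 102005 resolves to the USA_Contiguous_Equidistant_Conic coordinate system:

```python
import arcpy

# Hypothetical paths: substitute your own download location and workspace.
in_features = r"C:\data\SVI2020_US_COUNTY.shp"
out_features = r"C:\data\SVI2020_US_COUNTY_eqdc.shp"

# WKID 102005 is assumed here to resolve to USA_Contiguous_Equidistant_Conic.
equidistant_conic = arcpy.SpatialReference(102005)

# Reproject the county polygons so distance-based statistics use linear units.
arcpy.management.Project(in_features, out_features, equidistant_conic)
```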

To start, we will access the latest 2020 SVI dataset for US Counties:

CDC SVI Data Download Location

SOVI Model

SOVI Model - Themes Structure

As discussed in the 2020 CDC SOVI Model documentation, the dataset’s theme structure is scored based on percentile rank. Percentile scoring expresses, for any position within the dataset, the share of values that fall below it.

A percentile is a comparison score between a particular score and the scores of the rest of a group. It shows the percentage of scores that a particular score surpassed.

Percentile example using data histogram

Relating the percentile approach to quartile statistics, the following statements can be made:

  • The 25th percentile is also called the first quartile.
  • The 50th percentile is also known as the median.
  • The 75th percentile is also called the third quartile.
  • The difference between the third and first quartiles is the interquartile range.
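These relationships are easy to verify on any numeric column. The short Python sketch below uses made-up values rather than the SVI data:

```python
import numpy as np

# Illustrative values only; any numeric attribute column behaves the same way.
values = np.array([2.1, 4.7, 5.5, 7.3, 8.0, 9.6, 12.4, 15.0, 18.2, 42.7])

q1, median, q3 = np.percentile(values, [25, 50, 75])

print(f"25th percentile (first quartile): {q1}")
print(f"50th percentile (median):         {median}")
print(f"75th percentile (third quartile): {q3}")
print(f"Interquartile range (Q3 - Q1):    {q3 - q1}")
```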

With ArcCatalog open and pointed to the data source SVI2020_US_COUNTY, first map the RPL_THEMES attribute and explore the percentile approach in the mapping:

RPL_THEMES mapped to 10 classes using the Quantile classification method (percentile breaks)

RPL_THEMES mapped to 10 classes + Histogram

Note that the statistical summary for RPL_THEMES shows this variable is already normalized to the 0-1 range, consistent with the following CDC SOVI methods statement on its scoring:

CDC SOVI Ranking Method
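As a sketch of what this ranking method amounts to in code, assuming the CDC formula percentile rank = (rank - 1) / (n - 1) and a hypothetical miniature attribute table:

```python
import pandas as pd

# Hypothetical stand-in for the county attribute table.
df = pd.DataFrame({"EP_AGE17": [4.2, 18.9, 23.5, 12.0, 30.1]})

# Assumed CDC SVI convention: percentile rank = (rank - 1) / (n - 1),
# so the lowest value scores 0 and the highest scores 1.
ranks = df["EP_AGE17"].rank(method="min")
df["EPL_AGE17"] = (ranks - 1) / (len(df) - 1)

print(df)
```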

Part II: Min-Max Normalization

Following the statistical exploration of RPL_THEMES, we will turn to the variable EP_AGE17, the estimated percentage of persons aged 17 and younger. Note that this is not an absolute population count; it has already been processed through a lower-level normalization. Run the Geostatistical Analyst Histogram tool on the variable EP_AGE17:

EP_AGE17 variable statistical summary

From the Statistical Summary, the attribute EP_AGE17 has the following minimum and maximum values:

  • MIN = 0.2
  • MAX = 42.7

Values for this attribute therefore range from 0.2 up to 42.7.

A typical approach to normalizing an array of values such as EP_AGE17 is the technique of Min-Max normalization.

Min-Max normalization is a scaling technique in which values are shifted and rescaled so that they end up ranging between 0 and 1.

The following equation is utilized for this method of normalization:

Min Max Normalization Equation

Using the formula, the following calculation can be applied in the Field Calculator to a new field titled MM_17 (type: Double):


$$X_{new} = \frac{X - X_{min}}{X_{max} - X_{min}}$$

Translated to the EP_AGE17 attribute:


([EP_AGE17]-0.2)/(42.7-0.2)
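The same calculation can be sketched in Python with pandas. The values below are hypothetical stand-ins for the county table, and the minimum and maximum are derived from the data rather than hard-coded:

```python
import pandas as pd

# Hypothetical values standing in for the EP_AGE17 column.
df = pd.DataFrame({"EP_AGE17": [0.2, 12.5, 23.5, 30.1, 42.7]})

# Min-Max normalization: rescale EP_AGE17 into the 0-1 range.
x_min, x_max = df["EP_AGE17"].min(), df["EP_AGE17"].max()
df["MM_17"] = (df["EP_AGE17"] - x_min) / (x_max - x_min)

print(df)
```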

Min-Max normalization can also target a custom range, e.g. 1 to 10 as opposed to the 0 to 1 range used thus far. The formula for this normalization is as follows:

$$X_{new} = a + \frac{(X - X_{min})(b - a)}{X_{max} - X_{min}}$$

Here a and b are the minimum and maximum of the desired target range (e.g., a = 1 and b = 10), while X min and X max remain the minimum and maximum values of the attribute array.
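A minimal sketch of the custom-range version, again on hypothetical values, rescaling EP_AGE17 into a 1 to 10 range:

```python
import pandas as pd

# Hypothetical values standing in for the EP_AGE17 column.
df = pd.DataFrame({"EP_AGE17": [0.2, 12.5, 23.5, 30.1, 42.7]})
x_min, x_max = df["EP_AGE17"].min(), df["EP_AGE17"].max()

# a and b are the bounds of the desired target range, here 1 to 10.
a, b = 1.0, 10.0
df["MM_17_1_10"] = a + (df["EP_AGE17"] - x_min) * (b - a) / (x_max - x_min)

print(df)
```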

Part IV - Clusters of Similarity

The purpose of normalization is typically to bring multiple variables with different measurement scales into a consistent range for a final scoring mechanism. We can hand the final scoring over to a clustering algorithm that compares the values in one variable column to those in the other columns, producing groupings, or clusters, of similar records. The disadvantage of this approach is that the resulting clusters show no directionality on the map: the categorical result of the clustering process does not reveal the type of relationship among the variables within each cluster, only that values are of similar pattern inside a cluster and dissimilar outside it.

With clustering, also known as grouping analysis, we may want to find cluster patterns between two variables, as follows (a conceptual code sketch appears after the list):

  • Housing in structures with 10 or more units - EP_MUNIT
  • Persons aged 17 and younger - EP_AGE17
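For intuition: with no spatial constraint, this kind of grouping behaves much like k-means clustering on the attribute values. The scikit-learn sketch below illustrates the idea on hypothetical, already-normalized values; it is an illustration, not the ArcGIS implementation:

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical, already-normalized values for the two SVI variables.
df = pd.DataFrame({
    "EP_MUNIT": [0.05, 0.10, 0.80, 0.75, 0.15, 0.90],
    "EP_AGE17": [0.70, 0.65, 0.20, 0.25, 0.60, 0.10],
})

# Partition the records into two clusters of similar attribute values.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
df["GROUP"] = kmeans.fit_predict(df[["EP_MUNIT", "EP_AGE17"]])

print(df)
```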

In ArcGIS there are robust clustering tools that utilize several algorithms. For this demonstration we will use the Grouping Analysis tool:

Attribute Based Clustering

Next, populate the tool with the following parameters:

Attribute Based Clustering Parameters

Attribute Based Clustering Parameters at tool interface
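The same run can also be scripted with arcpy. The sketch below is an assumption-laden illustration: the parameter order is an assumption based on the ArcMap Spatial Statistics toolbox, and the paths and FID unique ID field are placeholders:

```python
import arcpy

arcpy.env.workspace = r"C:\data"  # hypothetical workspace

# Grouping Analysis with no spatial constraint = attribute-based clustering.
# Parameter order is an assumption based on the Spatial Statistics toolbox.
arcpy.stats.GroupingAnalysis(
    "SVI2020_US_COUNTY.shp",   # input features
    "FID",                     # unique ID field (placeholder assumption)
    "SVI2020_groups.shp",      # output feature class
    2,                         # number of groups
    "EP_MUNIT;EP_AGE17",       # analysis fields
    "NO_SPATIAL_CONSTRAINT",   # compare attribute values only
)
```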

Grouping Analysis Results

Box Plot Directionality

Part V - Hot Spots and Cold Spots

The phrase Hot Spot Analysis refers to a family of statistical methods for determining clustering of ‘like’ values, both low and high. This can help find spatially and statistically significant clusters of similarly valued geographies.

Hot Spot Analysis evaluates where either high or low values cluster spatially within a dataset. These tools work by looking at each feature within the context of its neighboring features. A feature with a high value is interesting but may not be a statistically significant hot spot. To be a statistically significant hot spot, a feature must have a high value and be surrounded by other features with high values as well; the converse is true for cold spots with low values.
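Under the hood, these tools compute the Getis-Ord Gi* statistic for each feature. The NumPy sketch below illustrates the statistic on a toy dataset with a hand-built binary neighbor matrix; it is an illustration of the formula, not the ArcGIS implementation:

```python
import numpy as np

def getis_ord_gi_star(x, w):
    """Return Gi* z-scores for values x given a binary weights matrix w.

    w[i, j] = 1 when feature j neighbors feature i; each feature
    includes itself in its own neighborhood (the "star" variant).
    """
    n = len(x)
    x_bar = x.mean()
    s = np.sqrt((x ** 2).mean() - x_bar ** 2)

    wx = w @ x                    # sum of values in each neighborhood
    w_sum = w.sum(axis=1)         # total weight per feature
    w_sq = (w ** 2).sum(axis=1)   # sum of squared weights per feature

    numerator = wx - x_bar * w_sum
    denominator = s * np.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
    return numerator / denominator

# Toy example: 5 features in a row, each neighboring itself and its
# immediate neighbors. The high values in the middle form a "hot" run.
x = np.array([1.0, 2.0, 9.0, 8.0, 1.5])
w = np.array([[1, 1, 0, 0, 0],
              [1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0],
              [0, 0, 1, 1, 1],
              [0, 0, 0, 1, 1]], dtype=float)

print(getis_ord_gi_star(x, w))  # large positive z-scores flag hot spots
```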

To start, navigate to the Optimized Hot Spot Analysis tool:

Optimized Hot Spot Analysis Tool

Next, populate the tool as follows:

Optimized Hot Spot Analysis Tool Parameters
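For repeatable runs, the tool can also be invoked from arcpy; a minimal sketch with hypothetical paths:

```python
import arcpy

arcpy.env.workspace = r"C:\data"  # hypothetical workspace

# Optimized Hot Spot Analysis selects a neighborhood distance automatically
# and writes Gi* z-scores, p-values, and confidence bins to the output.
arcpy.stats.OptimizedHotSpotAnalysis(
    "SVI2020_US_COUNTY.shp",   # input features
    "SVI2020_hotspots.shp",    # output features
    "EP_AGE17",                # analysis field
)
```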

Optimized Hot Spot Analysis Results based on EP_AGE17
