Mapping and Spatial Analysis in R

A Tutorial for the UCL R User Group

Author

Justin Yang

Published

October 2, 2025

Overview

Spatial data analysis in R brings together several different types of tasks: reading data from a variety of formats, manipulating it to create the attributes you need, visualising it through maps, and applying statistical techniques that account for spatial structure. The aim of this tutorial is to introduce each of these steps in a way that is accessible to beginners while still being useful for those with some prior experience in R.

We will start by working with vector data (points, lines, and polygons), then move on to raster data (continuous surfaces such as elevation or temperature). Along the way, we will explore how to produce both static and interactive maps, and then carry out some introductory forms of spatial analysis, including tests for spatial autocorrelation and geostatistical techniques.

The tutorial does not assume prior experience with spatial analysis, but it does assume some familiarity with R. The examples are designed to be reproducible and to build a foundation that you can adapt to your own datasets. By the end, you should feel more confident about:

  • Importing and exploring spatial data in R.
  • Creating high-quality maps to visualise your data.
  • Understanding when and why spatial statistical methods are needed.
  • Applying some of the most commonly used exploratory and modelling techniques.
What You’ll Need
  • R and RStudio installed on your computer
  • Basic familiarity with R
  • Curiosity about spatial patterns in your research domain

1 Setup

1.1 Installing Required Packages

Code
install.packages(c(
  "sf",            # Vector data
  "terra",         # Raster data  
  "tmap",          # Thematic mapping
  "leaflet",       # Interactive maps
  "dplyr",         # Data manipulation
  "ggplot2",       # Plotting
  "rnaturalearth", # World data
  "spdep",         # Spatial statistics
  "gstat",         # Spatial interpolation
  "spatstat"       # Point pattern analysis
))

1.2 Loading Libraries and Settings

Code
library(sf)
library(terra)
library(tmap)
library(dplyr)
library(ggplot2)
library(leaflet)
library(rnaturalearth)

# Set tmap mode for static plots
tmap_mode("plot")

# Suppress scientific notation for cleaner output
options(scipen = 999)

1.3 Creating Sample Data

Code
# Create example cities dataset
cities <- data.frame(
  name = c("Toronto", "Montreal", "Vancouver", "Calgary", "Ottawa"),
  population = c(2930000, 1780000, 631000, 1336000, 994837),
  longitude = c(-79.38, -73.57, -123.12, -114.07, -75.70),
  latitude = c(43.65, 45.50, 49.28, 51.05, 45.42),
  province = c("Ontario", "Quebec", "British Columbia", "Alberta", "Ontario")
)

# Convert to spatial data
cities_sf <- st_as_sf(cities,
                      coords = c("longitude", "latitude"),
                      crs = 4326)  # WGS84 coordinate system

# Load world country data
world <- ne_countries(scale = "medium", returnclass = "sf")

# Quick preview of our data
print(cities_sf)
Simple feature collection with 5 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -123.12 ymin: 43.65 xmax: -73.57 ymax: 51.05
Geodetic CRS:  WGS 84
       name population         province              geometry
1   Toronto    2930000          Ontario  POINT (-79.38 43.65)
2  Montreal    1780000           Quebec   POINT (-73.57 45.5)
3 Vancouver     631000 British Columbia POINT (-123.12 49.28)
4   Calgary    1336000          Alberta POINT (-114.07 51.05)
5    Ottawa     994837          Ontario   POINT (-75.7 45.42)

2 Understanding Spatial Data

2.1 What Makes Data “Spatial”?

Not all data is spatial, but a great deal of the information we work with has a geographic dimension. What makes data spatial is that each observation is tied to a location, either through coordinates (such as latitude and longitude) or through a defined area (such as a district, postcode, or country). This spatial reference allows us to place the data on a map and to consider relationships in terms of distance, proximity, and adjacency.

Tobler’s First Law of Geography

“Everything is related to everything else, but near things are more related than distant things.”

2.2 Spatial Data Detective

Your Task: For each dataset below, decide if it contains spatial information and explain how location might affect the analysis:

Dataset A: Customer purchase records with postal codes
Dataset B: Stock market prices over time
Dataset C: Twitter sentiment analysis with GPS coordinates
Dataset D: University enrolment numbers by year
Dataset E: Air quality readings from monitoring stations

Dataset A: Customer purchase records with postal codes (SPATIAL)

  • Spatial component: Postal codes can be geocoded to locations
  • Analysis opportunities: Market area analysis, customer clustering, accessibility studies, delivery optimisation

Dataset B: Stock market prices over time (NOT INHERENTLY SPATIAL)

  • Why not: Time series data without geographic component
  • Could become spatial: If you had stock exchange locations, regional economic indicators, or investor geography

Dataset C: Twitter sentiment with GPS coordinates (SPATIAL)

  • Spatial component: GPS coordinates provide exact locations
  • Analysis opportunities: Sentiment hotspots, geographic opinion mapping, event location analysis

Dataset D: University enrolment by year (NOT INHERENTLY SPATIAL)

  • Why not: Just temporal trends without location
  • Could become spatial: If you added student home addresses, university locations, or regional enrolment patterns

Dataset E: Air quality monitoring stations (SPATIAL)

  • Spatial component: Monitoring stations have fixed locations
  • Analysis opportunities: Pollution interpolation, exposure mapping, environmental justice analysis

2.3 Vector vs. Raster Data Models

Spatial data in R comes in two main forms: vector and raster. Vector data represents discrete objects such as points (e.g. hospital locations), lines (e.g. roads), or polygons (e.g. administrative boundaries). Raster data, by contrast, represents continuous surfaces such as temperature, elevation, or satellite imagery.

Code
# Vector data: Discrete features with precise boundaries
print("Vector data examples:")
[1] "Vector data examples:"
Code
print(paste("Cities (points):", nrow(cities_sf), "features"))
[1] "Cities (points): 5 features"
Code
print(paste("Countries (polygons):", nrow(world), "features"))
[1] "Countries (polygons): 242 features"
Code
# Visualise vector data
tm_shape(world) +
  tm_borders(col = "gray") +
  tm_shape(cities_sf) +
  tm_symbols(
    col = "red",
    size = "population",
    size.scale = tm_scale_continuous(values.scale = 2),
    title.size = "Population"
  ) +
  tm_title("Vector Data: Discrete Features")

Code
# Raster data: Continuous grid of values
# Create example elevation raster
elevation <- rast(
  ncols = 100,
  nrows = 100,
  xmin = -180,
  xmax = 180,
  ymin = -90,
  ymax = 90
)

# Simulate realistic elevation pattern
set.seed(123)
coords <- crds(elevation)
distances_to_mountains <- sqrt((coords[, 1] - 0)^2 + (coords[, 2] - 0)^2)
values(elevation) <- pmax(0, 3000 * exp(-distances_to_mountains / 30) +
                            rnorm(ncell(elevation), 0, 200))

plot(elevation, main = "Raster Data: Continuous Grid Values")

When to Use Vector vs. Raster

Vector data is ideal for:

  • Administrative boundaries (countries, provinces)
  • Infrastructure (roads, buildings, utilities)
  • Point locations (cities, sample sites, customers)
  • Discrete phenomena with clear boundaries

Raster data is ideal for:

  • Environmental variables (temperature, precipitation, elevation)
  • Satellite imagery and remote sensing data
  • Continuous phenomena across space
  • Modelling and analysis requiring regular grids

2.4 Coordinate Reference Systems (CRS)

2.4.1 What is a CRS?

A CRS defines how spatial data relates to real locations on Earth. Since Earth is round but maps are flat, we need mathematical transformations to project 3D coordinates onto 2D surfaces. Common types of CRS include:

  • Geographic CRS: Uses latitude/longitude (e.g. WGS84, EPSG:4326)
  • Projected CRS: Uses a flat coordinate system (e.g. UTM, EPSG:32617)
  • Local CRS: Specific to a region (e.g. Statistics Canada Lambert, EPSG:3347)

Using the correct CRS is crucial because:

  • It ensures spatial data aligns correctly on maps
  • It allows accurate distance and area calculations
  • It enables spatial operations such as buffering, intersection, and joins

Code
# Check the CRS of our data
cat("World countries CRS:\n")
World countries CRS:
Code
st_crs(world)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
Code
cat("\nCities CRS:\n")

Cities CRS:
Code
st_crs(cities_sf)
Coordinate Reference System:
  User input: EPSG:4326 
  wkt:
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        MEMBER["World Geodetic System 1984 (G2296)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]
Code
# Demonstrate projection effects
world_mercator <- st_transform(world, 3857)  # Web Mercator
world_mollweide <- st_transform(world, "+proj=moll")  # Mollweide

# Compare projections
mercator_map <- tm_shape(world_mercator) + tm_borders() +
  tm_title("Web Mercator\n(distorts area)")

mollweide_map <- tm_shape(world_mollweide) + tm_borders() +
  tm_title("Mollweide\n(preserves area)")

tmap_arrange(mercator_map, mollweide_map, ncol = 2)

Code
# Comparing distance calculations in geographic vs projected CRS
toronto <- cities_sf[cities_sf$name == "Toronto", ]

# With a geographic CRS, sf returns geodesic (ellipsoidal) distances in metres
montreal <- cities_sf[cities_sf$name == "Montreal", ]
dist_geodesic <- st_distance(toronto, montreal)
cat("Geodesic distance in metres:", dist_geodesic, "\n")
Geodesic distance in metres: 503922.1 
Code
# Planar distance in a projected CRS
toronto_utm <- st_transform(toronto, 32617)  # UTM Zone 17N
montreal_utm <- st_transform(montreal, 32617)
dist_meters <- st_distance(toronto_utm, montreal_utm)
cat("Projected distance:", round(as.numeric(dist_meters) / 1000), "km\n")
Projected distance: 506 km
Key CRS considerations

2.5 When to use different CRS

  • 4326 (WGS84): Global data, web mapping, data storage
  • Projected CRS: Measurements, area calculations, buffers
  • Local CRS: Region-specific analysis for maximum accuracy

2.6 Common transformations

  • st_transform(data, 4326) - Convert to lat/lng
  • st_transform(data, 3857) - Convert to Web Mercator
  • st_transform(data, 32617) - Convert to UTM (Eastern North America)
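
As a quick check of these transformations, here is a minimal sketch that round-trips the cities_sf object created in Section 1 between CRSs and confirms the EPSG code at each step:

Code
# Round-trip cities_sf between CRSs and confirm the EPSG code each time
cities_mercator <- st_transform(cities_sf, 3857)    # to Web Mercator (metres)
st_crs(cities_mercator)$epsg                        # should report 3857

cities_back <- st_transform(cities_mercator, 4326)  # back to longitude/latitude
st_crs(cities_back)$epsg                            # should report 4326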

3 Mapping

3.1 The Grammar of Maps

Just as ggplot2 uses a grammar of graphics, tmap provides a grammar of maps with consistent syntax for different map elements. This allows you to build complex maps step by step, adding layers and customising styles in a structured way.

3.2 Key Functions

  • tm_shape(): Defines the spatial data to be plotted
  • tm_polygons(): Adds polygon layers (e.g. countries, regions)
  • tm_lines(): Adds line layers (e.g. roads, rivers)
  • tm_symbols(): Adds point layers (e.g. cities, landmarks)
  • tm_text(): Adds text labels to features
  • tm_layout(): Customises overall map appearance (title, legend, etc.)

3.3 Basic Mapping

Objective: Create your first spatial visualisation

Code
# Start simple
plot(world["pop_est"])  # Base R quick plot

Code
# Better with tmap
tm_shape(world) +
  tm_polygons(
    "pop_est",
    fill.legend = tm_legend("Population"),
    fill.scale = tm_scale("viridis")
  ) +
  tm_title("World Population 2020")

3.4 Mapping Challenge

Choose your experience level:

Objective: Create a choropleth map with custom styling

Code
# Create choropleth map of GDP
tm_shape(world) +
  tm_polygons(
    fill = "gdp_md",
    fill.scale = tm_scale_intervals(values = "brewer.yl_or_rd", style = "quantile"),
    fill.legend = tm_legend(title = "GDP (millions USD)")
  ) +
  tm_title("World GDP Distribution", position = tm_pos_out("center", "top"))

Key Concepts Learned:

  • tm_shape() + tm_polygons() syntax
  • Classification methods (style = "quantile")
  • Color palettes (values = "brewer.yl_or_rd")
  • Layout customisation

Objective: Add multiple layers and custom styling

Code
# Enhanced map with multiple elements
tm_shape(world) +
  tm_polygons(
    fill = "gdp_md",
    fill.scale  = tm_scale_intervals(
      values = "-brewer.rd_yl_bu",
      midpoint = NA,
      style = "fisher",
      n = 5
    ),
    fill.legend = tm_legend(title = "GDP (millions USD)"),
    col  = "white",
    lwd  = 0.5
  ) +
  tm_shape(cities_sf) +
  tm_symbols(
    size        = "population",
    size.scale  = tm_scale_continuous(values.scale = 2),
    size.legend = tm_legend(title = "City Population"),
    fill        = "red",
    fill_alpha  = 0.7,
    col         = "white",
    lwd         = 0.3
  ) +
  tm_text(text = "name", ymod = 1.2, size = 0.7) +
  tm_title("World GDP with Major Cities") +
  tm_layout(frame = FALSE,
            legend.position = c("left", "bottom"))

Key Concepts Learned:

  • Multiple shape layers
  • Different classification methods
  • Symbol scaling by attributes
  • Text labels and positioning

Objective: Create faceted maps with complex styling

Code
# Create continental analysis
world_analysis <- world %>%
  dplyr::filter(
    !is.na(gdp_md),
    !is.na(continent),
    continent != "Seven seas (open ocean)"
  ) %>%
  dplyr::mutate(
    continent = dplyr::case_when(
      continent == "North America" ~ "N. America",
      continent == "South America" ~ "S. America",
      TRUE ~ continent
    ),
    gdp_per_capita = (gdp_md * 1e6) / pop_est
  )

# Faceted map by continent (v4)
tm_shape(world_analysis) +
  tm_polygons(
    fill = "gdp_per_capita",
    fill.scale  = tm_scale_intervals(
      values = "viridis",
      style = "fisher"
    ),
    fill.legend = tm_legend(title = "GDP per capita\n(USD)"),
    col  = "white"   # (v4: border.col -> col)
  ) +
  tm_facets(by = "continent", free.coords = TRUE, ncol = 3) +
  tm_title("GDP per capita by continent") +
  tm_layout(
    panel.label.size = 1.2,
    panel.label.bg.color = "black",
    panel.label.color = "white"
  )

Code
# Summary statistics (unchanged)
gdp_summary <- world_analysis %>%
  sf::st_drop_geometry() %>%
  dplyr::group_by(continent) %>%
  dplyr::summarise(
    countries = dplyr::n(),
    mean_gdp_pc = mean(gdp_per_capita, na.rm = TRUE),
    median_gdp_pc = median(gdp_per_capita, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  dplyr::arrange(dplyr::desc(mean_gdp_pc))

print(gdp_summary)
# A tibble: 7 × 4
  continent  countries mean_gdp_pc median_gdp_pc
  <chr>          <int>       <dbl>         <dbl>
1 Antarctica         1     200000        200000 
2 Europe            50      37976.        29710.
3 N. America        38      23298.        16190.
4 Asia              53      14810.         5476.
5 S. America        13      14704.         6978.
6 Oceania           25      13553.         6174.
7 Africa            54       2125.         1337.

Key Concepts Learned:

  • Data transformation for mapping
  • Faceted maps for comparative analysis
  • Advanced layout customisation
  • Integrating statistical summaries

4 Spatial Operations

We’ll focus on three fundamental spatial operations that solve most spatial analysis problems:

  1. Buffer: “What’s within X distance?”
  2. Intersect: “What overlaps with what?”
  3. Spatial Join: “How can we connect spatial features based on location?”

These operations allow us to manipulate and analyse spatial data effectively, answering questions about proximity, overlap, and relationships between different spatial features.

Code
# Load Canadian administrative boundaries
canada <- world[world$name == "Canada", ]
provinces <- ne_states(country = "canada", returnclass = "sf")

# 1. BUFFERS - Create 100km zones around cities
cities_utm <- st_transform(cities_sf, crs = 3347)  # Project to UTM for accurate distances
city_buffers <- st_buffer(cities_utm, dist = 100000)  # 100km

# 2. INTERSECTIONS - Find which provinces contain cities
cities_provinces <- st_intersection(cities_sf, provinces)

# 3. SPATIAL JOINS - Add province information to cities
cities_with_provinces <- st_join(cities_sf, provinces)

# Visualise results
tm_shape(provinces) +
  tm_borders(col = "gray") +
  tm_shape(st_transform(city_buffers, 4326)) +
  tm_borders(col = "blue", lwd = 2) +
  tm_fill(col = "blue", fill_alpha = 0.2) +
  tm_shape(cities_sf) +
  tm_symbols(col = "red", size = 1) +
  tm_text("name", ymod = 1.2) +
  tm_title("Cities with 100km Buffers")

5 Spatial Relationships

Spatial predicates are functions that test relationships between geometries, returning TRUE or FALSE. Think of them as questions you can ask about spatial features: “Does this city lie within this county?” or “Do these roads intersect?” These simple questions become powerful analytical tools when applied systematically across datasets.

In R, we can test these relationships using functions from the sf package. Here are some common spatial predicates:

  • st_intersects(): “Do these features overlap in any way?” (broadest, most commonly used)
  • st_within(): “Is feature A completely inside feature B?” (strict containment)
  • st_contains(): Reverse of within - “Does feature A completely contain feature B?”
  • st_touches(): “Do features share a boundary but not interior space?” (adjacent polygons)
  • st_crosses(): “Do lines intersect?” (roads crossing administrative boundaries)
  • st_overlaps(): “Do features share some but not all space?” (partially overlapping regions)

5.1 Spatial Predicates

Objective: Learn to test and understand spatial relationships

Code
# Load Canadian provinces for relationship testing
provinces <- ne_states(country = "canada", returnclass = "sf")

# Test different spatial relationships
cat("Spatial Relationship Tests:\n")
Spatial Relationship Tests:
Code
# Which cities are within provinces?
within_results <- st_within(cities_sf, provinces)
cat("Cities within provinces:\n")
Cities within provinces:
Code
for (i in 1:nrow(cities_sf)) {
  province_idx <- within_results[[i]]
  if (length(province_idx) > 0) {
    cat("  ", cities_sf$name[i], "is within", provinces$name[province_idx], "\n")
  } else {
    cat("  ",
        cities_sf$name[i],
        "is not within any province (likely boundary/water)\n")
  }
}
   Toronto is within Ontario 
   Montreal is within Québec 
   Vancouver is within British Columbia 
   Calgary is within Alberta 
   Ottawa is within Ontario 
Code
# Test intersections (broader than within)
intersects_results <- st_intersects(cities_sf, provinces)
cat("\nIntersection test (should match within for points):\n")

Intersection test (should match within for points):
Code
cat("Cities intersecting provinces:",
    sum(lengths(intersects_results) > 0),
    "out of",
    nrow(cities_sf),
    "\n")
Cities intersecting provinces: 5 out of 5 
Code
# Create buffers to test more relationships
cities_projected <- st_transform(cities_sf, 3347)  # Canada Lambert
city_buffers <- st_buffer(cities_projected, dist = 50000)  # 50km buffers
city_buffers_geo <- st_transform(city_buffers, 4326)

# Test overlaps between buffers
overlap_matrix <- st_overlaps(city_buffers_geo)
cat("\nCity buffer overlaps:\n")

City buffer overlaps:
Code
for (i in 1:nrow(cities_sf)) {
  overlapping <- overlap_matrix[[i]]
  if (length(overlapping) > 0) {
    overlapping_names <- cities_sf$name[overlapping]
    cat(
      "  ",
      cities_sf$name[i],
      "buffer overlaps with:",
      paste(overlapping_names, collapse = ", "),
      "\n"
    )
  }
}

# Visualize relationships
tm_shape(provinces) +
  tm_borders(col = "gray") +
  tm_shape(city_buffers_geo) +
  tm_borders(col = "blue", lwd = 2) +
  tm_fill(col = "blue", fill_alpha = 0.2) +
  tm_shape(cities_sf) +
  tm_symbols(col = "red", size = 1) +
  tm_text("name", ymod = 1.2) +
  tm_title("Spatial Relationships: Cities, Buffers, and Provinces")

5.2 CRS in Spatial Operations

Coordinate Reference Systems (CRS) are crucial for spatial operations. If your data is in geographic coordinates (degrees), operations such as buffering can give misleading results when the distance is interpreted in degrees rather than metres. Project your data to an appropriate CRS before performing operations that depend on planar distances or areas.

Code
# Demonstrate CRS effects on buffers
toronto <- cities_sf[cities_sf$name == "Toronto", ]

# Wrong: Buffer in geographic coordinates (degrees)
buffer_degrees <- st_buffer(toronto, dist = 0.1)  # 0.1 degrees
area_degrees <- st_area(buffer_degrees)

# Right: Buffer in projected coordinates (meters)
toronto_projected <- st_transform(toronto, 3347)  # Statistics Canada Lambert
buffer_meters <- st_buffer(toronto_projected, dist = 11000)  # 11km (approximately 0.1 degrees at Toronto latitude)
area_meters <- st_area(buffer_meters)

# Transform back for comparison
buffer_meters_geo <- st_transform(buffer_meters, 4326)

cat("Buffer Comparison:\n")
Buffer Comparison:
Code
cat("Geographic buffer area:",
    round(as.numeric(area_degrees), 2),
    "square degrees\n")
Geographic buffer area: 0.03 square degrees
Code
cat("Projected buffer area:",
    round(as.numeric(area_meters) / 1000000, 2),
    "square kilometers\n")
Projected buffer area: 379.96 square kilometers
Code
# Visualize the difference
comparison_map <- tm_shape(buffer_degrees) +
  tm_borders(col = "red", lwd = 3) +
  tm_fill(fill = "red", fill_alpha = 0.3) +
  tm_shape(buffer_meters_geo) +
  tm_borders(col = "blue", lwd = 3) +
  tm_fill(fill = "blue", fill_alpha = 0.3) +
  tm_shape(toronto) +
  tm_symbols(col = "black", size = 2) +
  tm_title("Buffer Comparison: Geographic (red) vs Projected (blue)") +
  tm_layout(legend.show = FALSE)

print(comparison_map)

Code
# Distance calculation comparison
montreal <- cities_sf[cities_sf$name == "Montreal", ]

# Geographic distance (ellipsoidal - accurate)
dist_geographic <- st_distance(toronto, montreal)

# Projected distance
montreal_projected <- st_transform(montreal, 3347)
dist_projected <- st_distance(toronto_projected, montreal_projected)

cat("\nDistance Calculations:\n")

Distance Calculations:
Code
cat("Geographic CRS distance:", round(as.numeric(dist_geographic) / 1000, 1), "km\n")
Geographic CRS distance: 503.9 km
Code
cat("Projected CRS distance:", round(as.numeric(dist_projected) / 1000, 1), "km\n")
Projected CRS distance: 514.8 km
Code
cat("Difference:", round(abs(
  as.numeric(dist_geographic) - as.numeric(dist_projected)
) / 1000, 1), "km\n")
Difference: 10.9 km

5.3 Real-World Problem Solving

Scenario: A coffee chain wants to expand in Canada while avoiding oversaturated markets.

Business Rules:

  • Don’t locate within 25km of existing competitors
  • Prioritise high-population areas
  • Consider demographic and economic factors

Your Tasks:

  1. Novice:
    • Create exclusion zones around existing competitors
    • Identify cities outside these zones
    • Visualise suitable locations
  2. Intermediate:
    • Complete novice tasks
    • Calculate market opportunity metrics
    • Rank locations by attractiveness
  3. Advanced:
    • Complete intermediate tasks
    • Build multi-criteria decision model
    • Optimise network of new locations

Data Setup:

Code
# Simulate existing competitor locations
set.seed(456)
competitors <- cities_sf[sample(nrow(cities_sf), 3), ] %>%
  mutate(brand = "Competitor Coffee")

print("Existing competitor locations:")
[1] "Existing competitor locations:"
Code
print(competitors[c("name", "brand")])
Simple feature collection with 3 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -123.12 ymin: 43.65 xmax: -75.7 ymax: 49.28
Geodetic CRS:  WGS 84
       name             brand              geometry
5    Ottawa Competitor Coffee   POINT (-75.7 45.42)
1   Toronto Competitor Coffee  POINT (-79.38 43.65)
3 Vancouver Competitor Coffee POINT (-123.12 49.28)

Step 1: Create Exclusion Zones

Code
# Project to appropriate CRS for Canada (meters)
canada_crs <- 3347  # Statistics Canada Lambert
cities_proj <- sf::st_transform(cities_sf, canada_crs)
competitors_proj <- sf::st_transform(competitors, canada_crs)

# Create 25 km exclusion zones (in metres)
exclusion_zones <- sf::st_buffer(competitors_proj, dist = 25000)

# Visualise exclusion zones
tm_shape(st_as_sfc(st_bbox(
  c(
    xmin = -160,
    xmax = -40,
    ymin = 35,
    ymax = 85
  ), crs = 4326
))) + tm_polygons(col = NA, fill_alpha = 0) +
  tm_shape(st_transform(exclusion_zones, 4326)) +
  tm_borders(col = "red", lwd = 2) +
  tm_fill(fill = "red", fill.alpha = 0.3) +
  tm_shape(cities_sf) +
  tm_symbols(
    size        = "population",
    size.scale  = tm_scale_continuous(values.scale = 2),
    size.legend = tm_legend(title = "Population"),
    fill        = "blue",
    col         = "white",
    lwd         = 0.5
  ) +
  tm_shape(competitors) +
  tm_symbols(size = 1.5,
             shape = 4,
             col = "red") +
  tm_title("Coffee Shop Expansion: Exclusion Zones") +
  tm_view(bbox = st_bbox(c(
    xmin = -160,
    xmax = -40,
    ymin = 35,
    ymax = 85
  ), crs = 4326)) 

Step 2: Find Available Cities

Code
# Find cities outside exclusion zones
exclusion_union <- st_union(exclusion_zones)
available_cities <- st_difference(cities_proj, exclusion_union)

cat("Cities available for expansion:", nrow(available_cities))
Cities available for expansion: 2
Code
cat("\nCities excluded:", nrow(cities_proj) - nrow(available_cities))

Cities excluded: 3
Code
# Show available cities
available_cities_wgs84 <- st_transform(available_cities, 4326)
print(available_cities_wgs84[c("name", "population")])
Simple feature collection with 2 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -114.07 ymin: 45.5 xmax: -73.57 ymax: 51.05
Geodetic CRS:  WGS 84
      name population              geometry
2 Montreal    1780000   POINT (-73.57 45.5)
4  Calgary    1336000 POINT (-114.07 51.05)

Step 3: Market Opportunity Analysis

Code
# Calculate distance to nearest competitor for each available city
distances_to_competitors <- st_distance(available_cities, competitors_proj)
min_competitor_distance <- apply(distances_to_competitors, 1, min)

# Create market opportunity metrics
available_cities <- available_cities %>%
  mutate(
    distance_to_competitor = as.numeric(min_competitor_distance),
    market_opportunity = population / (distance_to_competitor / 1000),  # Pop per km
    opportunity_rank = rank(-market_opportunity)
  ) %>%
  arrange(opportunity_rank)

# Show rankings
market_analysis <- available_cities %>%
  st_drop_geometry() %>%
  select(name, population, distance_to_competitor, market_opportunity, opportunity_rank) %>%
  mutate(distance_to_competitor = round(distance_to_competitor/1000, 1))

print("Market Opportunity Rankings:")
[1] "Market Opportunity Rankings:"
Code
print(market_analysis)
      name population distance_to_competitor market_opportunity
1 Montreal    1780000                  169.4          10509.845
2  Calgary    1336000                  672.3           1987.064
  opportunity_rank
1                1
2                2

Step 4: Visualization with Rankings

Code
# Create opportunity map
tm_shape(st_as_sfc(st_bbox(
  c(
    xmin = -160,
    xmax = -40,
    ymin = 35,
    ymax = 85
  ), crs = 4326
))) + tm_polygons(col = NA, fill_alpha = 0) +
  tm_shape(sf::st_transform(available_cities, 4326)) +
  tm_symbols(
    fill        = "market_opportunity",
    fill.scale  = tm_scale_continuous(values = "viridis"),
    fill.legend = tm_legend(title = "Market\nOpportunity"),
    size        = "population",
    size.scale  = tm_scale_continuous(values.scale = 3),
    size.legend = tm_legend(title = "Population"),
    col         = "white",
    lwd         = 0.3
  ) +
  tm_shape(competitors) +
  tm_symbols(col = "red",
             size = 2,
             shape = 4) +
  tm_title("Coffee Chain Expansion Opportunities") +
  tm_view(bbox = st_bbox(c(
    xmin = -160,
    xmax = -40,
    ymin = 35,
    ymax = 85
  ), crs = 4326))

Step 5: Multi-Criteria Decision Analysis

Code
# Simulate additional business criteria
set.seed(789)
available_cities_enhanced <- available_cities %>%
  mutate(
    # Simulated demographic/economic data
    median_income = rnorm(n(), 75000, 15000),
    young_adult_pct = runif(n(), 15, 35),  # % population 20-35
    business_density = rpois(n(), 25),
    foot_traffic_index = runif(n(), 0.3, 1.0),
    
    # Normalise all criteria to 0-1 scale
    pop_score = scales::rescale(population),
    income_score = scales::rescale(median_income),
    youth_score = scales::rescale(young_adult_pct),
    business_score = scales::rescale(business_density),
    traffic_score = scales::rescale(foot_traffic_index),
    distance_score = scales::rescale(distance_to_competitor),
    
    # Weighted composite score (weights sum to 1.0)
    composite_score = (
      pop_score * 0.25 +      # Population weight
      income_score * 0.20 +   # Income weight  
      youth_score * 0.20 +    # Youth demographic weight
      business_score * 0.15 + # Business environment weight
      traffic_score * 0.15 +  # Foot traffic weight
      distance_score * 0.05   # Distance from competitors weight
    )
  ) %>%
  arrange(desc(composite_score))

# Show comprehensive analysis
decision_matrix <- available_cities_enhanced %>%
  st_drop_geometry() %>%
  select(name, population, median_income, young_adult_pct, 
         business_density, composite_score) %>%
  mutate(
    rank = row_number(),
    median_income = round(median_income),
    young_adult_pct = round(young_adult_pct, 1),
    composite_score = round(composite_score, 3)
  )

print("Multi-Criteria Decision Analysis:")
[1] "Multi-Criteria Decision Analysis:"
Code
print(decision_matrix)
      name population median_income young_adult_pct business_density
1 Montreal    1780000         82861            24.8               25
2  Calgary    1336000         41088            15.4               23
  composite_score rank
1             0.8    1
2             0.2    2

Step 6: Network Optimisation

Code
# Simple greedy optimisation for selecting multiple locations
optimize_network <- function(candidates, n_stores = 3, min_distance = 50000) {
  if(nrow(candidates) == 0) return(candidates[0, ])
  
  # Start with highest-scoring location
  selected <- candidates[1, ]
  remaining <- candidates[-1, ]
  
  while(nrow(selected) < n_stores && nrow(remaining) > 0) {
    # Calculate distances from remaining to all selected
    distances <- st_distance(remaining, selected)
    min_distances <- apply(distances, 1, min)
    
    # Filter candidates far enough from existing stores
    valid_candidates <- remaining[min_distances > min_distance, ]
    
    if(nrow(valid_candidates) > 0) {
      # Select highest-scoring valid candidate
      next_store <- valid_candidates[1, ]
      selected <- rbind(selected, next_store)
      remaining <- remaining[!remaining$name %in% next_store$name, ]
    } else {
      break
    }
  }
  
  return(selected)
}

# Find optimal network of 3 new stores
optimal_network <- optimize_network(available_cities_enhanced, n_stores = 3)

print("Optimal Network Strategy:")
[1] "Optimal Network Strategy:"
Code
print(optimal_network %>% 
  st_drop_geometry() %>%
  select(name, population, composite_score) %>%
  mutate(composite_score = round(composite_score, 3)))
      name population composite_score
1 Montreal    1780000             0.8
2  Calgary    1336000             0.2
Code
# Visualise final strategy
tm_shape(st_as_sfc(st_bbox(
  c(
    xmin = -160,
    xmax = -40,
    ymin = 35,
    ymax = 85
  ), crs = 4326
))) +
  tm_polygons(col = NA, fill_alpha = 0) +
  tm_shape(st_transform(optimal_network, 4326)) +
  tm_dots(col = "green",
          size = 2,
          shape = 19) +
  tm_text("name",
          ymod = 1.5,
          col = "green",
          size = 1.2) +
  tm_shape(competitors) +
  tm_dots(col = "red",
          size = 2,
          shape = 4) +
  tm_text("name", ymod = -1.5, col = "red") +
  tm_title("Optimal Coffee Chain Network Strategy") +
  tm_layout(legend.show = FALSE) +
  tm_view(bbox = st_bbox(c(
    xmin = -160,
    xmax = -40,
    ymin = 35,
    ymax = 85
  ), crs = 4326))


6 Raster Analysis

Raster data represents the world as a continuous surface divided into a grid of cells, or pixels, each of which stores a value. This is different from vector data, which represents discrete objects such as points, lines, and polygons. Rasters are best suited to variables that change gradually across space, such as elevation, rainfall, temperature, vegetation, or remotely sensed imagery from satellites.

Each raster has two key dimensions:

  • Resolution, which refers to the size of each cell. High-resolution rasters have small cells and show more detail, while low-resolution rasters have larger cells and generalise patterns.
  • Extent, which refers to the geographic area the raster covers.

Rasters can be single-layer, where each cell stores one value, or multi-layer, where each cell contains a stack of values representing different variables or time points. For example, a single raster might store average temperature for July 2024, while a multi-layer raster could store monthly averages for an entire year.

In practice, rasters are extremely powerful because they allow us to represent continuous phenomena, run cell-based calculations, and combine different surfaces through map algebra. For example, you might calculate slope from an elevation raster, estimate average rainfall within administrative boundaries, or overlay pollution and population rasters to identify at-risk areas.
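
To make map algebra concrete, here is a minimal sketch (with made-up values rather than real data) that builds a small two-layer raster with terra and combines the layers cell by cell:

Code
# Toy example of a multi-layer raster and map algebra with terra
r <- rast(ncols = 10, nrows = 10, xmin = 0, xmax = 10, ymin = 0, ymax = 10)

rainfall   <- setValues(r, runif(ncell(r), 500, 1500))  # simulated mm per year
population <- setValues(r, rpois(ncell(r), 50))         # simulated people per cell

# Stack the layers: each cell now holds one value per layer
env_stack <- c(rainfall, population)
names(env_stack) <- c("rainfall", "population")

# Map algebra: a cell-by-cell calculation across layers
people_per_mm <- env_stack$population / env_stack$rainfall
plot(people_per_mm, main = "Toy map algebra: population per mm of rainfall")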

6.1 Working with Elevation Data

Code
# Create realistic elevation raster for North America
set.seed(2024)
elevation <- rast(ncols = 200, nrows = 200, 
                  xmin = -140, xmax = -60, ymin = 25, ymax = 70)

# Simulate mountain ranges (Rockies, Appalachians)
coords <- crds(elevation)
mountain_ranges <- matrix(c(
  -115, 50,    # Canadian Rockies
  -110, 45,    # US Rockies  
  -105, 40,    # Colorado Plateau
  -80, 45,     # Appalachians North
  -85, 35      # Appalachians South
), ncol = 2, byrow = TRUE)

# Calculate elevation based on distance to mountain ranges
elevation_values <- numeric(ncell(elevation))
for(i in 1:nrow(mountain_ranges)) {
  center_x <- mountain_ranges[i, 1]
  center_y <- mountain_ranges[i, 2]
  
  distances <- sqrt((coords[,1] - center_x)^2 + (coords[,2] - center_y)^2)
  elevation_values <- elevation_values + 
    pmax(0, 3500 * exp(-distances/8) + rnorm(length(distances), 0, 150))
}

values(elevation) <- pmax(0, elevation_values)

# Basic visualisation
plot(elevation, main = "North American Elevation (simulated)")

6.2 Environmental Analysis Challenge

Scenario: Analyse environmental conditions for renewable energy development

Research Question: Where are the best locations for wind and solar energy development?

Criteria:

  • Wind energy: High elevation (>1000m), moderate slope (<20°), avoid extreme elevations (>3000m)
  • Solar energy: Moderate elevation (200-1500m), gentle slope (<10°), avoid very high latitudes

Tasks:

  1. All levels: Calculate slope from elevation
  2. All levels: Create suitability maps for wind and solar
  3. Intermediate+: Combine criteria and rank suitability
  4. Advanced: Calculate potential energy capacity by region

Step 1: Calculate Terrain Derivatives

Code
# Calculate slope from elevation
slope <- terrain(elevation, "slope", unit = "degrees")

# Basic visualisation
plot(slope, main = "Slope (degrees)")

Step 2: Define Suitability Criteria

Code
# Wind energy suitability
wind_elevation_ok <- (elevation > 1000) & (elevation < 3000)
wind_slope_ok <- slope < 20
wind_suitable <- wind_elevation_ok & wind_slope_ok

# Solar energy suitability  
solar_elevation_ok <- (elevation > 200) & (elevation < 1500)
solar_slope_ok <- slope < 10
solar_suitable <- solar_elevation_ok & solar_slope_ok

# Create combined suitability index
suitability_index <- wind_suitable * 1 + solar_suitable * 2
# Values: 0 = unsuitable, 1 = wind only, 2 = solar only, 3 = both

# Clean up - set unsuitable areas to NA for cleaner visualisation
suitability_index[suitability_index == 0] <- NA

# Make it categorical with labels so plot() can show a labeled legend
suitability_factor <- as.factor(suitability_index)
levels(suitability_factor) <- data.frame(
  ID   = c(1, 2, 3),
  label = c("Wind Only", "Solar Only", "Both")
)

plot(suitability_factor, 
     main = "Renewable Energy Suitability",
     col = c("yellow", "blue", "green"),
     legend = TRUE)

Step 3: Extract Values at City Locations

Code
# Extract elevation and slope at city locations
city_elevation <- extract(elevation, cities_sf)
city_slope <- extract(slope, cities_sf)
city_suitability <- extract(suitability_index, cities_sf)

# Combine with city data
cities_environmental <- cities_sf %>%
  mutate(
    elevation_m = city_elevation[,2],
    slope_deg = city_slope[,2],
    suitability = city_suitability[,2],
    suitability_type = case_when(
      is.na(suitability) ~ "Unsuitable",
      suitability == 1 ~ "Wind Only",
      suitability == 2 ~ "Solar Only", 
      suitability == 3 ~ "Both Suitable",
      TRUE ~ "Unknown"
    )
  )

# Show results
environmental_summary <- cities_environmental %>%
  st_drop_geometry() %>%
  select(name, elevation_m, slope_deg, suitability_type) %>%
  mutate(
    elevation_m = round(elevation_m),
    slope_deg = round(slope_deg, 1)
  )

print("Environmental Conditions at Major Cities:")
[1] "Environmental Conditions at Major Cities:"
Code
print(environmental_summary)
       name elevation_m slope_deg suitability_type
1   Toronto        4516       0.3       Unsuitable
2  Montreal        2200       0.5        Wind Only
3 Vancouver        2361       0.2        Wind Only
4   Calgary        4844       0.3       Unsuitable
5    Ottawa        3095       0.2       Unsuitable

Step 4: Advanced Analysis - Regional Capacity

Code
# Calculate suitable area by energy type
cell_area <- cellSize(elevation, unit = "km") 

# Wind suitable area
wind_area <- mask(cell_area, wind_suitable, maskvalue = FALSE)
total_wind_area <- global(wind_area, "sum", na.rm = TRUE)

# Solar suitable area  
solar_area <- mask(cell_area, solar_suitable, maskvalue = FALSE)
total_solar_area <- global(solar_area, "sum", na.rm = TRUE)

# Combined visualisation with cities
tm_shape(suitability_index) +
  tm_raster(
    col.scale  = tm_scale_categorical(
      values = c("gold", "blue", "darkgreen"),
      labels = c("Wind", "Solar", "Both")
    ),
    col.legend = tm_legend(title = "Energy Type")
  ) +
  tm_shape(cities_sf) +
  tm_symbols(fill = "white", col = "black", size = 0.8) +  # border.col -> col
  tm_text("name", size = 0.6, ymod = 1.2) +
  tm_title("Renewable Energy Potential in Canada")

Code
cat("Renewable Energy Development Potential:\n")
Renewable Energy Development Potential:
Code
cat("Wind suitable area:", round(total_wind_area$sum), "km²\n") 
Wind suitable area: 13668658 km²
Code
cat("Solar suitable area:", round(total_solar_area$sum), "km²\n")
Solar suitable area: 14187519 km²

7 Creating Maps for Communication

7.1 From Analysis to Communication

The best spatial analysis is meaningless if you can’t communicate your findings effectively. This section focuses on the transition from exploratory analysis to polished communication.

7.2 Three Approaches to Map Communication

Objective: Learn to match visualisation approach to communication goals

Purpose: Guide viewers through a specific narrative or argument

Code
# Create a story about urbanisation
world_urban <- world %>%
  filter(!is.na(pop_est)) %>%
  mutate(
    pop_category = case_when(
      pop_est < 5e6 ~ "Small (<5M)",
      pop_est < 25e6 ~ "Medium (5-25M)",
      pop_est < 100e6 ~ "Large (25-100M)",
      TRUE ~ "Mega (>100M)"
    ),
    pop_category = factor(
      pop_category,
      levels = c(
        "Small (<5M)",
        "Medium (5-25M)",
        "Large (25-100M)",
        "Mega (>100M)"
      )
    )
  )

# Story map with clear narrative
story_map <- tm_shape(world_urban) +
  tm_polygons(
    fill        = "pop_category",
    fill.scale  = tm_scale_categorical(values = c("lightblue", "yellow", "orange", "red")),
    fill.legend = tm_legend(title = "Population Category")
  ) +
  tm_title("The Growing World: Countries by Population Size") +
  tm_layout(
    title.size      = 1.4,
    legend.position = c("left", "bottom"),
    frame           = FALSE
  ) +
  tm_credits(
    "Four countries now exceed 100M people: China, India, USA, Indonesia",
    position = tm_pos_out("center", "bottom"),
    size = 0.8
  )

print(story_map)

Story Map Principles:

  • Clear, descriptive title tells the main message
  • Color scheme supports the narrative
  • Caption provides key insight
  • Simple legend that’s easy to interpret

Purpose: Let users explore data themselves

Code
pal_pop <- colorNumeric(palette = "viridis", domain = world_urban$pop_est)

interactive_dashboard <- leaflet(options = leafletOptions(worldCopyJump = FALSE, continuousWorld = FALSE)) %>%
  addTiles(options = tileOptions(noWrap = TRUE)) %>%
  
  # Polygons coloured by population estimate
  addPolygons(
    data = world_urban,
    fillColor = ~ pal_pop(pop_est),
    weight = 1,
    opacity = 1,
    color = "white",
    fillOpacity = 0.7,
    highlightOptions = highlightOptions(
      weight = 2,
      color = "black",
      bringToFront = TRUE
    ),
    label = ~ paste("Population:", pop_est),
    group = "Population"
  ) %>%
  
  # Points for major cities
  addCircleMarkers(
    data = cities_sf,
    radius = ~ sqrt(population) / 100,  # scale marker size by the square root of population
    fillColor = "blue",
    fillOpacity = 0.7,
    stroke = FALSE,
    label = ~ paste(name, "<br>", "Population:", population),
    group = "Major Cities"
  ) %>%
  
  # Legend (reuse the same palette)
  addLegend(
    pal = pal_pop,
    values = world_urban$pop_est,
    title = "Population Estimate",
    position = "bottomright"
  ) %>%
  
  # Layer control
  addLayersControl(
    overlayGroups = c("Population", "Major Cities"),
    options = layersControlOptions(collapsed = FALSE)
  )

interactive_dashboard %>%
  setView(lng = 0, lat = 20, zoom = 2)

Dashboard Principles:

  • User controls for exploration
  • Multiple layers for comparison
  • Tooltips with detailed information
  • Pan and zoom functionality

Purpose: Static, high-quality graphics for reports/papers

Code
# Publication-ready figure with ggplot2
publication_map <- ggplot(world_urban) +
  geom_sf(aes(fill = log10(pop_est)), color = "white", size = 0.1) +
  scale_fill_viridis_c(
    name = "Population\n(log scale)",
    na.value = "grey90",
    labels = function(x) {
      # Convert log10 breaks into readable population ranges
      case_when(
        x <= 6 ~ "< 1M",
        x <= 7 ~ "1-10M", 
        x <= 8 ~ "10-100M",
        TRUE ~ "> 100M"
      )
    }
  ) +
  theme_void() +
  theme(
    legend.position = "bottom",
    legend.key.width = unit(1.5, "cm"),
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 11),
    plot.caption = element_text(hjust = 1, size = 9),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 9)
  ) +
  labs(
    title = "Global Population Distribution by Country",
    subtitle = "Population estimates for 2020, showing the concentration of people in Asia",
    caption = "Data: Natural Earth | Analysis: R with sf and ggplot2"
  )

print(publication_map)

Publication Figure Principles:

  • High resolution and clean typography
  • Informative title and subtitle
  • Professional color scheme
  • Proper attribution and sourcing
  • Suitable for black-and-white printing if needed

7.3 Design Your Signature Map

Your Task: Choose a dataset relevant to your field and create a map that:

  1. Tells a clear story about a spatial pattern
  2. Uses appropriate symbology for your data type
  3. Includes proper context (title, legend, attribution)
  4. Matches your audience (academic, public, policy-makers)

8 Spatial Statistics

Moving Beyond Maps

While maps are essential for exploring spatial data, spatial statistics provides rigorous methods for testing hypotheses, quantifying patterns, and making predictions about spatial processes.

8.1 Spatial Autocorrelation

One of the most important principles in spatial analysis is that “near things are more related than distant things.” This idea, sometimes called Tobler’s First Law of Geography, means that values measured in neighbouring areas are often more similar than would be expected if location had no influence. For example, neighbouring districts might share similar levels of air pollution, adjacent farmland might have comparable soil fertility, and houses on the same street might have similar market values.

This phenomenon, known as spatial autocorrelation, is critical to detect and account for because it violates the assumption of independence that underlies many standard statistical techniques. If we ignore spatial autocorrelation, our results may be misleading: we might overstate the significance of a relationship, or fail to capture the true structure of the data.

R provides tools to formally test for spatial autocorrelation. A widely used measure is Moran’s I, which summarises whether high (or low) values cluster together across space more than expected by chance. A positive Moran’s I indicates clustering of similar values, while a negative Moran’s I suggests a checkerboard pattern where high and low values are interspersed. A value close to zero implies little or no spatial autocorrelation.
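
To make the calculation concrete, here is a minimal by-hand sketch of Moran's I for five hypothetical areas with an assumed binary neighbour matrix; the spdep functions used below perform the same calculation (plus a significance test) on real polygon data.

Code
# Toy Moran's I computed by hand (assumed values and neighbour structure)
x <- c(10, 12, 11, 30, 28)                # attribute values for five areas
W <- matrix(c(0, 1, 1, 0, 0,              # binary neighbour matrix (1 = adjacent)
              1, 0, 1, 0, 0,
              1, 1, 0, 1, 0,
              0, 0, 1, 0, 1,
              0, 0, 0, 1, 0),
            nrow = 5, byrow = TRUE)

n  <- length(x)
z  <- x - mean(x)                          # deviations from the mean
S0 <- sum(W)                               # sum of all weights

moran_I <- (n / S0) * sum(W * outer(z, z)) / sum(z^2)
moran_I                                    # positive here: similar values cluster together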

8.1.1 Economic Clustering Analysis

Research Question: Do countries with similar economic development tend to be neighbours?

Method: Moran’s I test for spatial autocorrelation

Your Tasks:

  1. Calculate GDP per capita for each country
  2. Define spatial neighbourhoods (which countries are “neighbours”)
  3. Test for spatial autocorrelation using Moran’s I
  4. Interpret results and identify local clusters
Code
# Load spatial statistics package
library(spdep)

# Prepare economic data
world_econ <- world %>%
  filter(!is.na(gdp_md), !is.na(pop_est)) %>%
  mutate(gdp_per_capita = (gdp_md * 1000000) / pop_est) %>%
  filter(gdp_per_capita > 0)

# Create spatial weights matrix (queen contiguity)
world_nb <- poly2nb(world_econ, queen = TRUE)
world_weights <- nb2listw(world_nb, style = "W", zero.policy = TRUE)

# Calculate Global Moran's I
moran_result <- moran.test(world_econ$gdp_per_capita, world_weights, zero.policy = TRUE)

cat("Global Moran's I Results:\n")
Global Moran's I Results:
Code
cat("Moran's I statistic:", round(moran_result$estimate[1], 4), "\n")
Moran's I statistic: 0.2676 
Code
cat("Expected value (random):",
    round(moran_result$estimate[2], 4),
    "\n")
Expected value (random): -0.0061 
Code
cat("P-value:",
    format(moran_result$p.value, scientific = TRUE),
    "\n\n")
P-value: 2.313351e-06 
Code
# Interpretation
if (moran_result$p.value < 0.05) {
  if (moran_result$estimate[1] > 0) {
    cat("Interpretation: Significant positive spatial autocorrelation\n")
    cat("Countries with similar GDP per capita cluster together.\n")
  }
} else {
  cat("Interpretation: No significant spatial pattern detected.\n")
}
Interpretation: Significant positive spatial autocorrelation
Countries with similar GDP per capita cluster together.

Local Indicators of Spatial Association (LISA)

Code
# Calculate Local Moran's I (LISA)
lisa_result <- localmoran(world_econ$gdp_per_capita, world_weights, zero.policy = TRUE)

# Classify clusters
world_econ$lisa_i <- lisa_result[, 1]
world_econ$lisa_p <- lisa_result[, 5]

world_econ$cluster_type <- case_when(
  world_econ$lisa_p > 0.05 ~ "Not Significant",
  world_econ$lisa_i > 0 &
    world_econ$gdp_per_capita > mean(world_econ$gdp_per_capita) ~ "High-High",
  world_econ$lisa_i > 0 &
    world_econ$gdp_per_capita < mean(world_econ$gdp_per_capita) ~ "Low-Low",
  world_econ$lisa_i < 0 &
    world_econ$gdp_per_capita > mean(world_econ$gdp_per_capita) ~ "High-Low",
  world_econ$lisa_i < 0 &
    world_econ$gdp_per_capita < mean(world_econ$gdp_per_capita) ~ "Low-High",
  TRUE ~ "Not Significant"
)

# Visualise clusters
tm_shape(world_econ) +
  tm_polygons(
    fill        = "cluster_type",
    fill.scale  = tm_scale_categorical(values = c(
      "red", "blue", "lightblue", "pink", "white"
    )),
    fill.legend = tm_legend(title = "Economic Clusters")
  ) +
  tm_title("Local Economic Clustering (LISA)")

Code
# Summary
cluster_summary <- world_econ %>%
  st_drop_geometry() %>%
  count(cluster_type)

print("Cluster Summary:")
[1] "Cluster Summary:"
Code
print(cluster_summary)
     cluster_type   n
1       High-High   9
2 Not Significant 228

8.2 Hotspot Analysis

Hotspot analysis is used to identify areas where unusually high or low values of a variable cluster together in space. Instead of just asking whether values are spatially autocorrelated overall, as with Moran’s I, hotspot analysis highlights where those clusters occur. This makes it especially useful for applications in public health, criminology, environmental science, and urban planning.
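
One common approach is kernel density estimation, which smooths point locations into a continuous intensity surface. The spatstat package (installed in Section 1 but not otherwise used in this tutorial) provides this out of the box; the sketch below uses an assumed toy point pattern rather than real incident data.

Code
# Sketch: kernel density of a toy point pattern with spatstat
library(spatstat)

set.seed(1)
pts <- ppp(x = runif(100), y = runif(100),      # 100 random points
           window = owin(c(0, 1), c(0, 1)))     # unit-square study area

dens <- density(pts, sigma = 0.1)               # Gaussian kernel, bandwidth 0.1
plot(dens, main = "Toy kernel density surface")
plot(pts, add = TRUE, pch = 20, cex = 0.5)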

8.2.1 Urban Crime Patterns

Scenario: Analyse crime patterns to optimise police resource allocation

Data: Simulated crime incidents in an urban area

Code
# Create simulated crime data
set.seed(2024)
n_crimes <- 250

# Create crime hotspot centers
hotspot_centers <- data.frame(
  x = c(-79.42, -79.38, -79.45),
  y = c(43.68, 43.65, 43.71),
  intensity = c(0.3, 0.3, 0.2)
)

# Generate crime points around hotspots
crime_data <- data.frame()
for (i in 1:nrow(hotspot_centers)) {
  n_points <- round(n_crimes * hotspot_centers$intensity[i])
  x_coords <- rnorm(n_points, hotspot_centers$x[i], 0.008)
  y_coords <- rnorm(n_points, hotspot_centers$y[i], 0.008)
  
  crime_data <- rbind(crime_data, data.frame(x = x_coords, y = y_coords, hotspot = i))
}

# Add random background crimes
n_background <- n_crimes - nrow(crime_data)
if (n_background > 0) {
  # Generate uniformly distributed background crimes across the study area
  background_crimes <- data.frame(
    x = runif(n_background, -79.5, -79.3),
    y = runif(n_background, 43.6, 43.75),
    hotspot = 0
  )
  crime_data <- rbind(crime_data, background_crimes)
}

# Convert to spatial data
crime_sf <- st_as_sf(crime_data, coords = c("x", "y"), crs = 4326)

# Basic visualisation
tm_shape(crime_sf) +
  tm_symbols(size = 0.2, fill_alpha = 0.6) +
  tm_title("Crime Incident Locations")

Kernel Density Analysis

Code
# Create density surface (simplified approach)
crime_utm <- st_transform(crime_sf, 32617)  # UTM for Toronto

# Create analysis grid
bbox <- st_bbox(crime_utm)
grid_size <- 50
grid_x <- seq(bbox[1], bbox[3], length.out = grid_size)
grid_y <- seq(bbox[2], bbox[4], length.out = grid_size)

# Calculate density using distance-based weighting
bandwidth <- 200  # 200 meters
crime_coords <- st_coordinates(crime_utm)

density_matrix <- matrix(0, nrow = grid_size, ncol = grid_size)
for (i in 1:length(grid_x)) {
  for (j in 1:length(grid_y)) {
    distances <- sqrt((crime_coords[, 1] - grid_x[i])^2 +
                        (crime_coords[, 2] - grid_y[j])^2)
    weights <- exp(-0.5 * (distances / bandwidth)^2)
    density_matrix[j, i] <- sum(weights)
  }
}

# Create raster
density_raster <- rast(
  ncols = grid_size,
  nrows = grid_size,
  xmin = bbox[1],
  xmax = bbox[3],
  ymin = bbox[2],
  ymax = bbox[4],
  crs = "EPSG:32617"
)
# terra fills cells row-wise from the top-left, so reverse the y order and transpose
values(density_raster) <- as.vector(t(density_matrix[grid_size:1, ]))

# Transform back for visualisation
density_wgs84 <- project(density_raster, "EPSG:4326")

# Visualise hotspots
tm_shape(density_wgs84) +
  tm_raster(
    col.scale  = tm_scale_continuous(values = "brewer.reds"),
    col.legend = tm_legend(title = "Crime Density"),
    col_alpha  = 0.8
  ) +
  tm_shape(crime_sf) +
  tm_symbols(size = 0.1, fill_alpha = 0.4) +
  tm_title("Crime Hotspot Analysis")

8.3 Spatial Interpolation

Often we only have data at a limited number of locations, such as rainfall measured at weather stations or pollutant concentrations measured at air monitors. Spatial interpolation refers to the set of methods used to estimate values at unsampled locations based on the values observed at nearby sites. The goal is to create a continuous surface from discrete points, filling in the gaps between measurements.

Inverse distance weighting (IDW) estimates values as a weighted average of surrounding points, where nearer points contribute more than distant ones. This produces smoother surfaces and is intuitive, though it does not model spatial processes explicitly.

Kriging is a geostatistical method that uses both the distance and the overall spatial autocorrelation structure of the data, modelled through a variogram. Kriging not only produces predictions but also provides an estimate of uncertainty at each location.
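
Both methods are available through the gstat package loaded later in this section. As a minimal sketch of the idw() interface, using an assumed toy dataset rather than the monitoring network built below:

Code
# Sketch: inverse distance weighting with gstat::idw() on toy data
library(sf)
library(gstat)

set.seed(42)
obs <- st_as_sf(
  data.frame(x = runif(15), y = runif(15), value = rnorm(15, 20, 5)),
  coords = c("x", "y")
)

# Regular grid of prediction locations over the same unit square
grd <- st_as_sf(
  expand.grid(x = seq(0, 1, by = 0.05), y = seq(0, 1, by = 0.05)),
  coords = c("x", "y")
)

# idp is the inverse-distance power: larger values make nearby points dominate
idw_fit <- idw(value ~ 1, locations = obs, newdata = grd, idp = 2)
head(idw_fit$var1.pred)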

8.3.1 Environmental Monitoring

Objective: Predict air quality at unmeasured locations

Problem: Air quality monitoring stations are expensive and sparse. How can we estimate pollution levels across a region?

Methods:

  • Inverse Distance Weighting (IDW): simple, intuitive
  • Kriging: optimal prediction with uncertainty estimates

Application: Create continuous pollution surface from point measurements

Code
# Create simulated air quality monitoring network
library(gstat)

set.seed(2025)
n_stations <- 20

# Generate monitoring stations
stations <- data.frame(
  station_id = 1:n_stations,
  x = runif(n_stations, -84, -76),  # Longitude
  y = runif(n_stations, 42, 46),    # Latitude
  elevation = runif(n_stations, 100, 600)
)

# Simulate PM2.5 values with spatial structure
pollution_centers <- data.frame(
  x = c(-82, -78.5, -80.5),
  y = c(43, 44.5, 42.5)
)

pm25_values <- numeric(n_stations)
for(i in 1:n_stations) {
  base_pollution <- 12  # Background level
  
  # Distance to pollution sources
  for(j in 1:nrow(pollution_centers)) {
    distance <- sqrt((stations$x[i] - pollution_centers$x[j])^2 + 
                    (stations$y[i] - pollution_centers$y[j])^2)
    pollution_effect <- 15 * exp(-distance * 1.5)
    base_pollution <- base_pollution + pollution_effect
  }
  
  # Elevation effect (higher = cleaner)
  elevation_effect <- -0.015 * (stations$elevation[i] - 350)
  
  pm25_values[i] <- base_pollution + elevation_effect + rnorm(1, 0, 2)
}

stations$pm25 <- pmax(1, pm25_values)
stations_sf <- st_as_sf(stations, coords = c("x", "y"), crs = 4326)

# Visualise monitoring network
tm_shape(stations_sf) +
  tm_symbols(
    fill        = "pm25",
    fill.scale  = tm_scale_continuous(values = "plasma"),
    fill.legend = tm_legend(title = "PM2.5 (μg/m³)"),
    size        = 1.5,
    col         = "white",
    lwd         = 0.3
  ) +
  tm_title("Air Quality Monitoring Network")

Inverse Distance Weighting (IDW)

Code
# Create prediction grid
bbox <- st_bbox(stations_sf)
grid_res <- 0.05
grid_x <- seq(bbox[1], bbox[3], by = grid_res)
grid_y <- seq(bbox[2], bbox[4], by = grid_res)
pred_grid <- expand.grid(x = grid_x, y = grid_y)
pred_grid_sf <- st_as_sf(pred_grid, coords = c("x", "y"), crs = 4326)

# IDW function
idw_predict <- function(known_points, pred_points, values, power = 2) {
  known_coords <- st_coordinates(known_points)
  pred_coords <- st_coordinates(pred_points)
  
  predictions <- numeric(nrow(pred_coords))
  for(i in 1:nrow(pred_coords)) {
    distances <- sqrt((known_coords[,1] - pred_coords[i,1])^2 + 
                     (known_coords[,2] - pred_coords[i,2])^2)
    distances[distances == 0] <- 1e-10  # Avoid division by zero
    weights <- 1 / (distances^power)
    predictions[i] <- sum(weights * values) / sum(weights)
  }
  return(predictions)
}

# Apply IDW
idw_predictions <- idw_predict(stations_sf, pred_grid_sf, stations$pm25)

# Create IDW raster
# Create IDW raster. Build it from x/y/value triples so cell order matches the
# grid: expand.grid orders y from south to north, but raster rows run north to
# south, so filling a blank raster with values() would flip the map vertically.
idw_raster <- rast(cbind(pred_grid, pm25 = idw_predictions),
                   type = "xyz", crs = "EPSG:4326")

# Visualise IDW results
tm_shape(idw_raster) +
  tm_raster(
    col.scale  = tm_scale_continuous(values = "plasma"),
    col.legend = tm_legend(title = "PM2.5 (μg/m³)"),
    col_alpha  = 0.8
  ) +
  tm_shape(stations_sf) +
  tm_symbols(fill = "white",
             col  = "black",
             size = 0.5) +
  tm_title("Air Quality - IDW Interpolation")

Kriging Interpolation

Code
# Convert to sp format for gstat (recent gstat versions also accept sf objects
# directly, so this step is optional)
stations_sp <- as(stations_sf, "Spatial")
pred_grid_sp <- as(pred_grid_sf, "Spatial")

# Fit variogram
vgm_data <- variogram(pm25 ~ 1, stations_sp)

# Fit model
vgm_model <- fit.variogram(vgm_data, model = vgm("Sph"))

cat("Fitted Variogram Parameters:\n")
Fitted Variogram Parameters:
Code
print(vgm_model)
  model    psill    range
1   Nug  0.00000   0.0000
2   Sph 17.57526 146.2779
Code
# Perform kriging
kriging_result <- krige(pm25 ~ 1, stations_sp, pred_grid_sp, model = vgm_model)
[using ordinary kriging]
Code
# Create kriging raster (again built from x/y/value triples so the row order
# matches the prediction grid)
kriging_raster <- rast(cbind(pred_grid, pm25 = kriging_result$var1.pred),
                       type = "xyz", crs = "EPSG:4326")

# Visualisation comparison
# Visualisation comparison (legend shown once, on the kriging panel)
idw_map <- tm_shape(idw_raster) +
  tm_raster(
    col.scale  = tm_scale_continuous(values = "plasma"),
    col.legend = tm_legend(show = FALSE)
  ) +
  tm_shape(stations_sf) +
  tm_symbols(fill = "white", size = 0.3) +
  tm_title("IDW Interpolation")

kriging_map <- tm_shape(kriging_raster) +
  tm_raster(
    col.scale  = tm_scale_continuous(values = "plasma"),
    col.legend = tm_legend(title = "PM2.5")
  ) +
  tm_shape(stations_sf) +
  tm_symbols(fill = "white", size = 0.3) +
  tm_title("Kriging Interpolation")

tmap_arrange(idw_map, kriging_map, ncol = 2)
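
The advantage of kriging noted earlier is that it also returns a prediction variance (the var1.var column). A quick sketch of mapping the corresponding standard error on the same grid; areas far from any monitoring station should show the largest uncertainty.

Code
# Kriging standard error surface (square root of the prediction variance)
kriging_se_raster <- rast(cbind(pred_grid, se = sqrt(kriging_result$var1.var)),
                          type = "xyz", crs = "EPSG:4326")

tm_shape(kriging_se_raster) +
  tm_raster(
    col.scale  = tm_scale_continuous(values = "viridis"),
    col.legend = tm_legend(title = "Prediction SE (μg/m³)")
  ) +
  tm_shape(stations_sf) +
  tm_symbols(fill = "white", col = "black", size = 0.4) +
  tm_title("Kriging Prediction Uncertainty")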

Code
# Calculate prediction statistics
cat("\nInterpolation Comparison:\n")

Interpolation Comparison:
Code
cat("IDW - Mean:", round(mean(idw_predictions), 2), 
    "SD:", round(sd(idw_predictions), 2), "\n")
IDW - Mean: 15.06 SD: 2.13 
Code
cat("Kriging - Mean:", round(mean(kriging_result$var1.pred), 2),
    "SD:", round(sd(kriging_result$var1.pred), 2), "\n")
Kriging - Mean: 15.17 SD: 2.18 

9 Reproducible Spatial Workflows

Working with spatial data often involves many steps: downloading datasets, cleaning and transforming them, performing analyses, and producing maps or statistical results. Without careful documentation, it can be difficult or even impossible to reproduce the exact workflow later, either by yourself or by other researchers. Reproducibility is therefore a cornerstone of good spatial analysis practice.

9.1 Learning Objectives

  • Integrate spatial analysis into reproducible research workflows
  • Organise spatial projects effectively
  • Document spatial analysis processes
  • Share spatial research outcomes

9.2 Project Organisation for Spatial Analysis

Effective spatial analysis requires structured workflows that others (including your future self) can understand and reproduce.

spatial_project/
├── README.md
├── spatial_project.Rproj
├── data/
│   ├── raw/
│   ├── processed/
│   └── external/
├── scripts/
│   ├── 01_data_preparation.R
│   ├── 02_spatial_analysis.R
│   ├── 03_visualisation.R
│   └── 04_statistical_analysis.R
├── output/
│   ├── figures/
│   ├── maps/
│   └── tables/
└── docs/
    ├── analysis_report.qmd
    └── methods_documentation.md
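
A small driver script at the project root can tie the numbered scripts together so the whole pipeline reruns from a clean session. A minimal sketch under the layout above (file and folder names are illustrative; renv or the here package offer fuller solutions for dependency and path management):

Code
# run_all.R -- illustrative driver script for the layout above
# (run from the project root, e.g. after opening spatial_project.Rproj)
scripts <- c(
  "scripts/01_data_preparation.R",
  "scripts/02_spatial_analysis.R",
  "scripts/03_visualisation.R",
  "scripts/04_statistical_analysis.R"
)

for (s in scripts) {
  message("Running ", s)
  source(s, echo = TRUE)
}

# Record the package versions behind this run (renv offers a fuller solution)
writeLines(capture.output(sessionInfo()), "output/session_info.txt")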

9.3 Spatial Analysis to Answer a Research Question

Research Question: How do environmental and demographic factors influence economic development patterns across countries?

Your Task: Create a comprehensive spatial analysis that includes:

  1. Data preparation: Clean and integrate multiple datasets
  2. Exploratory mapping: Visualise key variables spatially
  3. Spatial operations: Calculate meaningful derived variables
  4. Statistical analysis: Test for spatial patterns and relationships
  5. Communication: Create final maps and summary

Datasets:

  • Country boundaries with population and economic data
  • City locations with population sizes
  • Environmental data (elevation, climate zones)

Step 1: Data Integration

Code
# Integrate multiple datasets
analysis_data <- world %>%
  filter(!is.na(pop_est), !is.na(gdp_md)) %>%
  mutate(
    gdp_per_capita = (gdp_md * 1000000) / pop_est,
    area_km2 = as.numeric(st_area(geometry)) / 1000000,
    pop_density = pop_est / area_km2,  # people per km²
    development_level = case_when(
      gdp_per_capita < 5000 ~ "Low Income",
      gdp_per_capita < 15000 ~ "Lower Middle",
      gdp_per_capita < 50000 ~ "Upper Middle",
      TRUE ~ "High Income"
    ),
    # Order the categories so legends and colour palettes follow income order
    development_level = factor(
      development_level,
      levels = c("Low Income", "Lower Middle", "Upper Middle", "High Income")
    )
  ) %>%
  select(name, continent, pop_est, gdp_md, gdp_per_capita, 
         pop_density, development_level, geometry)

# Extract environmental data for each country
country_centroids <- st_centroid(analysis_data)
country_elevation <- extract(elevation, country_centroids)

analysis_data$avg_elevation <- country_elevation[,2]

cat("Integrated dataset:", nrow(analysis_data), "countries\n")
Integrated dataset: 242 countries
Code
cat("Variables:", ncol(analysis_data) - 1, "attributes + geometry\n")
Variables: 8 attributes + geometry
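
Tying this back to the project layout in Section 9.2, the integrated layer would typically be written to data/processed/ so that later scripts can start from it rather than repeating the preparation step. A sketch with an illustrative GeoPackage path:

Code
# Save the integrated layer for reuse by later scripts (path is illustrative)
st_write(analysis_data, "data/processed/analysis_data.gpkg",
         delete_dsn = TRUE, quiet = TRUE)

# Later scripts can then read it back with:
# analysis_data <- st_read("data/processed/analysis_data.gpkg", quiet = TRUE)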

Step 2: Exploratory Spatial Analysis

Code
# Create multi-panel exploratory maps
gdp_map <- tm_shape(analysis_data) +
  tm_polygons(
    fill = "gdp_per_capita",
    fill.scale  = tm_scale_intervals(
      style  = "fisher",
      values = "viridis",  # cols4all palette name
      n      = 7
    ),
    fill.legend = tm_legend(title = "GDP per capita")
  ) +
  tm_title("Economic Development")


pop_map <- tm_shape(analysis_data) +
  tm_polygons(
    fill = "pop_density",
    fill.scale = tm_scale_intervals(
      style  = "fisher",
      values = "plasma",
      # cols4all palette name
      n      = 7
    ),
    fill.legend = tm_legend(title = "Population density (per km²)")
  ) +
  tm_title("Population Density")

elev_map <- tm_shape(analysis_data) +
  tm_polygons(
    fill = "avg_elevation",
    fill.scale = tm_scale_intervals(
      style  = "fisher",
      # you can pass an explicit vector if the name isn't a cols4all palette
      values = terrain.colors(7),
      n      = 7
    ),
    fill.legend = tm_legend(title = "Elevation (m)")
  ) +
  tm_title("Average Elevation")

dev_map <- tm_shape(analysis_data) +
  tm_polygons(
    fill = "development_level",
    fill.scale = tm_scale_categorical(values = c("red", "orange", "yellow", "green")),
    fill.legend = tm_legend(title = "Development Level")
  ) +
  tm_title("Development Categories")

# Arrange multiple maps
tmap_arrange(gdp_map, pop_map, elev_map, dev_map, ncol = 2)

Step 3: Spatial Operations and Analysis

Code
# Spatial autocorrelation analysis
analysis_nb <- poly2nb(analysis_data, queen = TRUE)
analysis_weights <- nb2listw(analysis_nb, style = "W", zero.policy = TRUE)

# Test multiple variables for spatial autocorrelation
variables <- c("gdp_per_capita", "pop_density", "avg_elevation")
moran_results <- data.frame()

for(var in variables) {
  if(var %in% names(analysis_data)) {
    values <- pull(analysis_data, var)
    # Remove missing values before testing
    if(sum(!is.na(values)) > 10) {
      moran_test <- moran.test(values, analysis_weights,
                               zero.policy = TRUE,
                               na.action = na.omit)  # drop countries with missing values
      moran_results <- rbind(moran_results, data.frame(
        Variable = var,
        Morans_I = round(moran_test$estimate[1], 4),
        P_value = round(moran_test$p.value, 4),
        Significant = moran_test$p.value < 0.05
      ))
    }
  }
}

cat("Spatial Autocorrelation Results:\n")
Spatial Autocorrelation Results:
Code
print(moran_results)
                         Variable Morans_I P_value Significant
Moran I statistic  gdp_per_capita   0.2286  0.0000        TRUE
Moran I statistic1    pop_density   0.0056  0.3288       FALSE
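
A Moran scatterplot is a useful visual companion to these tests: it plots each country's value against the average value of its neighbours, and the slope of the fitted line corresponds to Moran's I. A quick sketch for GDP per capita, reusing the weights defined above:

Code
# Moran scatterplot for GDP per capita (slope of the fitted line ~ Moran's I)
moran.plot(analysis_data$gdp_per_capita, analysis_weights,
           zero.policy = TRUE,
           xlab = "GDP per capita",
           ylab = "Spatially lagged GDP per capita")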

Step 4: Statistical Relationships

Code
# Examine relationships between variables
# Note: avg_elevation is only available where the elevation raster covers a
# country's centroid, so this filter can shrink the dataset considerably
analysis_df <- analysis_data %>%
  st_drop_geometry() %>%
  filter(!is.na(gdp_per_capita), !is.na(avg_elevation), !is.na(pop_density))

# Correlation analysis
cor_matrix <- cor(analysis_df[c("gdp_per_capita", "pop_density", "avg_elevation")], 
                  use = "complete.obs")

cat("Correlation Matrix:\n")
Correlation Matrix:
Code
print(round(cor_matrix, 3))
               gdp_per_capita pop_density avg_elevation
gdp_per_capita          1.000       0.972        -0.311
pop_density             0.972       1.000        -0.523
avg_elevation          -0.311      -0.523         1.000
Code
# Regional analysis
regional_summary <- analysis_df %>%
  group_by(continent) %>%
  summarise(
    countries = n(),
    avg_gdp_pc = mean(gdp_per_capita, na.rm = TRUE),
    avg_pop_density = mean(pop_density, na.rm = TRUE),
    avg_elevation = mean(avg_elevation, na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  arrange(desc(avg_gdp_pc))

cat("\nRegional Development Patterns:\n")

Regional Development Patterns:
Code
print(regional_summary)
# A tibble: 1 × 5
  continent     countries avg_gdp_pc avg_pop_density avg_elevation
  <chr>             <int>      <dbl>           <dbl>         <dbl>
1 North America         3     76193.            336.         2004.
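
To go one step beyond pairwise correlations, a simple non-spatial regression gives a baseline for these relationships. Given the spatial autocorrelation detected in Step 3, a spatial lag or error model would be the natural refinement; the sketch below is only the starting point.

Code
# Baseline (non-spatial) model of development against density and elevation
dev_model <- lm(log(gdp_per_capita) ~ log(pop_density) + avg_elevation,
                data = analysis_df)
summary(dev_model)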

Step 5: Final Visualisation and Summary

Code
# Create comprehensive final map
final_map <- tm_shape(analysis_data) +
  tm_polygons(
    fill        = "development_level",
    fill.scale  = tm_scale_categorical(values = c(
      "#d73027", "#fc8d59", "#fee08b", "#91cf60"
    )),
    fill.legend = tm_legend(title = "Development Level")
  ) +
  tm_shape(cities_sf) +
  tm_symbols(
    size        = "population",
    size.scale  = tm_scale_continuous(values.scale = 2),
    size.legend = tm_legend(title = "City Population"),
    col         = "black",
    alpha       = 0.6
  ) +
  tm_text("name", size = 0.6, ymod = 1.3) +
  tm_title("Global Development Patterns: Economic, Demographic, and Urban Factors") +
  tm_layout(
    title.size        = 1.4,
    legend.position   = c("left", "bottom"),
    legend.title.size = 1.1
  ) +
  tm_credits(
    "Analysis combines economic data, population density, and urban hierarchies",
    position = c("center", "bottom")
  )

print(final_map)
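
Following the output/ structure from Section 9.2, the finished map would normally be saved to disk as well. A sketch with an illustrative path and size:

Code
# Save the final map to the project's output folder (path and size are illustrative)
tmap_save(final_map, "output/maps/global_development_patterns.png",
          width = 12, height = 7, dpi = 300)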

Code
# Key findings summary
cat("\nKey Findings:\n")

Key Findings:
Code
cat("1. Economic development shows strong spatial clustering\n")
1. Economic development shows strong spatial clustering
Code
cat("2. High-income countries concentrated in North America, Europe, Oceania\n") 
2. High-income countries concentrated in North America, Europe, Oceania
Code
cat("3. Population density varies independently of economic development\n")
3. Population density varies independently of economic development
Code
cat("4. Geographic factors (elevation) show weak correlation with development\n")
4. Geographic factors (elevation) show weak correlation with development
Code
cat("5. Urban centers often indicate broader regional development patterns\n")
5. Urban centers often indicate broader regional development patterns

10 Conclusion and Next Steps

You’ve completed an introduction to spatial analysis in R. You now have:

  • Spatial thinking skills to recognise location-based opportunities in research
  • Technical proficiency with modern spatial R packages (sf, terra, tmap)
  • Analytical toolkit for fundamental spatial operations and statistics
  • Communication abilities to create effective maps for different audiences
  • Workflow knowledge for reproducible spatial research

10.1 Learning Resources

General Spatial Analysis:

Domain-Specific Applications:

Additional Resources:

With thanks to Abbie Chapman, Joana Pereira Da Cruz, and Bin Chi.

10.2 Final Reflection

Spatial analysis opens new perspectives on research questions across virtually every domain. The techniques you’ve learned provide a foundation, but the real power comes from applying spatial thinking to your specific research challenges.

Remember:

  • Start simple and build complexity gradually
  • Always question whether spatial patterns might be meaningful
  • Consider how location and context affect your phenomena
  • Share your spatial discoveries with others

The spatial perspective transforms how we understand our world, from local patterns at your research site to global processes affecting humanity. You now have the tools to explore these spatial relationships systematically and rigorously.