Spatial data analysis in R brings together several different types of tasks: reading data from a variety of formats, manipulating it to create the attributes you need, visualising it through maps, and applying statistical techniques that account for spatial structure. The aim of this tutorial is to introduce each of these steps in a way that is accessible to beginners while still being useful for those with some prior experience in R.
We will start by working with vector data (points, lines, and polygons), then move on to raster data (continuous surfaces such as elevation or temperature). Along the way, we will explore how to produce both static and interactive maps, and then carry out some introductory forms of spatial analysis, including tests for spatial autocorrelation and geostatistical techniques.
The tutorial does not assume prior experience with spatial analysis, but it does assume some familiarity with R. The examples are designed to be reproducible and to build a foundation that you can adapt to your own datasets. By the end, you should feel more confident about:
Importing and exploring spatial data in R.
Creating high-quality maps to visualise your data.
Understanding when and why spatial statistical methods are needed.
Applying some of the most commonly used exploratory and modelling techniques.
What You’ll Need
R and RStudio installed on your computer
Basic familiarity with R
Curiosity about spatial patterns in your research domain
1 Setup
1.1 Installing Required Packages
Code
```r
install.packages(c(
  "sf",            # Vector data
  "terra",         # Raster data
  "tmap",          # Thematic mapping
  "leaflet",       # Interactive maps
  "dplyr",         # Data manipulation
  "ggplot2",       # Plotting
  "rnaturalearth", # World data
  "spdep",         # Spatial statistics
  "gstat",         # Spatial interpolation
  "spatstat"       # Point pattern analysis
))
```
1.2 Loading Libraries and Settings
Code
```r
library(sf)
library(terra)
library(tmap)
library(dplyr)
library(ggplot2)
library(leaflet)
library(rnaturalearth)

# Set tmap mode for static plots
tmap_mode("plot")

# Suppress scientific notation for cleaner output
options(scipen = 999)
```
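1.3 Creating Sample Data

The examples throughout the tutorial use a small dataset of Canadian cities, converted to an sf object, plus country polygons from rnaturalearth:

Code
```r
# Create example cities dataset
cities <- data.frame(
  name       = c("Toronto", "Montreal", "Vancouver", "Calgary", "Ottawa"),
  population = c(2930000, 1780000, 631000, 1336000, 994837),
  longitude  = c(-79.38, -73.57, -123.12, -114.07, -75.70),
  latitude   = c(43.65, 45.50, 49.28, 51.05, 45.42),
  province   = c("Ontario", "Quebec", "British Columbia", "Alberta", "Ontario")
)

# Convert to spatial data (WGS84 coordinate system)
cities_sf <- st_as_sf(cities, coords = c("longitude", "latitude"), crs = 4326)

# Load world country data
world <- ne_countries(scale = "medium", returnclass = "sf")

# Quick preview of our data
print(cities_sf)
```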
Simple feature collection with 5 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -123.12 ymin: 43.65 xmax: -73.57 ymax: 51.05
Geodetic CRS: WGS 84
name population province geometry
1 Toronto 2930000 Ontario POINT (-79.38 43.65)
2 Montreal 1780000 Quebec POINT (-73.57 45.5)
3 Vancouver 631000 British Columbia POINT (-123.12 49.28)
4 Calgary 1336000 Alberta POINT (-114.07 51.05)
5 Ottawa 994837 Ontario POINT (-75.7 45.42)
2 Understanding Spatial Data
2.1 What Makes Data “Spatial”?
Not all data is spatial, but a great deal of the information we work with has a geographic dimension. What makes data spatial is that each observation is tied to a location, either through coordinates (such as latitude and longitude) or through a defined area (such as a district, postcode, or country). This spatial reference allows us to place the data on a map and to consider relationships in terms of distance, proximity, and adjacency.
Tobler’s First Law of Geography
“Everything is related to everything else, but near things are more related than distant things.”
2.2 Spatial Data Detective
Your Task: For each dataset below, decide if it contains spatial information and explain how location might affect the analysis:
Dataset A: Customer purchase records with postal codes
Dataset B: Stock market prices over time
Dataset C: Twitter sentiment analysis with GPS coordinates
Dataset D: University enrolment numbers by year
Dataset E: Air quality readings from monitoring stations
Solution and Discussion
Dataset A: Customer purchase records with postal codes ✅ SPATIAL

- Spatial component: Postal codes can be geocoded to locations
- Analysis opportunities: Market area analysis, customer clustering, accessibility studies, delivery optimisation

Dataset B: Stock market prices over time ❌ NOT INHERENTLY SPATIAL

- Why not: Time series data without a geographic component
- Could become spatial: With stock exchange locations, regional economic indicators, or investor geography

Dataset C: Twitter sentiment with GPS coordinates ✅ SPATIAL

- Spatial component: GPS coordinates provide exact locations
- Analysis opportunities: Sentiment hotspots, geographic opinion mapping, event location analysis

Dataset D: University enrolment by year ❌ NOT INHERENTLY SPATIAL

- Why not: Just temporal trends without location
- Could become spatial: With student home addresses, university locations, or regional enrolment patterns

Dataset E: Air quality monitoring stations ✅ SPATIAL

- Spatial component: Monitoring stations have fixed locations
- Analysis opportunities: Pollution interpolation, exposure mapping, environmental justice analysis
2.3 Vector vs. Raster Data Models
Spatial data in R comes in two main forms: vector and raster. Vector data represents discrete objects such as points (e.g. hospital locations), lines (e.g. roads), or polygons (e.g. administrative boundaries). Raster data, by contrast, represents continuous surfaces such as temperature, elevation, or satellite imagery.
Code
```r
# Vector data: Discrete features with precise boundaries
print("Vector data examples:")
print(paste("Cities (points):", nrow(cities_sf), "features"))
print(paste("Countries (polygons):", nrow(world), "features"))

# Visualise vector data
tm_shape(world) +
  tm_borders(col = "gray") +
  tm_shape(cities_sf) +
  tm_symbols(
    col = "red",
    size = "population",
    size.scale = tm_scale_continuous(values.scale = 2)
  ) +
  tm_title("Vector Data: Discrete Features")
```
2.4 Coordinate Reference Systems (CRS)
A CRS defines how spatial data relates to real locations on Earth. Since Earth is round but maps are flat, we need mathematical transformations to project 3D coordinates onto 2D surfaces. CRS includes:
Geographic CRS: Uses latitude/longitude (e.g. WGS84, EPSG:4326)
Projected CRS: Uses a flat coordinate system (e.g. UTM, EPSG:32617)
Local CRS: Specific to a region (e.g. Statistics Canada Lambert, EPSG:3347)
Using CRS correctly is crucial because:

- It ensures spatial data aligns correctly on maps
- It allows accurate distance and area calculations
- It enables spatial operations like buffering, intersection, and joins
Code
```r
# Check the CRS of our data
cat("World countries CRS:\n")
st_crs(world)

cat("\nCities CRS:\n")
st_crs(cities_sf)
```
Code
```r
# Distance calculations depend on the CRS
toronto  <- cities_sf[cities_sf$name == "Toronto", ]
montreal <- cities_sf[cities_sf$name == "Montreal", ]

# With a geographic CRS, sf computes ellipsoidal (great-circle)
# distances, returned in metres
dist_geo <- st_distance(toronto, montreal)
cat("Ellipsoidal distance:", dist_geo, "m\n")
```

Ellipsoidal distance: 503922.1 m
Code
```r
# Planar distance after projecting both cities to UTM Zone 17N
toronto_utm  <- st_transform(toronto, 32617)
montreal_utm <- st_transform(montreal, 32617)
dist_meters  <- st_distance(toronto_utm, montreal_utm)
cat("Projected distance:", round(as.numeric(dist_meters) / 1000), "km\n")
```

Projected distance: 506 km
Key CRS considerations
2.5 When to use different CRS
4326 (WGS84): Global data, web mapping, data storage
Projected CRS: Measurements, area calculations, buffers
Local CRS: Region-specific analysis for maximum accuracy
2.6 Common transformations
st_transform(data, 4326) - Convert to lat/lng
st_transform(data, 3857) - Convert to Web Mercator
st_transform(data, 32617) - Convert to UTM (Eastern North America)
3 Mapping
3.1 The Grammar of Maps
Just as ggplot2 uses a grammar of graphics, tmap provides a grammar of maps with consistent syntax for different map elements. This allows you to build complex maps step by step, adding layers and customising styles in a structured way.
3.2 Key Functions
tm_shape(): Defines the spatial data to be plotted
tm_polygons(): Adds polygon layers (e.g. countries, regions)
tm_lines(): Adds line layers (e.g. roads, rivers)
tm_symbols(): Adds point layers (e.g. cities, landmarks)
tm_text(): Adds text labels to features
tm_layout(): Customises overall map appearance (title, legend, etc.)
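3.3 Basic Mapping
Objective: Create your first spatial visualisation. A first map can be made with base R's plot() or, with more control, with tmap:

Code
```r
# Start simple
plot(world["pop_est"])  # Base R quick plot

# Better with tmap
tm_shape(world) +
  tm_polygons(
    "pop_est",
    fill.legend = tm_legend("Population"),
    fill.scale = tm_scale("viridis")
  ) +
  tm_title("World Population 2020")
```

3.4 Mapping Challenge
The challenge builds from a simple choropleth (novice), through multi-layer styling with city symbols (intermediate), to faceted continental maps (advanced). The advanced version closes with this summary of GDP per capita by continent: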
# A tibble: 7 × 4
continent countries mean_gdp_pc median_gdp_pc
<chr> <int> <dbl> <dbl>
1 Antarctica 1 200000 200000
2 Europe 50 37976. 29710.
3 N. America 38 23298. 16190.
4 Asia 53 14810. 5476.
5 S. America 13 14704. 6978.
6 Oceania 25 13553. 6174.
7 Africa 54 2125. 1337.
Key Concepts Learned:
Data transformation for mapping
Faceted maps for comparative analysis
Advanced layout customisation
Integrating statistical summaries
4 Spatial Operations
We’ll focus on three fundamental spatial operations that solve most spatial analysis problems:
Buffer: “What’s within X distance?”
Intersect: “What overlaps with what?”
Spatial Join: “How can we connect spatial features based on location?”
These operations allow us to manipulate and analyse spatial data effectively, answering questions about proximity, overlap, and relationships between different spatial features.
Code
```r
# Load Canadian administrative boundaries
canada    <- world[world$name == "Canada", ]
provinces <- ne_states(country = "canada", returnclass = "sf")

# 1. BUFFERS - create 100 km zones around cities
cities_utm   <- st_transform(cities_sf, crs = 3347)   # Statistics Canada Lambert, for accurate distances
city_buffers <- st_buffer(cities_utm, dist = 100000)  # 100 km

# 2. INTERSECTIONS - find which provinces contain cities
cities_provinces <- st_intersection(cities_sf, provinces)

# 3. SPATIAL JOINS - add province information to cities
cities_with_provinces <- st_join(cities_sf, provinces)

# Visualise results
tm_shape(provinces) +
  tm_borders(col = "gray") +
  tm_shape(st_transform(city_buffers, 4326)) +
  tm_borders(col = "blue", lwd = 2) +
  tm_fill(col = "blue", fill_alpha = 0.2) +
  tm_shape(cities_sf) +
  tm_symbols(col = "red", size = 1) +
  tm_text("name", ymod = 1.2) +
  tm_title("Cities with 100km Buffers")
```
5 Spatial Relationships
Spatial predicates are functions that test relationships between geometries, returning TRUE or FALSE. Think of them as questions you can ask about spatial features: “Does this city lie within this county?” or “Do these roads intersect?” These simple questions become powerful analytical tools when applied systematically across datasets.
In R, we can test these relationships using functions from the sf package. Here are some common spatial predicates:
st_intersects(): “Do these features overlap in any way?” (broadest, most commonly used)
st_within(): “Is feature A completely inside feature B?” (strict containment)
st_contains(): Reverse of within - “Does feature A completely contain feature B?”
st_touches(): “Do features share a boundary but not interior space?” (adjacent polygons)
st_crosses(): “Do lines intersect?” (roads crossing administrative boundaries)
st_overlaps(): “Do features share some but not all space?” (partially overlapping regions)
5.1 Spatial Predicates
Objective: Learn to test and understand spatial relationships
Code
```r
# Load Canadian provinces for relationship testing
provinces <- ne_states(country = "canada", returnclass = "sf")

# Test different spatial relationships
cat("Spatial Relationship Tests:\n")
```
Spatial Relationship Tests:
Code
```r
# Which cities are within provinces?
within_results <- st_within(cities_sf, provinces)
cat("Cities within provinces:\n")
```
Cities within provinces:
Code
```r
for (i in 1:nrow(cities_sf)) {
  province_idx <- within_results[[i]]
  if (length(province_idx) > 0) {
    cat(" ", cities_sf$name[i], "is within", provinces$name[province_idx], "\n")
  } else {
    cat(" ", cities_sf$name[i], "is not within any province (likely boundary/water)\n")
  }
}
```
Toronto is within Ontario
Montreal is within Québec
Vancouver is within British Columbia
Calgary is within Alberta
Ottawa is within Ontario
Code
```r
# Test intersections (broader than within)
intersects_results <- st_intersects(cities_sf, provinces)
cat("\nIntersection test (should match within for points):\n")
```
Intersection test (should match within for points):
5.2 CRS in Spatial Operations
Coordinate Reference Systems (CRS) are crucial for spatial operations. If your data is in geographic coordinates (degrees), operations like buffering or distance calculations will yield incorrect results. Always project your data to an appropriate CRS before performing spatial operations.
Code
```r
# Demonstrate CRS effects on buffers
toronto <- cities_sf[cities_sf$name == "Toronto", ]

# Wrong: buffer in geographic coordinates (degrees)
buffer_degrees <- st_buffer(toronto, dist = 0.1)  # 0.1 degrees
area_degrees   <- st_area(buffer_degrees)

# Right: buffer in projected coordinates (metres)
toronto_projected <- st_transform(toronto, 3347)                 # Statistics Canada Lambert
buffer_meters     <- st_buffer(toronto_projected, dist = 11000)  # 11 km, roughly 0.1 degrees at Toronto's latitude
area_meters       <- st_area(buffer_meters)

# Transform back for comparison
buffer_meters_geo <- st_transform(buffer_meters, 4326)

cat("Buffer Comparison:\n")
cat("Geographic buffer area:", round(as.numeric(area_degrees), 2), "square degrees\n")
cat("Projected buffer area:", round(as.numeric(area_meters) / 1e6, 2), "square kilometres\n")
```
5.3 Real-World Problem Solving
Scenario: A coffee chain wants to expand in Canada while avoiding oversaturated markets. The business rules: don't locate within 25 km of an existing competitor, and prioritise high-population areas. Step 1 (not shown here) projects cities and simulated competitors to the Statistics Canada Lambert CRS (EPSG:3347) and buffers the competitors by 25 km to create exclusion zones.

Step 2: Find Available Cities

Code
```r
# Find cities outside exclusion zones
exclusion_union  <- st_union(exclusion_zones)
available_cities <- st_difference(cities_proj, exclusion_union)

cat("Cities available for expansion:", nrow(available_cities))
cat("\nCities excluded:", nrow(cities_proj) - nrow(available_cities))

# Show available cities
available_cities_wgs84 <- st_transform(available_cities, 4326)
print(available_cities_wgs84[c("name", "population")])
```
Simple feature collection with 2 features and 2 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -114.07 ymin: 45.5 xmax: -73.57 ymax: 51.05
Geodetic CRS: WGS 84
name population geometry
2 Montreal 1780000 POINT (-73.57 45.5)
4 Calgary 1336000 POINT (-114.07 51.05)
Step 3: Market Opportunity Analysis
Code
```r
# Calculate distance to nearest competitor for each available city
distances_to_competitors <- st_distance(available_cities, competitors_proj)
min_competitor_distance  <- apply(distances_to_competitors, 1, min)

# Create market opportunity metrics
available_cities <- available_cities %>%
  mutate(
    distance_to_competitor = as.numeric(min_competitor_distance),
    market_opportunity     = population / (distance_to_competitor / 1000),  # Population per km
    opportunity_rank       = rank(-market_opportunity)
  ) %>%
  arrange(opportunity_rank)

# Show rankings
market_analysis <- available_cities %>%
  st_drop_geometry() %>%
  select(name, population, distance_to_competitor, market_opportunity, opportunity_rank) %>%
  mutate(distance_to_competitor = round(distance_to_competitor / 1000, 1))

print("Market Opportunity Rankings:")
```
[1] "Market Opportunity Rankings:"
Code
print(market_analysis)
      name population distance_to_competitor market_opportunity opportunity_rank
1 Montreal    1780000                  169.4          10509.845                1
2  Calgary    1336000                  672.3           1987.064                2
6 Raster Analysis
Raster data represents the world as a continuous surface divided into a grid of cells, or pixels, each of which stores a value. This is different from vector data, which represents discrete objects such as points, lines, and polygons. Rasters are best suited to variables that change gradually across space, such as elevation, rainfall, temperature, vegetation, or remotely sensed imagery from satellites.
Each raster has two key dimensions:
Resolution, which refers to the size of each cell. High-resolution rasters have small cells and show more detail, while low-resolution rasters have larger cells and generalise patterns.
Extent, which refers to the geographic area the raster covers.
Rasters can be single-layer, where each cell stores one value, or multi-layer, where each cell contains a stack of values representing different variables or time points. For example, a single raster might store average temperature for July 2024, while a multi-layer raster could store monthly averages for an entire year.
In practice, rasters are extremely powerful because they allow us to represent continuous phenomena, run cell-based calculations, and combine different surfaces through map algebra. For example, you might calculate slope from an elevation raster, estimate average rainfall within administrative boundaries, or overlay pollution and population rasters to identify at-risk areas.
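To make these ideas concrete, here is a small terra sketch; the grid, layer names, and values are simulated for illustration. It inspects resolution and extent, derives slope from elevation by map algebra, and builds a twelve-layer monthly stack:

Code
```r
library(terra)

# A single-layer raster: one value per cell
elev <- rast(ncols = 100, nrows = 100, xmin = -120, xmax = -110, ymin = 45, ymax = 55)
values(elev) <- runif(ncell(elev), 0, 3000)

res(elev)  # cell size: 0.1 x 0.1 degrees
ext(elev)  # geographic extent of the grid

# Map algebra: derive slope (degrees) from elevation
slope <- terrain(elev, v = "slope", unit = "degrees")

# Combine surfaces cell by cell: flag terrain that is both high and steep
high_and_steep <- (elev > 1500) & (slope > 20)

# A multi-layer raster: twelve monthly layers on the same grid
monthly <- rast(lapply(month.abb, function(m) {
  r <- rast(elev)  # same geometry as elev, no values
  values(r) <- runif(ncell(r), -10, 30)
  r
}))
names(monthly) <- month.abb

annual_mean <- mean(monthly)  # cell-wise mean across the twelve layers
```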
7 Creating Maps for Communication
7.1 From Analysis to Communication
The best spatial analysis is meaningless if you can’t communicate your findings effectively. This section focuses on the transition from exploratory analysis to polished communication.
7.2 Three Approaches to Map Communication
Objective: Learn to match visualisation approach to communication goals
Story Maps
Purpose: Guide viewers through a specific narrative or argument
Code
```r
# Create a story about urbanisation
world_urban <- world %>%
  filter(!is.na(pop_est)) %>%
  mutate(
    pop_category = case_when(
      pop_est < 5e6   ~ "Small (<5M)",
      pop_est < 25e6  ~ "Medium (5-25M)",
      pop_est < 100e6 ~ "Large (25-100M)",
      TRUE            ~ "Mega (>100M)"
    ),
    pop_category = factor(
      pop_category,
      levels = c("Small (<5M)", "Medium (5-25M)", "Large (25-100M)", "Mega (>100M)")
    )
  )

# Story map with clear narrative
story_map <- tm_shape(world_urban) +
  tm_polygons(
    fill = "pop_category",
    fill.scale = tm_scale_categorical(values = c("lightblue", "yellow", "orange", "red")),
    fill.legend = tm_legend(title = "Population Category")
  ) +
  tm_title("The Growing World: Countries by Population Size") +
  tm_layout(
    title.size = 1.4,
    legend.position = c("left", "bottom"),
    frame = FALSE
  ) +
  tm_credits(
    "Four countries now exceed 100M people: China, India, USA, Indonesia",
    position = tm_pos_out("center", "bottom"),
    size = 0.8
  )

print(story_map)
```
Story Map Principles:
Clear, descriptive title tells the main message
Color scheme supports the narrative
Caption provides key insight
Simple legend that’s easy to interpret
Interactive Dashboards
Purpose: Let users explore data themselves
Code
```r
pal_pop <- colorNumeric(palette = "viridis", domain = world_urban$pop_est)

interactive_dashboard <- leaflet(
  options = leafletOptions(worldCopyJump = FALSE, continuousWorld = FALSE)
) %>%
  addTiles(options = tileOptions(noWrap = TRUE)) %>%
  # Polygons coloured by population estimate
  addPolygons(
    data = world_urban,
    fillColor = ~ pal_pop(pop_est),
    weight = 1,
    opacity = 1,
    color = "white",
    fillOpacity = 0.7,
    highlightOptions = highlightOptions(weight = 2, color = "black", bringToFront = TRUE),
    label = ~ paste("Population:", pop_est),
    group = "Population"
  ) %>%
  # Points for major cities
  addCircleMarkers(
    data = cities_sf,
    radius = ~ sqrt(population) / 100,
    fillColor = "blue",
    fillOpacity = 0.7,
    stroke = FALSE,
    label = ~ paste(name, "- Population:", population),
    group = "Major Cities"
  ) %>%
  # Legend (reuse the same palette)
  addLegend(
    pal = pal_pop,
    values = world_urban$pop_est,
    title = "Population Estimate",
    position = "bottomright"
  ) %>%
  # Layer control
  addLayersControl(
    overlayGroups = c("Population", "Major Cities"),
    options = layersControlOptions(collapsed = FALSE)
  )

interactive_dashboard %>% setView(lng = 0, lat = 20, zoom = 2)
```
Dashboard Principles:
User controls for exploration
Multiple layers for comparison
Tooltips with detailed information
Pan and zoom functionality
Publication Figures
Purpose: Static, high-quality graphics for reports/papers
Code
```r
# Publication-ready figure with ggplot2
publication_map <- ggplot(world_urban) +
  geom_sf(aes(fill = log10(pop_est)), color = "white", size = 0.1) +
  scale_fill_viridis_c(
    name = "Population\n(log scale)",
    na.value = "grey90",
    labels = function(x) {
      # Convert log10 breaks into readable population ranges
      case_when(
        x <= 6 ~ "< 1M",
        x <= 7 ~ "1-10M",
        x <= 8 ~ "10-100M",
        TRUE   ~ "> 100M"
      )
    }
  ) +
  theme_void() +
  theme(
    legend.position = "bottom",
    legend.key.width = unit(1.5, "cm"),
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 11),
    plot.caption = element_text(hjust = 1, size = 9),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 9)
  ) +
  labs(
    title = "Global Population Distribution by Country",
    subtitle = "Population estimates for 2020, showing the concentration of people in Asia",
    caption = "Data: Natural Earth | Analysis: R with sf and ggplot2"
  )

print(publication_map)
```
Publication Figure Principles:
High resolution and clean typography
Informative title and subtitle
Professional color scheme
Proper attribution and sourcing
Suitable for black-and-white printing if needed
7.3 Design Your Signature Map
Your Task: Choose a dataset relevant to your field and create a map that:
Tells a clear story about a spatial pattern
Uses appropriate symbology for your data type
Includes proper context (title, legend, attribution)
Matches your audience (academic, public, policy-makers)
Design Checklist

- Clear title that states the main message
- Appropriate color scheme for your data and audience
- Readable legend with meaningful labels
- Proper attribution for data sources
- Clean layout without clutter
- Consistent style throughout the visualisation
8 Spatial Statistics
Moving Beyond Maps
While maps are essential for exploring spatial data, spatial statistics provides rigorous methods for testing hypotheses, quantifying patterns, and making predictions about spatial processes.
8.1 Spatial Autocorrelation
One of the most important principles in spatial analysis is that “near things are more related than distant things.” This idea, sometimes called Tobler’s First Law of Geography, means that values measured in neighbouring areas are often more similar than would be expected if location had no influence. For example, neighbouring districts might share similar levels of air pollution, adjacent farmland might have comparable soil fertility, and houses on the same street might have similar market values.
This phenomenon, known as spatial autocorrelation, is critical to detect and account for, because it violates the assumption of independence that underlies many standard statistical techniques. If we ignore spatial autocorrelation, our results may be misleading: we might overstate the significance of a relationship, or fail to capture the true structure of the data.
R provides tools to formally test for spatial autocorrelation. A widely used measure is Moran’s I, which summarises whether high (or low) values cluster together across space more than expected by chance. A positive Moran’s I indicates clustering of similar values, while a negative Moran’s I suggests a checkerboard pattern where high and low values are interspersed. A value close to zero implies little or no spatial autocorrelation.
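In code, the test pairs a neighbour definition with the Moran statistic. Following the tutorial's economic clustering example, the sketch below builds queen-contiguity weights over country polygons, runs a global Moran's I test on GDP per capita, and then computes local Moran's I (LISA) to classify the clusters summarised in the output that follows:

Code
```r
library(spdep)

# GDP per capita for countries with complete data
world_econ <- world %>%
  filter(!is.na(gdp_md), !is.na(pop_est)) %>%
  mutate(gdp_per_capita = (gdp_md * 1e6) / pop_est) %>%
  filter(gdp_per_capita > 0)

# Spatial weights: queen contiguity (neighbours share a border or corner)
world_nb      <- poly2nb(world_econ, queen = TRUE)
world_weights <- nb2listw(world_nb, style = "W", zero.policy = TRUE)

# Global Moran's I: is GDP per capita spatially clustered overall?
moran_result <- moran.test(world_econ$gdp_per_capita, world_weights, zero.policy = TRUE)
print(moran_result)

# Local Moran's I (LISA): where are the clusters?
lisa_result <- localmoran(world_econ$gdp_per_capita, world_weights, zero.policy = TRUE)
```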
cluster_type n
1 High-High 9
2 Not Significant 228
8.2 Hotspot Analysis
Hotspot analysis is used to identify areas where unusually high or low values of a variable cluster together in space. Instead of just asking whether values are spatially autocorrelated overall, as with Moran’s I, hotspot analysis highlights where those clusters occur. This makes it especially useful for applications in public health, criminology, environmental science, and urban planning.
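A common implementation is the Getis-Ord statistic. Here is a minimal sketch with spdep, reusing world_econ and world_nb from the Moran's I example above; note that Gi* specifically requires weights in which each feature counts as its own neighbour, which include.self() provides. The 1.96 cut-off is the conventional 5% significance threshold, not a fixed rule:

Code
```r
library(spdep)
library(dplyr)

# Gi* weights: each country is included among its own neighbours
nb_self      <- include.self(world_nb)
weights_self <- nb2listw(nb_self, style = "W", zero.policy = TRUE)

# Getis-Ord Gi*: one z-score per country
gi_star <- localG(world_econ$gdp_per_capita, weights_self, zero.policy = TRUE)
world_econ$gi_z <- as.numeric(gi_star)

# Classify hotspots and coldspots at the 5% level
world_econ <- world_econ %>%
  mutate(hotspot = case_when(
    gi_z >  1.96 ~ "Hotspot (high values cluster)",
    gi_z < -1.96 ~ "Coldspot (low values cluster)",
    TRUE         ~ "Not significant"
  ))
```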
8.3 Spatial Interpolation
Often we only have data at a limited number of locations, such as rainfall measured at weather stations or pollutant concentrations measured at air monitors. Spatial interpolation refers to the set of methods used to estimate values at unsampled locations based on the values observed at nearby sites. The goal is to create a continuous surface from discrete points, filling in the gaps between measurements.
Inverse distance weighting (IDW) estimates values as a weighted average of surrounding points, where nearer points contribute more than distant ones. This produces smoother surfaces and is intuitive, though it does not model spatial processes explicitly.
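A minimal IDW sketch with gstat, assuming a stations_sf point layer with a pm25 column and a pred_grid_sf set of prediction points (both in the same projected CRS), as in the environmental monitoring example below:

Code
```r
library(gstat)

# IDW: each prediction is a weighted average of nearby stations;
# idp controls the distance decay (idp = 2 is the conventional
# inverse-squared weighting)
idw_result <- idw(pm25 ~ 1, locations = stations_sf, newdata = pred_grid_sf, idp = 2)

# Predicted values are returned in the var1.pred column
head(idw_result$var1.pred)
```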
Kriging is a geostatistical method that uses both the distance and the overall spatial autocorrelation structure of the data, modelled through a variogram. Kriging not only produces predictions but also provides an estimate of uncertainty at each location.
8.3.1 Environmental Monitoring
Objective: Predict air quality at unmeasured locations
Code
```r
# Convert to sp format for gstat
stations_sp  <- as(stations_sf, "Spatial")
pred_grid_sp <- as(pred_grid_sf, "Spatial")

# Fit empirical variogram
vgm_data <- variogram(pm25 ~ 1, stations_sp)

# Fit a spherical variogram model
vgm_model <- fit.variogram(vgm_data, model = vgm("Sph"))
cat("Fitted Variogram Parameters:\n")
```
Fitted Variogram Parameters:
Code
print(vgm_model)
model psill range
1 Nug 0.00000 0.0000
2 Sph 17.57526 146.2779
Code
```r
# Perform ordinary kriging over the prediction grid
kriging_result <- krige(pm25 ~ 1, stations_sp, pred_grid_sp, model = vgm_model)
```
9 Reproducible Spatial Workflows
Working with spatial data often involves many steps: downloading datasets, cleaning and transforming them, performing analyses, and producing maps or statistical results. Without careful documentation, it can be difficult or even impossible to reproduce the exact workflow later, either by yourself or by other researchers. Reproducibility is therefore a cornerstone of good spatial analysis practice.
9.1 Learning Objectives
Integrate spatial analysis into reproducible research workflows
Organise spatial projects effectively
Document spatial analysis processes
Share spatial research outcomes
9.2 Project Organisation for Spatial Analysis
Effective spatial analysis requires structured workflows that others (including your future self) can understand and reproduce.
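One common layout, shown below as an illustration rather than a prescribed standard, separates raw data from processed layers and keeps scripts, outputs, and documentation apart:

```
project/
├── data/
│   ├── raw/        # untouched downloads (record source and date)
│   └── processed/  # cleaned, analysis-ready layers
├── R/              # numbered analysis scripts, run in order
├── outputs/
│   ├── figures/
│   └── tables/
└── README.md       # data sources, CRS choices, reproduction steps
```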
Spatial analysis opens new perspectives on research questions across virtually every domain. The techniques you’ve learned provide a foundation, but the real power comes from applying spatial thinking to your specific research challenges.
Remember:
Start simple and build complexity gradually
Always question whether spatial patterns might be meaningful
Consider how location and context affect your phenomena
Share your spatial discoveries with others
The spatial perspective transforms how we understand our world, from local patterns at your research site to global processes affecting humanity. You now have the tools to explore these spatial relationships systematically and rigorously.
Source Code
---title: "Mapping and Spatial Analysis in R"subtitle: "A Tutorial for the UCL R User Group"author: "Justin Yang"date: todayeditor: visual---## Overview {.unnumbered}Spatial data analysis in R brings together several different types of tasks: reading data from a variety of formats, manipulating it to create the attributes you need, visualising it through maps, and applying statistical techniques that account for spatial structure. The aim of this tutorial is to introduce each of these steps in a way that is accessible to beginners while still being useful for those with some prior experience in R.We will start by working with vector data (points, lines, and polygons), then move on to raster data (continuous surfaces such as elevation or temperature). Along the way, we will explore how to produce both static and interactive maps, and then carry out some introductory forms of spatial analysis, including tests for spatial autocorrelation and geostatistical techniques.The tutorial does not assume prior experience with spatial analysis, but it does assume some familiarity with R. The examples are designed to be reproducible and to build a foundation that you can adapt to your own datasets. By the end, you should feel more confident about:- Importing and exploring spatial data in R.- Creating high-quality maps to visualise your data.- Understanding when and why spatial statistical methods are needed.- Applying some of the most commonly used exploratory and modelling techniques.::: callout-important## What You'll Need- R and RStudio installed on your computer- Basic familiarity with R- Curiosity about spatial patterns in your research domain:::## Setup### Installing Required Packages```{r}#| label: setup-packages#| eval: falseinstall.packages(c("sf", # Vector data"terra", # Raster data "tmap", # Thematic mapping"leaflet", # Interactive maps"dplyr", # Data manipulation"ggplot2", # Plotting"rnaturalearth", # World data"spdep", # Spatial statistics"gstat", # Spatial interpolation"spatstat"# Point pattern analysis))```### Loading Libraries and Settings```{r}#| label: setup-librarieslibrary(sf)library(terra)library(tmap)library(dplyr)library(ggplot2)library(leaflet)library(rnaturalearth)# Set tmap mode for static plotstmap_mode("plot")# Suppress scientific notation for cleaner outputoptions(scipen =999)```### Creating Sample Data```{r}#| label: setup-data# Create example cities datasetcities <-data.frame(name =c("Toronto", "Montreal", "Vancouver", "Calgary", "Ottawa"),population =c(2930000, 1780000, 631000, 1336000, 994837),longitude =c(-79.38, -73.57, -123.12, -114.07, -75.70),latitude =c(43.65, 45.50, 49.28, 51.05, 45.42),province =c("Ontario", "Quebec", "British Columbia", "Alberta", "Ontario"))# Convert to spatial datacities_sf <-st_as_sf(cities,coords =c("longitude", "latitude"),crs =4326) # WGS84 coordinate system# Load world country dataworld <-ne_countries(scale ="medium", returnclass ="sf")# Quick preview of our dataprint(cities_sf)```------------------------------------------------------------------------## Understanding Spatial Data### What Makes Data "Spatial"?Not all data is spatial, but a great deal of the information we work with has a geographic dimension. What makes data spatial is that each observation is tied to a location, either through coordinates (such as latitude and longitude) or through a defined area (such as a district, postcode, or country). 
This spatial reference allows us to place the data on a map and to consider relationships in terms of distance, proximity, and adjacency.::: callout-note## Tobler's First Law of Geography> "Everything is related to everything else, but near things are more related than distant things.":::### Spatial Data Detective**Your Task**: For each dataset below, decide if it contains spatial information and explain how location might affect the analysis:**Dataset A**: Customer purchase records with postal codes\**Dataset B**: Stock market prices over time\**Dataset C**: Twitter sentiment analysis with GPS coordinates\**Dataset D**: University enrolment numbers by year\**Dataset E**: Air quality readings from monitoring stations::: {.callout-tip collapse="true"}## Solution and Discussion**Dataset A: Customer purchase records with postal codes** ✅ **SPATIAL**- *Spatial component*: Postal codes can be geocoded to locations- *Analysis opportunities*: Market area analysis, customer clustering, accessibility studies, delivery optimisation**Dataset B: Stock market prices over time** ❌ **NOT INHERENTLY SPATIAL**- *Why not*: Time series data without geographic component- *Could become spatial*: If you had stock exchange locations, regional economic indicators, or investor geography**Dataset C: Twitter sentiment with GPS coordinates** ✅ **SPATIAL**- *Spatial component*: GPS coordinates provide exact locations- *Analysis opportunities*: Sentiment hotspots, geographic opinion mapping, event location analysis**Dataset D: University enrolment by year** ❌ **NOT INHERENTLY SPATIAL**- *Why not*: Just temporal trends without location- *Could become spatial*: If you added student home addresses, university locations, or regional enrollment patterns**Dataset E: Air quality monitoring stations** ✅ **SPATIAL**- *Spatial component*: Monitoring stations have fixed locations- *Analysis opportunities*: Pollution interpolation, exposure mapping, environmental justice analysis:::### Vector vs. Raster Data ModelsSpatial data in R comes in two main forms: vector and raster. Vector data represents discrete objects such as points (e.g. hospital locations), lines (e.g. roads), or polygons (e.g. administrative boundaries). Raster data, by contrast, represents continuous surfaces such as temperature, elevation, or satellite imagery.```{r}#| label: data-models-demo# Vector data: Discrete features with precise boundariesprint("Vector data examples:")print(paste("Cities (points):", nrow(cities_sf), "features"))print(paste("Countries (polygons):", nrow(world), "features"))# Visualise vector datatm_shape(world) +tm_borders(col ="gray") +tm_shape(cities_sf) +tm_symbols(col ="red",size ="population",size.scale =tm_scale_continuous(values.scale =2),title.size ="Population" ) +tm_title("Vector Data: Discrete Features")``````{r}#| label: raster-demo# Raster data: Continuous grid of values# Create example elevation rasterelevation <-rast(ncols =100,nrows =100,xmin =-180,xmax =180,ymin =-90,ymax =90)# Simulate realistic elevation patternset.seed(123)coords <-crds(elevation)distances_to_mountains <-sqrt((coords[, 1] -0)^2+ (coords[, 2] -0)^2)values(elevation) <-pmax(0, 3000*exp(-distances_to_mountains /30) +rnorm(ncell(elevation), 0, 200))plot(elevation, main ="Raster Data: Continuous Grid Values")```::: callout-note## When to Use Vector vs. 
Raster**Vector data** is ideal for:- Administrative boundaries (countries, provinces)- Infrastructure (roads, buildings, utilities)- Point locations (cities, sample sites, customers)- Discrete phenomena with clear boundaries**Raster data** is ideal for:- Environmental variables (temperature, precipitation, elevation)- Satellite imagery and remote sensing data- Continuous phenomena across space- Modelling and analysis requiring regular grids:::### Coordinate Reference Systems (CRS)#### What is a CRS?A CRS defines how spatial data relates to real locations on Earth. Since Earth is round but maps are flat, we need mathematical transformations to project 3D coordinates onto 2D surfaces. CRS includes:- **Geographic CRS**: Uses latitude/longitude (e.g. WGS84, EPSG:4326)- **Projected CRS**: Uses a flat coordinate system (e.g. UTM, EPSG:32617)- **Local CRS**: Specific to a region (e.g. Statistics Canada Lambert, EPSG:3347)Using CRS correctly is crucial because: - It ensures spatial data aligns correctly on maps - It allows accurate distance and area calculations - It enables spatial operations like buffering, intersection, and joins```{r}#| label: crs-demo# Check the CRS of our datacat("World countries CRS:\n")st_crs(world)cat("\nCities CRS:\n")st_crs(cities_sf)``````{r}#| label: crs-importance# Demonstrate projection effectsworld_mercator <-st_transform(world, 3857) # Web Mercatorworld_mollweide <-st_transform(world, "+proj=moll") # Mollweide# Compare projectionsmercator_map <-tm_shape(world_mercator) +tm_borders() +tm_title("Web Mercator\n(distorts area)")mollweide_map <-tm_shape(world_mollweide) +tm_borders() +tm_title("Mollweide\n(preserves area)")tmap_arrange(mercator_map, mollweide_map, ncol =2)``````{r}#| label: when-transform# For distance calculations, use projected CRStoronto <- cities_sf[cities_sf$name =="Toronto", ]# Wrong: distance in degrees (meaningless)montreal <- cities_sf[cities_sf$name =="Montreal", ]dist_degrees <-st_distance(toronto, montreal)cat("Distance in degrees:", dist_degrees, "\n")# Right: distance in meterstoronto_utm <-st_transform(toronto, 32617) # UTM Zone 17Nmontreal_utm <-st_transform(montreal, 32617)dist_meters <-st_distance(toronto_utm, montreal_utm)cat("Distance in meters:", round(as.numeric(dist_meters) /1000), "km\n")```::: callout-note## Key CRS considerations### When to use different CRS- 4326 (WGS84): Global data, web mapping, data storage- Projected CRS: Measurements, area calculations, buffers- Local CRS: Region-specific analysis for maximum accuracy### Common transformations- st_transform(data, 4326) - Convert to lat/lng- st_transform(data, 3857) - Convert to Web Mercator- st_transform(data, 32617) - Convert to UTM (Eastern North America):::------------------------------------------------------------------------## Mapping### The Grammar of MapsJust as `ggplot2` uses a grammar of graphics, `tmap` provides a grammar of maps with consistent syntax for different map elements. This allows you to build complex maps step by step, adding layers and customising styles in a structured way.### Key Functions- `tm_shape()`: Defines the spatial data to be plotted- `tm_polygons()`: Adds polygon layers (e.g. countries, regions)- `tm_lines()`: Adds line layers (e.g. roads, rivers)- `tm_symbols()`: Adds point layers (e.g. 
cities, landmarks)- `tm_text()`: Adds text labels to features- `tm_layout()`: Customises overall map appearance (title, legend, etc.)### Basic Mapping**Objective**: Create your first spatial visualisation```{r}#| label: basic-mapping# Start simpleplot(world["pop_est"]) # Base R quick plot# Better with tmaptm_shape(world) +tm_polygons("pop_est",fill.legend =tm_legend("Population"),fill.scale =tm_scale("viridis") ) +tm_title("World Population 2020")```### Mapping Challenge**Choose your experience level:**::: panel-tabset## Novice**Objective**: Create a choropleth map with custom styling```{r}#| label: novice-map# Create choropleth map of GDPtm_shape(world) +tm_polygons(fill ="gdp_md",fill.scale =tm_scale_intervals(values ="brewer.yl_or_rd", style ="quantile"),fill.legend =tm_legend(title ="GDP (millions USD)") ) +tm_title("World GDP Distribution", position =tm_pos_out("center", "top"))```**Key Concepts Learned**:- `tm_shape()` + `tm_polygons()` syntax- Classification methods (`style = "quantile"`)- Color palettes (`palette = "YlOrRd"`) - Layout customisation## Intermediate**Objective**: Add multiple layers and custom styling```{r}#| label: intermediate-map# Enhanced map with multiple elementstm_shape(world) +tm_polygons(fill ="gdp_md",fill.scale =tm_scale_intervals(values ="-brewer.rd_yl_bu",midpoint =NA,style ="fisher",n =5 ),fill.legend =tm_legend(title ="GDP (millions USD)"),col ="white",lwd =0.5 ) +tm_shape(cities_sf) +tm_symbols(size ="population",size.scale =tm_scale_continuous(values.scale =2),size.legend =tm_legend(title ="City Population"),fill ="red",fill_alpha =0.7,col ="white",lwd =0.3 ) +tm_text(text ="name", ymod =1.2, size =0.7) +tm_title("World GDP with Major Cities") +tm_layout(frame =FALSE,legend.position =c("left", "bottom"))```**Key Concepts Learned**:- Multiple shape layers- Different classification methods- Symbol scaling by attributes- Text labels and positioning## Advanced**Objective**: Create faceted maps with complex styling```{r}#| label: advanced-map#| fig-height: 8# Create continental analysisworld_analysis <- world %>% dplyr::filter(!is.na(gdp_md),!is.na(continent), continent !="Seven seas (open ocean)" ) %>% dplyr::mutate(continent = dplyr::case_when( continent =="North America"~"N. America", continent =="South America"~"S. America",TRUE~ continent ),gdp_per_capita = (gdp_md *1e6) / pop_est )# Faceted map by continent (v4)tm_shape(world_analysis) +tm_polygons(fill ="gdp_per_capita",fill.scale =tm_scale_intervals(values ="viridis",style ="fisher" ),fill.legend =tm_legend(title ="GDP per capita\n(USD)"),col ="white"# (v4: border.col -> col) ) +tm_facets(by ="continent", free.coords =TRUE, ncol =3) +tm_title("GDP per capita by continent") +tm_layout(panel.label.size =1.2,panel.label.bg.color ="black",panel.label.color ="white" )# Summary statistics (unchanged)gdp_summary <- world_analysis %>% sf::st_drop_geometry() %>% dplyr::group_by(continent) %>% dplyr::summarise(countries = dplyr::n(),mean_gdp_pc =mean(gdp_per_capita, na.rm =TRUE),median_gdp_pc =median(gdp_per_capita, na.rm =TRUE),.groups ="drop" ) %>% dplyr::arrange(dplyr::desc(mean_gdp_pc))print(gdp_summary)```**Key Concepts Learned**:- Data transformation for mapping- Faceted maps for comparative analysis- Advanced layout customisation- Integrating statistical summaries:::------------------------------------------------------------------------## Spatial OperationsWe'll focus on three fundamental spatial operations that solve most spatial analysis problems:1. **Buffer**: "What's within X distance?"2. 
**Intersect**: "What overlaps with what?"3. **Spatial Join**: "How can we connect spatial features based on location?"These operations allow us to manipulate and analyse spatial data effectively, answering questions about proximity, overlap, and relationships between different spatial features.```{r}#| label: spatial-operations-demo# Load Canadian administrative boundariescanada <- world[world$name =="Canada", ]provinces <-ne_states(country ="canada", returnclass ="sf")# 1. BUFFERS - Create 100km zones around citiescities_utm <-st_transform(cities_sf, crs =3347) # Project to UTM for accurate distancescity_buffers <-st_buffer(cities_utm, dist =100000) # 100km# 2. INTERSECTIONS - Find which provinces contain citiescities_provinces <-st_intersection(cities_sf, provinces)# 3. SPATIAL JOINS - Add province information to citiescities_with_provinces <-st_join(cities_sf, provinces)# Visualise resultstm_shape(provinces) +tm_borders(col ="gray") +tm_shape(st_transform(city_buffers, 4326)) +tm_borders(col ="blue", lwd =2) +tm_fill(col ="blue", fill_alpha =0.2) +tm_shape(cities_sf) +tm_symbols(col ="red", size =1) +tm_text("name", ymod =1.2) +tm_title("Cities with 100km Buffers")```## Spatial RelationshipsSpatial predicates are functions that test relationships between geometries, returning TRUE or FALSE. Think of them as questions you can ask about spatial features: "Does this city lie within this county?" or "Do these roads intersect?" These simple questions become powerful analytical tools when applied systematically across datasets.In R, we can test these relationships using functions from the `sf` package. Here are some common spatial predicates:- `st_intersects()`: "Do these features overlap in any way?" (broadest, most commonly used)- `st_within()`: "Is feature A completely inside feature B?" (strict containment)- `st_contains()`: Reverse of within - "Does feature A completely contain feature B?"- `st_touches()`: "Do features share a boundary but not interior space?" (adjacent polygons)- `st_crosses()`: "Do lines intersect?" (roads crossing administrative boundaries)- `st_overlaps()`: "Do features share some but not all space?" 
(partially overlapping regions)### Spatial Predicates**Objective**: Learn to test and understand spatial relationships```{r}#| label: spatial-predicates# Load Canadian provinces for relationship testingprovinces <-ne_states(country ="canada", returnclass ="sf")# Test different spatial relationshipscat("Spatial Relationship Tests:\n")# Which cities are within provinces?within_results <-st_within(cities_sf, provinces)cat("Cities within provinces:\n")for (i in1:nrow(cities_sf)) { province_idx <- within_results[[i]]if (length(province_idx) >0) {cat(" ", cities_sf$name[i], "is within", provinces$name[province_idx], "\n") } else {cat(" ", cities_sf$name[i],"is not within any province (likely boundary/water)\n") }}# Test intersections (broader than within)intersects_results <-st_intersects(cities_sf, provinces)cat("\nIntersection test (should match within for points):\n")cat("Cities intersecting provinces:",sum(lengths(intersects_results) >0),"out of",nrow(cities_sf),"\n")# Create buffers to test more relationshipscities_projected <-st_transform(cities_sf, 3347) # Canada Lambertcity_buffers <-st_buffer(cities_projected, dist =50000) # 50km bufferscity_buffers_geo <-st_transform(city_buffers, 4326)# Test overlaps between buffersoverlap_matrix <-st_overlaps(city_buffers_geo)cat("\nCity buffer overlaps:\n")for (i in1:nrow(cities_sf)) { overlapping <- overlap_matrix[[i]]if (length(overlapping) >0) { overlapping_names <- cities_sf$name[overlapping]cat(" ", cities_sf$name[i],"buffer overlaps with:",paste(overlapping_names, collapse =", "),"\n" ) }}# Visualize relationshipstm_shape(provinces) +tm_borders(col ="gray") +tm_shape(city_buffers_geo) +tm_borders(col ="blue", lwd =2) +tm_fill(col ="blue", fill_alpha =0.2) +tm_shape(cities_sf) +tm_symbols(col ="red", size =1) +tm_text("name", ymod =1.2) +tm_title("Spatial Relationships: Cities, Buffers, and Provinces")```### CRS in Spatial OperationsCoordinate Reference Systems (CRS) are crucial for spatial operations. If your data is in geographic coordinates (degrees), operations like buffering or distance calculations will yield incorrect results. 
Always project your data to an appropriate CRS before performing spatial operations.```{r}#| label: crs-effects# Demonstrate CRS effects on bufferstoronto <- cities_sf[cities_sf$name =="Toronto", ]# Wrong: Buffer in geographic coordinates (degrees)buffer_degrees <-st_buffer(toronto, dist =0.1) # 0.1 degreesarea_degrees <-st_area(buffer_degrees)# Right: Buffer in projected coordinates (meters)toronto_projected <-st_transform(toronto, 3347) # Statistics Canada Lambertbuffer_meters <-st_buffer(toronto_projected, dist =11000) # 11km (approximately 0.1 degrees at Toronto latitude)area_meters <-st_area(buffer_meters)# Transform back for comparisonbuffer_meters_geo <-st_transform(buffer_meters, 4326)cat("Buffer Comparison:\n")cat("Geographic buffer area:",round(as.numeric(area_degrees), 2),"square degrees\n")cat("Projected buffer area:",round(as.numeric(area_meters) /1000000, 2),"square kilometers\n")# Visualize the differencecomparison_map <-tm_shape(buffer_degrees) +tm_borders(col ="red", lwd =3) +tm_fill(fill ="red", fill_alpha =0.3) +tm_shape(buffer_meters_geo) +tm_borders(col ="blue", lwd =3) +tm_fill(fill ="blue", fill_alpha =0.3) +tm_shape(toronto) +tm_symbols(col ="black", size =2) +tm_title("Buffer Comparison: Geographic (red) vs Projected (blue)") +tm_layout(legend.show =FALSE)print(comparison_map)# Distance calculation comparisonmontreal <- cities_sf[cities_sf$name =="Montreal", ]# Geographic distance (ellipsoidal - accurate)dist_geographic <-st_distance(toronto, montreal)# Projected distancemontreal_projected <-st_transform(montreal, 3347)dist_projected <-st_distance(toronto_projected, montreal_projected)cat("\nDistance Calculations:\n")cat("Geographic CRS distance:", round(as.numeric(dist_geographic) /1000, 1), "km\n")cat("Projected CRS distance:", round(as.numeric(dist_projected) /1000, 1), "km\n")cat("Difference:", round(abs(as.numeric(dist_geographic) -as.numeric(dist_projected)) /1000, 1), "km\n")```### Real-World Problem Solving**Scenario**: A coffee chain wants to expand in Canada while avoiding oversaturated markets.**Business Rules**:- Don't locate within 25km of existing competitors- Prioritise high-population areas- Consider demographic and economic factors::: panel-tabset## Brief**Your Tasks**:1. **Novice**: - Create exclusion zones around existing competitors - Identify cities outside these zones - Visualise suitable locations2. **Intermediate**: - Complete novice tasks - Calculate market opportunity metrics - Rank locations by attractiveness3. 
**Advanced**: - Complete intermediate tasks - Build multi-criteria decision model - Optimise network of new locations**Data Setup**:```{r}#| label: coffee-setup# Simulate existing competitor locationsset.seed(456)competitors <- cities_sf[sample(nrow(cities_sf), 3), ] %>%mutate(brand ="Competitor Coffee")print("Existing competitor locations:")print(competitors[c("name", "brand")])```## Novice**Step 1: Create Exclusion Zones**```{r}#| label: novice-coffee# Project to appropriate CRS for Canada (meters)canada_crs <-3347# Statistics Canada Lambertcities_proj <- sf::st_transform(cities_sf, canada_crs)competitors_proj <- sf::st_transform(competitors, canada_crs)# Create 25 km exclusion zones (in metres)exclusion_zones <- sf::st_buffer(competitors_proj, dist =25000)# Visualise exclusion zonestm_shape(st_as_sfc(st_bbox(c(xmin =-160,xmax =-40,ymin =35,ymax =85 ), crs =4326))) +tm_polygons(col =NA, fill_alpha =0) +tm_shape(st_transform(exclusion_zones, 4326)) +tm_borders(col ="red", lwd =2) +tm_fill(fill ="red", fill.alpha =0.3) +tm_shape(cities_sf) +tm_symbols(size ="population",size.scale =tm_scale_continuous(values.scale =2),size.legend =tm_legend(title ="Population"),fill ="blue",col ="white",lwd =0.5 ) +tm_shape(competitors) +tm_symbols(size =1.5,shape =4,col ="red") +tm_title("Coffee Shop Expansion: Exclusion Zones") +tm_view(bbox =st_bbox(c(xmin =-160,xmax =-40,ymin =35,ymax =85 ), crs =4326)) ```**Step 2: Find Available Cities**```{r}#| label: available-cities# Find cities outside exclusion zonesexclusion_union <-st_union(exclusion_zones)available_cities <-st_difference(cities_proj, exclusion_union)cat("Cities available for expansion:", nrow(available_cities))cat("\nCities excluded:", nrow(cities_proj) -nrow(available_cities))# Show available citiesavailable_cities_wgs84 <-st_transform(available_cities, 4326)print(available_cities_wgs84[c("name", "population")])```## Intermediate**Step 3: Market Opportunity Analysis**```{r}#| label: intermediate-coffee# Calculate distance to nearest competitor for each available citydistances_to_competitors <-st_distance(available_cities, competitors_proj)min_competitor_distance <-apply(distances_to_competitors, 1, min)# Create market opportunity metricsavailable_cities <- available_cities %>%mutate(distance_to_competitor =as.numeric(min_competitor_distance),market_opportunity = population / (distance_to_competitor /1000), # Pop per kmopportunity_rank =rank(-market_opportunity) ) %>%arrange(opportunity_rank)# Show rankingsmarket_analysis <- available_cities %>%st_drop_geometry() %>%select(name, population, distance_to_competitor, market_opportunity, opportunity_rank) %>%mutate(distance_to_competitor =round(distance_to_competitor/1000, 1))print("Market Opportunity Rankings:")print(market_analysis)```**Step 4: Visualization with Rankings**```{r}#| label: opportunity-map# Create opportunity maptm_shape(st_as_sfc(st_bbox(c(xmin =-160,xmax =-40,ymin =35,ymax =85 ), crs =4326))) +tm_polygons(col =NA, fill_alpha =0) +tm_shape(sf::st_transform(available_cities, 4326)) +tm_symbols(fill ="market_opportunity",fill.scale =tm_scale_continuous(values ="viridis"),fill.legend =tm_legend(title ="Market\nOpportunity"),size ="population",size.scale =tm_scale_continuous(values.scale =3),size.legend =tm_legend(title ="Population"),col ="white",lwd =0.3 ) +tm_shape(competitors) +tm_symbols(col ="red",size =2,shape =4) +tm_title("Coffee Chain Expansion Opportunities") +tm_view(bbox =st_bbox(c(xmin =-160,xmax =-40,ymin =35,ymax =85 ), crs =4326))```## Advanced**Step 5: 
Multi-Criteria Decision Analysis**```{r}#| label: advanced-coffee# Simulate additional business criteriaset.seed(789)available_cities_enhanced <- available_cities %>%mutate(# Simulated demographic/economic datamedian_income =rnorm(n(), 75000, 15000),young_adult_pct =runif(n(), 15, 35), # % population 20-35business_density =rpois(n(), 25),foot_traffic_index =runif(n(), 0.3, 1.0),# Normalise all criteria to 0-1 scalepop_score = scales::rescale(population),income_score = scales::rescale(median_income),youth_score = scales::rescale(young_adult_pct),business_score = scales::rescale(business_density),traffic_score = scales::rescale(foot_traffic_index),distance_score = scales::rescale(distance_to_competitor),# Weighted composite score (weights sum to 1.0)composite_score = ( pop_score *0.25+# Population weight income_score *0.20+# Income weight youth_score *0.20+# Youth demographic weight business_score *0.15+# Business environment weight traffic_score *0.15+# Foot traffic weight distance_score *0.05# Distance from competitors weight ) ) %>%arrange(desc(composite_score))# Show comprehensive analysisdecision_matrix <- available_cities_enhanced %>%st_drop_geometry() %>%select(name, population, median_income, young_adult_pct, business_density, composite_score) %>%mutate(rank =row_number(),median_income =round(median_income),young_adult_pct =round(young_adult_pct, 1),composite_score =round(composite_score, 3) )print("Multi-Criteria Decision Analysis:")print(decision_matrix)```**Step 6: Network Optimisation**```{r}#| label: network-optimization# Simple greedy optimisation for selecting multiple locationsoptimize_network <-function(candidates, n_stores =3, min_distance =50000) {if(nrow(candidates) ==0) return(candidates[0, ])# Start with highest-scoring location selected <- candidates[1, ] remaining <- candidates[-1, ]while(nrow(selected) < n_stores &&nrow(remaining) >0) {# Calculate distances from remaining to all selected distances <-st_distance(remaining, selected) min_distances <-apply(distances, 1, min)# Filter candidates far enough from existing stores valid_candidates <- remaining[min_distances > min_distance, ]if(nrow(valid_candidates) >0) {# Select highest-scoring valid candidate next_store <- valid_candidates[1, ] selected <-rbind(selected, next_store) remaining <- remaining[!remaining$name %in% next_store$name, ] } else {break } }return(selected)}# Find optimal network of 3 new storesoptimal_network <-optimize_network(available_cities_enhanced, n_stores =3)print("Optimal Network Strategy:")print(optimal_network %>%st_drop_geometry() %>%select(name, population, composite_score) %>%mutate(composite_score =round(composite_score, 3)))# Visualise final strategytm_shape(st_as_sfc(st_bbox(c(xmin =-160,xmax =-40,ymin =35,ymax =85 ), crs =4326))) +tm_polygons(col =NA, fill_alpha =0) +tm_shape(st_transform(optimal_network, 4326)) +tm_dots(col ="green",size =2,shape =19) +tm_text("name",ymod =1.5,col ="green",size =1.2) +tm_shape(competitors) +tm_dots(col ="red",size =2,shape =4) +tm_text("name", ymod =-1.5, col ="red") +tm_title("Optimal Coffee Chain Network Strategy") +tm_layout(legend.show =FALSE) +tm_view(bbox =st_bbox(c(xmin =-160,xmax =-40,ymin =35,ymax =85 ), crs =4326))```:::------------------------------------------------------------------------## Raster AnalysisRaster data represents the world as a continuous surface divided into a grid of cells, or pixels, each of which stores a value. This is different from vector data, which represents discrete objects such as points, lines, and polygons. 
Rasters are best suited to variables that change gradually across space, such as elevation, rainfall, temperature, vegetation, or remotely sensed imagery from satellites.Each raster has two key dimensions:- **Resolution**, which refers to the size of each cell. High-resolution rasters have small cells and show more detail, while low-resolution rasters have larger cells and generalise patterns.- **Extent**, which refers to the geographic area the raster covers.Rasters can be single-layer, where each cell stores one value, or multi-layer, where each cell contains a stack of values representing different variables or time points. For example, a single raster might store average temperature for July 2024, while a multi-layer raster could store monthly averages for an entire year.In practice, rasters are extremely powerful because they allow us to represent continuous phenomena, run cell-based calculations, and combine different surfaces through map algebra. For example, you might calculate slope from an elevation raster, estimate average rainfall within administrative boundaries, or overlay pollution and population rasters to identify at-risk areas.### Working with Elevation Data```{r}#| label: raster-basics# Create realistic elevation raster for North Americaset.seed(2024)elevation <-rast(ncols =200, nrows =200, xmin =-140, xmax =-60, ymin =25, ymax =70)# Simulate mountain ranges (Rockies, Appalachians)coords <-crds(elevation)mountain_ranges <-matrix(c(-115, 50, # Canadian Rockies-110, 45, # US Rockies -105, 40, # Colorado Plateau-80, 45, # Appalachians North-85, 35# Appalachians South), ncol =2, byrow =TRUE)# Calculate elevation based on distance to mountain rangeselevation_values <-numeric(ncell(elevation))for(i in1:nrow(mountain_ranges)) { center_x <- mountain_ranges[i, 1] center_y <- mountain_ranges[i, 2] distances <-sqrt((coords[,1] - center_x)^2+ (coords[,2] - center_y)^2) elevation_values <- elevation_values +pmax(0, 3500*exp(-distances/8) +rnorm(length(distances), 0, 150))}values(elevation) <-pmax(0, elevation_values)# Basic visualisationplot(elevation, main ="North American Elevation (simulated)")```### Environmental Analysis Challenge**Scenario**: Analyse environmental conditions for renewable energy development::: panel-tabset## Brief**Research Question**: Where are the best locations for wind and solar energy development?**Criteria**:- **Wind energy**: High elevation (\>1000m), moderate slope (\<20°), avoid extreme elevations (\>3000m)- **Solar energy**: Moderate elevation (200-1500m), gentle slope (\<10°), avoid very high latitudes**Tasks**:1. **All levels**: Calculate slope from elevation2. **All levels**: Create suitability maps for wind and solar3. **Intermediate+**: Combine criteria and rank suitability4. 
**Advanced**: Calculate potential energy capacity by region## Solution**Step 1: Calculate Terrain Derivatives**```{r}#| label: terrain-analysis# Calculate slope from elevationslope <-terrain(elevation, "slope", unit ="degrees")# Basic visualisationplot(slope, main ="Slope (degrees)")```**Step 2: Define Suitability Criteria**```{r}#| label: suitability-criteria# Wind energy suitabilitywind_elevation_ok <- (elevation >1000) & (elevation <3000)wind_slope_ok <- slope <20wind_suitable <- wind_elevation_ok & wind_slope_ok# Solar energy suitability solar_elevation_ok <- (elevation >200) & (elevation <1500)solar_slope_ok <- slope <10solar_suitable <- solar_elevation_ok & solar_slope_ok# Create combined suitability indexsuitability_index <- wind_suitable *1+ solar_suitable *2# Values: 0 = unsuitable, 1 = wind only, 2 = solar only, 3 = both# Clean up - set unsuitable areas to NA for cleaner visualisationsuitability_index[suitability_index ==0] <-NA# Make it categorical with labels so plot() can show a labeled legendsuitability_factor <-as.factor(suitability_index)levels(suitability_factor) <-data.frame(ID =c(1, 2, 3),label =c("Wind Only", "Solar Only", "Both"))plot(suitability_factor, main ="Renewable Energy Suitability",col =c("yellow", "blue", "green"),legend =TRUE)```**Step 3: Extract Values at City Locations**```{r}#| label: extract-values# Extract elevation and slope at city locationscity_elevation <-extract(elevation, cities_sf)city_slope <-extract(slope, cities_sf)city_suitability <-extract(suitability_index, cities_sf)# Combine with city datacities_environmental <- cities_sf %>%mutate(elevation_m = city_elevation[,2],slope_deg = city_slope[,2],suitability = city_suitability[,2],suitability_type =case_when(is.na(suitability) ~"Unsuitable", suitability ==1~"Wind Only", suitability ==2~"Solar Only", suitability ==3~"Both Suitable",TRUE~"Unknown" ) )# Show resultsenvironmental_summary <- cities_environmental %>%st_drop_geometry() %>%select(name, elevation_m, slope_deg, suitability_type) %>%mutate(elevation_m =round(elevation_m),slope_deg =round(slope_deg, 1) )print("Environmental Conditions at Major Cities:")print(environmental_summary)```**Step 4: Advanced Analysis - Regional Capacity**```{r}#| label: regional-capacity# Calculate suitable area by energy typecell_area <-cellSize(elevation, unit ="km") # Wind suitable areawind_area <-mask(cell_area, wind_suitable, maskvalue =FALSE)total_wind_area <-global(wind_area, "sum", na.rm =TRUE)# Solar suitable area solar_area <-mask(cell_area, solar_suitable, maskvalue =FALSE)total_solar_area <-global(solar_area, "sum", na.rm =TRUE)# Combined visualisation with citiestm_shape(suitability_index) +tm_raster(col.scale =tm_scale_categorical(values =c("gold", "blue", "darkgreen"),labels =c("Wind", "Solar", "Both") ),col.legend =tm_legend(title ="Energy Type") ) +tm_shape(cities_sf) +tm_symbols(fill ="white", col ="black", size =0.8) +# border.col -> coltm_text("name", size =0.6, ymod =1.2) +tm_title("Renewable Energy Potential in Canada")cat("Renewable Energy Development Potential:\n")cat("Wind suitable area:", round(total_wind_area$sum), "km²\n") cat("Solar suitable area:", round(total_solar_area$sum), "km²\n")```:::------------------------------------------------------------------------## Creating Maps for Communication### From Analysis to CommunicationThe best spatial analysis is meaningless if you can't communicate your findings effectively. 
This section focuses on the transition from exploratory analysis to polished communication.### Three Approaches to Map Communication**Objective**: Learn to match visualisation approach to communication goals::: panel-tabset## Story Maps**Purpose**: Guide viewers through a specific narrative or argument```{r}#| label: story-map# Create a story about urbanisationworld_urban <- world %>%filter(!is.na(pop_est)) %>%mutate(pop_category =case_when( pop_est <5e6~"Small (<5M)", pop_est <25e6~"Medium (5-25M)", pop_est <100e6~"Large (25-100M)",TRUE~"Mega (>100M)" ),pop_category =factor( pop_category,levels =c("Small (<5M)","Medium (5-25M)","Large (25-100M)","Mega (>100M)" ) ) )# Story map with clear narrativestory_map <-tm_shape(world_urban) +tm_polygons(fill ="pop_category",fill.scale =tm_scale_categorical(values =c("lightblue", "yellow", "orange", "red")),fill.legend =tm_legend(title ="Population Category") ) +tm_title("The Growing World: Countries by Population Size") +tm_layout(title.size =1.4,legend.position =c("left", "bottom"),frame =FALSE ) +tm_credits("Four countries now exceed 100M people: China, India, USA, Indonesia",tm_pos_out =c("center", "bottom"),size =0.8 )print(story_map)```**Story Map Principles**:- Clear, descriptive title tells the main message- Color scheme supports the narrative- Caption provides key insight- Simple legend that's easy to interpret## Interactive Dashboards**Purpose**: Let users explore data themselves```{r}#| label: interactive-dashboardpal_pop <-colorNumeric(palette ="viridis", domain = world_urban$pop_est)interactive_dashboard <-leaflet(options =leafletOptions(worldCopyJump =FALSE, continuousWorld =FALSE)) %>%addTiles(options =tileOptions(noWrap =TRUE)) %>%# Polygons coloured by population estimateaddPolygons(data = world_urban,fillColor =~pal_pop(pop_est),weight =1,opacity =1,color ="white",fillOpacity =0.7,highlightOptions =highlightOptions(weight =2,color ="black",bringToFront =TRUE ),label =~paste("Population:", pop_est),group ="Population" ) %>%# Points for major citiesaddCircleMarkers(data = cities_sf,radius =~sqrt(population) /100,# same scaling as yoursfillColor ="blue",fillOpacity =0.7,stroke =FALSE,label =~paste(name, "<br>", "Population:", population),group ="Major Cities" ) %>%# Legend (reuse the same palette)addLegend(pal = pal_pop,values = world_urban$pop_est,title ="Population Estimate",position ="bottomright" ) %>%# Layer controladdLayersControl(overlayGroups =c("Population", "Major Cities"),options =layersControlOptions(collapsed =FALSE) )interactive_dashboard %>%setView(lng =0, lat =20, zoom =2)```**Dashboard Principles**:- User controls for exploration- Multiple layers for comparison- Tooltips with detailed information- Pan and zoom functionality## Publication Figures**Purpose**: Static, high-quality graphics for reports/papers```{r}#| label: publication-figure#| fig-width: 10#| fig-height: 6# Publication-ready figure with ggplot2publication_map <-ggplot(world_urban) +geom_sf(aes(fill =log10(pop_est)), color ="white", size =0.1) +scale_fill_viridis_c(name ="Population\n(log scale)",na.value ="grey90",labels =function(x) { formatted <-paste0("10^", round(x, 1))# Convert scientific notation to proper formatcase_when( x <=6~"< 1M", x <=7~"1-10M", x <=8~"10-100M",TRUE~"> 100M" ) } ) +theme_void() +theme(legend.position ="bottom",legend.key.width =unit(1.5, "cm"),plot.title =element_text(hjust =0.5, size =14, face ="bold"),plot.subtitle =element_text(hjust =0.5, size =11),plot.caption =element_text(hjust =1, size =9),legend.title 
## Publication Figures

**Purpose**: Static, high-quality graphics for reports and papers

```{r}
#| label: publication-figure
#| fig-width: 10
#| fig-height: 6

# Publication-ready figure with ggplot2
publication_map <- ggplot(world_urban) +
  geom_sf(aes(fill = log10(pop_est)), color = "white", size = 0.1) +
  scale_fill_viridis_c(
    name = "Population\n(log scale)",
    na.value = "grey90",
    labels = function(x) {
      # Convert log10 breaks into readable population ranges
      case_when(
        x <= 6 ~ "< 1M",
        x <= 7 ~ "1-10M",
        x <= 8 ~ "10-100M",
        TRUE ~ "> 100M"
      )
    }
  ) +
  theme_void() +
  theme(
    legend.position = "bottom",
    legend.key.width = unit(1.5, "cm"),
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 11),
    plot.caption = element_text(hjust = 1, size = 9),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 9)
  ) +
  labs(
    title = "Global Population Distribution by Country",
    subtitle = "Population estimates for 2020, showing the concentration of people in Asia",
    caption = "Data: Natural Earth | Analysis: R with sf and ggplot2"
  )

print(publication_map)
```

**Publication Figure Principles**:

- High resolution and clean typography
- Informative title and subtitle
- Professional colour scheme
- Proper attribution and sourcing
- Suitable for black-and-white printing if needed
:::

### Design Your Signature Map

**Your Task**: Choose a dataset relevant to your field and create a map that:

1. Tells a clear story about a spatial pattern
2. Uses appropriate symbology for your data type
3. Includes proper context (title, legend, attribution)
4. Matches your audience (academic, public, policy-makers)

::: callout-tip
## Design Checklist

- [ ] Clear title that states the main message
- [ ] Appropriate colour scheme for your data and audience
- [ ] Readable legend with meaningful labels
- [ ] Proper attribution for data sources
- [ ] Clean layout without clutter
- [ ] Consistent style throughout the visualisation
:::

------------------------------------------------------------------------

## Spatial Statistics

::: callout-note
## Moving Beyond Maps

While maps are essential for exploring spatial data, spatial statistics provides rigorous methods for testing hypotheses, quantifying patterns, and making predictions about spatial processes.
:::

### Spatial Autocorrelation

One of the most important principles in spatial analysis is that "near things are more related than distant things." This idea, sometimes called Tobler's First Law of Geography, means that values measured in neighbouring areas are often more similar than would be expected if location had no influence. For example, neighbouring districts might share similar levels of air pollution, adjacent farmland might have comparable soil fertility, and houses on the same street might have similar market values.

This phenomenon, known as spatial autocorrelation, is critical to detect and account for because it violates the assumption of independence that underlies many standard statistical techniques. If we ignore spatial autocorrelation, our results may be misleading: we might overstate the significance of a relationship, or fail to capture the true structure of the data.

R provides tools to formally test for spatial autocorrelation. A widely used measure is Moran's I, which summarises whether high (or low) values cluster together across space more than expected by chance. A positive Moran's I indicates clustering of similar values, while a negative Moran's I suggests a checkerboard pattern where high and low values are interspersed. A value close to zero implies little or no spatial autocorrelation.
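For reference, Moran's I is usually written as

$$
I = \frac{n}{\sum_{i}\sum_{j} w_{ij}} \cdot \frac{\sum_{i}\sum_{j} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i} (x_i - \bar{x})^2}
$$

where $x_i$ is the value in area $i$, $\bar{x}$ is the mean across all $n$ areas, and $w_{ij}$ is the spatial weight between areas $i$ and $j$ (non-zero only for neighbours). The numerator is large and positive when neighbouring values deviate from the mean in the same direction, which is exactly the clustering described above.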
#### Economic Clustering Analysis

::: panel-tabset
## Brief

**Research Question**: Do countries with similar economic development tend to be neighbours?

**Method**: Moran's I test for spatial autocorrelation

**Your Tasks**:

1. Calculate GDP per capita for each country
2. Define spatial neighbourhoods (which countries count as "neighbours")
3. Test for spatial autocorrelation using Moran's I
4. Interpret the results and identify local clusters

## Solution

```{r}
#| label: spatial-autocorrelation
#| message: false

# Load spatial statistics package
library(spdep)

# Prepare economic data
world_econ <- world %>%
  filter(!is.na(gdp_md), !is.na(pop_est)) %>%
  mutate(gdp_per_capita = (gdp_md * 1000000) / pop_est) %>%
  filter(gdp_per_capita > 0)

# Create spatial weights matrix (queen contiguity)
world_nb <- poly2nb(world_econ, queen = TRUE)
world_weights <- nb2listw(world_nb, style = "W", zero.policy = TRUE)

# Calculate Global Moran's I
moran_result <- moran.test(world_econ$gdp_per_capita, world_weights, zero.policy = TRUE)

cat("Global Moran's I Results:\n")
cat("Moran's I statistic:", round(moran_result$estimate[1], 4), "\n")
cat("Expected value (random):", round(moran_result$estimate[2], 4), "\n")
cat("P-value:", format(moran_result$p.value, scientific = TRUE), "\n\n")

# Interpretation
if (moran_result$p.value < 0.05) {
  if (moran_result$estimate[1] > 0) {
    cat("Interpretation: Significant positive spatial autocorrelation\n")
    cat("Countries with similar GDP per capita cluster together.\n")
  }
} else {
  cat("Interpretation: No significant spatial pattern detected.\n")
}
```
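Before moving to local statistics, a complementary visual check (not part of the original tasks) is spdep's Moran scatterplot, which plots each country's value against the average value of its neighbours; the slope of the fitted line corresponds to Moran's I:

```{r}
#| label: moran-scatterplot

# Moran scatterplot: observed values vs. spatially lagged (neighbour-average) values
# Points in the lower-left and upper-right quadrants drive positive autocorrelation
moran.plot(world_econ$gdp_per_capita, world_weights,
  zero.policy = TRUE,
  xlab = "GDP per capita",
  ylab = "Spatially lagged GDP per capita"
)
```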
**Local Indicators of Spatial Association (LISA)**

```{r}
#| label: lisa-analysis

# Calculate Local Moran's I (LISA)
lisa_result <- localmoran(world_econ$gdp_per_capita, world_weights, zero.policy = TRUE)

# Classify clusters
world_econ$lisa_i <- lisa_result[, 1]
world_econ$lisa_p <- lisa_result[, 5]

world_econ$cluster_type <- case_when(
  world_econ$lisa_p > 0.05 ~ "Not Significant",
  world_econ$lisa_i > 0 & world_econ$gdp_per_capita > mean(world_econ$gdp_per_capita) ~ "High-High",
  world_econ$lisa_i > 0 & world_econ$gdp_per_capita < mean(world_econ$gdp_per_capita) ~ "Low-Low",
  world_econ$lisa_i < 0 & world_econ$gdp_per_capita > mean(world_econ$gdp_per_capita) ~ "High-Low",
  world_econ$lisa_i < 0 & world_econ$gdp_per_capita < mean(world_econ$gdp_per_capita) ~ "Low-High",
  TRUE ~ "Not Significant"
)

# Visualise clusters
tm_shape(world_econ) +
  tm_polygons(
    fill = "cluster_type",
    fill.scale = tm_scale_categorical(
      values = c("red", "blue", "lightblue", "pink", "white")
    ),
    fill.legend = tm_legend(title = "Economic Clusters")
  ) +
  tm_title("Local Economic Clustering (LISA)")

# Summary
cluster_summary <- world_econ %>%
  st_drop_geometry() %>%
  count(cluster_type)

print("Cluster Summary:")
print(cluster_summary)
```
:::

### Hotspot Analysis

Hotspot analysis is used to identify areas where unusually high or low values of a variable cluster together in space. Rather than asking whether values are spatially autocorrelated overall, as Moran's I does, hotspot analysis highlights where those clusters occur. This makes it especially useful for applications in public health, criminology, environmental science, and urban planning.

#### Urban Crime Patterns

::: panel-tabset
## Brief

**Scenario**: Analyse crime patterns to optimise police resource allocation

**Data**: Simulated crime incidents in an urban area

## Solution

```{r}
#| label: crime-hotspots

# Create simulated crime data
set.seed(2024)
n_crimes <- 250

# Create crime hotspot centres
hotspot_centers <- data.frame(
  x = c(-79.42, -79.38, -79.45),
  y = c(43.68, 43.65, 43.71),
  intensity = c(0.3, 0.3, 0.2)
)

# Generate crime points around hotspots
crime_data <- data.frame()
for (i in 1:nrow(hotspot_centers)) {
  n_points <- round(n_crimes * hotspot_centers$intensity[i])
  x_coords <- rnorm(n_points, hotspot_centers$x[i], 0.008)
  y_coords <- rnorm(n_points, hotspot_centers$y[i], 0.008)
  crime_data <- rbind(crime_data, data.frame(x = x_coords, y = y_coords, hotspot = i))
}

# Add random background crimes
n_background <- n_crimes - nrow(crime_data)
if (n_background > 0) {
  background_crimes <- data.frame(
    x = runif(n_background, -79.5, -79.3),
    y = runif(n_background, 43.6, 43.75),
    hotspot = 0
  )
  crime_data <- rbind(crime_data, background_crimes)
}

# Convert to spatial data
crime_sf <- st_as_sf(crime_data, coords = c("x", "y"), crs = 4326)

# Basic visualisation
tm_shape(crime_sf) +
  tm_symbols(size = 0.2, fill_alpha = 0.6) +
  tm_title("Crime Incident Locations")
```

**Kernel Density Analysis**

```{r}
#| label: kernel-density

# Create density surface (simplified approach)
crime_utm <- st_transform(crime_sf, 32617) # UTM zone 17N for Toronto

# Create analysis grid
bbox <- st_bbox(crime_utm)
grid_size <- 50
grid_x <- seq(bbox[1], bbox[3], length.out = grid_size)
grid_y <- seq(bbox[2], bbox[4], length.out = grid_size)

# Calculate density using distance-based (Gaussian) weighting
bandwidth <- 200 # metres
crime_coords <- st_coordinates(crime_utm)
density_matrix <- matrix(0, nrow = grid_size, ncol = grid_size)

for (i in 1:length(grid_x)) {
  for (j in 1:length(grid_y)) {
    distances <- sqrt((crime_coords[, 1] - grid_x[i])^2 +
      (crime_coords[, 2] - grid_y[j])^2)
    weights <- exp(-0.5 * (distances / bandwidth)^2)
    density_matrix[j, i] <- sum(weights)
  }
}

# Create raster
density_raster <- rast(
  ncols = grid_size,
  nrows = grid_size,
  xmin = bbox[1],
  xmax = bbox[3],
  ymin = bbox[2],
  ymax = bbox[4],
  crs = "EPSG:32617"
)

# Raster cells are filled row-wise starting from the top (northernmost) row,
# so flip the matrix rows (y ascending -> descending) and transpose before assigning
values(density_raster) <- as.vector(t(density_matrix[grid_size:1, ]))

# Transform back for visualisation
density_wgs84 <- project(density_raster, "EPSG:4326")

# Visualise hotspots
tm_shape(density_wgs84) +
  tm_raster(
    col.scale = tm_scale_continuous(values = "brewer.reds"),
    col.legend = tm_legend(title = "Crime Density"),
    col_alpha = 0.8
  ) +
  tm_shape(crime_sf) +
  tm_symbols(size = 0.1, fill_alpha = 0.4) +
  tm_title("Crime Hotspot Analysis")
```
:::

### Spatial Interpolation

Often we only have data at a limited number of locations, such as rainfall measured at weather stations or pollutant concentrations measured at air monitors. Spatial interpolation refers to the set of methods used to estimate values at unsampled locations based on the values observed at nearby sites. The goal is to create a continuous surface from discrete points, filling in the gaps between measurements.

Inverse distance weighting (IDW) estimates values as a weighted average of surrounding points, where nearer points contribute more than distant ones. This produces smooth surfaces and is intuitive, though it does not model the underlying spatial process explicitly.
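Formally, the IDW estimate at an unsampled location $s_0$ is

$$
\hat{z}(s_0) = \frac{\sum_{i=1}^{n} d_i^{-p}\, z(s_i)}{\sum_{i=1}^{n} d_i^{-p}}
$$

where $z(s_i)$ are the observed values, $d_i$ is the distance from $s_0$ to observation $i$, and the power $p$ (commonly 2) controls how quickly influence decays with distance. This weighted average is exactly what the `idw_predict()` helper in the solution below computes.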
Kriging is a geostatistical method that uses both the distances between points and the overall spatial autocorrelation structure of the data, modelled through a variogram. Kriging not only produces predictions but also provides an estimate of uncertainty at each location.

#### Environmental Monitoring

**Objective**: Predict air quality at unmeasured locations

::: panel-tabset
## Brief

**Problem**: Air quality monitoring stations are expensive and sparse. How can we estimate pollution levels across a region?

**Methods**:

- Inverse Distance Weighting (IDW): simple, intuitive
- Kriging: optimal prediction with uncertainty estimates

**Application**: Create a continuous pollution surface from point measurements

## Solution

```{r}
#| label: air-quality-setup

# Create simulated air quality monitoring network
library(gstat)
set.seed(2025)
n_stations <- 20

# Generate monitoring stations
stations <- data.frame(
  station_id = 1:n_stations,
  x = runif(n_stations, -84, -76), # Longitude
  y = runif(n_stations, 42, 46), # Latitude
  elevation = runif(n_stations, 100, 600)
)

# Simulate PM2.5 values with spatial structure
pollution_centers <- data.frame(
  x = c(-82, -78.5, -80.5),
  y = c(43, 44.5, 42.5)
)

pm25_values <- numeric(n_stations)
for (i in 1:n_stations) {
  base_pollution <- 12 # Background level

  # Distance decay from each pollution source
  for (j in 1:nrow(pollution_centers)) {
    distance <- sqrt((stations$x[i] - pollution_centers$x[j])^2 +
      (stations$y[i] - pollution_centers$y[j])^2)
    pollution_effect <- 15 * exp(-distance * 1.5)
    base_pollution <- base_pollution + pollution_effect
  }

  # Elevation effect (higher = cleaner)
  elevation_effect <- -0.015 * (stations$elevation[i] - 350)
  pm25_values[i] <- base_pollution + elevation_effect + rnorm(1, 0, 2)
}

stations$pm25 <- pmax(1, pm25_values)
stations_sf <- st_as_sf(stations, coords = c("x", "y"), crs = 4326)

# Visualise monitoring network
tm_shape(stations_sf) +
  tm_symbols(
    fill = "pm25",
    fill.scale = tm_scale_continuous(values = "plasma"),
    fill.legend = tm_legend(title = "PM2.5 (μg/m³)"),
    size = 1.5,
    col = "white",
    lwd = 0.3
  ) +
  tm_title("Air Quality Monitoring Network")
```

**Inverse Distance Weighting (IDW)**

```{r}
#| label: idw-interpolation

# Create prediction grid
bbox <- st_bbox(stations_sf)
grid_res <- 0.05
grid_x <- seq(bbox[1], bbox[3], by = grid_res)
grid_y <- seq(bbox[2], bbox[4], by = grid_res)

# List y from north to south so predictions fill the raster rows top-down
pred_grid <- expand.grid(x = grid_x, y = rev(grid_y))
pred_grid_sf <- st_as_sf(pred_grid, coords = c("x", "y"), crs = 4326)

# IDW function
idw_predict <- function(known_points, pred_points, values, power = 2) {
  known_coords <- st_coordinates(known_points)
  pred_coords <- st_coordinates(pred_points)
  predictions <- numeric(nrow(pred_coords))

  for (i in 1:nrow(pred_coords)) {
    distances <- sqrt((known_coords[, 1] - pred_coords[i, 1])^2 +
      (known_coords[, 2] - pred_coords[i, 2])^2)
    distances[distances == 0] <- 1e-10 # Avoid division by zero
    weights <- 1 / (distances^power)
    predictions[i] <- sum(weights * values) / sum(weights)
  }
  return(predictions)
}

# Apply IDW
idw_predictions <- idw_predict(stations_sf, pred_grid_sf, stations$pm25)

# Create IDW raster
idw_raster <- rast(
  ncols = length(grid_x), nrows = length(grid_y),
  xmin = bbox[1], xmax = bbox[3], ymin = bbox[2], ymax = bbox[4],
  crs = "EPSG:4326"
)
values(idw_raster) <- idw_predictions

# Visualise IDW results
tm_shape(idw_raster) +
  tm_raster(
    col.scale = tm_scale_continuous(values = "plasma"),
    col.legend = tm_legend(title = "PM2.5 (μg/m³)"),
    col_alpha = 0.8
  ) +
  tm_shape(stations_sf) +
  tm_symbols(fill = "white", col = "black", size = 0.5) +
  tm_title("Air Quality - IDW Interpolation")
```

**Kriging Interpolation**
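Kriging begins with the empirical semivariogram, which measures how dissimilarity between pairs of measurements grows with the distance $h$ separating them:

$$
\hat{\gamma}(h) = \frac{1}{2\,|N(h)|} \sum_{(i,j)\, \in\, N(h)} \bigl(z(s_i) - z(s_j)\bigr)^2
$$

where $N(h)$ is the set of station pairs separated by (approximately) distance $h$. Fitting a smooth curve (spherical, exponential, and so on) to these empirical values, as `variogram()` and `fit.variogram()` do below, supplies the spatial structure that kriging uses to weight each observation.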
```{r}
#| label: kriging-interpolation
#| warning: false

# Convert to sp format for gstat
stations_sp <- as(stations_sf, "Spatial")
pred_grid_sp <- as(pred_grid_sf, "Spatial")

# Fit empirical variogram
vgm_data <- variogram(pm25 ~ 1, stations_sp)

# Fit a spherical variogram model
vgm_model <- fit.variogram(vgm_data, model = vgm("Sph"))

cat("Fitted Variogram Parameters:\n")
print(vgm_model)

# Perform ordinary kriging
kriging_result <- krige(pm25 ~ 1, stations_sp, pred_grid_sp, model = vgm_model)

# Create kriging raster
kriging_raster <- rast(
  ncols = length(grid_x), nrows = length(grid_y),
  xmin = bbox[1], xmax = bbox[3], ymin = bbox[2], ymax = bbox[4],
  crs = "EPSG:4326"
)
values(kriging_raster) <- kriging_result$var1.pred

# Visual comparison
idw_map <- tm_shape(idw_raster) +
  tm_raster(
    col.scale = tm_scale_continuous(values = "plasma"),
    col.legend = tm_legend(title = "PM2.5")
  ) +
  tm_shape(stations_sf) +
  tm_symbols(fill = "white", size = 0.3) +
  tm_title("IDW Interpolation") +
  tm_layout(legend.show = FALSE)

kriging_map <- tm_shape(kriging_raster) +
  tm_raster(
    col.scale = tm_scale_continuous(values = "plasma"),
    col.legend = tm_legend(title = "PM2.5")
  ) +
  tm_shape(stations_sf) +
  tm_symbols(fill = "white", size = 0.3) +
  tm_title("Kriging Interpolation")

tmap_arrange(idw_map, kriging_map, ncol = 2)

# Calculate prediction statistics
cat("\nInterpolation Comparison:\n")
cat("IDW - Mean:", round(mean(idw_predictions), 2), "SD:", round(sd(idw_predictions), 2), "\n")
cat(
  "Kriging - Mean:", round(mean(kriging_result$var1.pred), 2),
  "SD:", round(sd(kriging_result$var1.pred), 2), "\n"
)
```
:::

------------------------------------------------------------------------

## Reproducible Spatial Workflows

Working with spatial data often involves many steps: downloading datasets, cleaning and transforming them, performing analyses, and producing maps or statistical results. Without careful documentation, it can be difficult or even impossible to reproduce the exact workflow later, whether by yourself or by other researchers. Reproducibility is therefore a cornerstone of good spatial analysis practice.

### Learning Objectives

- Integrate spatial analysis into reproducible research workflows
- Organise spatial projects effectively
- Document spatial analysis processes
- Share spatial research outcomes

### Project Organisation for Spatial Analysis

Effective spatial analysis requires structured workflows that others (including your future self) can understand and reproduce. One common layout separates raw data, scripts, and outputs; a minimal script following this structure is sketched after the tree below.

```
spatial_project/
├── README.md
├── spatial_project.Rproj
├── data/
│   ├── raw/
│   ├── processed/
│   └── external/
├── scripts/
│   ├── 01_data_preparation.R
│   ├── 02_spatial_analysis.R
│   ├── 03_visualisation.R
│   └── 04_statistical_analysis.R
├── output/
│   ├── figures/
│   ├── maps/
│   └── tables/
└── docs/
    ├── analysis_report.qmd
    └── methods_documentation.md
```
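As an illustration of this structure, here is a minimal sketch of what `01_data_preparation.R` might look like. The use of the `here` package and the file names are assumptions made for the example, not requirements:

```{r}
#| label: repro-script
#| eval: false

# scripts/01_data_preparation.R
# Each script reads from known locations and writes to known locations,
# so the whole pipeline can be re-run from scratch.
library(here) # builds paths relative to the project root (assumed installed)
library(sf)

set.seed(42) # fix randomness so any simulated steps are repeatable

# Read raw inputs (the file name is illustrative)
boundaries <- st_read(here("data", "raw", "boundaries.gpkg"))

# ... cleaning and transformation steps go here ...

# Write processed data where downstream scripts expect it
st_write(boundaries, here("data", "processed", "boundaries_clean.gpkg"), delete_dsn = TRUE)

# Record the package versions used for this run
writeLines(capture.output(sessionInfo()), here("docs", "session_info.txt"))
```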
### Spatial Analysis to Answer a Research Question

::: panel-tabset
## Project Brief

**Research Question**: How do environmental and demographic factors influence economic development patterns across countries?

**Your Task**: Create a comprehensive spatial analysis that includes:

1. **Data preparation**: Clean and integrate multiple datasets
2. **Exploratory mapping**: Visualise key variables spatially
3. **Spatial operations**: Calculate meaningful derived variables
4. **Statistical analysis**: Test for spatial patterns and relationships
5. **Communication**: Create final maps and a summary

**Datasets**:

- Country boundaries with population and economic data
- City locations with population sizes
- Environmental data (elevation, climate zones)

## Complete Solution

**Step 1: Data Integration**

```{r}
#| label: data-prep

# Integrate multiple datasets
analysis_data <- world %>%
  filter(!is.na(pop_est), !is.na(gdp_md)) %>%
  mutate(
    gdp_per_capita = (gdp_md * 1000000) / pop_est,
    area_km2 = as.numeric(st_area(geometry)) / 1000000, # m² to km²
    pop_density = pop_est / area_km2, # people per km²
    development_level = case_when(
      gdp_per_capita < 5000 ~ "Low Income",
      gdp_per_capita < 15000 ~ "Lower Middle",
      gdp_per_capita < 50000 ~ "Upper Middle",
      TRUE ~ "High Income"
    )
  ) %>%
  select(
    name, continent, pop_est, gdp_md, gdp_per_capita,
    pop_density, development_level, geometry
  )

# Extract environmental data for each country
country_centroids <- st_centroid(analysis_data)
country_elevation <- extract(elevation, country_centroids)
analysis_data$avg_elevation <- country_elevation[, 2]

cat("Integrated dataset:", nrow(analysis_data), "countries\n")
cat("Variables:", ncol(analysis_data) - 1, "attributes + geometry\n")
```

**Step 2: Exploratory Spatial Analysis**

```{r}
#| label: exploration
#| fig-height: 10

# Create multi-panel exploratory maps
gdp_map <- tm_shape(analysis_data) +
  tm_polygons(
    fill = "gdp_per_capita",
    fill.scale = tm_scale_intervals(
      style = "fisher",
      values = "viridis", # cols4all palette name
      n = 7
    ),
    fill.legend = tm_legend(title = "GDP per capita")
  ) +
  tm_title("Economic Development")

pop_map <- tm_shape(analysis_data) +
  tm_polygons(
    fill = "pop_density",
    fill.scale = tm_scale_intervals(
      style = "fisher",
      values = "plasma", # cols4all palette name
      n = 7
    ),
    fill.legend = tm_legend(title = "Population density (per km²)")
  ) +
  tm_title("Population Density")

elev_map <- tm_shape(analysis_data) +
  tm_polygons(
    fill = "avg_elevation",
    fill.scale = tm_scale_intervals(
      style = "fisher",
      # an explicit colour vector also works if the name isn't a cols4all palette
      values = terrain.colors(7),
      n = 7
    ),
    fill.legend = tm_legend(title = "Elevation (m)")
  ) +
  tm_title("Average Elevation")

dev_map <- tm_shape(analysis_data) +
  tm_polygons(
    fill = "development_level",
    fill.scale = tm_scale_categorical(values = c("red", "orange", "yellow", "green")),
    fill.legend = tm_legend(title = "Development Level")
  ) +
  tm_title("Development Categories")

# Arrange multiple maps
tmap_arrange(gdp_map, pop_map, elev_map, dev_map, ncol = 2)
```

**Step 3: Spatial Operations and Analysis**

```{r}
#| label: spatial-ops

# Spatial autocorrelation analysis
analysis_nb <- poly2nb(analysis_data, queen = TRUE)
analysis_weights <- nb2listw(analysis_nb, style = "W", zero.policy = TRUE)

# Test multiple variables for spatial autocorrelation
variables <- c("gdp_per_capita", "pop_density", "avg_elevation")
moran_results <- data.frame()

for (var in variables) {
  if (var %in% names(analysis_data)) {
    values <- pull(analysis_data, var)
    # Only test variables with enough non-missing values
    if (sum(!is.na(values)) > 10) {
      moran_test <- moran.test(values, analysis_weights,
        zero.policy = TRUE, na.action = na.omit
      )
      moran_results <- rbind(moran_results, data.frame(
        Variable = var,
        Morans_I = round(moran_test$estimate[1], 4),
        P_value = round(moran_test$p.value, 4),
        Significant = moran_test$p.value < 0.05
      ))
    }
  }
}

cat("Spatial Autocorrelation Results:\n")
print(moran_results)
```
**Step 4: Statistical Relationships**

```{r}
#| label: statistics

# Examine relationships between variables
analysis_df <- analysis_data %>%
  st_drop_geometry() %>%
  filter(!is.na(gdp_per_capita), !is.na(avg_elevation), !is.na(pop_density))

# Correlation analysis
cor_matrix <- cor(
  analysis_df[c("gdp_per_capita", "pop_density", "avg_elevation")],
  use = "complete.obs"
)
cat("Correlation Matrix:\n")
print(round(cor_matrix, 3))

# Regional analysis
regional_summary <- analysis_df %>%
  group_by(continent) %>%
  summarise(
    countries = n(),
    avg_gdp_pc = mean(gdp_per_capita, na.rm = TRUE),
    avg_pop_density = mean(pop_density, na.rm = TRUE),
    avg_elevation = mean(avg_elevation, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_gdp_pc))

cat("\nRegional Development Patterns:\n")
print(regional_summary)
```

**Step 5: Final Visualisation and Summary**

```{r}
#| label: final
#| fig-width: 12
#| fig-height: 8

# Create comprehensive final map
final_map <- tm_shape(analysis_data) +
  tm_polygons(
    fill = "development_level",
    fill.scale = tm_scale_categorical(
      values = c("#d73027", "#fc8d59", "#fee08b", "#91cf60")
    ),
    fill.legend = tm_legend(title = "Development Level")
  ) +
  tm_shape(cities_sf) +
  tm_symbols(
    size = "population",
    size.scale = tm_scale_continuous(values.scale = 2),
    size.legend = tm_legend(title = "City Population"),
    col = "black",
    fill_alpha = 0.6
  ) +
  tm_text("name", size = 0.6, ymod = 1.3) +
  tm_title("Global Development Patterns: Economic, Demographic, and Urban Factors") +
  tm_layout(
    title.size = 1.4,
    legend.position = c("left", "bottom"),
    legend.title.size = 1.1
  ) +
  tm_credits(
    "Analysis combines economic data, population density, and urban hierarchies",
    position = c("center", "bottom")
  )

print(final_map)

# Key findings summary
cat("\nKey Findings:\n")
cat("1. Economic development shows strong spatial clustering\n")
cat("2. High-income countries are concentrated in North America, Europe, and Oceania\n")
cat("3. Population density varies independently of economic development\n")
cat("4. Geographic factors (elevation) show weak correlation with development\n")
cat("5. Urban centres often indicate broader regional development patterns\n")
```
:::

------------------------------------------------------------------------

## Conclusion and Next Steps

You've completed an introduction to spatial analysis in R.
You now have:

- Spatial thinking skills to recognise location-based opportunities in research
- Technical proficiency with modern spatial R packages (sf, terra, tmap)
- An analytical toolkit for fundamental spatial operations and statistics
- Communication abilities to create effective maps for different audiences
- Workflow knowledge for reproducible spatial research

### Learning Resources

**General Spatial Analysis**:

- [Spatial Data Science with R](https://r-spatial.org/book/)
- [Geocomputation with R](https://r.geocompx.org/)
- [Applied Spatial Data Analysis with R](https://asdar-book.org/)
- [R for Spatial Analysis](https://rspatial.org/r4spatial/)
- [Geographic Data Science with R](https://geographicdata.science/book/)

**Domain-Specific Applications**:

- [Spatial Ecology in R](https://paezha.github.io/spatial-analysis-r/)
- [Geospatial Health Data](https://www.paulamoraga.com/book-geospatial)
- [Spatial Statistics for Data Science](https://www.paulamoraga.com/book-spatial/)
- [Spatial Econometrics in R](https://cran.r-project.org/web/views/Spatial.html)
- [RSpatial tutorials](https://rspatial.org/)

**Additional Resources** (with thanks to Abbie Chapman, Joana Pereira Da Cruz, and Bin Chi):

- [Earthdata Search](https://search.earthdata.nasa.gov/search?q=CIESIN%20ESDIS)
- [Open Spatial Demographic Data and Research](https://www.worldpop.org/)
- [Digimap](https://digimap.edina.ac.uk/)
- [Free GIS Datasets](https://freegisdata.rtwilson.com/)
- [AddressBase Premium](https://www.ordnancesurvey.co.uk/products/addressbase-premium)
- [Earth Engine Data Catalog](https://developers.google.com/earth-engine/datasets/)

### Final Reflection

Spatial analysis opens new perspectives on research questions across virtually every domain. The techniques you've learned provide a foundation, but the real power comes from applying spatial thinking to your specific research challenges.

**Remember**:

- Start simple and build complexity gradually
- Always question whether spatial patterns might be meaningful
- Consider how location and context affect your phenomena
- Share your spatial discoveries with others

The spatial perspective transforms how we understand our world, from local patterns in your research site to global processes affecting humanity. You now have the tools to explore these spatial relationships systematically and rigorously.