1. Overview

This project aims to explore a data set containing US and Canadian Fire Occurrences from 1986-2013. The analysis will mainly focus on attributes such as:

fire cause
fire area
location

to examine the distribution of fires based on their cause and areas. Following the examination of the distributions of the fires, I hope to discuss the trends that are discernible from the data.

2. Load Necessary Libraries

library(ncdf4)
library(sf)
library(ggplot2)
library(dplyr)
library(stars)
library(maps)

3. Load Fire data set and grab variables

importing the fire data set and opening the file

# fire data set path
fire_data_path <- "/Users/annekebrouwer/Documents/geog490/project/na10km_USCAN_1986-2013_ann_all.nc"

# Open file
nc <- nc_open(fire_data_path)

retrieving and defining variables from dataset

# Grabbing variables
all_area <- ncvar_get(nc, "all_area")
all_area_tot <- ncvar_get(nc, "all_area_tot")
all_npts <- ncvar_get(nc, "all_npts")
all_npts_tot <- ncvar_get(nc, "all_npts_tot")
hu_area <- ncvar_get(nc, "hu_area")
hu_area_tot <- ncvar_get(nc, "hu_area_tot")
hu_npts <- ncvar_get(nc, "hu_npts")
hu_npts_tot <- ncvar_get(nc, "hu_npts_tot")
lat <- ncvar_get(nc, "lat")
lon <- ncvar_get(nc, "lon")
lt_area <- ncvar_get(nc, "lt_area")
lt_area_tot <- ncvar_get(nc, "lt_area_tot")
lt_npts <- ncvar_get(nc, "lt_npts")
lt_npts_tot <- ncvar_get(nc, "lt_npts_tot")
time <- ncvar_get(nc, "time")
time_bnds <- ncvar_get(nc, "time_bnds")
unk_area <- ncvar_get(nc, "unk_area")
unk_area_tot <- ncvar_get(nc, "unk_area_tot")
unk_npts <- ncvar_get(nc, "unk_npts")
unk_npts_tot <- ncvar_get(nc, "unk_npts_tot")
x <- ncvar_get(nc, "x")
y <- ncvar_get(nc, "y")

4. Exploring the total number of fires and their cause

Let’s make a histogram showing the number of fires caused by lightning, fires caused by humans, and fires with an unknown cause.

First, we must create a data frame that contains the total number of fires per cause.

# Create a DataFrame containing the number of human, lighting, and unknown caused fires
fire_counts <- data.frame(
  cause = c("Human", "Lightning", "Unknown"),
  count = c(sum(hu_npts_tot, na.rm = TRUE), sum(lt_npts_tot, na.rm = TRUE), sum(unk_npts_tot, na.rm = TRUE))
)

Next, we define a color palette and plot the data on a histogram using ggplot.

# Define a custom color palette
custom_colors <- c("Human" = "tomato3", "Lightning" = "orange", "Unknown" = "indianred4")

# Create the plot with custom colors
ggplot(fire_counts, aes(x = cause, y = count, fill = cause)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = custom_colors) +
  labs(x = "Cause of Fire", y = "Number of Fires", title = "Histogram of Fires by Cause US and Canada 1986-2013") +
  theme_minimal()

According to this histogram, most fires were caused by humans. However, it is important to note that lightning caused over 400,000 fires between 1986-2013. Although it seems small in comparison to the human-caused fires bar that nears 1,500,000 fire occurrences, lightning fires make up a great deal of the total number of fire occurrences.

5. Observing the fire occurrence variation by year

To grasp a better understanding of the pattern of fire occurrence, it is important to look at the dates in which these fires took place. Let’s plot the number of fires per year for each fire cause between 1986-2013.

To start, we must convert the date format from days since 01-01-1900 to year.

# Convert days since Jan 1, 1900 to a date format
start_date <- as.Date("1900-01-01")
date <- start_date + time - 1  # Subtract 1 because days since start_date starts from 1

# Extract the year from the date
year <- as.numeric(format(date, "%Y"))

Then, I created separate data frames for each fire cause that each contain the number of fires per year for every year between 1986 and 2013.

# Sum the lightning-caused fires across all grid cells for each year
lt_npts_tot <- apply(lt_npts, 3, sum, na.rm = TRUE)

# Create a data frame with year and total number of lightning-caused fires
lt_data <- data.frame(year = time, lt_fires = lt_npts_tot)

# Sum the human-caused fires across all grid cells for each year
hu_npts_tot <- apply(hu_npts, 3, sum, na.rm = TRUE)

# Create a data frame with year and total number of human-caused fires
hu_data <- data.frame(year = time, hu_fires = hu_npts_tot)

# Sum the unknown-caused fires across all grid cells for each year
unk_npts_tot <- apply(unk_npts, 3, sum, na.rm = TRUE)

# Create a data frame with year and total number of human-caused fires
unk_data <- data.frame(year = time, unk_fires = unk_npts_tot)

I then merged the data frames so that I could plot them on the same plot.

# Merge hu_data, lt_data, and unk_data by year
merged_data <- merge(merge(hu_data, lt_data, by = "year", all = TRUE), unk_data, by = "year", all = TRUE)

# Extract the year from the date
merged_data$year <- as.numeric(format(date, "%Y"))

# Plot merged data with a legend
ggplot(merged_data, aes(x = year)) +
  geom_line(aes(y = hu_fires, color = "Human-caused Fires")) +
  geom_line(aes(y = lt_fires, color = "Lightning-caused Fires")) +
  geom_line(aes(y = unk_fires, color = "Unknown-caused Fires")) + 
  labs(x = "Year", y = "Number of Fires", title = "Fire Occurrence by Cause Over Time") +
  scale_color_manual(values = c("Human-caused Fires" = "tomato3", "Lightning-caused Fires" = "orange", "Unknown-caused Fires" = "indianred4"),
                     name = "Cause") +
  theme_minimal()

Finally, using the following code I was able to calculate the year in which each cause of fire had the most fire occurrences.

hu_max_year <- merged_data$year[which.max(merged_data$hu_fires)] = 2006
lt_max_year <- merged_data$year[which.max(merged_data$lt_fires)] = 2006
unk_max_year <- merged_data$year[which.max(merged_data$unk_fires)] = 2007

It appears that the years with the most fire occurrences in this time period were 2006 and 2007.

6. Mapping Fire Occurrences by cause (1986-2013)

Now, we can use the fire area data to visualize the spatial distribution of these fires within the time frame.

First, we need to create data sets containing fire area (separated by cause) and their locations.

# create area of fire data frames lt_area
# Filter out NA values from lat and lon, keeping the same rows

valid_indices <- !is.na(lat) & !is.na(lon)
lt_area_filtered <- lt_area[valid_indices]
lat_filtered <- lat[valid_indices]
lon_filtered <- lon[valid_indices]

# Create lt_area_df dataframe with time column included
lt_area_df <- data.frame(
  lt_area = lt_area_filtered,
  lat = lat_filtered,
  lon = lon_filtered,
  time = time
)

# Filter out rows with NA values in the lt_area and time variables
lt_area_df <- lt_area_df[!is.na(lt_area_df$lt_area) & !is.na(lt_area_df$time), ]

# Convert days since Jan 1, 1900 to a date format
lt_area_df$date <- as.Date("1900-01-01") + lt_area_df$time - 1
lt_area_df <- lt_area_df[lt_area_df$lt_area != 0, ]
lt_area_df$year <- as.integer(format(lt_area_df$date, "%Y"))

# create area of fire data frames hu_area
# Filter out NA values from lat and lon, keeping the same rows
valid_indices_h <- !is.na(lat) & !is.na(lon)
hu_area_filtered <- hu_area[valid_indices_h]
lat_filtered_h <- lat[valid_indices_h]
lon_filtered_h <- lon[valid_indices_h]

# Create lt_area_df dataframe
hu_area_df <- data.frame(
  hu_area = hu_area_filtered,
  lat = lat_filtered_h,
  lon = lon_filtered_h,
  time = time
)

hu_area_df <- hu_area_df[!is.na(hu_area_df$hu_area) & !is.na(hu_area_df$time), ]

# Convert days since Jan 1, 1900 to a date format
hu_area_df$date <- as.Date("1900-01-01") + hu_area_df$time - 1
hu_area_df <- hu_area_df[hu_area_df$hu_area != 0, ]
hu_area_df$year <- as.integer(format(hu_area_df$date, "%Y"))

# Filter unk_area to get non-zero and non-NA values
# Filter out NA values from lat and lon, keeping the same rows
valid_indices_u <- !is.na(lat) & !is.na(lon)
unk_area_filtered <- unk_area[valid_indices_u]
lat_filtered_u <- lat[valid_indices_u]
lon_filtered_u <- lon[valid_indices_u]

# Create lt_area_df dataframe
unk_area_df <- data.frame(
  unk_area = unk_area_filtered,
  lat = lat_filtered_u,
  lon = lon_filtered_u,
  time = time
)
unk_area_df <- unk_area_df[!is.na(unk_area_df$unk_area) & !is.na(unk_area_df$time), ]


# Convert days since Jan 1, 1900 to a date format
unk_area_df$date <- as.Date("1900-01-01") + unk_area_df$time - 1
unk_area_df <- unk_area_df[unk_area_df$unk_area != 0, ]
unk_area_df$year <- as.integer(format(unk_area_df$date, "%Y"))

Let’s load the shapefiles for plotting

# Load the US map including Alaska
us_sf <- st_as_sf(maps::map("world", region = "USA", plot = FALSE, fill = TRUE))
us_sf_states <- st_as_sf(
  map("state", 
      region = c("california", "nevada", "idaho", "montana", "washington", "oregon", "wyoming", "utah", "colorado", "north dakota", "south dakota", "nebraska", "kansas", "oklahoma", "texas", "minnesota", "iowa", "missouri", "arkansas", "louisiana", "wisconsin", "michigan", "illinois", "indiana", "kentucky", "tennessee", "mississippi", "alabama", "ohio", "west virginia", "virginia", "north carolina", "south carolina", "georgia", "florida", "pennsylvania", "new york", "vermont", "new hampshire", "maine", "massachusetts", "rhode island", "connecticut", "new jersey", "delaware", "maryland", "new mexico", "arizona"), 
      plot = FALSE, 
      fill = TRUE))


# Load the Canada map
canada_sf <- st_as_sf(maps::map("world", region = "Canada", plot = FALSE, fill = TRUE))

# Combine the US and Canada maps
us_canada_sf <- rbind(us_sf, canada_sf)

Now, let’s visualize where lightning fires took place between 1986-2013.

# Create a scatter plot of fire locations in US and CANADA lt_fires
ggplot() +
  geom_point(data = lt_area_df, aes(x = lon, y = lat), color = "orange", alpha = 0.1, size = 0.01) +
  geom_sf(data = us_canada_sf, fill = "transparent", color = "black") +
  geom_sf(data = us_sf_states, fill = "transparent", color = "black") +
  labs(x = "Longitude", y = "Latitude", title = "Lightning Fire Locations in North America (lt_area > 0)")+
  theme_minimal() +
  coord_sf(xlim = c(-180, -50), ylim = c(22, 85))

Next, let’s look at the distribution of human-caused fires.

# Create a scatter plot of fire locations in US and CANADA hu_fires
ggplot() +
  geom_point(data = hu_area_df, aes(x = lon, y = lat), color = "tomato3", alpha = 0.1, size = .01) +
  geom_sf(data = us_canada_sf, fill = "transparent", color = "black") +
  geom_sf(data = us_sf_states, fill = "transparent", color = "black") + 
  labs(x = "Longitude", y = "Latitude", title = "Human-caused Fire Locations in North America (hu_area > 0)")+
  theme_minimal() +
  coord_sf(xlim = c(-180, -50), ylim = c(22, 85))

Finally, the distribution of unknown-caused fires within the time frame.

# Create a scatter plot of fire locations in US and CANADA unk_fires
ggplot() +
  geom_point(data = unk_area_df, aes(x = lon, y = lat), color = "indianred4", alpha = 0.1, size = 0.01) +
  geom_sf(data = us_canada_sf, fill = "transparent", color = "black") +
  geom_sf(data = us_sf_states, fill = "transparent", color = "black") +
  labs(x = "Longitude", y = "Latitude", title = "Location of fires with an unknown cause in North America (unk_area > 0)")+
  theme_minimal() +
  coord_sf(xlim = c(-180, -50), ylim = c(22, 85))

The spatial distribution of fires, categorized by their causes, reveals patterns and trends. Lightning-induced fires exhibit a clustering, particularly prevalent in the Western and southeastern coastal regions of the United States, as well as across Canada. This clustering aligns with established meteorological patterns, as most lightning storms occur in tropical and subtropical regions—which encompasses the southern United States. Moreover, the combination of increased temperatures, drought conditions, forested areas, and specific wind patterns prevalent in western US states and Canada creates an environment conducive to the ignition and spread of fires, especially during the summer and early fall months.

Human-caused fires exhibit a less pronounced spatial pattern compared to lightning-induced fires. While human-caused fires show a relatively even distribution across the United States, with a higher density along coastal states, their occurrence in Canada appears less frequent. Interestingly, there were more human-caused fires in the central US than lightning fires, suggesting a potentially different underlying cause.

The distribution of human-caused fires may not be easily attributed to specific environmental variables, as these fires are often the result of accidental ignition that escalates out of control. Instead, the occurrence of these fires might be more closely linked to factors such as human activity and land use patterns.

Unknown-caused fires, on the other hand, display little discernible pattern in their distribution. There seems to be some clustering in North Carolina and South Carolina and other southeastern states, but otherwise the distribution is much more sparse than the others. It is challenging to attribute a specific cause or environmental factor to these fires, as their origins are unclear.

7. Mapping all fire causes by size

While mapping the distribution of fires by their cause shows us where lightning-fires and human-caused fires are potentially more prone to occur, we should delve into where the all fires were more prone to spreading. According to the paper linked at this site, https://www.researchgate.net/figure/Relationship-fire-duration-and-fire-size-Small-fires-were-defined-as-having-sizes_fig12_307833081#, “Small fires were defined as having sizes between 0-1,000 ha, medium fires between 1,000-10,000 ha, large fires between 10,000-50,000 ha, and very large fires as greater than 50,000 ha.”

Let’s map where large and very large size fires occurred. First, we need to create a data frame containing all fire locations and their area.

# create area of fire data frame all fires
# Filter out NA values from lat and lon, keeping the same rows
valid_indices_all <- !is.na(lat) & !is.na(lon)
all_area_filtered <- all_area[valid_indices_all]
lat_filtered_all <- lat[valid_indices_all]
lon_filtered_all <- lon[valid_indices_all]

all_area_df <- data.frame(
  all_area = all_area_filtered,
  lat = lat_filtered_all,
  lon = lon_filtered_all,
  time = time
)
all_area_df <- all_area_df[!is.na(all_area_df$all_area) & !is.na(all_area_df$time), ]

# Convert days since Jan 1, 1900 to a date format
all_area_df$date <- as.Date("1900-01-01") + 
all_area_df$time - 1

all_area_df <- all_area_df[all_area_df$all_area != 0, ]
all_area_df$year <- as.integer(format(all_area_df$date, "%Y"))

Let’s plot the locations of large-sized fires

# Filter the data frame
filtered_df_large <- all_area_df %>%
  filter(all_area > 10000 & all_area < 50000)

# Create the map
ggplot() +
  geom_sf(data = us_canada_sf, fill = "transparent", color = "black") +
  geom_sf(data = us_sf_states, fill = "transparent", color = "black") +
  geom_point(data = filtered_df_large, aes(x = lon, y = lat, size = all_area), color = "indianred2", alpha = 0.2, stroke = 1) +
  scale_size_continuous(range = c(1, 10), name = "Area (ha)", labels = scales::comma, limits = c(10000, 50000), breaks = c(10000, 20000, 30000, 40000, 50000)) +
  labs(x = "Longitude", y = "Latitude", title = "The distribution of large-sized fires (10,000 - 50,000 ha)") +
  theme_minimal() +
   coord_sf(xlim = c(-180, -50), ylim = c(22, 85))

Let’s plot the locations of very large-sized fires

# Filter the data frame
filtered_df_vlarge <- all_area_df %>%
  filter(all_area > 50000)

# Create the map
ggplot() +
  geom_sf(data = us_canada_sf, fill = "transparent", color = "black") +
  geom_sf(data = us_sf_states, fill = "transparent", color = "black") +
  geom_point(data = filtered_df_vlarge, aes(x = lon, y = lat, size = all_area), color = "indianred2", alpha = 0.2, stroke = 1) +
  scale_size_continuous(range = c(1, 10), name = "Area (ha)", labels = scales::comma, limits = c(50000, 500000),  breaks = c(50000, 200000, 350000, 500000)) +
  labs(x = "Longitude", y = "Latitude", title = "Distribution of Very large fires ( > 50,000 ha)") +
  theme_minimal() +
   coord_sf(xlim = c(-180, -50), ylim = c(22, 85))

Large-sized fires: It appears that large-sized fires clustered towards the Western portions of both Canada and the US. In addition to this, large-sized fires also seemed to occur frequently across the middle portion of Canada. These regions may have a higher density of dried vegetation or abundant vegetation that is more susceptible to ignition and rapid spread of fires. This clustering pattern could be indicative of specific land use practices, vegetation types, or environmental conditions that contribute to the increased occurrence of large-sized fires in these areas

Very Large-sized fires: The frequency of very large-sized fires appeared to be lower compared to large-sized fires throughout the observed time frame. Interestingly, very large-sized fires were more prevalent in Canada than in the United States. This disparity could potentially be attributed to a shift in climate conditions as one moves northward, with environmental factors in Canada possibly favoring the occurrence of very large-sized fires compared to the US. This observation underscores the influence of climate and environmental conditions on fire behavior and highlights the need for region-specific fire management strategies.

Exploring North American Fire Occurrences 1986-2013

Anneke Brouwer

2024-03-13