Mapping Walmart Growth Across the US using R

Have you seen the Walmart growth maps video at flowingdata? And, did you wish that you could create similar animated movies using R? Well, you came to the right place. In this post, you will learn how to create point maps at the zip code level and then animate them to make a movie using your favorite libraries in R such as ggplot2 and dplyr. The bonus, of course, is to recreate the Walmart growth maps in R.

Let’s get started then.

Loading Libraries

You can see from the following commands that I owe a tremendous amount of gratitude to Hadley Wickham, for he has made many of the programming tasks easier with his R packages. ggplot2 has to be one of the best plotting packages and with dplyr manipulating and aggregating data has become less tedious and more enjoyable. He also has created packages for string manipulation (stringr) and date manipulation (lubridate). We need all of them.

library(ggplot2) #for plotting, of course
library(ggmap) #to get the US map
library(dplyr) #for data manipulation like a ninja 
library(readr) #to read the data in csv
library(lubridate) #to play with dates
library(scales) #for number labeling
library(stringr) #to play with strings
library(Cairo) #for anti-aliasing images on Windows
library(zipcode) #to clean up and geocode zipcodes

Gathering, Cleaning-up, Summarizing the Data

Let’s load up the zip code data using the following command:

data(zipcode)

Next, use the fantastic ggmap library to get the US map. Here, I’m using the toner-lite type of map from stamen maps, a great resource for fantastic maps.

us_map <- get_stamenmap(c(left = -125, bottom = 25.75, right = -67, top = 49), zoom = 5, maptype = "toner-lite") #you can change the map type to get something more colorful

Get the store openings data along with the dates and addresses.

walmart_op_data <- read_csv("http://users.econ.umn.edu/~holmes/data/WalMart/store_openings.csv")

A couple of clean up-items: convert the date column to the data format, create opening year and month columns. I also created another date column for looping through and creating maps for each month. You may ask, “Why did you not use just the opening date?” Good question. I am not using the opening date because after creating a map, I am saving the map using the same value for the filename. We are using ffmpeg here to create movie out of images, and ffmpeg likes sequential numbering of files. For a Mac OS, the file name is irrelevant, because we can use the glob argument of ffmpeg.

walmart_op_data <- walmart_op_data %>% 
                    mutate(OPENDATE = as.Date(OPENDATE, '%m/%d/%Y'), OpYear = year(OPENDATE), OpMonth = month(OPENDATE),
                           OpDate = as.Date(paste(OpMonth, 1, OpYear, sep = "/"), '%m/%d/%Y'), ZIPCODE = clean.zipcodes(ZIPCODE))

Now, calculate monthly openings and cumulative openings by month and year.

wm_op_data_smry <- walmart_op_data %>% 
                    count(OpYear, OpMonth) %>% ungroup() %>%
                    arrange(OpYear, OpMonth) %>%
                    mutate(cumm_n = cumsum(n))

Join the data with zipcode to get the latitude and longitude for each zipcode

walmart_op_data <- left_join(walmart_op_data, select(zipcode, zip, latitude, longitude), by = c("ZIPCODE" = "zip"))

Get all the zipcodes for each month’s Walmart opening.

wm_op_data_smry <- inner_join(wm_op_data_smry, select(walmart_op_data, ZIPCODE, latitude, longitude, OpYear, OpMonth), 
                              by = c("OpYear" = "OpYear", "OpMonth" = "OpMonth"))

Creating maps

The fun part: to actually see the data on the US map. Since we have to create a map for each month, we are using a function here for repetition and ease of use. I’ve explained the code using R comments.

#this function has three arguments. 
#' df: a dataframe used for plotting
#' plotdate: date used for splitting the data frame into before and after
#' mapid: a number for naming the final map file
my_zip_plot <- function(df, plotdate, mapid){
  # create the background map. using the darken argument to make the map filled with black color.
  g <- ggmap(us_map, darken = c("0.8", "black")) 
  # split the data frame for all Walmarts before a plot date i.e. a month
  old_df <- filter(df, OpDate < plotdate)
  # split the data frame for all Walmarts for the plot date i.e. during a month
  new_df <- filter(df, OpDate == plotdate)
  # plot all the Walmarts before the current opening month. Make all the older store locations as shown in circles smaller
  g <- g + geom_point(data = old_df, aes(x = longitude, y = latitude), size = 5, color = "dodgerblue", alpha = 0.4)
  #plot all the Walmarts during the current opening month. Make all the newer store locations as shown in circles bigger to get the "pop" effect
  g <- g + geom_point(data = new_df, aes(x = longitude, y = latitude), size = 8, color = "dodgerblue", alpha = 0.4)
  # remove axis marks, labels, and titles
  g <- g + theme(axis.ticks = element_blank(), axis.title = element_blank(), axis.text = element_blank(), plot.title = element_blank())  
  # place the label for year 
  g <- g + annotate("text", x = -75, y = 34, label = "YEAR:", color = "white", size = rel(5), hjust = 0)
  # place the value of for year 
  g <- g + annotate("text", x = -75, y = 33, label = unique(new_df$OpYear), color = "white", size = rel(6), fontface = 2, hjust = 0)
  # place the label for stores opened  
  g <- g + annotate("text", x = -75, y = 32, label = "STORE COUNT:", color = "white", size = rel(5), hjust = 0)
  # place cumulative store openings
  g <- g + annotate("text", x = -75, y = 31, label = comma(unique(new_df$cumm_n)), color = "white", size = rel(6), fontface = 2, hjust = 0)
  # generate the file name for the map. Using str_pad to make the filename same length and prefixed with zeroes. 
  # create a maps directory inside the directory of this script.
  filename <- paste0("maps/img_" , str_pad(mapid, 7, pad = "0"),  ".png")
  # save the map as a png using Cairo for anti-aliasing on Windows.
  ggsave(filename = filename, plot = g, width = 13, height = 7, dpi = 150, type = "cairo-png")
}

A note about color: I played with a few tools such as paletton to find good contrasting colors for the individual dots. After spending crazy amounts of time just trying different combinations, I just picked one of the blue shades from R named colors.

Loop through the data frame for each month and create a map for that month. Warning: these commands will take about 5 minutes and create a lot of maps. Make sure that you created a directory named maps inside the directory of this file and that you have enough space to store all the maps.

wm_op_data_smry %>%  
  mutate(mapid = group_indices_(wm_op_data_smry, .dots = 'OpDate')) %>% # created a group id for each group defined using OpDate i.e. a month
  group_by(OpDate) %>% 
# note when you are using user-defined functions in dplyr, you have to use do
  do(pl = my_zip_plot(wm_op_data_smry, unique(.$OpDate), unique(.$mapid))) ## pass the summary data frame, the date, and map number to the function

Creating the Walmart Growth Movie

This is the fun part in which we put all the images together to make a “motion picture.” For this to work, you will need ffmpeg installed on your computer. On Windows, for ease of use, install it in the C drive.

# prepare the command for execution
# the framerate argument controls how many frames per second we want to see. Increase that number for a faster transition between months.
# since we used 7 digits as the fixed length of filenames, %7d pattern will match those filenames.
makemovie_cmd <- paste0("C:/ffmpeg/bin/ffmpeg -framerate 5 -y -i ", paste0(getwd(), "/maps/img_%7d.png"),  " -c:v libx264 -pix_fmt yuv420p ",  paste0(getwd(), "/maps/"), "movie.mp4")
# for mac os, you can use the glob argument and obviate the need for sequential numbering of files.
#makemovie_cmd <- paste0("ffmpeg -framerate 5 -y -pattern_type glob -i '", paste0(getwd(), "/maps/"), "*.png'", " -c:v libx264 -pix_fmt yuv420p '", paste0(getwd(), "/maps/"), "movie.mp4'")
system(makemovie_cmd)  # the system command will execute the ffmpeg command and create the final movie.

Play!

What do you think? I like this approach as it offers maximum flexibility of ggplot, you can adjust the sizes of new points, and you adjust the frame rate. Another advantage: you can relatively change the data source and use it for any other mapping purposes.

Complete Code

library(ggplot2)
library(ggmap)
library(dplyr)
library(readr)
library(lubridate)
library(scales)
library(stringr)
library(Cairo)
library(zipcode)
data(zipcode)
 
us_map <- get_stamenmap(c(left = -125, bottom = 25.75, right = -67, top = 49), zoom = 5, maptype = "toner-lite")
 
 
walmart_op_data <- read_csv("http://users.econ.umn.edu/~holmes/data/WalMart/store_openings.csv")
walmart_op_data <- walmart_op_data %>% 
                    mutate(OPENDATE = as.Date(OPENDATE, '%m/%d/%Y'), OpYear = year(OPENDATE), OpMonth = month(OPENDATE),
                           OpDate = as.Date(paste(OpMonth, 1, OpYear, sep = "/"), '%m/%d/%Y'), ZIPCODE = clean.zipcodes(ZIPCODE))
 
wm_op_data_smry <- walmart_op_data %>% 
                    count(OpYear, OpMonth) %>% ungroup() %>%
                    arrange(OpYear, OpMonth) %>%
                    mutate(cumm_n = cumsum(n))
 
walmart_op_data <- left_join(walmart_op_data, select(zipcode, zip, latitude, longitude), by = c("ZIPCODE" = "zip"))
 
wm_op_data_smry <- inner_join(wm_op_data_smry, select(walmart_op_data, ZIPCODE, latitude, longitude, OpYear, OpMonth), 
                              by = c("OpYear" = "OpYear", "OpMonth" = "OpMonth"))
 
 
 
 
my_zip_plot <- function(df, plotdate, mapid){
  g <- ggmap(us_map, darken = c("0.8", "black")) 
  old_df <- filter(df, OpDate < plotdate)
  new_df <- filter(df, OpDate == plotdate)
  g <- g + geom_point(data = old_df, aes(x = longitude, y = latitude), size = 5, color = "dodgerblue", alpha = 0.4)
  g <- g + geom_point(data = new_df, aes(x = longitude, y = latitude), size = 8, color = "dodgerblue", alpha = 0.4)
  g <- g + theme(axis.ticks = element_blank(), axis.title = element_blank(), axis.text = element_blank(), plot.title = element_blank())  
  g <- g + annotate("text", x = -75, y = 34, label = "YEAR:", color = "white", size = rel(5), hjust = 0)
  g <- g + annotate("text", x = -75, y = 33, label = unique(new_df$OpYear), color = "white", size = rel(6), fontface = 2, hjust = 0)
  g <- g + annotate("text", x = -75, y = 32, label = "STORE COUNT:", color = "white", size = rel(5), hjust = 0)
  g <- g + annotate("text", x = -75, y = 31, label = comma(unique(new_df$cumm_n)), color = "white", size = rel(6), fontface = 2, hjust = 0)
  filename <- paste0("maps/img_" , str_pad(mapid, 7, pad = "0"),  ".png")
  ggsave(filename = filename, plot = g, width = 13, height = 7, dpi = 150, type = "cairo-png")
}
 
wm_op_data_smry %>%  
  mutate(mapid = group_indices_(wm_op_data_smry, .dots = 'OpDate')) %>% 
  group_by(OpDate) %>% 
  do(pl = my_zip_plot(wm_op_data_smry, unique(.$OpDate), unique(.$mapid)))
 
makemovie_cmd <- paste0("C:/ffmpeg/ffmpeg/bin/ffmpeg -framerate 5 -y -i ", paste0(getwd(), "/maps/img_%7d.png"),  " -c:v libx264 -pix_fmt yuv420p ",  paste0(getwd(), "/maps/"), "movie.mp4")
#works on mac
#makemovie_cmd <- paste0("ffmpeg -framerate 5 -y -pattern_type glob -i '", paste0(getwd(), "/maps/"), "*.png'", " -c:v libx264 -pix_fmt yuv420p '", paste0(getwd(), "/maps/"), "movie.mp4'")
system(makemovie_cmd)

About the Author

A co-author of Data Science for Fundraising, an award winning keynote speaker, Ashutosh R. Nandeshwar is one of the few analytics professionals in the higher education industry who has developed analytical solutions for all stages of the student life cycle (from recruitment to giving). He enjoys speaking about the power of data, as well as ranting about data professionals who chase after “interesting” things. He earned his PhD/MS from West Virginia University and his BEng from Nagpur University, all in industrial engineering. Currently, he is leading the data science, reporting, and prospect development efforts at the University of Southern California.

  • TaoYan says:

    Hi, Nandeshwar!
    This is a wonderful tutorial. But when I run the code on my R software, there is error:
    the code:
    wm_op_data_smry %>%
    mutate(mapid = group_indices_(wm_op_data_smry, .dots = “OpMonth”)) %>%
    group_by(OpMonth) %>%
    do(pl = my_zip_plot(wm_op_data_smry, unique(.$OpMonth), unique(.$mapid)))
    the error:
    Error in filter_impl(.data, dots) : object ‘OpDate’ not found.
    I don’t know where it is wrong,could you please pick it out for me.Thank you!Best wishes!

  • […] Mapping Walmart Growth Across the US using R […]

  • […] R, I was able to show the growth of Walmart in the US. Graphs, check. Animation, […]

  • >