6 Merging and Wrangling

6.1 base R

merged_data <- merge(data1, data2, by = c("time", "space"), all = TRUE)
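As a minimal sketch of what all = TRUE changes, assume two small toy data frames keyed by time and space. The values and the extra columns x and y are made up for illustration and are not from any real dataset.

```r
# Hypothetical toy data; values are made up for illustration.
data1 <- data.frame(time = c(1, 2, 3), space = c("a", "a", "b"), x = c(10, 20, 30))
data2 <- data.frame(time = c(2, 3, 4), space = c("a", "b", "b"), y = c(0.2, 0.3, 0.4))

# all = TRUE keeps unmatched rows from both data frames (a full outer join);
# columns from the missing side are filled with NA.
merged_data <- merge(data1, data2, by = c("time", "space"), all = TRUE)

# The default, all = FALSE, keeps only rows whose keys appear in both.
matched_only <- merge(data1, data2, by = c("time", "space"))

# all.x = TRUE keeps every row of data1, matched or not.
left_only <- merge(data1, data2, by = c("time", "space"), all.x = TRUE)

nrow(merged_data)   # 4
nrow(matched_only)  # 2
nrow(left_only)     # 3
```

Only the keys (2, "a") and (3, "b") appear in both data frames, which is why the default merge returns two rows while all = TRUE returns four.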

6.2 dplyr

library(dplyr)
joined_data <- inner_join(data1, data2, by = c("time" = "time", "space" = "space"))
joined_data <- left_join(data1, data2, by = c("time" = "time", "space" = "space"))
joined_data <- full_join(data1, data2, by = c("time" = "time", "space" = "space"))
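To see how the three joins differ, here is a small sketch on toy data. The data frames and their values are hypothetical. When the key columns have the same names in both data frames, a plain character vector is enough for by.

```r
library(dplyr)

# Hypothetical toy data; values are made up for illustration.
data1 <- data.frame(time = c(1, 2, 3), space = c("a", "a", "b"), x = c(10, 20, 30))
data2 <- data.frame(time = c(2, 3, 4), space = c("a", "b", "b"), y = c(0.2, 0.3, 0.4))

keys <- c("time", "space")

inner <- inner_join(data1, data2, by = keys)  # rows with keys in both
left  <- left_join(data1, data2, by = keys)   # every row of data1
full  <- full_join(data1, data2, by = keys)   # every row of either

nrow(inner)  # 2
nrow(left)   # 3
nrow(full)   # 4
```

The named form, by = c("time" = "time"), matters mostly when the key columns are named differently in the two data frames, for example by = c("year" = "time").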

6.3 merge.stats

If you’re used to merging in Stata, you’ll probably miss the _merge column, which nicely summarizes how each observation merged (or didn’t). To replicate this, I created the merge.stats package. This package is currently in development, but it can be installed from GitHub and tried out by running

This package has two functions, merge_stats() and join_stats(). Both add a new column, merge, to the merged data frame and print statistics, such as how many observations from each data frame did and did not successfully merge. merge_stats() is built on top of the base R merge() function and takes all of the same parameters. In addition, you can specify show.stats = TRUE to print the statistics of the merge, or show.stats = FALSE if you want to cut down on how much is printed to the console. join_stats() is built on top of the various _join() functions from dplyr. This function has two additional arguments: show_stats, which controls whether the statistics of the join are printed, and join, which specifies whether the join is "inner", "right", "left", "full", "semi", or "anti".
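For intuition about what such a merge indicator records, the idea can be sketched by hand in base R. This is not the merge.stats implementation, only an illustration under stated assumptions: the toy data, the helper columns in_data1 and in_data2, the column name merge_status, and its labels are all hypothetical.

```r
# Hypothetical toy data; values are made up for illustration.
data1 <- data.frame(time = c(1, 2, 3), space = c("a", "a", "b"), x = c(10, 20, 30))
data2 <- data.frame(time = c(2, 3, 4), space = c("a", "b", "b"), y = c(0.2, 0.3, 0.4))

# Flag each source before merging so its rows can be traced afterwards.
data1$in_data1 <- TRUE
data2$in_data2 <- TRUE

merged <- merge(data1, data2, by = c("time", "space"), all = TRUE)

# After a full merge, a flag is NA wherever that source had no matching row.
in1 <- !is.na(merged$in_data1)
in2 <- !is.na(merged$in_data2)

# Hypothetical indicator column, similar in spirit to Stata's _merge.
merged$merge_status <- ifelse(in1 & in2, "both",
                              ifelse(in1, "data1 only", "data2 only"))

# Drop the helper flags now that the indicator exists.
merged$in_data1 <- NULL
merged$in_data2 <- NULL

# Summary of how the observations merged.
table(merged$merge_status)
```

With these toy inputs, two rows are marked "both", one "data1 only", and one "data2 only".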

6.4 Into the tidyverse

6.4.1 filter()

6.4.2 mutate()

6.4.3 group_by()

6.4.4 select()

6.4.5 %>%

6.4.6 Stringing it Together