6 Merging and Wrangling
6.1 base R
merged_data <- merge(data1, data2, by = c("time", "space"), all = TRUE)
6.2 dplyr
joined_data <- inner_join(data1, data2, by ("time" = "time", "space" = "space"))
joined_data <- left_join(data1, data2, by ("time" = "time", "space" = "space"))
joined_data <- full_join(data1, data2, by ("time" = "time", "space" = "space"))
6.3 merge.stats
If you’re used to merging in STATA, you’ll probably miss the _merge
column, which nicely summarizes how year observation merged (or didn’t).
To replicate this, I created the merge.stats
package. This package is currently in devlopment, but it can be installed from GitHub and tried out by running
This package has two commands, merge_stats()
and join_stats()
. Both packages add a new column, merge
to the merged dataframe, as well as printing statistics, such
as how many observations from each dataframes did and did not successfully merge. merge_stats()
is build on top of the base R merge()
function and takes all of the
same parameters. In addition, you can specify show.stats = TRUE
to print the statics of the merge, or show.stats = FALSE
if you want to cut down on how
much is being printed to the console. merge_join
is built on top of the various _join()
functions from dyplr. This function has two additional arguements,
show_stats =
which says whether to print the statistics of the join, and join =
which specifies wither the joint is "inner"
, "right"
, "left"
, "full"
, "semi"
, or "anti"
.