7 One Stop Modeling: Zelig

Zelig is a statistical software originally created by Kosuke Imai, Gary King, and Olivia Lau. It attampts to unify different models, tests, vizualizations, and other statistical activities into a single framework. R is not a piece of software, it is an open source programing language. As such, as it has adapted and grown, different people have created different packages. This is a strength as it keeps R flexible and on the cutting-edge of statistics, but it means that there is often inconsistency. There are variaous packages in R that you can use for different models; Zelig was created to keep the syntax consistent and make using different models as simple as changind a single command. More information on Zelig can be found on its .

In addition to running your typical statistical models, Zelig also replicates Clarify from STATA, integrates with multiple imputation methods (through the package Amelia) as well as matching methods (through the packages cem and MatchIt)

The generic syntax for statistical models in Zelig is as follows

zelig(y ~ x1 + x2 + x3, data = example.data, model = "model.type")

This syntax is very similar to most other models, but with the added arguement model = in order to specifiy what you’re running. If we wanted to replicate the model we estimated in Chapter 3, this is how it’d look.

As you can see, Zelig automatically prints the citation for the model your using. This can be very useful when preparing your references for a project, but very annoying when you’re running a bunch of models. Adding cite = FALSE (or cite = F) will stop this.

Zelig models can be easily exported to a table for publication-ready results, but the require an additional command, from_zelig_model(). We can compare the original logit from Chapter 3 to the one above to varify that they’re the same.2

Ch 3 model Zelig model
(Intercept) 0.685* 0.685*
(0.273) (0.273)
Class2nd −1.018*** −1.018***
(0.196) (0.196)
Class3rd −1.778*** −1.778***
(0.172) (0.172)
ClassCrew −0.858*** −0.858***
(0.157) (0.157)
SexFemale 2.420*** 2.420***
(0.140) (0.140)
AgeAdult −1.062*** −1.062***
(0.244) (0.244)
Num.Obs. 2201 2201
AIC 2222.1 2222.1
BIC 2256.2 2256.2
Log.Lik. −1105.031 −1105.031
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

As we can see, the output is identical, proving that either method will work. Now we can look at various different models by simply changing the model = parpameter. Here’s a compariason of a logit, and probit.

Logit Probit
(Intercept) 0.685* 0.367*
(0.273) (0.161)
Class2nd −1.018*** −0.630***
(0.196) (0.114)
Class3rd −1.778*** −1.027***
(0.172) (0.098)
ClassCrew −0.858*** −0.540***
(0.157) (0.094)
SexFemale 2.420*** 1.450***
(0.140) (0.080)
AgeAdult −1.062*** −0.580***
(0.244) (0.141)
Num.Obs. 2201 2201
AIC 2222.1 2224.6
BIC 2256.2 2258.8
Log.Lik. −1105.031 −1106.314
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

You can see all of the supported models, and their specific syntax in the .

7.1 Simulation and Counterfactuals

Zelig can take a model and generate simulated values. This can be particularly useful for plotting counterfactuals. We’ll use the mtcars dataset. This is a dataset of various car features for 32 models, from a Motor Trend magazine’s 1974 issue.

Let’s say we’re interested in the effect the that number of cylinders has on a car’s fuel effeciency. We’ll start by looking at the data.

To look at the effect of cylinders all else equal, we need to estimate a model. Miles per gallon is roughly countious and regularly distributed, so we’ll use OLS. In addition to the number of cylinders lets adjust for the weight, horsepower, and the number of gears a car has.

The table shows that adding a cylinder to the engine decreases the average miles per gallon by 0.8. Now we can simulate some values and visualize the relationship.

First we have to set the values of interest. Let’s see a 4 cylinder vs 8 cylinder car. This is done with the setx() and setx1() commands. We then simulate values with sim(), and finally plot the results with plot(). This can all be chained together with %>%.

We can see from the graphs that a car with 8 cylinders (in blue) is expected to have a lower mgp when compared to a car with 4 cylinders. That said, there is some overlap in the comparison.


  1. There are two options for converting Zelig’s output to a format that’s readable for modelsummary(). As seen above you can add a pipe (%>%) followed by from_zelig_model(). Alternatively, you can wrap the code inside of from_zelig_model(). For example from_zelig_model(zelig(Survived ~ Class + Sex + Age, data = titanic.expanded, model = "logit")). Both methods do the same thing, the difference is ultimately asthetic. I prefer using %>% as the commands read in the order that they are being executed, where as nesting commands requires you to start near the end of the line and read backwards.