Here are some notes I put together for my Quant II class on this topic: PDF. My personal takeaway is that OLS is fine for these kinds of applications, and it carries the benefits of simplicity, well-understood operating characteristics, and consistency when we have, e.g., lots of dummy variables to soak up “fixed effects.” Reasonable people may disagree, although with little consequence, I think.

I emphasize “aggregation bias” in these notes, something that is still not part of conventional training in statistics or econometrics but is of central concern in the contemporary causal inference literature. In the simulations here, the aggregation bias is not such a big deal. To see an example where it is a big deal, big enough to overwhelm omitted variable bias, see this simulation code: R code.

Acknowledging aggregation bias renders many conventional practices illegitimate for estimating causal effects. For example, the practice of interpreting MLE estimates by “setting all other variables to their mean” and then looking at predicted values doesn’t work. Rather, one needs to look at within-sample predictions that incorporate all of the heterogeneity present in the sample. The simulations in the notes demonstrate this.
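To make the point about “setting all other variables to their mean” concrete, here is a minimal sketch in Python (not the R code linked above). The logit coefficients and covariate values are made up for illustration and are not taken from the notes. It compares the treatment effect on the predicted probability evaluated at the mean covariate value with the average of the unit-level effects across the sample.

```python
import math

def expit(z):
    """Inverse logit: maps a linear index to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical logit coefficients (assumed for illustration, not estimated).
b0, b_treat, b_x = -2.0, 1.0, 1.5

# Hypothetical covariate values for a heterogeneous sample.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]

def effect_at(x):
    """Treated-minus-control predicted probability for a unit with covariate x."""
    return expit(b0 + b_treat + b_x * x) - expit(b0 + b_x * x)

# Conventional practice: plug in the sample mean of the covariate.
x_bar = sum(xs) / len(xs)
effect_at_mean = effect_at(x_bar)

# Within-sample approach: compute each unit's effect, then average.
avg_effect = sum(effect_at(x) for x in xs) / len(xs)

print(f"effect at mean covariate:  {effect_at_mean:.3f}")
print(f"average within-sample effect: {avg_effect:.3f}")
```

Because the logit link is nonlinear, the effect at the mean is not the mean of the effects. With these made-up numbers, the plug-in value is considerably larger than the within-sample average, since most units sit in the flat tails of the curve where treatment moves the probability very little.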

## 2 Replies to “Treatment effects with binary outcomes”

Great notes, Cyrus. And logit could be more biased than OLS in some real-world scenarios. Good to see more emphasis on interactions and flexible specification of the covariates. (It might be helpful to point out that with a saturated model, the appropriate OLS and logit-derived estimators are equivalent.)

Thanks for the comment, Winston. I missed that fact about the saturated model. I’ve added it to the notes now.
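The saturated-model point raised in the comments can be checked numerically. The sketch below is only an illustration under assumed data: 10 control units with 3 successes and 10 treated units with 6 successes, with the treatment dummy as the only regressor, so the model is saturated. It fits the logit by a hand-written Newton-Raphson loop (a stand-in for whatever routine one would normally use) and compares the logit-derived difference in predicted probabilities with the OLS slope.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical data: d is the treatment dummy, y the binary outcome.
d = [0] * 10 + [1] * 10
y = [1] * 3 + [0] * 7 + [1] * 6 + [0] * 4

# OLS slope of y on d (with intercept): cov(d, y) / var(d).
d_bar = sum(d) / len(d)
y_bar = sum(y) / len(y)
ols_effect = (
    sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    / sum((di - d_bar) ** 2 for di in d)
)

# Logit MLE for y ~ 1 + d via Newton-Raphson on the two parameters.
b0, b1 = 0.0, 0.0
for _ in range(25):
    p = [expit(b0 + b1 * di) for di in d]
    w = [pi * (1.0 - pi) for pi in p]
    # Score (gradient of the log-likelihood).
    g0 = sum(yi - pi for yi, pi in zip(y, p))
    g1 = sum(di * (yi - pi) for di, yi, pi in zip(d, y, p))
    # Information matrix entries (note d*d == d for a dummy).
    h00 = sum(w)
    h01 = sum(di * wi for di, wi in zip(d, w))
    h11 = h01
    det = h00 * h11 - h01 * h01
    # Newton step: solve the 2x2 system by hand.
    b0 += (h11 * g0 - h01 * g1) / det
    b1 += (-h01 * g0 + h00 * g1) / det

# Logit-derived effect: difference in predicted probabilities.
logit_effect = expit(b0 + b1) - expit(b0)

print(f"OLS effect:   {ols_effect:.6f}")
print(f"logit effect: {logit_effect:.6f}")
```

In a saturated model both estimators reproduce the cell proportions, so the two effects agree up to numerical precision. The logit coefficient `b1` itself is on the log-odds scale and is not equal to the OLS slope; the equivalence is between the OLS slope and the difference in logit-predicted probabilities.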