Imbens and Kolesar have a new NBER working paper, “Robust standard errors in small samples: Some practical advice”: gated link. They propose a new “default” choice for standard errors and confidence intervals to accompany OLS estimates of treatment effects, based on recommendations of Bell and McCaffrey (2002): ungated link. Their ideas are consistent with a previous blog post here, and they explicitly discuss some of the points that were raised in the comments: link (see especially the comments section). In fact, many of the points that Imbens and Kolesar make are anticipated in a recent paper by Lin, who discusses the performance of OLS-based methods in estimating treatment effects in randomized experiments: ungated link. (See also the discussion of the paper at the Development Impact Blog: link.)

The “robust inference for treatment effects” problem can be broken down into two components: (i) getting good standard errors for your point estimates of the treatment effect and (ii) relating these to a reasonable approximation of the sampling (or randomization) distribution of the point estimate to construct hypothesis tests or confidence intervals. Standard practice today for robust inference is captured by Stata’s default under the “, robust” option (or, for cluster-robust, the “, cluster(id)” option). For the non-clustered inference scenario, these defaults (i) use White’s heteroskedasticity-consistent covariance estimator to obtain the standard errors, and then (ii) approximate the sampling distribution of the point estimates using t(n-k), where n is the number of data points and k the number of regressors. While White’s estimator is consistent, in finite samples the bias may be pronounced, a point that White himself developed in MacKinnon and White (1985): ungated link. In that 1985 paper, MacKinnon and White introduced the leverage-adjusted HC2 estimator, which removes the finite-sample bias when the errors are homoskedastic and tends to reduce it otherwise. It is precisely this HC2 estimator that Imbens and Kolesar recommend as the default that people should use. Incidentally, Peter and I have a paper from last year that also demonstrates that HC2 provides a robust and conservative approximation to the exact randomization variance for unit-randomized experiments, and we also recommend its use: ungated link.
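To make the leverage adjustment concrete, here is a minimal sketch in Python with NumPy. It is an illustration under my own assumptions, not code from any of the papers: the function name ols_hc and the simulated data are made up for the example. The unadjusted White estimator weights each squared residual equally, while HC2 divides each squared residual by one minus that observation's leverage.

```python
import numpy as np

def ols_hc(X, y, kind="HC2"):
    """OLS coefficients with a heteroskedasticity-robust covariance matrix.

    kind="HC0" is the unadjusted White estimator; kind="HC2" divides each
    squared residual by (1 - h_ii), where h_ii is the leverage of row i.
    (Hypothetical helper written for this illustration.)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Leverages: diagonal of the hat matrix X (X'X)^{-1} X'
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    if kind == "HC0":
        w = resid ** 2
    elif kind == "HC2":
        w = resid ** 2 / (1.0 - h)
    else:
        raise ValueError("kind must be 'HC0' or 'HC2'")
    meat = X.T @ (X * w[:, None])
    cov = XtX_inv @ meat @ XtX_inv
    return beta, cov

# Simulated example (assumed data): few controls, many treated, unequal variances.
rng = np.random.default_rng(0)
n0, n1 = 5, 50
d = np.r_[np.zeros(n0), np.ones(n1)]
y = 1.0 + 2.0 * d + rng.normal(size=n0 + n1) * np.where(d == 1, 1.0, 3.0)
X = np.column_stack([np.ones_like(d), d])

beta, cov_hc2 = ols_hc(X, y, "HC2")
_, cov_hc0 = ols_hc(X, y, "HC0")

# With only a constant and a treatment dummy, the HC2 variance of the
# treatment coefficient equals s1^2/n1 + s0^2/n0 (sample variances, ddof=1).
neyman = y[d == 1].var(ddof=1) / n1 + y[d == 0].var(ddof=1) / n0
print("HC0 se:", np.sqrt(cov_hc0[1, 1]))
print("HC2 se:", np.sqrt(cov_hc2[1, 1]))
print("Neyman se:", np.sqrt(neyman))
```

In this constant-plus-dummy regression the HC2 variance coincides with the usual unequal-variance difference-in-means formula, which is one way to see why it behaves sensibly in unit-randomized experiments. The unadjusted estimator is smaller because it ignores the leverage of the few control observations.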

What about component (ii), an approximation of the sampling (or randomization) distribution? The t(n-k) approximation is from an analogy to the case of homoskedastic normal errors, where t(n-k) is in fact the exact sampling distribution. As discussed in the previous blog post (referenced above), the hope is that this gives an adequate amount of “tail fattening” to the reference distribution for the non-homoskedastic case. However, as Imbens and Kolesar demonstrate in a simple example, this approximation is clearly poor when the treatment variable is highly skewed. Consider a case where we have n1 treated, with n1 very large, and n0 control, with n0 very small. We estimate the treatment effect by regressing outcomes on a constant and treatment dummy, and so the treatment effect estimate is equivalent to taking the difference in treated and control means. Then, the treated mean will be highly precise, with a sampling/randomization variance of about 0. The estimate of the control mean will be very imprecise. The sampling/randomization variance of the treatment effect estimate will be driven almost exclusively by the variability in the control mean. The appropriate degrees of freedom for the approximate sampling/randomization distribution is probably closer to n0 - 1 than to n - 2 = n1 + n0 - 2. This could make a big difference. This is an old problem, and a classical way to deal with it is to use Welch’s approximation to the degrees of freedom for the sampling distribution (link). Welch’s approximation addresses the problems that arise due to skew. (Lin studies Welch-based approximations in the simulations in the paper linked above, and he also explained them in the comments to the blog post linked above.) The problem with Welch’s degrees-of-freedom approximation is that it relies on estimates of the conditional error variances, which can be quite imprecise in finite samples. This is where Bell and McCaffrey come in.
They propose to use a Welch approximation to the degrees of freedom that assumes homoskedasticity, allowing one to avoid having to plug in estimates of the conditional error variances. Of course the homoskedasticity assumption is typically false, but using it for the degrees of freedom approximation at least partially handles the skew problem without introducing new volatility problems. It stands as a reasonable compromise. Simulation studies in the Bell and McCaffrey paper as well as in the Imbens and Kolesar paper show that it performs well (even outperforming bootstrap methods, such as the wild-t bootstrap, at least as the latter is conventionally applied).
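For the simple treated-versus-control case, the difference between the two degrees-of-freedom approximations can be sketched in a few lines of Python. This is my own illustration, assuming the standard Welch-Satterthwaite formula. The function names and the group sizes (3 controls, 27 treated) are chosen for the example, and the homoskedastic version simply plugs equal variances into the Welch formula.

```python
import numpy as np

def welch_df(s2_0, n0, s2_1, n1):
    """Welch-Satterthwaite degrees of freedom for a difference in means,
    given group variances s2_0, s2_1 and group sizes n0, n1."""
    num = (s2_0 / n0 + s2_1 / n1) ** 2
    den = (s2_0 / n0) ** 2 / (n0 - 1) + (s2_1 / n1) ** 2 / (n1 - 1)
    return num / den

def homoskedastic_welch_df(n0, n1):
    """Welch formula with equal variances plugged in, so nothing is
    estimated from the outcome data (the common variance cancels)."""
    return welch_df(1.0, n0, 1.0, n1)

n0, n1 = 3, 27                       # assumed sizes for the illustration
df_naive = n0 + n1 - 2               # the t(n-k) default
df_homo = homoskedastic_welch_df(n0, n1)

# Welch df with *estimated* variances moves around from sample to sample.
rng = np.random.default_rng(1)
df_welch_draws = np.array([
    welch_df(rng.normal(size=n0).var(ddof=1), n0,
             rng.normal(size=n1).var(ddof=1), n1)
    for _ in range(1000)
])

print("naive df:", df_naive)
print("homoskedastic Welch df:", df_homo)
print("estimated Welch df, min/max over draws:",
      df_welch_draws.min(), df_welch_draws.max())
```

With these group sizes the homoskedastic version lands much closer to n0 - 1 than to n - 2, and it is a fixed number that depends only on the design. The version with estimated variances varies across simulated samples even though the true variances are equal here.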

The two papers develop these ideas for more general regression scenarios, including the clustered inference scenario. As for practice, estimating HC2 is easy (it is already an option with the “, robust” command in Stata, and in R it should be simple to program). I don’t know if the Bell-McCaffrey degrees of freedom approximations are pre-canned, but the expressions are not so complicated.
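For a general (non-clustered) regression, my reading of the degrees-of-freedom expression is as follows: under homoskedastic normal errors the HC2 variance estimate for a linear combination c'b of the coefficients is a quadratic form in the errors, and the approximation matches its first two moments using the eigenvalues of the matrix of that quadratic form. The Python sketch below follows that reading. Treat it as an assumption-laden illustration and not as the papers' own code. The function name is hypothetical.

```python
import numpy as np

def hc2_homoskedastic_df(X, c):
    """Satterthwaite-type degrees of freedom for the HC2 variance of c'beta,
    computed as if the errors were homoskedastic.

    The HC2 variance estimate equals e' A e with A = M D M, where
    M = I - H is the residual-maker matrix and D is diagonal with
    entries a_i^2 / (1 - h_ii), a = X (X'X)^{-1} c.
    df = (sum of eigenvalues of A)^2 / (sum of squared eigenvalues of A).
    """
    X = np.asarray(X, dtype=float)
    c = np.asarray(c, dtype=float)
    n = X.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    H = X @ XtX_inv @ X.T
    M = np.eye(n) - H
    a = X @ XtX_inv @ c
    dvals = a ** 2 / (1.0 - np.diag(H))
    A = M @ (dvals[:, None] * M)
    # A is symmetric, so trace(A) and trace(A @ A) give the eigenvalue sums.
    return np.trace(A) ** 2 / np.trace(A @ A)

# Check against the two-group case: 3 controls, 27 treated (assumed sizes).
n0, n1 = 3, 27
d = np.r_[np.zeros(n0), np.ones(n1)]
X = np.column_stack([np.ones_like(d), d])
df_general = hc2_homoskedastic_df(X, c=[0.0, 1.0])

# Welch formula with equal variances, for comparison.
df_closed = (1 / n0 + 1 / n1) ** 2 / (
    1 / (n0 ** 2 * (n0 - 1)) + 1 / (n1 ** 2 * (n1 - 1))
)
print(df_general, df_closed)
```

In the constant-plus-dummy case the general expression collapses to the Welch formula with equal variances, which is a useful sanity check on any implementation. The function builds n-by-n matrices, so it is meant for the small samples where the correction matters.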