Suppose you are studying the effects of some policy adopted at the state level in the United States, and you are using data from all 50 states to do it. Well, consider this:

When a researcher estimates a regression function with state level data, why are there standard errors that differ from zero? Clearly the researcher has information on the entire population of states. Nevertheless researchers typically report conventional robust standard errors, formally justified by viewing the sample as a random sample from a large population. In this paper we investigate the justification for positive standard errors in cases where the researcher estimates regression functions with data from the entire population. We take the perspective that the regression function is intended to capture causal effects, and that standard errors can be justified using a generalization of randomization inference. We show that these randomization-based standard errors in some cases agree with the conventional robust standard errors, and in other cases are smaller than the conventional ones.

From a new working paper, “Finite Population Causal Standard Errors,” by the econometrics all-star team of Abadie, Athey, Imbens, and Wooldridge (updated link).

I have been to a few presentations of papers like this where someone in the audience thinks they are making a smart comment by noting that the paper uses population data, and so the frequentist standard errors “don’t really make sense.” Abadie et al. show that such comments are often misguided, arising from a confusion over how causal inference differs from descriptive inference. Sure, there is no uncertainty about the value of the regression coefficient for this population given the realized outcomes. But the value of the regression coefficient is not the same as the causal effect.

To understand the difference, it helps to define causal effects precisely. A causal effect for a given *unit* in the population is most coherently defined as a comparison between the outcome observed under a given treatment (the “state level policy” in the example above) and what would obtain were that same unit to be given another treatment. It is useful to imagine this schedule of treatment-value-specific outcomes as an array of “potential outcomes.” *Population average* causal effects take the average of the unit-level causal effects in a given population.

Now, suppose that there is some random (at least with respect to what the analyst can observe) process through which units in the population are assigned treatment values. Maybe this random process occurred because a bona fide randomized experiment was run on the population, or maybe it was the result of “natural” stochastic processes (that is, not controlled by the analyst). Then, for each unit we only get to observe the potential outcome associated with the treatment *received*, and not the other “counterfactual” potential outcomes associated with the other possible treatments.

As such, we cannot actually construct the population average causal effect directly. Doing so would require that we were able to compute each of the unit-level causal effects. So, we have to *estimate* the population average causal effect using the *incomplete* potential outcomes data available to us. If the random treatment assignment process had turned out differently, the estimate we would obtain could very well differ as well (since there would be a different set of observed and unobserved potential outcomes). Even though we have data from everyone in the population, we are lacking the full schedule of potential outcomes that would allow us to compute causal effects without uncertainty.
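To make this concrete, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration: the potential outcomes `y0` and `y1` are simulated, treatment is assigned completely at random to 25 of 50 units, and the estimator is a simple difference in means. The population never changes, yet the estimate moves every time the assignment is redrawn.

```python
import numpy as np

rng = np.random.default_rng(0)

# A fixed, fully enumerated population (think: all 50 states).
# Both potential outcomes are simulated here; in real data only one is seen.
N = 50
y0 = rng.normal(10.0, 2.0, size=N)       # hypothetical outcomes without the policy
y1 = y0 + rng.normal(1.0, 1.0, size=N)   # hypothetical outcomes with the policy
true_ate = np.mean(y1 - y0)              # fixed for this population


def diff_in_means(y0, y1, n_treated, rng):
    """Assign treatment completely at random, then estimate the average effect."""
    n = len(y0)
    treated = np.zeros(n, dtype=bool)
    treated[rng.choice(n, size=n_treated, replace=False)] = True
    observed = np.where(treated, y1, y0)  # one potential outcome revealed per unit
    return observed[treated].mean() - observed[~treated].mean()


# Redraw the assignment many times while holding the population fixed.
estimates = np.array([diff_in_means(y0, y1, 25, rng) for _ in range(5000)])

print(f"true average effect:        {true_ate:.3f}")
print(f"mean of the estimates:      {estimates.mean():.3f}")
print(f"std. dev. of the estimates: {estimates.std():.3f}")
```

The spread of `estimates` is the uncertainty that a standard error is supposed to describe here. It comes entirely from the assignment process, not from sampling units out of a larger population.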

As it turns out, the random treatment assignment process is directly analogous to the *random sampling of potential outcomes*, which means we can use standard sampling-theoretic results to quantify our uncertainty and compute standard errors for our effect estimates. Furthermore, as a happy coincidence, such sampling-theoretic standard errors are algebraically equivalent to, or conservatively approximated by, the “robust” standard errors that are common in current statistical practice. (This previous sentence was revised so that its meaning is now clearer.) It’s a point that Peter Aronow and I made in our 2012 paper on using regression standard errors for randomized experiments (link), and a point that Winston Lin develops even further (link; also see Berk’s links in the comments below to Winston’s great discussion of these results). Abadie et al. take this all one step further by indicating that this framework for inference makes sense for observational studies too.
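Here is a rough sketch of that correspondence in the simplest case: a two-arm comparison with no covariates. The simulated population and the function names `neyman_se` and `hc2_se` are my own illustrative assumptions, and I use the HC2 variant of the robust estimator, computed by hand. This is not code from any of the papers mentioned.

```python
import numpy as np


def neyman_se(y, d):
    """Conservative sampling-theoretic SE for the difference in means."""
    y_t, y_c = y[d == 1], y[d == 0]
    return np.sqrt(y_t.var(ddof=1) / len(y_t) + y_c.var(ddof=1) / len(y_c))


def hc2_se(y, d):
    """HC2 robust SE for the slope in an OLS regression of y on [1, d]."""
    X = np.column_stack([np.ones_like(y), d])
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ X.T @ y)
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # diagonal of the hat matrix
    meat = X.T @ (X * (resid**2 / (1.0 - leverage))[:, None])
    cov = XtX_inv @ meat @ XtX_inv
    return np.sqrt(cov[1, 1])


rng = np.random.default_rng(1)
N, n_treated = 50, 25
y0 = rng.normal(10.0, 2.0, size=N)       # simulated potential outcomes (assumed)
y1 = y0 + rng.normal(1.0, 2.0, size=N)   # heterogeneous unit-level effects


def draw_assignment(rng):
    """Complete random assignment of n_treated units to treatment."""
    d = np.zeros(N, dtype=int)
    d[rng.choice(N, size=n_treated, replace=False)] = 1
    return d


estimates, ses = [], []
for _ in range(5000):
    d = draw_assignment(rng)
    y = np.where(d == 1, y1, y0)         # observed outcomes under this assignment
    estimates.append(y[d == 1].mean() - y[d == 0].mean())
    ses.append(neyman_se(y, d))

# For any single assignment, the two standard errors coincide.
print(f"Neyman SE: {neyman_se(y, d):.4f}   HC2 SE: {hc2_se(y, d):.4f}")
# Across assignments, the estimated variance is conservative on average.
print(f"true randomization variance:   {np.var(estimates):.4f}")
print(f"average estimated variance:    {np.mean(np.square(ses)):.4f}")
```

The first printed line shows the algebraic equivalence. The last two lines show the conservative part: when unit-level effects vary, the estimated variance overshoots the true randomization variance on average.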

Now, some of you might know of Rosenbaum’s work (e.g., this) and think this has all already been said. That’s true, to a point. But whereas Rosenbaum’s randomization inference makes use of permutation distributions for making probabilistic statements about specific causal hypotheses, Abadie et al.’s randomization inference allows one to approximate the randomization distribution of effect estimates without fixing causal hypotheses a priori. (See more on this point in this old blog post, especially in the comments: link.)
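The contrast can be sketched in a few lines. What follows is a toy illustration under assumed, simulated data. It is neither Rosenbaum’s full methodology nor Abadie et al.’s estimator. The first calculation fixes a sharp null hypothesis (no effect for any unit) and builds the permutation distribution that the hypothesis implies. The second fixes no hypothesis at all and approximates the randomization distribution of the estimate using a conservative standard error and a normal approximation.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n_treated = 50, 25

# Simulated potential outcomes and one realized assignment (all assumed).
y0 = rng.normal(10.0, 2.0, size=N)
y1 = y0 + 1.0 + rng.normal(0.0, 1.0, size=N)
d = np.zeros(N, dtype=int)
d[rng.choice(N, size=n_treated, replace=False)] = 1
y = np.where(d == 1, y1, y0)             # what the analyst actually observes


def diff_in_means(y, d):
    return y[d == 1].mean() - y[d == 0].mean()


estimate = diff_in_means(y, d)

# Hypothesis first: under the sharp null of no effect for any unit, the
# observed outcomes would be the same under every assignment, so we can
# re-randomize the treatment labels and recompute the statistic.
null_draws = np.array(
    [diff_in_means(y, rng.permutation(d)) for _ in range(5000)]
)
p_value = np.mean(np.abs(null_draws) >= abs(estimate))

# Estimate first: no hypothesis is imposed. We approximate the spread of the
# estimator with a conservative SE and a normal approximation.
se = np.sqrt(
    y[d == 1].var(ddof=1) / n_treated + y[d == 0].var(ddof=1) / (N - n_treated)
)
ci = (estimate - 1.96 * se, estimate + 1.96 * se)

print(f"estimate: {estimate:.3f}")
print(f"permutation p-value under the sharp null: {p_value:.4f}")
print(f"approximate 95% interval: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The p-value answers a question about one specific hypothesis. The interval describes uncertainty in the effect estimate itself.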