Regression discontinuity designs and endogeneity

The Social Science Statistics blog posts a working paper by Daniel Carpenter, Justin Grimmer, Eitan Hersh, and Brian Feinstein on possible endogeneity problems in close electoral margins as a source of causal identification in regression discontinuity studies (link). In their abstract, they summarize their findings as follows:

In this paper we suggest that marginal elections may not be as random as RDD analysts suggest. We draw upon the simple intuition that elections that are expected to be close will attract greater campaign expenditures before the election and invite legal challenges and even fraud after the election. We present theoretical models that predict systematic divergences between winners and losers, even in elections with the thinnest victory margins. We test predictions of our models on a dataset of all House elections from 1946 to 1990. We demonstrate that candidates whose parties hold structural advantages in their district are systematically more likely to win close elections. Our findings call into question the use of close elections for causal inference and demonstrate that marginal elections mask structural advantages that are troubling normatively.

A recent working paper by Urquiola and Verhoogen draws similar conclusions about non-random sorting in studies that use RDDs to study the effects of class size on student performance (link).

The problem here is that the values of the forcing variable assigned to individuals are endogenous to complex processes that, very likely, are based on the anticipated gains or losses associated with crossing the cut-off point that defines the discontinuity. Though such is not the case in the above examples, it can also be the case that the values of the cut-off are endogenous. Causal identification requires that the processes determining values of the forcing variable and cut-off are not confounding. What these papers indicate is that RDD analysts need a compelling story for why this is the case. (In other words, they need to demonstrate positive identification [link]).

This can be subtle. As both Carpenter et al. and Urquiola and Verhoogen demonstrate, it’s useful to think of this in terms of a mechanism design problem. Take a simple example drawing on the “original” application of RD: test scores used to determine eligibility for extra tutoring assistance. Suppose you have two students and they are told that they will take a diagnostic test at the beginning of the year and that the one with the lower score will receive extra assistance during the year, with a tie broken by a coin flip. At the end of the year they will both take a final exam that determines whether they win a scholarship for the following year. The mechanism induces a race to the bottom: both students have an incentive to flunk the diagnostic test, and in fact each should score 0, in which case they have a 50-50 chance of getting the help that might increase their chances of landing a scholarship. Interestingly, this actually provides a nice identifying condition, because the coin flip then randomizes who receives the tutoring. But suppose only one of the students is quick enough to learn the optimal strategy in this situation and the other is a little slow. Then the slow student would put in sincere effort, score above 0, and guarantee that the quick-to-learn student gets the tutoring assistance. Repeat this process many times, and you systematically have quick learners below the “cut-off” and slow learners above it, generating a biased estimate of the average effect of tutoring in the neighborhood of the cut-point. What you need for the RD to produce what it purports to produce is a mechanism by which sincere effort is induced (and, as Urquiola and Verhoogen have discussed, a test that minimizes mean-reversion effects).
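To make the tutoring example concrete, here is a minimal simulation sketch. Everything numeric in it is an assumption for illustration and not something taken from the papers: the tutoring effect (`TRUE_EFFECT`), the final-exam advantage of quick learners (`QUICK_BONUS`), the noise level, and the idea that being quick to spot the strategy also helps on the final exam. It compares the tutored-minus-untutored difference in final exam scores when both students play strategically (tie, coin flip) with the case where only the quick learner does.

```python
import random
from statistics import mean

TRUE_EFFECT = 5.0   # assumed effect of tutoring on the final exam score
QUICK_BONUS = 10.0  # assumed final-exam advantage of quick learners


def simulate(n_pairs, both_strategic, seed=0):
    """Return mean(final | tutored) - mean(final | untutored) over many pairs.

    Each pair contains one quick learner and one slow learner.
    """
    rng = random.Random(seed)
    tutored, untutored = [], []
    for _ in range(n_pairs):
        quick_score = 0  # the quick learner always flunks on purpose
        # The slow learner flunks only if they have also found the strategy;
        # otherwise they put in sincere effort and score above 0.
        slow_score = 0 if both_strategic else rng.randint(1, 100)

        if quick_score == slow_score:
            quick_gets_help = rng.random() < 0.5  # tie broken by coin flip
        else:
            quick_gets_help = quick_score < slow_score  # lower score wins

        for is_quick, gets_help in ((True, quick_gets_help),
                                    (False, not quick_gets_help)):
            final = (50.0
                     + QUICK_BONUS * is_quick
                     + TRUE_EFFECT * gets_help
                     + rng.gauss(0, 5))
            (tutored if gets_help else untutored).append(final)
    return mean(tutored) - mean(untutored)


if __name__ == "__main__":
    print("both strategic (coin flip):", simulate(20000, both_strategic=True))
    print("only quick learner strategic:", simulate(20000, both_strategic=False))
```

Under these assumptions, the coin-flip case should recover something close to the assumed tutoring effect, while the sorted case should come out near the tutoring effect plus the quick-learner advantage, since quick learners always end up on the tutored side.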

UPDATE: A new working paper by Caughey and Sekhon (link) provides even more evidence about problems with close elections as a source of identification for RDD studies. They provide some recommendations (shortened here; the full phrasing is available in the paper):

The burden is on the researcher to…identify and collect accurate data on the observable covariates most likely to reveal sorting at the cut-point. [A] good rule of thumb is to always check lagged values of the treatment and response variables.

Careful attention must be paid to the behavior of the data in the immediate neighborhood of the cut-point. [Our analysis] reveals that the trend towards convergence evident in wider windows reverses close to the cut-point, a pattern that may occur whenever a…treatment is assigned via a competitive process with a known threshold.

Automated bandwidth- and specification-selection algorithms are no sure solution. In our case, for example, the methods recommended in the literature select local linear regression bandwidths that are an order of magnitude larger than the window in which covariate imbalance is most obvious.

It is…incumbent upon the researcher to demonstrate the theoretical relevance of quasi-experimental causal estimates.
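The first two recommendations (check lagged values, and look in the immediate neighborhood of the cut-point) can be sketched as a simple balance check. This is only an illustration on synthetic data, not Caughey and Sekhon's procedure: the function names, the window widths, and the data-generating process (previous winners who would narrowly lose are flipped to narrow wins with probability 0.6) are all hypothetical.

```python
import random
from statistics import mean


def lagged_balance(margins, lagged, h):
    """Difference in the mean lagged covariate between bare winners
    (0 < margin <= h) and bare losers (-h <= margin < 0).

    Raises statistics.StatisticsError if either side of the window is empty.
    """
    winners = [x for m, x in zip(margins, lagged) if 0 < m <= h]
    losers = [x for m, x in zip(margins, lagged) if -h <= m < 0]
    return mean(winners) - mean(losers)


def synthetic_elections(n=20000, seed=1):
    """Hypothetical data: vote margins in percentage points plus an
    indicator for having won the previous election."""
    rng = random.Random(seed)
    margins, lagged = [], []
    for _ in range(n):
        won_last_time = rng.random() < 0.5
        margin = rng.uniform(-50, 50)
        # Assumed sorting: previous winners who would lose by under
        # 2 points manage to turn the result around 60% of the time.
        if won_last_time and -2 < margin < 0 and rng.random() < 0.6:
            margin = -margin
        margins.append(margin)
        lagged.append(1.0 if won_last_time else 0.0)
    return margins, lagged


if __name__ == "__main__":
    margins, lagged = synthetic_elections()
    for h in (50, 10, 2):
        print(f"window +/-{h}: imbalance in lagged wins =",
              round(lagged_balance(margins, lagged, h), 3))
```

In this synthetic setup the imbalance in lagged wins is small in the wide window and grows as the window narrows toward the cut-point, which is the kind of pattern a wide-bandwidth check would miss.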