hidden confounding in survey experiments

I had the opportunity to participate in a fun seminar via Skype with faculty and students at Uppsala University’s department of peace and conflict research. We were discussing exciting new avenues for using experimental methods to study microfoundations of conflict and IR theories. The discussion was led by Allan Dafoe (Berkeley, visiting at Uppsala), who is doing really interesting work on reputation and strategic interaction (link).

An interesting point on “hidden confounding” in survey experiments came up that I don’t think gets enough play in analyses of survey experiments, so I thought I’d relay it here as a reference and also to see if others have any input. A common approach in a survey experiment is to provide subjects with hypothetical scenarios. The experimental treatments then consist of variations on the content of the scenarios.

What makes this kind of research so intriguing is that it would seem that you can obtain exogenous variation in circumstances that rarely obtains in the real world. Thus, if your experiment involves a scenario about an international negotiation over a dispute, you could vary, say, the regimes from which the negotiators come in a manner that does not occur frequently in the real world.

The problem is that subjects come to a survey experiment with prior beliefs about “what things go with what”—that is, about how salient features correlate. In our example, people will tend to associate regime types with things like national wealth or region. In that case, by manipulating the negotiators’ regime types in the experiment, you are implicitly changing people’s beliefs about other features of the countries from which the negotiators come. You can try to hold these things “constant”—e.g., by having one treatment where negotiator A comes from a “rich democracy” and another where negotiator A comes from a “rich dictatorship”—but to the extent that you are creating a scenario that departs from what typically occurs in the real world, you might be causing the subject to wonder whether we are talking about some “unusual” circumstance. If so, the subject might apply a different evaluative framework than what the subject would apply to “usual” circumstances. Thus, you are obtaining a causal estimate that is dependent on the frame of reference, which may not be generalizable.
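This information-leakage logic can be made concrete with a small simulation. The sketch below is purely illustrative, with invented numbers: it assumes respondents who read "democracy" infer a rich country more often than those who read "dictatorship," and that their (unstated) wealth inference also moves the outcome. The naive difference in means then mixes the direct effect of regime type with the leakage through inferred wealth.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Treatment: the scenario describes the negotiator's country as a
# democracy (1) or a dictatorship (0), randomly assigned.
democracy = rng.integers(0, 2, n)

# Hidden inference ("what goes with what"): respondents shown "democracy"
# imagine a rich country more often. These probabilities are invented.
p_rich = np.where(democracy == 1, 0.7, 0.3)
inferred_rich = rng.random(n) < p_rich

# Hypothetical response model: approval of the negotiated deal depends on
# BOTH regime type (direct effect 0.5) and inferred wealth (effect 0.8).
approval = 0.5 * democracy + 0.8 * inferred_rich + rng.normal(0, 1, n)

# Naive experimental estimate: difference in mean approval across arms.
naive_effect = approval[democracy == 1].mean() - approval[democracy == 0].mean()
# Expected value: 0.5 + 0.8 * (0.7 - 0.3) = 0.82, not the direct effect 0.5.
print(round(naive_effect, 2))
```

Even though regime type is randomized in the scenario text, the estimate recovers 0.82 rather than the direct effect of 0.5, because randomizing the stated regime type also shifts beliefs about wealth.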

It’s a bit thorny, so what are the solutions? Ironically, it seems to me that one solution would be to focus the experiment on treatments that are “plausibly exogenous.” One could focus on conditions that respond easily to choices, and where choices in either direction are conceivable. Or, one could focus the experiment on things that can vary randomly—like weather, most famously. I find this ironic because it suggests that the survey experiment doesn’t get us very far from what we attempt to do with natural experiments. It would seem that the sweet spot for survey experiments is for things that we are pretty sure could occur as a natural experiment, but either haven’t occurred often enough or haven’t been measured, in which case we can’t just study the natural experiment directly. Applying this rule would greatly limit the areas of application for survey experiments, but I think this formula would result in survey experiments that have more credible causal interpretations.

(By the way, Allan clued me into a discussion of this very point in a current working paper by Michael Tomz and Jessica Weeks: link.)

UPDATE: Allan provided this initial reaction:

I actually think the problem with survey experiments is a bit worse than you describe. It’s not just that confounding can be avoided in survey experiments by focusing on factors that are plausibly manipulable; one also has to vary factors that, given the scenario, are typically uncorrelated in the population with other factors of interest. That is, one wants respondents to believe Pr(Z|X1)=Pr(Z|X2), where X1 and X2 are two values of the treatment condition, and Z is any other factor of potential relevance that is not a consequence of treatment.

For example, the decision of whether the US should stay in Afghanistan (X1) is plausibly manipulable and could plausibly go either way; Obama could decide to leave (X2). But even though such a counterfactual is plausible and could involve a hypothetical manipulation, we are unlikely to believe that Pr(Z|X1)=Pr(Z|X2), where Z could be domestic support for the war, or the strength of the US economy, or the resilience of the Taliban.

So perhaps this implies that the only treatments that will not generate information leakage are either (1) those that are exogenous to begin with in the world (which are thus relatively easy to study using observational data), or (2) those that provide a compelling hypothetical natural experiment to account for the variation. So in this sense—perhaps I am actually just restating your main point—survey experiments only generate clear causal inferences if the key variation arises from a credible (hypothetical) natural experiment.