Pursuing external validity by letting treatments vary?

Earlier this week I had the pleasure of speaking on field experimental methods at GIGA in Hamburg (link). This was part of their series on methods for research in peace and conflict studies. During the talk (and afterward over some nice beers the names of which I cannot remember) we had a lively discussion about external validity for field experiments. I proposed that there are two ways to think about external validity. The first way is to think about taking the same treatment and same outcome, and seeing how the two relate to each other in different contexts. That is what I take to be the conventional view. But, I would say, this view is probably not so helpful. Treatments are never exactly the same, nor are instantiations of outcomes. So, under this conventional view, one is always playing the game of assessing whether treatment and outcomes are really being held constant.

An alternative view is one that emphasizes theory and parsimony. Let me develop this idea through an example. Field experiments usually involve treatments that are somehow “bundled.” By that I mean that the treatments involve multiple components. Each component could, in principle, play a causal role. A door to door canvassing treatment involves the personal contact as well as aspects of the content of the message. You could use various parsimonious characterizations to describe the treatment. You could call it a “personal appeals” treatment. Or you could characterize it in terms of the message content.

If you find an effect in this experiment, the result is consistent with the possibility that personal appeals are effective. But the case is not closed, because it may be that the appeals are not what matters, but rather the content of the message. Adding a comparison condition that holds the content fixed and varies only whether the message is delivered via a personal appeal helps to make the case that it is the personal nature of the appeal, per se, that matters. With this, we have established pretty strong evidence for the effectiveness of personal appeals in this particular setting.
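To make the logic of that comparison concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the arm names, the baseline rate, the effect sizes, and the idea that the two components add up without interacting. It is not drawn from any actual canvassing study. The point is only that the bundled contrast mixes the two components, while the contrast against a content-only arm isolates the personal appeal.

```python
import random
from statistics import mean


def simulate_rates(n_per_arm=20000, base=0.40, appeal_effect=0.08,
                   content_effect=0.03, seed=0):
    """Simulate a binary outcome (say, turnout) in three hypothetical arms.

    Assumes the personal-appeal and message-content components have
    additive effects; all numeric values are made up for illustration.
    """
    rng = random.Random(seed)
    # Each arm is described by (personal appeal present?, content present?).
    arms = {
        "control": (0, 0),                # no contact, no message
        "content_only": (0, 1),           # same message, impersonal delivery
        "personal_plus_content": (1, 1),  # the bundled canvassing treatment
    }
    rates = {}
    for name, (personal, content) in arms.items():
        p = base + appeal_effect * personal + content_effect * content
        outcomes = [1 if rng.random() < p else 0 for _ in range(n_per_arm)]
        rates[name] = mean(outcomes)
    return rates


rates = simulate_rates()

# Bundled contrast: cannot separate the personal appeal from the content.
bundled_effect = rates["personal_plus_content"] - rates["control"]

# Content held fixed: the difference is attributable to the personal appeal.
isolated_appeal_effect = rates["personal_plus_content"] - rates["content_only"]

print(f"bundled effect:         {bundled_effect:.3f}")
print(f"isolated appeal effect: {isolated_appeal_effect:.3f}")
```

Under these assumed values, the bundled contrast should land near the sum of the two component effects, and the content-fixed contrast near the appeal effect alone. If the components interacted, the content-fixed contrast would only tell you about the appeal effect given that particular content.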

How then to pursue external validity? We could run experiments with the same treatment-outcome configuration in new contexts. This would follow the first way of thinking about external validity. Another approach would be to consider other types of treatments that also isolate the effect of a personal appeal. We do not try to replicate the precise treatment-outcome configuration. Rather, we consider various treatment-outcome configurations that test the parsimonious proposition that “personal appeals can cause people to change their political behavior”. This gets us beyond a cookie-cutter understanding of experimentation. It moves us toward a theory-building understanding of experimentation.

Understanding how a parsimonious proposition fares across various treatment-outcome configurations yields many implications, because the proposition speaks to any setting in which it could apply. Understanding how a precise treatment-outcome configuration fares across various contexts yields fewer, because it speaks only to that configuration. In this way, I judge the parsimonious proposition approach to be more valuable.

If you buy this argument, then for any empirical study we want to know: what are the parsimonious propositions that you’re testing?

P.S.’s

Neil Stenhouse (@n_stenhou) points me to a relevant paper by Wells & Windschitl (link). They discuss “stimulus sampling” as a way of constructing treatment-outcome configurations that allow one to isolate the relevant causal factor in a parsimonious proposition. They also point out that whether a given experiment generates evidence that speaks specifically to a parsimonious proposition is a question of “construct validity” (cf. Cook & Campbell, 1979: link). And here’s a neat paper Neil suggested on power analysis for sampled stimuli designs: link.
Brendan Nyhan (@Brendan_Nyhan) points me to a recent debate in psychology over whether “conceptual replications” should be taken as evidence in favor of the validity of a given study: link1 link2. The challenge here, it seems, is the subjectivity involved in claiming that different treatment-outcome configurations test the same parsimonious proposition. I would say that the challenge could be addressed through ex ante discussion of whether such a claim makes sense.
A subsequent post goes even further in discussing these points: link.