Pursuing external validity by letting treatments vary?

Earlier this week I had the pleasure of speaking on field experimental methods at GIGA in Hamburg (link), as part of their series on methods for research in peace and conflict studies. During the talk (and afterward over some nice beers whose names I cannot remember) we had a lively discussion about external validity for field experiments. I proposed that there are two ways to think about external validity. The first is to take the same treatment and same outcome and see how the two relate to each other in different contexts. That is what I take to be the conventional view. But this view, I would say, is probably not so helpful. Treatments are never exactly the same, and neither are instantiations of outcomes. So, under the conventional view, one is always playing the game of assessing whether treatment and outcomes are really being held constant.

An alternative view is one that emphasizes theory and parsimony. Let me develop this idea through an example. Field experiments usually involve treatments that are somehow “bundled”: the treatments involve multiple components, each of which could, in principle, play a causal role. A door-to-door canvassing treatment involves personal contact as well as aspects of the content of the message. You could use various parsimonious characterizations to describe the treatment. You could call it a “personal appeals” treatment, or you could characterize it in terms of the message content.

If you find an effect in this experiment, you leave open the possibility that personal appeals are effective. But the case is not closed, because it may be that the appeals are not what matters, but rather the content of the message. Having a control condition that holds the content fixed and varies only whether the message is delivered via a personal appeal helps to make the case that it is the personal nature of the appeal, per se, that matters. With this, we have established pretty strong evidence for the effectiveness of personal appeals in this particular setting.
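To make the design logic concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the simulated data, the per-arm sample size, and the turnout rates. No real experiment is being analyzed.

```python
# Minimal sketch of the three-arm logic described above.
# All numbers are assumptions for illustration; no real data.
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # units per arm (assumed)

# Assumed turnout rates: baseline, message by mail,
# and the same message delivered face to face.
p_control, p_mail, p_canvass = 0.40, 0.42, 0.47

control = rng.binomial(1, p_control, n)
mail    = rng.binomial(1, p_mail, n)
canvass = rng.binomial(1, p_canvass, n)

# Canvass vs. control bundles message content with personal contact.
bundled = canvass.mean() - control.mean()
# Canvass vs. mail holds content fixed, isolating personal delivery.
personal = canvass.mean() - mail.mean()

print(f"bundled effect (canvass - control): {bundled:.3f}")
print(f"personal delivery (canvass - mail): {personal:.3f}")
```

The canvass-versus-mail contrast holds the message content fixed by construction, so it isolates the personal-delivery component of the bundle.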

How then to pursue external validity? We could run experiments with the same treatment-outcome configuration in new contexts, following the first way of thinking about external validity. Another approach would be to consider other types of treatments that somehow isolate the effect of a personal appeal. We do not try to replicate the precise treatment-outcome configuration. Rather, we consider various treatment-outcome configurations that test the parsimonious proposition that “personal appeals can cause people to change their political behavior”. This gets us beyond a cookie-cutter understanding of experimentation and moves us toward a theory-building one.

Understanding how a parsimonious proposition fares across various treatment-outcome configurations yields many testable implications. Understanding how a precise treatment-outcome configuration fares across various contexts yields fewer. On this basis, I judge the parsimonious-proposition approach to be the more valuable one.

If you buy this argument, then for any empirical study we want to know: what are the parsimonious propositions being tested?

P.S.’s

  • Neil Stenhouse (@n_stenhou) points me to a relevant paper by Wells & Windschitl (link). They discuss “stimulus sampling” as a way of constructing treatment-outcome configurations that allow one to isolate the relevant causal factor in a parsimonious proposition. They also point out that whether a given experiment generates evidence that speaks specifically to a parsimonious proposition is a question of “construct validity” (cf. Cook & Campbell, 1979: link). And here’s a neat paper Neil suggested on power analysis for sampled-stimuli designs: link (see the simulation sketch after this list).
  • Brendan Nyhan (@Brendan_Nyhan) points me to a recent debate in psychology over whether “conceptual replications” should be taken as evidence in favor of the validity of a given study: link1 link2. The challenge here, it seems, is the subjectivity involved in claiming that different treatment-outcome configurations test the same parsimonious proposition. I would say that the challenge could be addressed through ex ante discussion of whether such a claim makes sense.
  • A subsequent post goes even further in discussing these points: link.
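Since power analysis for sampled-stimuli designs came up in the first P.S., here is a hedged sketch of what a simulation-based version can look like. This is not the linked paper’s method, just a simple illustration; every parameter value (number of stimuli, cell sizes, effect sizes, variance components) is an assumption.

```python
# Hedged sketch: simulation-based power for a sampled-stimuli design.
# Every parameter value here is an assumption chosen for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def simulate_once(n_stimuli=8, n_per_cell=40, avg_effect=0.3,
                  stim_sd=0.2, noise_sd=1.0):
    """One experiment: each sampled stimulus gets its own effect drawn
    around the average; returns the p-value of a t-test that treats
    stimulus-level treatment-control differences as the data."""
    diffs = []
    for _ in range(n_stimuli):
        stim_effect = avg_effect + rng.normal(0, stim_sd)  # stimulus heterogeneity
        treated = rng.normal(stim_effect, noise_sd, n_per_cell)
        control = rng.normal(0.0, noise_sd, n_per_cell)
        diffs.append(treated.mean() - control.mean())
    # Test whether the average effect across sampled stimuli is nonzero.
    return stats.ttest_1samp(diffs, 0.0).pvalue

# Power = share of simulated experiments that reject at alpha = 0.05.
power = np.mean([simulate_once() < 0.05 for _ in range(2000)])
print(f"simulated power: {power:.2f}")
```

Treating the sampled stimuli, rather than the subjects, as the units of analysis is a conservative way to respect the fact that the stimuli themselves were sampled.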

Why do I favor a “design-based approach” in my methodological work?

The essence of the design-based approach is that one establishes optimality principles in terms of research design parameters, while making minimal assumptions about the distributional properties of the variables in the analysis. The motivation for a design-based perspective is threefold.

First, I do a lot of field research. Most field research projects seek to obtain data on a variety of outcome variables, each of which might differ in its distributional properties. In such cases, one wants optimal design principles that are robust to such variety, and the design-based approach achieves precisely this robustness goal. This is the line of thinking one associates with Leslie Kish, William Cochran, and the literature on “official statistics.”

Second, the design-based approach aims to minimize errors of inference (for example, inaccurate confidence intervals) that arise when one uses methods resting on distributional assumptions that are inaccurate. The design-based approach achieves such error minimization by defining inferential properties primarily in terms of the design parameters that are directly controlled, and therefore known, by the researcher. This is the line of thinking that one associates with David Freedman.
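For concreteness, here is a minimal sketch of design-based (randomization) inference in this spirit, in Python. The outcomes and assignment below are made up, and I assume complete random assignment of half the units to treatment; the p-value derives entirely from the known assignment mechanism, not from a model of the outcomes.

```python
# Hedged sketch of randomization inference: the p-value comes from the
# known assignment mechanism, not from a distributional model of the
# outcomes. Outcomes and assignment below are made up; I assume a
# complete random assignment of 4 of 8 units to treatment.
import numpy as np

rng = np.random.default_rng(2)
y = np.array([3.1, 5.0, 2.2, 6.3, 4.4, 5.8, 1.9, 4.7])  # observed outcomes (assumed)
z = np.array([0, 1, 0, 1, 0, 1, 0, 1])                  # realized assignment (assumed)

def diff_in_means(y, z):
    return y[z == 1].mean() - y[z == 0].mean()

observed = diff_in_means(y, z)

# Under the sharp null of no effect for any unit, outcomes are fixed and
# only the assignment varies, with a distribution we know by design.
draws = [diff_in_means(y, rng.permutation(z)) for _ in range(10000)]

p_value = np.mean(np.abs(draws) >= abs(observed))
print(f"estimate: {observed:.2f}, randomization p-value: {p_value:.3f}")
```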

Third, the design-based approach minimizes reliance on data parameters that are revealed only after the data are collected. This allows scholars to specify, and agree on, the inferential leverage that a design offers before any data arrive: for example, which hypotheses can be tested and with what power. A design document and pre-analysis plan can guard against ex post manipulations and “results fishing,” arguably allowing for more credible accumulation of scientific knowledge. This is the line of thinking that one associates with current proponents of pre-analysis plans like Edward Miguel and Macartan Humphreys.
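As a small illustration of that third point, here is a sketch of an ex ante power calculation that uses only design parameters plus an assumed minimum effect of interest. The sample size and effect size below are placeholders, not recommendations.

```python
# Hedged sketch: ex ante power from design parameters alone, the kind of
# calculation a pre-analysis plan can fix before any data arrive.
# Sample size and effect size are placeholder assumptions.
import numpy as np
from scipy import stats

n_per_arm = 250   # planned sample size per arm (assumed)
effect = 0.25     # minimum effect of interest, in SD units (assumed)
alpha = 0.05      # two-sided test size

se = np.sqrt(2 / n_per_arm)             # SE of the standardized difference in means
z_crit = stats.norm.ppf(1 - alpha / 2)  # two-sided critical value

# Normal-approximation power for a two-sample comparison of means.
power = (1 - stats.norm.cdf(z_crit - effect / se)
         + stats.norm.cdf(-z_crit - effect / se))
print(f"power to detect a {effect} SD effect: {power:.2f}")
```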


Motivations for service provision

The Development Impact Blog (link) has an outstanding post by Ken Leonard discussing the differences between prosocial and intrinsic motivations for public service workers:

When we hear of a fireman who works for money, we immediately think about the wage, its relationship to performance and the way incentives are organized within the system. If you hope to be rescued by someone who works only for the money, you want to know how and under what circumstances he earns more. The term intrinsic motivation, however, often leads to policy paralysis: if they love to do their job, then let them continue to do their job. Once we recognize, however, that public sector workers might be motivated by the gratitude or admiration of others (not the act of doing their job), we might be more likely to ask about wages, incentives and organizations—the same questions we would automatically ask about money. For example, does seeking the gratitude of patients or students increase or decrease useful effort? Can organizations increase exposure to positive incentives and decrease exposure to negative incentives?

(Link to post.)

So intrinsic: “I like doing this work.” Prosocial (or, perhaps better termed, other-regarding): “I like what this work does for others” or “I like how others view my doing this work.”

If you find such a discussion of intrinsic/extrinsic, other/self-regarding motivations interesting, you should really read Jon Elster: link.

This also reminds me of a conversation I had with Leonard Wantchekon last year:

“I am hoping to organize a conference on organization theory—you know, as a way to gain insights on why state institutions sometimes work and sometimes don’t,” he said.

“Sounds great!” I replied.

“Yeah, and you know what?” he continued. “If the key result of your model or field experiment is ‘higher wages lead to better performance’, you’re not invited!”