For example this: link. Snowflakes, as a metaphor, came up today in a conversation about the implications of the rampant heterogeneity that we face in the social sciences.
For a market to function well, you must be able to trust most of the people most of the time [to live up to contractual obligations]; you must be secure from having your property expropriated; information about what is available where at what quality must flow smoothly; any side effects of third parties must be curtailed; and competition must be at work.
So concludes John McMillan in his magisterial and highly engaging 2002 book on institutions and markets, Reinventing the Bazaar: A Natural History of Markets (amazon). McMillan provides fantastic examples from across time and around the world on how formal and informal institutions have served in meeting these five conditions. Examples range from produce vendors in the Makola market in Accra to bidders for public construction contracts in Tokyo.
I was reading this while traveling through the DR Congo the past two weeks. It helped open my eyes to the various third-party roles that state, armed group, and traditional elites play in market exchange there.
Himelein et al. have a draft working paper (link) covering methods for household sampling in the field when you don’t have administrative lists of households or when full enumeration on-site is not possible. The methods include various “random route”/“random walk” approaches as well as approaches that use satellite data. Some choice tidbits:
- On using satellite maps to construct a frame: “Based on the experience mapping the three PSUs used in the paper, it takes about one minute per household to construct an outline. If the PSUs contain approximately 250 structures (the ones used here contain 68, 309, and 353 structures, respectively), mapping the 106 PSUs selected for the full Mogadishu High Frequency Survey would have required more than 50 work days.” Yikes! Of course they probably could have cut this time down if they sampled subclusters within the PSUs and only enumerated those. Nonetheless, the 1-minute/household estimate is a useful rule of thumb.
- They define the “Mecca method” as choosing a random set of GPS locations in an area, and then walking in a fixed direction (e.g., the direction of Mecca, which almost everyone in Mogadishu knows) until you hit an eligible structure. The method amounts to a form of probability proportional to size (PPS) sampling, where “size” in this case amounts to the area on the ground that allows for an unobstructed path to the structure. This may not be such an easy thing to measure, although the authors propose that one could approximate the PPS weights using the distance between the selected household and the next household going up the line that was traveled. Also, it’s possible that some random points induce paths that never come upon an eligible structure. This would create field complications, particularly in non-urban settings where domicile layouts may be sparse.
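The mapping-time figure in the first bullet can be checked with some quick arithmetic. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and the eight-hour work day is my assumption (the quoted passage gives only the one-minute-per-household rate, the PSU sizes, and the 106 PSUs).

```python
def mapping_workdays(n_psus, structures_per_psu,
                     minutes_per_structure=1.0, hours_per_workday=8.0):
    """Work days needed to outline every structure in every PSU.

    hours_per_workday is an assumption; the paper's quote does not state it.
    """
    total_minutes = n_psus * structures_per_psu * minutes_per_structure
    return total_minutes / 60.0 / hours_per_workday


# Sizes of the three PSUs that the authors actually mapped.
mapped_psus = [68, 309, 353]
average_size = sum(mapped_psus) / len(mapped_psus)

print(f"average structures per mapped PSU: {average_size:.1f}")
print(f"work days for 106 PSUs of 250 structures: {mapping_workdays(106, 250):.1f}")
```

Under these assumptions the total comes to roughly 55 work days, which squares with the “more than 50 work days” in the quote.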
The authors take images of domicile patterns in Mogadishu and some information on consumption variable distributions to construct simulations. They use the simulations to evaluate satellite-based full enumeration, field listing within PSU segments, interviewing within GPS-defined grid squares, the Mecca method, and then the Afrobarometer “random walk” approach. No surprise that satellite-based full enumeration was the least biased, segmentation next, and then the Mecca method with PPS weights and approximate PPS weights third and fourth. All four of these were quite good and unbiased though. Grid, random walk, and unweighted Mecca method were quite biased. Such bias needs to be weighed against costs and ability to validate. Satellite full enumeration is costly but one can validate. The segment method is also costly and rather hard to enumerate. The grid method fares poorly on both counts. The Mecca method with true PPS weights is somewhat costly, but with approximate PPS weights is quite good on both counts. The random walk is cheap but hard to validate. Again, I would say that some of these results may be particular to the setting (relatively dense settlement in an urban area). But the insights are certainly useful.
I found this from David Evans: a fantastic summary of the recently concluded Annual Bank Conference on Confronting Fragility and Conflict in Africa: link.
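To see why the unweighted Mecca method is biased while the PPS-weighted version is not, here is a toy simulation. It is not the authors' simulation. It is a one-dimensional sketch under assumptions of my own: structures sit along a street of length 1, the walk always goes in the increasing direction, a structure's “size” is the empty gap behind it, and the outcome is made to increase with that gap. All names and parameter values are hypothetical.

```python
import numpy as np


def simulate_mecca_1d(n_structures=200, n_draws=20000, seed=0):
    """Toy 1-D Mecca method: random start points, walk forward to first structure."""
    rng = np.random.default_rng(seed)

    # Gaps between consecutive structures, normalized so the street has length 1.
    gaps = rng.uniform(0.2, 1.8, n_structures)
    gaps /= gaps.sum()
    positions = np.cumsum(gaps)  # structure i sits at the far end of gap i

    # Hypothetical outcome that rises with the open space behind a structure.
    relative_gap = gaps * n_structures  # mean 1 by construction
    y = 10.0 + 10.0 * relative_gap + rng.normal(0.0, 1.0, n_structures)

    # Random start points; walking forward hits the first structure at or ahead.
    start = rng.uniform(0.0, 1.0, n_draws)
    hit = np.searchsorted(positions, start) % n_structures

    # Selection probability is proportional to the gap, so weight by its inverse.
    unweighted = y[hit].mean()
    w = 1.0 / gaps[hit]
    weighted = np.sum(w * y[hit]) / np.sum(w)

    return {"truth": y.mean(), "unweighted": unweighted, "weighted": weighted}


result = simulate_mecca_1d()
for name, value in result.items():
    print(f"{name:>10}: {value:.2f}")
```

In this setup the unweighted mean overshoots because structures with more open space behind them are both more likely to be hit and, by construction, have higher outcomes. The inverse-gap weights undo that. How large the bias is in practice depends on how strongly outcomes track layout, which the toy example simply assumes.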
The Papers and Hot Beverages (PHB) blog had a nice discussion (link) of some of the points I raised in my previous post about “pursuing external validity by letting treatments vary” (link). PHB starts by proposing that we can rewrite a simple treatment effects model along the lines of the following (modified from PHB’s expression to make things clearer):

$$Y = \alpha + \sum_{k} \beta_k T_k + \sum_{k} \sum_{j} \delta_{kj} T_k X_j + \epsilon$$

The idea is that the treatment may bundle various components, captured by the $T_k$ terms, each of which has its own effect. Moreover, each of these components may interact with features of the context, captured by the $X_j$ terms.
The proposal to explore external validity by “letting treatments vary” amounts to trying to identify the effect of one of the components by generating variation in that component that is independent of the other components. Of course, in doing so, one does not resolve the problem of covariation with the $X_j$ components. So in this way, I understand why PHB was not “convinced” about the strategy of letting treatments vary as being sufficient for testing a parsimonious proposition that focuses on the effect of a particular component of a treatment bundle in a manner that does not incorporate contextual conditions. Of course, I was not trying to propose that such a strategy is sufficient in this way. Just that it is another way to think about accumulating knowledge across studies.
We can also go further and provide a more complete characterization of the problem of interpreting a treatment effect. Indeed, PHB’s characterization imposes some restrictions relative to the following:

$$Y = \alpha + \sum_{k} \beta_k T_k + \sum_{k} \sum_{k' > k} \gamma_{kk'} T_k T_{k'} + \sum_{k} \sum_{j} \delta_{kj} T_k X_j + \sum_{k} \sum_{k' > k} \sum_{j} \lambda_{kk'j} T_k T_{k'} X_j + \epsilon$$

The $\beta$s are effects of elements in $T$ that depend on neither other elements of the treatment bundle nor the context $X$. The $\gamma$s are the ways that elements of the treatment bundle modify each other’s effects regardless of context. The $\delta$s are ways that the context modifies the effects of elements of $T$ separately. Finally, the $\lambda$s are ways that context modifies the ways that elements of $T$ modify the effects of each other.
When using causal estimates to develop theories, we typically want to interpret manipulations of $T$ in parsimonious terms. The upshot is that in trying to be parsimonious we may ignore elements of $T$ or $X$. Even if the effect of $T$ is well identified, our parsimonious interpretation may not be valid.
This is a mess of an expression. But I find it strangely mesmerizing. It gives some indication of how complicated is the work of interpreting causal effects.
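A small numerical sketch may make the point concrete. Assume, purely for illustration, a bundle with two binary components and a single context feature x, with “own” effects (call them beta), component-by-component interactions (gamma), component-by-context interactions (delta), and three-way interactions (lambda). The function name and every coefficient value below are made up.

```python
from itertools import combinations


def expected_outcome(t, x, beta, gamma, delta, lam, alpha=0.0):
    """Expected outcome for a 0/1 component vector t and scalar context x."""
    k = len(t)
    y = alpha
    # Own effects of each component, plus how context shifts them.
    for i in range(k):
        y += (beta[i] + delta[i] * x) * t[i]
    # Pairwise interactions among components, plus how context shifts those.
    for i, j in combinations(range(k), 2):
        y += (gamma[(i, j)] + lam[(i, j)] * x) * t[i] * t[j]
    return y


# Hypothetical coefficients for a two-component bundle.
coefs = dict(beta=[1.0, 0.5], gamma={(0, 1): 0.5},
             delta=[0.0, 0.25], lam={(0, 1): -0.5})

for x in (0.0, 4.0):
    bundle = expected_outcome([1, 1], x, **coefs) - expected_outcome([0, 0], x, **coefs)
    alone = expected_outcome([1, 0], x, **coefs) - expected_outcome([0, 0], x, **coefs)
    within = expected_outcome([1, 1], x, **coefs) - expected_outcome([0, 1], x, **coefs)
    print(f"x={x}: bundle={bundle}, component 1 alone={alone}, "
          f"component 1 given component 2={within}")
```

With these invented numbers, component 1 on its own has the same effect in both contexts, yet the full bundle's effect halves and the effect of adding component 1 to component 2 changes sign. A well-identified bundle effect, read as “the effect of component 1,” would mislead in either context.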
Earlier this week I had the pleasure of speaking on field experimental methods at GIGA in Hamburg (link). This was part of their series on methods for research on peace and conflict studies. During the talk (and afterward over some nice beers the names of which I cannot remember) we had a lively discussion about external validity for field experiments. I proposed that there are two ways to think about external validity. The first way is to think about taking the same treatment and same outcome, and seeing how the two relate to each other in different contexts. That is what I take to be the conventional view. But, I would say, this view is probably not so helpful. Treatments are never exactly the same, and neither are instantiations of outcomes. So, under this conventional view, one is always playing the game of assessing whether treatment and outcomes are really being held constant.
An alternative view is one that emphasizes theory and parsimony. Let me develop this idea through an example. Field experiments usually involve treatments that are somehow “bundled.” By that I mean that the treatments involve multiple components. Each component could, in principle, play a causal role. A door-to-door canvassing treatment involves the personal contact as well as aspects of the content of the message. You could use various parsimonious characterizations to describe the treatment. You could call it a “personal appeals” treatment. Or you could characterize it in terms of the message content.
If you find an effect in this experiment, you leave open the possibility that personal appeals are effective. But the case is not closed, because it may be that the appeals are not what matters, but rather the content of the message. Having a control condition that holds the content fixed but varies only whether the message is delivered via a personal appeal helps to make the case that it is the personal nature of the appeal, per se, that matters. With this, we have established pretty strong evidence for the effectiveness of personal appeals in this particular setting.
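Here is a rough simulation sketch of that design logic. It assumes three arms: a pure control, an arm that gets the message content through some impersonal channel, and an arm that gets the same content through a personal visit. The arm labels, the baseline rate, and the effect sizes are all hypothetical, and the effects are assumed to be additive for simplicity.

```python
import numpy as np


def simulate_three_arm(n_per_arm=20000, base_rate=0.30,
                       content_effect=0.02, personal_effect=0.05, seed=1):
    """Simulate a binary outcome in three arms and return the arm contrasts."""
    rng = np.random.default_rng(seed)

    # Assumed additive effects on the probability of the behavior.
    rates = {
        "control": base_rate,
        "impersonal": base_rate + content_effect,
        "personal": base_rate + content_effect + personal_effect,
    }
    means = {arm: rng.binomial(1, p, n_per_arm).mean() for arm, p in rates.items()}

    return {
        # Canvassing vs. nothing: personal contact and content are confounded.
        "bundled": means["personal"] - means["control"],
        # Content held fixed, only the mode of delivery varies.
        "personal_only": means["personal"] - means["impersonal"],
        # Content vs. nothing, without personal contact.
        "content_only": means["impersonal"] - means["control"],
    }


for name, estimate in simulate_three_arm().items():
    print(f"{name:>14}: {estimate:.3f}")
```

The bundled contrast cannot say how much of the effect comes from personal contact. The contrast that holds content fixed can, at least in this setting and under the additivity assumed here.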
How then to pursue external validity? We could run experiments with the same treatment-outcome configuration in new contexts. This would follow the first way of thinking about external validity. Another approach would be to consider other types of treatments that somehow involve isolating the effect of a personal appeal. We do not try to replicate the precise treatment-outcome configuration. Rather, we consider various treatment-outcome configurations that test the parsimonious proposition that “personal appeals can cause people to change their political behavior”. This gets us beyond a cookie cutter understanding of experimentation. It moves us toward a theory-building understanding of experimentation.
Understanding how a parsimonious proposition fares across various treatment-outcome configurations results in many implications. Understanding how a precise treatment-outcome configuration fares across various contexts has fewer implications. In this way, I judge the parsimonious proposition approach to be more valuable.
If you buy this argument, then for any empirical study we want to know, what are the parsimonious propositions that you’re testing?
- Neil Stenhouse (@n_stenhou) points me to a relevant paper by Wells & Windschitl (link). They discuss “stimulus sampling” as a way of constructing treatment-outcome configurations that allow one to isolate the relevant causal factor in a parsimonious proposition. They also point out that whether a given experiment generates evidence that speaks specifically to a parsimonious proposition is a question of “construct validity” (cf. Cook & Campbell, 1979: link). And here’s a neat paper Neil suggested on power analysis for sampled stimuli designs: link.
- Brendan Nyhan (@Brendan_Nyhan) points me to a recent debate in psychology over whether “conceptual replications” should be taken as evidence in favor of the validity of a given study: link1 link2. The challenge here, it seems, is the subjectivity involved in claiming that different treatment-outcome configurations test the same parsimonious proposition. I would say that the challenge could be addressed through ex ante discussion of whether such a claim makes sense.
- A subsequent post goes even further in discussing these points: link.