For those interested in some statistical self-flagellation, here’s a link to work in progress on estimation and inference with dyadic data, joint with Peter Aronow and Valentina Assenova: link. Dyadic data are ubiquitous in various fields of social science, including network sociology, international relations, and even research on “speed dating.” The problem of dyadic dependence complicates inference for such data. From what we’ve seen, most people either make hopeful assumptions about the nature of this dependence or just sweep it under the rug entirely. What we’ve done is to derive some results under highly “agnostic” assumptions, to show that on the one hand, the heavy parameterizations used in current approaches may be unnecessary, while on the other hand, ignoring dyadic dependence can be extremely misleading. We’re working on more applications and efficient software implementation. Comments appreciated.
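To see the problem concretely, here is a minimal toy simulation (my own illustration, not taken from the paper): dyads that share a node also share that node's shock, so a naive i.i.d. standard error for the dyad-level mean can dramatically understate the true sampling variability.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30  # nodes; the dyads are all unordered pairs, C(30, 2) = 435 of them

def draw_dyadic_outcomes():
    # node-level shocks induce dependence between dyads sharing a node
    a = rng.normal(size=n)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return np.array([a[i] + a[j] + rng.normal() for i, j in pairs])

# true sampling sd of the dyad-level mean, by Monte Carlo
means = [draw_dyadic_outcomes().mean() for _ in range(2000)]
true_se = np.std(means)

# naive i.i.d. standard error computed from a single draw
y = draw_dyadic_outcomes()
naive_se = y.std(ddof=1) / np.sqrt(len(y))

print(f"true SE ~ {true_se:.3f}, naive iid SE ~ {naive_se:.3f}")
```

The naive standard error is several times too small here, which is the sense in which sweeping dyadic dependence under the rug can be extremely misleading.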

### There is too much idolatry of whether contexts are “representative” or effects “generalizable”

From Wikipedia:

> Idolatry is a pejorative term for the worship of an idol or a physical object such as a cult image as a god, or practices believed to verge on worship, such as giving honour and regard to created forms…. In current context, however, idolatry is not limited to religious concepts. It can also refer to a social phenomenon where false perceptions are created and worshipped….

In the recent past I reviewed a paper for an academic journal. The paper covered an interesting subject and was well done, so I recommended some revisions and that the author resubmit once those were made. Other reviewers disagreed, arguing most centrally that the context in which the study was undertaken was highly specific and therefore not “representative,” in which case the empirical results might not be “generalizable.” They recommended rejection.

Even more recently on the blog, I pointed to Meyersson’s newly published paper on the effects of the rule of Islamic parties in Turkish municipalities (post). Meyersson’s most remarkable finding was that opportunities for women seemed to expand substantially under the municipal rule of Islamic parties. I received a few responses via Twitter and in person critiquing Meyersson’s findings, suggesting that the constellation of historical, economic, and institutional conditions in Turkey undermines the “generality” of these effects on women’s opportunity.

While I appreciate that academic papers sometimes underplay scope conditions for their results, I find such obsession with whether an empirical result “generalizes” or whether the empirical context is “representative” to be poorly motivated in many cases. First, there are no research designs or analytical methods that can reliably deliver “representative” or “generalizable” findings. For example, using “representative” data does not guarantee that your results will be representative even for units in your dataset. (See here for more: [link 1] [link 2] [link 3] [link 4].) To pursue a “representative” estimate is often to chase a mirage.

Second, working with “non-representative” groups may provide more theoretical traction. If existing theory suggests that effects should go one way with a particular group of units but you find the effects go the other way, well, this is exactly the kind of anomaly that allows theoretical elaboration to advance.

Third, it is often unclear what would count as a “representative” or “general” case. Was the skepticism toward Meyersson’s paper coming from some implicit comparison to, say, Saudi Arabia? If so, why on earth should we take findings for Saudi Arabia to be “general” and dismiss those from Turkey as “idiosyncratic”? The fact that effects vary by context is interesting and worth understanding.

If the objective is to learn about such heterogeneity across contexts or, the other side of the coin, to demonstrate stability across contexts, then one should conduct studies that seek unusual contextual conditions!

### Does Islamic rule boost women’s opportunities? RD evidence from Turkey

From a remarkable study by Erik Meyersson in the new *Econometrica*, highlights from the abstract:

> In 1994, an Islamic party [in Turkey] won multiple municipal mayor seats across the country. Using a regression discontinuity (RD) design, I compare municipalities where this Islamic party barely won or lost elections… The RD results reveal that, over a period of six years, Islamic rule increased female secular high school education. Corresponding effects for men are systematically smaller and less precise. In the longer run, the effect on female education remained persistent up to 17 years after, and also reduced adolescent marriages. An analysis of long-run political effects of Islamic rule shows increased female political participation and an overall decrease in Islamic political preferences. The results are consistent with an explanation that emphasizes the Islamic party’s effectiveness in overcoming barriers to female entry for the poor and pious.

Ungated version posted on Meyersson’s website: link. On the Econometrica website: link.

### Two year research scientist post at the Hertie School of Governance, Berlin

Piero (webpage) at Hertie forwarded this announcement:

> The Hertie School has an opening beginning in April 2014 for a Research Scientist (f/m). We are looking for highly motivated candidates interested in conducting research on governance indicators for the School’s Governance Report (www.governancereport.org) and related projects. The Governance Report seeks to understand the challenges of multi-actor, multi-level governance, analysing the reasons behind governance successes and failures, identifying and examining governance innovations, and developing governance indicators. The results are published in annual reports and other publications.

See more on the official announcement: link. Berlin is of course a fabulous city and this is interesting and engaged research.

### Should you use frequentist standard errors with causal estimates on population data? Yes.

Suppose you are studying the effects of some policy adopted at the state level in the United States, and you are using data from all 50 states to do it. Well,

> When a researcher estimates a regression function with state level data, why are there standard errors that differ from zero? Clearly the researcher has information on the entire population of states. Nevertheless researchers typically report conventional robust standard errors, formally justified by viewing the sample as a random sample from a large population. In this paper we investigate the justification for positive standard errors in cases where the researcher estimates regression functions with data from the entire population. We take the perspective that the regression function is intended to capture causal effects, and that standard errors can be justified using a generalization of randomization inference. We show that these randomization-based standard errors in some cases agree with the conventional robust standard errors, and in other cases are smaller than the conventional ones.

From a new working paper on “Finite Population Causal Standard Errors” by the econometrics all-star team of Abadie, Athey, Imbens, and Wooldridge (updated link): link.

I have been to a few presentations of papers like this where someone in the audience thinks they are making a smart comment by noting that the paper uses population data, and so the frequentist standard errors “don’t really make sense.” Abadie et al. show that such comments are often misguided, arising from a confusion over how causal inference differs from descriptive inference. Sure — there is no uncertainty as to what is the value of the regression coefficient for this population given the realized outcomes. But the value of the regression coefficient is not the same as the causal effect.

To understand the difference, it helps to define causal effects precisely. A causal effect for a given *unit* in the population is most coherently defined as a comparison between the outcome observed under a given treatment (the “state level policy” in the example above) and what would obtain were that same unit given another treatment. It is useful to imagine this schedule of treatment-value-specific outcomes as an array of “potential outcomes.” *Population average* causal effects take the average of the unit-level causal effects in a given population.

Now, suppose that there is some random (at least with respect to what the analyst can observe) process through which units in the population are assigned treatment values. Maybe this random process occurred because a bona fide randomized experiment was run on the population, or maybe it was the result of “natural” stochastic processes (that is, not controlled by the analyst). Then, for each unit we only get to observe the potential outcome associated with the treatment *received*, and not the “counterfactual” potential outcomes associated with the other possible treatments. As such, we cannot construct the population average causal effect directly; doing so would require computing each of the unit-level causal effects. Instead, we have to *estimate* the population average causal effect using the *incomplete* potential outcomes data available to us. Had the random treatment assignment process turned out differently, the estimate we obtain could very well differ too, since there would be a different set of observed and unobserved potential outcomes. Even though we have data from everyone in the population, we lack the full schedule of potential outcomes that would allow us to estimate causal effects without uncertainty.
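The logic above is easy to see in a toy simulation (my own sketch, not from Abadie et al.): even when the analyst observes the entire population, the effect estimate varies across realizations of the random assignment, because each realization reveals a different half of the potential outcomes schedule.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50  # the entire population, e.g. all 50 states

# the full (unobservable) schedule of potential outcomes
y0 = rng.normal(size=N)  # outcome if untreated
y1 = y0 + 1.0            # outcome if treated; true unit-level effect = 1
true_ate = np.mean(y1 - y0)

def estimate(seed):
    # one realization of the random assignment: treat half the population
    d = np.zeros(N, dtype=bool)
    d[np.random.default_rng(seed).choice(N, N // 2, replace=False)] = True
    y_obs = np.where(d, y1, y0)  # only one potential outcome is revealed
    return y_obs[d].mean() - y_obs[~d].mean()

estimates = np.array([estimate(s) for s in range(1000)])
# estimates are centered on the true effect but vary across assignments,
# even though every unit in the population is in the data
print(true_ate, estimates.mean(), estimates.std())
```

The spread of `estimates` is exactly the uncertainty that randomization-based standard errors are meant to quantify.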

As it turns out, the random treatment assignment process is directly analogous to the *random sampling of potential outcomes*, in which case we can use standard sampling-theoretic results to quantify our uncertainty and compute standard errors for our effect estimates. Furthermore, as a happy coincidence, such sampling-theoretic standard errors are algebraically equivalent to, or conservatively approximated by, the “robust” standard errors common in current statistical practice. It’s a point that Peter Aronow and I made in our 2012 paper on using regression standard errors for randomized experiments (link), and a point that Winston Lin develops even further (link; also see Berk’s links in the comments below to Winston’s great discussion of these results). Abadie et al. take this all one step further by showing that this framework for inference makes sense for observational studies too.
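The algebraic connection can be checked numerically. Here is a sketch of my own, assuming a simple two-arm design: for a regression of the outcome on a treatment dummy, the HC2 robust standard error for the slope coincides with Neyman’s classic variance estimator.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
d = np.repeat([0, 1], n // 2)          # treatment dummy
y = 0.5 * d + rng.normal(size=n)

# Neyman variance estimator: s1^2/n1 + s0^2/n0
y1, y0 = y[d == 1], y[d == 0]
neyman_se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))

# HC2 robust SE from the regression of y on [1, d], computed by hand
X = np.column_stack([np.ones(n), d])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
e = y - X @ beta
h = np.sum((X @ XtX_inv) * X, axis=1)               # leverage values
meat = X.T @ (X * (e**2 / (1 - h))[:, None])        # HC2 weighting
hc2_se = np.sqrt((XtX_inv @ meat @ XtX_inv)[1, 1])

print(neyman_se, hc2_se)  # identical up to floating point error
```

In more complicated designs the robust standard error is conservative rather than exact, but the dummy-regressor case makes the equivalence transparent.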

Now, some of you might know of Rosenbaum’s work (e.g. this) and think this has all already been said. That’s true, to a point. But whereas Rosenbaum’s randomization inference makes use of permutation distributions for making probabilistic statements about specific causal hypotheses, Abadie et al.’s randomization inference allows one to approximate the randomization distribution of effect estimates without fixing causal hypotheses a priori. (See more on this point in this old blog post, especially in the comments: link).
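For contrast, here is what Rosenbaum-style randomization inference looks like in its simplest form (a minimal sketch of my own): a permutation test of the sharp null that treatment has no effect on any unit, under which all potential outcomes are known and only the treatment labels re-randomize.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
d = np.zeros(n, dtype=bool)
d[rng.choice(n, n // 2, replace=False)] = True
y = rng.normal(size=n) + 0.8 * d       # simulated outcomes with a real effect

obs = y[d].mean() - y[~d].mean()       # observed difference in means

# Under the sharp null Y_i(1) = Y_i(0), outcomes are fixed constants,
# so the null distribution comes from re-randomizing the labels alone.
null_dist = []
for _ in range(5000):
    dd = np.zeros(n, dtype=bool)
    dd[rng.choice(n, n // 2, replace=False)] = True
    null_dist.append(y[dd].mean() - y[~dd].mean())

p_value = np.mean(np.abs(null_dist) >= abs(obs))
print(p_value)
```

Note what is fixed here: the sharp null pins down every counterfactual outcome, which is what lets the permutation distribution be computed exactly. Abadie et al.’s approach instead approximates the randomization distribution of the estimator itself, without committing to a specific causal hypothesis up front.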