From Wikipedia:
Idolatry is a pejorative term for the worship of an idol or a physical object such as a cult image as a god, or practices believed to verge on worship, such as giving honour and regard to created forms…. In current context, however, idolatry is not limited to religious concepts. It can also refer to a social phenomenon where false perceptions are created and worshipped….
Recently I reviewed a paper for an academic journal. The paper addressed an interesting subject and was well executed, so I recommended some revisions and invited the author to resubmit once those were done. Other reviewers disagreed, arguing most centrally that the context in which the study was undertaken was highly specific and therefore not “representative,” in which case the empirical results might not be “generalizable.” They recommended rejection.
Even more recently on the blog, I pointed to Meyersson’s newly published paper on the effects of the rule of Islamic parties in Turkish municipalities (post). Meyersson’s most remarkable finding was that opportunities for women seemed to expand substantially under the municipal rule of Islamic parties. I received a few responses, via Twitter and in person, critiquing Meyersson’s findings and suggesting that the constellation of historical, economic, and institutional conditions in Turkey undermines the “generality” of these effects on women’s opportunities.
While I appreciate that academic papers sometimes underplay the scope conditions for their results, I find the obsession with whether an empirical result “generalizes” or whether the empirical context is “representative” to be poorly motivated in many cases. First, there are no research designs or analytical methods that can reliably deliver “representative” or “generalizable” findings. For example, using “representative” data does not guarantee that your results will be representative even of the units in your dataset. (See here for more: [link 1] [link 2] [link 3] [link 4].) To pursue a “representative” estimate is often to chase a mirage.
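The point that “representative” data need not yield a representative estimate can be made concrete with a small simulation. What follows is a minimal sketch, not the method of the linked papers, assuming a hypothetical sample with two equally sized strata, heterogeneous treatment effects, and an OLS regression that controls for stratum. Under those assumptions, the coefficient on treatment is a weighted average of the stratum effects, with weights proportional to the within-stratum variance of treatment, p(1 - p), rather than to each stratum’s share of the sample. All names and numbers are illustrative.

```python
import numpy as np


def build_sample(n_per_stratum=100):
    """Hypothetical two-stratum sample with heterogeneous treatment effects.

    Stratum 0: half of the units are treated, effect = 1.
    Stratum 1: one tenth of the units are treated, effect = 3.
    Outcomes are noise-free so the weighting identity holds exactly.
    """
    strata, treat, effect = [], [], []
    for s, (share, tau) in enumerate([(0.5, 1.0), (0.1, 3.0)]):
        n_treated = int(round(share * n_per_stratum))
        d = np.zeros(n_per_stratum)
        d[:n_treated] = 1.0
        strata.append(np.full(n_per_stratum, s))
        treat.append(d)
        effect.append(np.full(n_per_stratum, tau))
    s = np.concatenate(strata)
    d = np.concatenate(treat)
    tau = np.concatenate(effect)
    baseline = np.where(s == 0, 10.0, 20.0)  # stratum-specific intercepts
    y = baseline + tau * d
    return s, d, tau, y


def ols_effect(s, d, y):
    """Coefficient on d from OLS of y on an intercept, a stratum dummy, and d."""
    X = np.column_stack([np.ones_like(d), (s == 1).astype(float), d])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[2]


def variance_weighted_effect(s, d, tau):
    """Average of unit effects weighted by the within-stratum Var(D) = p(1 - p)."""
    weights = np.empty_like(d)
    for stratum in np.unique(s):
        mask = s == stratum
        p = d[mask].mean()
        weights[mask] = p * (1 - p)
    return np.sum(weights * tau) / np.sum(weights)


s, d, tau, y = build_sample()
print("sample average effect:", tau.mean())                               # 2.0
print("OLS coefficient:", round(ols_effect(s, d, y), 4))                   # 1.5294
print("variance-weighted effect:", round(variance_weighted_effect(s, d, tau), 4))  # 1.5294
```

Each stratum makes up half of the sample, so the sample-average effect is 2.0, yet the regression reports about 1.53 because the stratum where treatment is rare contributes little treatment variance and therefore gets little weight. The sample describes itself perfectly, but the estimate does not describe the sample.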
Second, working with “non-representative” groups may provide more theoretical traction. If existing theory suggests that effects should go one way for a particular group of units but you find that they go the other way, that is exactly the kind of anomaly that allows theoretical elaboration to advance.
Third, it is often unclear what would count as a “representative” or “general” case. Was the skepticism toward Meyersson’s paper coming from some implicit comparison to, say, Saudi Arabia? If so, why on earth should we take findings for Saudi Arabia to be “general” and dismiss those from Turkey as “idiosyncratic”? The fact that effects vary by context is interesting and worth understanding.
If the objective is to learn about such heterogeneity across contexts or, the other side of the coin, to demonstrate stability across contexts, then one should conduct studies that seek unusual contextual conditions!
Cyrus, I think you are right on the money here concerning generalizability. However, many concerns about external validity are driven by the trend toward papers of the style “Pick a topic, search for data in a specific instance where identification is clear-cut, and then either make broad claims or completely punt on the underlying mechanism”. This style of social science became widespread over the past decade (it is easy to find these papers, since they almost always have the “: Evidence From” title!), and I think it really is damaging for the profession, since it represents the ultimate look-under-the-lamppost thinking even when we know we can do better at answering exact questions of interest using theory or alternative (less well-identified) data.
That said, I think the Turkey paper looks fine – the impact on Turkey’s women is in and of itself interesting! And some discussion of the underlying mechanism whereby we see this result in Turkey is interesting as well. You are right that generalizability concerns don’t seem serious here.
Thanks, Kevin. I think the “keys, lamppost” metaphor is too imprecise, however. If a treatment of theoretical interest is applied in a rarefied setting, the findings are still enormously useful. Moreover, there really aren’t alternatives to the “evidence from…” studies in terms of empirical opportunities. To believe that there are is to fall for the illusion that we have tried to dispel in the paper linked above. That being the case, the question is how to make the most of these empirical opportunities. From my vantage point, using the most reliable empirical methods is of utmost importance. I imagine that from your vantage point, giving adequate attention to theoretical questions is of utmost importance. The two goals can be compatible.
Thank you for writing this. Sometimes people like my husband think that all statisticians are trying to do is draw broad generalizations. It’s good to point him to the fact that we are as motivated by micro-level, heterogeneous studies as people in the humanities are; we are just using a different methodology.
Thanks, Cyrus, for this useful post. I enjoyed reading it. I also feel that the different types of generalization and generalizability need to be considered. Statistical generalizability, the form most people know, is not the only kind. Here are the common types of generalizability: http://publication2application.org/2014/03/24/generalization-and-generalizability/