“Best evidence synthesis” as a better alternative to meta-analysis

For those involved in evidence review or meta-analysis projects, I highly recommend a few (rather old, but still relevant) articles by Robert Slavin:

  • Slavin, R.E. (1984). Meta-analysis in education: How has it been used? Educational Researcher, 13(8), 6-15. [gated link]
  • Slavin, R.E. (1986). Best-evidence synthesis: An alternative to meta-analytic and traditional reviews. Educational Researcher, 15(9), 5-11. [gated link]
  • Slavin, R.E. (1995). Best evidence synthesis: An intelligent alternative to meta-analysis. Journal of Clinical Epidemiology, 48(1), 9-18. [gated link]

Slavin, who I take to be a strong proponent of good quantitative social research, makes some great points about how meta-analysis has been misused in attempts to synthesize evidence on social programs. By my reading, what Slavin emphasizes is the fact that in very few cases will we have enough high-quality and truly comparable studies for meta-analytic methods to be applied in a way that makes sense. That being the case, what we need is a scientifically defensible compromise between the typically unattainable ideal of a meta-analysis on the one hand, and narrative reviews that have too little in the way of structure or replicability on the other.

Unfortunately, proponents of meta-analysis often suggest (see the literature that he cites) that only through the use of meta-analytic methods for effect aggregation and heterogeneity analysis can a literature review be considered “scientific.” Those doing reviews are then faced with either producing a review that will be deemed “unscientific” or applying meta-analytic methods in situations where they shouldn’t be applied. Because of this pressure, we end up with reviews that compromise on either study-quality standards or comparability standards so as to obtain a set of studies large enough to meet the sample-size needs of a meta-analysis! These compromises are masked by the use of generic effect-size metrics. This is the ultimate in the tail wagging the dog, and the result is a lot of crappy evidence synthesis (see the studies that he reviews in the 1984 article for examples). I’m inclined to view some attempts at applying meta-analysis to development interventions (including some of my own!) in this light. See also the recent CBO review of minimum wage laws.

Slavin’s alternative is a compromise approach that replaces rote subservience to meta-analysis requirements with scientific judgment exercised in relation to a clear policy question. He recommends retaining the rigorous and replicable literature-search methods (including search strategies that attend to potential publication biases) that make up the first stage of a meta-analysis, while being stringent about standards of study quality (internal validity) and relevance of the treatments (ecological validity), and gathering ample evidence to assess the applicability of results to the target context (external validity). The nature of the question will determine what counts as the “best evidence,” and the review should focus on that best evidence. From there, the method of synthesis and the exploration of heterogeneity will depend on the amount of evidence available. It may be that only one or a few studies meet the stringent selection criteria, in which case the review should scrutinize those studies with reference to the policy questions at hand. In the very rare circumstance that many studies are available, or where micro data capturing substantial forms of heterogeneity are available, statistical analyses may be introduced, but in a manner that is focused on addressing the policy questions (and not a cookbook application of homogeneity tests, random effects estimates, and so on). As Slavin writes, “[no] rigid formula for presenting best evidence synthesis can be prescribed, as formats must be adapted to the literature being reviewed” (1986, p. 9).
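
To make concrete the kind of “cookbook” machinery at issue, here is a minimal sketch in Python of the standard DerSimonian-Laird random-effects pooling and the Q and I² heterogeneity statistics. The effect sizes and variances are invented for illustration and are not drawn from any of the reviews discussed; the point is not that these computations are wrong, but that running them over a handful of dissimilar studies substitutes formula for judgment.

```python
import numpy as np

# Hypothetical per-study effect sizes (e.g., standardized mean differences)
# and their sampling variances; these numbers are purely illustrative.
y = np.array([0.12, 0.45, -0.05, 0.30, 0.22])
v = np.array([0.02, 0.05, 0.03, 0.04, 0.06])
k = len(y)

# Fixed-effect weights and pooled estimate
w = 1.0 / v
mu_fe = np.sum(w * y) / np.sum(w)

# Cochran's Q and I^2 heterogeneity statistics
Q = np.sum(w * (y - mu_fe) ** 2)
df = k - 1
I2 = max(0.0, (Q - df) / Q) * 100 if Q > 0 else 0.0

# DerSimonian-Laird estimate of between-study variance tau^2
C = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - df) / C)

# Random-effects weights, pooled estimate, and standard error
w_re = 1.0 / (v + tau2)
mu_re = np.sum(w_re * y) / np.sum(w_re)
se_re = np.sqrt(1.0 / np.sum(w_re))

print(f"Q = {Q:.2f} on {df} df, I^2 = {I2:.0f}%, tau^2 = {tau2:.3f}")
print(f"Random-effects pooled effect = {mu_re:.3f} (SE {se_re:.3f})")
```

With only five loosely comparable (made-up) studies, the pooled estimate and its standard error come out looking reassuringly precise; the worry is that this precision reflects the generic effect-size metric rather than genuinely comparable evidence.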

For most policy questions in development, we face a paucity of high-quality studies and limited comparability between the studies we do have. Abandoning the meta-analysis model in favor of Slavin’s more realistic but no less principled approach seems like the right compromise.


What to read to understand Bayesian statistics?

Easy: Sander Greenland (link). I had read and worked with Bayesian methods for some time before encountering his work. (One of my dissertation committee members was kinda well-known in Bayesian methods: link.) As I saw it, the way Bayesian methods were developed in political science, my field of application, was mostly about building big latent variable models and then adopting a Bayesian approach largely for reasons of computational practicality—that is, simulation methods were often easier than trying to characterize the likelihood, and so one just threw some non-informative priors at the problem to get the machinery going and then let JAGS or BUGS rip…. But my research didn’t really benefit from latent variable modeling. I also came to interpret the retreat to simulation as a case of people trying to fit models they didn’t understand, which struck me as suspect.* Bayesian methods, I concluded, simply weren’t for me.

Then I read this: PDF Gated. And this: PDF Gated. (The latter builds on Efron and Morris’s seminal work on Stein estimation: PDF1 PDF2). The papers are simple and accessible, but no less deep for that. Greenland performs a wonderful synthesis of various hierarchical, empirical Bayes, and “informative priors” Bayes methods. After reading these I eagerly consumed pretty much anything he wrote and I recommend that you do the same.
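
For readers who want a concrete anchor for the shrinkage idea running through the Efron-Morris and Greenland papers, here is a minimal sketch of a James-Stein-style empirical Bayes estimator that pulls a set of noisy group estimates toward their grand mean. The numbers and the equal, known sampling variance are simplifying assumptions made purely for illustration, not anything taken from those papers.

```python
import numpy as np

# Hypothetical observed group means (e.g., site-level treatment effects)
# with a common, known sampling variance; purely illustrative numbers.
y = np.array([0.28, 0.10, -0.05, 0.40, 0.15, 0.02, 0.33])
sigma2 = 0.04          # assumed known sampling variance of each estimate
k = len(y)

grand_mean = y.mean()
S = np.sum((y - grand_mean) ** 2)

# Positive-part James-Stein shrinkage factor toward the grand mean.
# When the group means are similar (S small), shrinkage is heavy;
# when they are well separated, the raw estimates are left nearly alone.
shrink = max(0.0, 1.0 - (k - 3) * sigma2 / S)

eb = grand_mean + shrink * (y - grand_mean)

print("shrinkage factor:  ", round(shrink, 3))
print("raw estimates:     ", np.round(y, 3))
print("shrunken estimates:", np.round(eb, 3))
```

The same logic, carried over to hierarchical regression and informative priors, is roughly where the “empirical Bayes” and “informative priors” strands that Greenland synthesizes meet.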

Also worth reading is Efron and Greenland’s recent Stat. Sci. exchange (as well as others’ comments, including Gelman’s): link.

*This isn’t to impugn all Bayesian latent variable modeling of course! Heck, I am working with some text analysis methods now that benefit greatly from Bayesian latent variable methods, or at least variational approximations.


Northeast Political Methodology annual meeting at NYU, May 2

Here are the details:

Date: Friday, May 2, 11:30am-6:30pm.

Location: Room 217 of the Politics Department (19 W. 4th Street – corner of W. 4th Street and Mercer Street).

Registration: link. (Note that registration will close April 28.)

Agenda

11:30 – 12:30: Lunch

12:30 – 1:30: Bruce Desmarais, Department of Political Science, U-Mass Amherst: “Communication Network Content and Structure: A Modeling Approach with Application to Gender Mixing in Local Government Internal E-Mail Communication”

1:45 – 2:45: Elizabeth Ogburn, Department of Biostatistics, Johns Hopkins University: “Causal and Statistical Inference for Network Dependent Data”

3:00 – 4:00: Erin Hartman, Bluelabs: TBA

4:15 – 5:15: Brian Keegan, Northeastern University: “Get Back! You Don’t Know Me Like That: The Social Mediation of Fact Checking Interventions in Twitter Conversations”

5:30 – 6:30: Post Paper Discussion Event

6:45: Dinner for Speakers and Invited Faculty Guests


How to judge a theoretical model

Theoretical models are simplified approximations of the world, or more specifically thought experiments predicated on assumptions that try to approximate first-order aspects of behavior. As an empiricist I cannot knock theorists for using approximations; heck, I do it all the time too (see: asymptotics, or the delta method). But we should ask whether the approximations are reasonable against what we see in the real world. That is,

  1. If verisimilitude is not a criterion for assumptions, any result can be reverse engineered by picking the assumptions that deliver the result.

  2. If any result can be engineered then results themselves have no special ontological status.

  3. Exploring the implications of assumptions for their own sake can be technically demanding, but technical difficulty does not make the exercise any more credible as a map of reality. This practice generates “bookshelf” models whose practical utility depends on filtering the assumptions and implications against data and our beliefs about the real world. Without filtering we are building a Tower of Babel (or maybe an art museum). (Note that this goes beyond Friedman’s famous arguments about predictive validity without regard for assumptions: we need to filter assumptions too, because by 1 and 2 the implications of a model are not unique to one set of assumptions.)

  4. How complicated can the problems be that we allow our agents to solve in a model? Is a dynamic program ever admissible as a reasonable assumption on the objective function of an agent? That depends on the situation. If the goal that the agent is seeking is sufficiently clear (albeit complicated to achieve) and the agent has ample opportunity to experiment and hit upon something that works well, it may be reasonable to assume that the agent’s actions will converge in a way that makes it appear as if the agent were solving such a program (a toy simulation of this logic follows below). The validity of the “as if” assumption should be vetted in this way, though.
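
As a toy illustration of point 4, here is a minimal sketch (hypothetical payoffs, and a static stand-in for a genuinely dynamic problem) of an agent that never writes down an optimization problem but, given ample opportunity to experiment, ends up choosing the best available action almost all of the time, i.e., behaving “as if” it had solved one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expected payoffs of three actions; the agent does not know them.
true_payoffs = np.array([0.2, 0.5, 0.8])
n_actions = len(true_payoffs)

estimates = np.zeros(n_actions)   # running average payoff per action
counts = np.zeros(n_actions)
epsilon = 0.1                     # probability of experimenting at random
choices = []

for t in range(5000):
    if rng.random() < epsilon:
        a = int(rng.integers(n_actions))       # experiment
    else:
        a = int(np.argmax(estimates))          # exploit current best guess
    reward = true_payoffs[a] + rng.normal(0, 0.5)  # noisy payoff
    counts[a] += 1
    estimates[a] += (reward - estimates[a]) / counts[a]
    choices.append(a)

late_choices = np.array(choices[-1000:])
print("share of late choices on the best action:",
      np.mean(late_choices == np.argmax(true_payoffs)))
```

Whether a full dynamic program is a reasonable “as if” description of a real agent then turns on whether the goal is this legible and the opportunities to experiment this plentiful.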

All of the above is drawn from an essay by Paul Pfleiderer on “chameleon” models and the misuse of theory in economics: link. (HT @noahpinion)


Why do countries work so hard to *lose* their access to World Bank loans?

[Figure: World Bank income classification scheme, 2000]

In a new working paper posted to SSRN, Peter Aronow, Allison Carnegie, and I propose an answer to the puzzle: by doing so, countries can reap “status” gains that outweigh the material costs of losing access to loans [SSRN link].

The World Bank loans program is an interesting setting for analyzing how international pressures affect the behavior of governments. This is because the terms of loans and other sorts of support that the World Bank offers depend on where a country falls along a fixed schedule of income classifications. The figure above illustrates the classifications as they were applied in the year 2000. Our paper focuses on what happens when countries cross the threshold shown at $5225 GNI/capita in the figure. Here, countries become eligible for “graduation,” which, when achieved, means that they can no longer receive the type of generosity that the Bank provides to middle-income and lower-income countries. We reviewed the case histories of countries crossing this threshold and found, quite curiously, that countries seem always to welcome this offer to graduate. The case histories provide no evidence of a tendency for countries to try to avoid or stall graduation even though graduation means losing access to benefits.

Moreover, the clear-cut nature of the classification rules allows us to use a regression discontinuity design to study just how countries react to crossing the threshold—that is, to obtain a very credible estimate of the causal effect of becoming eligible to graduate. We find, remarkably, that countries tend to react by liberalizing. This is not what we expected: we expected to see countries react in a manner indicative of becoming extra sensitive to risks, perhaps even by reining in liberties.
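
For readers unfamiliar with the design, here is a minimal sketch of the regression discontinuity logic on simulated data: fit a local linear regression on each side of the eligibility cutoff and read the jump at the cutoff as the effect of crossing it. The cutoff value echoes the figure, but the outcome, bandwidth, and data are invented for illustration and are not the specification from our paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: running variable (GNI per capita), an eligibility cutoff,
# and an outcome (say, a liberalization index). All values are invented.
cutoff = 5225.0
gni = rng.uniform(3000, 8000, size=500)
eligible = (gni >= cutoff).astype(float)
outcome = 0.0005 * gni + 1.5 * eligible + rng.normal(0, 1.0, size=500)

# Local linear RD: keep observations within a bandwidth of the cutoff and
# fit intercept + centered running variable, allowing the slope to differ by side.
bandwidth = 1000.0
x = gni - cutoff
keep = np.abs(x) <= bandwidth
X = np.column_stack([
    np.ones(keep.sum()),      # intercept
    eligible[keep],           # jump at the cutoff (the RD estimate)
    x[keep],                  # slope below the cutoff
    x[keep] * eligible[keep], # slope change above the cutoff
])
beta, *_ = np.linalg.lstsq(X, outcome[keep], rcond=None)

print("estimated jump at the cutoff:", round(beta[1], 3))  # close to the true 1.5
```

The identifying idea is simply that countries just below and just above the threshold should be comparable, so a discontinuity in outcomes at the threshold can be attributed to becoming eligible to graduate.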

We investigated various possible explanations for this seemingly puzzling behavior, and the one that seems best supported by the evidence is that governments view graduation as an opportunity to increase their institutionally conferred status and join the “club” of developed nations. The liberalization that we witness is part of that exchange, given the hegemony of liberal western governments in defining terms of “success” within the World Bank.
