“Best evidence synthesis” as a better alternative to meta-analysis

For those involved in evidence review or meta-analysis projects, I highly recommend a few (rather old, but still relevant) articles by Robert Slavin:

Slavin, R.E. (1984). Meta-Analysis in Education: How Has It Been Used? Educational Researcher 13(8):6-15. [gated link]
Slavin, R.E. (1986). Best-Evidence Synthesis: An alternative to meta-analytic and traditional reviews. Educational Researcher 15(9):5-11. [gated link]
Slavin, R.E. (1995). Best evidence synthesis: An intelligent alternative to meta-analysis. Journal of Clinical Epidemiology 48(1):9-18. [gated link]

Slavin, whom I take to be a strong proponent of good quantitative social research, makes some great points about how meta-analysis has been misused in attempts to synthesize evidence on social programs. By my reading, what Slavin emphasizes is the fact that in very few cases will we have enough high-quality and truly comparable studies for meta-analytic methods to be applied in a way that makes sense. That being the case, what we need is a scientifically defensible compromise between the typically unattainable ideal of a meta-analysis on the one hand, and narrative reviews that have too little in the way of structure or replicability on the other.

Unfortunately, proponents of meta-analysis often suggest (see the literature that he cites) that a literature review can be considered “scientific” only if it uses meta-analytic methods for effect aggregation and heterogeneity analysis. Those doing reviews are then faced with either doing a review that will be deemed “unscientific” or trying to apply meta-analytic methods in situations where they shouldn’t be applied. Because of this pressure, we end up with reviews that compromise either on study quality standards or comparability standards so as to obtain a large enough set of studies to fulfill the sample size needs for conducting a meta-analysis! These compromises are masked by the use of generic effect size metrics. This is the ultimate in the tail wagging the dog, and the result is a lot of crappy evidence synthesis (see the studies that he reviews in the 1984 article for examples). I’m inclined to view some attempts at applying meta-analysis to development interventions (including some of my own!) in this light. See also the recent CBO review of minimum wage laws.

Slavin’s alternative is a compromise approach that replaces rote subservience to meta-analysis requirements with scientific judgment in relation to a clear policy question. He recommends retaining the rigorous and replicable methods for searching the literature (including devising search strategies that attend to potential publication biases) that are the first stage of a meta-analysis, while applying stringent standards for study quality (internal validity) and relevance of the treatments (ecological validity), and gathering ample evidence to assess the applicability of results to the target context (external validity). The nature of the question will determine what counts as the “best evidence,” and the review should focus on such best evidence. From here, the method of synthesis and exploration of heterogeneity will depend on the amount of evidence available. It may be that only one or a few studies meet the stringent selection criteria, in which case the review should scrutinize the studies with reference to the policy questions at hand. In the very rare circumstance that many studies are available, or in cases where micro data that capture substantial forms of heterogeneity are available, statistical analyses may be introduced, but in a manner that is focused on addressing the policy questions (and not a cookbook application of homogeneity tests, random effects estimates, and so on). As Slavin writes, “[no] rigid formula for presenting best evidence synthesis can be prescribed, as formats must be adapted to the literature being reviewed” (1986, p. 9).
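To make the workflow above concrete, here is a minimal Python sketch. It is my own illustration and not a procedure taken from Slavin's articles: the `Study` fields, the two boolean screening flags, the `min_for_pooling` threshold, and the choice of a simple inverse-variance (fixed-effect) average are all assumptions. In a real review the screening criteria and the form of any statistical synthesis would follow from the policy question.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Study:
    name: str
    effect: float             # effect estimate in a common metric (assumed comparable)
    std_error: float          # standard error of that estimate
    high_quality: bool        # hypothetical flag: passes the internal validity screen
    relevant_treatment: bool  # hypothetical flag: passes the ecological validity screen


def select_best_evidence(studies):
    """Keep only studies that pass both (hypothetical) screening criteria."""
    return [s for s in studies if s.high_quality and s.relevant_treatment]


def pooled_fixed_effect(studies):
    """Inverse-variance weighted average and its standard error."""
    weights = [1.0 / s.std_error ** 2 for s in studies]
    total = sum(weights)
    estimate = sum(w * s.effect for w, s in zip(weights, studies)) / total
    return estimate, sqrt(1.0 / total)


def synthesize(studies, min_for_pooling=5):
    """Choose the mode of synthesis from the amount of best evidence.

    min_for_pooling is an arbitrary illustrative threshold, not a rule.
    """
    best = select_best_evidence(studies)
    names = [s.name for s in best]
    if len(best) < min_for_pooling:
        # Too few studies: flag them for close, study-by-study scrutiny.
        return {"mode": "narrative", "studies": names}
    estimate, std_error = pooled_fixed_effect(best)
    return {"mode": "statistical", "studies": names,
            "estimate": estimate, "std_error": std_error}
```

The point of the sketch is the ordering: the quality and relevance screens come first and are not relaxed to reach the pooling threshold, which is the reverse of loosening standards in order to get a large enough sample.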

For most policy questions in development, we face a paucity of high-quality studies and limited comparability between such studies. Abandoning the meta-analysis model in favor of Slavin’s more realistic but no less principled approach seems like the right compromise.

3 Replies to ““Best evidence synthesis” as a better alternative to meta-analysis”

Slavin faced an environment where much (most?) of the research was of dubious quality.

The solution? Only synthesize the best-quality research.

Yet, doing so means that research not selected for a review has zero weight in the conclusions of the synthesis. Moreover, it is not necessarily the case that lower quality research will reach worse conclusions about a phenomenon. It is entirely possible that study quality has no bearing on the results obtained for a given phenomenon. (Of course, if one is trying to determine whether there is a causal impact of some variable on another, then randomized-controlled trials [RCTs] are usually the best option.)

The problem is that so little research actually examines when methodological quality matters to the results that are obtained. I blog about the problem here and describe an article I’ve written with colleagues that addresses the issue directly (see this link). In fact, we recommend against a pure best-evidence strategy, arguing that scholars should make empirical comparisons of the results of controlled and uncontrolled trials. Sure, RCTs are helpful for inferences of whether a treatment is causally related to some problem, but in gauging what factors are related to heterogeneity in trial outcomes, we must often look more broadly. Non-RCTs might be employed in many cases for practical reasons, and provide valuable information about what treatments work and, perhaps more importantly, how much they work. Finally, RCTs might be RCTs, but that does not necessarily mean that they are superior along other dimensions. Randomization might permit stronger causal inferences, yet other flaws can overwhelm this advantage, meaning that carefully conducted non-randomized trials may be more informative than flawed RCTs.
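One simple way to make such an empirical comparison is a subgroup contrast: pool the effects separately for controlled and uncontrolled trials, then test whether the two pooled estimates differ. The Python sketch below is only an illustration under assumptions of my own (inverse-variance pooling within each group, independent studies, a normal approximation for the test); it is not the method of the article mentioned above, and the function names are hypothetical.

```python
from math import erf, sqrt


def pool(effects, std_errors):
    """Inverse-variance weighted mean of a group and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total
    return estimate, sqrt(1.0 / total)


def compare_designs(controlled, uncontrolled):
    """Contrast pooled effects of two design groups.

    Each argument is a (effects, std_errors) pair of equal-length lists.
    Returns the difference (uncontrolled minus controlled), its standard
    error, the z statistic, and a two-sided p-value (normal approximation).
    """
    est_c, se_c = pool(*controlled)
    est_u, se_u = pool(*uncontrolled)
    diff = est_u - est_c
    se_diff = sqrt(se_c ** 2 + se_u ** 2)
    z = diff / se_diff
    # Standard normal CDF written with the error function.
    cdf = 0.5 * (1.0 + erf(abs(z) / sqrt(2.0)))
    p_value = 2.0 * (1.0 - cdf)
    return diff, se_diff, z, p_value
```

A small, imprecise difference would suggest that design quality matters little for this outcome, while a large one would argue for weighting or separating the designs rather than discarding the weaker group unexamined.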

Since Slavin’s work, two other approaches for synthesis of different types of evidence have been developed, and both have been producing credible and useful summaries of evidence. Realist synthesis focuses on understanding what works for whom in what circumstances, identifying the causal mechanisms that work in particular contexts, which makes it particularly useful for developing knowledge that can be translated to new settings. Best evidence synthesis, as developed by the New Zealand government, builds more on realist synthesis than on Slavin’s approach – and adds a consultative process to both test the analysis and findings and explore ways to apply them. You can read more about these on the BetterEvaluation page on methods for synthesising evidence across evaluations http://betterevaluation.org/plan/synthesize_value/synthesize_across_evaluations.

Interesting that the same term is being used by both. I suppose this speaks to commonalities in aspirations (“best evidence…”) despite the differences in approaches.
