Suppose we have perfectly executed and perfectly consistent, balanced randomized control trials for a binary treatment applied to populations 1 and 2. Suppose that even the sample sizes are the same in each trial ($n_1 = n_2 = n$). We obtain consistent treatment effect estimates $\hat{\tau}_1$ and $\hat{\tau}_2$ from each, respectively, with consistent estimates of the asymptotic variances of $\hat{\tau}_1$ and $\hat{\tau}_2$ computed as $\hat{v}_1$ and $\hat{v}_2$, respectively. As far as asymptotic inference goes, suppose we are safe to assume that $\sqrt{n}(\hat{\tau}_1 - \tau_1) \overset{d}{\rightarrow} N(0, V_1)$ and $\sqrt{n}(\hat{\tau}_2 - \tau_2) \overset{d}{\rightarrow} N(0, V_2)$, with $n\hat{v}_1 \overset{p}{\rightarrow} V_1$ and $n\hat{v}_2 \overset{p}{\rightarrow} V_2$.* (This is pretty standard notation, where $\overset{d}{\rightarrow}$ is convergence in distribution, and $\overset{p}{\rightarrow}$ is convergence in probability, under the sample sizes for each experiment growing large.) Even with the same sample sizes in both populations, we may have that $V_1 > V_2$, because outcomes are simply noisier in population 1. Suppose this is the case.

A standard meta-analytical effect synthesis will compute a synthesized effect $\hat{\tau}$ by taking a weighted average where the weights are functions, either in part or in their totality, of the inverses of the estimated variances. That is, weights will be close or equal to $1/\hat{v}_1$ and $1/\hat{v}_2$. Of course, if $\tau_1 = \tau_2 = \tau$, then this inverse variance weighted mean is the asymptotic variance-minimizing estimator for $\tau$. This is the classic minimum distance estimation result. The canonical econometrics reference for the optimality of the inverse variance weighted estimator for general problems is Hansen (1982), although it is covered in any graduate econometrics textbook.
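As a minimal sketch of the synthesis just described, the snippet below computes an inverse variance weighted mean from a list of estimates and their estimated variances. The function name and the numbers are hypothetical and purely illustrative, and the pooled variance formula assumes independent estimates that share a common effect.

```python
def inverse_variance_weighted_mean(estimates, variances):
    """Return the inverse variance weighted mean and its estimated variance.

    Assumes independent estimates of a common effect (an assumption of
    this sketch, not something the data can guarantee).
    """
    if len(estimates) != len(variances):
        raise ValueError("estimates and variances must have the same length")
    # Each weight is the inverse of the estimated variance.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * t for w, t in zip(weights, estimates)) / total_weight
    # Under the common-effect assumption, the pooled variance is 1 / sum of weights.
    pooled_variance = 1.0 / total_weight
    return pooled, pooled_variance


# Hypothetical inputs: population 1 is noisier, so its estimate gets less weight.
tau_hat, v_hat = inverse_variance_weighted_mean([0.10, 0.30], [0.04, 0.01])
print(tau_hat, v_hat)
```

With these made-up inputs the weights are 25 and 100, so the pooled estimate sits four times closer to the second estimate than to the first.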

But what if there is no reason to assume $\tau_1 = \tau_2$? Then, how should we interpret the inverse variance weighted mean, which for finite samples would tend to give more weight to $\hat{\tau}_2$? Perhaps one could interpret it in Bayesian terms. From a frequentist perspective though, which would try to relate this to stable population parameters, it seems to be interpretable only as “a good estimate of what you get when you compute the inverse variance weighted mean from the results of these two experiments,” which of course gets us nowhere.
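A short worked calculation may make the problem concrete. Write $\hat{\tau}_j$ for the estimate from population $j$ and $\hat{v}_j$ for its estimated variance, and assume $\hat{\tau}_j \overset{p}{\rightarrow} \tau_j$ and $n\hat{v}_j \overset{p}{\rightarrow} V_j$ with equal sample sizes $n$ (these symbols are my shorthand for the setup above). Then, as a sketch under those assumptions, the inverse variance weighted mean has the probability limit

$$
\hat{\tau}
= \frac{\hat{v}_1^{-1}\hat{\tau}_1 + \hat{v}_2^{-1}\hat{\tau}_2}{\hat{v}_1^{-1} + \hat{v}_2^{-1}}
= \frac{(n\hat{v}_1)^{-1}\hat{\tau}_1 + (n\hat{v}_2)^{-1}\hat{\tau}_2}{(n\hat{v}_1)^{-1} + (n\hat{v}_2)^{-1}}
\overset{p}{\rightarrow}
\frac{V_2\,\tau_1 + V_1\,\tau_2}{V_1 + V_2}.
$$

So the limit is a mix of $\tau_1$ and $\tau_2$ whose weights are set by how noisy outcomes are in each population, not by anything like the populations' sizes or their relevance to a target population.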

Now, I know that meta-analysis textbooks talk about how, when it doesn’t make sense to assume $\tau_1 = \tau_2$, one should seek to explain the heterogeneity rather than produce synthesized effects. But the standard approaches for doing so rely on assumptions of conditional exchangeability, that is, replacing $\tau_1 = \tau_2$ with $\tau_1(x) = \tau_2(x) = \tau(x)$, where these are effects for subpopulations defined by a covariate profile $x$. Then, we effectively apply the same minimum distance estimation logic, using inverse variance weighting to compute the $\hat{\tau}(x)$, most typically with an inverse variance weighted linear regression on the components of $x$. The modeling assumptions are barely any weaker than what one assumes to produce the synthesized estimate. So does this really make any sense either?
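For readers who want to see the mechanics, here is a rough sketch of an inverse variance weighted linear regression of study-level estimates on a covariate, written with NumPy. The function name, the single covariate, and all numbers are hypothetical, and the sketch assumes the linear specification is correct, which is exactly the assumption in question.

```python
import numpy as np


def inverse_variance_weighted_regression(tau_hats, v_hats, covariates):
    """Weighted least squares of effect estimates on covariates.

    Weights are 1 / v_hat. Returns coefficients, intercept first.
    Illustrative only: assumes the linear model for the effects holds.
    """
    tau_hats = np.asarray(tau_hats, dtype=float)
    weights = 1.0 / np.asarray(v_hats, dtype=float)
    # Design matrix: a column of ones for the intercept, then the covariates.
    design = np.column_stack(
        [np.ones(len(tau_hats)), np.asarray(covariates, dtype=float)]
    )
    # Solve the weighted normal equations (X'WX) b = X'W tau.
    xtw = design.T * weights
    return np.linalg.solve(xtw @ design, xtw @ tau_hats)


# Hypothetical data: four studies whose effects are exactly linear in x.
x = [0.0, 1.0, 2.0, 3.0]
tau_hats = [0.1, 0.2, 0.3, 0.4]
v_hats = [0.04, 0.01, 0.02, 0.03]
beta = inverse_variance_weighted_regression(tau_hats, v_hats, x)
print(beta)
```

When the effects really are linear in the covariate, as in this made-up data, the weights do not change the fitted line. When they are not, the weights determine which studies the fitted line tracks most closely, and that is where the interpretive problem comes back.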

It seems pretty clear to me that the meta-analysis literature is in need of a “credibility revolution” along the same lines as we’ve seen in the broader causal inference literature. That means (i) thinking harder about the *estimands* that are the focus of the analysis, (ii) entertaining an assumption of *rampant effect heterogeneity*, and (iii) understanding the properties and robustness of estimators under (likely) *misspecification* of the relationship between variables that characterize the populations we study (the $x_j$’s for populations indexed by $j$) and the estimates we obtain from them (the $\hat{\tau}_j$’s).

*Edited based on Winston’s corrections!

Winston: Cyrus, I’m guessing you know some or all of these references, but in case any of them help give moral support:

Freedman and Berk, “Statistical assumptions as empirical commitments” (“Just say no” to meta-analysis.)

Freedman, “Oasis or mirage?” (Annotated references point to several critiques of meta-analysis.)

Briggs, “Meta-analysis: A case study” (“Researcher subjectivity is no less problematic in the context of a meta-analysis than in a narrative review.”)

Berk, “Statistical inference and meta-analysis” (with discussion), J Exp Criminol (2007) 3:247-297. (Comments by Lipsey and Shadish are thoughtful and witty, but I’m more persuaded by Berk’s rejoinder: “My take is that by and large, the meta-analyses used to inform social science and public policy do not deliver on their claims. On the matter of whether they are good enough, my view is that they typically are not even close. I would be delighted to be proved wrong.”)

Minor point: on the asymptotics, I think you meant to write something like suppose we’re safe to assume sqrt(n) (tau1hat – tau) converges in distribution to N(0, V1) with n times v1hat converging in probability to V1, etc.

Cyrus (post author): Great references, Winston.

And thanks for catching the rookie mistake with the asymptotics! I’ve edited those in the post.

Winston: Just remembered this NYT article from 5 years ago:

“Dr. Bruce P. Barrett, an associate professor of family medicine at the University of Wisconsin who was not involved with the review, said he was not convinced of the value of combining the studies in a single analysis.

“‘If you’re testing the same intervention on the same population using the same outcome measures, then meta-analysis is a very good technique,’ Dr. Barrett said. ‘But here every one of those things fails.’”