Suppose we have perfectly executed and perfectly consistent, balanced randomized control trials for a binary treatment applied to populations 1 and 2. Suppose that even the sample sizes are the same in each trial ($n_1 = n_2 = n$). We obtain consistent treatment effect estimates $\hat\tau_1$ and $\hat\tau_2$ from each, respectively, with consistent estimates of the asymptotic variances of $\hat\tau_1$ and $\hat\tau_2$ computed as $\hat v_1/n$ and $\hat v_2/n$, respectively. As far as asymptotic inference goes, suppose we are safe to assume that $\sqrt{n}(\hat\tau_1 - \tau_1) \rightarrow_d N(0, v_1)$ and $\sqrt{n}(\hat\tau_2 - \tau_2) \rightarrow_d N(0, v_2)$, with $\hat v_1 \rightarrow_p v_1$ and $\hat v_2 \rightarrow_p v_2$.* (This is pretty standard notation, where $\rightarrow_d$ is convergence in distribution and $\rightarrow_p$ is convergence in probability, under the sample sizes for each experiment growing large.) Even with the same sample sizes in both populations, we may have that $v_1 > v_2$, because outcomes are simply noisier in population 1. Suppose this is the case.
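To fix ideas, here is a minimal simulation sketch of this setup (the sample size, effect sizes, and noise levels are all made up for illustration): with equal $n$ in both trials, the estimated variance of $\hat\tau_1$ still comes out much larger simply because outcomes in population 1 are noisier.

```python
# Illustrative sketch only: two equal-size, balanced RCTs where outcomes are
# noisier in population 1, so the estimated variance of the difference-in-means
# estimate is larger there despite n_1 = n_2. All numbers are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # per-trial sample size (hypothetical)

def run_trial(tau, sd):
    """Balanced binary-treatment RCT: return tau_hat and its Neyman variance estimate."""
    z = rng.permutation(np.repeat([0, 1], n // 2))   # random treatment assignment
    y = tau * z + rng.normal(0.0, sd, n)             # outcomes with noise level sd
    y1, y0 = y[z == 1], y[z == 0]
    tau_hat = y1.mean() - y0.mean()
    var_hat = y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0)
    return tau_hat, var_hat

tau1_hat, v1_hat = run_trial(tau=0.5, sd=4.0)  # population 1: noisier outcomes
tau2_hat, v2_hat = run_trial(tau=0.5, sd=1.0)  # population 2: less noisy
print(tau1_hat, v1_hat)
print(tau2_hat, v2_hat)   # v1_hat >> v2_hat even though both trials have the same n
```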
A standard meta-analytical effect synthesis will compute a synthesized effect by taking a weighted average where the weights are functions, either in part or in their totality, of the inverses of the estimated variances. That is, the weights will be close or equal to $\hat w_1 = \frac{1/\hat v_1}{1/\hat v_1 + 1/\hat v_2}$ and $\hat w_2 = \frac{1/\hat v_2}{1/\hat v_1 + 1/\hat v_2}$. Of course, if $\tau_1 = \tau_2 = \tau$, then this inverse variance weighted mean is the asymptotic variance-minimizing estimator for $\tau$. This is the classic minimum distance estimation result. The canonical econometrics reference for the optimality of the inverse variance weighted estimator for general problems is Hansen (1982) [link], although it is covered in any graduate econometrics textbook.
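Here is a bare-bones sketch of that fixed-effect, inverse variance weighted synthesis. The two estimates and estimated variances are hypothetical placeholders standing in for $(\hat\tau_1, \hat v_1/n)$ and $(\hat\tau_2, \hat v_2/n)$, chosen so that population 1 is much noisier.

```python
# Sketch of a standard inverse variance weighted (fixed-effect) synthesis.
# The estimates and variances below are hypothetical placeholders.
import numpy as np

tau_hat = np.array([0.48, 0.52])    # effect estimates from populations 1 and 2
var_hat = np.array([0.016, 0.001])  # their estimated variances; population 1 is noisier

w = (1.0 / var_hat) / np.sum(1.0 / var_hat)   # inverse variance weights w_1, w_2
tau_synth = np.sum(w * tau_hat)               # synthesized effect
var_synth = 1.0 / np.sum(1.0 / var_hat)       # its estimated variance (valid if tau_1 = tau_2)

print(w)                      # nearly all the weight goes to the less noisy estimate
print(tau_synth, var_synth)
```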
But what if there is no reason to assume $\tau_1 = \tau_2$? Then, how should we interpret the inverse variance weighted mean, which for finite samples would tend to give more weight to $\hat\tau_2$? Perhaps one could interpret it in Bayesian terms. From a frequentist perspective though, which would try to relate this to stable population parameters, it seems to be interpretable only as “a good estimate of what you get when you compute the inverse variance weighted mean from the results of these two experiments,” which of course gets us nowhere.
Now, I know that meta-analysis textbooks talk about how, when it doesn’t make sense to assume $\tau_1 = \tau_2$, one should seek to explain the heterogeneity rather than produce synthesized effects. But the standard approaches for doing so rely on assumptions of conditional exchangeability, that is, replacing $\tau_1 = \tau_2$ with $\tau_1(x) = \tau_2(x) = \tau(x)$, where these are effects for subpopulations defined by a covariate profile $x$. Then, we effectively apply the same minimum distance estimation logic, using inverse variance weighting to compute the $\hat\tau(x)$, most typically with an inverse variance weighted linear regression on the components of $x$. The modeling assumptions are barely any weaker than what one assumes to produce the synthesized estimate. So does this really make any sense either?
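For concreteness, here is a minimal sketch of that inverse variance weighted meta-regression, using made-up study-level estimates, variances, and a single binary moderator $x_j$. It is just weighted least squares with weights $1/\hat v_j$ on the components of $x$.

```python
# Sketch of inverse variance weighted meta-regression ("explaining heterogeneity").
# Study-level estimates, variances, and the moderator are all hypothetical.
import numpy as np

tau_hat = np.array([0.20, 0.35, 0.10, 0.50])   # study effect estimates (hypothetical)
v_hat   = np.array([0.04, 0.01, 0.02, 0.03])   # their estimated variances (hypothetical)
x       = np.array([0.0, 1.0, 0.0, 1.0])       # moderator defining the covariate profile

X = np.column_stack([np.ones_like(x), x])      # regress tau_hat on an intercept and x
W = np.diag(1.0 / v_hat)                       # inverse variance weights
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ tau_hat)   # WLS coefficients

print(beta)   # implied intercept and moderator coefficient for tau(x)
```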
It seems pretty clear to me that the meta-analysis literature is in need of a “credibility revolution” along the same lines as we’ve seen in the broader causal inference literature. That means (i) thinking harder about the estimands that are the focus of the analysis, (ii) entertaining an assumption of rampant effect heterogeneity, and (iii) understanding the properties and robustness of estimators under (likely) misspecification of the relationship between variables that characterize the populations we study (the $x_j$’s for populations indexed by $j$) and the estimates we obtain from them (the $\hat\tau_j$’s).
*Edited based on Winston’s corrections!