Suppose we have perfectly executed and perfectly consistent, balanced randomized controlled trials for a binary treatment applied to populations 1 and 2. Suppose that even the sample sizes are the same in each trial ($latex n$). We obtain consistent treatment effect estimates $latex \hat \tau_1$ and $latex \hat \tau_2$ of the population-specific effects $latex \tau_1$ and $latex \tau_2$, respectively, with consistent estimates $latex \hat v_1$ and $latex \hat v_2$ of the asymptotic variances of $latex \hat \tau_1$ and $latex \hat \tau_2$, respectively. As far as asymptotic inference goes, suppose we are safe to assume that $latex \sqrt{n}(\hat \tau_1 - \tau_1) \overset{d}{\rightarrow} N(0, V_1)$ and $latex \sqrt{n}(\hat \tau_2 - \tau_2) \overset{d}{\rightarrow} N(0, V_2)$, with $latex n\hat v_1 \overset{p}{\rightarrow} V_1$ and $latex n\hat v_2 \overset{p}{\rightarrow} V_2$.* (This is pretty standard notation, where $latex \overset{d}{\rightarrow}$ is convergence in distribution and $latex \overset{p}{\rightarrow}$ is convergence in probability, as the sample size of each experiment grows large.) Even with the same sample sizes in both populations, we may have $latex V_1 > V_2$, simply because outcomes are noisier in population 1. Suppose this is the case.
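To make the setup concrete, here is a minimal simulation sketch (all numbers are hypothetical, not from any actual trials): two balanced RCTs of the same size, with noisier outcomes in population 1, each analyzed by difference in means with the usual Neyman-style variance estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # same sample size in each trial

def rct(tau, sigma, n, rng):
    """Simulate one balanced RCT; return the difference-in-means
    estimate and its (conservative) Neyman variance estimate."""
    z = rng.permutation(np.repeat([0, 1], n // 2))  # balanced assignment
    y = tau * z + rng.normal(0.0, sigma, n)         # constant effect + noise
    y1, y0 = y[z == 1], y[z == 0]
    tau_hat = y1.mean() - y0.mean()
    v_hat = y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0)
    return tau_hat, v_hat

# Population 1 has noisier outcomes (larger sigma), so V_1 > V_2
tau1_hat, v1_hat = rct(tau=0.5, sigma=3.0, n=n, rng=rng)
tau2_hat, v2_hat = rct(tau=0.5, sigma=1.0, n=n, rng=rng)
# n * v_hat estimates the asymptotic variance V_j for each trial
```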

A standard meta-analytic effect synthesis computes a synthesized effect by taking a weighted average, where the weights are functions, in part or in whole, of the inverses of the estimated variances. That is, the weights will be close or equal to $latex 1/\hat v_1$ and $latex 1/\hat v_2$. Of course, if $latex \tau_1 = \tau_2 = \tau$, then this inverse variance weighted mean is the asymptotic-variance-minimizing estimator of $latex \tau$. This is the classic minimum distance estimation result. The canonical econometrics reference for the optimality of the inverse variance weighted estimator in general problems is Hansen (1982) [link], although it is covered in any graduate econometrics textbook.
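For concreteness, a minimal sketch of the inverse variance weighted combination; the estimates and variances plugged in below are made-up illustrative numbers, not results from any study:

```python
import numpy as np

def ivw_combine(tau_hats, v_hats):
    """Inverse variance weighted mean and its variance,
    which is optimal only under tau_1 = tau_2 = tau."""
    tau_hats = np.asarray(tau_hats)
    w = 1.0 / np.asarray(v_hats)        # inverse variance weights
    tau_ivw = np.sum(w * tau_hats) / np.sum(w)
    v_ivw = 1.0 / np.sum(w)             # variance of the combined estimate
    return tau_ivw, v_ivw

# Hypothetical numbers: the noisier estimate (v = 0.018) gets only
# one tenth the weight of the more precise one (v = 0.002)
tau_ivw, v_ivw = ivw_combine([0.48, 0.52], [0.018, 0.002])
```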

But what if there is no reason to assume $latex \tau_1 = \tau_2 = \tau$? Then how should we interpret the inverse variance weighted mean, which in finite samples would tend to give more weight to $latex \hat \tau_2$? Perhaps one could interpret it in Bayesian terms. From a frequentist perspective, though, which would try to relate estimates to stable population parameters, it seems interpretable only as “a good estimate of what you get when you compute the inverse variance weighted mean from the results of these two experiments,” which of course gets us nowhere.

Now, I know that meta-analysis textbooks say that, when it doesn’t make sense to assume $latex \tau_1 = \tau_2$, one should seek to explain the heterogeneity rather than produce synthesized effects. But the standard approaches for doing so rely on assumptions of conditional exchangeability: that is, replacing $latex \tau_1 = \tau_2$ with $latex \tau_1(x) = \tau_2(x) = \tau(x)$, where these are effects for subpopulations defined by a covariate profile $latex x$. Then we effectively apply the same minimum distance estimation logic, using inverse variance weighting to compute the $latex \tau(x)$, most typically with an inverse variance weighted linear regression on the components of $latex x$. The modeling assumptions are barely any weaker than what one assumes to produce the synthesized estimate. So does this really make any sense either?
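Here is a sketch of such an inverse variance weighted meta-regression, with made-up study-level effects, variances, and a single covariate; the fitted slope is only meaningful under the conditional exchangeability and linearity assumptions just described.

```python
import numpy as np

# Hypothetical study-level data: effect estimates, variance
# estimates, and one covariate characterizing each study
tau_hats = np.array([0.10, 0.25, 0.40, 0.55])
v_hats = np.array([0.010, 0.004, 0.006, 0.002])
x = np.array([0.0, 1.0, 2.0, 3.0])

# Inverse variance weighted meta-regression:
# solve min_b sum_j (tau_hat_j - X_j b)^2 / v_hat_j
X = np.column_stack([np.ones_like(x), x])  # intercept + covariate
W = np.diag(1.0 / v_hats)                  # inverse variance weights
b = np.linalg.solve(X.T @ W @ X, X.T @ W @ tau_hats)
# b[1] is the modeled change in tau(x) per unit of x -- valid only
# if tau_j(x) really is the same linear function across studies
```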

It seems pretty clear to me that the meta-analysis literature is in need of a “credibility revolution” along the same lines as we’ve seen in the broader causal inference literature. That means (i) thinking harder about the *estimands* that are the focus of the analysis, (ii) entertaining an assumption of *rampant effect heterogeneity*, and (iii) understanding the properties and robustness of estimators under (likely) *misspecification* of the relationship between variables that characterize the populations we study (the $latex X_j$s for populations indexed by $latex j$) and the estimates we obtain from them (the $latex \hat \tau_j$s).

*Edited based on Winston’s corrections!