Tracy L. Nolen and Michael G. Hudgens have a new paper posted to JASA’s preprint website (gated link, ungated preprint) on randomization inference in situations where intermediate post-treatment outcomes are important in defining causal effects. Their motivating example is one where we want to see how a medical treatment affects people’s recovery from an infection, but infection status is itself affected by the treatment. Other examples where post-treatment outcomes are important are estimating causal effects under noncompliance and related instrumental variables methods (classic paper link), as well as the “truncation by death” situation (link), for which causal effects are only meaningful for certain endogenously revealed subpopulations. In these cases, principal strata refer to subpopulations that are distinguished by intermediate potential outcomes. The key contribution here is to develop exact tests for principal strata estimation. The authors want to use exact tests, rather than asymptotic-frequentist or Bayesian approaches, because exact tests have better type-I/type-II error performance in small samples, and many principal strata situations involve making inferences on small subgroups of possibly already-small subject pools.

To formalize their argument a bit, let $Z$ refer to a subject’s treatment status, $S$ refer to a subject’s infection status (observed after treatment), and $Y(S, Z)$ refer to a subject’s outcome given infection and treatment statuses. We are interested in the effect of treatment on progress after infection:

$$E[Y(S=1, Z=1) - Y(S=1, Z=0)].$$

(Clearly this estimand is only meaningful for those that could be infected under either condition.) But,

$$E[Y(S=1, Z=1)] \neq E[Y_i(S_i=1, Z_i=1)]$$

and

$$E[Y(S=1, Z=0)] \neq E[Y_j(S_j=1, Z_j=0)]$$

for $i$ in treated and $j$ in control, because $S$ is endogenous to $Z$. Thus, the expression,

$$E[Y_i(S_i=1, Z_i=1) - Y_j(S_j=1, Z_j=0)]$$

for $i$ in treated and $j$ in control does not estimate the effect of interest. In terms of principal strata, $i$ is an element in a sample from the mixed population of people for whom $S=1$ only when $Z=1$ (the “harmed” principal stratum) or $S=1$ irrespective of $Z$ (the “always infected” principal stratum), while $j$ is an element in a sample from the mixed population of people for whom $S=1$ only when $Z=0$ (“protected”) or $S=1$ irrespective of $Z$ (“always infected”). ~~The two mixed populations are thus different and it is reasonable to expect that treatment effects also differ across these two subpopulations. For example, imagine that “the harmed” are allergic to the treatment but otherwise very healthy, and so the treatment not only causes the infection but leads to an allergic reaction that catastrophically interferes with their bodies’ responses to infection. In contrast, suppose the “protected” are in poor health; their infection status may respond to treatment, but their general prognosis is unaffected and is always bad. Finally, suppose the “always infected” do not respond to treatment in their outcomes either. Here, on average, the treatment is detrimental due to the allergic response among “the harmed”. But if one estimates a treatment effect by comparing these two subpopulations, one may find that the treatment is on average benign. This is a made-up example, but does not seem so far-fetched at all.~~ [**Update**: Upon further reflection, I realize that the preceding illustration that appeared in the original post had a problem: it failed to appreciate that the causal effect of interest here is only properly defined for members of the “always infected” population. The point about bias still holds, but it arises because one is not simply taking difference in means between treated and control “always infected” groups, but rather between the two mixed groups described above. 
The problem, then, is to find a way to isolate the comparison between treated and control “always infected” groups, removing the taint introduced by the presence of the “harmed” subgroup among the treated and the “protected” subgroup among the control. This is interesting, because it is precisely the *opposite* of what one would want to isolate in a LATE IV analysis. Nonetheless, the identifying condition is the same: as discussed below, it hinges on monotonicity.]
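To make the bias concrete, here is a minimal simulation sketch. Everything in it is made up for illustration and none of it comes from the paper: the stratum shares, the baseline prognosis by stratum, the constant effect `tau`, and the function name `naive_vs_stratum_effect`. It compares the naive contrast among observed-infected subjects with the effect among the "always infected".

```python
import numpy as np

def naive_vs_stratum_effect(n=200_000, tau=0.5, seed=0):
    """Return (naive infected-only contrast, true always-infected effect)."""
    rng = np.random.default_rng(seed)
    # Hypothetical principal stratum shares (an assumption, not from the paper).
    strata = rng.choice(["always", "protected", "harmed", "never"],
                        size=n, p=[0.4, 0.3, 0.1, 0.2])
    s0 = np.isin(strata, ["always", "protected"])  # infected if untreated
    s1 = np.isin(strata, ["always", "harmed"])     # infected if treated
    # Hypothetical baseline prognosis: "protected" do worse, "harmed" do better.
    base = np.select([strata == "protected", strata == "harmed"],
                     [-2.0, 1.0], default=0.0)
    y0 = base + rng.normal(size=n)
    y1 = y0 + tau  # constant effect; only used for subjects observed infected
    z = rng.integers(0, 2, size=n)   # randomized treatment
    s = np.where(z == 1, s1, s0)     # observed infection status
    y = np.where(z == 1, y1, y0)     # observed outcome
    # Naive contrast compares two different mixtures of strata.
    naive = y[(z == 1) & s].mean() - y[(z == 0) & s].mean()
    target = (y1 - y0)[strata == "always"].mean()
    return naive, target

naive, target = naive_vs_stratum_effect()
print(f"naive contrast: {naive:.2f}, always-infected effect: {target:.2f}")
```

With these invented numbers the naive contrast lands far from `tau`, purely because the treated-infected and control-infected groups mix different strata.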

The authors construct an exact test for the null hypothesis of no treatment effect within a given principal stratum under a *monotonicity* assumption that states that the treatment can only affect infection status in one direction (essentially, this is the same “no defier” monotonicity assumption that Angrist, Imbens, and Rubin use to identify the LATE in IV estimation). This rules out the possibility of anyone being in the “harmed” group. The assumption thus allows you to bound the number of people in each of the remaining principal strata (“always infected”, “protected”, and “never infected”). Then, an exact test can be carried out that computes the maximum exact test p-value under all principal stratum assignments consistent with these bounds. The analysis can assess consequences of violations of monotonicity through a sensitivity analysis: the proportion of “harmed” can be fixed by the analysts and the test re-computed to assess robustness.
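The "maximize the p-value over stratum assignments" idea can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not the authors' procedure: under monotonicity every infected treated subject is "always infected", the analyst supplies bounds `m_lo` and `m_hi` on how many infected controls are "always infected", and the test statistic is a difference in means. The function names are hypothetical, and the brute-force enumeration is only feasible for very small samples.

```python
from itertools import combinations
import numpy as np

def exact_pvalue(y_treated, y_control):
    """Two-sided exact permutation p-value for the difference in means."""
    y_treated = np.asarray(y_treated, dtype=float)
    y_control = np.asarray(y_control, dtype=float)
    pooled = np.concatenate([y_treated, y_control])
    n, n_t = len(pooled), len(y_treated)
    total = pooled.sum()
    observed = abs(y_treated.mean() - y_control.mean())
    extreme = count = 0
    # Enumerate every way of labeling n_t of the n subjects as treated.
    for idx in combinations(range(n), n_t):
        t_sum = pooled[list(idx)].sum()
        diff = abs(t_sum / n_t - (total - t_sum) / (n - n_t))
        extreme += int(diff >= observed - 1e-9)
        count += 1
    return extreme / count

def max_pvalue_over_strata(y_treated_infected, y_control_infected, m_lo, m_hi):
    """Largest exact p-value over candidate 'always infected' control subsets.

    m_lo and m_hi are analyst-supplied bounds (an assumption of this sketch)
    on the number of infected controls who are 'always infected'.
    """
    y_c = np.asarray(y_control_infected, dtype=float)
    if m_lo < 1 or m_hi > len(y_c) or m_lo > m_hi:
        raise ValueError("need 1 <= m_lo <= m_hi <= number of infected controls")
    best = 0.0
    for m in range(m_lo, m_hi + 1):
        for subset in combinations(range(len(y_c)), m):
            p = exact_pvalue(y_treated_infected, y_c[list(subset)])
            best = max(best, p)
    return best

p_max = max_pvalue_over_strata([3.0, 4.0], [1.0, 2.0, 3.5], m_lo=2, m_hi=3)
print(p_max)
```

Taking the maximum is what makes the test conservative: the null is rejected only if it is rejected under every stratum assignment the bounds allow.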

An alternative approach used in the current literature is the so-called “burden of illness” (BOI) approach. BOI collapses intermediate and endpoint outcomes into a single index and then carries out an ITT analysis on this index. The authors find that their exact test on principal strata has substantially more power than BOI ITT analysis. The authors also show that Rosenbaum (2002)-style covariate adjustment can be applied (regress outcomes on covariates, perform the exact test with the residuals), and the usual inverted exact test confidence intervals can be used, both with no added complication.
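The covariate adjustment step (regress outcomes on covariates, then test with the residuals) can be sketched like this. It is a generic illustration, not code from the paper: the function names and simulated data are mine, and for brevity the p-value is a Monte Carlo approximation to the exact test rather than a full enumeration.

```python
import numpy as np

def residualize(y, X):
    """Residuals from an OLS regression of y on an intercept and X (no treatment term)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return y - X1 @ beta

def permutation_pvalue(values, z, n_draws=2000, seed=0):
    """Monte Carlo approximation to the exact two-sided difference-in-means test."""
    rng = np.random.default_rng(seed)
    observed = abs(values[z == 1].mean() - values[z == 0].mean())
    hits = 0
    for _ in range(n_draws):
        zp = rng.permutation(z)  # re-randomize treatment labels
        hits += int(abs(values[zp == 1].mean() - values[zp == 0].mean()) >= observed)
    return (hits + 1) / (n_draws + 1)

# Hypothetical data: a strongly prognostic covariate x and a treatment effect of 1.5.
rng = np.random.default_rng(1)
z = rng.permutation(np.repeat([0, 1], 30))
x = rng.normal(size=60)
y = 3.0 * x + 1.5 * z + 0.3 * rng.normal(size=60)

p_raw = permutation_pvalue(y, z)
p_adj = permutation_pvalue(residualize(y, x), z)
print(p_raw, p_adj)
```

The regression never sees the treatment indicator, so the residuals are a fixed function of the data under the sharp null and the permutation logic carries over unchanged.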

It’s a nice paper, related to some work that some colleagues and I are currently doing on randomization inference. Exact tests are fine for null hypothesis testing, but I am not at all sold on constant-effects-based inverted exact tests for confidence intervals. Certainly for moderate or large samples, there is no reason at all to use such tests, which can miss the mark badly. Maybe for small samples though you don’t really have a choice.
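For readers who have not seen one, here is a minimal sketch of what I mean by a constant-effects-based inverted exact test. The assumption doing all the work is that every subject has the same additive effect `tau0`; for each candidate `tau0` on a grid, one subtracts it from the treated outcomes and keeps the values the exact test fails to reject. The function names, the grid, and the toy data are hypothetical.

```python
from itertools import combinations
import numpy as np

def exact_pvalue(y_treated, y_control):
    """Two-sided exact permutation p-value for the difference in means."""
    pooled = np.concatenate([y_treated, y_control])
    n, n_t = len(pooled), len(y_treated)
    total = pooled.sum()
    observed = abs(y_treated.mean() - y_control.mean())
    extreme = count = 0
    for idx in combinations(range(n), n_t):
        t_sum = pooled[list(idx)].sum()
        diff = abs(t_sum / n_t - (total - t_sum) / (n - n_t))
        extreme += int(diff >= observed - 1e-9)
        count += 1
    return extreme / count

def inverted_test_interval(y_treated, y_control, grid, alpha=0.05):
    """Range of constant effects tau0 on the grid not rejected at level alpha."""
    y_t = np.asarray(y_treated, dtype=float)
    y_c = np.asarray(y_control, dtype=float)
    # Under a constant effect tau0, y_t - tau0 are the treated subjects' control outcomes.
    accepted = [tau0 for tau0 in grid if exact_pvalue(y_t - tau0, y_c) > alpha]
    if not accepted:
        return None
    return min(accepted), max(accepted)

grid = np.linspace(-2.0, 10.0, 121)
interval = inverted_test_interval([5.0, 6.0, 7.0, 8.0], [1.0, 2.0, 3.0, 4.0], grid)
print(interval)
```

The interval is only as good as the constant-effect model behind it, which is exactly my worry when effects are heterogeneous.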