Reflecting on some of the recent discussions of matching as a tool for causal analyses in social science (see here as well as this really nice commentary—hat tip, Chris Blattman), I wonder if it’s useful to make a distinction between “positive” versus “negative” causal identification. Define “positive” causal identification as causal identification via some observable mechanism or source of exogenous variation. Controlled experiments clearly fall into this class. IV analyses and matching analyses that are pinned to measured sources of exogenous variation do as well. RD analyses also qualify, as assignment is based on a measured forcing variable. So does a matching-based analysis that certifiably reproduces an assignment process and demonstrates that, within matched strata or clusters, assignment was as if random. With positive identification, matching may still be necessary for properly aggregating heterogeneous treatment effects in order to consistently estimate a population average treatment effect or effect of the treatment on the treated. Or, it may be that exclusion is valid only within certain subgroups, and matching is a robust alternative to regression-based adjustment for realizing such exclusion.
“Negative” identification would refer to analyses based on a claim that all potential confounders (that is, variables that affect both assignment and outcomes) have been measured. There are no instruments or forcing variables. I call it “negative” identification because it is based on an assertion of the absence of any unmeasured confounders, rather than the presence of some verifiable source of exogenous variation. It seems pretty clear to me that negative identification requires that we take much more on faith. And yet, this has been the modal approach to causal inference in non-experimental research, although that has been changing since the onset of the “credibility revolution” in social science (link) over the past decade or so.
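To make the stakes of that assertion concrete, here is a hypothetical toy sketch (not from the discussions linked above): suppose we assume a single measured covariate x captures all confounding, and estimate the effect of the treatment on the treated by exact matching within strata of x. The data, covariate, and effect sizes below are invented purely for illustration.

```python
from collections import defaultdict

def att_exact_matching(rows):
    """Estimate the effect of the treatment on the treated (ATT)
    by exact matching on x, assuming x captures all confounding.
    rows: list of (x, treated, outcome) tuples."""
    strata = defaultdict(lambda: {"t": [], "c": []})
    for x, t, y in rows:
        strata[x]["t" if t else "c"].append(y)
    weighted, n_treated = 0.0, 0
    for s in strata.values():
        if s["t"] and s["c"]:  # keep only strata with overlap
            n_t = len(s["t"])
            diff = sum(s["t"]) / n_t - sum(s["c"]) / len(s["c"])
            weighted += n_t * diff
            n_treated += n_t
    return weighted / n_treated

# Invented data: the true effect is 1 in both strata, but x = 1 units
# are both more likely to be treated and have higher baseline outcomes,
# so the naive difference in means is confounded upward.
data = [
    (0, 0, 0.0), (0, 0, 0.0), (0, 0, 0.0), (0, 1, 1.0),
    (1, 0, 2.0), (1, 1, 3.0), (1, 1, 3.0), (1, 1, 3.0),
]

naive = (sum(y for _, t, y in data if t) / 4
         - sum(y for _, t, y in data if not t) / 4)
att = att_exact_matching(data)
print(naive)  # 2.0 — biased by confounding on x
print(att)    # 1.0 — recovers the true effect, if x is the only confounder
```

The matched estimate is right here only because, by construction, nothing besides x drives both assignment and outcomes. That conditional is exactly the untestable faith that negative identification asks for: an unmeasured confounder would bias the matched estimate just as it biases the naive one.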
Is this a useful distinction?