#USAID #evaluation policy & essential features of a rigorous #impact evaluation policy

USAID has just released its new evaluation policy (link). I find this a very encouraging development. Until now, I had thought that USAID lagged about a decade behind other major international and government development agencies in its evaluation practices. Part of this may be because USAID relies extensively on commercial contractors to implement its programs, so incentives are not well aligned to promote rigorous evaluation. This may continue to be the case, and, as I discuss below, the “incentives” issue is the most crucial one in making a rigorous evaluation policy work. But the policy statement suggests that USAID is taking some terrific steps forward. For impact evaluation, highlights of the policy document include the following: (i) impact evaluation is defined in terms of comparison against a counterfactual, (ii) emphasis is placed on working out evaluation details in the project design phase, (iii) the document states unequivocally that experimental methods should be used for impact evaluation whenever feasible, and (iv) the document repeatedly emphasizes replicability.

Nonetheless, one line in Administrator Shah’s preface makes me wonder whether this will all work:

We will be unbiased, requiring that evaluation teams be led by outside experts and that no implementing partner be solely responsible for evaluating its own activities.

We need to unpack the idea of “outside experts” and ensure that it is not operationalized in a way that contractors and program staff take to mean “external auditors,” but rather in a way that translates into “incorruptible, but otherwise sympathetic, partners.” Let me explain.

First, let’s take a step back and ask: what is needed for a rigorous impact evaluation policy to work for USAID? In my view, the most crucial requirement is to ensure that USAID staff, contractors, and evaluators are poised to take advantage of solid evidence on how programs fail. Yes, fail.

To see why, consider an idealized version of what an impact evaluation is all about. An impact evaluation is not something to be added onto a program; in many ways, it is the program. That said, an impact evaluation does not always make sense (e.g., it may be excessive for an intervention that has a well-documented record of positive impacts and is being implemented at large scale). But when an impact evaluation is in order, it takes a program and fashions it into a scientific hypothesis about how desired outcomes can be produced. The program/hypothesis is derived from assumptions and theories (perhaps latent) that lurk in the minds of those who draft the program proposal. (In that sense, drafting the program proposal is a theoretical exercise, so if academics are engaged to participate in an impact evaluation, they really should be part of the proposal process, even though this does not happen regularly.) When the commissioning agency (e.g., USAID) signs off on the proposal, it is agreeing that this hypothesis is well-conceived. For an impact evaluation, the entire intervention is designed to test this program/hypothesis. It may turn out that the hypothesis is false (or, more accurately, that we cannot convincingly demonstrate that it is not false). To demonstrate this, we need to know that the program/hypothesis was operationalized as intended, and we need credible evidence of its impact.

Ideally, then, incentives should be in place such that demonstrating a program’s ineffectiveness is rewarded just as much as demonstrating its effectiveness. In principle, there is no reason why this should not be the case. After all, the commissioning agency signaled that it found the program/hypothesis to be well-conceived, and it should therefore be quite willing to accept evidence that the hypothesis does not hold. But one does not get the sense that such negative findings are rewarded. In my experience, even when programs are implemented faithfully, the prevailing sense is that contractors and everyone else in the program management chain need to demonstrate effectiveness.

Given this, a number of problems arise when contractors and program staff understand “external evaluators” to be “external auditors.” A first consequence is a bias in the information stream from program staff to evaluators: program staff will have good reason to hide negative information and present only positive information. They will also have little reason to facilitate the evaluator’s work. The harder it is for the evaluator to carry out a rigorous evaluation, the more tenuous its findings will be, and it is less risky for a program manager to have an evaluation with highly ambiguous conclusions than one that may produce clear evidence of benefits but may just as well produce clear evidence of no benefits, or even of harm.

What do I see as ways to properly incentivize rigorous impact evaluation? Two thoughts come to mind:

Contractors, program staff, and evaluation consultants (especially academics) should have a deep, collaborative relationship, working together to develop program proposals that are understood as well-motivated scientific hypotheses to be tested in the field. The idea of deeper collaboration may seem counterintuitive to some; it may sound as if it creates too “cozy” a relationship between evaluators and practitioners. But so long as the commissioning agency genuinely values credible evidence, I see this kind of deep collaboration as far more promising than the “external evaluation consultant” arrangements that are typically used now. This was the subject of a previous post (link).
Contractors and USAID staff should be rewarded for being associated with impact evaluations that demonstrate how, in faithfully implemented programs, outcomes differ from expectations, sometimes for the worse.