Practitioners, academics, and development impact evaluation

The Micro-Links forum is hosting a three-day online discussion on “Strengthening Evaluation of Poverty and Conflict/Fragility Interventions” (link). Based on my own experience trying to pull off these kinds of evaluations, I wanted to chime in with some points about the nature of academics’ involvement in impact evaluations. A common approach to impact evaluation is for implementing organizations to team up with academics. The nature of the relationship between practitioners and academics is a crucial aspect of the evaluation strategy. Let me propose this: for impact evaluations to work, it’s good to have everyone in the program understand that they are involved in a scientific project that aims to test ideas about “what works.”

One way I have seen numerous impact evaluations fail is through a conflict that emerges between practitioners, who see their role as helping people regardless of whether the way they do so is good for an evaluation, and academics, who see themselves as tasked with evaluating what the program staff are doing while having little stake in the program themselves. This “division of labor” might seem rational, but I have found that it introduces tension and other obstacles.

An arrangement that seems to work better is for practitioners and academics to see themselves as engaged together in (i) the conceptualization of the program itself; (ii) the elaboration of the details of how it should be implemented; and (iii) the methods for measuring impact. All three steps should harness both practitioners’ and academics’ technical and substantive knowledge. What does not seem to work very well is to force a division between tasks (i) and (ii) on the one hand, which are often “reserved” for practitioners to apply their substantive knowledge, and (iii) on the other hand, which is reserved for academics to apply their technical knowledge. Often, steps (i) and (ii) will have been worked out among practitioners from the commissioning and implementing agencies, and academics will then be consulted to realize step (iii). In my experience, this process rarely works out. There is a logic that needs to flow from conceptualization of the program to measurement of impacts, and this requires that academics and practitioners work together from square one, from conceptualizing “what needs to be done” by the program all the way to designing a means for determining “whether it worked” to produce the desired change. To put it another way, impact evaluations are better when they are seen less as technical exercises tacked on to existing programs and more as rich, substantive exercises in designing programs to test ideas and discover how to bring about the desired change.

On the academics’ side, a fair reflection of this might be for key actors involved in the implementation of the program to be considered as co-authors on at least some of the studies that emerge from the program; indeed, the assumptions and hypotheses on which programs are based are often drawn largely from the experiences and thoughts of program staff, and their input should be acknowledged. Along similar lines, practitioners should appreciate that when academics are engaged, their substantive expertise is a resource that should be tapped in developing the program itself, and that a logic needs to hold between the design of the program and the evaluation. For everyone involved, this is a deeper kind of interaction between academics and practitioners than is often the case, though I should note that, e.g., Poverty Action Lab/IPA projects often operate with such richness and nuance.


Desegregation led to an unintended concentration of public resources for blacks and the poor? (Reber, 2011)

School desegregation might have induced unintended behavioral responses of white families as well as state and local governments. This paper examines these responses and is the first to study the effects of desegregation on the finances of school districts. Desegregation induced white flight from blacker to whiter public school districts and to private schools, but the local property tax base and local revenue were not adversely affected. The state legislature directed significant new funding to districts where whites were particularly affected by desegregation. Desegregation therefore appears to have achieved its intended goal of improving resources available in schools that blacks attended.

From an interesting new paper by Sarah Reber in The Review of Economics and Statistics on desegregation achieving its goals via an unintended concentration of public resources (link).


Extraction ratio as a political measure of income inequality (Milanovic et al., 2011)

A Fine Theorem blog (link) points to an interesting new paper on “Pre-Industrial Inequality” by Branko Milanovic, Peter H. Lindert, and Jeffrey G. Williamson, forthcoming in The Economic Journal (link). The paper defines a new measure of income inequality called the “extraction ratio”: the ratio of the measured Gini coefficient to the “maximum possible Gini coefficient” that would obtain were all available societal surplus in the hands of a vanishingly small elite.

The motivation for the extraction ratio measure is as follows. Consider a society divided into a poor and a rich class, and for illustration suppose that the poor each earn P per year and the rich each earn R (that is, incomes are constant within classes), with P << R. If P is fixed at subsistence level, then the surplus that the rich enjoy relative to the poor (R-P) is a linear function of total income in society. As such, levels of inequality as measured by, say, the Gini coefficient are also a function of total income. Two societies may exhibit class stratification that is similarly reprehensible, in that in both cases only the rich enjoy any surplus income over the subsistence level, yet they may differ greatly in their Gini measures; or, two societies may have the same Gini coefficient but differ in whether the poor obtain any surplus over subsistence. The extraction ratio allows one to distinguish these cases. The cases would seem to imply very different political circumstances, with a higher extraction ratio intuitively being associated with a more exploitative society. The authors find that while the distribution of Gini coefficients does not differ much between the pre-industrial and post-industrial ages, extraction ratios tended to be quite a bit higher in the pre-industrial age. Many studies attempt to correlate inequality as measured by the Gini coefficient with political outcomes, with very mixed results. It would be interesting to see if this new measure produces different insights.
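To make the two-society comparison concrete, here is a minimal Python sketch (my own illustration, not code from the paper). It assumes the paper’s definition of the maximum feasible Gini for a society with mean income μ and subsistence income s as (μ − s)/μ, i.e., the Gini that would obtain if everyone were at subsistence except a vanishingly small elite. The two stylized societies below have identical Gini coefficients (the Gini is scale-invariant), but in the first the poor are at subsistence while in the second everyone earns above it, so their extraction ratios differ.

```python
def gini(incomes):
    """Gini coefficient via the mean-absolute-difference formula."""
    n = len(incomes)
    mu = sum(incomes) / n
    mad = sum(abs(x - y) for x in incomes for y in incomes) / (n * n)
    return mad / (2 * mu)

def extraction_ratio(incomes, subsistence):
    """Measured Gini divided by the maximum feasible Gini,
    where the maximum feasible Gini is (mu - s) / mu."""
    mu = sum(incomes) / len(incomes)
    g_max = (mu - subsistence) / mu
    return gini(incomes) / g_max

# Subsistence income normalized to 1.
# Society A: the poor sit exactly at subsistence; only the rich hold surplus.
society_a = [1.0] * 90 + [10.0] * 10
# Society B: the same income shape scaled up, so everyone is above subsistence.
society_b = [2.0] * 90 + [20.0] * 10

print(gini(society_a), gini(society_b))        # equal (~0.426): Gini is scale-invariant
print(extraction_ratio(society_a, 1.0))        # ~0.90: most feasible surplus is extracted
print(extraction_ratio(society_b, 1.0))        # ~0.58: the same Gini, but less extractive
```

The point of the sketch is the last two lines: identical Gini coefficients can mask very different degrees of extraction once one accounts for how much inequality was feasible given the society’s surplus over subsistence.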


Statistical significance goes before the Supreme Court

The Economist’s View blog posts a letter from economist Steve Ziliak (link) describing a case due to be argued before the Supreme Court next week on whether “drug manufacturers and other companies [should] be required to report the adverse effect of a product on users, if the effect is not statistically significantly different from zero at the 5% level.” Briefs presented before the court commenting on the case are here (link). The blog post also links to this page, as well as to some of Ziliak’s writing on significance testing.


Reading: 7 Properties of Good Models (Gabaix & Laibson, 2008)

This short essay argues that the following criteria should be used to judge whether an analytical economic model is good or not:

  1. parsimony, viz., minimal assumptions and parameters, to reduce risk of overfitting. This would seem to be the essence of modeling, right?
  2. tractability.
  3. conceptual insightfulness, which in the authors’ characterization bears some resemblance to Lakatos’s axiom that a scientific theory should produce “novel facts”.
  4. generalizability.
  5. falsifiability.
  6. empirical consistency.
  7. predictive precision, which is a necessary complement to falsifiability and empirical consistency: a model that makes vague predictions may hold up against the data, but a more useful model might be one that makes sharp predictions that are only slightly off from the data.

The authors acknowledge that these criteria may conflict, forcing trade-offs. Special tensions would seem to arise between parsimony/tractability and falsifiability/empirical consistency/predictive precision.

In their discussion, the authors claim that economic models should not be judged on whether they satisfy optimization axioms. They wish to create space for models that allow a separation between the normative preferences of agents and the actions that they ultimately take—the separation may be due to non-voluntary errors, biases, or emotions. Abandoning optimization axioms means that behavior does not immediately reveal preferences, which complicates normative analysis. The authors accept this, claiming that instead, we should specify models that incorporate parameters capturing non-voluntary processes, and then use data to identify “latent” preferences after conditioning on estimates of these parameters.

Full reference: Gabaix, Xavier, and David I. Laibson. 2008. “The Seven Properties of Good Models.” In The Foundations of Positive and Normative Economics, ed. Andrew Caplin and Andrew Schotter, 292–99. New York: Oxford University Press.

Ungated link: http://bit.ly/eL88IB
