More on contagion from #Tunisia to #Egypt and beyond

Jonah Schulhofer-Wohl and Julia Choucair have a post at HuffPo on the current contagion of protest movements in the Arab world (link). As I noted in a previous post (link), from the perspective of current social science theory it isn’t at all obvious why events in Tunisia should inspire events in other countries in the region. I proposed that the contagion may be based on the events in Tunisia creating a “normative reset” moment, with Tunisian protesters having “established a new embodiment of dignity” to which others in similar circumstances are emotionally driven to live up. In addition to rational “focal point” effects and updating about the vulnerability of authoritarian regimes, Schulhofer-Wohl and Choucair propose a similar hypothesis to explain this contagion:

Tunisians’ sacrifices have created a new moral climate in the region. If Tunisians were willing to die for the future of their country, then citizens of other countries have to ask a new question about facing down their regimes. Rather than calculating the risks and rewards to participating in uprisings, the question now is: If Tunisians were willing to make this sacrifice, why shouldn’t I also be willing? Continuing sacrifices, now on the streets of Egypt, underscore it.

I find this to be a compelling hypothesis, worthy of more rigorous investigation. It is consistent, for example, with the fact that the Iranian protests did not inspire protests elsewhere: the emotional connection to other peoples in the region would not have been as strong.


Technical reading (non-Egypt): “Measuring school segregation” (Frankel and Volij, 2011)

The authors examine ways to measure how “segregated” a school district is. One could imagine complete segregation as the case where each school in a district hosts a different ethnic group, and nonsegregation as the case where all schools in a district have the same ethnic distribution.

The authors propose a set of 6 axiomatic desiderata for measures of segregation, desiderata that appeal to intuitions about how a measure should or should not be affected by certain changes in underlying conditions. For example, one axiomatic desideratum is called “symmetry,” which requires that the ordering produced by a segregation measure be invariant to renaming the ethnic groups in question.

On these grounds, they find that producing an ordering equivalent to that of the so-called Atkinson index (link) is necessary and sufficient for satisfying 5 of the 6 desiderata, with the symmetry property being the one that is not satisfied. This strikes me as a major problem with Atkinson-type indices, as they require ad hoc decisions to combine or exclude ethnic categories in cases where districts differ in the combinations of groups that they contain.

The authors then discuss the appealing properties of orderings equivalent to that produced by the Mutual Information index. This index is an entropy-based (link) measure that quantifies the reduction in uncertainty about a student’s race that comes from learning which school she attends; in a symmetric manner, it also equals “the reduction in uncertainty about a student’s school that comes from learning her race.” Measures that always produce an ordering equivalent to that of mutual information are necessary and sufficient for satisfying all 6 desiderata except the so-called “composition invariance” property. Composition invariance is a controversial property. It implies that the ordering imposed by the measure does not change when the size of an ethnic group in a given district is increased uniformly across all schools in that district (e.g., if the number of whites increases by 10% in every school in a district). Composition invariance runs counter to conceptualizations of segregation that emphasize “contact” between people of different ethnicities (the authors cite work by Coleman, Hoffer, and Kilgore). For this reason, I find mutual information-based measures to be especially appealing.
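To make the index concrete, here is a minimal sketch (my own illustration, not the authors’ code) of how the Mutual Information index can be computed from a school-by-group table of student counts. The toy districts correspond to the two benchmarks mentioned above: complete segregation, where each school hosts a single group, and nonsegregation, where every school mirrors the district’s ethnic distribution.

```python
# Minimal sketch of the Mutual Information segregation index:
# I(school; group) = H(group) - H(group | school).
# Rows of `counts` are schools, columns are ethnic groups, entries are
# student counts. Illustrative numbers only; nothing here is from the paper.
import numpy as np

def mutual_information_index(counts, base=2.0):
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    p_joint = counts / total                     # joint dist. of (school, group)
    p_school = p_joint.sum(axis=1)               # marginal over schools
    p_group = p_joint.sum(axis=0)                # marginal over groups
    mask = p_joint > 0                           # skip empty cells (0 log 0 = 0)
    ratio = p_joint / np.outer(p_school, p_group)
    return float((p_joint[mask] * np.log(ratio[mask])).sum() / np.log(base))

# Complete segregation: each school hosts a single group.
print(mutual_information_index([[100, 0], [0, 100]]))   # 1.0 (= H(group) here)
# Nonsegregation: every school has the district's 60/40 composition.
print(mutual_information_index([[60, 40], [30, 20]]))   # 0.0
```

The index is zero when every school mirrors the district’s composition and rises to the entropy of the district’s group distribution under complete segregation, which matches the verbal description above.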

Clearly these measures can be applied to measuring any kind of segregation. A useful discussion.

Full citation:

David M. Frankel and Oscar Volij (2011) “Measuring school segregation,” Journal of Economic Theory, 146:1-38. (gated link)


It isn’t obvious why #Tunisia inspired #Egypt.

After the Eastern European democratic uprisings that brought down the Eastern Bloc, social scientists set upon explaining how such sudden mass political movements could arise. Timur Kuran (1989) modeled “unanticipated uprisings” in a manner that echoed Granovetter (1978). In Granovetter’s model, we assume that people vary in their tolerance of risk. Some people are fearless, and will take to the streets at any instigation irrespective of the potential costs. If they do so, and the police do not respond with repression, then those who are slightly less risk tolerant may update their beliefs about just how risky it is to take to the streets, learning that the risks are actually not so high, and thus join the action on the streets. So long as the police continue to hold off on repression, a “cascade” of people updating their beliefs and taking to the streets may be triggered, with ever more risk-averse people deciding that it is okay for them to jump on the bandwagon. Granovetter clarified how the essential feature here is the distribution of risk tolerance and the feedback loop that occurs when people take action and consequences are withheld. Kuran contributed to this line of thinking by showing how an individual may underestimate the number of people who share his or her disdain for the incumbent regime because people are not willing to share their true feelings (perhaps out of fear of being ratted out). This may lead people to overestimate the risk of taking to the streets. Once such a person receives adequate reassurance that his or her beliefs are shared, a Granovetter-like cascade can be triggered. The belief updating that propels this mechanism was formalized by Lohmann (1994).
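For those who, like me, find the mechanism easier to see in code, here is a minimal sketch of the threshold dynamic described above. The particular spread of thresholds is an illustrative assumption of mine, not anything from Granovetter or Kuran.

```python
# Minimal sketch of a Granovetter-style threshold cascade. Each person's
# "threshold" is the number of others already in the streets she needs to
# see before joining; a threshold of 0 marks the fearless types.
import numpy as np

def cascade_size(thresholds):
    """Iterate until no one else joins; return how many end up protesting."""
    thresholds = np.asarray(thresholds)
    participating = 0
    while True:
        joiners = int((thresholds <= participating).sum())
        if joiners == participating:   # fixed point: nobody new joins
            return participating
        participating = joiners

n = 100
# Thresholds spread evenly from 0 to 99: a full cascade unfolds.
print(cascade_size(np.arange(n)))                  # -> 100
# Take away the threshold-1 person (in the spirit of Granovetter's famous
# example) and the cascade stalls after the lone fearless protester.
print(cascade_size([0] + list(range(2, n + 1))))   # -> 1
```

The second run makes Granovetter’s point that small, nearly invisible changes in the distribution of thresholds can be the difference between a mass uprising and a lone protester.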

So far so good for a single “unanticipated” revolution, as with Tunisia. But what of the spill-over effects? With the Eastern European revolutions, there was a strand that tied the regimes together: the Soviet Union. Ever since the Hungarian revolution of 1956, it may have been reasonable for citizens of Soviet bloc countries to think that, ultimately, incumbents had a guarantor in Moscow. But Moscow’s non-response in Poland in 1989 may have led citizens of other Eastern bloc countries to update their beliefs about Moscow’s willingness and ability to serve as guarantor, inspiring the eventual cascade.

But to what extent does this logic apply to the current uprisings? One thing that seemed obvious to those of us who watched the developments in Tunisia a few weeks ago was that these demonstrations would likely spread to other countries in the region, perhaps not with the same outcome, but spread nonetheless. Why did we share this intuition? It does not seem to me that a Soviet-style strand ties these countries together. That role, I suppose, would have to be played in this case by the US and European powers. But did Egyptians really think events in Tunisia to be informative of a likely US response to protests in their country? If so, it is not quite of the same flavor as beliefs about a Soviet guarantor. Maybe some other relevant beliefs were updated. For example, perhaps the protesters in Tunisia established a new embodiment of dignity, causing some in Egypt to reassess their own actions and decide that they had to live up to this model. Or maybe it was a more emotional mechanism. It begs to be theorized, and this may even have us revise our accounts of what happened 20 years ago.

References

Granovetter, Mark. 1978. “Threshold models of collective behavior.” The American Journal of Sociology. 83:1420-1443. (ungated link)

Kuran, Timur. 1989. “Sparks and prairie fires: A theory of unanticipated political revolution.” Public Choice. 61:41-74. (gated link)

Lohmann, Susanne. 1994. “The dynamics of informational cascades: The Monday demonstrations in Leipzig, East Germany, 1989-1991.” World Politics. 47:42-101. (ungated link)


#USAID #evaluation policy & essential features of a rigorous #impact evaluation policy

USAID has just released its new evaluation policy (link). I find this to be a very encouraging development. I have thought until now that USAID has lagged about a decade behind other major international and government development agencies in its evaluation practices. Part of this may be due to the fact that USAID relies extensively on commercial contractors to implement its programs, and so incentives are not well aligned to promote rigorous evaluation. This may continue to be the case, and as I discuss below, the “incentives” issue is the most crucial in making a rigorous evaluation policy work. But the policy statement suggests that USAID is taking some terrific steps forward. Some highlights from the policy document with respect to impact evaluation: (i) impact evaluation is defined in terms of comparisons against a counterfactual; (ii) emphasis is placed on working out evaluation details in the project design phase; (iii) the document states unequivocally that experimental methods should be used for impact evaluation whenever feasible; and (iv) the document repeatedly emphasizes replicability.
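To make point (i) concrete, here is a minimal sketch, on fabricated data, of the kind of counterfactual comparison that random assignment makes possible. Nothing here reflects any actual USAID program; the numbers are made up purely for illustration.

```python
# Minimal sketch: under random assignment, the treatment-control difference
# in mean outcomes estimates the program's average impact. Data are simulated.
import numpy as np

rng = np.random.default_rng(1)

n = 1000
treated = rng.integers(0, 2, size=n)                 # randomized assignment
true_effect = 0.5                                    # impact built into the simulation
outcome = 2.0 + true_effect * treated + rng.normal(0.0, 1.0, size=n)

y1, y0 = outcome[treated == 1], outcome[treated == 0]
impact_hat = y1.mean() - y0.mean()                   # difference in means
se_hat = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))

print(f"estimated impact: {impact_hat:.2f} (SE {se_hat:.2f})")
# Without randomization, comparing participants to non-participants would
# conflate the program's impact with whatever drove selection into it.
```

The point is not the code but the logic: the control group stands in for the counterfactual, which is precisely what makes a demonstration of non-effectiveness (discussed below) as credible as a demonstration of effectiveness.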

Nonetheless, a line from Administrator Shah’s preface makes me wonder whether this will all work:

We will be unbiased, requiring that evaluation teams be led by outside experts and that no implementing partner be solely responsible for evaluating its own activities.

What we need to do is to unpack the idea of “outside experts,” and to ensure that this is not operationalized in a manner that contractors and program staff take to mean “external auditors,” but rather in a manner that translates into “incorruptible, but otherwise sympathetic, partners.” Let me explain.

First, let’s take a step back and ask: what’s needed for a rigorous impact evaluation policy to work for USAID? In my view, the most crucial requirement is to ensure that USAID staff, contractors, and evaluators are poised to take advantage of solid evidence on how programs fail. Yes, fail.

To see why this is the case, consider an idealized version of what an impact evaluation is all about. An impact evaluation is not something to be added onto a program; in many ways, it is the program. That said, it doesn’t always make sense to do an impact evaluation (e.g., it may be excessive for an intervention that has a well-documented record of positive impacts and is being implemented at large scale). But when an impact evaluation is in order, what it does is take a program and fashion it into a scientific hypothesis about how desired outcomes can be produced. The program/hypothesis is derived from assumptions and theories (perhaps latent) that lurk in the minds of those who draft the program proposal. (In that way, the program proposal is a theoretical exercise, and so if academics are engaged to participate in an impact evaluation, they really should be part of the proposal process, even though this does not happen regularly.) When the commissioning agency (e.g., USAID) signs off on the proposal, it is agreeing that this hypothesis is well-conceived. For an impact evaluation, the entire intervention is designed in such a way as to test this program/hypothesis. It may turn out that the hypothesis is false (or, more accurately, that we cannot convincingly demonstrate that it is not false). To demonstrate this, we need to know that the program/hypothesis was operationalized as intended and to have credible evidence of its impact. Then, ideally, incentives should be in place such that the demonstration of a program’s non-effectiveness is rewarded just as much as the demonstration of effectiveness. In principle, there is no reason that this should not be the case. After all, the commissioning agency signaled that it found the program/hypothesis to be well-conceived and therefore should be quite willing to accept evidence to the contrary. But one does not get the sense that such negative findings are rewarded. Even when programs are implemented faithfully, my experience is that the prevailing sense is that contractors and all in the chain of program management need to demonstrate effectiveness.

This being the case, a number of problems arise when “external evaluators” are actually understood by contractors and program staff to be “external auditors.” A first consequence is a bias in the information stream from program staff to evaluators: program staff will have good reason to hide negative information and present only positive information. A second is that program staff will have little reason to facilitate the evaluator’s work, since the harder time the evaluator has in doing a rigorous evaluation, the more tenuous will be the findings of the evaluation, and it’s less risky for a program manager to have an evaluation with highly ambiguous conclusions than one that may produce clear evidence of benefits, but may just as well produce clear evidence of no benefits or even harm.

What do I see as ways to properly incentivize rigorous impact evaluation? Two thoughts come to mind:

  1. Contractors, program staff, and evaluation consultants (especially academics) should have a deep collaborative relationship, working together in developing program proposals that are understood to be well-motivated scientific hypotheses that are to be tested in the field. The idea of deeper collaboration may be counter-intuitive to some. It may sound as if it is creating too “cozy” a relationship between evaluators and practitioners. But so long as credible evidence is genuinely valued by the commissioning agency, I see this kind of deep collaboration as far more promising than the kind of “external evaluation consultant” arrangements that tend to be used currently. This idea is what a previous post was all about (link).

  2. Contractors and USAID staff should be rewarded for being associated with impact evaluations that demonstrate how, in faithfully implemented programs, outcomes differ, sometimes for the worse, from expectations.


Practitioners, academics, and development impact evaluation

The Micro-Links forum is hosting a three-day online discussion on “Strengthening Evaluation of Poverty and Conflict/Fragility Interventions” (link). Based on my own experience trying to pull off these kinds of evaluations, I wanted to chime in with some points about the nature of academics’ involvement in impact evaluations. A common approach to impact evaluation is for implementing organizations to team up with academics. The nature of the relationship between practitioners and academics is a crucial aspect of the evaluation strategy. Let me propose this: for impact evaluations to work, it’s good to have everyone in the program understand that they are involved in a scientific project that aims to test ideas about “what works.”

A way that I have seen numerous impact evaluations fail is through a conflict that emerges between practitioners, who understand their role as helping people regardless of whether the manner in which they do so is good for an evaluation, and academics, who see themselves as tasked with evaluating what the program staff are doing while having little stake in the program themselves. This “division of labor” might seem rational, but I have found that it introduces tension and other obstacles.

An arrangement that seems to work better is for practitioners and academics to all see themselves as engaged together in (i) the conceptualization of the program itself; (ii) the elaboration of the details about how it should be implemented; and (iii) the methods for measuring impact. All three steps should harness both practitioners’ and academics’ technical and substantive knowledge. What does not seem to work very well is to try to force a division between tasks (i) and (ii) on the one hand, which are often “reserved” for practitioners to use their substantive knowledge, and (iii) on the other hand, which is reserved for academics to use their technical knowledge. Oftentimes steps (i) and (ii) will have been worked out among practitioners from the commissioning and implementing agencies, and then academics will be consulted to realize step (iii). In my experience, this process rarely works out. There is a logic that needs to flow from the conceptualization of the program to the measurement of impacts, and this requires that academics and practitioners work together from square one, conceptualizing “what needs to be done” by the program all the way to designing a means for determining “whether it worked” to produce the desired change. To put it another way, impact evaluations are better when they are seen less as technical exercises tacked onto existing programs and more as rich, substantive exercises to design programs for testing ideas and discovering how to bring about the desired change.

For the academics, a fair representation of this might be for key actors involved in the implementation of the program to be considered co-authors on at least some of the studies that emerge from the program; indeed, the assumptions and hypotheses on which programs are based are often drawn largely from the experiences and thoughts of program staff, and their input should be acknowledged. Along similar lines, practitioners should appreciate that when academics are engaged, their substantive expertise is a resource that should be tapped in developing the program itself, and that there needs to be a logic that holds between the design of the program itself and the evaluation. For all, this is a deeper kind of interaction between academics and practitioners than is often the case, though I should note that, e.g., Poverty Action Lab/IPA projects often operate with such richness and nuance.
