Close elections in Africa: some important trends

With the Cote d’Ivoire election crisis having moved toward resolution, there is a lot of discussion about how to deal with the challenges posed by close, contested elections. For example, Knox Chitiyo provides a great analysis in a BBC report (link), emphasizing the need for creating “higher, independent judicial” bodies “to resolve post-electoral disputes”, and noting how international support for Ouattara in Cote d’Ivoire suggests a turn away from “the power-sharing default setting” that informed the approach to the recent election crises in Kenya (2008) and Zimbabwe (2008-9). Now is the time to think about a whole range of measures that can be used to minimize uncertainty about the validity of vote counts, and commit candidates to accepting validated results.

I thought I’d look at some data to help put the recent Cote d’Ivoire crisis into context. Conveniently enough, Staffan Lindberg at the University of Florida has provided a freely available African elections dataset covering 1969-2007 (link). The graphics posted below display some trends, using only data since 1980 (the pre-1980 data are quite patchy).

Figure 1 shows that close elections are increasingly the norm in Africa. The figure shows margins of victory for executive offices in both presidential and parliamentary elections since 1980. A trend line with error margins is overlaid (based on a loess fit). Whereas landslides were the norm prior to 1990, close elections have become increasingly common since then. Interestingly, in very recent years, there cease to be any “near 100%” margin-of-victory elections. This may reflect the effects of increased citizen awareness and activism, given that such outcomes are incompatible with a free and fair electoral process when there is any modicum of pluralism.
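For readers who want to reproduce this kind of plot from Lindberg’s dataset, here is a minimal sketch. The file name and the columns “year” and “margin” (the winner’s margin of victory) are my own assumptions, and the error band in Figure 1 (omitted here) would require a smoother that reports standard errors or a bootstrap.

```python
# A hedged sketch of the Figure 1 trend: scatter of margins of victory with a
# lowess smooth overlaid. File and column names are illustrative assumptions.
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

df = pd.read_csv("african_elections.csv")            # hypothetical CSV export
df = df[df["year"] >= 1980].dropna(subset=["margin"])

# Locally weighted regression (lowess) trend of margins over election years.
trend = sm.nonparametric.lowess(df["margin"], df["year"], frac=0.5)

plt.scatter(df["year"], df["margin"], alpha=0.4, label="elections")
plt.plot(trend[:, 0], trend[:, 1], color="red", label="loess trend")
plt.xlabel("Election year")
plt.ylabel("Margin of victory (%)")
plt.legend()
plt.show()
```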

Figure 2 shows the flip side of the same coin, displaying trends in executive incumbent losses and resulting turnover since 1980. Consistent with what Figure 1 shows in terms of margins of victory, elections have become more competitive, with the proportion of elections resulting in executive turnover having risen from almost zero to about 25% as of 2007.

Figure 3 looks at how margins of victory and incumbent losses relate. In a free and fair system, there should be a systematic relationship between the two. Namely, a margin of victory of about zero suggests a tie between the two front-runners. In such cases, assuming that the two front-runners have equal resources, each front-runner should have about a 50% chance of winning. In nearly all of the elections in these data, one of the front-runners is an incumbent or incumbent-party candidate. Thus, in close elections, we should see about a 50-50 split in whether or not there is an incumbent loss and consequent turnover of executive power.

Figure 3 shows that, overall since 1980, this has not quite been the case: incumbents and incumbent parties have lost only 40% of the time. However, when we break this out over time, we see that the pattern is converging to the expected outcome in fair elections. In 1980-1990, margins of victory were never close to zero, and so this phenomenon was unobservable. In 1990-2000, we see that there were many close elections, but that the outcomes were dominated by incumbents. Notice that, with the exception of Niger 1993, the elections clumped very close to zero are almost all incumbent victories. This is suggestive of some electoral shenanigans. A possible story is that some portion of these elections were due to be incumbent losses, but that some form of fraud was perpetrated by incumbents (who, after all, are in a position of strength to do so) to ensure that the loss did not occur. This is just a conjecture, though. Note that such signs of incumbent advantage in close elections are not unique to Africa. For some US examples, see this past post (link).

But as the third graph in Figure 3 shows, in the past decade, the pattern expected of fair elections is evident. The predicted probability of an incumbent loss when the margin of victory is zero is 0.48, which is almost exactly the 0.50 that one would expect.
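For concreteness, here is one way a predicted probability like that could be computed, via a simple logistic fit of incumbent losses on margins of victory for the 2000-2007 elections. The file and column names, and the choice of a logit rather than whatever smoother underlies Figure 3, are my own assumptions.

```python
# A hedged sketch of the predicted probability of an incumbent loss at a zero
# margin of victory. Variable and file names are illustrative assumptions.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("african_elections.csv")             # hypothetical CSV export
recent = df[(df["year"] >= 2000) & (df["year"] <= 2007)]

# Logistic regression of a 0/1 incumbent-loss indicator on the margin of victory.
fit = smf.logit("incumbent_loss ~ margin", data=recent).fit()

# Predicted probability of an incumbent loss when the margin of victory is zero.
p_at_zero = fit.predict(pd.DataFrame({"margin": [0.0]}))[0]
print(round(p_at_zero, 2))   # the post reports roughly 0.48 for this period
```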

Although these data are too coarse to really allow us to tell what is going on, they do provide some reason for optimism about the effects of increased citizen awareness, increased opposition capacity building, and more benevolent international assistance in improving electoral outcomes in Africa.

Figure 1: Trends in margins of victory

Figure 2: Trends in incumbent losses and resulting executive turnover

Figure 3: Margins of victory and likelihood of incumbent turnover


(technical) comparing neyman & heteroskedastic robust variance estimators

Here (link) is a note working through some algebra for comparing the following:

  1. The Neyman “conservative” estimator for the variance of the difference-in-means estimator for the average treatment effect. This estimator is derived by applying sampling theory to the case of a randomized experiment on a fixed population or sample. Hardcore experimentalists might insist on using this estimator to derive the standard errors of a treatment effect estimate from a randomized experiment. This is also known as the conservative “randomization inference” based variance estimator.

  2. The Huber-White heteroskedasticity robust variance estimator for the coefficient from a regression of an outcome on a binary treatment variable. This is a standard-use estimator for obtaining standard errors in contemporary econometrics. Borrowing Freedman’s famous words, though, “randomization does not justify” this estimator.

If you work through the algebra some more, you will see that they are equivalent in balanced experiments, but not quite equivalent otherwise.
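For reference, here is the algebra being compared, written in my own notation (the linked note may set things up somewhat differently). Let n1 and n0 be the treated and control group sizes, s1^2 and s0^2 the within-group sample variances, and e_i the residuals from regressing the outcome on a constant and the treatment dummy:

```latex
% Neyman conservative estimator for the variance of the difference in means:
\widehat{V}_{\mathrm{Neyman}} = \frac{s_1^2}{n_1} + \frac{s_0^2}{n_0}

% Huber-White robust estimator for the treatment coefficient, in its basic form
% (the degrees-of-freedom-corrected variant multiplies this by n/(n-2)):
\widehat{V}_{\mathrm{robust}}
  = \frac{1}{n_1^2}\sum_{i:\,D_i=1}\hat{e}_i^2
  + \frac{1}{n_0^2}\sum_{i:\,D_i=0}\hat{e}_i^2
  = \frac{n_1-1}{n_1^2}\,s_1^2 + \frac{n_0-1}{n_0^2}\,s_0^2
```

With n1 = n0 = n/2, applying the usual n/(n-2) degrees-of-freedom correction to the second expression recovers the first exactly, which is the balanced-design equivalence just described.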

This post is part of a series of little explorations I’ve been doing into variance estimators for treatment effects. See also here, here, and here.

UPDATE 1 (4/8/11): A friend notes that under a balanced design, the homoskedastic OLS variance estimator is also algebraically equivalent. When the design is not balanced, the homoskedastic and heteroskedasticity robust estimators can differ quite a bit, with the latter being closer to the Neyman estimator, but still not equivalent to it due to the manner in which treated versus control group residuals are weighted.

UPDATE 2 (4/12/11): The attached note is updated to carry through the algebra showing that the difference between the two estimators is very slight.

UPDATE 3 (4/12/11): A reader pointed out via email that this version of the heteroskedasticity robust estimator is known as “HC1”, and that Angrist and Pischke (2009) have a discussion of alternative forms (see Ch. 8, especially p. 304). From Angrist and Pischke’s presentation, we see that HC2 is exactly equivalent to the Neyman conservative estimator, and this estimator is indeed available in, e.g., Stata.
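As a quick numerical illustration of this point (a simulation sketch, not the algebra in the note), the following compares the hand-computed Neyman estimator with statsmodels’ HC1 and HC2 variance estimates for the treatment coefficient:

```python
# Compare Neyman's conservative variance estimator with the regression-based
# HC1 and HC2 estimators on simulated data from a simple two-arm experiment.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n1, n0 = 60, 40                        # unbalanced on purpose; set equal to see HC1 match
y1 = rng.normal(1.0, 2.0, n1)          # treated outcomes
y0 = rng.normal(0.0, 1.0, n0)          # control outcomes

# Neyman conservative estimator: s1^2/n1 + s0^2/n0
v_neyman = y1.var(ddof=1) / n1 + y0.var(ddof=1) / n0

# Regression of the outcome on a constant and the treatment dummy
y = np.concatenate([y1, y0])
d = np.concatenate([np.ones(n1), np.zeros(n0)])
X = sm.add_constant(d)
v_hc1 = sm.OLS(y, X).fit(cov_type="HC1").bse[1] ** 2
v_hc2 = sm.OLS(y, X).fit(cov_type="HC2").bse[1] ** 2

print(v_neyman, v_hc1, v_hc2)   # HC2 matches Neyman exactly; HC1 only when n1 == n0
```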

UPDATE 4 (4/12/11): Another colleague pointed out (don’t you love these offline comments?) that the Neyman conservative estimator typically carries an N/(N-1) finite sample correction premultiplying the expression shown in the note, in which case even in a balanced design the estimators would differ on the order of 1/(N-1). I later discovered that this was not true.


(technical) imputation, ipw, causal inference (Snowden et al, 2011, with discussion)

In the advance access pages of the American Journal of Epidemiology, Jonathan M. Snowden, Sherri Rose, and Kathleen M. Mortimer have a nice tutorial on what they refer to as “G-computation” for causal inference with observational data (ungated link). An average causal effect for a binary treatment can be defined as the average of individual-level differences between the outcome that obtains when one is in treatment versus in control. Because people are either in treatment or control, one of these two “potential” outcomes is unobserved, or missing (within-subjects designs do not overcome this, because the ordering of treatment assignment is itself another dimension of treatment). Given this “missing data” problem, G-computation refers to fitting models to the available data that allow you to impute (i.e., predict) the unobserved counterfactual values. You can then use this complete set of counterfactual values to estimate various types of causal effects. The idea isn’t so new or groundbreaking, but many theoretical insights have been elucidated only recently. Snowden et al’s presentation focuses on effects that average over the entire population.
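To fix ideas, here is a minimal sketch of the imputation logic on simulated data. The variable names, the simple linear outcome model, and the data-generating process are my own illustrative assumptions, not anything from the paper.

```python
# A minimal sketch of the G-computation steps described above, on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
w = rng.normal(size=n)                                    # baseline covariate
a = rng.binomial(1, 1 / (1 + np.exp(-w)))                 # treatment depends on w
y = 1.0 * a + 0.5 * w + 0.5 * a * w + rng.normal(size=n)  # heterogeneous effect
df = pd.DataFrame({"y": y, "a": a, "w": w})

# Step 1: fit the outcome ("Q") model, allowing effect modification by w.
q_model = smf.ols("y ~ a * w", data=df).fit()

# Step 2: impute both potential outcomes for every subject.
y1_hat = q_model.predict(df.assign(a=1))
y0_hat = q_model.predict(df.assign(a=0))

# Step 3: average the imputed differences to estimate the average effect.
print((y1_hat - y0_hat).mean())   # near the true average effect of 1.0 here
```

Any predictive model can stand in for the OLS fit in step 1, which is where flexible approaches like the BART-based response surface modeling mentioned below come in.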

These authors don’t cite it in their paper, but I think the most sophisticated application of this approach for a cross-sectional study is Jennifer Hill’s “Bayesian Nonparametric Modeling for Causal Inference” study (gated link). Hill uses the magical BART algorithm to fit a response surface and generate the imputations, from which various flavors of causal effect might be constructed (okay, full disclosure—Hill was one of my grad school teachers, but hey, BART is pretty cool). I understand that there has been a fair amount of application, or at least beta-testing, of such counterfactual imputation methods in longitudinal studies as well, although I don’t have references handy.

This approach is especially appealing when you anticipate lots of measurable effect modification that you want to average over in order to get average treatment effects. Actually, I think Snowden et al’s article does a good job of demonstrating how, in some cases, it’s not classic confounding and omitted variable bias per se that is the major concern, but rather effect modification and effect heterogeneity (i.e., interaction effects) associated with variables that also affect treatment assignment. Traditional regression is clumsy in dealing with that. As far as I know, conventional social science teaching, rooted as it is in constant effects models, does not have a catchy name for this kind of bias; maybe we can call it “heterogeneity bias.” Another thing that makes this kind of bias special relative to the usual kinds of confounding is that, as far as I understand, imputation-based strategies (like G-computation) that try to correct for it may in fact take advantage of measured heterogeneity associated with post-treatment variables. That is one of the reasons that these methods have appeal for longitudinal studies. (On this point, I’ll refer you to a little tutorial that I’ve written on a related set of methods—augmented inverse propensity weighting for attrition and missing data problems (link).)

Stijn Vansteelandt and Niels Keiding provide an invited commentary (gated link) on Snowden et al’s paper, and they make some really interesting points that I wanted to highlight. To begin, they note that imputation-based strategies such as G-computation have a long history in association with the concept of “standardization.” More important are two points that they make later in their commentary. The first is one that Vansteelandt has made elsewhere, discussing the similarities and differences between imputation/standardization and inverse probability weighting:

The IPTW [inverse probability of treatment] approach is not commonly used in practice because of the traditional reliance on outcome-regression-based analyses, which tend to give more precise estimates. Its main virtue comes when the confounder distribution is very different for the exposed and unexposed subjects (i.e., when there is near violation of the assumption of the experimental treatment assignment), for then the predictions made by the G-computation approach may be prone to extrapolate the association between outcome and confounders from exposed to unexposed subjects, and vice versa. The ensuing extrapolation uncertainty is typically not reflected in confidence intervals for model-based standardized effect measures based on traditional outcome regression models, and thus the IPTW approach may give a more honest reflection of the overall uncertainty (provided that the uncertainty resulting from estimation of the weights is acknowledged) (19). A further advantage of the IPTW approach is that it does not require modeling exposure effect modification by covariates and may thus ensure a valid analysis, even when effect modification is ignored.


I think this is an exceptionally important point, making clear that the apparent “inefficiency” of IP(T)W relative to imputation-based methods is, in some sense, illusory. Vansteelandt and Keiding also discuss one approach to combining imputation and IPW in order to get the best of both worlds:

We here propose a compromise that combines the benefits of G-computation/model-based standardization and of the IPTW approach. Its implementation is not more difficult than the implementation of these other approaches. As in the IPTW approach, the first step involves fitting a model of the exposure on relevant covariates; this would typically be a logistic regression model. The fitted values from this model express the probability of being exposed and are commonly called “propensity scores.” They are used to construct a weight for each subject, which is 1 divided by the propensity score if the subject is exposed and 1 divided by 1 minus the propensity score if the subject is unexposed. The second step involves fitting a model, the Q-model, for the outcome on the exposure and relevant covariates but using the aforementioned weights in the fitting procedure (e.g., using weighted least squares regression). Once estimated, the implementation detailed in the article by Snowden et al. (4) is followed; that is, counterfactual outcomes are predicted for each observation under each exposure regimen by plugging a 1 and then subsequently a 0 into the fitted regression model to obtain predicted counterfactual outcomes. Finally, differences (or ratios) between the average predicted counterfactual outcomes corresponding to different exposure regimens are calculated to arrive at a standardized mean difference (or ratio) (see reference 19 for a similar implementation in the context of attributable fractions). We refer to this compromise approach as doubly robust standardization. Here, the name doubly robust expresses that doubly robust standardized effect measures have 2 ways to give the right answer: when either the Q-model or the propensity score model is correctly specified, but not necessarily both.


This approach has been demonstrated elsewhere—e.g., a recent paper by Vansteelandt and co-authors in the journal Methodology (ungated version, gated published). I am intrigued by this because it differs from the manner in which I have implemented doubly robust estimators that combine weighting and imputation (again, see link). I wonder if there is a difference in practice.
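Here is a minimal sketch of the quoted recipe as I read it, with the plain IPTW estimator from the earlier quote computed alongside for comparison. This is my own rendering on simulated data with illustrative names, not Vansteelandt and Keiding’s code.

```python
# A hedged sketch of "doubly robust standardization": fit a propensity model,
# use the inverse-probability weights when fitting the Q-model, then standardize.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
w = rng.normal(size=n)                                    # baseline covariate
a = rng.binomial(1, 1 / (1 + np.exp(-w)))                 # exposure depends on w
y = 1.0 * a + 0.5 * w + 0.5 * a * w + rng.normal(size=n)  # heterogeneous effect
df = pd.DataFrame({"y": y, "a": a, "w": w})

# Step 1: propensity score model and inverse-probability-of-treatment weights.
ps = smf.logit("a ~ w", data=df).fit(disp=0).predict(df)
wt = np.where(df["a"] == 1, 1 / ps, 1 / (1 - ps))

# Plain IPTW (for comparison): weight-normalized difference in mean outcomes.
t = df["a"] == 1
ate_iptw = (np.average(df.loc[t, "y"], weights=wt[t])
            - np.average(df.loc[~t, "y"], weights=wt[~t]))

# Step 2: fit the Q-model by weighted least squares using those weights.
q_model = smf.wls("y ~ a * w", data=df, weights=wt).fit()

# Step 3: standardize -- predict under a = 1 and a = 0 for everyone, then average.
y1_hat = q_model.predict(df.assign(a=1))
y0_hat = q_model.predict(df.assign(a=0))
ate_dr = (y1_hat - y0_hat).mean()

print(ate_iptw, ate_dr)   # both near the true average effect of 1.0 here
```

If I understand the proposal correctly, the difference from the augmented IPW implementations mentioned above is that the weights enter the Q-model fitting step itself rather than an explicit augmentation term.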


race, belonging, & responses to adversity (Walton et al, forthcoming)

Findings from a paper by Walton and Cohen (link), forthcoming in Science, have very intriguing implications for how differences in racial groups’ past life experiences affect how they interpret present adversity, with consequences for motivation and success:

“We all experience small slights and criticisms in coming to a new school” said Greg Walton, an assistant professor of psychology whose findings are slated for publication in the March 18 edition of Science. “Being a member of a minority group can make those events have a larger meaning,” Walton said. “When your group is in the minority, being rejected by a classmate or having a teacher say something negative to you could seem like proof that you don’t belong, and maybe evidence that your group doesn’t belong either. That feeling could lead you to work less hard and ultimately do less well.”


The paper presents results from a social experiment in which,

Those in the treatment group read surveys and essays written by upperclassmen of different races and ethnicities describing the difficulties they had fitting in during their first year at school. The subjects in the control group read about experiences unrelated to a sense of belonging…The test subjects in the treatment group were then asked to write essays about why they thought the older college students’ experiences changed. The researchers asked them to illustrate their essays with stories of their own lives, and then rewrite their essays into speeches that would be videotaped and could be shown to future students. The point was to have the test subjects internalize and personalize the idea that adjustments are tough for everyone.


Outcomes:

The researchers tracked their test subjects during their sophomore, junior and senior years. While they found the social-belonging exercise had virtually no impact on white students, it had a significant impact on black students….[G]rade point averages of black students who participated in the exercise went up by almost a third of a grade between their sophomore and senior years. And 22 percent of those students landed in the top 25 percent of their graduating class, while only about 5 percent of black students who didn’t participate in the exercise did that well. At the same time, half of the black test subjects who didn’t take part in the exercise were in the bottom 25 percent of their class. Only 33 percent of black students who went through the exercise did that poorly….[T]he black students who were in the treatment group reported a greater sense of belonging…They also said they were happier and were less likely to spontaneously think about negative racial stereotypes. And they seemed healthier: 28 percent said they visited a doctor recently, as compared to 60 percent in the control group.


Of course we need to be careful in drawing conclusions about the “effects” of race, for reasons that have been discussed at length by proponents of the “manipulability” theory of causation (link, a theory that I find persuasive). In brief, race was not subject to experimental manipulation here. Our willingness to believe this interpretation of the results is based on plausible theoretical claims; it does not arise cleanly from the experimental design. Nonetheless, the results are quite suggestive of how one’s experience as a member of a stigmatized group can affect how one interprets adversity.

HT: Kim Yi Dionne (blog), The Situationist (link).


hidden confounding in survey experiments

I had the opportunity to participate in a fun seminar via Skype with faculty and students at Uppsala University’s Department of Peace and Conflict Research. We were discussing exciting new avenues for using experimental methods to study the microfoundations of conflict and IR theories. The discussion was led by Allan Dafoe (Berkeley, visiting at Uppsala), who is doing really interesting work on reputation and strategic interaction (link).

An interesting point about “hidden confounding” came up that I don’t think gets enough play in analyses of survey experiments, so I thought I’d relay it here as a reference and also to see whether others have any input. A common approach in a survey experiment is to provide subjects with hypothetical scenarios. The experimental treatments then consist of variations on the content of the scenarios.

What makes this kind of research so intriguing is that it would seem that you can obtain exogenous variation in circumstances that rarely obtains in the real world. Thus, if your experiment involves a scenario about an international negotiation over a dispute, you could vary, say, the regimes from which the negotiators come in a manner that does not occur frequently in the real world.

The problem is that subjects come to a survey experiment with prior beliefs about “what things go with what”—that is, about how salient features correlate. In our example, people will tend to associate regime types with things like national wealth or region. In that case, by manipulating the negotiators’ regime types in the experiment, you are implicitly changing people’s beliefs about other features of the countries from which the negotiators come. You can try to hold these things “constant”—e.g., by having one treatment where negotiator A comes from a “rich democracy” and another where negotiator A comes from a “rich dictatorship”—but to the extent that you are creating a scenario that departs from what typically occurs in the real world, you might be causing the subject to wonder whether we are talking about some “unusual” circumstance. If so, the subject might apply a different evaluative framework than what the subject would apply to “usual” circumstances. Thus, you are obtaining a causal estimate that is dependent on the frame of reference, which may not be generalizable.
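To put the concern slightly more formally (the notation here is mine, and it anticipates the condition Allan states in the update below): let X be the manipulated scenario feature and Z the respondent’s beliefs about other relevant features. Then what the two scenario versions actually compare is something like the following:

```latex
% The scenario contrast mixes the direct effect of X with shifts in beliefs about Z:
\sum_{z} \Pr(z \mid X = x_1)\,\mathbb{E}[Y \mid X = x_1, Z = z]
  \;-\; \sum_{z} \Pr(z \mid X = x_2)\,\mathbb{E}[Y \mid X = x_2, Z = z]

% This reduces to a beliefs-held-constant comparison of x_1 versus x_2 only when
\Pr(z \mid X = x_1) = \Pr(z \mid X = x_2) \quad \text{for all } z.
```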

It’s a bit thorny, so what are the solutions? Ironically, it seems to me that one solution would be to focus the experiment on treatments that are “plausibly exogenous.” One could focus on conditions that respond easily to choices, and where choices in either direction are conceivable. Or, one could focus the experiment on things that can vary randomly—like the weather, most famously. I find this ironic because it seems that the survey experiment doesn’t get us very far from what we attempt to do with natural experiments. It would seem that the sweet spot for survey experiments would be things that we are pretty sure could occur as a natural experiment, but either haven’t occurred often enough or haven’t been measured, in which case we can’t just study the natural experiment directly. Applying this rule would greatly limit the areas of application for survey experiments, but I think this formula would result in survey experiments that have more credible causal interpretations.

(By the way, Allan clued me into a discussion of this very point in a current working paper by Michael Tomz and Jessica Weeks: link.)

UPDATE: Allan provided this initial reaction:

I actually think the problem with survey experiments is a bit worse than you describe. It’s not just that confounding can be avoided in survey experiments by focussing on those factors that are plausibly manipulable; one has to vary factors that are in the population typically uncorrelated with other factors of interest, given the scenario. That is, one wants that the respondents believe Pr(Z|X1)=Pr(Z|X2) where X1 and X2 are two values of the treatment condition, and Z is any other factor of potential relevance that is not a consequence of treatment. For example, the decision of whether the US should stay in Afghanistan (X1) is plausibly manipulable and could plausibly go either way; Obama could decide to leave (X2). But even though such a counterfactual is plausible and could involve a hypothetical manipulation, we are unlikely to believe that Pr(Z|X1)=Pr(Z|X2), where Z could be the domestic support for war, or the strength of the US economy, or the resilience of the Taliban. So perhaps this implies that the only treatments that will not generate information leakage are either (1) those that are exogenous to begin with in the world (which are thus relatively easy to study using observational data), or (2) those that provide a compelling hypothetical natural experiment to account for the variation. So in this sense—perhaps I am actually just restating your main point—survey experiments only generate clear causal inferences if the key variation arises from a credible (hypothetical) natural experiment.