Here’s a great quote from Pearl and Bareinboim (2014, p.2) [link] in their analysis of “external validity” and conditions that allow for one to transport the results of a causal analysis from one context to another:
[The literature on external validity] consists primarily of threats, namely, explanations of what may go wrong when we try to transport results from one study to another while ignoring their differences. Rarely do we find an analysis of “licensing assumptions,” namely, formal conditions under which the transport of results across differing environments or populations is licensed from first principles.
The reasons for this asymmetry are several. First, threats are safer to cite than assumptions. He who cites “threats” appears prudent, cautious and thoughtful, whereas he who seeks licensing assumptions risks suspicions of attempting to endorse those assumptions.
Second, assumptions are self destructive in their honesty. The more explicit the assumption, the more criticism it invites, for it tends to trigger a richer space of alternative scenarios in which the assumption may fail. Researchers prefer therefore to declare threats in public and make assumptions in private.
Third, whereas threats can be communicated in plain English, supported by anecdotal pointers to familiar experiences, assumptions require a formal language within which the notion “environment” (or “population”) is given precise characterization, and differences among environments can be encoded and analyzed.
There are so many truths in there that extend beyond research on external validity.
A working paper by UMich grad student Jason Kerwin considers how “fatalistic” thinking can lead even rational individuals to increase risky behavior when they learn that risks are higher than they thought. It sounds crazy, but the investigation is motivated by some interesting empirical patterns, including the fact that in a recent survey in Malawi, overestimation of HIV risks from unprotected sex was associated with higher rates of engagement in unprotected sex. How can this be? Kerwin crystallizes the logic as follows. Suppose that you tell a potential risk taker that the true risk of contracting HIV is higher than they thought. Well,
a change in the per-act risk affects not only the marginal cost of the acts the agent is deciding over, but also a stock of previously-chosen acts over which one no longer has any control. If an agent’s perceived per-sex-act risk of contracting HIV rises, this has a direct effect of increasing the marginal cost (in expected utility) of having more risky sex. But it also increases the probability that the agent already has HIV, which decreases the marginal cost of more risky sex. When the second effect dominates, increases in perceived risks will lead to more risk-taking rather than less.
Kerwin develops the logic formally by modeling perceived risk of HIV infection in terms of a cumulative distribution function that incorporates not only the next act in question, but all past acts. Such CDFs typically have inflection points and become concave in their upper reaches. So, an upward shock to someone’s belief about where they currently stand can shrink the added risk from one more act relative to any benefits whose value is unaffected by the belief shock. When this occurs, the effect of increasing perceived risk is to increase the attractiveness of the risky behavior.
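To see the two effects pull against each other, here is a minimal numerical sketch. The assumptions are mine, not Kerwin's: every act carries the same independent per-act risk p, the agent has a fixed number of past acts (n_past, a hypothetical value), and the "marginal cost" is just the added probability of infection from one more act. The function names are made up for illustration.

```python
def prob_already_infected(p: float, n_past: int) -> float:
    """Chance that at least one of n_past acts led to infection,
    assuming independent acts with constant per-act risk p."""
    return 1.0 - (1.0 - p) ** n_past


def marginal_risk(p: float, n_past: int) -> float:
    """Added infection probability from one more act: the agent must have
    escaped infection in all past acts, then be infected by the new one."""
    return p * (1.0 - p) ** n_past


if __name__ == "__main__":
    n_past = 50  # hypothetical number of past risky acts
    for p in (0.001, 0.01, 0.02, 0.05, 0.10, 0.20):
        print(
            f"p={p:.3f}  "
            f"already infected={prob_already_infected(p, n_past):.3f}  "
            f"marginal risk={marginal_risk(p, n_past):.5f}"
        )
```

Under these simplifying assumptions the marginal risk p(1-p)^n rises with p only up to p = 1/(n+1) and falls after that, because the growing chance of already being infected swamps the direct effect. That is the "second effect dominates" case in the quote, in toy form.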
Paper here: link
I’ve found myself explaining these things to colleagues a few times in the past month, with the response always being “wow, that makes a lot of sense — I never considered that” — so I thought I’d try to share more broadly here.
After attending lots of conferences and workshops, I have learned what formats actually work to produce meaningful discussions of papers and thus useful feedback for presenters. I am not alone in this — everyone, and I mean everyone, with whom I have participated under these formats agrees with this sentiment. The ideas aren’t my own but rather inherited from colleagues who participate in EGAP (link) and CAPERS (link) and who also come together on occasion at the big conferences like APSA and MPSA.
In my home discipline, political science, conferences and workshops are usually organized around the following two “traditional” formats:
- Short format panel presentations with discussant: typically a 1-2 hour session with authors of 3-5 papers given about 15 minutes each to present their papers, followed by a discussant or two providing summary comments on all of the papers, followed by floor Q&A.
- Long format presentation with discussant: typically about a 1 hour session with author of paper given about 30 min to present a paper followed by discussant comments and then floor Q&A.
I don’t know anyone who likes format 1, mostly because of the ridiculous way that the discussant role is defined and the fact that the Q&As tend to jump around between papers rather than following any intellectual progression. Format 2 makes sense for really big plenary type talks, but when the group is smaller, it’s a highly inefficient way to use an hour if the goal is actually to engage with the material.
Here are some alternative formats that tend to generate much deeper discussions in the same amount of time:
- Short format panel redux: Take the session and divide the time evenly into blocks, one for each paper. So a 2 hour session with five papers has 24 minutes per paper. Then, in each block, the presenter starts by giving their presentation for a bit less than half the time, followed immediately by the discussant for that paper and then immediately by floor Q&A for that paper. Also, instead of one discussant for all papers, a nice thing to do is to have a “discussant round robin”: each paper presenter serves as discussant for someone else’s paper. You can use the round robin to actually assign discussants in a way that emphasizes overlapping interests. We used this on all of my MPSA panels this year and it was SO MUCH BETTER! If you are really organized, an even better thing to do is to coordinate in advance with both the panel participants and those who will attend in the audience. Among that group, you commit to read the papers before the panel. Then, you can actually skip the paper presentation altogether and instead lead with the discussant, who provides a short summary of the paper followed by comments to get a conversation going, altogether taking less than 10 minutes. Then you have a good chunk of time for an open discussion of the paper. This is the way to really get a lot out of 24 minutes. It is also a miniature version of the “no presentation” long format, to which I now turn.
- The “no presentation” long format: This is the best way that I have experienced to have a deep discussion of new academic work. EGAP, CAPERS, and NEWEPS (link) are organized around this format. It requires that all those attending the workshop/conference do some homework before arriving. The format is simple: there is no author presentation, rather there is simply an entire hour devoted to having a discussion with the author about the paper, which everyone has read in advance of the meeting! You can have a discussant who serves to “get the ball rolling” by providing a really short summary of the paper and offers some starting comments or questions. That’s how CAPERS and NEWEPS work, although EGAP doesn’t even do that. You might think that this format will tend to result in a bunch of people sitting in silence for an hour. But I can tell you that has never happened. Sometimes it takes a little time for the discussion “momentum” to build, but when it does it is always energetic and there is always the feeling that we wished we had more time to discuss (that’s a sign of a good discussion!). The format is fueled by a strong ethic among the group of reading and critically engaging with papers prior to arriving at the meeting.
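For anyone organizing a panel along the lines of the short format redux, the time split and the discussant round robin are easy to sketch. This is only an illustration: the function names and presenter labels are hypothetical, and the simple cyclic shift is my stand-in for whatever matching on overlapping interests you would actually do (ordering the list so that neighbors share interests gets you part of the way).

```python
from typing import Dict, List


def block_minutes(session_minutes: int, n_papers: int) -> float:
    """Evenly divide the session into one block per paper."""
    if n_papers < 1:
        raise ValueError("need at least one paper")
    return session_minutes / n_papers


def round_robin_discussants(presenters: List[str], shift: int = 1) -> Dict[str, str]:
    """Map each presenter's paper to a discussant drawn from the other
    presenters. A cyclic shift guarantees that nobody discusses their own
    paper and that everyone serves as discussant exactly once.
    Assumes presenter names are unique."""
    n = len(presenters)
    if n < 2:
        raise ValueError("need at least two presenters")
    if shift % n == 0:
        raise ValueError("shift must not be a multiple of the number of presenters")
    return {presenters[i]: presenters[(i + shift) % n] for i in range(n)}


if __name__ == "__main__":
    panel = ["A", "B", "C", "D", "E"]  # hypothetical presenters
    print(f"{block_minutes(120, len(panel)):.0f} minutes per paper")
    for author, discussant in round_robin_discussants(panel).items():
        print(f"paper by {author}: discussant {discussant}")
```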
Either of these formats benefits greatly from the following:
- Session chairs that are dynamic in promoting the discussion and managing time. Whereas the standard formats privilege presentations and leave floor discussion as an afterthought, these revised formats do the opposite. For that reason, the role of the chair is really important. The chair needs to scan the room actively and maintain a list of people wishing to raise a question or comment. The chair can also help to clarify questions or comments that paper authors misunderstand or address inadequately.
- Rules for managing the discussion. It is very useful to use what are known as the “one finger” and “two finger” rules. (I’m not sure where these rules originate, but I’ve seen them used in settings ranging from academic workshops to formal conferences at the United Nations.) The session chair manages a list of people who want to ask a question or make a comment to the author. To indicate to the chair that you want to be added to the list, you show one finger. The session proceeds with the chair going down the list, allowing each person to ask their question or make their comment, and then allowing the author to respond. But, if you want to contribute to the discussion at that moment (rather than waiting for your turn on the list), you signal to the chair at that moment with two fingers. The chair then has the option to suspend the list for the moment and take two-finger comments or questions. This is useful when people want to dig deeper on a point that is being discussed at the moment. When the session is nearing the time limit, the chair has the option to declare “no more two fingers” and even to tell the author to withhold any responses so that the list of one-finger questions and comments can be cleared. It might sound a little rigid, but the rules work really well in keeping the discussion lively and on track.
- Keeping it manageable and fun. For NEWEPS and CAPERS, we’ve established that we are going to limit things to four papers per meeting. That is the maximum that members of the working groups think that they can really commit to read, and read deeply, in advance of the meeting. So, NEWEPS and CAPERS are organized as semi-annual (Fall and Spring), four-paper meetings that kick off with lunch, followed by the four sessions (with a short break in the middle), and then end with a group dinner. That makes it a manageable, engaging, and fun format.
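If it helps to see the one-finger and two-finger rules as a procedure, here is a rough sketch of the chair's list as two queues. The class and method names are invented for illustration, and I have simplified one thing: here two-finger follow-ups always go first, whereas in practice the chair decides whether to suspend the main list.

```python
from collections import deque
from typing import Optional


class DiscussionQueue:
    """Chair's speaking list under the one-finger / two-finger rules."""

    def __init__(self) -> None:
        self.one_finger = deque()  # main list, first come first served
        self.two_finger = deque()  # follow-ups on the point under discussion
        self.allow_two_finger = True

    def raise_one(self, name: str) -> None:
        """One finger: join the end of the main list."""
        self.one_finger.append(name)

    def raise_two(self, name: str) -> bool:
        """Two fingers: ask to jump in now. Returns False once the chair
        has declared 'no more two fingers'."""
        if not self.allow_two_finger:
            return False
        self.two_finger.append(name)
        return True

    def close_two_finger(self) -> None:
        """Near the time limit: stop follow-ups so the main list can clear."""
        self.allow_two_finger = False
        self.two_finger.clear()

    def next_speaker(self) -> Optional[str]:
        """Pending follow-ups first (a simplification), then the main list."""
        if self.two_finger:
            return self.two_finger.popleft()
        if self.one_finger:
            return self.one_finger.popleft()
        return None
```

A chair using this would call `raise_one` or `raise_two` as hands go up, `next_speaker` after each author response, and `close_two_finger` when time is running short.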
I find these revised formats to be so much better than the traditional formats that I actually feel sorry in situations where the traditional formats are still used.
For those involved in evidence review or meta analysis projects, I highly recommend a few (rather old, but still relevant) articles by Robert Slavin:
- Slavin, R.E. (1984). Meta-Analysis in Education: How Has It Been Used? Educational Researcher 13(8):6-15. [gated link]
- Slavin, R.E. (1986). Best-Evidence Synthesis: An alternative to meta-analytic and traditional reviews. Educational Researcher 15(9):5-11. [gated link]
- Slavin, R.E. (1995). Best evidence synthesis: An intelligent alternative to meta-analysis. Journal of Clinical Epidemiology 48(1):9–18. [gated link]
Slavin, who I take to be a strong proponent of good quantitative social research, makes some great points about how meta analysis has been misused in attempts to synthesize evidence on social programs. By my reading, what Slavin emphasizes is the fact that in very few cases will we have enough high quality and truly comparable studies for meta-analytic methods to be applied in a way that makes sense. That being the case, what we need is a scientifically defensible compromise between the typically unattainable ideal of a meta analysis on the one hand, and narrative reviews that have too little in the way of structure or replicability on the other.
Unfortunately, proponents of meta-analysis often suggest (see the literature that he cites) that only through the use of meta-analytic methods for effect aggregation and heterogeneity analysis can a literature review be considered as “scientific.” Those doing reviews are then faced with either doing a review that will be deemed “unscientific” or trying to apply meta analytic methods in situations where they shouldn’t be applied. Because of this pressure, we end up with reviews that compromise either on study quality standards or comparability standards so as to obtain a large enough set of studies to fulfill the sample size needs for conducting a meta analysis! These compromises are masked by the use of generic effect size metrics. This is the ultimate in the tail wagging the dog, and the result is a lot of crappy evidence synthesis (see the studies that he reviews in the 1984 article for examples). I’m inclined to view some attempts at applying meta analysis to development interventions (including some of my own!) in this light. See also the recent CBO review of minimum wage laws.
Slavin’s alternative is a compromise approach that replaces rote subservience to meta analysis requirements with scientific judgment in relation to a clear policy question. He recommends retaining the rigorous and replicable methods for searching the literature (including devising search strategies that attend to potential publication biases) that are the first stage of a meta analysis, but in a manner that is stringent in applying standards pertaining to study quality (internal validity) and relevance of the treatments (ecological validity) and gathers ample evidence to assess the applicability of results to the target context (external validity). The nature of the question will determine what is the “best evidence,” and the review should focus on such best evidence. From here, the method of synthesis and exploration of heterogeneity will depend on the amount of evidence available. It may be that only one or a few studies meet the stringent selection criteria, in which case the review should scrutinize the studies with reference to the policy questions at hand. In the very rare circumstance that many studies are available or in cases where micro data that capture substantial forms of heterogeneity are available, then statistical analyses may be introduced, but in a manner that is focused on addressing the policy questions (and not a cookbook application of homogeneity tests, random effects estimates, and so on). As Slavin writes, “[no] rigid formula for presenting best evidence synthesis can be prescribed, as formats must be adapted to the literature being reviewed” (1986, p. 9).
For most policy questions in development, we face a paucity of high quality studies and limited comparability between such studies. Abandoning the meta analysis model in favor of Slavin’s more realistic but no less principled approach seems like the right compromise.
Easy: Sander Greenland (link). I had read and worked with Bayesian methods for some time before encountering his work. (One of my dissertation committee members was kinda well-known in Bayesian methods: link.) As I saw it, the way Bayesian methods were developed in political science, my field of application, was mostly about building big latent variable models and then using a Bayesian approach mostly for reasons of computational practicality. That is, simulation methods were often easier than trying to characterize the likelihood, and so one just threw some non-informative priors at the problem to get the machinery going and then let JAGS or BUGS rip…. But my research didn’t really benefit from latent variable modeling. I also came to interpret the retreat to simulation as a case of people trying to fit models they didn’t understand, which struck me as suspect.* Bayesian methods, I concluded, simply weren’t for me.
Then I read this: PDF Gated. And this: PDF Gated. (The latter builds on Efron and Morris’s seminal work on Stein estimation: PDF1 PDF2). The papers are simple and accessible, but no less deep for that. Greenland performs a wonderful synthesis of various hierarchical, empirical Bayes, and “informative priors” Bayes methods. After reading these I eagerly consumed pretty much anything he wrote and I recommend that you do the same.
Also worth reading is Efron and Greenland’s recent Stat. Sci. exchange (as well as others’ comments, including Gelman’s): link.
*This isn’t to impugn all Bayesian latent variable modeling of course! Heck, I am working with some text analysis methods now that benefit greatly from Bayesian latent variable methods, or at least variational approximations.