Over at the Social Science Statistics blog, Richard Nielsen riffs (link) on what is probably the biggest threat to valid inference in political science (I can’t speak for other social sciences, but I wouldn’t be surprised if the situation is similar): the need to demonstrate that something, anything, in your empirical analysis “significantly” departs from some null hypothesis. The recent article by Gerber et al. (2010; linked in Richard’s post) is remarkable in revealing how this insidious norm manifests itself in the discipline’s publications, affecting the “most influential and highly cited outlets” most of all.
The fact is, having “stars in your regression table” is still pretty much a sine qua non for publication, which means that coefficients and effect estimates published in political science articles are systematically biased away from the null (see the amusing discussion by Gelman and Weakliem on this point: link).
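To see the mechanics, here is a toy simulation of my own devising (the true effect, sample size, and 0.05 cutoff are arbitrary assumptions, not taken from any of the linked pieces). It generates many small studies of the same modest effect and then “publishes” only the significant ones:

```python
# Toy simulation (illustrative assumptions only): condition "publication"
# on p < 0.05 and see how the published estimates compare to the truth.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

true_effect = 0.1      # modest true effect
n = 100                # observations per study
n_studies = 10_000     # studies attempted

published = []
for _ in range(n_studies):
    # each "study" estimates a mean effect from n noisy observations
    sample = rng.normal(loc=true_effect, scale=1.0, size=n)
    t_stat, p_value = stats.ttest_1samp(sample, popmean=0.0)
    if p_value < 0.05:                 # the stars-in-the-table filter
        published.append(sample.mean())

print(f"true effect:                 {true_effect:.3f}")
print(f"mean published estimate:     {np.mean(published):.3f}")
print(f"share of studies published:  {len(published) / n_studies:.1%}")
```

With these settings, only a minority of studies clear the filter, and the ones that do overstate the true effect severalfold; that is the bias-away-from-the-null problem in miniature.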
Of course, there must be some standard by which quantitative papers are judged worthy of publication. Right now, an almost necessary condition is that the paper found something that departs “significantly” from the null. It’s easy to see why this would be used as a criterion: it would seem to ensure that only “remarkable” research is published. But this view fails to appreciate the strategic incentives that such a criterion creates. It leads people to manipulate their findings, or to keep searching until they find the required significance, in order to get published. Once that happens, “significance” loses much of its value as a signal.
Is statistical significance the best criterion among feasible alternatives? In my view, no. Papers should instead be judged on whether the research question is important (though that’s a hard one to judge) and well posed (slightly easier), and then on whether the design and methods are rigorous (a bit easier still, since there are technical criteria). The Experiments in Governance and Politics (EGAP) working group is trying to make a push in this direction by providing a CONSORT-style (link) registry for new field experiments (link).
So, how about a publication review protocol in which reviewers see only the research question and motivating material, and then the design and methods, without being given access to any of the findings? Reviewers would decide on publication from that alone. Once the decision was made, the full paper would be submitted and revisions could be called for. I know that in the age of working papers on the internet, word about findings will often have percolated already, but this would at least establish a norm that such information should be ignored.
I just proposed the same idea to someone a few days ago. I would go even further, though: the journal should have to decide before the data are collected. That way the decision is fully blinded for everyone. Otherwise you’re still likely to get publication bias through leakage of results and authors not wanting to publish nulls.
Brendan: Your idea is better in principle. My reason for shying away from it was that I thought it might be too radical a departure from current practice, in which researchers are essentially allowed to carry out whatever work they want and then venue-shop for a place to publish it, knowing that where they land will likely depend on whether they can demonstrate “significance.” The approach you suggest is appealing in many ways, but I imagine resistance: some would interpret it as letting editorial boards determine what counts as “worthy” research, and it could therefore carry a status quo bias of its own. This raises a whole host of interesting issues, but there is certainly a strong case that current practices undermine the credibility of political science as a science, and that some revision is necessary. In a nutshell, because published political science results as a whole are biased away from null hypotheses, as a discipline we know a lot less than we profess to know through our journals. Let’s try to change that.
It’s definitely radical, but it seems appropriate in an area like medicine, where clinical trials are often a matter of life and death (what you do might fit that description too). The proposal could include a script, run on fake data, detailing the analyses to be conducted and how the results would be interpreted. It would definitely be tough on researchers; you could even argue it would shatter confidence in science as we find out exactly how much doesn’t work. On the other hand, it would make even null results publishable in top journals, so there is some upside for everyone.
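For concreteness, here is a minimal sketch of what such a pre-results script might look like, run on fake data before anything real is collected. The variable names, sample size, estimator, and standard-error choice below are illustrative assumptions on my part, not a template from any actual registry:

```python
# Hypothetical pre-analysis script: the full analysis runs end-to-end on
# simulated placeholder data, so the design is fixed before results exist.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)

# --- Fake data standing in for the yet-to-be-collected experiment ---
n = 500
treatment = rng.integers(0, 2, size=n)   # planned random assignment (0/1)
covariate = rng.normal(size=n)           # one pre-specified covariate
outcome = rng.normal(size=n)             # placeholder outcome, no built-in effect

# --- Pre-specified estimator: OLS of outcome on treatment and covariate ---
X = sm.add_constant(np.column_stack([treatment, covariate]))
model = sm.OLS(outcome, X).fit(cov_type="HC2")   # robust SEs chosen in advance

# --- Pre-specified reporting rule, committed to before results are known ---
estimate = model.params[1]
ci_low, ci_high = model.conf_int()[1]
print(f"treatment effect: {estimate:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
# The plan: report the estimate and interval as-is, whether or not the
# interval excludes zero.
```

The point is not this particular model but that authors and reviewers would be bound to an analysis and an interpretation rule before the first real observation arrives.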
PS One way to incentivize this would be to make submission to a pre-results journal (or whatever you want to call it) a plus factor in awarding NSF/NIH grants.