At a recent talk on new methods for inverse probability weighting with missing data, I put up the following picture, which provoked consternation among a few people in the room:

### Tabarrok on false findings

Related to my previous post, I followed a link from Chris Blattman’s blog (link) to the Marginal Revolution blog, where Alex Tabarrok had posted a great discussion of “false findings” in science (link). The post was from 2005 and was prompted by John Ioannidis’s now well-known paper, “Why Most Published Research Findings Are False” (link). Tabarrok proposes that economics may be in better shape because its hypotheses tend to be motivated by theory, unlike the atheoretical, “see what sticks” hypothesis testing that, at a casual glance, seems to characterize other literatures.
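To make the argument concrete: in the simplest version of Ioannidis’s model (no bias, a single research team), the probability that a statistically significant finding is true is $latex \text{PPV} = (1-\beta)R/(R-\beta R+\alpha)$, where $latex R$ is the pre-study odds that a tested relationship is real, $latex \alpha$ the significance level, and $latex 1-\beta$ the power. Better theoretical motivation amounts to a larger $latex R$. Below is a minimal Python sketch; the function name `ppv` and the two values of $latex R$ are illustrative assumptions of mine, not figures from Tabarrok’s or Ioannidis’s work.

```python
def ppv(prior_odds: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Positive predictive value of a 'significant' result, following the
    no-bias, single-team version of Ioannidis's (2005) formula.

    prior_odds -- R, ratio of true to null relationships among those tested
    alpha      -- significance level (type I error rate)
    power      -- 1 - beta
    """
    # (1 - beta) R / (R - beta R + alpha) simplifies to power*R / (power*R + alpha).
    true_positives = power * prior_odds
    return true_positives / (true_positives + alpha)


# Hypothetical pre-study odds: theory-driven vs. "see what sticks" testing.
for label, r in [("theory-motivated, R = 1", 1.0), ("see what sticks, R = 0.01", 0.01)]:
    print(f"{label}: PPV = {ppv(r):.2f}")
```

With the default $latex \alpha = 0.05$ and power of 0.8, this prints PPVs of about 0.94 and 0.14: the same significance threshold delivers mostly true findings when hypotheses start out plausible, and mostly false ones when they do not.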

### Risks, incentives, and shady research

Over at the Social Science Statistics blog, Richard Nielsen riffs (link) on what is probably the biggest threat to valid inference in political science (I can’t speak for the other social sciences, but I wouldn’t be surprised if the situation were similar): the need to demonstrate that something, *anything*, in your empirical analysis departs “significantly” from some null hypothesis. The recent article by Gerber et al. (2010; linked in Richard’s post) is remarkable in revealing how this insidious norm manifests itself in the discipline’s publications, with the “most influential and highly cited outlets” affected the most.

The fact is, having “stars in your regression table” is still pretty much a *sine qua non* for publication…

### Fun math: sum of integers

Math is often about finding the right analogy, frequently a spatial one. Suppose you want to compute the sum of the integers from 1 to $latex N$, $latex S = 1+2+3+ \hdots + N$. Consider decomposing the sum as

$latex \begin{array}{cccccc}S = & 1 & + 1 & + 1 & + \hdots & + 1\\ & & + 1 & + 1 & + \hdots & + 1 \\ & & & + 1 & + \hdots & + 1\\ & & & & \vdots & \\ & & & & & + 1,\end{array}$

where the column sums equal the integers in the sequence. The ones form a triangle. Imagine copying this triangle, flipping the copy, and joining it to the original. Dropping the “$latex +$” signs, you get something like

$latex \begin{array}{ccccc} 1 & 1 & 1 & \hdots & 1\\ (1) & 1 & 1 & \hdots & 1\\ (1) & (1) & 1 & \hdots & 1\\ & & & \vdots & \\ (1) & (1) & (1) & \hdots & 1 \\ (1) & (1) & (1) & \hdots & (1), \end{array}$

where I’ve put parentheses around the $latex 1$’s from the flipped copy. Together the two triangles fill a rectangle of height $latex N+1$ and width $latex N$, and since the copy is congruent to the original, the sum of the integers in the original sequence equals half the rectangle’s area: $latex S = N(N+1)/2$. This comes up, e.g., in the asymptotic approximation for the Wilcoxon signed rank test (link).
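The counting argument can be checked mechanically. The Python sketch below uses a hypothetical helper, `gauss_sum_by_rectangle`, whose parameter `n` plays the role of $latex N$. It builds the $latex (N+1) \times N$ grid from the second array, marks each cell as belonging to the original triangle (`"o"`) or to the flipped copy (`"c"`, the parenthesized ones), and confirms that each triangle accounts for exactly half of the rectangle.

```python
def gauss_sum_by_rectangle(n: int) -> int:
    # Row i of the (n + 1) x n grid: cells j >= i come from the original
    # triangle ("o"); cells j < i come from the flipped copy ("c").
    grid = [["o" if j >= i else "c" for j in range(n)] for i in range(n + 1)]

    # Sanity check: column j of the original triangle holds j + 1 ones,
    # so the column sums reproduce the sequence 1, 2, ..., n.
    col_sums = [sum(row[j] == "o" for row in grid) for j in range(n)]
    assert col_sums == list(range(1, n + 1))

    original = sum(cell == "o" for row in grid for cell in row)
    copy = sum(cell == "c" for row in grid for cell in row)
    area = (n + 1) * n

    # The two congruent triangles tile the rectangle, so each is half of it.
    assert original == copy == area // 2
    return original


print(gauss_sum_by_rectangle(100))  # 5050
```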

### Mundane algebra: stratified mean and IPW mean

This came up in a conversation, so I wanted to record it: the stratified mean and the inverse-probability-weighted (IPW) mean are algebraically equivalent:

$latex \underbrace{N^{-1}\sum_{s=1}^S \sum_{i \in s}y_{is}\frac{R_{is}N_s}{n_s}}_{\text{IPW mean}} = \sum_{s=1}^S\frac{N_s}{N}\frac{1}{n_s}\sum_{i \in s}y_{is}R_{is} = \underbrace{\sum_{s=1}^S\frac{N_s}{N}\bar{y}_s}_{\text{stratified mean}}$,

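where $latex N$ is the population size; $latex N_s$ and $latex n_s$ are the population and sample (responding) sizes of stratum $latex s$, so that $latex n_s = \sum_{i \in s} R_{is}$; $latex R_{is}$ is the response indicator for unit $latex i$ in stratum $latex s$; and $latex \bar{y}_s = n_s^{-1}\sum_{i \in s} y_{is}R_{is}$ is the mean among responding units in stratum $latex s$. The weight $latex N_s/n_s$ is the inverse of the stratum response rate $latex n_s/N_s$.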
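A quick numerical check of the identity, using exact rational arithmetic so the two estimators can be compared with `==` rather than within a floating-point tolerance. The data are simulated and purely hypothetical; each stratum is a list of `(y, R)` pairs covering all $latex N_s$ population units, and the sketch assumes every stratum has at least one respondent ($latex n_s \ge 1$), since otherwise neither estimator is defined.

```python
import random
from fractions import Fraction

# Each stratum is a list of (y, R) pairs, one per population unit;
# R = 1 marks a responding (sampled) unit.


def ipw_mean(strata):
    """N^{-1} sum_s sum_{i in s} y_is R_is N_s / n_s."""
    N = sum(len(units) for units in strata)
    total = Fraction(0)
    for units in strata:
        N_s = len(units)
        n_s = sum(r for _, r in units)  # number of respondents in stratum s
        for y, r in units:
            total += Fraction(y * r * N_s, n_s)  # inverse-probability weight N_s / n_s
    return total / N


def stratified_mean(strata):
    """sum_s (N_s / N) * ybar_s, with ybar_s the respondent mean in stratum s."""
    N = sum(len(units) for units in strata)
    estimate = Fraction(0)
    for units in strata:
        respondents = [y for y, r in units if r == 1]
        ybar_s = Fraction(sum(respondents), len(respondents))
        estimate += Fraction(len(units), N) * ybar_s
    return estimate


# Hypothetical data: three strata of different sizes, roughly 60% response.
rng = random.Random(2024)
strata = []
for N_s in (40, 25, 60):
    units = [(rng.randint(0, 100), int(rng.random() < 0.6)) for _ in range(N_s)]
    units[0] = (units[0][0], 1)  # guarantee n_s >= 1 so ybar_s is defined
    strata.append(units)

print(ipw_mean(strata) == stratified_mean(strata))  # True
```

For a hand-checkable case, `[[(1, 1), (3, 0)], [(2, 1), (4, 1)]]` gives 2 under both formulas: half the population is in each stratum, and the respondent means are 1 and 3.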