Superpopulations and inference with “all the data”

Doug [Rivers, I presume?] added a nice comment on superpopulation versus (sampling) design-based approaches to inference in regression modeling (link). The comment responds to a post on Andy Gelman’s blog from over a year ago about what to do when fitting regressions to data on the “full population.”

One comment: Doug suggests that “the definition of the regression parameters is arbitrary (why not LAD or some other estimator applied to the population?) and it’s not obvious how to interpret the parameters.” My reading of Goldberger (reference) and Angrist and Pischke (reference) suggests that OLS is well motivated by the best linear approximation criterion: the population OLS coefficients define the linear function of the predictors that minimizes mean squared error in approximating the conditional expectation of the outcome. With a binary predictor the case is trivial (OLS is just a convenient way to calculate a difference in means), but with a continuous predictor the linear approximation is still an easy-to-interpret summary of the predictor’s relationship to the outcome variable. Along similar lines, logistic regression is well motivated as the maximum entropy model for a binary outcome. So I am not sure that it is all that arbitrary.
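
To make the two OLS points concrete, here is a rough sketch in Python with NumPy. Everything in it is an assumption for illustration: the simulated “population,” the exponential outcome curve, and the helper names `ols` and `mse` are mine, not anything from Doug’s comment or the references. It checks that with a binary predictor the OLS slope is just the difference in group means, and that with a continuous predictor and a nonlinear relationship the OLS line still has lower mean squared error than nearby lines.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def ols(x, y):
    """Return (intercept, slope) of the least-squares line for y on x."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Treat these n units as the full (finite) population.
n = 1000

# Case 1: binary predictor. The OLS slope equals the difference in means.
d = rng.integers(0, 2, size=n)
y = 1.0 + 2.0 * d + rng.normal(size=n)
intercept, slope = ols(d, y)
mean_diff = y[d == 1].mean() - y[d == 0].mean()
print("OLS slope:", slope, "difference in means:", mean_diff)

# Case 2: continuous predictor with a nonlinear (exponential) relationship.
x = rng.uniform(-1, 2, size=n)
y2 = np.exp(x) + rng.normal(scale=0.1, size=n)
a, b = ols(x, y2)


def mse(a_, b_):
    """Mean squared error of the line a_ + b_ * x over the population."""
    return np.mean((y2 - (a_ + b_ * x)) ** 2)


# The OLS line should beat any perturbed line on mean squared error.
best = mse(a, b)
perturbed = [mse(a + da, b + db) for da in (-0.1, 0.1) for db in (-0.1, 0.1)]
print("OLS MSE:", best, "smallest perturbed MSE:", min(perturbed))
```

Swapping squared error for absolute error in `mse` would give the LAD criterion Doug mentions, which summarizes the conditional median rather than the conditional mean; the choice of criterion determines which feature of the population the “parameter” describes.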