{"id":265,"date":"2010-11-19T11:07:14","date_gmt":"2010-11-19T16:07:14","guid":{"rendered":"https:\/\/cyrussamii.com\/?p=265"},"modified":"2010-11-19T11:10:23","modified_gmt":"2010-11-19T16:10:23","slug":"robustness-to-misspecification","status":"publish","type":"post","link":"https:\/\/cyrussamii.com\/?p=265","title":{"rendered":"Robustness to misspecification"},"content":{"rendered":"<p><a href=\"https:\/\/cyrussamii.com\/wp-content\/uploads\/2010\/11\/ipw_robustness_01.jpg\"><\/a>At a talk recently on new methods for inverse probability weighting for missing data, I put up the following picture, provoking the consternation of a few people in the room:<\/p>\n<p><a href=\"https:\/\/cyrussamii.com\/wp-content\/uploads\/2010\/11\/ipw_robustness_01.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-273\" title=\"ipw_robustness_01\" src=\"https:\/\/cyrussamii.com\/wp-content\/uploads\/2010\/11\/ipw_robustness_01-1024x436.jpg\" alt=\"\" width=\"640\" height=\"272\" srcset=\"https:\/\/cyrussamii.com\/wp-content\/uploads\/2010\/11\/ipw_robustness_01-1024x436.jpg 1024w, https:\/\/cyrussamii.com\/wp-content\/uploads\/2010\/11\/ipw_robustness_01-300x128.jpg 300w, https:\/\/cyrussamii.com\/wp-content\/uploads\/2010\/11\/ipw_robustness_01.jpg 1842w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/a><!--more-->The little gray dots on both graphs are the same scatter plot from a population for which the relationship between the x-axis variable and the y-axis variable exhibit a non-linear relationship. \u00a0The gray line on both graphs is a linear OLS fit to this nonlinear relationship, which by definition is the best linear approximation in terms minimizing squared deviations. \u00a0 The hollow circles show data that are sampled with unequal probabilities from this underlying population of gray dots. 
The departure from simple random sampling is that the probability of inclusion is increasing in the value of the x-axis variable. The black dashed line on the graph to the left is the linear OLS fit to the non-random sample. The black dashed line on the graph to the right is a weighted linear OLS fit, where the weights are equal to the inverse of the predicted probabilities of inclusion in the sample, taken from a logistic regression of an inclusion indicator on the x-axis variable. The size of each hollow circle in the right graph is proportional to that observation&#8217;s weight.<\/p>\n<p>The picture is meant to illustrate a few things: (1) that the x-axis variable contains two features, one that relates to the y-axis variable and another, separate feature that relates to an indicator for inclusion in the sample, and that these features are not redundant; (2) that using the information from the second feature of the x-axis variable allows us to recover the population-level linear approximation; and (3) that we should not trust the usual textbook advice about departures from random sampling (or, when the goal is causal inference, random assignment), which says that so long as our model includes the variables that explain the departures from equal-probability sampling, we can ignore such departures. Clearly the usual textbook advice assumes that the model is correct, which can be verified in only the simplest cases. The picture above might be such a simple case, but in higher-dimensional problems it often is not. Thus, what the picture shows is that inverse probability weighting allows us to recover the population-level approximation, so long as the inverse probability weights are accurate. This is half of the &#8220;double robustness&#8221; property of inverse probability weighted estimators. 
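The mechanics in the picture can be sketched in a short simulation; the quadratic x-y relationship, the logistic inclusion model, and the population size below are illustrative assumptions, not the figure's actual data:

```python
# Illustrative sketch of the figure's setup. The quadratic x-y relationship,
# logistic inclusion model, and population size are assumptions for this demo.
import numpy as np

rng = np.random.default_rng(0)

# Population with a nonlinear x-y relationship (the gray dots).
N = 100_000
x = rng.uniform(-2, 2, N)
y = x**2 + rng.normal(0, 0.3, N)

def ols_slope(x, y, w=None):
    """Slope of a (weighted) least-squares line, via the normal equations."""
    if w is None:
        w = np.ones_like(x)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta[1]

# Best linear approximation in the population (the gray line).
pop_slope = ols_slope(x, y)

# Non-random sampling: inclusion probability increasing in x.
p_incl = 1.0 / (1.0 + np.exp(-(x - 1.0)))
s = rng.random(N) < p_incl  # inclusion indicator (the hollow circles)

# Unweighted OLS on the sample (left panel's dashed line) is pulled
# toward the over-sampled high-x region.
naive_slope = ols_slope(x[s], y[s])

# IPW fit (right panel's dashed line). For brevity we weight by the true
# inclusion probabilities; in practice they would be predicted from a
# logistic regression of the inclusion indicator on x.
ipw_slope = ols_slope(x[s], y[s], w=1.0 / p_incl[s])
```

With these assumptions, `ipw_slope` lands close to `pop_slope`, while `naive_slope` does not, which is the contrast between the two panels.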
Insofar as it is easier to model the sample-inclusion process than the outcome (y) process, this is a useful feature. (An example is when we are trying to estimate treatment effects. In doing so, we can use post-treatment variables to model inclusion in the sample, and we can do so without having to worry about issues of compatibility between the inclusion model and the model we are using to quantify treatment effects. If we wanted to model the outcome directly&#8212;that is, if we wanted to use an imputation strategy&#8212;we&#8217;d have to ensure compatibility and marginalize over the values of the post-treatment variable.)<\/p>\n<p>When I presented this picture, it was interesting to see some people in the audience nodding their heads enthusiastically, but a few looked at me like I had two heads. Someone from the latter camp asked, why is it at all useful to be able to recover an &#8220;inaccurate&#8221; description of the relationship between the x and y variables?<\/p>\n<p>This is a question about the value of an estimator&#8217;s &#8220;robustness to misspecification,&#8221; by which I mean that the estimator consistently estimates a statistic, $latex \\theta$, that <em>describes<\/em> the population, but that $latex \\theta$ may not, itself, be a &#8220;parameter&#8221; in a &#8220;data generating process&#8221; that gives rise to the population values. One way to look at it is to propose that outcome models are <em>always<\/em> approximations of complex, unknowable relationships, and so we are always estimating $latex \\theta$-type objects rather than parameters in any actual data generating process. That being the case, we want to be sure that we are at least estimating an approximation that characterizes the population. 
To put it another way, the statistic, $latex \\theta$, contains all the practical information that we can <em>use<\/em>, without containing all the information that may be needed to characterize the full data generating process. Moving beyond $latex \\theta$ may require commitments to further modeling assumptions that we would prefer to avoid; rather, we are content with the population summary that $latex \\theta$ provides, so long as we can estimate that summary consistently with our sample.<\/p>\n<p>This all comes into play in a few situations. One is where the relationships between variables are highly irregular&#8212;e.g., lots of obscure nonlinearities&#8212;but there are pronounced lower-order features that explain enough variation that a linear approximation is sufficient for the sake of making a decision. Put more plainly, we have a decision to make, and it depends simply on the general extent to which y is increasing or decreasing in x. In this case, a linear, or possibly quadratic, approximation would suffice. Another situation where this comes into play is with heterogeneous treatment effects. Suppose we want a difference-in-means estimate of the average treatment effect in the population, but there is substantial heterogeneity in the differences in potential outcomes. Then, a difference in means from a non-representative sample may fail to recover the population difference in means, and so we want to adjust&#8212;e.g., with weighting&#8212;to get back to the population difference in means. In my view, the two situations described in this paragraph are essentially the same, and the principles are all being illustrated in the picture above. 
Also, I see clear links between arguments extolling &#8220;robustness to misspecification&#8221; and the so-called &#8220;design-based&#8221; orientation toward inference, which takes outcomes in the population as fixed, with randomness coming from the sampling or experimental design.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>At a talk recently on new methods for inverse probability weighting for missing data, I put up the following picture, provoking the consternation of a few people in the room:<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-265","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/cyrussamii.com\/index.php?rest_route=\/wp\/v2\/posts\/265","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cyrussamii.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cyrussamii.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cyrussamii.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cyrussamii.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=265"}],"version-history":[{"count":15,"href":"https:\/\/cyrussamii.com\/index.php?rest_route=\/wp\/v2\/posts\/265\/revisions"}],"predecessor-version":[{"id":284,"href":"https:\/\/cyrussamii.com\/index.php?rest_route=\/wp\/v2\/posts\/265\/revisions\/284"}],"wp:attachment":[{"href":"https:\/\/cyrussamii.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=265"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cyrussamii.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=265"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cyrussamii.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=265"}],"curies":[
{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}