Over at the Statistical Modeling, Causal Inference, and Social Science blog (link), Andrew Gelman writes,
I’m involved (with Irv Garfinkel and others) in a planned survey of New York City residents. It’s hard to reach people in the city–not everyone will answer their mail or phone, and you can’t send an interviewer door-to-door in a locked apartment building. (I think it violates IRB to have a plan of pushing all the buzzers by the entrance and hoping someone will let you in.) So the plan is to use multiple modes, including phone, in person household, random street intercepts and mail.
The question then is how to combine these samples. My suggested approach is to divide the population into poststrata based on various factors (age, ethnicity, family type, housing type, etc.), then to pool responses within each poststratum, then to run some regressions including poststrata and also indicators for mode, to understand how respondents from different modes differ, after controlling for the demographic/geographic adjustments.
Maybe this has already been done and written up somewhere?
It’s interesting to consider this problem by combining a “finite population” perspective with some ideas about “principal strata” from the causal inference literature. Consider a finite population U from which we draw a sample of N units. We have two modes of contact, A and B. Suppose for the moment that each unit can be characterized by one of the following response types (these are the “principal strata”), where 1 means the unit responds when contacted by that mode and 0 means it does not:
Type | Mode A response | Mode B response |
---|---|---|
I | 1 | 1 |
II | 1 | 0 |
III | 0 | 1 |
IV | 0 | 0 |
There are then two cases to consider, depending on whether the mode of contact affects the responses that units give:
Mode of contact does not affect response
This might be a valid assumption if the questions of interest are not subject to social desirability biases, interviewer effects, etc. In this case, it is easy to define a target parameter as the average response in the population. You could proceed efficiently by first applying mode A to the sample, and then applying mode B to those who did not respond to mode A. At the end, you would have outcomes for type I, II, and III units, and you’d have an estimate of the proportion of type IV units in the population. You could content yourself with an estimate of the average response in the type I, II, and III subpopulation. If you wanted to recover an estimate of the average response for the full population (including type IV units), you would effectively have to impute values for the type IV units. This could be done by using auxiliary information either to impute directly or (in a manner that is pretty much equivalent) to determine which type I, II, or III units resemble the missing type IV units, and up-weight them. In any case, if the response of interest has bounded support, one could also compute “worst case” (Manski-type) bounds on the average response by imputing the maximum and minimum possible values to type IV units.
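To make the sequential design concrete, here is a minimal Python sketch. It assumes a hypothetical outcome on a 0 to 10 scale and made-up type shares; the names `TYPE_SHARES`, `draw_sample`, and `sequential_design` are illustrative and not taken from any survey package. It reports the mean among responders (types I, II, and III), the estimated share of type IV units, and worst-case bounds obtained by giving every type IV unit the scale minimum or maximum.

```python
import random

# Hypothetical type shares for illustration only (not survey estimates).
TYPE_SHARES = {"I": 0.40, "II": 0.20, "III": 0.15, "IV": 0.25}
# (responds to mode A, responds to mode B) for each principal stratum.
RESPONDS = {"I": (True, True), "II": (True, False),
            "III": (False, True), "IV": (False, False)}


def draw_sample(n, seed=0):
    """Draw n units as (type, outcome); the outcome is on a 0-10 scale and
    does not depend on mode of contact (the case assumed in this section)."""
    rng = random.Random(seed)
    types = rng.choices(list(TYPE_SHARES), weights=list(TYPE_SHARES.values()), k=n)
    return [(t, rng.randint(0, 10)) for t in types]


def sequential_design(sample, y_min=0, y_max=10):
    """Contact everyone by mode A, then contact mode A non-responders by mode B."""
    observed = []   # outcomes from types I and II (via A) and type III (via B)
    n_missing = 0   # type IV units: never observed
    for unit_type, y in sample:
        resp_a, resp_b = RESPONDS[unit_type]
        if resp_a or resp_b:
            observed.append(y)
        else:
            n_missing += 1
    if not observed:
        raise ValueError("no responders in sample")
    n = len(sample)
    mean_responders = sum(observed) / len(observed)
    share_iv = n_missing / n
    # Worst-case (Manski-type) bounds: give every type IV unit y_min or y_max.
    lower = (sum(observed) + n_missing * y_min) / n
    upper = (sum(observed) + n_missing * y_max) / n
    return mean_responders, share_iv, (lower, upper)


mean_r, share_iv, bounds = sequential_design(draw_sample(1000))
print(f"responder mean={mean_r:.2f}, type IV share={share_iv:.2f}, bounds={bounds}")
```

The width of the bounds is the type IV share times the range of the scale, so they tighten only as the share of never-responders falls.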
Mode of contact affects response
This might be relevant if, for example, the modes of contact are phone call versus face-to-face interview, and the outcomes being measured vary depending on whether the respondent feels more or less exposed in the interview situation. This possibility makes things a lot trickier. In this case, each unit is characterized by a response under mode A and another under mode B (that is, two potential outcomes). One immediately faces the problem of defining the target parameter. Is it the average of responses under the two modes of contact? Maybe it is some “latent” response that is imperfectly revealed under the two modes of contact? If so, how can we characterize this “imperfection”? Furthermore, only for type I units will you be able to obtain information on both potential responses. Does it make sense to restrict ourselves to this subpopulation? If not, then we would again face the need for imputation. A design that applied both mode A and mode B to the complete sample would mechanically reveal the proportion of type I units in the population, and by implication would identify the proportions of type II, III, and IV units. For type II units we could use mode A responses to improve imputations for mode B responses, and vice versa for type III units. Type IV units’ contributions to our estimate of the “average response” would be based purely on auxiliary information. Again, one could construct worst case bounds by imputing maximum and minimum response values for each of the missing response types.
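A minimal sketch of the design that applies both modes to every sampled unit, assuming no ordering effects (so each unit's response pattern reveals its principal stratum) and a hypothetical outcome on a 0 to 10 scale. The function name `both_modes_design` and the `(y_a, y_b)` data layout are illustrative assumptions. It returns the type shares, separate worst-case bounds on the population means of y(A) and y(B), and the average within-unit mode gap among type I units, the only units for which both potential outcomes are seen.

```python
from collections import Counter

# Observed response pattern (responds to A, responds to B) -> principal stratum.
STRATUM = {(True, True): "I", (True, False): "II",
           (False, True): "III", (False, False): "IV"}


def both_modes_design(units, y_min=0.0, y_max=10.0):
    """units: list of (y_a, y_b); y_a is None if the unit did not respond
    to mode A, and likewise for y_b.

    Returns (type shares, bounds on mean y(A), bounds on mean y(B),
    mean of y(A) - y(B) among type I units, or None if there are none).
    """
    n = len(units)
    counts = Counter(STRATUM[(ya is not None, yb is not None)] for ya, yb in units)
    shares = {t: counts[t] / n for t in ("I", "II", "III", "IV")}

    def worst_case_bounds(values):
        # Missing potential outcomes get the scale minimum or maximum.
        observed = [v for v in values if v is not None]
        n_miss = n - len(observed)
        return ((sum(observed) + n_miss * y_min) / n,
                (sum(observed) + n_miss * y_max) / n)

    bounds_a = worst_case_bounds([ya for ya, _ in units])
    bounds_b = worst_case_bounds([yb for _, yb in units])
    gaps = [ya - yb for ya, yb in units if ya is not None and yb is not None]
    type_i_gap = sum(gaps) / len(gaps) if gaps else None
    return shares, bounds_a, bounds_b, type_i_gap


units = [(6.0, 4.0), (8.0, None), (None, 2.0), (None, None)]
print(both_modes_design(units))
```

If the target is taken to be the average of the two modes' population means, its worst-case bounds are the averages of the two pairs of bounds, since the missing values under each mode can be set independently.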
One wrinkle that I ignored above is that the order of the modes of contact may affect either response behavior or the outcomes reported. This multiplies the number of potential response behaviors and the number of potential outcomes given that the unit is interviewed. You could get some way past these issues by randomizing the order of mode of contact, e.g., A then B for one half of the sample and B then A for the other half. But you would have to impose some more assumptions to make use of this random assignment. For example, you would have to assume that A-then-B always-responders are exchangeable with B-then-A always-responders in order to combine the information from the always-responders in each half-sample. Or, you could “shift the goal posts” by saying that all you are interested in is the average of responses from modes A and B under the A-then-B design.
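The randomized-order design can be sketched as follows; the function names are hypothetical. `assign_order` splits the sample at random into an A-then-B half and a B-then-A half, and `always_responder_share_by_arm` compares the share of units responding under both modes in each half. A clear difference between arms would suggest that order affects response behavior, so that the two groups of always-responders are different subpopulations; similar shares are consistent with, but do not establish, the exchangeability assumption.

```python
import random


def assign_order(unit_ids, seed=0):
    """Randomly split units into two halves: 'AB' (mode A first) and 'BA'."""
    rng = random.Random(seed)
    ids = list(unit_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {uid: ("AB" if i < half else "BA") for i, uid in enumerate(ids)}


def always_responder_share_by_arm(order, responded_both):
    """Share of units that responded under both modes, separately by arm.

    order: unit id -> 'AB' or 'BA'; responded_both: unit id -> bool.
    """
    shares = {}
    for arm in ("AB", "BA"):
        members = [u for u, a in order.items() if a == arm]
        shares[arm] = sum(responded_both[u] for u in members) / len(members)
    return shares


order = assign_order(range(10), seed=1)
print(always_responder_share_by_arm(order, {u: u % 2 == 0 for u in range(10)}))
```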
Update:
The above analysis did not explore how other types of assumptions might help to identify the population average. Andy’s proposal to use post-stratification and regressions relies (according to my understanding) on the assumption that potential outcomes are independent of mode of contact conditional on covariates. Formally, if the mode of contact is $latex M$ taking on values $latex A$ or $latex B$, the potential outcome under mode of contact $latex m$ is $latex y(m)$, $latex T$ is the principal stratum, and $latex X$ is a covariate, then $latex \left[y(A),y(B)\right] \perp M | T, X$ implies that,
$latex E(y(m)|T,X) = E(y(m)|M=m, T,X) = E(y(m)|M \ne m, T,X)$.
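As a worked step (a sketch assuming $latex T$ and $latex X$ are discrete), applying the law of total expectation over covariate cells and principal strata, and then this identity within each cell, expresses the population mean under mode $latex m$ as a weighted sum of cell means among units contacted by mode $latex m$:

$latex E(y(m)) = \sum_{x} P(X=x) \sum_{t} P(T=t|X=x)\, E(y(m)|M=m, T=t, X=x)$

Estimating the right-hand side requires observed responses under mode $latex m$ within each $latex (t, x)$ cell, together with estimates of the stratum and covariate distributions.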
As discussed above, the design that applies modes A and B to all units in the sample can determine principal stratum membership, and so these covariate- and principal-stratum specific imputations can be applied. Ordering effects will again complicate things, and so more assumptions would be needed. A worthwhile type of analysis would be to study evidence of mode-of-contact as well as ordering effects among the type I (always-responder) units.
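One way to look for mode-of-contact and ordering effects among type I units, sketched under the randomized-order design and assuming the always-responders in the two arms are exchangeable, is to compare within-unit differences y(A) - y(B) across arms. The function name and data layout are hypothetical. A nonzero mode gap within each arm points to a mode effect; a difference between the arms' gaps points to an ordering effect on that mode gap.

```python
from statistics import mean


def mode_and_order_summary(type_i_units):
    """type_i_units: list of (arm, y_a, y_b) for always-responders,
    with arm in {'AB', 'BA'}; each arm needs at least one unit."""
    by_arm = {"AB": [], "BA": []}
    for arm, y_a, y_b in type_i_units:
        by_arm[arm].append(y_a - y_b)       # within-unit mode gap
    gap_ab = mean(by_arm["AB"])
    gap_ba = mean(by_arm["BA"])
    return {
        "mode_gap_AB": gap_ab,
        "mode_gap_BA": gap_ba,
        # Order-balanced mode gap: equal weight on each order of contact.
        "mode_gap_balanced": (gap_ab + gap_ba) / 2,
        # Nonzero values suggest the order of contact shifts the mode gap.
        "order_interaction": gap_ab - gap_ba,
    }


print(mode_and_order_summary([("AB", 5, 3), ("AB", 7, 5), ("BA", 6, 6), ("BA", 4, 4)]))
```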
Now, it may be that mode of contact affects response but each unit is contacted via only one mode, A or B. Then a unit’s principal stratum membership is not identifiable, nor are the proportions of types I through IV (we would end up with two mixtures of responding and non-responding types, with no way to parse out the relative proportions of the different types). If some kind of response “monotonicity” held, then that would help a little. Response monotonicity would mean that either type II or type III units do not exist. Otherwise, we would have to impose more stringent assumptions. The common one would be that principal stratum membership is independent of potential responses conditional on covariates. This is a classic “ignorable non-response” assumption, and it suffers from having no testable implications.
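To see how monotonicity helps, assume each unit is randomly assigned to a single mode. The response rate under mode A then equals the combined share of types I and II, and the rate under mode B the combined share of types I and III. With the shares also summing to one, that gives three equations in four unknowns, which is why the proportions are not identified; ruling out one type closes the system. The sketch below treats the two response rates as known population quantities (ignoring sampling error), and the function name is hypothetical. Note that it recovers only proportions, not which units belong to which type.

```python
def type_shares_under_monotonicity(r_a, r_b, absent="III"):
    """Principal stratum shares from randomized single-mode response rates.

    r_a: response rate among units assigned to mode A (= p_I + p_II)
    r_b: response rate among units assigned to mode B (= p_I + p_III)
    absent: the type that monotonicity rules out, "III" or "II".
    """
    if absent == "III":      # anyone who would respond to B also responds to A
        if r_b > r_a:
            raise ValueError("rates contradict the assumption: r_b > r_a")
        return {"I": r_b, "II": r_a - r_b, "III": 0.0, "IV": 1 - r_a}
    if absent == "II":       # anyone who would respond to A also responds to B
        if r_a > r_b:
            raise ValueError("rates contradict the assumption: r_a > r_b")
        return {"I": r_a, "II": 0.0, "III": r_b - r_a, "IV": 1 - r_b}
    raise ValueError("absent must be 'III' or 'II'")


print(type_shares_under_monotonicity(0.75, 0.5))
```

Unlike ignorable non-response, monotonicity does carry a testable implication here: the response rate under the mode it favors cannot be lower than under the other mode.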