The National Research Council has released a prepublication version of its report on missing data in randomized trials (link):
Randomized clinical trials are the primary tool for evaluating new medical interventions. Randomization provides for a fair comparison between treatment and control groups, balancing out, on average, distributions of known and unknown factors among the participants. Unfortunately, a substantial percentage of the data in these studies is often missing. This missing data reduces the benefit provided by the randomization and introduces potential biases in the comparison of the treatment groups. Missing data can arise for a variety of reasons, including the inability or unwillingness of participants to meet appointments for evaluation. And in some studies, some or all of data collection ceases when participants discontinue study treatment. Existing guidelines for the design and conduct of clinical trials, and the analysis of the resulting data, provide only limited advice on how to handle missing data. Thus, approaches to the analysis of data with an appreciable amount of missing values tend to be ad hoc and variable. The Prevention and Treatment of Missing Data in Clinical Trials concludes that a more principled approach to design and analysis in the presence of missing data is both needed and possible. Such an approach needs to focus on two critical elements: (1) careful design and conduct to limit the amount and impact of missing data and (2) analysis that makes full use of information on all randomized participants and is based on careful attention to the assumptions about the nature of the missing data underlying estimates of treatment effects. In addition to the highest priority recommendations, the book offers more detailed recommendations on the conduct of clinical trials and techniques for analysis of trial data.
The report starts off with recommendations for minimizing missing data. There are lots of sensible recommendations, including more use of pre-trial studies to identify what kinds of people are likely to drop out under treatment, adjustable dosing and protocol adjustments (in which case the study tests the protocol more than the treatment per se), and basing primary analyses on composite outcomes that may include drop-out as an indication of treatment “failure.” They also provide a nice list of strategies for increasing incentives to participate and collecting information on drop-out.
While the recommendations are mostly very sensible, there are some important trade-offs. First, some run the risk of creating conditions within the trial that may not be present when an experimental treatment would be used clinically (that is, in real-world practice). As such, estimates from the trial may be misleading. Take the use of incentives or mechanisms to reduce the burden of participation. These may confound the estimation of treatment effects that would be applicable to the general population. For example, if social networking groups were created to make participation more enticing in, say, a trial for a depression treatment, then the effects of the social networking group may obscure the effects of the treatment, and provide misleading estimates of how the treatment would work in a clinical setting without the social networking groups. The same risks arise for other recommendations (flexible dosing, optimal background regimens, participant education) that create conditions within the trial that do not resemble the "real world." Second, as the report itself acknowledges, using shorter follow-up times or composite outcome measures may interfere with producing the most clinically relevant or scientifically informative information from the trial. Nonetheless, it is important to look into ways to reduce missingness. Also, I can see pre-study sensitivity and power analysis for missing data as being a fruitful area for methodological research. As far as I know there is very little work on this to date.
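To make concrete what a pre-study power analysis under missing data might look like, here is a minimal simulation sketch (my illustration, not from the report). It assumes a two-arm trial with a normally distributed outcome (sd = 1), dropout that is completely at random, and a complete-case z-test; the function name and parameters are hypothetical. Even under this most benign missingness mechanism, power erodes because randomized participants are lost from the analysis; under informative dropout the picture would be worse, which is where sensitivity analysis would come in.

```python
import random
import math

def simulated_power(n_per_arm, effect, dropout_rate, n_sims=2000, seed=1):
    """Monte Carlo power for a two-arm trial with a normal outcome (sd = 1),
    analyzed by a complete-case z-test after completely-at-random dropout.
    (Illustrative sketch; names and defaults are assumptions.)"""
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided test at alpha = 0.05
    rejections = 0
    for _ in range(n_sims):
        # Each participant independently completes follow-up with
        # probability (1 - dropout_rate); others are dropped entirely.
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)
                   if rng.random() > dropout_rate]
        treated = [rng.gauss(effect, 1.0) for _ in range(n_per_arm)
                   if rng.random() > dropout_rate]
        if len(control) < 2 or len(treated) < 2:
            continue
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        se = math.sqrt(1.0 / len(treated) + 1.0 / len(control))  # sd known = 1
        if abs(diff / se) > z_crit:
            rejections += 1
    return rejections / n_sims

# Power with full data vs. 30% dropout, same randomized sample size.
full = simulated_power(n_per_arm=100, effect=0.4, dropout_rate=0.0)
dropped = simulated_power(n_per_arm=100, effect=0.4, dropout_rate=0.3)
```

With these (arbitrary) settings the full-data design sits near the conventional 80% power, while 30% dropout pulls it well below that, so the trial would need to be enlarged up front or the dropout addressed by design. A fuller pre-study analysis would also vary the missingness mechanism, not just its rate.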
A follow-up post will comment on the report’s recommendations for analysis with missing data.