# Overview

The goal of this class is to teach students skills to design field experiments, and social science RCTs in particular. This includes the following:

- Posing a compelling “problem” that warrants intervention.
- Defining and motivating an intervention on the basis of good theory and available evidence.
- Designing a compelling field test of the intervention.

The approach of the class is hands-on: as a class, we will go through 3 cycles of the RCT design process. Each cycle is intended to yield a research design that would satisfy the requirements of, e.g., an NSF or similar grant proposal. Each week will involve making progress in successive steps of RCT design. During class time, we will present material to each other for discussion.

# Weekly plan

The plan for the semester will be laid out week by week below.

## Week 1

For class on Sept 10, prepare the following:

- Characterize your research interests in terms of a “problem” (or a few problems).
- Reference existing theory and evidence to motivate potential interventions to address the problem(s).

Prepare notes to present your problems and intervention concepts in class. We will discuss them together.

In preparation for learning nuts and bolts of experimental design, read the following:

- Gerber, Alan S., and Donald P. Green. Field experiments: Design, analysis, and interpretation. WW Norton, 2012. Ch 1-2.
- Duflo, Esther, Rachel Glennerster, and Michael Kremer. “Using randomization in development economics research: A toolkit.” Handbook of development economics 4 (2007): 3895-3962.

Also, Evidence in Governance and Politics (EGAP) has an online resource book with various resources for field experimental design. Start perusing the material there: [link]

## Week 2

Last class we defined three problems that will serve as the basis of the RCTs that we will design over the course of the semester:

- Non-participation and non-cooperation of right-wing partisans in surveys by established polling organizations.
- Police brutality.
- Failure to recognize historical injustices and their legacies.

We started to sketch out potential interventions, based on existing theory and evidence. For class on Sept 17, each group should prepare 4 slides sketching out the following:

- Characterize the nature of the problem as it applies to a particular location and population that will be the site of intervention.
- Frame the problem on the basis of a compelling social theory and describe what the theory suggests about addressing the problem. It should be a well-specified theory that describes a mechanism or behavioral parameters on which we might intervene to address the problem.
- Propose an intervention based on this analysis, defining the units of intervention. Will the intervention target individuals or groups of people?
- Describe how you would judge whether the intervention is successful. Ideally you would want to measure effects using the same kinds of outcomes that you would use to characterize the problem in the first place.

In addition, here are some more nuts and bolts readings in preparation for more technical aspects of design work:

- Gerber, Alan S., and Donald P. Green. Field experiments: Design, analysis, and interpretation. WW Norton, 2012. Ch 3,4,12.
- Congdon, W.J., J.R. Kling, J. Ludwig, and S. Mullainathan. “Social Policy: Mechanism Experiments and Policy Evaluations.” In Handbook of Economic Field Experiments, Vol. 2, edited by Abhijit Vinayak Banerjee and Esther Duflo, 389-426. North-Holland, 2017.

In class each group will present their slides. The goal will be for everyone to get a good sense of the components of the proposed RCTs, because the next phase of the class will be for the whole class (working in smaller groups) to develop each of the RCTs in turn.

## Week 3

This week we will be working on the basics of power analysis. In class, we will be doing simulation exercises in R. Be sure that you have R installed and ready to go for class. We will be working with the `estimatr` package for R. Be sure that you have it installed. Here is the website for the package: [link]

We will be working with the concepts of “minimum detectable effect sizes” and “design effects” attributable to clustered assignment and use of covariates for stratification and balance constraints.
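To preview the simulation approach: in class we will work in R with `estimatr`, but the core logic of simulation-based power analysis and minimum detectable effects can be sketched in any language. Here is a minimal Python sketch (the sample sizes, effect grid, and noise level are illustrative placeholders, not course parameters):

```python
import numpy as np

def simulated_power(n, tau, sigma=1.0, reps=2000, seed=0):
    """Share of simulated two-arm trials in which a difference-in-means
    test rejects the null, under a constant treatment effect tau."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        z = rng.permutation(np.repeat([0, 1], n // 2))  # complete random assignment
        y = rng.normal(0, sigma, n) + tau * z           # baseline noise + effect
        diff = y[z == 1].mean() - y[z == 0].mean()
        se = np.sqrt(y[z == 1].var(ddof=1) / (n // 2) +
                     y[z == 0].var(ddof=1) / (n // 2))
        rejections += abs(diff / se) > 1.96             # normal approximation
    return rejections / reps

def simulated_mde(n, taus, target=0.8, **kw):
    """Smallest effect in the grid `taus` detected with at least `target` power."""
    for tau in sorted(taus):
        if simulated_power(n, tau, **kw) >= target:
            return tau
    return None
```

For example, `simulated_mde(200, [0.05 * i for i in range(2, 20)])` scans a grid of effect sizes and reports the smallest one detectable with 80% power; the same loop structure extends naturally to clustered or stratified designs by changing how `z` and `y` are generated.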

Review the following before class:

- Coppock, A. 2020. “10 Things to Know About Statistical Power.” EGAP Resource. [link].
- Bloom, HS. “Minimum detectable effects: A simple way to report the statistical power of experimental designs.” Evaluation Review 19.5 (1995): 547-556.
- Samii, C. 2021. “Estimation and Inference for a Randomized Experiment.” Slides for Quantitative Methods II, New York University Politics Department. [link]
- Samii, C. 2021. “Robust Inference I.” Slides for Quantitative Methods II, New York University Politics Department. [link]
- Djimeu, EW and Houndolo, D-G, 2016. Power calculation for causal inference in social science: sample size and minimum detectable effect determination, 3ie impact evaluation manual, 3ie Working Paper 26. New Delhi: International Initiative for Impact Evaluation (3ie). [link]
- Bruhn, Miriam, and David McKenzie. “In pursuit of balance: Randomization in practice in development field experiments.” American Economic Journal: Applied Economics 1.4 (2009): 200-232.

## Week 4

For the current week, I would like you to do two things.

First, work through the code in this design effects exercise: [link]

I know it’s sort of silly to just copy and paste code, but I want you to understand exactly what the code is doing. It shows steps that you could use to test out experimental designs that include clustering and stratification.

Second, I would like you to work in groups to develop proposed research designs for our first RCT idea: testing strategies to promote participation and cooperation of right-wing partisans in surveys by established polling organizations. The task will include the following:

- Have a slide to remind us of the motivation: state the problem again and, drawing on theory, motivate an intervention concept.
- Have 2-3 slides explaining the following elements of the research design:
  - Population of interest.
  - Manner in which the sample will be obtained (under ideal conditions).
  - Units of observation (on whom/what are you taking outcome measurements).
  - Units of randomization (to whom/what are you assigning different treatments).
  - Treatment conditions.
  - Primary outcomes of interest.
  - Statistical quantities you will be targeting, and the hypothesis tests that you will perform on these quantities.
  - Randomization strategy (any clustering, stratification, or other forms of restricted randomization?).
  - Estimation and inferential strategy, taking care to account for the randomization strategy and noting any covariates that you would include.
- Conduct a simulation study to show how your proposed design performs, in terms of statistical power (e.g., minimum detectable effect), relative to a unit-level completely randomized trial.
  - Simulate the outcome data: write code that builds in sources of heterogeneity (e.g., by age or gender, or site-specific random effects) that you think might be important.
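As a rough illustration of the design effect from clustered assignment (again sketched in Python rather than the R/`estimatr` workflow we use in class; the cluster counts and intra-cluster correlation below are made-up values), one can simulate outcomes with cluster random effects and compare how noisy the difference-in-means estimate is under cluster-level versus unit-level assignment:

```python
import numpy as np

def clustered_sim(n_clusters=20, m=25, icc=0.3, reps=1000, seed=1):
    """Ratio of the spread of difference-in-means estimates under
    cluster-level vs. unit-level random assignment, with cluster random
    effects generating intra-cluster correlation (ICC). The ratio
    approximates the square root of the design effect."""
    rng = np.random.default_rng(seed)
    n = n_clusters * m
    cluster = np.repeat(np.arange(n_clusters), m)
    ests_cluster, ests_unit = [], []
    for _ in range(reps):
        u = rng.normal(0, np.sqrt(icc), n_clusters)[cluster]  # cluster effects
        e = rng.normal(0, np.sqrt(1 - icc), n)                # idiosyncratic noise
        y = u + e                                             # zero treatment effect
        zc = rng.permutation(np.repeat([0, 1], n_clusters // 2))[cluster]
        zu = rng.permutation(np.repeat([0, 1], n // 2))
        ests_cluster.append(y[zc == 1].mean() - y[zc == 0].mean())
        ests_unit.append(y[zu == 1].mean() - y[zu == 0].mean())
    return np.std(ests_cluster) / np.std(ests_unit)
```

With these placeholder values the simulated ratio should land near the textbook approximation sqrt(1 + (m - 1) * ICC); your group’s simulations should replace the simple `y = u + e` line with the heterogeneity structure you think is realistic for your setting.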

In class each group will present their results and we will discuss.

For those interested in more details on randomization inference and power analysis, here are some slides (I didn’t want to share these in advance because they go a little too deep into the weeds, but in case you are interested):

## Week 5

For the coming week, the task is to revise the experimental designs along the lines we discussed in class.

For group 1:

- Perform a “proof of concept” exercise establishing an experimental population from online forums, showing potential ways to extract covariate information from profile histories.
- Propose a way to do a “supply curve” analysis that considers different levels of incentives crossed with different ways of framing the invitation.
- Consider ways to measure outcomes that capture effects both on participation and on bias/misrepresentation in participants’ responses.

For group 2:

- Perform a “proof of concept” exercise where the treatment consists of door-step appeals to individuals, tracking outcomes through both immediate participation and response to an invitation to participate in a subsequent survey administered by mail or online.
- Consider ways to measure outcomes that capture effects both on participation and on bias/misrepresentation in participants’ responses.
- Try to get the power analysis code to work to analyze design effects.

Here are some readings on strategies for getting at experimenter demand effects, bias, and misrepresentation in surveys:

- De Quidt, Jonathan, Johannes Haushofer, and Christopher Roth. “Measuring and bounding experimenter demand.” American Economic Review 108, no. 11 (2018): 3266-3302.
- Mummolo, Jonathan, and Erik Peterson. 2019. “Demand Effects in Survey Experiments: An Empirical Assessment.” American Political Science Review 113 (2): 517-529.
- Bullock, John G., Alan S. Gerber, Seth J. Hill, and Gregory Huber. “Partisan bias in factual beliefs about politics.” Quarterly Journal of Political Science 10, no. 4 (2015): 519-578.
- Broockman, David E., Joshua L. Kalla, and Jasjeet S. Sekhon. “The design of field experiments with survey outcomes: A framework for selecting more efficient, robust, and ethical designs.” Political Analysis 25, no. 4 (2017): 435-464.
- De Quidt, Jonathan, Lise Vesterlund, and Alistair J. Wilson. “Experimenter demand effects.” In Handbook of research methods and applications in experimental economics. Edward Elgar Publishing, 2019.

## Week 6

This week we will start our second design task. The idea will be to design a *precinct-level* intervention to address police misconduct. The challenges here are that (i) we have a limited number of precincts overall, and (ii) only a limited number of precincts in any given month can receive the intervention. The design that we will use is a “staggered adoption” design, in which a small number of precincts are treated every 3 months or so. We have pre-treatment time series data that go back to 1993, so we will use methods that flexibly model trends to derive maximal inferential leverage.
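To make the staggered adoption idea concrete, here is a hypothetical sketch (in Python; the precinct count, wave size, and number of waves are placeholders we will pin down together) of how one might randomize a rollout schedule, with precincts not yet reached serving as not-yet-treated controls:

```python
import numpy as np

def staggered_schedule(precincts, per_wave, n_waves, seed=0):
    """Randomly order precincts, then assign the first per_wave * n_waves
    of them to successive treatment waves; the rest remain untreated
    during the study period."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(precincts)
    schedule = {}
    for w in range(n_waves):
        for p in order[w * per_wave:(w + 1) * per_wave]:
            schedule[p] = w + 1   # wave number (e.g., quarter of adoption)
    for p in order[n_waves * per_wave:]:
        schedule[p] = None        # not treated during the study window
    return schedule
```

Because the adoption time is randomly assigned, the schedule itself becomes the basis for the design-based staggered-adoption estimators we will read about in the coming weeks.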

Here is a link to data that we will use to inform the design: [NYCLU git repository]

These data are on complaints against individual officers. We will aggregate to the level of precincts. Precincts will be our unit of analysis.

Here is an example of a trial in Chicago that operated under similar circumstances: [original paper] [corrected analysis]

We will be using the tools that are applied in the corrected analysis, although we will continue to operate at the precinct level rather than the officer level.

## Week 7

For the coming session, your task is to work in groups to conduct a descriptive analysis of the complaints dataset to characterize quantitatively the nature of the police misconduct problem. You will use this characterization to motivate your intervention concept, the kinds of causal effects you will be targeting, and begin to think through statistical considerations for the design of the experiment. Here are some references to methodological material that you can consult for designing the study:

- Tutorials and references for Callaway and Sant’Anna’s approach to DiD estimation for staggered adoption settings: [link]
- On estimating causal effects with randomized staggered adoption: Athey, S., & Imbens, G. W. (2021). Design-based analysis in difference-in-differences settings with staggered adoption. Journal of Econometrics. [link]
- For the methods junkies, here is another approach that combines random assignment with synthetic control, for situations where the number of treated units is very small (but there may be many more control units): Bottmer, L., Imbens, G., Spiess, J. and Warnick, M., 2021. A Design-Based Perspective on Synthetic Control Methods. arXiv preprint arXiv:2101.09398. [link]

## Week 8

For this session you will use simulations to study power and inferential issues for the proposed experiments, with group 1 focusing on “stopping rules” and group 2 focusing on implications of spillover.

## Week 9

This week we will discuss examples of what to include in pre-analysis plans (PAPs) and why. Below are examples of PAPs to read before the discussion. For the ones on the EGAP-OSF registry, be sure to find the link to download the PDF of the PAP.

- Henn et al.: [pdf]
- Kalla and Broockman: [link]
- Baron et al.: [link]
- Blair et al.: [link]
- Green lab standard operating procedures: [link]

## Week 10

We are now starting our third, and final, design task. If you feel like things are becoming somewhat routine, good! That’s the idea: to make what may initially have seemed like an incredibly daunting task approachable. We will go back to square one on this. For the theme of “failures to recognize historical injustices and their legacies,” each group will create slides on the following:

- Characterize the nature of the problem as it applies to a particular location and population that will be the site of intervention. Assemble any descriptive data that you might have.
- Frame the problem on the basis of a compelling social theory and describe what the theory suggests about addressing the problem. It should be a well-specified theory that describes mechanisms or behavioral parameters on which we might intervene to address the problem.
- Propose the intervention that you will test based on this analysis, defining the units of intervention. Will the intervention target individuals or groups of people? We had discussed the possibility of a “place-based” intervention, such as one built around monuments or exhibits. The intervention could also involve bringing people together to discuss or deliberate. We can be open-minded at this point and define the parameters together in class discussion.
- Describe how you would judge whether the intervention is successful. Ideally you would want to measure effects using the same kinds of outcomes that you would use to characterize the problem in the first place.

We will discuss these general considerations together before moving on to the more nuts and bolts design work for the coming weeks.

## Week 11

Presentation of revised study designs.

## Week 12

RCTs cafe