There is a movement afoot to increase the efficiency of US government activities through greater use of ‘evidence-based policy’. Proponents like the Coalition for Evidence-Based Policy (www.coalition4evidence.org/wordpress/) are pushing to redirect government resources towards those policies that yield relatively higher benefit-cost ratios in randomised experimental tests of different candidate policies – what we call ‘policy evaluations’ (see also, for example, Angrist and Pischke 2010). Perhaps ironically, the US government’s current budget situation has the potential to reduce the amount of funding available for policy evaluations intended to increase the bang-per-buck from government spending.
Given these developments, it seems important for the research community to think hard about ways of increasing the efficiency of the policy-research enterprise itself. One starting point is to revisit the assumption, prevalent throughout the policy-research industrial complex, that the best way to use randomised experiments to inform policy is to test actual policies.
Consider the following example, taken from our paper about policy field experiments published in the summer 2011 issue of the Journal of Economic Perspectives (co-authored with Jeffrey R Kling of the Congressional Budget Office, Ludwig et al 2011). Suppose that the US Department of Justice (DoJ) is interested in learning more about whether to devote scarce resources to supporting ‘broken windows’ policing, which is based on the notion that signs of minor disorder signal to potential offenders that no one cares about crime in the local area, thereby reducing the deterrent threat from punishment and increasing the chances that more serious crimes are committed. Most researchers would argue that the best approach is to carry out a policy evaluation of broken windows policing: recruit a representative sample of cities, randomly select some neighbourhoods but not others (or perhaps some cities but not others) to receive broken windows policing, and then compare subsequent crime rates in treatment versus control areas. This policy evaluation would be informative but not cheap. The unit of random assignment in this case is the neighbourhood or city – the level at which the policing intervention of direct interest operates. The number of neighbourhoods or cities that would need to be ‘treated’ to have adequate statistical power is large, and the cost per treated area is high.
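The power cost of area-level randomisation can be sketched with the standard design-effect adjustment for cluster-randomised trials. Every number below – residents per neighbourhood, intracluster correlation, target effect size – is a purely illustrative assumption, not a figure from the example:

```python
import math

def clusters_per_arm(delta, sigma=1.0, m=100, icc=0.05,
                     z_alpha=1.96, z_beta=0.84):
    """Approximate clusters (e.g. neighbourhoods) needed per arm.

    delta: minimum detectable effect, in units of the outcome
    sigma: outcome standard deviation
    m:     individuals measured per cluster (hypothetical)
    icc:   intracluster correlation of the outcome (hypothetical)

    Uses the standard two-sample power formula inflated by the
    design effect 1 + (m - 1) * icc for cluster randomisation.
    """
    n_individuals = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    design_effect = 1 + (m - 1) * icc
    return math.ceil(n_individuals * design_effect / m)

# Detecting a 0.2-SD drop in crime with 100 residents per neighbourhood
# and a modest ICC already requires dozens of treated neighbourhoods.
print(clusters_per_arm(delta=0.2, m=100, icc=0.05))
```

Even under these optimistic assumptions the required cluster count grows quickly with the intracluster correlation, which is one reason area-level policy evaluations are so expensive.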
Now consider an alternative experiment. Imagine buying a number of cheap used automobiles. Break the windows of half the cars, and then randomly select a set of urban neighbourhoods in which to park cars with different levels of physical damage. Measure what happens to more serious crimes across the different neighbourhoods. While less ethically objectionable variants of such an experiment are possible (such as randomising areas to have signs of disorder cleaned up, rather than introduced), our example is basically the research design used in a social psychology experiment in the 1960s that led to broken windows theory and then to its widespread adoption in New York City in the 1990s. This ‘mechanism experiment’ doesn’t test the policy of direct interest to the Department of Justice, but rather tests the causal mechanism that underlies the broken windows policy.
How can mechanism experiments help economise on research funding? The broken windows theory rests on a simple logic model in which the key policy lever, P (broken windows policing), influences the outcome of primary policy interest, Y (serious criminal offences), through the mediator, M (local disorder), or P→M→Y. Suppose that DoJ thinks it already knows something about policing – specifically, suppose that DoJ thinks it already understands the relationship between broken windows policing and signs of disorder (P→M). Police professionals might need to have learned that relationship to guide all sorts of policing decisions, because citizens dislike disorder for its own sake regardless of whether it accelerates more serious crimes. In that case the new information that DoJ gets from carrying out a policy evaluation of actual broken windows policing is just about the M→Y link, but that information is mixed together with the noise about the specific P→M link that would arise in any given experiment. The mechanism experiment, on the other hand, maximises the research funding available to identify the part of the causal chain (M→Y) that policymakers do not already understand. Put differently, mechanism experiments can economise on research funding by taking better advantage of what policymakers think they already know.
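This efficiency argument can be made concrete with a stylised Monte Carlo sketch. Everything here is hypothetical: the effect sizes `A` (P→M) and `B` (M→Y), the noise levels, and the assumption that DoJ knows `A` exactly. The policy evaluation recovers B only by dividing its noisy estimate of A·B by the known A, while the mechanism experiment manipulates M directly:

```python
import random
import statistics

random.seed(0)

A, B = 0.5, -0.8      # hypothetical P->M and M->Y effects
N, REPS = 200, 500    # units per experiment, Monte Carlo replications

def diff_in_means(treat, outcome):
    treated = [y for t, y in zip(treat, outcome) if t == 1]
    control = [y for t, y in zip(treat, outcome) if t == 0]
    return statistics.mean(treated) - statistics.mean(control)

policy_b, mech_b = [], []
for _ in range(REPS):
    # Policy evaluation: randomise P; M responds to P with noise,
    # then Y responds to M with noise. Both noise sources enter.
    P = [i % 2 for i in range(N)]
    M = [A * p + random.gauss(0, 1) for p in P]
    Y = [B * m + random.gauss(0, 1) for m in M]
    policy_b.append(diff_in_means(P, Y) / A)  # back out B via the known A

    # Mechanism experiment: randomise the mediator M (disorder) directly,
    # so only the M->Y noise remains.
    D = [i % 2 for i in range(N)]
    Y2 = [B * d + random.gauss(0, 1) for d in D]
    mech_b.append(diff_in_means(D, Y2))       # estimates B directly

print(round(statistics.stdev(policy_b), 2),
      round(statistics.stdev(mech_b), 2))
```

Both designs are unbiased for B under these assumptions, but the mechanism experiment's estimates are markedly less dispersed across replications: it spends the whole sample on the one link in the chain that DoJ does not already understand.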
This broken windows example is not an isolated case. Depending on what policymakers think they understand, in other applications mechanism experiments might increase the efficiency of research spending by, for example, enabling researchers to randomise at relatively less aggregated (lower-level) units of observation.
We are not claiming that mechanism experiments are ‘better’ than policy evaluations. In situations where, for example, the list of candidate mechanisms through which some policy might affect outcomes is long and these mechanisms might interact, the only useful way to get policy-relevant information might be to carry out a policy evaluation. Probably more common is the situation in which mechanism experiments and policy evaluations are complements, in which encouraging evidence from a mechanism experiment might need to be followed up by a policy evaluation in order to, for example, reduce the risk of unintended consequences. But at the very least carrying out a series of mechanism experiments first can help improve decisions about when it makes sense to invest research funding in a full-blown policy evaluation.
Former White House chief of staff (now Chicago mayor) Rahm Emanuel famously said, “Never let a serious crisis go to waste.” The substantial budget deficits confronting government at every level in the US have the potential to reduce funding for policy-relevant experiments, thereby reducing the ability to allocate government resources efficiently in the future. But these budget deficits are also an opportunity for the policy-research community to start thinking about how we ourselves can do more with less.
Angrist, Joshua D and Jörn-Steffen Pischke (2010), “The credibility revolution in empirical economics: How better research design is taking the con out of econometrics”, Journal of Economic Perspectives, 24(2):3-30.
Ludwig, Jens, Jeffrey R Kling, and Sendhil Mullainathan (2011), “Mechanism experiments and policy evaluations”, Journal of Economic Perspectives, 25(3):17-38.