I once, naively, asked the late Alan Krueger about the pioneers of natural experiments in economics. His somewhat sheepish answer was that this is like asking about the pioneers of rock music. It didn’t take much research on my part to reveal the numerous protagonists of a movement in labour economics in the 1980s and 1990s that transformed the way empirical work is done in the field and in many areas of economics beyond. Yet, like rock music, natural experiments have their Fab Four, and they are the 2021 Nobel laureates David Card, Joshua Angrist and Guido Imbens, plus the late Alan Krueger. I hope many will agree with me that this prize honours Krueger as well.
The important questions in economics are causal questions. How does immigration affect the labour market prospects of natives? What is the payoff to an additional year spent in school or to attending university? What are the effects of minimum wages on the employment prospects of low-skilled workers? But these questions are difficult to answer because we lack the right counterfactuals.
For example, we don’t know how natives would have fared had fewer immigrants arrived. Instead, we observe native outcomes for the realised treatments, the levels of immigration that actually took place. The observed associations between realised treatments and outcomes can arise for three distinct reasons:
- One is the actual causal effect of the variable of interest – for example, the impact of the arrival of immigrants on the wages of natives.
- Alternatively, causality could run the other way – for example, immigrants might settle in labour markets that are performing particularly well.
- The final possibility is that there may be a third variable (called a confounder) that is affecting both treatment and outcomes. In the case of the payoff to attending university, graduates may be more able or harder working than school leavers and would have had higher earnings even if they hadn’t gone to university.
The social science challenge is how to cut through these causality conundrums.
The primary methods that were to become the mainstays of the natural experimenters were well known to economists long ago. The key to uncovering causal effects is a clear-eyed view as to where the variation in the variable of interest comes from and a credible control group.
One of the methods – differences-in-differences – is frequently used to analyse policy changes by comparing areas or groups, where some receive the treatment and others do not. The groups are tracked both before and after the policy change or intervention took place. Because the method compares changes over time in the treatment and control groups, the groups don’t have to be identical beforehand. It is enough that they would evolve in a similar fashion absent the treatment.
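The logic of the comparison can be sketched in a few lines of simulated data (the numbers here are purely illustrative and not drawn from any of the studies discussed): two groups start at different levels, share a common trend, and only one receives the treatment.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # individuals per group and period (illustrative)

baseline = {"treated": 10.0, "control": 12.0}  # groups may differ in levels
trend = 1.5    # common change over time absent the treatment
effect = 2.0   # true causal effect of the treatment (assumed, for illustration)

before = {g: b + rng.normal(0, 1, n) for g, b in baseline.items()}
after = {g: b + trend + rng.normal(0, 1, n) for g, b in baseline.items()}
after["treated"] += effect  # only the treated group gets the treatment

# Differences-in-differences: the change in the treated group minus the
# change in the control group nets out both the level difference and the
# common trend, leaving the treatment effect.
did = (after["treated"].mean() - before["treated"].mean()) \
    - (after["control"].mean() - before["control"].mean())
print(round(did, 2))  # close to the true effect of 2.0
```

A simple before-after comparison within the treated group alone would instead return roughly trend plus effect (about 3.5 here), which is why the control group matters.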
The other method is instrumental variables. This links the treatment to a source of variation, called the instrument, which is hopefully unrelated to confounders and affects the outcome only through the treatment. Instrumental variables can therefore be used to tease out causal effects and often lend themselves nicely to isolating variation stemming from natural experiments.
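A minimal simulated sketch (again with made-up numbers) shows both the problem and the fix: an unobserved confounder biases the naive regression slope, while an instrument that shifts the treatment but has no direct path to the outcome recovers the causal effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000  # illustrative sample size

u = rng.normal(size=n)  # unobserved confounder
z = rng.normal(size=n)  # instrument: independent of u, and affects the
                        # outcome only through the treatment
x = z + u + rng.normal(size=n)  # treatment, shifted by both z and u
beta = 0.5                      # true causal effect (assumed)
y = beta * x + 2.0 * u + rng.normal(size=n)

# Naive regression slope of y on x: biased because u moves both x and y.
naive = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# IV (Wald) estimator: how much y moves with z, scaled by how much x moves
# with z. The confounder drops out because z is independent of u.
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(round(naive, 2), round(iv, 2))  # naive is biased upwards; iv is near 0.5
```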
These methods were known and used by economists well before the 1980s. One example is Theodore Schultz’s (1964) work on the “surplus labour hypothesis”. Schultz wanted to estimate the marginal product of labour in an agricultural economy and used deaths in Indian provinces during the 1918-19 influenza epidemic as a shock to labour supply.
What then, exactly, did the natural experimenters of the 1980s contribute? My naivety on the question of who started it highlights the difficulty in pinpointing the exact contributions that this year’s Nobel is celebrating. To shed light on this, it is necessary to realise that work like that of Schultz was very much the exception. The 1970s and 1980s were rather a dark age for empirical economics. Edward Leamer’s (1983) plea to “take the con out of econometrics” highlights the dismay felt by many at the time.
Leamer illustrates his frustration with the state of affairs with a discussion of the question of whether capital punishment deters crime. Two influential studies by Isaac Ehrlich (1975, 1977) had concluded that it did, but Leamer demonstrated that the opposite result could easily be obtained from a single dataset by changing a few aspects of the estimation. Contemporary critics also took issue with the Ehrlich analyses and highlighted fragility with respect to functional form, the variables included in the regression models, and changes in the sample.
Yet all these controversies missed the main issue, which is whether the association of executions and murder rates tells us something about the causal effect of capital punishment. Ehrlich was well aware of the fact that crime rates might also affect executions and used the instrumental variables method to circumvent this issue. But the instruments he chose – total government expenditures and expenditures on police, population, and the fraction of the population that is non-white – are not particularly plausible, nor does Ehrlich discuss why they should be valid.
This example and the discussions surrounding it were quite representative of the period. In fact, it was probably one of the more careful investigations. The somewhat mechanical use of instrumental variables was part and parcel of the empirical mainstream. Often the instruments were simply past values of the treatment in time series studies.
At the same time, many economists understood the problems of confounding and reverse causality, and that the approaches in wide use were often poorly suited to the task. Orley Ashenfelter at Princeton wrote a series of papers on the evaluation of government sponsored training programmes (Ashenfelter 1974, 1978, Ashenfelter and Card 1985).
Maybe ironically, he also employed the differences-in-differences method to compare workers who received training to other, untrained workers. But looking at different time horizons before and after the training produced widely different estimates of the training impacts. It became clear to him that the earnings trajectories of trained workers differed markedly from the controls: the trainees tended to have a dip in earnings just before starting training. As a result, what might look like a training effect may just be a natural rebound from an adverse situation for the trainees.
Meanwhile, there was a randomised evaluation of one such training programme, the National Supported Work demonstration, and Ashenfelter’s PhD student Robert LaLonde compared the experimental estimates to non-experimental alternatives. The LaLonde (1986) study hammered home the point about the fragility of non-experimental methods and provided an important impetus for many young economists at the time to try to do better.
While Ashenfelter went on to advocate the experimental evaluation of government programmes, another Princeton PhD student, Gary Solon, took a different tack. Solon was interested in the effects of unemployment benefits on the search behaviour of the unemployed and the time it took them to find new work. Unemployment benefits vary for two reasons: the rules to determine benefit levels and the previous earnings and employment histories of the individuals. While we are interested in the effect of the rules, researchers had often simply compared benefits received with outcomes, thus conflating policy with individual heterogeneity in labour market prospects.
Solon (1985) was careful in separating out the effects of the benefit rules by exploiting the taxation of unemployment benefits for high-income earners starting in 1979 in a difference-in-difference design comparing them to untaxed control individuals. This was another landmark study that influenced many around him and further afield, like Lawrence Katz and Bruce Meyer, PhD students at MIT, who would go on to carry this work further.
Princeton was indeed not the only place where discontent was brewing. At Harvard, Richard Freeman was heavily influenced by Schultz’s work mentioned earlier. Freeman tried to emulate Schultz by using similar shocks in his own work, but never quite reached the point where his work could be classified as proper natural experiments.
But one of his PhD students, Alan Krueger, did. Krueger joined Princeton as an assistant professor in 1986 where he became a colleague of David Card; Joshua Angrist was a PhD student at the time. They were part of a group of young economists determined that they could do better and create more credible empirical work.
Card’s (1990) study of the Mariel boatlift to measure the effect of immigration on native workers highlights the difference in approach between the work of this new generation and what came before. To circumvent the problem that immigrants might settle in well-performing labour markets, he looked at the arrival of Cuban émigrés in the Mariel boatlift in 1980. The Mariel Cubans all landed in Miami and many stayed put, adding about 7% to the city’s labour force over a few months.
Card carefully compares outcomes for Miami natives in a simple differences-in-differences analysis with a number of comparison cities, which resemble Miami but didn’t receive a large inflow of immigrants. The evolution of earnings and unemployment rates of Miami residents was roughly similar to that in other cities, but the results are somewhat noisy and not fully conclusive. Nevertheless, the Mariel study remains a natural experiment par excellence and influenced many researchers at the time.
David Card and Alan Krueger may be best known for their work on minimum wages and in particular Card and Krueger (1994), which analyses the increase in the state minimum wage in New Jersey. They compare employment effects at fast food restaurants in New Jersey to those in neighbouring Pennsylvania in another differences-in-differences framework and find similar changes in employment. Perhaps New Jersey and Pennsylvania differed in some way that explains why one state raised its minimum wage and the other did not, and this difference offsets any employment decline. Card and Krueger address this concern by also comparing restaurants within New Jersey that were charging different wages before the minimum wage increase, with similar results.
One innovation of this minimum wage study was that they collected data using their own survey at a time when economists were almost exclusively relying on secondary data sources. Finding good new data sources or generating your own is now commonplace in empirical economics. It turns out that the quality of the New Jersey data was not tremendously good, as was pointed out by critics, and Card and Krueger (2000) improved on it by using administrative records in a re-analysis, which confirmed the results. Critically re-evaluating their own work, and building on it where necessary, is another hallmark of these authors.
All the studies just described use differences-in-differences but a key tool of the natural experiment movement was the instrumental variables method. Its use is highlighted in the study by Angrist and Krueger (1991) of the returns to completing an additional year of school. They exploit the nature of compulsory schooling laws in the United States, which stipulate that students can drop out of school as soon as they reach the compulsory schooling age (16 in many places), even when their birthday is in the middle of the school year. Since school entry is only once a year in September, date of birth generates variation in the time in school for those who decide to drop out as soon as they can. This leads Angrist and Krueger to use quarter of birth, which is available in the large public use samples of the US Censuses, as an instrument for years of schooling.
This paper in particular stands out as a turning point in terms of the modern, thoughtful use of instrumental variables (though there are some others as well). Angrist and Krueger carefully back up the inherent assumptions with subsidiary empirical evidence. For example, the authors document that compulsory schooling laws affect school attendance in younger cohorts in the Census in exactly the way expected, although these are not the main cohorts used in the analysis. They also show that quarter of birth primarily affects those with a high school education or less, who should be bound by compulsory schooling laws, but not individuals who obtain more education. As in many of the key natural experiment papers, simple graphical evidence supports the story, and this transparency is another key feature of the natural experiment movement.
One of the surprises of Angrist and Krueger’s study of schooling was that instrumental variables actually delivered a higher payoff to schooling than a more naïve analysis. This is surprising because most analysts expected the naïve association to be too large: those who would have higher earnings anyway are presumably also the ones getting more schooling, biasing the association upwards.
This is where the work of Angrist and Guido Imbens comes in. Both had just finished their PhDs and met as assistant professors at Harvard. Clearly, different individuals will have different benefits from an additional year of schooling. Someone may have a guaranteed job in the family firm, so completing high school may matter little; for someone else, it may be life changing. So whose return to education are we measuring? Imbens and Angrist (1994) show that it is those individuals whose schooling is affected by the instrument.
In the Angrist and Krueger case, this consists of individuals at the margin of dropping out early. Their study tells us little about completing a university degree. While we don’t learn the earnings effect of schooling for everyone, nor for every year of schooling, the results from studies using instrumental variables are often particularly relevant for groups that are a concern for policy purposes.
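This "local average treatment effect" logic can be illustrated with a toy simulation (all numbers hypothetical): when effects are heterogeneous, the IV estimate lands on the compliers' average effect, not the population average.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

z = rng.integers(0, 2, size=n)  # binary instrument (e.g. an encouragement)

# Three latent types: always-takers, never-takers, and compliers, whose
# treatment status is moved by the instrument.
kind = rng.choice(["always", "never", "complier"], size=n, p=[0.3, 0.3, 0.4])
d = np.where(kind == "always", 1, np.where(kind == "never", 0, z))

# Heterogeneous effects: compliers gain 3.0, everyone else 1.0 (assumed).
effect = np.where(kind == "complier", 3.0, 1.0)
y = effect * d + rng.normal(size=n)

# Wald / IV estimator with a binary instrument: the effect of z on y,
# scaled by the effect of z on treatment take-up.
wald = (y[z == 1].mean() - y[z == 0].mean()) / (
    d[z == 1].mean() - d[z == 0].mean()
)
print(round(wald, 1))  # near 3.0, the compliers' effect, not the 1.8 average
```

Only the compliers' treatment status responds to the instrument, so only their effect shows up in the ratio; the always-takers and never-takers cancel out of both numerator and denominator.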
The Angrist and Imbens analysis has played a key role in Card’s (2001) interpretation of the literature on returns to schooling, and the insights have been used and extended by numerous researchers since. It also highlights that any single study will only provide partial results on a question. To gain a fuller picture, we need more analyses with different instruments or otherwise different individuals affected. These issues of external validity have been receiving greater and greater attention over time.
The classic papers of these authors have not gone without their critics. In fact, all the papers mentioned here have been meticulously dissected and re-analysed by others. This process has led to a better understanding of the methods used and to many methodological improvements. The Fab Four have been much involved in this process themselves. Lo and behold, many of the papers have stood up rather well to the scrutiny they have received.
The natural experiment methodology has not just swept through labour economics but many other fields of economics as well. Some worried that we would soon run out of good natural experiments and that the methodology would lead to a focus on narrow, relatively unimportant questions. Instead, the opposite seems to have happened: as more researchers use the methods, we come across more and more good settings to exploit. Good questions have attracted good answers.
One crude metric for the success of the natural experiment revolution is the fact that since 1990, at least half of the Clark Medals (a prize given by the American Economic Association to the best economist under 40) have gone to medallists associated with natural experiment methods. This is remarkable as the prize-winners come from all fields including theory, and it also highlights the success that empirical economics has enjoyed over this period. Surely this is in no small part due to the fact that this year’s laureates have shown us the way to do credible empirical work.
Angrist, J D, and A B Krueger (1991), “Does Compulsory School Attendance Affect Schooling and Earnings?”, Quarterly Journal of Economics 106(4): 979-1014.
Ashenfelter, O (1974), “The Effect of Manpower Training on Earnings: Preliminary Results”, Proceedings of the 27th Annual Meeting of the Industrial Relations Research Association.
Ashenfelter, O (1978), “Estimating the Effect of Training Programs on Earnings”, Review of Economics and Statistics 60(1): 47-57.
Ashenfelter, O, and D Card (1985), “Using the Longitudinal Structure of Earnings to Estimate the Effect of Training Programs”, Review of Economics and Statistics 67(4): 648-60.
Card, D (1990), “The Impact of the Mariel Boatlift on the Miami Labor Market”, Industrial and Labor Relations Review 43(2): 245-57.
Card, D (2001), “Estimating the Returns to Schooling: Progress on some Persistent Econometric Problems”, Econometrica 69(5): 1127-60.
Card, D, and A B Krueger (1994), “Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania”, American Economic Review 84(4): 772-93.
Card, D, and A B Krueger (2000), “Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania: Reply”, American Economic Review 90(5): 1397-1420.
Ehrlich, I (1975), “The Deterrent Effect of Capital Punishment: A Question of Life and Death”, American Economic Review 65(3): 397-417.
Ehrlich, I (1977), “Capital Punishment and Deterrence: Some Further Thoughts and Additional Evidence”, Journal of Political Economy 85(4): 741-88.
Imbens, G, and J D Angrist (1994), “Identification and Estimation of Local Average Treatment Effects”, Econometrica 62(2): 467-75.
LaLonde, R J (1986), “Evaluating the Econometric Evaluations of Training Programs with Experimental Data”, American Economic Review 76(4): 604-20.
Leamer, E E (1983), “Let’s Take the Con Out of Econometrics”, American Economic Review 73(1): 31-43.
Schultz, T W (1964), Transforming Traditional Agriculture, Yale University Press.
Solon, G (1985), “Work Incentive Effects of Taxing Unemployment Insurance”, Econometrica 53(2): 295-306.