When Is An Economic Model Fit to Survive?
Grayham Mizon

Although economic policy is partly determined by political, economic and social objectives, it should also rest on an understanding of the relevant economic phenomena and an ability to predict them. Such understanding will usually be based either explicitly or implicitly on the use of economic models, which therefore deserve thorough testing. It is seldom easy, however, to gain universal acceptance of the analytical models or empirical evidence which underlie particular policy proposals. The Chancellor's recent dismissal of the report of the all-party House of Lords Select Committee on Overseas Trade as 'a mixture of special pleading dressed up as analysis and assertion masquerading as evidence' illustrates this perfectly.

This reaction also underlines the importance of establishing the credentials of economic models and evidence. Such credentials, because they are important in the creation, promulgation and advocacy of economic policies, must be readily available for inspection and capable of commanding widespread recognition and approval. Yet as the Chancellor's remarks indicate, it is often difficult to establish these credentials. Why should this be so? Economists and econometricians must accept some responsibility for this situation. It may indeed be true that, as Mark Blaug said in 1980, 'Empirical work that fails utterly to discriminate between competing explanations quickly degenerates into a sort of mindless instrumentalism, and it is not too much to say that the bulk of empirical work in modern economics is guilty on that score.' How should empirical researchers respond to this challenge? An important part of the process of establishing a model's credentials lies in the evaluation and comparison of alternative models. Researchers need to devote more attention in general to systematic model comparisons and in particular to the search for what have been termed 'encompassing' models.

Both economic theory and econometric analysis have a role in establishing the credibility of the framework underlying economic policy prescriptions. Economic theory allows us to interpret models and to communicate to others the ideas underlying them. In addition, a model based on economic theory is usually derived from a set of underlying, more basic assumptions or axioms. This permits us to test whether the model is 'coherent' with the currently accepted general precepts of economics. However a model need not be consistent with conventional economic theory in order to be worthwhile. To ignore such inconsistent but innovative models might mean sacrificing the benefits of serendipity. Edward Leamer has recently argued that 'whimsicality' in the creation of models is a major cause of disillusionment with the performance of econometric models. Whilst models grounded in economic theory should be expected on average to be more reliable and useful than whimsically created models, the assessment of a model's performance must be kept distinct from consideration of its origins. David Hendry and I have argued in CEPR Discussion Paper No. 68 that serendipity in the creation of theories and models is highly desirable, in so far as it enables us to escape the confines of received economic theory, and occasionally to make unconventional breakthroughs. In any case, practical considerations may well lead investigators away from whimsical models; considerations of research efficiency often lead them to use models consistent with well-tried economic theories, if only to avoid having to run 500 regressions before an acceptable econometric model is found!

Though we see a potentially important role for serendipity in the creation of models, Hendry and I argue strongly that there is no role for it in their evaluation. Theories and the models embodying them, whimsical or not, must be coherent with the relevant available evidence in order that their usefulness and likely durability can be established.

Econometric analysis is important precisely because it is an essential tool in any systematic process of model evaluation and assessment. Experience has taught us that the selective use of evidence to support theories may yield short-term confirmation, but that this may represent no more than the corroboration of prejudice or whim. One cannot judge the adequacy of economic theories and models solely on the grounds of goodness of fit (high explanatory power) and the 'correct' sign and magnitude of the estimated parameters. For example, it has been argued that higher rates of inflation create greater relative price variability and hence more uncertainty, which can lead to costly resource misallocations. Much of the early empirical work concerned with this issue was content to use the existence of a positive correlation between a measure of relative price variability and inflation to 'confirm' the hypothesized relationship. In a forthcoming CEPR Discussion Paper, Claire Safford, Stephen Thomas and I analyse the UK evidence on consumer prices for relationships between inflation and relative price variability. We find that simple correlations are not only misleading, but also constitute models that do not survive standard tests for model adequacy. Regressions of relative price variability on quarterly seasonal dummy variables and the rate of inflation (or the square of this rate, which is dimensionally more appropriate, as pointed out by John Moore in CEPR Discussion Paper No. 19), suffer dramatic 'predictive failure' after the first quarter of 1973. The regression residuals are also serially correlated, and there is clearly relevant information concerning other macroeconomic variables and the impact of changes in tax and excise duties on relative price variability which the regression does not exploit. The first oil price hike, and the ceremonial practice of announcing changes in tax rates, excise duties and administered prices in UK budget speeches, have had major impacts on relative price variability in the UK. The naive 'confirmationist' modelling described above is unlikely to discover this.
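A predictive-failure check of this kind can be sketched in a few lines. The sketch below uses simulated quarterly data rather than the UK consumer price series, and the function names are illustrative; the statistic is the standard Chow-type forecast test, which compares a model's fit on an initial sub-sample with its fit when later observations are added.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares: return coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return beta, rss

def predictive_failure_stat(X, y, n1):
    """Chow-type forecast test: compare the RSS of a model fitted to the
    first n1 observations with the RSS when the remaining n2 observations
    are included. Under parameter stability the statistic is roughly
    F-distributed with (n2, n1 - k) degrees of freedom; a large value
    signals predictive failure."""
    n, k = X.shape
    n2 = n - n1
    _, rss_full = ols(X, y)
    _, rss_sub = ols(X[:n1], y[:n1])
    return ((rss_full - rss_sub) / n2) / (rss_sub / (n1 - k))

# Simulated data: relative price variability regressed on quarterly
# seasonal dummies and squared inflation, with a regime shift late in
# the sample (mimicking a post-1973 break).
rng = np.random.default_rng(0)
n, n1 = 80, 60
quarters = np.tile(np.eye(4), (n // 4, 1))     # seasonal dummy variables
infl = rng.gamma(2.0, 1.0, n)                  # inflation rate (always positive)
X = np.column_stack([quarters, infl ** 2])
y = X @ np.array([0.5, 0.3, 0.4, 0.6, 0.2]) + rng.normal(0, 0.1, n)
y[n1:] += 1.5                                  # structural break after obs n1
print(predictive_failure_stat(X, y, n1))       # far above any conventional F critical value
```

A regression that fits the pre-break sample closely can still fail spectacularly out of sample, which is exactly the symptom the text describes.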

Other examples can be found in the positive correlations between money supply and inflation, or between unemployment and the level of real wages. These relationships have the appeal of suggestive simplicity, but they do not provide a solid foundation on which to build economic policy. Even the most rudimentary econometric evaluation of such models reveals their naivety and fragility. The proponents of these theories may be disappointed, but the exposure of economic myth by econometric reality is valuable information that should not be ignored.

Econometric model evaluation is mainly concerned with establishing that a model is coherent with the information available from a number of different sources and assessing the model's robustness to changes in the information available. For example, as new time series observations become available the forecasting ability of a model can be evaluated. If the model predicts badly, i.e. if it displays 'predictive failure', this may indicate that the model was not coherent even with the information used in its original construction and evaluation. Yet there is also a danger that a model can be too 'finely tuned' to the data set originally used in its development. Predictive failure can also be a symptom, therefore, of such 'overfitting'. In either case the message is the same - a more robust model is needed.

Many apparently well-founded and thoroughly tested models have suffered predictive failure in recent years, including the Phillips curve and traditional demand for money functions in both the UK and the USA. Exposing the weaknesses of such economic models has led to widespread disillusionment with econometrics. In part, this reflects a misunderstanding of the essentially destructive nature of econometrics. Econometric analysis is not concerned with garnering truth, nor does it lead to the best model once and for all time. Modelling is an evolutionary process, not a single event, and a notable achievement of econometric modelling is the weeding out of inadequate or 'unfit' models and economic theories. This destructive role of econometrics, though, has a constructive purpose, in isolating the best available models for a particular purpose at any point in time.

Dissatisfaction with the performance of econometric models, and indeed with econometric practice in general, has recently led a number of econometricians to propose constructive modelling strategies designed to yield 'adequate' models. For example, Edward Leamer, in his determination to 'take the con out of econometrics', has argued that current econometric techniques are full of 'whimsy' and 'fragility'. He has proposed systematic and wide-ranging sensitivity analyses, such as extreme bounds analysis (EBA), as his preferred strategy to eliminate these faults. Although this modelling strategy has now been implemented in computer software and applied in a number of areas of economics, econometricians remain justifiably sceptical. In CEPR Discussion Paper No. 39, Michael McAleer, Adrian Pagan and Paul Volker are highly critical of Leamer's procedures, including EBA. They argue that such procedures are undesirable, above all because they tend to divert an investigator from the vital task of rigorous model evaluation.
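The idea behind extreme bounds analysis can be conveyed with a deliberately crude sketch (Leamer's actual procedure is more elaborate, and the data and names below are invented for illustration): re-estimate the coefficient on a 'focus' variable over every subset of a set of 'doubtful' auxiliary regressors, and report the extreme values it takes. If the bounds straddle zero, the inference is deemed fragile.

```python
import numpy as np
from itertools import combinations

def extreme_bounds(y, focus, doubtful):
    """Re-estimate the coefficient on `focus` over every subset of the
    `doubtful` regressors and return the smallest and largest estimates."""
    cols = list(range(doubtful.shape[1]))
    coefs = []
    for r in range(len(cols) + 1):
        for subset in combinations(cols, r):
            X = np.column_stack([np.ones(len(y)), focus,
                                 doubtful[:, list(subset)]])
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            coefs.append(beta[1])   # coefficient on the focus variable
    return min(coefs), max(coefs)

rng = np.random.default_rng(1)
n = 200
focus = rng.normal(size=n)
doubtful = rng.normal(size=(n, 3))
y = 2.0 * focus + 0.5 * doubtful[:, 0] + rng.normal(size=n)
lo, hi = extreme_bounds(y, focus, doubtful)
print(lo, hi)   # both close to the true value of 2 here, so the inference is 'sturdy'
```

The critics' point can be seen even in this sketch: wide bounds may reflect a badly specified family of models rather than a genuinely fragile relationship, so the sensitivity analysis is no substitute for rigorous evaluation of each model.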

Systematic and rigorous model evaluation is essential, but it is unlikely to be perfectly incorporated in a single modelling strategy, so the search for 'the best' constructive modelling strategy will probably be unsuccessful. In Discussion Paper No. 68, Hendry and I argue that although there is no known set of sufficient conditions for model adequacy, model design criteria which aim to produce congruent models (i.e. models coherent with all available sources of information) have proved to be valuable, both in terms of research efficiency and model durability and robustness. The model criteria which we discuss in the Discussion Paper and have used in our own empirical work provide a set of necessary conditions which we believe must be satisfied if a model is to be worthy of serious consideration.

An investigator has eight important sources of information potentially available to guide his search for adequate models: a priori theory, usually economic theory; past, present and future sample data on the variables relevant to the class of models he is considering; and past, present and future data provided by and incorporated in alternative models (i.e. models put forward by other investigators). The eighth source is the information implicit in the properties of the measurement system used to collect the sample data; for example, the percentage unemployment rate must lie between 0 and 100, so it is desirable to have models that can only generate fitted and predicted values of the unemployment rate in this range.
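One standard way to respect such a measurement constraint (a sketch, not the only option) is to model a transformation of the bounded variable rather than the variable itself. The logistic transformation below maps a percentage rate onto the whole real line, so any fitted or predicted value, however extreme, maps back inside (0, 100).

```python
import numpy as np

def to_logit(u_pct):
    """Map a percentage rate in (0, 100) onto the real line."""
    p = np.asarray(u_pct) / 100.0
    return np.log(p / (1.0 - p))

def from_logit(z):
    """Inverse map: any real fitted or predicted value lands in (0, 100)."""
    return 100.0 / (1.0 + np.exp(-np.asarray(z)))

# Modelling the logit of the unemployment rate guarantees admissible
# fitted and predicted values, even under a very large shock.
u = np.array([3.2, 5.8, 11.4])          # illustrative unemployment rates (%)
z = to_logit(u)
print(from_logit(z + 10.0))             # a huge shock still stays below 100
```

A model built directly in levels carries no such guarantee, and can happily predict a negative unemployment rate.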

David Hendry, Jean-Francois Richard and I, in a series of articles referred to in Discussion Paper No. 68, discuss in more detail how these eight sources of information can be used in model building. We also examine how one might test whether a model is congruent with these sources of information, and how this procedure is related to traditional tests of model adequacy. In particular, we argue that it is essential that each investigator should check the performance of his model against that of rival models. Underlying this argument is the fundamental principle of encompassing. A model which encompasses its rivals can explain or predict at least as much as its rivals can, so the rivals are redundant. When adopted as part of a modelling strategy, encompassing helps to identify 'inferentially redundant' models, i.e. those which are dominated by other models. By searching for encompassing models investigators will ensure that a model will be discarded only when it is inferentially redundant. Model building therefore becomes an evolutionary process, in which new models are accepted only if they prove they are 'more fit' by encompassing their rivals.
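One simple operational version of an encompassing check, in the spirit of Davidson and MacKinnon's J-test (one of several possible tests, and the data and models below are simulated for illustration), adds the rival model's fitted values to one's own regression and asks whether they contribute anything. If they do not, the rival has nothing to add.

```python
import numpy as np

def ols_fit(X, y):
    """Return fitted values and residual sum of squares from OLS."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta, float(np.sum((y - X @ beta) ** 2))

def encompassing_stat(y, X1, X2):
    """Add model 2's fitted values to model 1's regression and measure the
    reduction in residual sum of squares, as an F-type statistic with one
    numerator degree of freedom. A large value means model 1 fails to
    encompass model 2."""
    n = len(y)
    yhat2, _ = ols_fit(X2, y)
    _, rss_restricted = ols_fit(X1, y)
    _, rss_augmented = ols_fit(np.column_stack([X1, yhat2]), y)
    k = X1.shape[1] + 1
    return (rss_restricted - rss_augmented) / (rss_augmented / (n - k))

rng = np.random.default_rng(2)
n = 150
x1 = rng.normal(size=(n, 1))
x2 = rng.normal(size=(n, 1))
const = np.ones((n, 1))
y = 1.0 + 2.0 * x1[:, 0] + rng.normal(size=n)   # the truth uses only x1
X1 = np.column_stack([const, x1])
X2 = np.column_stack([const, x2])
print(encompassing_stat(y, X1, X2))   # does model 1 encompass model 2?
print(encompassing_stat(y, X2, X1))   # does model 2 encompass model 1?
```

By construction the first statistic is small and the second is very large: model 1 explains everything model 2 can, so model 2 is inferentially redundant, while the converse fails.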

Testing whether a model encompasses its rivals obliges us to undertake direct comparisons of our own model with those of other investigators, and to assess the relative properties and performance of a range of alternative models. We may require, as a necessary condition of good model design or model adequacy, that a model encompasses all rival models. This is a stringent requirement, but a model which encompasses all its rivals is an impressive one. Such impressive evidence in its favour is precisely what is needed if others are to be persuaded not only that the model is worthwhile in itself, but also that it can form a sound basis for policy recommendations. A congruent model, which encompasses all its rivals, therefore boasts impeccable credentials.

The features of a model which are reported by an investigator and the presentational style adopted in describing the model are also an important part of the process of establishing a model's credentials. Hence it is not sufficient simply to state that a model has undergone thorough econometric evaluation. Enough information should be provided in the form of summary and test statistics to indicate the precise nature of the model design strategy which was adopted. This is particularly important in discussions of the publicly available versions of large-scale econometric models used for macroeconomic forecasting and policy analysis, where there is a clear need for more reported information and more uniformity and consistency in reporting styles across the different modelling teams. Interpretation, understanding and comparisons of models would also be greatly enhanced if the same measured data were used for variables such as unemployment, which are common to many of the models. It is also important to record which version of the model was used when reporting the results of forecasting, simulation or policy analyses using any of the large-scale econometric models. Such models undergo constant modifications and their properties can change very dramatically over time.

Many of the large-scale econometric models are now publicly funded, and it is pleasing to note that these models and the forecasts from them are publicly available, and that the ESRC Macro-Modelling Bureau at Warwick University has encouraged the model proprietors to increase the scope and uniformity of their reporting styles. The time is right to augment these developments by investing more effort and resources in model evaluation. We need to develop and refine new techniques and to broaden our understanding of model evaluation through practical experience. CEPR intends to further such research as part of its programme in Applied Economic Theory and Econometrics.

Grayham Mizon is Leverhulme Professor of Econometrics at Southampton University and a Research Fellow in the Centre's programme in Applied Economic Theory and Econometrics. Further details of the research described in this article can be found in CEPR Discussion Papers Nos. 19, 39 and 68. Further information can be obtained by contacting Professor Mizon at the Centre.