The evaluation of econometric models
A one-day workshop was held on Friday, 23 March to
discuss methods of evaluating econometric models generally, though the
evaluation of large macroeconometric models was a particular concern.
The workshop, which was part of the Centre's programme on Developments
in Applied Economic Theory and Econometrics, was chaired by Programme
Director Grayham Mizon. It was attended by 24 people, including academic
econometricians, economists from HM Treasury, NEDO and the Civil Service
College, and members of the NIESR and LBS modelling teams.
In his introductory remarks, Mizon noted that econometric models had
been built and used since the mid-1960s, but for obvious reasons more
effort had been put into their construction than into the development
and application of methods for evaluating and comparing such models. Now
attention should turn to the adequacy of commonly used methods of model
evaluation (e.g. comparison of forecast accuracy and dynamic simulation
performance) and the alternatives to be explored. This was particularly
important since the funding of many of the large macroeconometric
modelling teams had been made more secure by the allocations of the
Consortium on Macroeconomic Research. Furthermore, the creation of the
Economic and Social Research Council's Macroeconomic Modelling Bureau at
Warwick University, with funding from the Consortium, should allow
easier access to the major models, and permit more ambitious model
evaluation and comparison exercises. With these remarks in mind, the
participants heard papers by David Hendry, Trevor Breusch, Hashem
Pesaran and Noxy Dastoor, which were then commented on by Ron Smith,
Alberto Holly, Len Gill and Grayham Mizon before a general discussion
was opened. Some of the material was rather technical, but the issues
addressed were of wide general relevance.
Hendry emphasized the need for funding of research on model evaluation
because some methods currently used were inadequate. His paper
concentrated on the statistical considerations which were relevant for
model evaluation, in order that attention could be focussed on a
coherent set of problems and potential solutions. While other
non-statistical criteria are clearly relevant, econometric evaluation
would always be an important part of model assessment. If they were to
be useful, models had to be subjected to a wide range of statistical tests
- e.g. for serial correlation, heteroscedasticity, instrument validity,
and constancy of parameters - and to investigation of whether their random
errors are innovations (shocks) relative to the information set being used.
Hendry pointed out the special role of such tests in modelling and model
selection. Since an acceptable model for a particular purpose would by
design satisfy the battery of test statistics employed to select it, the
results of such tests cannot be interpreted in the usual way. Rather
they should be regarded as descriptive statistics, which characterize
the adequacy of the modelling strategy. For example, if models
must exhibit constant parameters in order to be acceptable, the
particular value taken by the parameter constancy test statistic for the
finally reported model simply reflects the stringency with which this
particular selection criterion was applied. A genuine test of the
constancy of a model's parameters could only be carried out using data
which were not employed in the model selection process.
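Hendry's point about held-back data can be sketched with a Chow-type forecast test, which compares a model estimated on one sample against observations kept out of the estimation and selection stage. The following is a minimal illustration only, with invented data; it is not drawn from the workshop papers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: y = 1 + 2*x + noise, with a hold-out sample kept
# aside from the estimation/selection stage (purely hypothetical).
n_fit, n_hold = 80, 20
x = rng.normal(size=n_fit + n_hold)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n_fit + n_hold)

def rss(xs, ys):
    """Residual sum of squares from an OLS fit of ys on [1, xs]."""
    X = np.column_stack([np.ones_like(xs), xs])
    beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
    resid = ys - X @ beta
    return resid @ resid

k = 2                                  # parameters: intercept and slope
rss_fit = rss(x[:n_fit], y[:n_fit])    # estimation sample only
rss_all = rss(x, y)                    # estimation plus hold-out sample

# Chow forecast test: under parameter constancy the statistic is
# approximately F(n_hold, n_fit - k).
f_stat = ((rss_all - rss_fit) / n_hold) / (rss_fit / (n_fit - k))
print(f"Chow forecast F-statistic: {f_stat:.3f}")
```

Because the hold-out observations played no role in choosing the model, a significant statistic here is evidence against constancy rather than a reflection of the selection rule.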
In his comments on Hendry's paper, Ron Smith challenged the view that
one should "test, test and test again", arguing that it is
virtually impossible to determine the exact statistical properties of
all the tests together. General discussion revealed that most
participants were not persuaded that it was feasible or preferable to
adopt the alternative procedure, a formal decision theoretic approach to
model selection, with the costs and benefits of the relevant model
characteristics clearly specified in a loss function.
Breusch and Dastoor analyzed in more detail particular types of
significance test statistics. An increasingly common approach is to use
the specification tests devised by Hausman, which test the adequacy of a
model by comparing an efficient estimate of the key model parameters
with another estimate which would be consistent for the same parameters
if the appropriate model were more general than the one being
entertained. Such tests can be compared with classical (e.g. likelihood
ratio) tests of the hypothesis which yields the model of interest as a
special case of a more general model. For situations where they are
different, it has been suggested that Hausman tests are better because
they focus precisely on the requirement that the estimators of key
parameters have desirable properties. Breusch compared properties of
Hausman and classical tests using the explicit objective of good
parameter estimates, and he concluded that the superiority claimed for
Hausman tests was ill-founded. In discussion it was suggested that while
Breusch's argument was convincing, Hausman tests could nevertheless be
useful as general tests of model adequacy. Dastoor maintained that the
Cox non-nested test statistic, which is useful for comparing pairs of
models, neither of which is a special case of the other - a very common
situation with macroeconometric models - can also be interpreted as a
classical test, within the framework of a general model which embeds the
competing models. Whilst this point was uncontroversial, there was
strong disagreement with the view that such a general model used for
this purpose need not be economically sensible.
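The contrast Breusch examined can be sketched numerically. Below is a minimal illustration of a Hausman-type statistic with one endogenous regressor and one instrument; the data-generating setup is assumed for exposition and is not taken from the papers.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Simulated endogenous regressor: x is correlated with the error u,
# while z is a valid instrument (correlated with x, not with u).
z = rng.normal(size=n)
e = rng.normal(size=n)
x = 0.8 * z + e
u = 0.6 * e + rng.normal(scale=0.5, size=n)
y = 2.0 * x + u              # true slope is 2 (no intercept, for brevity)

# OLS slope: efficient if x were exogenous, inconsistent here.
b_ols = (x @ y) / (x @ x)
var_ols = np.mean((y - b_ols * x) ** 2) / (x @ x)

# IV slope using z: consistent under endogeneity, but less efficient.
b_iv = (z @ y) / (z @ x)
var_iv = np.mean((y - b_iv * x) ** 2) * (z @ z) / (z @ x) ** 2

# Hausman statistic: the squared contrast between the two estimates,
# scaled by the difference of their variances; approximately
# chi-squared(1) under the null that x is exogenous.
h = (b_iv - b_ols) ** 2 / (var_iv - var_ols)
print(f"OLS slope {b_ols:.3f}, IV slope {b_iv:.3f}, Hausman stat {h:.2f}")
```

The statistic is built from exactly the contrast Hausman proposed: an efficient estimator and a merely consistent one should agree if the maintained model is adequate.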
The question of how to compare alternative models which are not
necessarily special cases of each other was treated by Pesaran. He
discussed the use of information criteria for measuring the
"closeness" of models and provided a taxonomy of nested, non-nested,
and non-nested but locally nested models. Pesaran, who was
presenting joint work with Ron Smith, also emphasised the practical
difficulties in attempting to build large macroeconomic models which are
useful, satisfy economic theorists and also pass the rigorous technical
tests of econometricians. They argued that it is not surprising that
compromise and pragmatism are the rule in large scale modelling. There
was general agreement among the participants, however, that
notwithstanding these difficulties models must be subjected to rigorous
testing.
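The information-criterion approach Pesaran discussed can be illustrated with a small sketch: two regressions for the same variable, neither a special case of the other, ranked by a Gaussian AIC. The data and regressors here are invented purely for the illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Two non-nested explanations of the same y: model A uses x1, model B
# uses x2 (neither regressor set contains the other).
x1 = rng.normal(size=n)
x2 = 0.3 * x1 + rng.normal(size=n)             # distinct, correlated regressor
y = 1.5 * x1 + rng.normal(scale=0.8, size=n)   # data generated by model A

def aic(xs, ys):
    """Gaussian AIC for an OLS regression of ys on [1, xs]."""
    X = np.column_stack([np.ones_like(xs), xs])
    beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
    sigma2 = np.mean((ys - X @ beta) ** 2)
    k = X.shape[1] + 1                 # slope, intercept and error variance
    return len(ys) * np.log(sigma2) + 2 * k

aic_a, aic_b = aic(x1, y), aic(x2, y)
print(f"AIC model A: {aic_a:.1f}, AIC model B: {aic_b:.1f}")
```

Unlike a classical test, the criterion needs no embedding model: it simply measures which specification sits closer to the data, which is why it is attractive for the non-nested comparisons common among macroeconometric models.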
The most commonly used methods for comparing and evaluating econometric
models are based on dynamic simulation tracking performance, forecast
accuracy and economic plausibility. Hendry argued that the first
criterion is inadequate. Differences among models in their choices of
endogenous and extraneous variables make inter-model comparisons
difficult, and this is not resolved by having all model builders agree
on a common set of exogenous variables. All that dynamic simulation
accuracy reflects, Hendry argued, is the extent to which the explanation
of the data is attributed to non-modelled variables, i.e. those which are
asserted to be exogenous. Hence dynamic simulation is not a sensible
model selection criterion if one wishes to choose models for
forecasting, policy analysis and testing economic theories. Neither does
the second criterion, forecast accuracy, guarantee model validity, since
forecasts are usually generated by a combination of the model and the
model builder. Hendry also argued that one-period forecast accuracy was
a consequence of choosing models according to goodness-of-fit criteria,
so multi-period forecast tests were desirable.
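The gap between one-step and dynamic multi-step accuracy can be seen in a small AR(1) sketch. The model, horizon and data are assumptions of the illustration, not anything presented at the workshop.

```python
import numpy as np

rng = np.random.default_rng(4)
T, h = 200, 40

# Simulated AR(1) series; the fitted model supplies both one-step
# forecasts and a dynamic multi-step forecast path over the hold-out.
y = np.zeros(T + h)
for t in range(1, T + h):
    y[t] = 0.7 * y[t - 1] + rng.normal()

# OLS estimate of the autoregressive coefficient on the first T points.
phi = (y[1:T] @ y[:T - 1]) / (y[:T - 1] @ y[:T - 1])

# One-step forecasts use the actual lagged value each period; the
# dynamic forecast iterates the model on its own previous forecast.
one_step = phi * y[T - 1:T + h - 1]
multi = phi ** np.arange(1, h + 1) * y[T - 1]

rmse_one = np.sqrt(np.mean((y[T:] - one_step) ** 2))
rmse_multi = np.sqrt(np.mean((y[T:] - multi) ** 2))
print(f"one-step RMSE {rmse_one:.2f}, dynamic multi-step RMSE {rmse_multi:.2f}")
```

A model chosen for one-step fit will typically look good on `rmse_one` almost by construction; the multi-step path, which cannot lean on the realised lagged values, is a more demanding check.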
Moreover, even models which have been designed to satisfy best-practice
econometric tests should then be evaluated by using the new information
provided by alternative models. This could be achieved by adopting the
encompassing principle, which requires that a model be able to explain
the behaviour of competing models. In particular a model which is to
replace previously acceptable models should be able to account for at
least as much as its predecessors.
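One simple operational version of the encompassing idea is a forecast-encompassing regression: if model 1 encompasses model 2, model 2's predictions should add nothing to model 1's. The following sketch uses assumed simulated data and in-sample fits as stand-in forecasts.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300

# y is driven by x; model 1 uses x, model 2 uses a noisy proxy w of x,
# so model 1 should encompass model 2 (all values are invented).
x = rng.normal(size=n)
w = x + rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)

def ols_fit(xs, ys):
    """Fitted values from an OLS regression of ys on [1, xs]."""
    X = np.column_stack([np.ones_like(xs), xs])
    beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
    return X @ beta

f1, f2 = ols_fit(x, y), ols_fit(w, y)

# Forecast-encompassing regression: y - f1 = lam * (f2 - f1) + error.
# If model 1 encompasses model 2, lam should be insignificant.
d = f2 - f1
lam = (d @ (y - f1)) / (d @ d)
resid = (y - f1) - lam * d
se = np.sqrt(np.mean(resid ** 2) / (d @ d))
print(f"lambda = {lam:.3f} (s.e. {se:.3f})")
```

A significant lambda would mean the rival model explains something the preferred model misses, which is exactly the failure the encompassing principle is designed to detect.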
Economic plausibility, the third criterion, was likewise too weak a basis
for model selection, and mere confirmation of economic theories was
inadequate. Model builders could
not impose economic theories on their models and then claim that their
models lend support to those same theories. Nor are goodness of fit and
"correct" parameter signs in themselves sufficient criteria
for model validity.
It was generally agreed that the meeting had been productive, both for
the content of the papers presented and for clarifying a research agenda
for model evaluation. There was a clear need for further research
funding on appropriate methods of model evaluation, which should be
firmly focussed on the direct comparison of at least two working
econometric models.