Microeconometrics
Much Ado

In recent years, more and more applied work in microeconometrics has been conducted using disaggregated data at the household level, such as the Family Expenditure Survey (FES). Such data can yield useful information, but new problems arise in analysing them. Many household surveys record "zero expenditures", when the household makes no purchases of particular goods during the survey interview period. This forces the investigator to consider the "truncation problem": expenditures cannot be negative, yet the conventional regression model allows this to occur. Another difficulty lies in the short duration of the interview period: a household might not happen to purchase a relatively durable good in this particular period, but may nevertheless consume its services continuously.

How can methods of analysis be adapted to deal with these problems? The importance of the topic prompted CEPR and the ESRC Econometric Study Group to hold a workshop at the Centre on 30 November. It was organized by CEPR Associate Director Professor Richard Blundell (University College London) and brought together participants from a number of British universities. The two papers presented at the workshop approached the problem of "zero observations" from different angles and provided interesting alternative solutions.

Some economic variables cannot by their very nature take on negative values. In the microeconomic context, such variables could be expenditure or labour supply. In the conventional linear regression model, however, the dependent variable can take on any positive or negative value. One well-known solution to this problem is provided by the "Tobit model", a regression model in which the values of the endogenous variable are truncated at zero, so that negative values cannot occur.
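The idea behind the Tobit model can be made concrete with a small sketch of its log-likelihood. The version below assumes the standard textbook formulation, in which observed expenditure is max(0, beta0 + beta1*x + e) with normally distributed e. The function names, the single regressor and the parameter names are illustrative assumptions, not taken from the workshop papers.

```python
import math

def normal_cdf(z):
    # Standard normal distribution function, via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_logpdf(z):
    # Log of the standard normal density.
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def tobit_loglik(y, x, beta0, beta1, sigma):
    """Log-likelihood for y_i = max(0, beta0 + beta1 * x_i + e_i),
    with e_i ~ N(0, sigma^2). Illustrative one-regressor sketch."""
    total = 0.0
    for yi, xi in zip(y, x):
        mu = beta0 + beta1 * xi
        if yi <= 0.0:
            # Zero observation: probability that the latent value is <= 0.
            total += math.log(normal_cdf(-mu / sigma))
        else:
            # Positive observation: ordinary normal density contribution.
            total += normal_logpdf((yi - mu) / sigma) - math.log(sigma)
    return total
```

Maximizing this function over beta0, beta1 and sigma would give the Tobit estimates. The zero observations enter through a probability rather than a density, which is why the assumed error distribution matters so much for the results.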

Joanna Gomulka (LSE) noted in her paper "Gamma-Tobit: A Tobit Type Model with Gamma-distributed Error Terms" that the assumption of normally distributed error terms usually made in the Tobit model can have important implications for the estimates of regression coefficients. In order to obtain a more general framework, Gomulka considered a family of distributions that contains the normal distribution as a special case. She argued that this approach offered much more flexibility, yet only required the estimation of one additional parameter.

This modified Tobit model was then applied to models of household expenditure on tobacco and alcohol. These are typically categories where many zero expenditures are recorded in survey data, and the truncation problem is thus potentially severe. Expenditure on these commodities is particularly important for policy-makers because of their implications for health and their importance as sources of government revenue. Expenditure data are available from the FES and relate to some 53,000 households over the period 1970-80. From this main sample Gomulka and her colleagues drew several random subsamples of about one-tenth this size in order to compare "Tobit" and "modified-Tobit" estimators. Their model explains the share of tobacco and alcohol in total household expenditure in terms of total expenditure, time, relative prices, the age and socioeconomic status of the head of the household, and household composition.

For tobacco the difference between the results obtained under the assumption of normal errors and those obtained under Gomulka's more general distribution is relatively small. This indicates that the usual Tobit model is not unreasonable for this commodity.

In the case of expenditures on alcohol, however, the outcome is completely different. Here the results suggested an important departure from the conventional model. Hence, forecasts made on the basis of the modified model may differ substantially from those based on the usual Tobit model. The modified model proposed by Gomulka had enough additional flexibility to allow it to properly capture the particular spread of the data and seemed a sounder basis for policy analysis.

Of course, other ways of extending the class of underlying distributions could be considered. This proved to be one of the main issues in the discussion of the paper. A more fundamental methodological question was also raised: should we focus our attention on the properties of the error term (the "unexplained" part of the dependent variable), or on the performance of the explanatory variables in the model, such as age and socioeconomic status?

The estimation of demand functions and Engel curves (the relationship between an individual's consumption of a particular good and his total consumption) has always been a central issue in applied microeconometrics. The necessary budget data, disaggregated to household level, are often taken from household surveys such as the FES. In "Zero Expenditures and the Estimation of Engel Curves", Michael Keen (Essex) stressed the short duration of the interview period in such household surveys. For the FES, for instance, this period only covers two weeks. Data from such surveys may be prone to serious "measurement error", since a household may not happen to purchase certain (more durable) commodities during the relatively short interview period. In this context there is a crucial distinction between the purchase of a good and its "consumption".

We know that zero observations may reflect either the relative infrequency of purchase, a genuine lack of consumption, or systematic under-reporting. In view of the prevalence of the zeros, Keen argued that the second explanation was implausible. Moreover, under-reporting in the FES data seems to be confined to certain goods such as tobacco and alcohol. Therefore, Keen assumed in his further analysis that zero expenditures arise from the infrequency of purchases. He argued that this enhanced both the clarity and the tractability of the subsequent analysis. Keen also assumed that the probability of purchase for any good was equal for all consumers and independent of true consumption, and the Engel curves were assumed to be linear. The restriction that the sum of a household's consumption over all commodities should equal that household's total consumption gave rise to a correlation between the error and the explanatory variables in the model. Conventional estimation methods are inappropriate in this case, and Keen argued that they would lead to overestimation of the marginal propensity to consume goods which were purchased infrequently. To correct for these difficulties, Keen proposed an estimator using "normal income", obtained from the FES, as an instrumental variable.
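The mechanism can be illustrated with a small simulation. This is a minimal sketch under assumed values, not Keen's estimator or data: there are two goods, one bought in any survey period with a fixed probability (in which case the household buys enough to cover its consumption over the non-purchase periods as well), a linear Engel curve with an assumed true share of 0.2, and a hypothetical income variable that plays the role of the instrument. All names and numbers are illustrative.

```python
import random

def slope_ols(x, y):
    # Ordinary least squares slope of y on x: cov(x, y) / var(x).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def slope_iv(z, x, y):
    # Simple instrumental-variable slope: cov(z, y) / cov(z, x).
    n = len(x)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    szy = sum((c - mz) * (b - my) for c, b in zip(z, y))
    szx = sum((c - mz) * (a - mx) for c, a in zip(z, x))
    return szy / szx

def simulate(n=20000, share=0.2, p_buy=0.25, seed=0):
    """Simulate observed spending when one good is bought infrequently.
    All parameter values are assumptions chosen for illustration."""
    rng = random.Random(seed)
    income, total_obs, spend_obs = [], [], []
    for _ in range(n):
        z = rng.uniform(50.0, 150.0)                 # hypothetical "normal income"
        total_true = 0.8 * z + rng.gauss(0.0, 5.0)   # true total consumption
        c1 = share * total_true                      # true consumption of good 1
        c2 = total_true - c1                         # good 2, bought every period
        # Good 1 is bought with probability p_buy, independent of consumption;
        # a purchase is scaled up so that expected spending equals c1.
        y1 = c1 / p_buy if rng.random() < p_buy else 0.0
        income.append(z)
        total_obs.append(y1 + c2)                    # recorded total expenditure
        spend_obs.append(y1)                         # recorded spending on good 1
    return income, total_obs, spend_obs

income, total_obs, spend_obs = simulate()
ols = slope_ols(total_obs, spend_obs)
iv = slope_iv(income, total_obs, spend_obs)
print("OLS slope:", round(ols, 3), " IV slope:", round(iv, 3))
```

Because recorded spending on the infrequently purchased good also enters recorded total expenditure, the regressor is correlated with the error. In this setup the OLS slope should therefore lie well above the assumed share of 0.2, while the IV slope should be close to it.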

Is this procedure appropriate for the FES data? Keen used data on 195 one-parent families (with fewer than two working members) from the 1977 FES and found that the predicted overestimation with the conventional methods did in fact occur. The large differences between the conventional estimator and the one used by Keen suggest that the measurement errors have seriously contaminated these data. However, two of Keen's initial hypotheses did not seem to be confirmed by the data: the independence between the probability of purchase and the level of consumption was rejected for three commodity groups, and the hypothesis of linear Engel curves seemed dubious for most goods. Nevertheless, Keen's empirical results clearly demonstrated the importance of the measurement error problem in the FES data.

In his presentation, Keen noted the simplicity and tractability of his approach, even for the estimation of complete demand systems. He remarked that Gomulka's work dealt mainly with explaining the demand of "potential" consumers by differences in preferences, whereas his own paper stressed the distinction between purchase and consumption. The results of the papers suggested that both of these possible causes for zero observations can, for some commodities, seriously influence empirical results and the policy conclusions drawn from them. There was a need to combine treatment of both problems in a more comprehensive framework. In view of the rapidly growing use of highly disaggregated survey data, which typically contain zero observations, this topic is likely to inspire further research.