What measure of GDP should researchers use? The standard answer to this question is the latest vintage of the Penn World Tables (PWT). This dataset was the first to measure GDP as output valued under purchasing power parity (PPP) adjusted prices, taking into account the fact that real prices in the developing world are often lower than in the developed world. The PWT are frequently updated to reflect the most recent data and superior methods. Their current (8th) vintage attempts to improve on this fundamental insight by using price surveys from the International Comparison Program (ICP) from multiple years to measure not only levels of GDP in a particular year, but also economic growth across time at purchasing power parity. The original paper introducing the PWT (Summers and Heston 1991) has 4,287 citations on Google Scholar as of April 2016, and the paper introducing the current vintage has recently been published in the American Economic Review (Feenstra et al. 2015).
However, it is not obvious that using the latest vintage of the Penn World Tables is the best approach. The PWT is not the only source of PPP-adjusted GDP estimates, as the World Bank now produces its own series in the World Development Indicators (WDI). Moreover, there are large differences between successive vintages of the PWT. Ciccone and Jarocinski (2008) find that variables found to be robust determinants of economic growth using one vintage of the PWT no longer remain so when using the following vintage. Johnson et al. (2013) note that countries ranked as slowest growing by one vintage may be ranked as fastest growing by the following one. Assuming that the later vintage is always right is not safe, because measurement errors in the national accounts can be large. For example, when the ICP released its price survey for 2005, estimates of Chinese GDP fell by 40% relative to the previous version, but this change was largely undone by the subsequent (2011) price survey. The problem of selecting the right vintage also arises for the WDI, because it too incorporates new methodology and price survey information every year. Hence, whenever researchers wish to use PPP-adjusted estimates of national accounts data, they need to answer several questions. First, should they immediately update to the newest version of the PWT? Second, should they use the PWT or the WDI? Third, if they use the WDI, what version of the WDI should they use?
Previous studies of the reliability of the PWT, such as Johnson et al. (2013), perform painstaking analyses of methodological differences between different vintages and competing datasets, and try to assess which dataset is better, based mainly on theoretical considerations. However, no dataset is perfect, and balancing methodological imperfections across different datasets is difficult without being able to observe some empirical index of the quality of a dataset. In a new paper, we propose a data-driven approach to determining how well different estimators of GDP measure unobserved true income relative to each other (Pinkovskiy and Sala-i-Martin 2016b).
If we had a measurement of GDP per capita whose measurement error was uncorrelated with the measurement errors of the different vintages of the Penn World Tables, it would be easy to see which vintage was better by comparing each of them to the independent measure. In another recent paper (Pinkovskiy and Sala-i-Martin 2016a), we argue that such an independent measurement can be constructed using data on satellite-recorded night-time lights, which were first studied by Elvidge et al. (1997), and in economics by Henderson et al. (2012) and Chen and Nordhaus (2011). While errors in different vintages of the PWT and WDI come from errors in the underlying national accounts data (such as faulty assumptions about economic relationships like input-output tables), or from errors in calculating indicators of PPP between different currencies, errors in the relationship between night-time lights and economic output come from weather and atmospheric disturbances that affect how light is captured by the orbiting satellites. These two sources of error are plausibly uncorrelated with one another, making night-time lights a useful ‘independent referee’ in determining the quality of conventional statistics. Unlike the previous literature, we can test which GDP measure is closest to unobserved true income, and we do not need to know anything about the way that the GDP measures were constructed for our empirical tests to yield valid answers. On the other hand, our method does not shed light on the methodological reasons why one estimator of true income might be better than another, although we can use our knowledge of how the GDP measures were constructed to make conjectures.
The core of our analysis will be to regress night-time lights per capita on measures of GDP per capita, such as different vintages of the PWT, or a vintage from the PWT and a vintage from the WDI. In Pinkovskiy and Sala-i-Martin (2016a), we show that the coefficients from such a regression are proportional to the optimal weights on these measures in the best unbiased linear combination of the two measures. If each newer vintage of the PWT were truly better than the older vintage (less biased and more precise), when lights are regressed on both vintages, the newer vintage should receive a positive and significant coefficient, while the older vintage should receive a zero coefficient.
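The logic of this regression can be illustrated with a small simulation. The sketch below is not the authors' code or data: it assumes a simulated true income, two hypothetical GDP measures (`gdp_old`, `gdp_new`) that each equal true income plus independent noise, and a lights series whose error is independent of both. All names, noise levels and the 0.8 loading of lights on income are illustrative assumptions. Under these assumptions, the two regression coefficients, rescaled to sum to one, should land close to the inverse-variance weights of the two measures.

```python
import random


def ols_two_regressors(y, x1, x2):
    """Return the OLS slopes from regressing y on x1 and x2 with an intercept."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    # Demeaning every variable absorbs the intercept.
    yd = [v - my for v in y]
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, yd))
    s2y = sum(b * c for b, c in zip(d2, yd))
    # Solve the 2x2 normal equations by Cramer's rule.
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def simulate_weights(n=20000, sd_old=0.3, sd_new=0.6, sd_lights=0.5, seed=0):
    """Simulate the lights regression and return normalized weights (old, new).

    Assumptions (hypothetical, for illustration only): both GDP measures are
    unbiased for true income, and all three error terms are independent.
    """
    rng = random.Random(seed)
    true_income = [rng.gauss(0.0, 1.0) for _ in range(n)]
    gdp_old = [t + rng.gauss(0.0, sd_old) for t in true_income]
    gdp_new = [t + rng.gauss(0.0, sd_new) for t in true_income]
    lights = [0.8 * t + rng.gauss(0.0, sd_lights) for t in true_income]
    b_old, b_new = ols_two_regressors(lights, gdp_old, gdp_new)
    total = b_old + b_new
    # Rescale the coefficients so that they sum to one.
    return b_old / total, b_new / total


w_old, w_new = simulate_weights()
print(f"weight on older vintage: {w_old:.2f}, on newer vintage: {w_new:.2f}")
```

In this toy setup the noisier "newer" measure is deliberately given the larger error variance, so most of the weight goes to the older one. Swapping `sd_old` and `sd_new` reverses the ranking, which is the sense in which the lights regression acts as a referee without any knowledge of how each measure was built.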
We find that, in general, newer versions of the Penn World Tables are not necessarily better than their direct predecessors. We can never reject the hypothesis that it would be more helpful, from a statistical point of view, to form some nontrivial linear combination of any new PWT vintage and its preceding vintage in order to compute GDP than to use the new PWT vintage alone. Moreover, the most recent PWT index that is based on multiple PPP benchmarks appears to have been less accurate than the preceding PWT index. Regardless of whether we seek to measure levels of GDP per capita or growth rates, the optimal weight on the current constant price PWT index in the best unbiased linear combination is below 14%, while the weight on the preceding PWT index is above 85%. We show that this finding is not driven by night-time lights being a biased indicator of output across industries and is not driven by any subsample of the data (although there is interesting regional heterogeneity in the relative quality of the two PWT vintages). On the other hand, the most recent PWT index that uses national, rather than PPP-adjusted, growth rates appears to be more accurate in measuring economic growth than the similarly constructed index in the previous vintage.
We also find that newer versions of the WDI are better than older ones at measuring cross-country income differences (though not necessarily growth rates). In particular, we show that each successive price survey of the ICP has generally led to better estimates of PPP-adjusted GDP per capita, including the controversial 2005 round. While we cannot say anything about China (because we have only a few observations for it), it appears that on net, both the 2005 price survey and the 2011 price survey improved our understanding of global price levels even though they had conflicting estimates for some countries.
More fundamentally, we show that so long as night-time light growth rates can be taken as unbiased predictors of the growth rates of true income, the best way to measure growth rates of true income is to use growth rate series based on the national accounts alone, and without the adjustments employed by almost all the versions of the Penn World Tables. The GDP series most successful in explaining night-time light growth rates (the WDI and PWT 8.1 national growth rates series) compute growth rates based on national accounts alone, and without incorporating any information about PPPs. On the other hand, the series that goes furthest in attempting to construct a GDP index in prices that were truly constant across space and over time (PWT 8.1 constant price series) is less useful in explaining night-time light growth than any of the above series.
Finally, our results also provide guidance for researchers seeking a GDP series to use, for example in investigating the determinants of cross-country income differences, or in modelling the macroeconomy. One way to select such a series from among the most recent ones (those based on 2005 PPP or later) is to ask whether there is a unique series that, in every head-to-head comparison against another recent series and for every set of country and year fixed effects, either receives the larger (and statistically significant) weight in the optimal linear combination of the two, or, if it receives the smaller weight, neither of the weights is statistically significantly different from zero. Based on this criterion, we recommend using the WDI 2011 series. When compared in the cross section against PWT 7.1, PWT 8 or WDI 2005, it receives a larger and statistically significant weight (which is always above 0.77). When country fixed effects are included in the comparison, WDI 2011 receives the larger weight (which is always greater than unity) except when compared against WDI 2005, but for that comparison, none of the weights is significant. Hence, if one seeks a single GDP series to measure both levels and growth rates of economic activity, the best candidate appears to be WDI 2011.
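The selection rule described above can be written down as a short procedure. The sketch below is only an illustration of the rule's logic: the series names `A`, `B` and `C`, the weights and the significance flags are all made-up placeholders, not estimates from the paper, and the data layout (a list of dictionaries, one per head-to-head comparison and specification) is an assumption made for this example.

```python
def passes_criterion(series, comparisons):
    """Check the selection rule for one series across all its comparisons.

    In each head-to-head comparison the series must either receive the larger
    and statistically significant weight, or, if its weight is not the larger
    one, neither weight may be statistically significant.
    """
    for comp in comparisons:
        if series not in comp["weights"]:
            continue  # this comparison does not involve the series
        (other,) = [name for name in comp["weights"] if name != series]
        w_self = comp["weights"][series]
        w_other = comp["weights"][other]
        sig_self = comp["significant"][series]
        sig_other = comp["significant"][other]
        if w_self > w_other:
            if not sig_self:
                return False
        elif sig_self or sig_other:
            return False
    return True


def preferred_series(comparisons):
    """Return every series that satisfies the rule (ideally exactly one)."""
    names = sorted({name for comp in comparisons for name in comp["weights"]})
    return [name for name in names if passes_criterion(name, comparisons)]


# Hypothetical inputs: placeholder series, weights and significance flags.
toy_comparisons = [
    {"spec": "cross section",
     "weights": {"A": 0.8, "B": 0.2},
     "significant": {"A": True, "B": False}},
    {"spec": "cross section",
     "weights": {"A": 0.9, "C": 0.1},
     "significant": {"A": True, "C": False}},
    {"spec": "country fixed effects",
     "weights": {"A": 0.4, "C": 0.6},
     "significant": {"A": False, "C": False}},
    {"spec": "cross section",
     "weights": {"B": 0.7, "C": 0.3},
     "significant": {"B": True, "C": False}},
]

winners = preferred_series(toy_comparisons)
print(winners)
```

In the toy inputs, series `A` loses one comparison on the point estimate but still passes, because neither weight in that comparison is significant. This mirrors the role of the second clause of the rule.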
Authors' note: Any views expressed in this paper are the authors' and do not necessarily reflect those of the Federal Reserve Bank of New York or of the Federal Reserve System. All errors are our own.
Chen, X and W D Nordhaus (2011) "Using luminosity data as a proxy for economic statistics", Proceedings of the National Academy of Sciences.
Elvidge, C D, K E Baugh, E A Kihn, H W Kroehl and E R Davis (1997) "Mapping city lights with night-time data from the DMSP operational linescan system", Photogrammetric Engineering & Remote Sensing, 63(6): 727-734.
Feenstra, R C, R Inklaar and M P Timmer (2015) "The next generation of the Penn World Table", American Economic Review, 105(10): 3150-3182.
Henderson, J V, A Storeygard and D N Weil (2012) "Measuring economic growth from outer space", American Economic Review, 102(2): 994-1028.
Johnson, S, W Larson, C Papageorgiou and A Subramanian (2013) "Is newer better? Penn World Table revisions and their impact on growth estimates", Journal of Monetary Economics, 60(2): 255-274.
Pinkovskiy, M L and X Sala-i-Martin (2016a) "Lights, camera, ... income! Illuminating the national accounts-household surveys debate", Quarterly Journal of Economics, 131(2): 579-631.
Pinkovskiy, M L and X Sala-i-Martin (2016b) "Newer need not be better: Evaluating the Penn World Tables and the World Development Indicators using night-time lights", NBER Working Paper No. 22216.