VoxEU Column Frontiers of economic research

The Billion Prices Project: Using online data for measurement and research

Big Data is changing the world, even economics. This column describes MIT’s Billion Prices Project and discusses key lessons for both inflation measurement and some fundamental research questions in macro and international economics. Online prices can be used to construct daily price indexes in multiple countries and to help avoid measurement biases. 

As Einav and Levin (2014) recently point out, “the data revolution of the past decade is likely to have a further and profound effect on economic research” and measurement. One key area that is already being affected by these new data sources is the measurement of inflation and other price statistics.

The challenges of inflation measurement are well known. Many have to do with the limitations of the raw price data and the fact that the basic procedure to collect them has remained roughly the same for decades. In particular, a large number of people working for national statistical offices visit hundreds of stores on a monthly basis to collect prices for a pre-selected basket of goods and services. This process is expensive, complex, and often too slow for some users of the data. Infrequent sampling and slow updates to the baskets complicate adjustments for quality changes and the introduction of new goods. Groves (2011) further describes many other challenges faced by traditional survey-based methods of data collection, including growing levels of non-response. And while recent economic crises have prompted policymakers and other users of these statistics to demand faster and more accurate data, shrinking resources are straining the indispensable work of many national statistical offices, as discussed by Abraham et al. (2015).

Big Data in general, and online prices in particular, have a natural appeal in this context, as we describe in Cavallo and Rigobon (2016). While prices are dispersed across hundreds of websites and thousands of webpages, advances in automated “scraping” software now allow anyone to design and implement large-scale data collections on the web. Detailed information can be collected for each good, and new and disappearing products can be quickly detected and accounted for. Online data collection is cheap, fast, and accurate, making it an ideal complement to traditional methods of collecting prices, particularly in categories of goods that are well-represented online.
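To make the idea of automated collection concrete, the following is a minimal sketch of how prices might be pulled from a retailer's page and how new and disappearing products can be detected between two daily snapshots. The page markup, the `data-id` attribute, and the helper names (`PriceParser`, `scrape_prices`, `compare_days`) are all invented for illustration; real retailer pages differ and this is not the software used by the project.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collects product-id -> price pairs from a hypothetical page layout.

    Assumed markup (invented for this sketch):
        <span class="price" data-id="A1">$3.49</span>
    """

    def __init__(self):
        super().__init__()
        self._current_id = None  # id of the price tag currently being read
        self.prices = {}         # product id -> price as a float

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") == "price":
            self._current_id = attrs.get("data-id")

    def handle_data(self, data):
        if self._current_id is not None:
            # Drop the currency symbol and thousands separators.
            text = data.strip().lstrip("$").replace(",", "")
            if text:
                self.prices[self._current_id] = float(text)

    def handle_endtag(self, tag):
        if tag == "span":
            self._current_id = None


def scrape_prices(html):
    """Parse one page snapshot into a {product id: price} dictionary."""
    parser = PriceParser()
    parser.feed(html)
    return parser.prices


def compare_days(yesterday, today):
    """Return ids of products that appeared and disappeared between two days."""
    new = sorted(set(today) - set(yesterday))
    gone = sorted(set(yesterday) - set(today))
    return new, gone


# Two made-up daily snapshots of the same page.
PAGE_DAY1 = ('<div><span class="price" data-id="A1">$3.49</span>'
             '<span class="price" data-id="B2">$1,200.00</span></div>')
PAGE_DAY2 = ('<div><span class="price" data-id="A1">$3.59</span>'
             '<span class="price" data-id="C3">$10.00</span></div>')

day1 = scrape_prices(PAGE_DAY1)
day2 = scrape_prices(PAGE_DAY2)
new_ids, gone_ids = compare_days(day1, day2)
```

In this toy example product `C3` is flagged as new and `B2` as discontinued on the second day, which is the kind of product turnover that monthly store visits can take much longer to register.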

Our first use of online data to construct inflation indexes was motivated by the manipulation of inflation statistics in Argentina from 2007 to 2015. Using online data collected every day from the websites of large retailers, Cavallo (2013) showed that while Argentina’s government announced an average annual inflation rate of 8% from 2007 to 2011, the online data suggested it was actually over 20%, in line with the estimates of some provincial governments and local economists, and consistent with the results from surveys of household inflation expectations. The persistence of the difference in inflation rates can be seen in Figure 1. Among the many advantages of using online prices, the ability to collect them remotely proved particularly useful in 2011, when Argentina’s government started to impose fines and to pressure local economists to stop collecting data independently. The manipulation of official data lasted almost nine years, ending only in December 2015 when a new government was elected (although there is still no official consumer price index in the country).

Figure 1 Online and official annual inflation rate in Argentina

Note: The online price index is computed by PriceStats. The CPI is computed by the National Statistics Institute of Argentina - INDEC (all-items, non-seasonally adjusted).

Argentina’s statistical debacle had a positive side effect: it showed the potential that online prices had for inflation measurement applications. With this idea in mind, we created the Billion Prices Project at MIT in 2008 to extend our work to other countries, including the US. By 2010, we were collecting 5 million prices every day from over 300 retailers in 50 countries. Half a million prices were collected every day in the United States alone. By comparison, the US Bureau of Labor Statistics collects approximately 80,000 prices on a monthly or bi-monthly basis.

Our approach is different from many other attempts to use Big Data in economics because we focus on measurement instead of prediction. But more data does not necessarily mean better information. For us, the web is simply a new data collection technique to get to the information we need. But we are still careful to apply the basic principles of traditional data-collection designs, such as the need to have a representative sample. That is why we focus, for example, on large multi-channel retailers that sell both online and offline, such as Walmart, instead of using online-only retailers that still have a relatively small share of retail transactions. It is also the reason why we mostly collect prices from categories of goods that are included in the official Consumer Price Index baskets, for which consumer expenditure weights are available.

In Cavallo and Rigobon (2016), we provide several examples of how online prices can be effectively used as an alternative source of price information to construct price indexes that mimic the behaviour of official Consumer Price Indexes. Figure 2 shows the case of a daily online price index for the US, which has closely tracked the official CPI for several years.

Figure 2 Online and official price indexes in the US

Note: The online price index is computed by PriceStats. The CPI is computed by the US Bureau of Labor Statistics (all-items, non-seasonally adjusted).
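As a rough illustration of how scraped prices and official expenditure weights could be combined into a daily index, the sketch below chains day-to-day changes. It assumes an unweighted geometric mean of price relatives within each category (a Jevons-type formula) and a weighted arithmetic average across categories. These formula choices, the function names, and all numbers are assumptions made for the example and should not be read as the PriceStats methodology.

```python
import math


def category_relative(prev, curr):
    """Geometric mean of price relatives for products observed on both days."""
    common = [k for k in prev if k in curr]
    if not common:
        return 1.0  # assumption: no overlapping products means no measured change
    log_sum = sum(math.log(curr[k] / prev[k]) for k in common)
    return math.exp(log_sum / len(common))


def chained_index(days, weights, base=100.0):
    """Chain daily changes into an index starting at `base`.

    days:    list with one {category: {product id: price}} dict per day
    weights: {category: expenditure weight}, assumed to sum to 1
    """
    index = [base]
    for prev_day, curr_day in zip(days, days[1:]):
        change = sum(
            w * category_relative(prev_day[cat], curr_day[cat])
            for cat, w in weights.items()
        )
        index.append(index[-1] * change)
    return index


# Made-up weights and prices for three consecutive days.
weights = {"food": 0.6, "electronics": 0.4}
days = [
    {"food": {"a": 2.0, "b": 4.0}, "electronics": {"x": 100.0}},
    {"food": {"a": 2.2, "b": 4.0, "c": 5.0}, "electronics": {"x": 100.0}},
    {"food": {"a": 2.2, "c": 5.0}, "electronics": {"x": 90.0, "y": 50.0}},
]
index = chained_index(days, weights)
```

Because only products observed on both days enter each daily change, goods that appear (`c`, `y`) or disappear (`b`) do not create artificial jumps. Their price changes start to count from the day after they are first observed.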

In the paper, we also emphasise two characteristics of online price indexes in greater detail. First, online indexes have the ability to approximate hedonically adjusted price indexes in sectors with a large number of goods that come and go with overlapping lifecycles (for instance, electronics). Second, online indexes appear able to anticipate movements in the official Consumer Price Index in many countries. This anticipation extends beyond the publication lags, which suggests that online prices often adjust sooner to aggregate shocks.

Online data can also change empirical results in macro and international research. In particular, online datasets constructed to fit specific research needs can help mitigate biases and other empirical challenges that are so frequent in traditional datasets collected for other purposes, such as sample selection, endogeneity, omitted variables, and errors-in-variables.

A natural concern is whether online prices differ from those collected in physical stores, where the vast majority of transactions still take place. To answer this question, we recently conducted a large-scale comparison of online and offline prices of more than 50 retailers in 10 countries. Cavallo (2016) shows that price levels are identical in over 70% of cases, and while price changes are not synchronised, their frequency and average size are very similar across samples. However, there is also significant heterogeneity among retailers, sectors, and countries, which cautions against drawing strong conclusions from papers that rely on data from particular countries, sectors, or retailers.

One area in macroeconomics where online price data can significantly alter previous results is the price-stickiness literature, as shown in Cavallo (2015). For example, online prices exhibit a very different distribution of the size of price changes (a key statistic in this literature) compared to the one that can be obtained from micro CPI or scanner data. The main reason is that online prices do not have time averages, common in scanner data, or imputed prices, common in official micro data, which create a large number of small spurious price changes, as shown in Figure 3.

Figure 3 The distribution of the size of price changes with different data sources
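The time-averaging mechanism can be illustrated with a small numerical sketch. It assumes a single product whose posted price rises once, by 10%, in the middle of a week, and compares the changes seen in the daily series with those seen after averaging within weeks. The price path and function names are invented for the example. Imputed prices, the other source of spurious changes, are not modelled here.

```python
def weekly_average(daily_prices, days_per_week=7):
    """Average daily prices within consecutive weeks."""
    weeks = [
        daily_prices[i:i + days_per_week]
        for i in range(0, len(daily_prices), days_per_week)
    ]
    return [sum(week) / len(week) for week in weeks]


def nonzero_changes(prices):
    """Percentage changes between consecutive observations, skipping zeros."""
    changes = []
    for prev, curr in zip(prices, prices[1:]):
        pct = 100.0 * (curr - prev) / prev
        if abs(pct) > 1e-9:
            changes.append(round(pct, 2))
    return changes


# Three weeks of daily prices with one 10% increase on day 4 of week 2.
daily = [10.0] * 7 + [10.0] * 3 + [11.0] * 4 + [11.0] * 7

daily_changes = nonzero_changes(daily)                   # [10.0]
weekly_changes = nonzero_changes(weekly_average(daily))  # [5.71, 4.05]
```

The daily series records one change of 10%, while the weekly averages record two smaller changes of roughly 5.7% and 4.1%. Averaging therefore both raises the apparent frequency of price changes and shifts the size distribution towards small values.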

Online prices can also be used to study international relative prices and their relation to exchange rates. For example, tests of the “law of one price” (that there should not be large or persistent cross-country differences in the prices of identical goods when translated into a common currency) using online data give us a more nuanced picture of when and where this law works well. The existing consensus in the literature is that there are large and persistent deviations from the law of one price, with little pass-through from nominal exchange rates to relative prices, and vice-versa, causing persistent shocks to real exchange rates that take years to dissipate. While deviations can also be large with online data, in Cavallo et al. (2014, 2015) we showed that the law of one price holds well across countries that use the same currency. Furthermore, when goods are more closely matched across countries, there is striking evidence that relative prices and nominal exchange rates co-move more closely than previously thought, implying higher pass-through rates and less persistent real exchange rate dynamics.
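A law-of-one-price test for a single matched good reduces to comparing its prices in a common currency. The sketch below computes the log deviation, often called a good-level real exchange rate, under the assumption that the exchange rate is quoted as units of country B's currency per unit of country A's currency. The function name and the prices are hypothetical and are not taken from the papers cited above.

```python
import math


def lop_deviation(price_a, price_b, exchange_rate):
    """Log deviation from the law of one price for one matched good.

    price_a:       price in country A's currency
    price_b:       price in country B's currency
    exchange_rate: units of B's currency per unit of A's currency
    A value of zero means the law of one price holds exactly.
    """
    return math.log(price_a * exchange_rate / price_b)


# Same currency in both countries (exchange rate fixed at 1) and identical prices.
same_currency = lop_deviation(9.99, 9.99, 1.0)

# Different currencies: the good is cheaper in A once converted into B's currency.
different_currency = lop_deviation(10.0, 9.0, 0.8)
```

With the made-up numbers above, the first deviation is exactly zero and the second is negative (about -0.12 in logs). Tracking how such deviations respond when `exchange_rate` moves is one simple way to think about pass-through.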

Concluding remarks: Big Data is here to stay

The Billion Prices Project is just one example of the use of ‘Big Data’ sources in economics. Other examples include various types of web-scraped data, such as labour and real estate information, data from mobile phones, satellite images as in Henderson et al. (2012), and many other sensors that are increasingly part of our daily lives.

For us, the greatest appeal of Big Data technologies is that they are finally providing economists (particularly in macro and international) with opportunities to stop treating the data as “given” and get personally involved with the data collection process.

This is something that was advocated for many years by prominent economists such as Griliches (1985, 1994).

While many governments have been active in searching for alternative data sources (Bean 2016), their use will require not only the will of policymakers or statisticians working in the field, but also the involvement of more economists and academics who can help identify the best ways to collect, treat, and use these new sources of information.


Abraham, K, S Davis, and J Haltiwanger (2015), “Don’t Starve the BLS.” Roll Call. October 2015

Bean, C (2016), “Time to rethink the way we measure economic activity”, VoxEU.org

Cavallo, A and R Rigobon (2016), “The Billion Prices Project: Using Online Data for Measurement and Research.” Journal of Economic Perspectives, forthcoming

Cavallo, A (2013), “Online and Official Price Indexes: Measuring Argentina’s Inflation.” Journal of Monetary Economics, 152–65

Cavallo, A (2015), “Scraped Data and Sticky Prices.” NBER Working Paper 21490

Cavallo, A (2016), “Are Online and Offline Prices Similar? Evidence from Large Multi-Channel Retailers.” NBER Working Paper 22142

Cavallo, A, B Neiman and R Rigobon (2014), “Currency Unions, Product Introductions, and the Real Exchange Rate.” Quarterly Journal of Economics 129:2

Cavallo, A, B Neiman, and R Rigobon (2015), “The Price Impact of Joining a Currency Union: Evidence from Latvia.” IMF Economic Review 63 (2): 281–97

Cavallo, A, G Cruces and R Perez-Truglia (2016), “Learning from Potentially-Biased Statistics: Household Inflation Perceptions and Expectations in Argentina.” Brookings Papers on Economic Activity, forthcoming

Einav, L and J Levin (2014), “Economics in the Age of Big Data.” Science 346 (6210): 1243089

Euromonitor (2014), “Internet vs Store-Based Shopping: The Global Move Towards Omnichannel Retailing.” Euromonitor International

Griliches, Z (1985), “Data and Econometricians–the Uneasy Alliance.” The American Economic Review, 196–200

Griliches, Z (1994), “Productivity, R&D, and the Data Constraint.” The American Economic Review, 84 (1): 1–23

Groves, R M (2011), “Three Eras of Survey Research.” Public Opinion Quarterly, 75 (5): 861–71

Henderson, J V, A Storeygard and D N Weil (2012), “Measuring Economic Growth from Outer Space.” American Economic Review, 102 (2): 994–1028
