Data have a special characteristic. With most goods (think of a barrel of oil or an hour of an accountant’s time), one person’s consumption of the good means there is less to go around. But this is not true of data. By their nature, data are nonrival, or infinitely usable. The same data can be used simultaneously by engineers in different businesses and scientists in various universities. Similarly, your reading this article doesn’t lessen the amount of it available to others. Any number of people can use data simultaneously without them becoming scarce.
Recent advances in machine learning have sharply raised the value of data. Google and Facebook use data to target their ads; Tesla and Waymo use data to train algorithms to drive vehicles autonomously; and Netflix and Spotify use data to recommend movies and songs. Agrawal et al. (2018) provide an overview of the economics of machine learning and its role as a “prediction machine.” Chung and Veldkamp (2019) provide an overview of the growing literature on the implications of data for the aggregate economy, highlighting links between macro and information economics.
In Jones and Tonetti (2020), we study the economic implications of the nonrival nature of data. We consider data that are created when consumers interact with a firm. In the US, firms typically own such data by default. Our analysis suggests, however, that there may be efficiency gains from having consumers own their personal data instead. Consumer ownership would achieve two desirable goals. First, and obviously, consumers would respect their own privacy. But the second gain is perhaps more subtle: consumers would have incentives to sell their data to multiple organisations, taking advantage of infinite usability.
Why is broad use desirable? Consider the example of machine learning and medicine. Algorithms can be trained to read radiology scans from MRIs, CT machines, and ultrasound exams. The more scans and reports are used to train an algorithm, the more accurate its diagnoses. An algorithm trained only on patients from a single hospital will be inferior to an algorithm trained using scans from every patient in the country. When data are used broadly, the algorithms improve: we all get better medical care, safer cars, and more accurate speech recognition. The world’s most talented radiologist cannot work in every hospital simultaneously, but every hospital can potentially employ the best data.
When firms own data, they may not want to sell them to their competitors. For a company like Amazon, Facebook, or Google, its users’ data give it an important competitive advantage. But if consumers own data, they could sell them to multiple firms, stimulating competition and innovation. There are many examples where firms acquire and hoard data. Tesla collects data as its cars are driven around and uses those data to develop self-driving car technology. If Tesla’s customers owned the data, they could sell them to Tesla, but also to competing firms. Every firm’s AI would improve and barriers to entry to developing self-driving cars would fall. Safer cars would result in the short term, and the move to autonomous vehicles would accelerate. Consumers would, of course, pay a higher price for their Teslas, but this higher price could be offset by the potential fees consumers would collect by selling their data to multiple firms. Right now, we pay to use Gmail and Tesla in part through exclusive rights to the data we provide. But that exclusivity has a social cost in that it inhibits the competition that stimulates innovation and economic growth.
Developing the Analysis
In Jones and Tonetti (2020), we develop these insights using a standard model in which firms produce differentiated goods using labour and data as fundamental inputs. Nonrivalry shows up in two key ways. First, there are increasing returns to labour and ideas in production, reflecting the nonrivalry of ideas emphasised by Romer (1990): the same trained machine learning algorithm can be used in one hospital or a thousand hospitals. Second, the nonrivalry of data enters the production of ideas. The ‘data’ input potentially includes data from many different firms in the economy: it is feasible for the same data to be used simultaneously by one firm or a thousand firms to train their algorithms. The model features monopolistic competition and the free entry of new varieties, subject to an entry cost.
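The increasing-returns logic can be illustrated with a stylised production function. The functional form and the symbols $Y_i$ (output of firm $i$), $D_i$ (data used), $L_i$ (labour), $\eta$, $N$ (number of firms) and $d$ (data generated per firm) are assumptions chosen here for illustration; this is a sketch and not necessarily the specification used in the paper.

```latex
% Requires amsmath. All notation is illustrative, not taken from the paper.
\begin{align*}
% Assumed technology: data raise the productivity of labour
Y_i &= D_i^{\eta} L_i, \qquad \eta > 0, \\
% Doubling both inputs more than doubles output (increasing returns)
(2 D_i)^{\eta} (2 L_i) &= 2^{1+\eta}\, Y_i > 2\, Y_i, \\
% Data available to firm i without sharing, and with full sharing across N firms
D_i^{\text{own}} &= d, \qquad D_i^{\text{shared}} = N d, \\
% Output gain at each firm from sharing, holding labour fixed
\frac{Y_i^{\text{shared}}}{Y_i^{\text{own}}} &= \frac{(N d)^{\eta} L_i}{d^{\eta} L_i} = N^{\eta} > 1
  \quad \text{for } N > 1.
\end{align*}
```

Because data are nonrival, every firm can use all $N d$ units at once, so in this sketch sharing raises output at each firm without reducing the data available to any other.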
In this environment, we study several allocations. We highlight the key distinction between an allocation in which firms own data versus one in which consumers own data. We also study the optimal allocation in our environment as well as an allocation in which the government, out of a concern for privacy, outlaws the selling of data across firms.
The planner wants to use the nonrival factor of production broadly but, to respect consumers’ preference for privacy, refrains from letting all firms access all data. In the decentralised equilibrium we study, there is a market for data. The owner of data can sell them to a data intermediary, which aggregates all the data it purchases and sells them to other firms. As is often the case with nonrival goods, there does not exist an equilibrium with positive and finite sale of data if markets are perfectly competitive. We therefore model the data intermediary as a monopolist subject to free entry at zero cost (contestable markets), so that the intermediary makes zero profits. This minimises the role of the intermediary in driving the decentralised equilibrium away from the planner’s allocation. We also include model assumptions such that original ownership of data matters, breaking the Coase theorem. Firms cannot commit to not using or selling the data they collect on consumers when they own the data, and consumers cannot commit to selling their data to only one firm when they own the data. Finally, the key ingredient that generates misallocation is that firms internalise that their probability of exit (creative destruction) is higher when data about their variety are available to their competitors.
With this market structure, we compare allocations under different property rights. When firms own data, they choose to use internally all the data they collect from their consumers in order to produce. This is generically more than the planner or the consumer would desire, since the firm places zero weight on consumer privacy considerations. Furthermore, firms sell some of their data to other firms through the data intermediary. They choose to sell just enough so that the marginal benefit of profits from data sales equals the marginal cost from the increased chance of exit. When a firm exits, the knowledge to produce that variety does not disappear; a new firm simply replaces the old one. Thus, the planner does not care about firm exit in the way that the firm’s shareholders do. This causes a wedge between the optimal amount of data sales and the equilibrium amount of data firms choose to sell.
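The firm’s trade-off described above can be written as a first-order condition. This is a stylised static sketch: the symbols $\tilde{x}$ (share of data sold), $p$ (price per unit of data), $d$ (data collected), $\delta(\tilde{x})$ (exit probability) and $V$ (value of the firm to its shareholders) are hypothetical notation introduced here, not taken from the paper.

```latex
% Requires amsmath. Hypothetical notation for illustration only.
\begin{align*}
% The firm chooses the share of data sold, trading sales revenue against exit risk
\max_{\tilde{x} \in [0,1]} \;\; & p\,\tilde{x}\,d \;-\; \delta(\tilde{x})\,V,
  \qquad \delta'(\tilde{x}) > 0, \;\; \delta''(\tilde{x}) > 0, \\
% Interior optimum: marginal revenue from data sales equals marginal expected loss from exit
& p\,d \;=\; \delta'\!\left(\tilde{x}^{\text{firm}}\right) V.
\end{align*}
```

In this sketch the term $\delta(\tilde{x})\,V$ is a private cost only. A variety survives the exit of its original producer, so the planner does not count that term as a social loss, and this is the source of the wedge between the planner’s preferred data sales and the firm’s choice.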
When consumers own data, they sell some of their Tesla data not only to Tesla, but also to Waymo. They sell less of their data than the planner would like, however, because of two classic inefficiencies. First, monopolistic competition generates distortions in firm size. Second, there is a classic appropriability problem, in which entrants that create new varieties privately capture only a fraction of the social value of the invention. This leads to a distortion in the number of varieties. Together, these two distortions mean that data are, in a sense, mispriced, so that data are used less broadly than is optimal.
The key result in the paper is that, under a very wide range of parameter values, consumers owning data produces better outcomes than firms owning data. Furthermore, the cases in which firm ownership outperforms consumer ownership arise from particular parameter configurations rather than from anything more deeply systematic in the analysis. Even in these cases the welfare gains are quite small, whereas the welfare gains from consumers owning data compared to firms owning data can be quite large. Indeed, in our numerical simulations consumers owning data is often very close to optimal.
Laws are currently being written to deal with data privacy rights. In the US, the California Consumer Privacy Act is now in force and there are federal proposals like COPRA, ACCESS, and the CDPA. Europe has the GDPR. India is debating laws that range from those similar to GDPR to those with a stronger role for national government intervention. China is no stranger to heavy government intervention in digital spaces. We hope our research is informative about some of the fundamental economics of nonrival data and how a market-based approach to regulation may improve social welfare.
Agrawal, A, J Gans and A Goldfarb (2018), Prediction Machines: The Simple Economics of Artificial Intelligence, Harvard Business Press.
Chung, C and L Veldkamp (2019), “Data and the Aggregate Economy”, in preparation for the Journal of Economic Literature.
Jones, C I and C Tonetti (2020), “Nonrivalry and the Economics of Data”, forthcoming in the American Economic Review.
Romer, P M (1990), “Endogenous Technological Change”, Journal of Political Economy 98(5) Part 2: S71-S102.