VoxEU Column Competition Policy Productivity and Innovation

The economics of social data

The rise of large digital platforms — from Facebook, Google, and Amazon in the US to JD, Tencent, and Alibaba in China — has led to the unprecedented collection and commercial use of individual data. This column argues that a central, underappreciated feature of those data is their social aspect: data captured from an individual user describe not only that individual, but other users with similar characteristics or behaviours. The policy implications of this insight include the need for privacy regulations focused less on personalised prices, and more on group-based price discrimination.

Big tech companies such as Google, Amazon, and Facebook have amassed an unprecedented collection of individual data. Their constantly expanding user bases have allowed these digital platforms to acquire massive amounts of data about consumers’ preferences, locations, friends, political views, and almost every other facet of their lives. These data are now receiving considerable attention from consumers, media, and authorities alike (see HM Treasury 2019, Scott Morton 2019, Crémer et al. 2019). Germany’s top court ruled that Facebook violated competition laws by combining data collected about users across different Facebook platforms, such as Instagram and WhatsApp, as well as third-party apps. Google announced that a data auto-delete feature had been added to users’ default settings shortly after rival Apple released new privacy features to hide users’ data from third-party trackers.

The services on these digital platforms rely on individual-level data to provide refined search results, personalised product recommendations, and targeted advertisements. Consequently, most data regulation on Internet platforms focuses on ensuring consumers’ control over their individual data. Regulators hope that ownership and control over one’s own data will result in appropriate compensation for the data one chooses to reveal. However, economists need to consider the social aspect of data collection. Because an individual user’s data is predictive of the behaviour of others, individual data is in practice social data. The social nature of data leads to an externality: an individual’s purchase on Amazon, for example, will convey information about the likelihood of purchasing a certain product among other consumers with similar purchase histories.

In a recent paper (Bergemann et al. 2020), we dispel the idea that giving a consumer control of her data will prevent negative consequences simply because she can demand compensation from an intermediary. A consumer’s choice to release data takes into account only her private benefits and costs, and not the externality generated by the data she provides. Moreover, data externalities, in the form of diminishing marginal returns to each individual’s information, reduce the intermediary’s cost of acquiring information. While consumers can experience positive externalities, such as real-time traffic information, very little prevents the platform from trading data for profit in ways that harm consumers. Data ownership is therefore insufficient to bring about the efficient use of information, since arbitrarily small levels of compensation can induce a consumer to relinquish her personal data. What barriers or guarantees in terms of privacy does consumer control then provide?
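
The diminishing marginal returns to individual information can be illustrated with a toy Gaussian calculation. This is only a sketch under assumed primitives, not the model in Bergemann et al. (2020): demand has a common component with a normal prior, and each consumer supplies one independent noisy signal of it. The function names and the variance parameters are hypothetical.

```python
def posterior_variance(n_signals, var_common=1.0, var_noise=1.0):
    """Posterior variance of the common demand component after n_signals.

    Assumed setup: normal prior with variance var_common, and each signal
    equals the common component plus independent noise of variance var_noise.
    """
    return 1.0 / (1.0 / var_common + n_signals / var_noise)


def marginal_information(n, var_common=1.0, var_noise=1.0):
    """Reduction in posterior variance contributed by the n-th consumer."""
    before = posterior_variance(n - 1, var_common, var_noise)
    after = posterior_variance(n, var_common, var_noise)
    return before - after


if __name__ == "__main__":
    # The contribution of one more consumer shrinks as the user base grows.
    for n in (1, 2, 5, 10, 100):
        print(n, marginal_information(n))
```

In this sketch, the less a single consumer adds to what the intermediary already knows, the less the intermediary loses if she withholds her data, which is one way to read the claim that small payments suffice.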

Our paper analyses three critical aspects of the economics of social data. First, we consider how the collection of individual data changes the terms of trade among consumers, firms (advertisers), and large digital platforms. Second, we examine how the social dimension of data magnifies the value of individual data for platforms, and facilitates data acquisition. Third, we analyse how data intermediaries with market power (e.g. large Internet platforms that sell targeted advertising space) change the level of aggregation and the precision of the information that they provide in equilibrium about individual consumers.

To analyse these aspects, we propose a model of data intermediation in the presence of informational externalities. A data intermediary acquires signals from individual consumers regarding their preferences. This corresponds to the prevailing practice whereby Facebook and Amazon require an account to be established, and consumers and merchants to accept the “terms of use/service.” The intermediary resells the information in a product market in which firms and consumers can tailor their choices to the demand data. Information on demand allows the producer to engage in price discrimination at the individual or market level. The individual’s willingness to sell personal data then depends on the producer’s use of the acquired data.

This model generates insights on data flow, aggregation, and intermediation. First, a consumer’s decision to share data depends on her anticipation of how the intermediary will use the newly gained information. The intermediary implements data outflow policies that maximise the producer’s surplus given the acquired data.

Second, the collection of aggregate data does not prevent the occurrence of data externalities but does allow consumers to retain some form of privacy. This further reduces the intermediary’s cost of acquiring information, more than it reduces the value of the data for the producer downstream. If one consumer does not participate in the contract, the producer will optimally aggregate all available data to form the best predictor of the missing data point. Thus, aggregate data policies maximise intermediary profits by minimising loss of surplus. Finally, if the data externality is significant enough, the intermediary can acquire the consumers’ information at sufficiently low cost and generate positive profits even if information reduces total surplus.
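
The idea that pooled data from participating consumers yields the best predictor of a missing consumer can be sketched in a simple Gaussian setting. The setup is an assumption for illustration only: each consumer's willingness to pay is a common component plus an independent idiosyncratic term, and the function `predict_missing` and its parameters are hypothetical names, not taken from the paper.

```python
def predict_missing(others, var_common=1.0, var_idio=1.0, prior_mean=0.0):
    """Best predictor of a non-participating consumer's willingness to pay.

    Assumed setup: w_j = theta + e_j, with theta ~ N(prior_mean, var_common)
    and independent e_j ~ N(0, var_idio). Under these assumptions the
    conditional expectation depends on the others' data only through
    their sample mean, so aggregate data are enough.
    """
    n = len(others)
    if n == 0:
        return prior_mean  # nothing observed: fall back on the prior
    sample_mean = sum(others) / n
    # Shrinkage weight on the sample mean; it approaches 1 as n grows.
    weight = n * var_common / (n * var_common + var_idio)
    return prior_mean + weight * (sample_mean - prior_mean)


if __name__ == "__main__":
    print(predict_missing([1.0, 2.0, 3.0]))
```

Because only the sample mean enters the prediction, reporting the aggregate rather than individual records loses nothing for this particular forecasting task, which is the sense in which aggregation is cheap for the buyer of the data in this sketch.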

Third, with a large number of consumers in the market, we find that data aggregated to a coarse level of nearly homogeneous consumers is optimal, although further aggregation is profitable for the intermediary when the number of consumers is small. The resulting group pricing (which can be interpreted as discriminatory based on observable characteristics such as location) has welfare consequences between those of complete privacy and those of price personalisation.
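
The welfare ranking of uniform, group, and personalised pricing can be checked in a small numerical example. The sketch assumes a textbook linear-demand setting that is not necessarily the paper's exact specification: a consumer with willingness to pay w buys quantity w - p at unit price p, the producer has constant marginal cost, and every consumer is served. The function `surplus` and the numbers are hypothetical.

```python
def surplus(groups, cost=0.0):
    """Return (consumer_surplus, profit) when each group faces one price.

    Assumed setup: demand q = w - p for each consumer, so the
    profit-maximising price for a group is (mean willingness to pay + cost) / 2.
    The example values are chosen so that every quantity is positive.
    """
    consumer_surplus = 0.0
    profit = 0.0
    for group in groups:
        mean_w = sum(group) / len(group)
        price = (mean_w + cost) / 2.0
        for w in group:
            quantity = w - price
            consumer_surplus += quantity ** 2 / 2.0
            profit += (price - cost) * quantity
    return consumer_surplus, profit


values = [4.0, 6.0, 8.0, 10.0]
uniform = surplus([values])                    # complete privacy: one price
grouped = surplus([[4.0, 6.0], [8.0, 10.0]])   # group pricing
personal = surplus([[w] for w in values])      # personalised prices

if __name__ == "__main__":
    for name, result in (("uniform", uniform), ("group", grouped), ("personal", personal)):
        print(name, result, sum(result))
```

With these numbers, total surplus is 83.5 under uniform pricing, 81.5 under group pricing, and 81.0 under personalised pricing, so group pricing sits between the two extremes while profit rises as pricing becomes finer.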

In terms of policy implications, our results on the aggregation of consumer information suggest that privacy regulation must move away from concerns over personalised prices at the individual level. Most often, firms do not set prices in response to individual-level characteristics. Instead, segmentation of consumers occurs at the group level (e.g. Uber) or at the temporal and spatial levels (e.g. Staples, Amazon). Thus, our analysis points to the significant welfare effects of group-based price discrimination and of uniform prices that react in real time to changes in market-level demand.

References

Athey, S, C Catalini and C Tucker (2017), “The Digital Privacy Paradox: Small Money, Small Costs, Small Talk”, Discussion paper, National Bureau of Economic Research.

Bergemann, D, A Bonatti and T Gan (2020), “The Economics of Social Data”, Discussion paper, Cowles Foundation for Research in Economics.

Crémer, J, Y-A de Montjoye and H Schweitzer (2019), Competition policy for the digital era, Report to the Directorate General for Competition, European Commission.

HM Treasury (2019), Unlocking digital competition, Report of the Digital Competition Expert Panel.

Scott Morton, F (Chair) (2019), Report of the Committee for the Study of Digital Platforms, Stigler Center for the Study of the Economy and the State.
