VoxEU Column Labour Markets

The role of data skills in the modern labour market

Data collection, processing and analysis skills are in high demand in today’s labour markets and are increasingly found in non-digital industries. This column applies natural language processing to online job advertisements to better understand digital skills in the UK, Canada and the US. Results show that data analytics skills contribute most to the aggregate data-related labour demand in all three countries. The information and communication and finance industries are the most data-intensive in all three countries, while larger differences in labour demand persist across countries for agriculture, mining and quarrying, and electricity, gas, steam and air conditioning supply.

The increasing use of data and advanced analytics across countries has driven demand for new types of jobs (Acemoglu and Restrepo 2017). Individuals who master the associated skills are in high demand in the labour market and typically earn a wage premium (Sostero and Tolan 2022). With rapid technological change, however, it is less clear which jobs are becoming more digital, and which occupations and industries hire people with these skills. Traditional labour market statistics offer only limited scope to explore these questions, as they often lack detailed information on skill requirements.

In recent years, online job advertisement data have gained popularity as an alternative data source relating to labour markets, as they provide timely and granular information. In many cases, they have advantages over existing occupation classifications and are often complementary to official employment or vacancy statistics (Atalay et al. 2022). Recent studies using online job advertisements showed that digital skillsets have evolved over the past decade and can be found at the core of some traditionally non-digital domains (Sostero and Tolan 2022). Online job advertisements have also contributed to an improved understanding of the impact of crises on labour markets, such as the COVID-19 recession in the US and Canada (Soh et al. 2022, Bellatin and Galassi 2022) and the war in Ukraine (Pham et al. 2023).

In a recent paper (Schmidt et al. 2023), we estimate the data intensity of occupations and sectors, i.e. the share of jobs involved in the production of data. The paper makes three contributions. First, it puts forward a novel methodology that applies natural language processing (NLP) to online job advertisements from Lightcast to generate occupation- and industry-level estimates of data intensity. Second, the methodology can be used to produce cross-country comparable estimates that support measuring the value of data assets in the data economy and tracking the evolution of digital skills in the labour market. Third, the NLP algorithm is flexible and can be applied to concepts that are difficult to capture in traditional labour market statistics, such as green and AI-related jobs. The algorithm can also be adapted to more than 66 languages, meaning the scope of the analysis could be broadened.

The NLP techniques enable the extraction of relevant skills and tasks from the raw text of the online job advertisements. Thanks to the high granularity of the data, the algorithm can classify the extracted information into data entry, database, and data analytics-related activities. Finally, the methodology computes occupation- and industry-level aggregates of the share of jobs involved in data production activities. The methodology relies on an open-source NLP pipeline provided by the spaCy Python library (spaCy 2022), which allows for efficient treatment of large amounts of text data and flexible deployment of NLP models.
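
The classification and aggregation steps can be illustrated with a deliberately simplified sketch. It uses plain keyword matching from the Python standard library as a stand-in for the spaCy pipeline used in the paper, the keyword lists and example advertisements are hypothetical, and it assumes that an advertisement counts as data-related if it mentions at least one keyword. The paper's actual extraction and classification rules may differ.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists for the three activity types named in the text.
# The column does not publish its vocabulary; these are illustrative only.
CATEGORY_KEYWORDS = {
    "data_entry": ["data entry", "data capture"],
    "database": ["database", "sql"],
    "data_analytics": ["data analysis", "data analytics", "machine learning"],
}

def compile_patterns(keywords_by_category):
    """Build one case-insensitive, whole-word regex per category."""
    return {
        category: re.compile(
            r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
            re.IGNORECASE,
        )
        for category, keywords in keywords_by_category.items()
    }

PATTERNS = compile_patterns(CATEGORY_KEYWORDS)

def classify_ad(text):
    """Return the set of data-related categories mentioned in one ad."""
    return {cat for cat, pattern in PATTERNS.items() if pattern.search(text)}

def occupation_data_intensity(ads):
    """Share of ads (in per cent) per occupation that mention any category.

    `ads` is an iterable of (occupation, ad_text) pairs.
    Assumption: one keyword hit is enough to flag an ad as data-related.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for occupation, text in ads:
        totals[occupation] += 1
        if classify_ad(text):
            flagged[occupation] += 1
    return {occ: 100.0 * flagged[occ] / totals[occ] for occ in totals}

# Invented advertisements, for illustration only.
ADS = [
    ("data scientist", "Build machine learning models and run data analysis."),
    ("data scientist", "Maintain SQL database and dashboards."),
    ("office assistant", "Answer phones; occasional data entry."),
    ("office assistant", "Greet visitors and manage the diary."),
]

INTENSITY = occupation_data_intensity(ADS)
```

Replacing `classify_ad` with a model-based extractor would leave the aggregation step unchanged, which is the main reason for keeping the two functions separate.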

Results illustrate that the share of data analytics skills is higher than that of other data-related skills in all three countries. At the sectoral level, the emerging picture is more heterogeneous across countries. While the information and communication industry as well as the finance industry are highly data-intensive, larger differences in data intensity exist for agriculture, mining and quarrying, and electricity, gas, steam and air conditioning supply. Differences in labour demand mostly explain these variations, with low data-intensity professions contributing most to economy-wide data intensity. Preliminary estimates show that the results remain stable for pre-COVID years, too.

The most data-intensive occupations are linked to data analytics skills

In the UK, the occupation with the highest level of data intensity is data scientist, with a rate of 92.3% in 2020, followed by data engineer at 69% and data entry clerk at 68% (Figure 1). Most of the highly data-intensive occupations revolve primarily around data analytics skills, with a few exceptions. For instance, data entry clerks and database administrators exhibit data intensities largely linked to data-entry and database-related capabilities. In general, the highly data-intensive occupations tend to be specialised, technology-oriented professions, with occupations such as biostatistician and clinical data manager showing connections to fields such as biology and medicine. Similar trends are observed in Canada and the US.

Figure 1 Data intensity at occupation level in the UK is linked to data analytics skills

Per cent of labour demand, 2020

Note: The Lightcast data provide occupation classifications. Data intensity takes values between 0 and 100.
Source: Authors’ calculations based on Lightcast data.

Differences in data intensity across the countries are concentrated in a handful of sectors

Financial and insurance activities and information and communication are the two most data-intensive industries in all three countries, with shares close to or above 10% in 2020 (Figure 2). Shares are also similar across countries in most sectors with low data intensity, in particular accommodation and food service activities, construction, and transportation and storage. This is consistent with Calvino et al. (2018), who use a different methodology.

Figure 2 Data intensity in the UK, Canada, and the US by industry

Per cent of labour demand, 2020

Note: Sectors are based on the ISIC version 4 classification. Activities of extraterrestrial organisations and activities of households are excluded. Data intensity takes values between 0 and 100.
Source: Authors’ calculations based on Lightcast data.

However, these numbers can mask structural differences across countries. For instance, in the finance and insurance sector, the UK’s share is almost on par with those of the US and Canada, and data mining analysts make the largest contribution to the data intensity of the sector in all three countries. In the UK, high demand for data mining analysts in the sector more than compensates for the profession’s lower average data intensity (30% in 2020, compared to 70% in the US and Canada). Overall, the contribution of the profession to the data intensity of the sector is at least twice as high in the UK (0.8 percentage points) as in Canada (0.3 percentage points) or the US (0.4 percentage points).
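
The arithmetic behind this comparison can be sketched as follows. The sketch assumes that an occupation's contribution to a sector's data intensity equals its share of the sector's job advertisements multiplied by its own data intensity, in line with the weighting described in the figure notes. The column does not state the formula for this case explicitly. The function name is hypothetical, and the implied shares are back-of-the-envelope values derived from the rounded figures in the text, not published estimates.

```python
def implied_ad_share(contribution_pp, occupation_intensity_pct):
    """Share of a sector's job ads (in per cent) implied by an occupation's
    contribution (percentage points) and its data intensity (per cent).

    Assumption: contribution = ad share x occupation data intensity.
    """
    if occupation_intensity_pct <= 0:
        raise ValueError("occupation intensity must be positive")
    return 100.0 * contribution_pp / occupation_intensity_pct

# Rounded 2020 figures for data mining analysts in finance and insurance,
# as quoted in the text: (contribution in pp, occupation data intensity in %).
QUOTED = {
    "UK": (0.8, 30.0),
    "Canada": (0.3, 70.0),
    "US": (0.4, 70.0),
}

IMPLIED_SHARES = {
    country: implied_ad_share(pp, intensity)
    for country, (pp, intensity) in QUOTED.items()
}

for country, share in IMPLIED_SHARES.items():
    print(f"{country}: about {share:.1f}% of sector job ads")
```

Under this assumption, data mining analysts would account for a several times larger share of finance-sector advertisements in the UK than in Canada or the US, which is how a lower occupation-level intensity can still yield a larger contribution.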

Cross-country differences in data intensity are noticeable in professional, scientific and technical activities, where data intensity is much higher in the US, and to a lesser extent Canada, than in the UK. Similarly, data intensity in agriculture and forestry and in electricity, gas, steam and air conditioning supply differs across countries, with labour demand being more data-intensive in the UK than in Canada or the US. In a few sectors, such as mining and quarrying, arts, entertainment and recreation, and public administration and defence, the US and Canada exhibit similar data intensity rates, which are much lower than in the UK.

Professions with a low level of data intensity contribute most to the aggregate data intensity in the UK

At the economy-wide level, the UK and Canada appear to be less data-intensive than the US (Figure 3). The overall share of data-intensive jobs in the UK was estimated to be 3.4% in 2020, weighting the data intensity at occupation level by the number of job advertisements posted. This compares to 3.9% in Canada and 4.6% in the US. 
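
The weighting step described here amounts to an average of occupation-level data intensities, weighted by the number of advertisements posted. A minimal sketch, with a hypothetical function name and made-up occupation figures used purely for illustration:

```python
def aggregate_data_intensity(occupations):
    """Ad-weighted mean of occupation-level data intensities (per cent).

    `occupations` is a list of (intensity_pct, n_ads) pairs.
    """
    total_ads = sum(n_ads for _, n_ads in occupations)
    if total_ads == 0:
        raise ValueError("no job advertisements to aggregate")
    weighted_sum = sum(intensity * n_ads for intensity, n_ads in occupations)
    return weighted_sum / total_ads

# Invented example: a small, highly data-intensive occupation and a large,
# low data-intensity one. These are not figures from the paper.
EXAMPLE = [
    (90.0, 10),   # 10 ads at 90% data intensity
    (2.0, 990),   # 990 ads at 2% data intensity
]

AGGREGATE = aggregate_data_intensity(EXAMPLE)
```

In this invented example the large low-intensity occupation contributes more to the aggregate than the small high-intensity one, simply because it accounts for far more advertisements.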

Figure 3 Low data-intensity occupations contribute most to data intensity in the UK

Aggregate data intensity, per cent, 2020

Notes: Data intensity takes values between 0 and 100. Low data-intensive occupations: below 10%, medium data-intensive occupations: 10-50%, and high data-intensive occupations: above 50%.
Source: Authors’ calculations based on Lightcast data.

In the UK, low data-intensity occupations contribute most to the overall data economy, more than medium and high data-intensity occupations (Figure 4, Panel A). For instance, office assistants, which represent 6% of the labour demand in the low data-intensity occupation class, contribute as much to the overall data intensity of the economy as data scientists.

Figure 4 Data intensity across occupations

Data intensity of an occupation in per cent, contribution to aggregate data intensity in percentage points, 2020

A) United Kingdom

B) Canada

C) United States

Notes: Contributions are computed as the data intensity of occupation classes weighted by their share of employees. Data intensity takes values between 0 and 100. Low data-intensive occupations: below 10%, medium data-intensive occupations: 10-50%, and high data-intensive occupations: above 50%. Contribution to aggregate data intensity is displayed in percentage points.
Source: Authors’ calculations based on Lightcast data.

In Canada and the US, medium data-intensity occupation classes contribute the largest proportion to the overall data intensity, and the level of data intensity across professions is generally higher (Figure 4, Panels B and C). A data scientist has a data intensity score of 94.5% in Canada and 95.1% in the US, compared to 92.3% in the UK. Among the high data-intensity professions in Canada, data entry clerks, database administrators, and data mining analysts contribute most to aggregate data intensity, next to professions such as business management analyst and software developer at the medium level. The US has the widest range of professions contributing at the medium and high data-intensity levels, among them network system analysts and computer system engineers.
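
The decomposition shown in Figures 3 and 4 can be sketched as follows, using the class thresholds from the figure notes. The sketch assumes that each occupation's weight is its share of job advertisements. The function names and the figures attached to the three example occupations are invented for illustration and are not results from the paper.

```python
def intensity_class(intensity_pct):
    """Assign an occupation to a class using the thresholds in the figure
    notes: low below 10%, medium 10-50%, high above 50%."""
    if intensity_pct < 10:
        return "low"
    if intensity_pct <= 50:
        return "medium"
    return "high"

def contributions_by_class(occupations):
    """Contribution of each class to aggregate data intensity, in
    percentage points.

    `occupations` is a list of (name, intensity_pct, n_ads) tuples.
    Assumption: an occupation's contribution is its data intensity
    multiplied by its share of all job advertisements.
    """
    total_ads = sum(n_ads for _, _, n_ads in occupations)
    if total_ads == 0:
        raise ValueError("no job advertisements to aggregate")
    result = {"low": 0.0, "medium": 0.0, "high": 0.0}
    for _, intensity, n_ads in occupations:
        result[intensity_class(intensity)] += intensity * n_ads / total_ads
    return result

# Invented figures, chosen so that the low class dominates.
EXAMPLE = [
    ("office assistant", 4.0, 900),
    ("business management analyst", 20.0, 80),
    ("data scientist", 90.0, 20),
]

CONTRIBUTIONS = contributions_by_class(EXAMPLE)
AGGREGATE = sum(CONTRIBUTIONS.values())
```

The class contributions sum to the aggregate data intensity, so a pattern like the UK's can arise whenever low-intensity occupations account for a large enough share of advertisements.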


References

Acemoglu, D and P Restrepo (2017), “Robots and Jobs: Evidence from US Labor Markets”, NBER Working Paper No. 23285.

Atalay, E, S Sotelo and D Tannenbaum (2022), “The geography of job tasks”, VoxEU.org, 12 November.  

Bellatin, A and G Galassi (2022), “What COVID-19 May Leave Behind: Technology-Related Job Postings in Canada”, Bank of Canada Staff Working Paper 2022/17.

Calvino, F, C Criscuolo, L Marcolin and M Squicciarini (2018), “A taxonomy of digital intensive sectors”, OECD Science, Technology and Industry Working Paper 2018/14.

Pham, T, O Talavera and Z Wu (2023), “Labour markets during wartime: Evidence from online job advertisements”, VoxEU.org, 22 July.

Schmidt, J, G Pilgrim and A Mourougane (2023), “What is the role of data in jobs in the United Kingdom, Canada, and the United States? A natural language processing approach”, OECD Statistics Working Paper 2023/05.

Soh, J, M Oikonomou, C Pizzinelli, I Shibata and M Mendes Tavares (2022), “Did the COVID-19 Recession Increase the Demand for Digital Occupations in the United States? Evidence from Employment and Vacancies Data”, IMF Working Paper 2022/195.

Sostero, M and S Tolan (2022), “Digital skills for all? From computer literacy to AI skills in online job advertisements”, JRC Working Papers Series on Labour Education and Technology.

spaCy (2022), “Language Processing Pipelines”.