Discussion Paper Details

Please find the details for DP15852 in an easy to copy and paste format below:

Full Details

Title: A Cross-verified Database of Notable People, 3500BC-2018AD

Author(s): Palaash Bhargava, Jean-Benoît Eyméoud, Olivier Gergaud, Morgane Laouenan, Guillaume Plique and Etienne Wasmer

Publication Date: March 2021

Keyword(s): Creative Class, Economic History, Notable Individuals and Urban Economics

Programme Area(s): Economic History, International Trade and Regional Economics and Labour Economics

Abstract: We add to the literature on notable individuals (famous, prominent, distinguished) in collecting first a massive amount of data from various editions of Wikipedia and Wikidata along with deduplication techniques; and then using these partially overlapping sources to cross-verify each retrieved information. This strategy results in a cross-verified database of 2.2 million individuals, including a third who are not present in the English edition of Wikipedia. An extension to 4.7 million entries is currently not recommended given the inaccuracy of the information and discrepancies between Wikidata and other sources. A non-negligible fraction of newly-added individuals were collected from non-English editions of Wikipedia. We adopt a social science approach: data collection is driven by specific social questions on gender, economic and cultural development and quantitative exploration of cultural trends, that we document in this paper. A sample of 100,000 individuals is available here, together with the most recent version of this paper.

Bibliographic Reference

Bhargava, P, Eyméoud, J, Gergaud, O, Laouenan, M, Plique, G and Wasmer, E. 2021. 'A Cross-verified Database of Notable People, 3500BC-2018AD'. London, Centre for Economic Policy Research.