Discussion paper

DP5833 The 'Names Game': Harnessing Inventors Patent Data for Economic Research

The goal of this paper is to lay out a methodology and corresponding computer algorithms that allow us to extract the detailed data on inventors contained in patents and harness it for economic research. Patent data has long been used in empirical research in economics, and yet the information on the identity (i.e. the names and location) of the patents' inventors has seldom been deployed on a large scale, primarily because of the 'who is who' problem: the name of a given inventor may be spelled differently across her/his patents, and the exact same name may correspond to different inventors (i.e. the 'John Smith' problem). Given that there are over 2 million patents with 2 inventors per patent on average, the 'who is who' problem applies to over 4 million 'records', which is far too many to tackle manually. We have thus developed an elaborate methodology and computerized procedure to address this problem in a comprehensive way. The end result is a list of 1.6 million unique inventors from all over the world, with detailed data on their patenting histories, their employers, co-inventors, etc. Forty percent of them have more than one patent, and 70,000 have more than 10 patents. We can trace these multi-patent inventors across time and space, and thus study the causes and consequences of their mobility across countries, regions, and employers. Given the increasing availability of large computerized data sets on individuals, there may be plenty of opportunities to deploy this methodology in other areas of economic research as well.


Trajtenberg, M, G Shiff and R Melamed (2006), ‘DP5833 The 'Names Game': Harnessing Inventors Patent Data for Economic Research’, CEPR Discussion Paper No. 5833. CEPR Press, Paris & London. https://cepr.org/publications/dp5833