Discussion paper

DP3094 The NBER Patent Citations Data File: Lessons, Insights and Methodological Tools

This paper describes the database on US patents that we have developed over the past decade, with the goal of making it widely accessible for research. We present main trends in US patenting over the last 30 years, including a variety of original measures constructed with citation data, such as backward and forward citation lags, indices of "originality" and "generality", self-citations, etc. Many of these measures exhibit interesting differences across the six main technological categories that we have developed (comprising Computers and Communications, Drugs and Medical, Electrical and Electronics, Chemical, Mechanical and Others), differences that call for further research. To stimulate such research, the entire database (about 3 million patents and 16 million citations) is now available on the NBER website. We discuss key issues that arise in the use of patent citations data, and suggest ways of addressing them. In particular, significant changes over time in the rate of patenting and in the number of citations made, as well as the inevitable truncation of the data, make it very hard to use the raw number of citations received by different patents directly in a meaningful way. To remedy this problem we suggest two alternative approaches: the fixed-effects approach involves scaling citations by the average citation count for a group of patents to which the patent of interest belongs; the quasi-structural approach attempts to distinguish the multiple effects on citation rates via econometric estimation.
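The fixed-effects scaling described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the reference group is defined by grant year and technological category; the abstract only says "a group of patents to which the patent of interest belongs". The function name `scale_citations`, the record layout, and the toy numbers are hypothetical and are not taken from the NBER data file.

```python
from collections import defaultdict


def scale_citations(patents):
    """Divide each patent's citation count by the mean count of its group.

    `patents` is a list of dicts with keys 'id', 'year', 'category' and
    'citations'. Grouping by (year, category) is an assumption made for
    this sketch, not a definition taken from the paper.
    """
    # Accumulate [sum of citations, number of patents] per group.
    totals = defaultdict(lambda: [0, 0])
    for p in patents:
        key = (p["year"], p["category"])
        totals[key][0] += p["citations"]
        totals[key][1] += 1

    scaled = {}
    for p in patents:
        total, count = totals[(p["year"], p["category"])]
        group_mean = total / count
        # A group with no citations at all has no meaningful scale.
        scaled[p["id"]] = p["citations"] / group_mean if group_mean > 0 else None
    return scaled


# Hypothetical toy data: older patents have had more time to be cited.
toy = [
    {"id": "A", "year": 1980, "category": "Chemical", "citations": 10},
    {"id": "B", "year": 1980, "category": "Chemical", "citations": 30},
    {"id": "C", "year": 1995, "category": "Chemical", "citations": 2},
    {"id": "D", "year": 1995, "category": "Chemical", "citations": 6},
]
scaled = scale_citations(toy)
```

In this toy data, patents A and C have very different raw counts (10 versus 2), yet each receives half the average of its own group, so their scaled values coincide. This is the sense in which scaling removes group-level differences such as truncation and changes in citation practice; it also removes any real differences between groups, which is a limitation of this kind of adjustment.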


Hall, B, M Trajtenberg and A Jaffe (2001), ‘DP3094 The NBER Patent Citations Data File: Lessons, Insights and Methodological Tools’, CEPR Discussion Paper No. 3094. CEPR Press, Paris & London. https://cepr.org/publications/dp3094