DP10829 How to construct nationally representative firm level data from the ORBIS global database
Author(s): Sebnem Kalemli-Ozcan, Bent E Sørensen, Carolina Villegas-Sanchez, Vadym Volosovych, Sevcan Yesiltas
Publication Date: September 2015
Keyword(s): Europe, FDI, finance, firms, productivity
JEL(s): D0, E0, F0, O0
Programme Areas: Financial Economics, International Trade and Regional Economics, International Macroeconomics and Finance, Macroeconomics and Growth
Link to this Page: cepr.org/active/publications/discussion_papers/dp.php?dpno=10829
Firm-level data on productivity, financial activity and firms' international linkages have become essential for research in the fields of macro, international finance and growth. This paper describes the development of a firm-level global panel dataset for public and private companies based on the administrative microdataset ORBIS, provided commercially by Bureau van Dijk Electronic Publishing (BvD). The ORBIS database provides data on firms' financial and productive activities from balance sheets and income statements, together with detailed information on firms' domestic and international ownership structure, for over 130 million companies across the world. Researchers need to overcome several challenges before the database is usable for research. First, the database is not designed for the large downloads that are essential for econometric analysis. Second, there are several inherent biases in the database that affect the download process and lead to missing information. Third, the raw data may contain a number of irregularities which, if not dealt with, will result in data loss during a standard cleaning procedure. In combination, these issues result in minimal coverage of small firms, extensive missing data, and poor national representativeness. We give detailed instructions on the data-gathering process from ORBIS, covering both the downloading methodology and the cleaning procedure, so that a researcher can construct a database that is nationally representative with minimal missing information. We provide examples from several European countries to illustrate the process and discuss the resulting dataset in detail.