DP18125 Text Algorithms in Economics

This paper provides an overview of the methods used for algorithmic text analysis in economics, with a focus on three key contributions. First, the paper introduces methods for representing documents as high-dimensional count vectors over vocabulary terms, for representing words as vectors, and for representing word sequences as embedding vectors. Second, the paper defines four core empirical tasks that encompass most text-as-data research in economics, and enumerates the various approaches that have been taken so far for these tasks. Finally, the paper flags limitations in the current literature, with a focus on the challenge of validating algorithmic output.
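As a rough illustration of the first representation mentioned above, the sketch below builds document count vectors over a shared vocabulary. It is not taken from the paper: the toy corpus, the whitespace tokenizer, and the helper names `build_vocabulary` and `count_vector` are all assumptions made for this example.

```python
from collections import Counter


def build_vocabulary(docs):
    """Return the sorted list of unique lowercase tokens across all documents."""
    return sorted({token for doc in docs for token in doc.lower().split()})


def count_vector(doc, vocab):
    """Map one document to a vector of term counts, one entry per vocabulary term."""
    counts = Counter(doc.lower().split())
    # Counter returns 0 for terms that do not occur in the document.
    return [counts[term] for term in vocab]


# Hypothetical toy corpus; real applications involve far larger vocabularies.
docs = [
    "inflation expectations rise",
    "inflation falls as rates rise",
    "rates rise",
]

vocab = build_vocabulary(docs)
matrix = [count_vector(doc, vocab) for doc in docs]

for doc, row in zip(docs, matrix):
    print(row, doc)
```

Each row has one entry per vocabulary term, so the dimension of the vectors grows with the size of the vocabulary, which is why such representations are described as high-dimensional.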


Ash, E and S Hansen (2023), ‘DP18125 Text Algorithms in Economics’, CEPR Discussion Paper No. 18125. CEPR Press, Paris & London. https://cepr.org/publications/dp18125