VoxEU Column Frontiers of economic research International trade

On the utility of predicting the next exporters with machine learning

14 Aug 2023

Statistical learning models are increasingly used for decision-making in private and public organisations. This column discusses how they could be usefully applied to predict export success with firms’ financial accounts. After training algorithms on the experience of exporters and non-exporters, one can obtain out-of-sample probabilistic scores that catch the distance of a company from exporting status. The authors point to at least two cases when predicting the next exporters might be useful: trade finance and industrial policy design.

Armando Rungi

Francesca Micocci

PhD candidate IMT School for Advanced Studies - Lucca

As is often the case, science fiction anticipates the future. In the 1950s, a novelist imagined a future world in which crimes could be foreseen by a combination of machines and psychically gifted individuals. Minority Report eventually became an action movie in the 2000s, and flocks of spectators were still captivated by the idea that human beings and machines could cooperate in predicting the future. Only about twenty years later, we know that policymakers are not psychically gifted, but they finally have machines that may help select policy targets.¹

Machine learning² models have proven powerful tools for decision-makers in both public and private organisations. As in the old science fiction, we do have an algorithm that profiles potential criminals in public courts.³ We also have an algorithm to spot accounting fraudsters (Usuki et al. 2020). However, most of the benefits from harnessing big data and machine learning have been reaped by private companies, which have seized on predictive models for a wide array of operations, from inventory management to customer relationships and, in the case of the financial industry, from credit risk management to portfolio optimisation.

Yet, we argue, there are still unexplored and unexploited benefits to come for evidence-based public policy design. For example, De Blasio et al. (2018) show how a proper machine-learning framework can target subjects that could gain more from a tax rebate. Most recently, new methods have emerged at the intersection of statistical learning and causal inference that improve ex-post policy evaluation methods (Athey and Imbens 2019), whose aim is to exploit observable information to predict policy counterfactuals with a higher accuracy.

In a recent paper (Micocci and Rungi 2023), we focus on ex-ante predictive analyses for international trade. We exploit firm-level financial information to predict whether and how close firms are to becoming successful exporters. Our general intuition relies on the long-established evidence that exporters and non-exporters are heterogeneous and, thus, statistically different (Mayer and Ottaviano 2008, Bernard et al. 2012). While trade offers general welfare gains, only a few firms may be able to sustain the costs of handling different regulatory environments, meeting different consumer tastes, and establishing marketing and logistics channels.

Therefore, our basic intuition is that machine learning algorithms could extract non-trivial information that differentiates exporters from non-exporters. If they do, we can also measure how far a company is from becoming an exporter based on what we know about who previously succeeded in international markets. The first step is to train different algorithms on a representative sample that includes exporters and non-exporters. Drawing on the relevant literature, we select an extensive battery of 52 predictors encompassing different aspects of a firm’s economic activity (financial constraints, total factor productivity, size, age, innovation, industrial affiliation, geography, etc.). Once we have assessed which algorithm predicts most accurately,⁴ we can focus on non-exporters and their ability to compete in international markets.

We can better grasp the idea by looking at Figure 1, plotting actual predictions obtained for non-exporters in France in 2010-2018.

Figure 1 Predictions for non-exporters (scores) after a Bayesian additive regression tree with missingness in attributes

Source: Micocci and Rungi (2023)

After running our best algorithm, we obtain predictions on a segment that goes from zero to one, where one stands for the certainty of an exporting status. By construction, the shorter the distance from one, the higher the odds that a firm can successfully become an exporter. From a broader perspective, if we look at companies located on the right tail of the predictions in Figure 1, we have information on who the next exporters can be.
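The workflow described above (train on exporters and non-exporters, then score held-out non-exporters and read the distance from one) can be illustrated with a minimal sketch. It makes several assumptions that are not in the original analysis: the data are synthetic, only two hypothetical predictors (`productivity`, `size`) are used instead of the battery of 52, and a plain logistic regression stands in for the Bayesian additive regression tree, purely to keep the example self-contained.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit a logistic regression by batch gradient descent (stand-in classifier)."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def export_score(w, b, x):
    """Predicted probability of exporting status, between zero and one."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic firms (assumption): exporting is more likely for productive, larger firms.
rng = random.Random(0)


def make_firm():
    productivity, size = rng.gauss(0, 1), rng.gauss(0, 1)
    latent = 1.5 * productivity + 1.0 * size - 0.5 + rng.gauss(0, 1)
    return [productivity, size], 1 if latent > 0 else 0


firms = [make_firm() for _ in range(400)]
train, holdout = firms[:300], firms[300:]

# Step 1: train on a sample that includes exporters and non-exporters.
w, b = fit_logistic([x for x, _ in train], [label for _, label in train])

# Step 2: score held-out non-exporters and measure their distance from one.
non_exporters = [x for x, label in holdout if label == 0]
scores = [export_score(w, b, x) for x in non_exporters]
distances = [1.0 - s for s in scores]

# Firms closest to exporting status come first.
ranked = sorted(range(len(scores)), key=lambda i: distances[i])
for i in ranked[:5]:
    print(f"firm {i}: score={scores[i]:.2f}, distance={distances[i]:.2f}")
```

The firms at the top of `ranked` correspond to the right tail of the score distribution, that is, the candidates to be the next exporters.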

But how useful is predicting the next exporters? We can think of at least two cases in which one may want to know the export potential of a company. The first is trade finance, when trade promotion agencies or private financial intermediaries decide how to allocate resources to the internationalisation of enterprises. Is an investment project worth financing? What credit risk can one attach to the applicants’ projects? Is the company able to sustain the competition after reaching international markets? A predictive score trained on the experience of previous successes and failures can help design credit policies for organisations that aim to promote internationalisation. For example, one may conclude that applicants already have all the characteristics of a successful exporter, and, in that case, it would be better to allocate public resources where they are most needed. On the other hand, if a company is too far from an exporting status, one could question the utility of investing in that project and focus on how to reduce the distance from successful exporters. Note that credit scores based on predictive models are nowadays common in the financial industry,⁵ where credit risk is measured by the probability that a debtor goes bankrupt. In our view, predicting success in foreign markets is apparently more challenging than predicting a firm’s failure. Yet, the prediction accuracy we obtain with the baseline predictive model is relatively high,⁶ as we make a mistake only about once in every ten observations.
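Prediction accuracy of this kind is usually summarised by counting false positives and false negatives and by the Area Under the Curve (AUC). A minimal sketch of both measures follows. The labels and scores are made up for illustration, the 0.5 classification threshold is an assumption, and the AUC is computed with the standard pairwise-ranking definition rather than with any specific routine used in the paper.

```python
def confusion_counts(y_true, scores, threshold=0.5):
    """Count true/false positives and negatives at a given score threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1  # non-exporter wrongly classified as exporter
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1  # exporter wrongly classified as non-exporter
    return tp, fp, tn, fn


def auc(y_true, scores):
    """Share of (exporter, non-exporter) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = exporter, 0 = non-exporter.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7, 0.35, 0.05]

tp, fp, tn, fn = confusion_counts(y_true, scores)
accuracy = (tp + tn) / len(y_true)
print(f"TP={tp} FP={fp} TN={tn} FN={fn} accuracy={accuracy:.2f} AUC={auc(y_true, scores):.3f}")
```

Unlike the share of correct classifications, the AUC does not depend on the chosen threshold, which is why the two numbers need not coincide.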

A second, more sensitive case in which it is useful to know the next exporters is industrial policy for growth and diversification. After the recent global shocks, promoting diversification and developing new industrial capabilities have become a priority (IMF 2022) to tackle broader market failures, unfavourable business environments, and emerging sources of geopolitical risk. Over the last decade, the EU has elaborated a few general principles for its industrial policy (European Union 2023), which is integrated into a number of EU policies, including external trade and the internal market. Against this background, it is useful to know where one can spot fringes of potential exporters, which could enhance the trade potential of a country, a region, or an industry. More technically, exporting scores detect ex-ante how the extensive margins of trade can change, and thus capture how competitive regions or industries are becoming.

Consider, for example, the map reproduced in Figure 2, which shows the geographic concentration of potential exporters in France whose odds of exporting are higher than those of the median company. There is more room for an increase in trade extensive margins in the regions coloured green. Most notably, Paris and Île-de-France are not coloured green, although we know that the capital region hosts many exporters. Instead, a grey area indicates that there is no significant concentration of potential exporters in that region. On closer inspection, a greater fringe of potential exporters sits in other areas of the country, although in places where the density of industrial activity may be lower. Finally, fewer exporters are expected from the South of the country and overseas territories.

Figure 2 Where are the next exporters?

Note: Location quotients by NUTS 2-digit regions in France
Source: Micocci and Rungi (2023)
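As an illustration of the location quotients behind a map like Figure 2, the sketch below uses the textbook definition: the share of potential exporters among a region's firms divided by the same share at the national level. The exact definition used in the paper may differ, and the region labels and firm counts are hypothetical.

```python
def location_quotients(potential, total):
    """Regional share of potential exporters relative to the national share.

    potential: region -> number of non-exporters scoring above the median
    total:     region -> number of firms in the region
    """
    national_share = sum(potential.values()) / sum(total.values())
    return {
        region: (potential[region] / total[region]) / national_share
        for region in total
    }


# Hypothetical counts for three made-up regions.
potential = {"Region A": 30, "Region B": 50, "Region C": 20}
total = {"Region A": 100, "Region B": 400, "Region C": 500}

lq = location_quotients(potential, total)
for region, value in sorted(lq.items(), key=lambda item: -item[1]):
    flag = "above national average" if value > 1 else "at or below national average"
    print(f"{region}: LQ={value:.2f} ({flag})")
```

A quotient above one signals a region where potential exporters are over-represented relative to the country as a whole, regardless of how many firms the region hosts in absolute terms.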

More refined analyses could investigate which firms in which industries are better positioned to become the next exporters, and one could also comment on the predictive power of the different indicators included in the battery of predictors. However, at this stage, we would not know how and why firms are in a condition to export. This is a relevant limitation of any ex-ante predictive model, which can only provide policymakers with a picture of the situation. Predictive models do not substitute for structural economic models, policy evaluation methods or impact analyses. As in modern macroeconometrics, forecasting output or inflation can be an essential exercise for a central banker, who nonetheless still needs to resort to other tools to design an optimal policy.

To conclude, modern data science has expanded the possibilities to make better, faster, and more efficient decisions based on empirical evidence. International trade scholars and policymakers can resort to machine learning predictive models and, thus, have more strings to their bow when designing evidence-based policies. However, as the old science fiction story reminds us, we always need to heed the Minority Report(s), i.e. the minor, albeit non-negligible, chance that machine predictions may be wrong and that foreseen events can be changed by human intervention.

References

Ansgar, W, P Goldsmith-Pinkham, T Ramadorai and A Fuster (2019), “The effect of machine learning on credit markets”, VoxEU.org, 11 January.

Athey, S and G W Imbens (2019), “Machine Learning Methods That Economists Should Know About”, Annual Review of Economics 11(1): 685-725.

Bank of England (2022), “Machine learning in UK financial services”.

Bargagli-Stoffi, F J, F Incerti, M Riccaboni and A Rungi (2023), “Machine Learning for Zombie Hunting: Predicting Distress from Firms' Accounts and Missing Values”.

Bernard, A B, J B Jensen, S J Redding and P K Schott (2012), “The Empirics of Firm Heterogeneity and International Trade”, Annual Review of Economics 4(1): 283-313.

Breinlich, H, V Corradi, N Rocha, M Ruta, J M C Santos Silva and T Zylkin (2022), "Machine Learning in International Trade Research - Evaluating the Impact of Trade Agreements", CEPR Discussion Paper 17325.

de Blasio, G, E Ciani, A D'Ignazio and M Andini (2018), “Effective policy targeting with machine learning”, VoxEU.org, 21 November.

Dick, P K (1991), “The Minority Report”, Volume 4 of The Collected Stories of Philip K. Dick. Secaucus, Citadel Twilight.

European Union (2023), “General Principles of EU Industrial Policy”, Fact Sheets on the European Union, European Parliament.

Hastie, T, R Tibshirani and J Friedman (2017), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition, Online 12th revision, Springer Series in Statistics.

IMF (2022), “Industrial Policy for Growth and Diversification: A Conceptual Framework”, IMF Departmental Paper 2022(017).

Kapelner, A and J Bleich (2015), "Prediction with missing data via Bayesian Additive Regression Trees", Canadian Journal of Statistics 43(2).

Kleinberg, J, H Lakkaraju, J Leskovec, J Ludwig and S Mullainathan (2018), “Human Decisions and Machine Predictions”, The Quarterly Journal of Economics 133(1): 237–293.

Mayer, T and G I P Ottaviano (2008), “The Happy Few: The Internationalisation of European Firms”, Intereconomics 43: 135–148.

Micocci, F and A Rungi (2023), “Predicting Exporters with Machine Learning”, forthcoming in World Trade Review.

Mullainathan, S and J Spiess (2017), "Machine Learning: An Applied Econometric Approach", Journal of Economic Perspectives 31(2): 87-106.

Usuki, T, M Suga, D Miyakawa, K Shiraki and S Kondo (2020), “Machine learning against accounting fraud”, VoxEU.org, 13 May.

Footnotes

¹ See Mullainathan and Spiess (2017) or Athey and Imbens (2019) for a discussion on the utility of integrating machine learning models in economics and econometrics.
² In this context, the term statistical learning would be more appropriate. Briefly, statistical learning deals with the statistical inference problem of finding a predictive function based on data, while machine learning emphasises the computational aspects needed for solving that particular statistical inference problem. For a seminal reference on statistical learning theory, see Hastie et al. (2017).
³ COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is an algorithm used by US courts to assess the likelihood that a defendant becomes a recidivist. See also Kleinberg et al. (2018).
⁴ When we discuss prediction accuracy, we always refer to the ability of the model to classify correctly, separating the exporters from the non-exporters by looking only at the battery of predictors. Measures of prediction accuracy rely on counting how many false positives and false negatives one obtains from the exercise.
⁵ Among others, see the report by the Bank of England (2019) on the diffusion of machine learning tools in the financial industry in the United Kingdom. For an application, see Bargagli-Stoffi et al. (2023). See also Ansgar et al. (2019) for a discussion on reinforcement learning problems.
⁶ A horse race among models in Micocci and Rungi (2023) shows that Bayesian Additive Regression Trees (BART) with missing values as predictors (Kapelner and Bleich 2015) return a higher prediction accuracy than other econometric or machine learning tools. Different standard measures of prediction accuracy are available. Among others, the so-called Area Under the Curve (AUC) tells us that prediction accuracy is up to 0.90.