DP14259 How do machine learning and non-traditional data affect credit scoring? New evidence from a Chinese fintech firm

Author(s): Leonardo Gambacorta, Yiping Huang, Han Qiu, Jingyi Wang
Publication Date: December 2019
Keyword(s): credit risk, credit scoring, Fintech, Machine Learning, non-traditional information
JEL(s): G17, G18, G23, G32
Programme Areas: Financial Economics
Link to this Page: cepr.org/active/publications/discussion_papers/dp.php?dpno=14259

This paper compares the predictive power of credit scoring models based on machine learning techniques with that of traditional loss and default models. Using proprietary transaction-level data from a leading fintech company in China covering May to September 2017, we test how well different models predict losses and defaults, both in normal times and when the economy is hit by a shock. In particular, we analyse an exogenous change in China's regulation of shadow banking that caused lending to decline and credit conditions to deteriorate. We find that, in the presence of a negative shock to aggregate credit supply, the model based on machine learning and non-traditional data predicts losses and defaults better than traditional models. One possible reason is that machine learning can better capture non-linear relationships between variables during periods of stress. Finally, the comparative advantage of the fintech credit scoring model based on machine learning and big data tends to decline for borrowers with longer credit histories.
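The abstract does not say how "predictive power" is measured. One common yardstick for default models is the area under the ROC curve (AUC): the probability that a randomly chosen defaulter receives a higher risk score than a randomly chosen non-defaulter. The sketch below assumes that metric; the function name `auc` and the scores are hypothetical, made-up illustrations rather than the paper's data or code.

```python
from typing import Sequence


def auc(scores: Sequence[float], defaulted: Sequence[int]) -> float:
    """Hypothetical helper: area under the ROC curve via pairwise comparison.

    Counts, over every (defaulter, non-defaulter) pair, how often the
    defaulter has the higher risk score; ties count as one half.
    O(n_pos * n_neg), which is fine for a small illustration.
    """
    pos = [s for s, y in zip(scores, defaulted) if y == 1]
    neg = [s for s, y in zip(scores, defaulted) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defaulter and one non-defaulter")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = borrower defaulted, 0 = repaid.
defaulted = [1, 0, 1, 0, 0, 1]
# Hypothetical risk scores from two competing models (higher = riskier).
model_a = [0.7, 0.4, 0.3, 0.5, 0.2, 0.6]
model_b = [0.8, 0.3, 0.6, 0.4, 0.1, 0.7]

print(auc(model_a, defaulted))  # 7 of 9 pairs ranked correctly
print(auc(model_b, defaulted))  # all 9 pairs ranked correctly
```

An AUC of 0.5 means the scores rank borrowers no better than chance, and 1.0 means every defaulter is ranked above every non-defaulter. Comparing such a metric across models, and separately for normal and stress periods, is one way to make a claim like "better able to predict defaults" concrete.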