Proposal of a model for credit risk prediction based on deep learning methods and SMOTE techniques for imbalanced dataset
Implementing credit scoring models is a demanding task that is crucial for risk management. Wrong decisions can significantly affect revenue, increase costs, and even lead to bankruptcy. As machine learning algorithms have improved over time, credit models based on novel algorithms have improved and evolved with them. In this work, two novel deep neural architectures, Stacked LSTM and Stacked BiLSTM, were developed and analyzed in combination with the SMOTE oversampling technique for imbalanced datasets. Few publications use Stacked LSTM-based models in credit scoring because LSTM networks are tailored to predicting the next value of a time series, whereas credit scoring is a classification problem. The challenge and novelty of this approach lie in adapting the credit scoring dataset to the sequential input that LSTM-based models expect. This adaptation is crucial because, in practical credit scoring datasets, instances are neither correlated nor time dependent. Moreover, applying SMOTE to the newly constructed three-dimensional array served as an additional refinement step. The results show that the techniques and novel approaches used in this study improved the performance of credit score prediction.
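The abstract does not say how the three-dimensional array was constructed or how SMOTE was applied to it, so the following is only a minimal sketch under stated assumptions. It assumes that each applicant's feature vector is split into equal-length chunks that play the role of time steps, and that SMOTE is applied by flattening the array to two dimensions, interpolating between minority-class neighbours, and reshaping back. The function names `to_sequences` and `smote_3d` are hypothetical, the data are synthetic, and SMOTE is re-implemented in plain NumPy for a binary target to keep the example self-contained; it is not the authors' code.

```python
import numpy as np


def to_sequences(X, timesteps):
    """Reshape (n_samples, n_features) into (n_samples, timesteps, width).

    Assumption: the feature vector of each instance is simply cut into
    `timesteps` consecutive chunks of equal width.
    """
    n_samples, n_features = X.shape
    if n_features % timesteps != 0:
        raise ValueError("n_features must be divisible by timesteps")
    return X.reshape(n_samples, timesteps, n_features // timesteps)


def smote_3d(X3, y, minority_label=1, k=5, seed=0):
    """Balance a binary dataset stored as a 3D array with basic SMOTE.

    Assumption: the array is flattened to 2D for interpolation and the
    result is reshaped back to (n_samples, timesteps, width).
    """
    rng = np.random.default_rng(seed)
    n_samples, timesteps, width = X3.shape
    flat = X3.reshape(n_samples, timesteps * width)

    minority = flat[y == minority_label]
    n_new = int(np.sum(y != minority_label)) - len(minority)
    if n_new <= 0 or len(minority) < 2:
        return X3, y  # already balanced, or too few points to interpolate

    # Pairwise Euclidean distances between minority samples.
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    k = min(k, len(minority) - 1)
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # For each synthetic sample: pick a base point, one of its k nearest
    # minority neighbours, and a random position on the segment between them.
    base = rng.integers(0, len(minority), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    synthetic = minority[base] + gap * (minority[picked] - minority[base])

    X_out = np.concatenate([flat, synthetic]).reshape(-1, timesteps, width)
    y_out = np.concatenate([y, np.full(n_new, minority_label, dtype=y.dtype)])
    return X_out, y_out


# Synthetic, imbalanced toy data: 100 instances, 12 features, 10% positives.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(100, 12))
y = np.array([1] * 10 + [0] * 90)

X3 = to_sequences(X, timesteps=3)   # 3 pseudo time steps of 4 features each
X_bal, y_bal = smote_3d(X3, y)      # classes are balanced after oversampling
print(X3.shape, X_bal.shape, np.bincount(y_bal))
```

The balanced array keeps the `(samples, time steps, features)` layout that recurrent layers typically expect, so it could be passed to a stacked LSTM or BiLSTM classifier without further reshaping. In practice, oversampling would normally be applied to the training split only, so that synthetic instances do not leak into the evaluation data.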