Assessing the efficacy and risks of tree-based algorithms in personal default prediction: Practical insights from commercial banks

Nhat Minh Nguyen, Chi Diem Ha Le

https://doi.org/10.22495/rgcv15i3p1

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

This study evaluates the effectiveness of tree-based machine learning algorithms in predicting personal default risk, with a specific focus on commercial banks in Vietnam. Using a dataset of 7,500 customers from various financial institutions, collected between 2015 and 2023, we assess the performance of these algorithms with the confusion matrix, accuracy, precision, sensitivity, specificity, F1 score, and area under the curve (AUC) as evaluation metrics. Our findings reveal that while traditional models such as logistic regression (LR) serve as a baseline, advanced algorithms such as random forests (RF) and XGBoost (XGB) significantly enhance predictive accuracy and robustness, particularly in handling complex and imbalanced datasets (Chen & Guestrin, 2016). Among these, XGB stands out as the most effective model, demonstrating superior performance across all evaluation metrics (Li et al., 2020). Additionally, the feature importance analysis highlights the critical roles of loan characteristics, applicant financial information, employment and residential information, and credit history in default prediction. Notably, loan term, highest credit cap, employment tenure, and number of active accounts emerge as the most influential features shaping the individual probability of default. However, limitations in data availability and the directional impact of feature variables within the model may reduce the generalizability and interpretability of the predictive model. This research provides valuable insights for financial institutions aiming to improve their credit risk management practices by adopting sophisticated machine learning models to predict personal defaults.
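
The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The labels and scores are made-up toy values rather than the study's data, the function names are hypothetical, and the 0.5 decision threshold is an assumption. AUC is computed here through its pairwise-ranking interpretation, the share of (defaulter, non-defaulter) pairs in which the defaulter receives the higher score, which may differ from how the study computed it.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating label 1 (default) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Derive the threshold-based metrics from the confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall / true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, imbalanced example: 3 defaulters (1) and 5 non-defaulters (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]  # hypothetical predicted default probabilities
threshold = 0.5  # assumed cut-off, not taken from the paper
y_pred = [1 if s >= threshold else 0 for s in scores]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
metrics = classification_metrics(tp, fp, tn, fn)
metrics["auc"] = auc(y_true, scores)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On imbalanced data such as default records, accuracy alone can look high even when few defaulters are caught, which is one reason to report sensitivity, specificity, F1, and AUC alongside it.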

Keywords: Personal Default Prediction, Tree-Based Machine Learning Algorithms, XGBoost, Credit Risk Management, Commercial Banks

Authors’ individual contribution: Conceptualization — N.M.N. and C.D.H.L.; Software — N.M.N.; Validation — N.M.N. and C.D.H.L.; Formal Analysis — C.D.H.L.; Investigation — N.M.N. and C.D.H.L.; Resources — N.M.N. and C.D.H.L.; Data Curation — N.M.N. and C.D.H.L.; Writing — Original Draft — N.M.N. and C.D.H.L.; Writing — Review & Editing — C.D.H.L.; Visualization — N.M.N. and C.D.H.L.; Supervision — N.M.N.; Project Administration — N.M.N.

Declaration of conflicting interests: The Authors declare that there is no conflict of interest.

JEL Classification: C13, G21, G24

Received: 17.11.2024
Revised: 03.03.2025; 12.06.2025
Accepted: 25.06.2025
Published online: 27.06.2025

How to cite this paper: Nguyen, N. M., & Le, C. D. H. (2025). Assessing the efficacy and risks of tree-based algorithms in personal default prediction: Practical insights from commercial banks. Risk Governance & Control: Financial Markets & Institutions, 15(3), 8–21. https://doi.org/10.22495/rgcv15i3p1