FINANCIAL DISTRESS FORECASTING WITH A MACHINE LEARNING APPROACH

How to cite this paper: Ha, H. H., Dang, N. H., & Tran, M. D. (2023). Financial distress forecasting with a machine learning approach. Corporate Governance and Organizational Behavior Review, 7 (3), 90–104. https://doi.org/10.22495/cgobrv7i3p8


INTRODUCTION
Financial distress forecasting has been a topic of interest in recent decades because of its importance for listed firms, investors, creditors, regulators, and the economy (Wanke et al., 2015). If financial distress prediction is reliable, firm managers can initiate remedial measures to avoid deterioration before a crisis hits, and investors can use the prediction to evaluate the financial position of listed firms and adjust their investment strategies to optimize profit.
Forecasting financial distress and bankruptcy is receiving more and more attention from investors, creditors, and management. Determining whether a firm is falling into financial distress is necessary because it helps firm managers take suitable actions to maintain operations. It also helps investors and creditors evaluate the risks they face when a firm falls into financial distress. Almost all studies on financial distress have been conducted in the context of the United States and Europe; the theme is still new in emerging countries, including Vietnam.
Vietnam is in the process of deep integration with the region and the world. This is a golden opportunity for Vietnamese firms, but also a challenge. In this developing economy, most Vietnamese firms are small and medium-sized, so investment opportunities are limited, competitiveness is low, and, above all, the capital market is not yet developed. Vietnamese firms are prone to difficulties such as capital scarcity, unstable cash flow, low investment opportunities, and risk of insolvency, and are therefore likely to fall into financial distress. It is thus worth noting that properly implemented financial distress forecasting gives firms a good opportunity to compete in the market and offer quality products and services (Hallunovi, 2023).
According to Wruck (1990), financial distress describes a financial difficulty in which a firm's cash flow is insufficient to pay its current financial liabilities. Research on financial distress to date has focused on both theoretical and empirical aspects. On the theoretical side, studies provide methods to measure the state of financial distress of firms, using analytical approaches to identify variables for measuring the probability of financial distress and to determine the cut-off separating financial distress from non-financial distress. Many scientists focus on univariate models to determine the separate effect of each variable on the likelihood of firm financial distress (Beaver, 1966). In addition, methods of multivariate analysis and conditional probability analysis have been developed to measure financial distress (Altman, 1968; Ohlson, 1980; Zmijewski, 1984). The extant empirical literature is rich and diverse; for example, the studies of Campbell et al. (2008) and Tinoco and Wilson (2013) analyze the determinants of the probability of financial distress.
Conducting this study of financial distress forecasting with a machine learning approach is valuable from both theoretical and practical perspectives. It helps management understand the impact of the determinants influencing financial distress and then propose remedies, so that signals of financial distress are recognized and responded to quickly, reducing the cost of solving the problem. These aspects of financial distress, which are also affected by corruption risk, can become the foundation for effective and proactive community fraud prevention measures (Marzuki et al., 2022; Julian et al., 2022; Malik & Yadav, 2020). Machine learning is a data analysis method that, in this study, forecasts financial distress with an accuracy of up to 98%. Machine learning algorithms are employed to predict the probability of financial distress for listed firms on the Vietnam Stock Exchange. From there, we consider which financial indicators are most effective for forecasting and which models and algorithms perform best.
To achieve the above objectives, the rest of the paper is structured as follows. Section 2 presents the theoretical framework and literature review, including variable definitions and prior findings. Section 3 is the methodology, which describes the variable measurement, the machine learning methods, the evaluation methods, and the research data. Section 4 presents the empirical results and a discussion of the findings, and Section 5 concludes the paper.

THEORETICAL FRAMEWORK AND LITERATURE REVIEW

Altman and Hotchkiss (2005) provide a complete description and definition of financial distress and show that bankruptcy is the closest legal definition of a financial crisis. "Bankruptcy" occurs when a firm files a formal bankruptcy petition with a court and the court approves it. Zmijewski (1984) defines financial distress as the act of filing for bankruptcy. However, many firms falling into financial distress never file for bankruptcy because of mergers or privatizations, whereas firms in good standing sometimes file for bankruptcy to avoid taxes and costly lawsuits (Theodossiou et al., 1996). In practice, "bankruptcy", "financial failure", "default", and "financial distress" are used interchangeably. "Financial distress" is a more flexible term than "bankruptcy" and appears in studies with larger sample sizes, whereas "bankruptcy" is a special form of "financial distress" that appears in studies with smaller sample sizes. The term "financial distress" is more apt both in practice and in theory, because not all financially distressed firms go "bankrupt"; "bankruptcy" is only a last option for firms that cannot solve their financial problems (Aktas & Mahaffy, 1996).

Others argue that financial distress refers to a firm's difficulty in repaying debt or meeting other financial obligations (Ghazali et al., 2015). In cases of severe financial distress, the firm may go bankrupt. Binti, Zeni, and Ameer (2010) define financial distress as a term used when contractual arrangements with creditors cannot be executed due to a firm's financial difficulties. Meanwhile, Hu and Ansell (2006) define financially distressed firms as those with a debt ratio greater than 1, meaning liabilities exceed total assets, or with an interest coverage ratio (based on cash flow) less than 1, meaning the firm's cash flow is not enough to pay interest.
Beaver's univariate discriminant analysis

Beaver (1966) employs financial indicators from an empirical study of 79 bankrupt firms and a number of firms that did not fail over 10 years (1954-1964).
The results show that the ratio of cash to total liabilities is the most important indicator in predicting signs of financial distress and bankruptcy. This indicator presents the balance between a firm's ability to generate cash and the amount of debt that the firm has to pay. In addition, return on assets and debt ratio are also important signals in detecting firm financial distress and bankruptcy because these signals reflect the firm performance and the level of financial risk.
A comparison of the indicators drawn from Beaver's research illustrates that the financial indicators of firms in crisis are much lower than those of firms in a normal situation. Thus, Beaver's (1966) findings show how to detect signs of financial distress or bankruptcy by comparing a firm's financial ratios with the averages calculated by Beaver. These findings have also been applied in several fields of financial reporting and corporate governance.

Altman's multivariate discriminant analysis
Unlike Beaver (1966), Altman (1968) uses multivariate discriminant analysis (MDA) to find a linear equation of financial ratios that determines whether firms are bankrupt or not. Altman (1968) employs MDA with the variables X1 = Working capital/Total assets; X2 = Retained earnings/Total assets; X3 = Earnings before interest and tax/Total assets; X4 = Market value of equity/Book value of total liabilities; X5 = Sales/Total assets; and Z = overall index, based on data from 66 US firms divided into two groups of 33 each. Group 1 includes 33 firms that went bankrupt from 1946 to 1965; Group 2 consists of 33 firms that did not go bankrupt and continued to operate normally (at least) until 1966. Non-bankrupt firms of similar size and sector were paired with the bankrupt firms. From balance sheets and income statements, 22 financial indicators were calculated and classified into five groups: liquidity, profitability, leverage, solvency, and operating ratios.

Logistic analysis techniques
Unlike discriminant analysis, which only determines whether a firm is distressed or not, logit analysis can also determine the probability of a firm's financial distress. The coefficients of the logit model are estimated using the maximum likelihood method. Logit analysis uses the logistic cumulative probability function to predict financial distress; the result lies between 0 and 1 and is the probability of financial distress.
Using logistic models and data from financial statements of American firms for the period 1970-1976, Ohlson (1980) develops a model that estimates the probability of firm failure. Data were collected from 105 bankrupt firms and 2,058 non-bankrupt industrial firms traded on the US stock exchanges from 1970 to 1976. The indicators calculated and selected for the model represent four groups of basic financial indicators in predicting bankruptcy: size, financial structure, performance, and liquidity. From there, Ohlson (1980) selects nine independent variables for predicting bankruptcy/financial distress:
• SIZE = log(Total assets / GNP price-level index);
• TLTA = Total liabilities / total assets;
• WCTA = Net working capital / total assets;
• CLCA = Current liabilities / current assets;
• OENEG = 1 if total liabilities > total assets, 0 otherwise;
• NITA = Profit after tax / total assets;
• FUTL = Funds from operations / total liabilities;
• INTWO = 1 if net income was negative in the last two years, 0 otherwise;
• CHIN = change in net income, (NIt - NIt-1) / (|NIt| + |NIt-1|).
Based on the theoretical framework and literature review on the possibility of bankruptcy/financial distress, Ohlson (1980) hypothesizes the direction of each independent variable's effect on the dependent variable: TLTA, CLCA, and INTWO are expected to vary positively with it; SIZE, WCTA, NITA, FUTL, and CHIN inversely; and the direction for OENEG is unspecified. Three models are built: 1) the first predicts failure within 1 year; 2) the second predicts failure within 2 years; and 3) the third predicts failure within 1 or 2 years. Ohlson (1980) then uses binary logistic regression to predict the probability of a firm's bankruptcy under each model. The results show a predictive accuracy of over 90%. The classification of firms is based on the calculated value of p, the probability that a firm is at risk of bankruptcy: if p > 0.5, the firm is classified as bankrupt/financially distressed, and if p < 0.5, it is unlikely to go bankrupt/become financially distressed.

Kumar and Ravi (2007) adopt various intelligent techniques to solve problems of financial distress. According to Serrano-Cinca (1996) and Fletcher and Goss (1993), the neural network (NN) is the most commonly used technique. Other techniques, including the decision tree (DT) and support vector machine (SVM), are also used to investigate financial distress prediction. A decision tree is a structured hierarchical tree employed to classify objects based on a series of rules: given data about objects containing attributes along with their classes, the decision tree generates rules to predict the class of unknown objects (unseen data). The support vector machine is a supervised machine learning model used to analyze and classify data; it takes incoming data and classifies it into two different classes. Many studies use machine learning in predicting financial distress, such as those of Anandarajan et al.

Identification of financial distress
Most studies on forecasting financial distress have focused on predicting bankruptcy (Altman, 1968). However, recent studies have shown that financial distress is not the same as bankruptcy and that not all firms undergoing financial distress will eventually file for bankruptcy (He et al., 2010). Different views on financial distress exist in the forecasting literature because of differences in sample selection as well as the variety and complexity of financial distress itself (Wruck, 1990), which spans failure, insolvency, default, and bankruptcy (Altman & Hotchkiss, 2005). Therefore, several measures are used to identify firm financial distress. Some studies identify the state of financial distress based on accounting and market data (Denis & Denis, 1995; Andrade & Kaplan, 1998). Others rely on corporate actions such as cutting or stopping dividend payments, delisting, filing for bankruptcy, or merging with other firms (Turetsky & McEwen, 2001; Altman & Hotchkiss, 2005). Recently, many studies have confirmed that the Z-index (Altman, 1968) or the B-index (Zmijewski, 1984) can be employed as a metric to determine whether a firm is in financial distress. Among them, the Z-index is the most commonly used because it is not sensitive to different states of financial distress or to business lines.

Application of machine learning in financial distress prediction
The previous empirical evidence on forecasting financial distress illustrates that models have improved in both predictive power and accuracy over time, from univariate analysis (Beaver, 1966) to multivariate discriminant analysis (MDA; Altman, 1968) and logistic conditional probability analysis (Ohlson, 1980). Multivariate discriminant and logistic analyses are two popular methods because of their high accuracy. However, both rest on assumptions that make them difficult to apply: MDA assumes that the independent variables are normally distributed and that the variance-covariance matrices of the financial distress and non-financial distress groups are equal, while logistic analysis assumes homogeneous data variability and is sensitive to multicollinearity.
Since the late 20th century, with the development of science and technology, machine learning models such as artificial neural networks (ANN), support vector machines (SVM), random forest (RF), decision tree (DT), and Bayesian models have been introduced. These are non-parametric methods now widely employed in forecasting financial distress. Recent studies confirm that ensemble algorithms, from feature selection to predictor construction, can achieve high accuracy on real cases, and that an interpretation framework can meet the needs of external users by generating local and global explanations (Zhang et al., 2022). An overview of financial distress studies using machine learning is presented in Table 1.

Variable measurement
Currently, there are several ways to measure financial distress, each with its advantages and disadvantages. Ghazali et al. (2015) state that the Altman Z-score is the most popular method to measure financial condition and has been widely employed to determine financial distress. Therefore, in this study we determine financial distress using three approaches: 1) the Z-index of Altman (1968); 2) the dummy variable of Fich and Slezak (2008); and 3) the B-index of Zmijewski (1984).
If Z < 1.81, a firm is in the financial distress zone and the financial distress variable will have a value of 1, otherwise, it will have a value of 0.
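As a minimal sketch, the Z-index rule above can be written in Python. The weights used here are the coefficients commonly reported for Altman's (1968) model, an assumption on our part since the paper does not restate the equation; only the 1.81 cut-off comes from the text.

```python
def altman_z(x1, x2, x3, x4, x5):
    """Altman Z-score from the five ratios X1..X5.

    Weights are the classic coefficients commonly reported for
    Altman (1968) -- an assumption, as the paper omits the equation.
    """
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5

def altman_distress_label(z, cutoff=1.81):
    """1 = financial distress zone (Z below cut-off), 0 = otherwise."""
    return 1 if z < cutoff else 0
```

For example, a firm with weak profitability ratios scores below the cut-off and is labeled 1, while a profitable, low-leverage firm is labeled 0.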

Dummy variable of Fich and Slezak (2008)
Fich and Slezak (2008) measure financial distress through a dummy variable that takes a value of 1 if the ratio of earnings to interest expense (interest coverage) is less than 1 (taken as evidence of financial distress), and 0 otherwise. This measure assumes that a firm unable to generate profit large enough to cover its interest expense will soon default on its debts. Because this measure is based on book values, it can overcome concerns about ambiguity in the index measurement method.
Index B of Zmijewski (1984)

Zmijewski (1984) defines the index B as follows:

B = -4.3 - 4.5 × ROA + 5.7 × FINL - 0.004 × LIQ

where:
• ROA = Net profit divided by total assets;
• FINL = Total debt divided by total assets;
• LIQ = Current assets divided by current liabilities.
Firms are determined to be financially distressed when B > 0 (the cut-off point is 0) or the financially distressed variable will have a value of 1, otherwise, there is no financial distress.
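The B-index and its zero cut-off can be sketched as follows; the coefficients are those commonly reported for Zmijewski's (1984) probit model, an assumption here since the paper does not restate them.

```python
def zmijewski_b(roa, finl, liq):
    """Zmijewski B-index.

    Coefficients as commonly reported for Zmijewski (1984) -- an
    assumption, since the paper does not restate the equation.
    """
    return -4.3 - 4.5 * roa + 5.7 * finl - 0.004 * liq

def zmijewski_distress_label(b):
    """Cut-off point is 0: B > 0 means financial distress (label 1)."""
    return 1 if b > 0 else 0
```

A loss-making, highly leveraged firm scores above zero (label 1), whereas a profitable firm with moderate debt scores below zero (label 0).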
In this study, 22 attributes of financial ratios are calculated, covering the groups of solvency, capital structure and debt serviceability, profitability, activity, growth indicators, and others, which are presented in detail in Appendix A.

Machine learning methods
Machine learning is a branch of artificial intelligence that employs algorithms to allow computers to learn from data to solve specific problems, giving computers basic human cognitive abilities (hearing, seeing, understanding, and solving math problems). Machine learning plays an important role in the sciences, and its applications are part of daily life: filtering email spam, predicting the weather, medical diagnostics, product recommendations, facial recognition, credit card fraud detection, and financial distress or bankruptcy prediction. In this research, we adopt several commonly used algorithms to predict financial distress: logistic regression (LR), decision tree (DT), Bayesian network, support vector machine (SVM), K-nearest neighbors (KNN), and random forest (RF).

Logistic regression
Logit regression, introduced by Berkson (1944), is a common tool for analyzing data with binary variables. Developments by Altman et al. (1994) and Flitman (1997) have been used in multivariate regression and discriminant analysis. The binary logistic model employs a binary dependent variable to estimate the probability that an event will occur given the information in the independent variables. The data collected for the dependent variable record whether a certain event occurs or not (the dependent variable Y takes two values, 0 for no event and 1 for the event occurring), together with data on the independent variables X1, X2, ..., Xk. From this binary dependent variable, the model predicts the probability of the event occurring: if the predicted probability is greater than 0.5 (the default cut-off point), the prediction is "yes"; otherwise, it is "no". The binary logistic model (Figure 1) is:

P = 1 / (1 + e^-(β0 + β1X1 + ... + βkXk))

where P is the probability that Y = 1 (the probability that the event occurs) given specific values of the independent variables; accordingly, the probability that the event does not occur is 1 - P. The regression coefficients are estimated by the maximum likelihood (ML) method, and the model can be adopted to estimate the log(odds) ratio for each independent variable, as in the model of Ohlson (1980). The logit model works with many types of data, has few constraints, is effective in practice, is easy to interpret, and is capable of monitoring, diagnosis, and adjustment so that results remain consistent with reality.
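The procedure described above, estimating P(Y = 1 | X) and classifying at the default 0.5 cut-off, can be sketched with scikit-learn (which this study already uses for preprocessing) on hypothetical synthetic data; the two features standing in for a debt ratio and ROA are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# hypothetical features for 200 firms: column 0 ~ debt ratio, column 1 ~ ROA
X = rng.normal(size=(200, 2))
# distress (1) made more likely by high debt and low ROA, plus noise
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200) > 0.8).astype(int)

clf = LogisticRegression().fit(X, y)   # coefficients fit by maximum likelihood
p = clf.predict_proba(X)[:, 1]         # P(Y = 1 | X), always between 0 and 1
labels = (p > 0.5).astype(int)         # default 0.5 cut-off
```

The fitted coefficients in `clf.coef_` play the role of the β parameters in the formula above.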

Decision tree
The decision tree, a classification model introduced by Belson (1959), is widely adopted in different fields. After the introduction of machine learning methods, the decision tree was further developed with the ID3 algorithm by Quinlan (1986) and the C4.5 algorithm by Quinlan (1996). A decision tree is a structured classification tree that classifies objects based on sequences of rules. Independent variables and attributes can be of different data types, such as binary, nominal, ordinal, and quantitative data. To determine which variable to split on first, the information weight (entropy) for each variable is calculated; the higher the information value, the more class information the variable carries.
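The entropy calculation behind that splitting rule can be sketched in a few lines; the parent/child split shown is a hypothetical split on one financial ratio, chosen only to illustrate how information gain is computed.

```python
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * log2(p) for p in probs)

# hypothetical split of 6 firms on one ratio: the split that yields the
# largest reduction in entropy (information gain) is chosen first
parent = [1, 1, 0, 0, 0, 0]
left, right = [1, 1, 0], [0, 0, 0]
gain = entropy(parent) \
    - (len(left) / len(parent)) * entropy(left) \
    - (len(right) / len(parent)) * entropy(right)
```

A pure node (all labels equal) has entropy 0, while a 50/50 node has entropy 1 bit, the maximum for two classes.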

Bayesian network
The Bayesian network performs classification based on conditional probability. Like the logistic function, the Bayesian result is a probability between 0 and 1 (expressing the probability of an event occurring, from 0% to 100%), and the variables are linked together by probabilities. The Bayesian method is developed from Bayes' theorem in statistical probability; according to Carlin and Louis (2000), it is more statistical than regression-based. The Bayesian method is efficient and easy to use, does not impose conditions on the data, and can work on both numeric and alphanumeric data. With small or unbalanced datasets it is particularly effective, where other methods cannot perform or must process the data with many operations. The Bayesian network is built from Bayes' rule together with the condition P(Y = 1) + P(Y = 0) = 1, written as:

P(Y = 1 | X) = P(Y = 1) P(X | Y = 1) / P(X)

where P(X) = P(Y = 1) P(X | Y = 1) + P(Y = 0) P(X | Y = 0). The prior P(Y = 1) is taken as the distress rate of the sample used to run the model, and the variables are assumed to be independent.
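The posterior computation above can be sketched directly; the numeric inputs below (a 10% prior distress rate and two class-conditional likelihoods) are hypothetical values chosen only for illustration.

```python
def p_distress_given_x(prior_1, lik_x_given_1, lik_x_given_0):
    """P(Y=1 | X) via Bayes' rule with the constraint P(Y=0) = 1 - P(Y=1).

    prior_1       : P(Y = 1), e.g. the sample's distress rate
    lik_x_given_1 : P(X | Y = 1)
    lik_x_given_0 : P(X | Y = 0)
    """
    prior_0 = 1.0 - prior_1
    p_x = prior_1 * lik_x_given_1 + prior_0 * lik_x_given_0  # total probability P(X)
    return prior_1 * lik_x_given_1 / p_x
```

Even with only a 10% prior, an observation that is four times as likely under distress (0.8 vs. 0.2) raises the posterior to about 0.31.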

Support vector machine
The support vector machine (SVM) is a binary classification algorithm: it takes input and classifies it into two different classes. Given a set of training examples belonging to two given categories, the SVM algorithm builds a model to classify other examples into those two categories. The SVM learns a hyperplane that separates the dataset into two classes; to do this, it constructs a hyperplane or a set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks. For the best classification, the optimal hyperplane should lie as far as possible from the data points of all classes: in general, the larger the margin, the smaller the generalization error of the classifier. Figure 2 depicts the SVM algorithm. Given a training set represented in a vector space where each document is a point, the method finds a decision hyperplane that best divides the points into two separate classes: the class simulated by the black dots and the class simulated by the white dots. The quality of this hyperplane is determined by the distance (called the margin) from the nearest data point of each class to the plane: the larger the margin, the better the decision plane and the more accurate the classification. The purpose of the SVM algorithm is to find the hyperplane with the maximum margin.
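A minimal maximum-margin example with scikit-learn's `SVC`, on two hypothetical, linearly separable clusters of points (the data are illustrative, not from the paper's sample):

```python
import numpy as np
from sklearn.svm import SVC

# two linearly separable clusters in a hypothetical two-ratio space
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [2.0, 2.0], [2.2, 1.9], [1.8, 2.1]])
y = np.array([0, 0, 0, 1, 1, 1])

# linear kernel: fit the maximum-margin separating hyperplane
svm = SVC(kernel="linear").fit(X, y)
pred = svm.predict([[0.1, 0.1], [2.0, 2.0]])   # one point near each cluster
```

Points near the first cluster are assigned class 0 and points near the second class 1; `svm.support_vectors_` holds the points that define the margin.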

K-nearest neighbors
The K-nearest neighbors (KNN) algorithm is commonly employed in data mining. KNN classifies objects based on the closest distances between the object to be classified (the query point) and all objects in the training data: an object is assigned the majority class among its K nearest neighbors, where K is a positive integer determined before the algorithm is executed. Euclidean distance is often used to calculate the distance between objects.
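The nearest-neighbor vote can be sketched in pure Python with Euclidean distance; the six training points below are hypothetical two-ratio observations.

```python
from math import dist          # Euclidean distance (Python 3.8+)
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training
    points; `train` is a list of (feature_tuple, label) pairs."""
    nearest = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# hypothetical training data: three normal firms (0), three distressed (1)
train = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0),
         ((5, 5), 1), ((5, 6), 1), ((6, 5), 1)]
```

A query near the first cluster is labeled 0 and one near the second cluster 1, because its three nearest neighbors all carry that label.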

Random forest
Random forest is an attribute classification method developed by Leo Breiman at the University of California, Berkeley (Breiman, 2001). Breiman is also the co-author of the classification and regression trees method, rated as one of the top ten data mining methods. In a random forest, a significant improvement in classification accuracy results from growing a set of trees, each of which "votes" for the most popular class. To grow these trees, random vectors are generated that govern the growth of each tree in the set: for the kth tree, a random vector Vk is generated, independent of the previously generated vectors V1, V2, ..., Vk-1 but with the same distribution. A tree is grown on the training set and the vector Vk, giving a classifier h(x, Vk), where x is the input vector. After a large number of trees are created, the trees "vote" for the most popular class.
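The majority-vote ensemble can be sketched with scikit-learn's `RandomForestClassifier` on hypothetical synthetic data; the distress rule used to generate the labels is illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))                  # 300 firms, 5 hypothetical ratios
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # illustrative distress rule

# each of the 100 trees is grown on a bootstrap sample with a random
# feature subset (the role of the vectors Vk); the forest predicts by
# majority vote over the trees
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = rf.feature_importances_          # per-ratio importance, sums to 1
```

The `feature_importances_` vector is what later allows the most informative ratios, such as total asset turnover in this study, to be ranked and selected.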

Evaluation methods
In this study, in addition to accuracy, alternative metrics are needed because, with severely imbalanced data, accuracy alone is often a misleading measure of model quality: a naive model that always predicts the majority class can still score close to 100%. We therefore also consider Precision, Recall, and the F1-score, which focus on how accurately the minority group, the one we most want to forecast, is predicted. Positive corresponds to label 1 (financial distress) and Negative corresponds to label 0 (normal). From Figure 3, the indicators are defined as follows:
• Precision: the share of predicted Positive cases that are actually Positive: Precision = TP/(TP + FP).
• Recall: the share of actual Positive cases that are predicted Positive: Recall = TP/(TP + FN).
• F1-score: the harmonic mean of Precision and Recall, an ideal surrogate for accuracy when the sample is highly imbalanced: F1-score = 2/(1/Precision + 1/Recall).
• AUC (area under the curve): represents the relationship between sensitivity and specificity and assesses the model's ability to separate financial distress from normal cases. AUC below 0.6 indicates poor predictive ability, between 0.8 and 0.9 is quite good, and above 0.9 is good.
A model with all the above indicators in the high range has better predictive quality. In this study, we use Accuracy, Precision, Recall, F1-score, and AUC as measures of model evaluation.
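The four confusion-matrix metrics above can be computed directly from the counts; the TP/FP/FN/TN values in the usage note are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, Recall, F1-score, and Accuracy from confusion-matrix
    counts, following the formulas in the text."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 / (1 / precision + 1 / recall)   # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy
```

For instance, with TP = 90, FP = 10, FN = 20, TN = 880, accuracy is a flattering 97% even though recall on the minority (distress) class is only about 82%, which is exactly why the minority-focused metrics matter here.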

Research data
This research uses data collected from the Vietnam Stock Exchange for the period 2009 to 2020. Data are collected from the audited financial statements of listed firms, excluding firms in the banking, securities, and insurance sectors, since the characteristics of these fields differ greatly from other fields. After determining the indicators, the dataset used for analysis and forecasting contains 4,936 observations, presented in Table 2. Financial distress rates under each measure are presented in Table 3: measured according to the models of Altman, Fich and Slezak, and Zmijewski, they are 50.61%, 25.65%, and 9.83%, respectively. Appendix A, Table A.2 and Table A.3 provide the mean, standard deviation, and minimum and maximum values for the financially distressed and normal firms under the three models. The next step is to select and test the models. We randomly divide the dataset into a training set and a test set.
• Training set: based on the input and target variables of the training set, we train the financial distress classification model. The obtained model is then evaluated on independent data, namely the test set.
• Test set: a dataset with the same fields as the training set, whose observations are treated as completely new. The test set should have a distribution as similar as possible to the actual data users will generate, in order to evaluate the model's applicability in practice.
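The random train/test division described above can be sketched with scikit-learn; the array shapes (100 observations, 8 ratios) and the 20% holdout are illustrative assumptions, not the paper's actual split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))        # hypothetical: 8 selected financial ratios
y = rng.integers(0, 2, size=100)     # 1 = financial distress, 0 = normal

# hold out 20% as the test set; stratify=y keeps the distress/normal mix
# in the test set similar to the full sample, as the text recommends
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
```

Stratification matters here precisely because the distress class can be as rare as 9.83% of observations.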

RESULTS AND DISCUSSION
We first use the random forest algorithm, which belongs to the class of ensemble models: its results are based on majority voting over many decision trees, so the model has higher reliability and better accuracy than simple linear classifiers such as logistic or linear regression. The results are presented in Appendix B, Figure B.1. In the next step, we select the important variables for regression instead of using as many variables as possible, because too many features have drawbacks: increased cost and computation time; a risk of over-fitting (the model works very well on the training set but poorly on the test set); and noisy variables that reduce the quality of the model. In consequence, eight attributes are selected, and their values are transformed using sklearn.preprocessing. The coefficients of the variables in each model are shown in Table 4. Table 5 reports the accuracy of the logistic regression, support vector machine, decision tree, random forest, K-nearest neighbors, and Bayesian network algorithms: in Model 1 (Altman) accuracy ranges from 0.81 to 0.98, in Model 2 (Fich and Slezak) from 0.81 to 0.90, and in Model 3 (Zmijewski) from 0.81 to 0.98. Of the six algorithms, random forest achieves the highest accuracy, at 98%. We then use other metrics for a more comprehensive test: since Table 5 shows that random forest gives the highest prediction accuracy, we use this algorithm to measure Precision, Recall, and F1-score. Table 6 shows that Model 1 and Model 3 give the best results, especially Model 3.
However, the results show a gap between the measured accuracy for the financial distress group and the normal group: for normal firms, accuracy reaches 99%, while for financially distressed firms it is only 93%, 90%, and 92% for Models 1, 2, and 3, respectively. The gap may be due to the data imbalance in Model 3: based on Table 3, only 9.83% of observations are financially distressed. To deal with the unbalanced data, we use under-sampling, over-sampling, and the synthetic minority over-sampling technique (SMOTE). The results in Table 7, after processing the unbalanced data, are very good, and there is no longer a big difference in the predictive accuracy of Model 3 between the normal and financial distress groups. The AUC index measures the area under the receiver operating characteristic (ROC) curve, indicating how strongly the algorithms separate the normal and financial distress groups; AUC ∈ [0, 1], and the larger its value, the better the model. The random forest algorithm tuned with GridSearch (a technique for finding suitable parameters for the model) and the decision tree achieve high prediction accuracy, with AUC = 0.97 for Model 1 and AUC = 0.95 for Model 3 (see Appendix C). Total asset turnover is the most important attribute for forecasting financial distress: when total asset turnover is less than 1.461, the firm is forecasted as financially distressed. Similarly, in Model 2, when X8 (the firm's profit margin) is less than 2.1%, and in Model 3, when X5 (the debt-to-equity ratio) is greater than 3.208, the firm falls into financial distress.
This result is consistent with Kim and Upneja (2014): when the debt-to-equity ratio increases while asset turnover and profit margin are low, the firm is led into a state of financial distress.
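The class re-balancing step discussed above can be illustrated with a simple random over-sampling sketch in pure numpy; this is a minimal stand-in for the SMOTE-style techniques the study uses (SMOTE itself synthesizes new minority points rather than duplicating them), and the toy data are hypothetical.

```python
import numpy as np

def random_oversample(X, y, rng=None):
    """Duplicate minority-class rows at random until the classes balance.

    A simple illustration of re-balancing; SMOTE instead interpolates
    synthetic minority observations.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[counts.argmin()]
    deficit = counts.max() - counts.min()
    idx = rng.choice(np.flatnonzero(y == minority), size=deficit)
    return np.vstack([X, X[idx]]), np.concatenate([y, y[idx]])

# toy sample: 8 normal firms, 2 distressed (roughly the 9.83% imbalance)
X = np.arange(20).reshape(10, 2).astype(float)
y = np.array([0] * 8 + [1] * 2)
X_bal, y_bal = random_oversample(X, y)
```

After re-sampling, both classes contribute equally to training, which is what removes the accuracy gap between the normal and distress groups reported in Table 7.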

CONCLUSION
This study aims to determine the direction of the impact of determinants on the possibility of financial distress and to predict the probability of financial distress for listed firms on the Vietnam Stock Exchange in the period from 2009 to 2020. The results show that the debt-to-equity ratio has a positive impact on financial distress, while asset turnover and profit margin negatively influence financial distress. The forecasting model has an overall correct prediction rate of 98%. Model 1 (Altman) and Model 3 (Zmijewski) are capable of predicting financial distress at a high level. The research reveals that internal indicators in each firm directly affect the probability of financial distress, corresponding to each model. The study contributes to management practice by identifying the most important determinants of the "health" status of firms.
Based on the findings, the regression coefficients of the independent variables in the models illustrate that as the debt ratio increases, financial distress increases: the more debt a firm has, the higher its risk of default and thus of financial distress. The greater the efficiency of asset usage, the lower the financial distress; this is consistent with the fact that the more revenue a firm generates from selling its products, the less likely it is to become financially distressed. The higher the profit margin, the lower the probability of financial distress: when a firm has internal funding available, it can invest proactively and limit external debt, thereby minimizing the possibility of financial distress.
This study has some limitations. First, comprehensive data on the financial distress and bankruptcy of firms were not collected. Second, non-financial information, such as corporate governance and macroeconomic variables, was not included in the models. In the future, to achieve better and more comprehensive results, we will add macro-environmental and market determinants that influence firm distress and compare different business lines. In-depth study by sector would help managers realize the importance of investment, financing, and business operations management.