BOARD DIVERSITY AND FIRM PERFORMANCE: AN EMPIRICAL ANALYSIS OF ITALIAN SMALL-MEDIUM ENTERPRISES

How to cite this paper: Morrone, C., Bianchi, M. T., Marsocci, V., & Faioli, D. (2022). Board diversity and firm performance: An empirical analysis of Italian small-medium enterprises. Corporate Ownership & Control, 19(3), 8–24. https://doi.org/10.22495/cocv19i3art1

This paper aims to empirically verify whether the diversity (i.e., gender, age, and nationality) of the board of directors (BoD) affects firm performance, which we measure through ROE, ROA, and EBITDA margin. So far, scholars have not converged on a single answer about the effects of observable boardroom diversity on corporate performance. Therefore, this study, drawing on a significantly larger sample, applies machine learning models following a data-driven approach to a three-year (2017–2019) dataset of 59,229 Italian small-medium enterprises (SMEs). The analysis shows that board diversity does not affect firm results, either positively or negatively. The lack of a correlation suggests that there is no reason not to appoint women, young people, and foreigners as directors. The involvement of these "minorities", which, as shown, does not negatively affect economic-financial results, could instead improve firm reputation and enhance intellectual capital, while addressing a social issue.

The Authors declare that there is no conflict of interest.


INTRODUCTION
The value brought by diversity in different contexts (e.g., education, business) is a current topic (Arora, 2021; Di Miceli da Silveira, 2021; Francoeur, Labelle, & Sinclair-Desgagné, 2008; Mensi-Klarbach, 2014) with countless nuances and implications. While some believe that diversity is one of the fundamental values of our century, others perceive it as a threat.
In this scenario, we are interested in understanding the role of diversity in the board of directors (BoD). To this end, this paper aims to empirically investigate the relation between boardroom diversity and firm performance through the study of a sample of 59,229 Italian small-medium enterprises (SMEs) over a period of three years (2017–2019). Although the topic is not new, our methodology differs from prior literature: we apply machine learning models to a dataset of SMEs rather than to the listed or larger companies more commonly studied, which allow only smaller samples. Moreover, the results of previous analyses are not unanimous, and thus we still know little about the issue (Banerjee, Nordqvist, & Hellerstedt, 2020).
Concerning board diversity, it is interesting to observe that the results of a survey conducted by PricewaterhouseCoopers (PwC, 2018) show that most directors regard diversity as a value: more than 80% of interviewees think that it brings unique perspectives to the boardroom, enhancing board performance and improving relationships with investors; 72% agree that diversity has a positive effect on corporate performance and improves strategy/risk oversight. What variables do they refer to? The three kinds of diversity considered in the above-mentioned survey are gender, age, and ethnicity/nationality, which are also the most commonly used by researchers (Fernández-Temprano & Tejerina-Gaite, 2020; Baker, Pandey, Kumar, & Haldar, 2020). Indeed, as noted by Erhardt, Werbel, and Shrader (2003), there are two types of diversity, observable (e.g., gender, age, ethnicity) and non-observable (e.g., knowledge, personality characteristics), but the literature mostly considers the former.
Despite top executives' perceptions, the literature offers contributions that highlight the benefits of diversity as well as papers that show its negative impact on organizations. Adams et al. (2015), in particular, observe that diversity can have both benefits and costs.
Because of the ambiguity in the existing literature, we address the following research question: Is it possible to gain insight into the economic-financial performance of Italian SMEs starting solely from corporate governance information, in particular the gender, age, and nationality of directors? In other words, we want to quantitatively analyze whether there is a strong, direct, and widespread connection between certain characteristics of the boardroom and business performance, with a focus on diversity. To address our research question, the rest of the paper is structured as follows. Section 2 reviews the literature and develops the hypotheses. Section 3 details the sampling and the applied methodology. Section 4 describes our findings. Section 5 discusses the results, and Section 6 summarizes the conclusions of our work, underlining its limitations.

Diversity and firm performance: Theoretical background
Since the main goal of governance is to achieve the best performance by implementing the most effective, efficient, ethical, and sound management possible, the literature is rich in papers that investigate which directors' characteristics affect economic and financial results.

Gender
In recent years, the gender issue has become increasingly important. The Organization for Economic Co-operation and Development (OECD) provides specific indicators, among them the "female share of seats on boards of the largest publicly listed companies", which highlight the low presence of women in boardrooms around the world (OECD, 2020).
Despite the indisputable social relevance of the issue, the results of research projects that study the nexus between gender diversity and performance are mixed. Dezsö and Ross (2012) show a positive relation between the percentage of women in top management teams and firm financial performance, but only to the extent that a firm's strategy is focused on innovation. Li and Chen (2018) likewise find a positive impact on performance, but only when firm size is below a critical value.
Conversely, Adams and Ferreira (2009) find that the average effect of gender diversity on firm performance is negative. Similar results are reported by Darmadi (2011), who reveals a significant negative relationship between women in the boardroom and ROA.
Other authors (

Nationality
Supporters of board diversity argue that different opinions in a culturally heterogeneous group generate higher-quality decisions (Antonelli, Rivieccio, & Moschera, 2013), and multiculturalism is indeed increasingly common in companies, even though its spread has not kept pace with globalization.
In this context, the presence of different nationalities within the boardroom is an extensive research topic that interests several authors (Ben-Amar, Francoeur, ...). Although many managers believe that a positive relationship between nationality diversity and value creation exists, previous empirical analyses highlight both pros and cons and do not fully confirm this common perception; in fact, the results are mixed. Ruigrok et al. (2007) show that foreign directors can generate benefits in international markets thanks to their cultural knowledge and expertise, but they also observe potential communication and integration problems within the boardroom. The issues related to cross-cultural communication are also described by Lehman and DuFrene (2007). Piekkari, Oxelheim, and Randøy (2015) propose a specific study of the language spoken by directors, emphasizing that language is a distinct dimension of diversity. In short, while some authors (Darmadi, 2011; Guest, 2019) find no evidence that foreign directors positively influence corporate performance, others (Choi, Park, & Yoo, 2007; Ruigrok et al., 2007) show a positive relation between nationality diversity and performance. Among the latter, Fernández-Temprano and Tejerina-Gaite (2020) note a positive effect on performance only when the nationality mix concerns insiders. Finally, Khan and Abdul Subhan (2019) assert that nationality diversity is negatively associated with firm financial performance.

Age
Among the variables used to analyze observable board diversity, directors' age is one of the least investigated. Indeed, a survey (PwC, 2018) shows that only 21% of interviewed directors consider BoD age diversity important, against 46% who say the same of gender diversity. Despite this limited attention, the age of directors affects their risk appetite (Liu, Fisher, & Chen, 2018; Vroom & Pahl, 1971), their productivity (Kim & Lim, 2010), and, naturally, the experience they have gained (Fernández-Temprano & Tejerina-Gaite, 2020).
A controversial aspect concerns older directors' propensity for strategic change: some authors conclude that younger directors are more inclined to change (Hambrick & Mason, 1984; Wiersema & Bantel, 1992), while others (Golden & Zajac, 2001) find a positive link between the percentage of directors over 50 and the adoption of strategic change, explaining it by the fact that executives need capabilities, confidence, and experience to implement such changes.
The literature also shows unclear results with regard to the relation between age diversity and performance: some research projects (Bonn, 2004; Jhunjhunwala & Mishra, 2012) propose empirical analyses with a non-significant relation; others (Mahadeo, Soobaroyen, & Hanuman, 2012), acknowledging the impact of a mixed-age board on its functioning, find that, in the presence of other independent variables, age diversity is a benefit. Analyzing a sample of Korean companies, Kim and Lim (2010) highlight a positive impact of outside directors' age on firm valuation.
Prior Jonson, McGuire, Rasel, and Cooper (2020), studying the effects of both the mean age and the age diversity of the boards of 130 Australian companies, find that a higher average age of board members is associated with better firm performance, while no significant relation emerges between age diversity and economic-financial results.

Hypotheses development
As the literature review clearly shows, despite fairly intensive research on the observable diversity of the boardroom and firm performance, scholars have not yet obtained consistent results. Accordingly, the hypotheses on which our work relies are:
H1a: Board diversity affects the change in ROE and ROA.
H1b: Board diversity affects the change in EBITDA margin.
The reasons for choosing these variables to investigate the relationship between board diversity and corporate performance are explained later in the paper. For each company, we downloaded both governance and economic-financial data for 2017, 2018, and 2019. Additionally, we identified the business sector and location, which, however, were used only as control variables.

Sample
To obtain our sample, we excluded the companies (identified by VAT number): 1) with a sole administrator; 2) with at least one current BoD member appointed after 1 January 2017; 3) with incomplete information. As the data of our sample show, female and young directors are a minority, and the number of foreign directors sitting on boards is even smaller (Figure 1). Among the variables obtained for the SMEs in our sample, we relied on ROE, ROA, and EBITDA margin to create the target variables of our models, as explained in the remainder of the paper. The independent variables, instead, are the corporate governance ones: specifically, the number of members, the mean age of the BoD, the variance of the ages of the BoD, the standard deviation of the ages of the BoD, the number of females, the number of males, the percentage of females, the percentage of males, and the number of busy directors (who hold more than 3 seats). As shown below, we then performed feature selection to choose the independent variables of our models.
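The board-level governance variables listed above can be derived from director-level records. The following pandas sketch is illustrative only: the column names (`vat`, `age`, `is_female`, `is_foreign`, `n_seats`, `appointed`) and the toy records are hypothetical, computing the age variance as a sample variance (ddof=1) is an assumption, and the exclusion rules follow the three criteria stated in the text.

```python
import pandas as pd

# Hypothetical director-level records: one row per (company, director) pair.
directors = pd.DataFrame({
    "vat": ["IT001", "IT001", "IT001", "IT002", "IT002"],
    "age": [45, 62, 38, 55, 57],
    "is_female": [1, 0, 0, 0, 0],
    "is_foreign": [0, 0, 1, 0, 0],
    "n_seats": [1, 4, 2, 1, 5],  # board seats held across all companies
    "appointed": pd.to_datetime(["2015-05-01", "2012-03-10", "2016-11-20",
                                 "2014-01-15", "2018-06-30"]),
})

# Exclude whole companies with a director appointed after 1 January 2017
# or with incomplete information on any director.
late = directors["appointed"] > pd.Timestamp("2017-01-01")
incomplete = directors.isna().any(axis=1)
excluded_vats = set(directors.loc[late | incomplete, "vat"])
kept = directors[~directors["vat"].isin(excluded_vats)]

# Aggregate to one row per company (board-level features).
board = kept.groupby("vat").agg(
    n_members=("age", "size"),
    mean_age=("age", "mean"),
    var_age=("age", "var"),   # sample variance (ddof=1), an assumption
    std_age=("age", "std"),
    n_females=("is_female", "sum"),
    pct_females=("is_female", "mean"),
    pct_foreign=("is_foreign", "mean"),
    n_busy=("n_seats", lambda s: int((s > 3).sum())),  # directors with more than 3 seats
)
# Exclude companies with a sole administrator.
board = board[board["n_members"] > 1]
print(board)
```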

Methodology
As anticipated, to test our hypotheses, we applied a methodology based on machine learning.

Machine learning: Theoretical framework and literature review
Machine learning (ML) refers to a class of models that can perform forecasting tasks even when the relation between predictors and outcome is complex. ML models can therefore produce accurate out-of-sample forecasts without imposing strong assumptions on the structure of the data. These relations are learned from the correlations that the machine finds among variables (Bishop, 2006).
In recent years, these models have become widespread in many application fields, and they have lately been emerging in business economics as well. Fields in which they are being applied include forecasting the probability of default (

Applied methodology
In our research, we divided the work into two phases. In the first, we conducted empirical experiments investigating the correlations between corporate governance and economic-financial variables. In the second, we applied supervised machine learning models, both regression and classification models.
Both regression and classification use known datasets (referred to as training datasets) to learn how to make predictions. In supervised learning, an algorithm learns a mapping function f from a set of input variables $x_i = (x_{i1}, \dots, x_{ip})$, where p is the number of input variables, to the output variable y:

$$y_i = f(x_{i1}, \dots, x_{ip}), \qquad i = 1, \dots, n \qquad (1)$$

where n is the number of items in the sample. The main difference between the two is that the output variable is continuous in regression, while it is categorical (i.e., discrete) in classification.
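The regression models we chose (Montgomery, Peck, & Vining, 2021) are:
1) Linear regression, that is, a linear model with coefficients $\beta = (\beta_0, \beta_1, \dots, \beta_p)$ chosen to minimize the residual sum of squares between the outputs observed in our dataset and the targets predicted by the linear model:

$$\min_{\beta} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2, \quad \text{with} \quad \hat{y}_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} \qquad (2)$$

where $y_i$ is the scalar response we want to estimate, $x_{i1}, \dots, x_{ip}$ are the explanatory variables, and n is the number of items in the sample.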
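2) Lasso regression, which performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the resulting statistical model. It can be expressed as:

$$\min_{\beta} \; \frac{1}{2n} \lVert Y - X\beta \rVert_2^2 + \alpha \lVert \beta \rVert_1 \qquad (3)$$

where $\lVert \beta \rVert_1 = \sum_{j=1}^{p} |\beta_j|$, $\beta = (\beta_1, \dots, \beta_p)$ (the intercept $\beta_0$ is not penalized and is omitted here), $\alpha \geq 0$ is the regularization parameter, X is the n × p matrix of explanatory variables, and Y is the n × 1 vector of the output variable.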
3) Ridge regression, also known as Tikhonov regularization, which solves a regression model whose loss function is the linear least-squares function and whose regularization is given by the squared L2-norm. It is useful for mitigating multicollinearity. It can be expressed as:

$$\min_{\beta} \; \lVert Y - X\beta \rVert_2^2 + \alpha \lVert \beta \rVert_2^2 \qquad (4)$$

where the notation is the same as for the Lasso regression.
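To illustrate the practical difference between the three regression models, the sketch below fits them with scikit-learn on synthetic data containing two nearly collinear predictors (loosely mimicking pairs such as the number and percentage of female directors) and one irrelevant predictor. The data and the `alpha` values are arbitrary assumptions. Note that scikit-learn's `Lasso` minimizes (1/(2n))·RSS + alpha·‖β‖₁ and `Ridge` minimizes RSS + alpha·‖β‖₂²; the point is only that the L1 penalty can set coefficients exactly to zero while the L2 penalty shrinks the whole coefficient vector.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)               # irrelevant predictor
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + rng.normal(size=n)

models = {
    "ols": LinearRegression(),
    "lasso": Lasso(alpha=0.5),   # L1 penalty: can zero out coefficients
    "ridge": Ridge(alpha=10.0),  # squared L2 penalty: shrinks coefficients
}
# fit() returns the fitted estimator, so coef_ can be read directly.
coefs = {name: m.fit(X, y).coef_ for name, m in models.items()}
for name, c in coefs.items():
    print(name, np.round(c, 3))
```

With collinear inputs the OLS coefficients of x1 and x2 are unstable, whereas Ridge spreads the effect across both and Lasso tends to keep only one of them.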
The classification models used are:
1) Logistic regression (Montgomery et al., 2021), applied to the multiclass case through the one-vs-rest (OvR) scheme with a cross-entropy loss. In its binary formulation, the model is:

$$P(y_i = 1 \mid x_i) = \frac{1}{1 + \exp\left( -\beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)} \qquad (5)$$

2) Decision tree (Safavian & Landgrebe, 1991), that is, a set of rules that recursively splits the full dataset into subsets that are homogeneous with respect to the output variable, according to the characteristics of the items. Predictions take the form of the proportion (probability) of each outcome in each subset.
3) Random forest (Breiman, 2001), which combines the predictions of a large number of individual decision trees into a single high-performing forecast. In particular, each tree is grown on a random sample of the data and uses a random selection of variables (in Breiman's formulation, drawn anew at each split), so that the individual predictions are less correlated.
Finally, to maximize the performance of our models, we applied the following techniques: extensive preprocessing of the dataset, cross-validation, and hyperparameter grid search.
In particular, to avoid a noisy estimate of the predictive performance of our models, we used cross-validation. Cross-validation partitions the dataset into S complementary subsets (folds), trains the algorithm on S − 1 subsets, and tests it on the remaining one. The process is then repeated, choosing a different subset for testing each time, and the results are averaged.
Grid search, in turn, is a technique for finding the optimal hyperparameters of a model (e.g., the number of trees in a random forest), which cannot be estimated directly from the data and must be set before the learning process begins. With grid search, the machine trains several models, one for each combination of hyperparameter values, and retains the best-performing one.
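The two techniques can be made explicit with a small hand-rolled loop: for every hyperparameter combination in a grid, a model is trained on S − 1 folds and tested on the remaining one, the S test scores are averaged, and the combination with the lowest average error is kept. The grid, the synthetic data, and the use of Ridge regression with RMSE are assumptions for illustration; in practice, scikit-learn's `GridSearchCV` automates the same loop.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, ParameterGrid

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -0.3]) + rng.normal(scale=0.5, size=300)


def cv_rmse(params, X, y, n_splits=5):
    """Average test RMSE over S = n_splits folds (train on S - 1, test on 1)."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in folds.split(X):
        model = Ridge(**params).fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        scores.append(np.sqrt(mean_squared_error(y[test_idx], pred)))
    return float(np.mean(scores))


# Hypothetical grid: every combination of values is evaluated by cross-validation.
grid = ParameterGrid({"alpha": [0.01, 0.1, 1.0, 10.0], "solver": ["svd", "cholesky", "lsqr"]})
results = [(cv_rmse(p, X, y), p) for p in grid]
best_rmse, best_params = min(results, key=lambda r: r[0])
print(f"{len(grid)} candidates x 5 folds = {len(grid) * 5} fits")
print("best:", best_params, round(best_rmse, 3))
```

Counting fits this way (candidates × folds) is also how totals such as those reported in the results are obtained.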
In the following section, we show the empirical results of these processes. All analyses were performed with Scikit-learn (Pedregosa et al., 2011).

RESEARCH RESULTS
As described, we built a dataset of 59,229 companies (identified through their VAT/tax code number) with governance and economic-financial data for three years (2017, 2018, 2019). Having information over a continuous three-year window is a great advantage: while the economic-financial variables differ from year to year, the governance variables remain constant, since the BoDs in our sample are unchanged over this period. In this way, the analysis of the relationship between board composition and corporate performance is more consistent.
To conduct analyses aimed at investigating this relationship, we created two target variables, based on different considerations: 1) a linear combination of the percentage changes of ROE and ROA (pctROE + ROA), which combines two of the most used variables in the literature to mitigate the effect of each single variable (H1a); 2) the percentage change of the EBITDA margin (pctEM). We chose the latter because it is an excellent indicator of the company's profitability, expressing the performance of operational management; moreover, the EBITDA margin can be used to compare companies with very different capital structures and sectors (H1b).
We applied some preprocessing to both variables. We noted that the standard deviation of these variables (i.e., pctROE + ROA and pctEM) was too high, mainly for two reasons: some extreme values and a strong concentration of values around zero. Thus, we clipped the minimum and maximum to the first and ninety-ninth percentiles, respectively.
First, we calculated the percentage changes. Then we applied clipping and, in the case of ROE + ROA, a linear combination in which each variable has the same weight (arithmetic mean). Finally, we removed all samples with at least one missing value in the target variables, ending up with 56,930 records.
Table 1 reports these statistics. However, both variables still show a strong concentration of values around 0 (Figure 5). For this reason, the same analyses were also performed on a subset obtained by keeping only the samples with both pctROE + ROA and pctEM between -100% and +100%, whose frequency distributions are more uniform (Figure 6). We believe that the analyses performed on this subset, which contains 37,771 items, can be more consistent. We then investigated correlations.
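The target construction described above (percentage change, clipping at the 1st and 99th percentiles, equal-weight mean for ROE and ROA, removal of missing values, and the ±100% subset) can be sketched in pandas as follows. The column names, the synthetic ratios, and the choice of comparing the first and last year of the window are hypothetical; the text does not state which years the percentage change spans.

```python
import numpy as np
import pandas as pd

# Hypothetical firm-level ratios at the start and end of the observation window.
rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({
    "roe_start": rng.normal(0.08, 0.05, n), "roe_end": rng.normal(0.08, 0.05, n),
    "roa_start": rng.normal(0.04, 0.02, n), "roa_end": rng.normal(0.04, 0.02, n),
    "em_start": rng.normal(0.10, 0.04, n), "em_end": rng.normal(0.10, 0.04, n),
})
df.loc[::50, "em_end"] = np.nan  # simulate some missing values


def pct_change(start, end):
    # Small denominators produce the extreme values that motivate clipping.
    return (end - start) / start * 100


def clip_1_99(s):
    lo, hi = s.quantile([0.01, 0.99])  # NaNs are ignored by quantile
    return s.clip(lower=lo, upper=hi)


pct_roe = clip_1_99(pct_change(df["roe_start"], df["roe_end"]))
pct_roa = clip_1_99(pct_change(df["roa_start"], df["roa_end"]))
df["pct_roe_roa"] = (pct_roe + pct_roa) / 2   # arithmetic mean, equal weights
df["pct_em"] = clip_1_99(pct_change(df["em_start"], df["em_end"]))

# Drop samples with a missing target, then build the -100%..+100% subset.
full = df.dropna(subset=["pct_roe_roa", "pct_em"])
sub = full[full["pct_roe_roa"].between(-100, 100) & full["pct_em"].between(-100, 100)]
print(len(df), len(full), len(sub))
```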

Correlation
Our first empirical experiments concerned Pearson's correlation between the target variables and the corporate governance variables.
A first look at the correlation matrix (Appendix A, Figure A.1) shows that some variables are redundant because of very high correlation coefficients (e.g., percentage of females and number of females). We therefore applied feature selection, pruning all redundant variables, and then recomputed the correlation matrix (Figure 7). We draw two considerations from these results: 1) there is no significant correlation between corporate governance and target variables, suggesting that direct relations between these different types of information are difficult to identify; 2) the only visible correlation is between pctROE + ROA and pctEM. This correlation supports our choice of two different target variables: the information shared by the two variables is substantial but not completely overlapping.
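A simple way to perform the pruning described above is to scan the governance variables in order and keep a variable only if its absolute Pearson correlation with every variable already kept stays at or below a threshold. The greedy rule, the 0.9 threshold, and the synthetic columns are assumptions, not the authors' documented procedure.

```python
import numpy as np
import pandas as pd


def prune_redundant(features: pd.DataFrame, threshold: float = 0.9) -> list[str]:
    """Greedily keep columns whose |Pearson r| with all previously kept columns <= threshold."""
    corr = features.corr(method="pearson").abs()
    kept: list[str] = []
    for col in corr.columns:
        if all(corr.loc[col, k] <= threshold for k in kept):
            kept.append(col)
    return kept


# Synthetic governance variables with built-in redundancies.
rng = np.random.default_rng(3)
n = 500
n_members = rng.integers(2, 10, size=n)
pct_females = rng.binomial(n_members, 0.2) / n_members
var_age = rng.uniform(10, 200, size=n)
gov = pd.DataFrame({
    "n_members": n_members,
    "pct_females": pct_females,
    "pct_males": 1 - pct_females,        # perfectly (negatively) correlated
    "mean_age": rng.normal(55, 8, size=n),
    "var_age": var_age,
    "std_age": np.sqrt(var_age),         # monotone transform, highly correlated
})
kept = prune_redundant(gov)
print(kept)
```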
We carried out the same analyses, reaching the same conclusions, on the subset with target variables limited to between -100% and +100%.
Finally, we investigated whether relations fail to emerge only at the general, macroscopic level, or whether they might instead emerge in some specific subset of our dataset.
Hence, we selected subsets based on firm size, defined by the employee-number thresholds set out in Article 2 of the Annex to Recommendation 2003/361/EC. We obtained three subsamples: 1) micro (72% of records); 2) small (24%); 3) medium (4%).
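The size split can be expressed as a small function. It applies only the headcount thresholds of Article 2 of the Annex to Recommendation 2003/361/EC (fewer than 10 employees for micro, fewer than 50 for small, fewer than 250 for medium enterprises), as the text does; the Recommendation's turnover and balance-sheet ceilings are deliberately ignored, and the toy data are hypothetical.

```python
import pandas as pd


def size_class(employees: int) -> str:
    """Headcount-only size class following Article 2 of the Annex to Recommendation 2003/361/EC."""
    if employees < 10:
        return "micro"
    if employees < 50:
        return "small"
    if employees < 250:
        return "medium"
    return "large"


firms = pd.DataFrame({"vat": ["A", "B", "C", "D"], "employees": [3, 9, 10, 120]})
firms["size"] = firms["employees"].map(size_class)
print(firms["size"].value_counts(normalize=True))  # share of records per class
```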
None of these subsets showed a significant correlation (Appendix A, Figure A.3 reports the "micro" correlation matrix). The most interesting case is the subset of enterprises with the following features: 1) 30 < % females < 70; 2) 20 < % international members < 80; 3) age variance > 30 (in squared years).
Applying these filters, we ended up with 923 items, which demonstrates the lack of diversity in Italian SMEs.
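The diversity filter uses strict inequalities. A pandas version might look like the following; the column names and values are hypothetical, with percentages on a 0 to 100 scale and the age variance in squared years.

```python
import pandas as pd

# Hypothetical board-level data (one row per company).
board = pd.DataFrame({
    "pct_females": [50.0, 10.0, 40.0, 65.0],
    "pct_foreign": [25.0, 50.0, 30.0, 10.0],
    "var_age":     [120.0, 200.0, 25.0, 80.0],
})

# All three conditions must hold; every bound is strict.
diverse = board[
    board["pct_females"].gt(30) & board["pct_females"].lt(70)
    & board["pct_foreign"].gt(20) & board["pct_foreign"].lt(80)
    & board["var_age"].gt(30)
]
print(len(diverse))
```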
The correlation matrix of this subset shows no statistically significant correlation (Figure 8). This is an important hint that it is difficult to infer the economic-financial performance of a company starting solely and exclusively from governance variables.
Nevertheless, we proceeded with the machine learning models.

Machine learning
In this section, the results are presented as follows: 1) we illustrate a procedure common to all the subsets; 2) we present the results of the individual experiments, divided first by dataset and then by methodology.
The preprocessing was as follows. For the target variables, we tried several transformations: logarithmic, square root, cube root, and Box-Cox.
The experiments proved to be insensitive to the choice of transformation, leading to similar results.
Below, we report the results for the logarithmic transformation.
For the independent variables, we applied two preprocessing steps: imputing the missing values (with the median value of the respective variable) and standardization.
Moreover, based on the considerations presented in the correlation analyses, we performed feature selection, choosing the following corporate governance variables: number of BoD members, percentage of females, mean BoD age, variance of BoD age, and percentage of international members. We will refer to them as "core variables".
Finally, for the classification models, we binned the target variables into uniform bins (3, 5, and 10 bins) and used this new variable as the target.
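The preprocessing steps above (median imputation and standardization of the core variables, then binning of the target for classification) can be sketched with scikit-learn and pandas. The column names and data are hypothetical. The text does not say whether "uniform" bins are equal-width or equal-frequency; the sketch assumes equal-width bins via `pd.cut`, and `pd.qcut` would be the equal-frequency alternative.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical column names for the five core variables.
CORE = ["n_members", "pct_females", "mean_age", "var_age", "pct_foreign"]

rng = np.random.default_rng(7)
X = pd.DataFrame(rng.normal(size=(100, len(CORE))), columns=CORE)
X.iloc[::10, 2] = np.nan   # simulate missing mean ages

# Median imputation followed by standardization (zero mean, unit variance).
prep = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
X_std = prep.fit_transform(X)

target = pd.Series(rng.uniform(-100, 100, size=100))   # e.g., a clipped percentage change
binned = {}
for k in (3, 5, 10):
    # Equal-width bins; pd.qcut(target, q=k, labels=False) would give equal-frequency bins.
    binned[k] = pd.cut(target, bins=k, labels=False)
    print(k, np.bincount(binned[k]))
```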

Experiments on R_full (1) and E_full (2)
We divided the experiments into the regression and classification ones.

Regression
We selected three models: linear, Ridge, and Lasso. We applied 5-fold cross-validation and a grid search to determine the best value of alpha and the solver (i.e., the algorithm adopted in the optimization). After transforming the predictions and the true values back to their original scale, we evaluated the performance of our models through the root mean square error (RMSE) (Table 2). In essence, the machine is unable to learn any useful information. In Appendix B, Figure B.1 examines this trivial behavior of the machine in more depth.
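A minimal sketch of this regression setup, assuming scikit-learn: each model is wrapped in `TransformedTargetRegressor`, which fits on the log-transformed target and maps predictions back to the original scale, so the RMSE reported by `GridSearchCV` is computed on real-world values. Because percentage changes can be negative, the logarithm needs a positive argument; the constant `SHIFT` is a hypothetical offset, since the text does not say how this was handled. The target is deliberately unrelated to the inputs, to show what "unable to learn" looks like against a mean-only baseline.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV

SHIFT = 101.0  # hypothetical offset so that log(target + SHIFT) is defined (targets >= -100 here)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))          # stand-in for the 5 standardized core variables
y = rng.uniform(-100, 100, size=500)   # target independent of X by construction


def log_target(model):
    """Fit `model` on log(y + SHIFT); predictions are mapped back to the original scale."""
    return TransformedTargetRegressor(
        regressor=model,
        func=lambda t: np.log(t + SHIFT),
        inverse_func=lambda z: np.exp(z) - SHIFT,
    )


searches = {
    "linear": (LinearRegression(), {"regressor__fit_intercept": [True]}),
    "ridge": (Ridge(), {"regressor__alpha": [0.1, 1.0, 10.0],
                        "regressor__solver": ["svd", "cholesky", "lsqr"]}),
    "lasso": (Lasso(), {"regressor__alpha": [0.01, 0.1, 1.0]}),
}
rmse = {}
for name, (model, grid) in searches.items():
    gs = GridSearchCV(log_target(model), grid, cv=5, scoring="neg_root_mean_squared_error")
    gs.fit(X, y)
    rmse[name] = -gs.best_score_   # RMSE on the original (real-world) scale

# Trivial baseline: always predict the sample mean.
baseline = float(np.sqrt(np.mean((y - y.mean()) ** 2)))
print({k: round(v, 2) for k, v in rmse.items()}, "baseline:", round(baseline, 2))
```

On data like this, every model's cross-validated RMSE should sit near, or even above, the baseline: back-transforming from log space pulls predictions toward a geometric rather than an arithmetic mean, which adds a small bias.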

Classification
For classification, we used the binned variable as the target, with three different numbers of bins (3, 5, and 10). The selected models are logistic regression, decision tree, and random forest. For this task, too, we combined cross-validation and grid search. In particular, on each dataset we fitted 90 candidates on each of 5 folds, totaling 450 fits. The candidates are obtained by combining the three aforementioned models with several combinations of hyperparameters (e.g., the penalty of the logistic regression). The metric we adopted to evaluate performance is accuracy.
Table 4 shows that the results differ very little from the random baseline (0.10 for the 10-class, 0.20 for the 5-class, and 0.33 for the 3-class classification). Performance varies imperceptibly between datasets with different target variables. In this task, too, the information content of the inputs fails to explain the output variable.
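The combined search over three model families can be expressed in scikit-learn by letting the parameter grid replace the estimator step of a `Pipeline`. The grid below is hypothetical and smaller than the 90 candidates reported; logistic regression is wrapped in `OneVsRestClassifier` to mirror the one-vs-rest scheme. With labels generated independently of the inputs, the best cross-validated accuracy should stay close to the 1/k random baseline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1500, 5))      # stand-in for the standardized core variables
k = 5
y = rng.integers(0, k, size=1500)   # class labels independent of X

# The "clf" step is a placeholder; each grid entry swaps in a different model.
pipe = Pipeline([("clf", DecisionTreeClassifier())])
param_grid = [
    {"clf": [OneVsRestClassifier(LogisticRegression(max_iter=1000))],
     "clf__estimator__C": [0.01, 0.1, 1.0, 10.0]},
    {"clf": [DecisionTreeClassifier(random_state=0)],
     "clf__max_depth": [3, 5, 10, None]},
    {"clf": [RandomForestClassifier(random_state=0)],
     "clf__n_estimators": [30, 100], "clf__max_features": [2, 5]},
]
n_candidates = len(ParameterGrid(param_grid))
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy").fit(X, y)
print(f"{n_candidates} candidates x 5 folds = {n_candidates * 5} fits")
print("best accuracy:", round(search.best_score_, 3), "random baseline:", round(1 / k, 3))
```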

Classification
Following the same pipeline as before, these experiments lead to conclusions that almost completely overlap with those already reported (Table 4).

Experiments on E_ben (5)
In the last experiment, we added an economic-financial variable, pctROE + ROA, to the inputs. The aim of this experiment is to show that modeling the economic-financial variable starting only from corporate governance variables failed not because of the ineffectiveness of the chosen models, but because it is impossible to extract information on economic firm performance starting only and exclusively from governance variables. We applied exactly the same models to the E_ben dataset and achieved 100% accuracy. The model selected by grid search and cross-validation is a random forest with 5 features and 30 estimators.
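The logic of this control experiment can be reproduced on synthetic data: a random forest that sees only uninformative governance-like inputs stays near the random baseline, while adding one input that actually drives the target lifts accuracy close to 100%. The data are simulated, and reading "5 features" as `max_features=5` is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 2000
core = rng.normal(size=(n, 5))                       # uninformative "governance" inputs
financial = rng.uniform(-100, 100, size=n)           # hypothetical stand-in for pctROE + ROA
pct_em = financial + rng.normal(scale=1.0, size=n)   # simulated target driven by `financial`
y = pd.cut(pct_em, bins=5, labels=False)             # 5 equal-width classes

# 30 trees; max_features=5 is our reading of "5 features" (an assumption).
rf = RandomForestClassifier(n_estimators=30, max_features=5, random_state=0)
acc_gov = cross_val_score(rf, core, y, cv=5).mean()
acc_ben = cross_val_score(rf, np.column_stack([core, financial]), y, cv=5).mean()
print(f"governance only: {acc_gov:.3f}  with financial input: {acc_ben:.3f}")
```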
The remainder of this paper discusses these results further, highlighting some focal points and describing some interesting research developments.

DISCUSSION OF THE RESULTS
As the literature review has shown, despite fairly intensive research on the observable diversity of the boardroom and firm performance, scholars have not yet obtained consistent results: findings are mixed and unclear. Moreover, as highlighted, most papers focus on listed companies, which, on the one hand, represent a minority in most markets, certainly in the Italian one (Istat, 2020) and, on the other hand, are managed differently. Indeed, in SMEs, BoD members are often also firm owners and managers, so governance lacks independent directors.
In line with part of the preceding literature, our analysis shows that no correlation exists, either positive or negative, and the data-driven approach, based on the implementation of different ML models, confirms that board diversity does not affect firm results. As argued by Rose (2007), this does not mean that diversity on boards of directors should be renounced: there are other reasons why corporate boards should be more diversified and more closely reflect the rest of society. In fact, the lack of correlation, and particularly the absence of a negative one, raises the question of why board composition is so different from that of society; we believe that a greater diffusion of diversity is necessary for a general improvement in gender equality, as well as for the development of young professionals and the inclusion of foreign people. Indeed, in recent years the social role played by companies has become increasingly important for both scholars and practitioners, and these aspects therefore can no longer be ignored.

CONCLUSION
This research does not take a merely political-economy perspective; rather, we seek to understand the impact of board diversity on firm performance from a business economics perspective (Bertini, 1990). Even if it is not quantitatively demonstrable, we can affirm that diversity in boards of directors enriches human capital, introducing different skills and sensitivities into the boardroom; indeed, in our view, the lack of any correlation suggests renewing boards with attention to diversity, which could constitute a value in itself, given the potential range of not strictly financial motives. Moreover, we can suppose that firm success, which is not linked to observable diversity, is instead related to non-observable diversity. To conclude, we achieved results similar to those of other authors (Carter et al., 2010; Chapple & Humphrey, 2014), but by applying an innovative approach to a larger sample of SMEs, which makes this analysis more significant.
However, our study has some limitations. It is based only on Italian companies, and the period covered by our dataset is relatively short. In this regard, future investigations could analyze whether the impact differs in the short, medium, and long run.
Future research projects may also investigate, using similar models, the relation between so-called non-observable diversity and economic-financial performance.