Assessing Training Success: Comparative Stats

Introduction

The HR department organizes training and development programs for employees. These programs instill a global mindset in budding talent and help employees acquire comprehensive business knowledge, enhance their performance, and deliver better value to clients. Development programs also prepare future business leaders to lead in thought and action and groom them to lead teams of all sizes efficiently at different levels.

The compare means technique compares the mean of a continuous variable across the categories of a categorical variable. Continuous variables include salary, performance score, recruitment expenses, satisfaction level, and general expenses of employees; categorical variables include demographic variables (gender, age, income group) and other variables such as department, region, manager name, and recruitment source. Specific examples include comparing expenses across recruitment sources; salary across departments, regions, and age groups; performance scores across age and gender groups; general expenses across areas; and employee satisfaction levels across managers. This article focuses on measuring the effectiveness of training programs across different demographic variables, using compare means techniques such as the one-sample, independent samples, and paired t-tests and ANOVA.

Compare Means

A common problem in research is comparing the central tendency of one group to a value or to another group or groups. A test compares sample means to see if sufficient evidence exists to infer that the means of the corresponding population distributions also differ. Standard statistical tools for assessing these comparisons are t-tests, analysis of variance, and general linear models. Nonparametric techniques are used when some assumptions are not met. The general goal for most of these tools is to use the estimate of the means (or another central measure), assess the variation based on sample estimates, and use this information to quantify the evidence of a difference in means or central tendency.

The t-test and analysis of variance (abbreviated as ANOVA) are two parametric statistical techniques used to test hypotheses when the dependent variable is continuous and the independent variable is categorical. Samples are taken from different populations and measured on some variable of interest. A t-test determines whether the means of two sample distributions differ significantly, while ANOVA determines whether the means of more than two sample distributions differ significantly from each other. Both tests are based on three common assumptions:

The samples should be drawn from normally distributed populations.
There should be homogeneity of variances.
The observations should be independent.

There is a thin line of demarcation between the t-test and ANOVA: when the categorical variable has two groups, the t-test is used to compare the two population means, while ANOVA is preferred when the means of more than two groups are to be compared. The three assumptions listed above need to be fulfilled before applying a t-test or ANOVA. Note, however, that a one-sample t-test needs to fulfill only the normality assumption, since only one sample is involved.

Normally distributed: Non-normally distributed variables (highly skewed or kurtotic variables, or variables with substantial outliers) can distort relationships and significance tests. A t-test or ANOVA assumes that the variables are normally distributed (a symmetric, bell-shaped distribution).
Independent samples: If we randomly sample each set of items separately, under different conditions, the samples are independent. The measurements in one sample have no bearing on the measurements in the other sample. For example, consider randomly sampling two groups of people based on gender to test their online shopping behavior. Let's take one random sample from a group of males and record their perception and another sample from a group of females and record their perception. We know that the measurement in one sample does not affect the other sample.
Homogeneity of variances: The independent samples t-test and ANOVA assume that the variance within each population is equal. However, both tests are fairly robust to violations of this assumption if the group sizes are roughly equal. Group sizes may be treated as equal when the ratio of the largest to the smallest group is less than 1.5. If group sizes are vastly unequal and the homogeneity of variance is violated, the result will be biased when large sample variances are associated with small group sizes. When this occurs, the significance level will be underestimated, which can cause the null hypothesis to be falsely rejected. The result will also be biased, in the opposite direction, if large variances are associated with large group sizes. In that case the significance level will be overestimated. This does not cause the same problem as falsely rejecting the null hypothesis; however, it can decrease the power of the test.
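The group-size rule of thumb above can be checked with a few lines of Python. This is a minimal sketch: the function name group_size_ratio and the example labels are hypothetical and only illustrate the "largest to smallest group below 1.5" check.

```python
from collections import Counter


def group_size_ratio(labels):
    """Return the largest group size divided by the smallest group size."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Made-up group labels, for illustration only.
labels = ["Male"] * 10 + ["Female"] * 12

ratio = group_size_ratio(labels)
# Rule of thumb quoted in the text: sizes count as equal below 1.5.
roughly_equal = ratio < 1.5
print(f"size ratio = {ratio:.2f}, roughly equal: {roughly_equal}")
```

If the ratio is well above 1.5, a formal check of the variances (such as Levene's test) becomes more important before relying on the t-test or ANOVA result.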

A compare means test is a process for comparing the sample means of different groups. While comparing the means of different groups, we frame a hypothesis in which the dependent variable is continuous and the independent variable is categorical. For example, in determining the effect of gender on job satisfaction, gender (a categorical variable) is the independent variable, and job satisfaction is the dependent variable.

Training Score Data

The TrainingScore.csv file is the example dataset used to demonstrate the analysis in KNIME. This dataset can be downloaded from the “Data” folder on the KNIME Community Hub.

The Evaluate Training Effectiveness - Comparative Statistics workflow can be downloaded from the KNIME Community Hub.

Once you have downloaded the CSV file from the Data folder, you are ready to begin.

Reading the CSV-based Dataset

The first step in our analysis is to load the data for our exploratory analyses. We do this with the CSV Reader node, which stores the data in a KNIME table.

The KNIME table is created by loading the TrainingScore.csv dataset. The resulting table shows that the employee dataset has 220 observations and five columns.

Data Exploration and Visualization

We will now try to understand the dataset related to organizational training. The training dataset is created based on surveys conducted before and after training is conducted in an organization.

There are 220 observations and five columns. PostTrainingScore and PreTrainingScore are continuous columns, while Gender, Age, and Department are categorical.

The occurrences table from the Statistics node shows four unique categories for the Department column: ‘Finance’, ‘Human Resource’, ‘Marketing’, and ‘Operations’. Similarly, there are three Age groups: ‘21-30’, ‘31-40’, and ‘41-60’, and two Gender groups: ‘Male’ and ‘Female’.
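Outside KNIME, the same occurrence counts could be reproduced in plain Python. The sketch below is illustrative only: the column names come from the article, but the column order and the four inline rows are made up and stand in for the real TrainingScore.csv file.

```python
import csv
import io
from collections import Counter

# A few made-up rows standing in for TrainingScore.csv (assumed layout).
SAMPLE = """PreTrainingScore,PostTrainingScore,Gender,Age,Department
14,16,Female,41-60,Finance
12,15,Male,21-30,Marketing
15,18,Female,31-40,Operations
13,17,Female,41-60,Human Resource
"""


def category_counts(rows, columns):
    """Count how often each category occurs in the given columns."""
    return {col: Counter(row[col] for row in rows) for col in columns}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
counts = category_counts(rows, ["Gender", "Age", "Department"])
for col, counter in counts.items():
    print(col, dict(counter))
```

To run this against the downloaded file, the io.StringIO(SAMPLE) argument would be replaced with an opened file handle.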

Visualizing Categorical Columns

We will use the Bar Chart node to generate a count plot for the Department, Age, and Gender columns. The results clearly show that there are more female employees than male employees, most of the employees belong to the 41-60 age group, and all the departments have approximately the same number of employees.

Visualizing Continuous Columns

We will use the Line Chart node to display the PostTrainingScore and PreTrainingScore columns. The results show that the minimum scores of an employee pretraining and post-training are 10 and 12, respectively. The maximum scores of an employee pretraining and post-training are 17 and 20, respectively. The mean scores of employees pretraining and post-training are 14.13 and 16.31, respectively.

Compare Training Effectiveness Using One Sample T-Test

Objective: To determine the difference between the training score of employees and the desired score

The one-sample t-test is a statistical test that compares the sample mean of one group with a standard value. Since only one sample is involved, only the normality assumption must be fulfilled for this test.

We consider a situation in which an organization has conducted a training program for its employees. It records the scores of employees before and after training. The maximum score is considered to be 20. The organization wants its employees to score 20 after the training. The organization wants to know if there is any significant difference between the actual score of the employees after training and the desired maximum score of 20.

We will first display the observations of the PostTrainingScore column using the Histogram node with 10 bins.

The chart shows that the score after training ranges mainly between 12 and 20, and the largest number of employees scored between 16 and 18. Skewness and kurtosis coefficients are used to assess the normality of the data. As a rule of thumb, the data are considered approximately normal if both coefficients lie between -1 and +1. Since the skewness is -0.242 and the kurtosis is -0.63, as obtained from the Statistics node, we can treat the data as normal.
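The skewness and kurtosis rule of thumb can be sketched in Python as follows. This assumes the simple moment-based definitions (skewness as m3 / m2^1.5 and excess kurtosis as m4 / m2^2 - 3); the KNIME Statistics node may use a bias-corrected estimator, so values can differ slightly. The scores are made up for illustration.

```python
def skewness_and_kurtosis(values):
    """Moment-based skewness and excess kurtosis of a sample."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3, and 4.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    return skew, excess_kurt


# Hypothetical post-training scores, not taken from TrainingScore.csv.
scores = [13, 15, 16, 16, 17, 17, 17, 18, 18, 19, 21]

skew, kurt = skewness_and_kurtosis(scores)
# Rule of thumb from the text: both coefficients within (-1, +1).
looks_normal = -1 < skew < 1 and -1 < kurt < 1
print(f"skewness = {skew:.3f}, excess kurtosis = {kurt:.3f}, normal: {looks_normal}")
```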

The probability plot is generated for observations in the PostTrainingScore column using the Continuous Probability Distribution component or the Python View node with the “scipy.stats.probplot” function. The chart shows that the data can be assumed normal since nearly all the dots lie on the line.
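As a rough sketch of what happens inside such a Python View node, scipy.stats.probplot returns the theoretical quantiles, the ordered data, and a least-squares fit line. The scores below are made up, and the plotting step is left out; the correlation coefficient r of the fit gives a numeric impression of how closely the dots follow the line.

```python
from scipy import stats

# Hypothetical post-training scores, not taken from TrainingScore.csv.
scores = [13, 15, 16, 16, 17, 17, 17, 18, 18, 19, 21]

# probplot returns (theoretical quantiles, ordered values) and the fit line.
(theoretical_q, ordered_scores), (slope, intercept, r) = stats.probplot(
    scores, dist="norm"
)
print(f"fit line: y = {slope:.3f} * x + {intercept:.3f}, r = {r:.3f}")
```

A value of r close to 1 corresponds to dots lying near the line. Passing a matplotlib Axes object through the plot argument would draw the chart itself.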

We will use the Single-sample t-test node to perform a one-sample t-test on the PostTrainingScore column with a test value of 20.

Since the normality assumption has been met, we can apply the test to the data. The test of significance may be either one-tailed or two-tailed; here it examines whether the mean of the distribution differs significantly from the standard value. Since the p-value is reported as 0.000 (that is, below 0.001), which is less than 0.05, we reject the null hypothesis. This means that the mean post-training score differs significantly from 20.

In other words, the employees' scores are significantly different from the desired score of 20. Thus, the training did not bring all employees to a 100% score. Based on the results, the organization can design strategies and act accordingly.
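The statistic behind the one-sample t-test can be sketched by hand in Python: t = (sample mean - test value) / (sample standard deviation / sqrt(n)), with n - 1 degrees of freedom. The function name and the scores are hypothetical; in practice the p-value would come from the t distribution, for example through scipy.stats.ttest_1samp.

```python
import math
import statistics


def one_sample_t(values, test_value):
    """Return the t statistic and degrees of freedom for a one-sample t-test."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    t_stat = (mean - test_value) / (sd / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical post-training scores, not taken from TrainingScore.csv.
scores = [13, 15, 16, 16, 17, 17, 17, 18, 18, 19, 21]

t_stat, dof = one_sample_t(scores, 20)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

A large negative t, as here, indicates a sample mean well below the test value of 20.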

Compare Training Effectiveness Using Independent T-Test

Objective: To determine the difference between training scores of employees of different genders.

The independent samples t-test compares the population means of only two groups. Thus, it can be conducted to examine whether the population means of the two samples differ significantly. We will evaluate whether there is any difference between the scores of male and female employees.

We will use the Independent groups t-test node to perform the test on the PostTrainingScore column with Gender as the grouping column. We will also use the Density Plot node to create an interactive plot with PostTrainingScore as the dimension column and Gender (male and female) as the condition column.

We assume that the male and female samples in this example are independent. Thus, the assumption of independent samples is fulfilled. We use Levene’s test to check the homogeneity of variances. If the p-value of the test is greater than 0.05, the assumption of homogeneity of variance is met. From the test output, we can see that the p-value of Levene’s test is greater than the significance level of 0.05. This means that the difference in variance between the groups is not statistically significant. Therefore, we assume homogeneity of variances (equal variances assumed) in the different groups.

A null hypothesis is rejected if the p-value is less than 0.05 at a 5% significance level. Since, in our example, the p-value (0.1842) is greater than 0.05, the null hypothesis is not rejected. This means we have no evidence that the difference between the means of the male and female groups differs from 0, which implies that there is no significant difference in the scores of male and female groups.
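The same two-step procedure (Levene's test, then the independent samples t-test) can be sketched with SciPy. The male and female scores are made up, and falling back to Welch's t-test when Levene's test rejects equal variances is a design choice of this sketch, not something the workflow is confirmed to do.

```python
from scipy import stats

# Hypothetical post-training scores by gender, for illustration only.
male = [15, 16, 17, 16, 18, 15, 17, 16]
female = [16, 17, 16, 18, 17, 15, 16, 17]

# Step 1: Levene's test for homogeneity of variances (median-centered by default).
levene_stat, levene_p = stats.levene(male, female)
equal_var = bool(levene_p > 0.05)

# Step 2: independent samples t-test; Welch's variant if variances differ.
t_stat, p_value = stats.ttest_ind(male, female, equal_var=equal_var)
reject_null = bool(p_value < 0.05)

print(f"Levene p = {levene_p:.3f}, equal variances assumed: {equal_var}")
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, reject H0: {reject_null}")
```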

Compare Training Effectiveness Using Dependent T-Test

Objective: To determine the difference between the scores of employees obtained before and after training.

A dependent (paired) t-test is used for dependent samples. If we collect two measurements on each item, person, or experimental unit, each pair of observations is closely related or matched. In this scenario, we apply the paired t-test using the Paired t-test node to compare the pre-training and post-training scores.

In our use case, we try to understand whether any significant difference occurred after the training was given. The columns PreTrainingScore and PostTrainingScore represent the scores of employees before and after the training. Since the p-value is reported as 0.000, which is less than 0.05, we reject the null hypothesis, which means that there is a significant difference in the scores of the employees after the training. This further implies that training impacts employees and can be considered helpful.
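A paired t-test is equivalent to a one-sample t-test on the per-employee differences, which the following sketch makes explicit. The function name and the scores are hypothetical; scipy.stats.ttest_rel would return the same statistic together with a p-value.

```python
import math
import statistics


def paired_t(before, after):
    """Return mean difference, t statistic, and degrees of freedom."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the differences
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return mean_diff, t_stat, n - 1


# Hypothetical scores for the same six employees before and after training.
pre = [12, 14, 13, 15, 16, 14]
post = [14, 17, 15, 17, 19, 16]

mean_diff, t_stat, dof = paired_t(pre, post)
print(f"mean gain = {mean_diff:.2f}, t = {t_stat:.3f}, df = {dof}")
```

Because each difference is taken within one employee, variation between employees does not inflate the standard error, which is why the paired test is preferred for before/after designs.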

Compare Training Effectiveness Using One-Way ANOVA

Objective: To determine the difference between the training scores of employees belonging to different departments

ANOVA (analysis of variance) is a statistical method commonly used in situations where a comparison is made between more than two population means. In ANOVA, the total variation in a dataset is split into two parts: the amount attributed to chance and the amount attributed to particular causes. Its basic principle is to test for differences among population means by comparing the variation between groups with the variation within groups. Within a sample, variance exists because of random, unexplained disturbance, whereas variance between the samples may have identifiable causes. Using this technique, we test the null hypothesis (H0), wherein all population means are the same, against the alternative hypothesis (H1), wherein at least one population mean is different.
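The split into between-group and within-group variation can be sketched in a few lines of Python. The F statistic is the between-group mean square divided by the within-group mean square. The function name and the department scores are made up; a p-value would come from the F distribution, for example through scipy.stats.f_oneway.

```python
def one_way_anova_f(groups):
    """Return the F statistic and its degrees of freedom for one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical post-training scores per department, for illustration only.
departments = {
    "Finance": [16, 17, 18],
    "Human Resource": [15, 16, 17],
    "Marketing": [17, 18, 19],
    "Operations": [16, 17, 18],
}

f_stat, df_between, df_within = one_way_anova_f(list(departments.values()))
print(f"F({df_between}, {df_within}) = {f_stat:.3f}")
```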

We will now evaluate whether there is any significant difference between employees' scores in different departments.

We will first use the Color Manager node to assign specific colors for the categories in the Department column. Subsequently, we will use the Violin Plot (Plotly) node to create an interactive plot based on the PostTrainingScore grouped by Department.

We will use the One-way ANOVA node to test the PostTrainingScore column with Department as the grouping column.

The violin chart shows that the score is nearly the same for employees belonging to all departments. For the ANOVA test, the data are split by the categorical variable into groups, one per department: ‘Finance’, ‘Human Resource’, ‘Marketing’, and ‘Operations’. The one-way ANOVA result shows that the p-value is 0.2112, which is greater than 0.05; hence, we fail to reject the null hypothesis. This means there is no significant difference between the scores of employees belonging to different departments after training.

Compare Training Effectiveness Using ANOVA with Tukey’s Test

Objective: To determine the difference between the training scores of employees belonging to different age groups

In an ANOVA test, a significant p-value indicates that some of the group means are different, but it does not tell us which pairs of groups differ. Multiple pairwise comparisons can determine whether the mean difference between specific pairs of groups is statistically significant. We can apply Tukey HSD (Honest Significant Difference) from the “statsmodels.stats” package to perform multiple pairwise comparisons between the means of groups.

We will evaluate whether there is any difference between the scores of employees of different age groups. We will use the Python Script node with “statsmodels.stats.multicomp” package to test the PostTrainingScore column with Age as the grouping column. We will also use the Density Plot node to create an interactive plot with PostTrainingScore as the dimension column and Age as the condition column.

For the ANOVA test, the data are split by the categorical variable into groups. Here, three groups are formed, one for each age group. The plot shows that most of the employees belonging to the 21-30 age group had a score of 18. The one-way ANOVA result from the One-way ANOVA node shows that the p-value is 0.0058; hence, we reject the null hypothesis. This means that there is a significant difference between the three groups.

Tukey’s test determines individual differences between the different groups formed. For Tukey’s test, the groups are formed automatically: the function itself forms groups from the values of the categorical variable. Since the dataset has three age groups (21-30, 31-40, and 41-60), three pairs are formed: (21-30 and 31-40), (21-30 and 41-60), and (31-40 and 41-60). The mean difference values for the three pairs are -1.207, -0.957, and 0.25, respectively. The result shows that the null hypothesis is rejected for the first two pairs but not for the last pair.

This means there is no significant difference in the training scores of the people in age groups 31-40 and 41-60. Thus, the employees of these age groups show similar scores after training. However, the employees in age groups 21-30 and 31-40 show different scores.
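A minimal sketch of the Tukey HSD step inside a Python Script node might look as follows. It uses pairwise_tukeyhsd from statsmodels.stats.multicomp; the scores and group sizes are made up, so the printed numbers will not match the values reported above.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical post-training scores, six employees per age group.
scores = np.array(
    [18, 19, 18, 17, 19, 18]    # 21-30
    + [16, 17, 16, 15, 17, 16]  # 31-40
    + [16, 16, 17, 15, 17, 16]  # 41-60
)
ages = np.array(["21-30"] * 6 + ["31-40"] * 6 + ["41-60"] * 6)

# Groups are formed from the unique labels; pairs follow sorted label order.
result = pairwise_tukeyhsd(endog=scores, groups=ages, alpha=0.05)
print(result.summary())

# One boolean per pair: True where the null hypothesis is rejected.
rejected = [bool(r) for r in result.reject]
```

The summary table lists one row per pair with the mean difference, the adjusted p-value, the confidence interval, and the reject decision.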

Summary

In conclusion, evaluating the effectiveness of training programs through comparative statistics is crucial for organizations aiming to enhance employee performance and development. By utilizing statistical techniques such as t-tests and ANOVA, organizations can assess the impact of training across various demographic variables, including gender, age, and department. The analysis of training scores before and after training and comparisons across different groups provides valuable insights into the success of training initiatives. These insights enable organizations to tailor their training strategies to meet desired outcomes and effectively contribute to employee growth and organizational success.

