Evaluating Job Satisfaction - Association Rule Mining

Vijaykrishna

Introduction

Job or employee satisfaction is a positive emotional response to one's work and a measure of how contented workers are with their jobs; because it is subjective, it is difficult to quantify directly. Job satisfaction also varies from employee to employee: in the same workplace, under the same conditions, the factors that help one employee feel good about their job may not apply to another. A multidimensional approach is therefore essential for measuring it. Parameters worth considering include the challenging nature of the work, flexible hours, regular appreciation, competitive pay, the nature of the job, supervisor attitude, etc. This article focuses on determining how different levels of job satisfaction are associated with these parameters. Association rule mining with the apriori algorithm is adopted to assess these associations. The same techniques can be used to study other associations relevant to the HR department, such as attrition, performance levels, promotion, quality of work, and employee attitude.

Unsupervised Machine Learning

Unsupervised machine learning algorithms are used when the output is unknown and no predefined labels or instructions are available to the learning algorithm. In unsupervised learning, the algorithm receives only input data and extracts knowledge from it. These algorithms create a new representation of the data that is easier to comprehend than the original, and they can improve the accuracy of downstream algorithms while reducing time and memory requirements. Common unsupervised machine learning techniques include association rule mining, dimensionality reduction, and clustering.

Association Rule

Association rule mining is a data-mining technique that originated in the field of marketing and, more recently, has been used effectively in other fields, such as human resource management, organizational behavior, entrepreneurship, strategic management, bioinformatics, nuclear science, geophysics, etc. The goal is to identify relationships (i.e., association rules) between groups of products, items, factors, or categories, leading to insights into substantive management domains. Association rule mining is predominantly performed using the apriori algorithm.

An association rule can be read as an IF-THEN relationship: if factor A applies to an employee, then the chances of factor B also applying are determined. A rule has two elements:

  • Antecedent (IF): This factor is generally related to employee information, such as evaluation score, number of projects, training programs, work accident, department, etc.

  • Consequent (THEN): This is the factor that follows from an antecedent or group of antecedents. For example, job satisfaction is the consequent when the number of projects, evaluation score, etc., are the antecedents. Many factors can thus be associated with the consequent (job satisfaction).
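The two elements above, together with the standard rule metrics (support, confidence, and lift), can be illustrated on a toy transaction list. The values below are hypothetical and only serve to show how the metrics are computed:

```python
# Toy illustration (not the article's dataset): metrics for the rule
# "IF NumberProject=2 THEN Low_Satisfaction". Items are hypothetical.
transactions = [
    {"NumberProject=2", "Low_Satisfaction"},
    {"NumberProject=2", "Low_Satisfaction"},
    {"NumberProject=2", "High_Satisfaction"},
    {"NumberProject=4", "High_Satisfaction"},
    {"NumberProject=4", "High_Satisfaction"},
]
n = len(transactions)

def support(itemset):
    """Fraction of records containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / n

antecedent = {"NumberProject=2"}
consequent = {"Low_Satisfaction"}

supp = support(antecedent | consequent)  # P(A and B)
conf = supp / support(antecedent)        # P(B | A)
lift = conf / support(consequent)        # P(B | A) / P(B)

print(supp, conf, lift)  # support 0.4, confidence ~0.667, lift ~1.667
```

A lift above 1 (here ~1.667) means the consequent occurs more often with the antecedent than it would by chance, which is exactly the criterion used later in this article.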

Job Satisfaction Data

The JobSatisfaction.csv dataset is used throughout this article. It can be downloaded from the “Data” folder on the KNIME Community Hub.

The Evaluating Job Satisfaction – Association Rule Mining workflow can be downloaded from the KNIME Community Hub.

Once the CSV file is downloaded from the Data folder, you are ready to start.

Reading the CSV-based dataset

The first step in our analysis is to load the data for exploratory analysis. We do this using the CSV Reader node, which reads the file into a KNIME table.

The KNIME table is created by loading the JobSatisfaction.csv dataset. The resulting table has 14999 observations and 10 columns.
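Outside KNIME, the CSV Reader step is roughly equivalent to pandas.read_csv. A minimal sketch, with a small inline sample standing in for JobSatisfaction.csv (the sample rows below are hypothetical; the real file has 14999 rows and 10 columns):

```python
import io
import pandas as pd

# Inline stand-in for JobSatisfaction.csv; for the real file you would call
# pd.read_csv("JobSatisfaction.csv") instead.
csv_text = """SatisfactionLevel,LastEvaluation,NumberProject
0.38,0.53,2
0.80,0.86,5
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (2, 3) for this sample
```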

Data Visualization

We will use the Python View node to generate multiple count plots for the LeftCompany, Salary, TimeSpendCompany, PromotionLast5years, NumberProject, and WorkAccident categorical columns.

import knime.scripting.io as knio

# Visualization of employee details
# Importing libraries
import seaborn as sns
import matplotlib.pyplot as plt

# Read the input KNIME table once as a pandas DataFrame
df = knio.input_tables[0].to_pandas()

# Plotting
fig = plt.figure(figsize=(12, 10))

# Count plot for the number of employees who left/stayed
plt.subplot(231)
sns.countplot(x='LeftCompany', data=df, hue='LeftCompany', legend=False, palette="viridis")
plt.title("Left Company")

# Count plot for the number of employees with different salaries
plt.subplot(232)
sns.countplot(x='Salary', data=df, hue='Salary', legend=False, palette="mako")
plt.title("Salary-wise Employees")

# Count plot by number of years spent in the organization
plt.subplot(233)
sns.countplot(x='TimeSpendCompany', data=df, hue='TimeSpendCompany', legend=False, palette="vlag")
plt.title("Time Spent")

# Count plot for the number of employees by promotion status
plt.subplot(234)
sns.countplot(x='PromotionLast5years', data=df, hue='PromotionLast5years', legend=False, palette="rainbow")
plt.title("Employees Promotions")

# Count plot for the number of employees by number of projects
plt.subplot(235)
sns.countplot(x='NumberProject', data=df, hue='NumberProject', legend=False, palette="terrain")
plt.title("Employees by Number of Projects")

# Count plot for the number of employees by work accident
plt.subplot(236)
sns.countplot(x='WorkAccident', data=df, hue='WorkAccident', legend=False, palette="prism")
plt.title("Work Accident")

plt.tight_layout()

# Assign the figure to the node's output view
knio.output_view = knio.view(fig)  # alternative: knio.view_matplotlib()

We can observe from the charts that leaving the company, promotion, and work accident are binary variables. Nearly 30% of the employees have left the organization, very few have been promoted in the last 5 years, and workplace accidents are rare. Most employees belong to the low-salary group, the most common tenure is 3 years, and most employees handled three or four projects.

We will use the Bar Chart node to generate the occurrence count plot for the categories in the Department column.

The chart shows that the sales department had the most employees, followed by the technical and support departments. The accounting, HR, and management departments had the fewest employees.

From the output of the Statistics node, we can observe that

  • Two columns, SatisfactionLevel (92 unique values) and LastEvaluation (65 unique values), are continuous,

  • Three columns, WorkAccident, LeftCompany, and PromotionLast5years, are binary,

  • Two columns, Department (10 categories) and Salary (3 categories), are categorical,

  • Three columns, AverageMonthlyHours (215 unique values), NumberProject (6 unique values), and TimeSpendCompany (8 unique values) are integers.

Converting Continuous Columns to Categorical Using Binning

Since the apriori algorithm works on categorical columns, converting each column into a categorical one is essential. We will use the Auto-Binner node to convert the SatisfactionLevel, LastEvaluation, and AverageMonthlyHours columns into categorical columns by splitting them into three equal bins based on the frequency of the observations in the column. Subsequently, we will use the Rule Engine node to assign meaningful category values (low, avg, high) to columns for easy analysis and interpretation.
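Outside KNIME, the Auto-Binner plus Rule Engine steps can be sketched with pandas. This is a minimal sketch with hypothetical sample values; `pd.qcut` performs equal-frequency binning, analogous to the Auto-Binner's frequency-based option, and the labels stand in for the Rule Engine's category mapping:

```python
import pandas as pd

# Hypothetical sample of the SatisfactionLevel column.
df = pd.DataFrame({"SatisfactionLevel": [0.1, 0.2, 0.35, 0.5, 0.6, 0.75, 0.8, 0.9, 0.95]})

# Auto-Binner equivalent: three equal-frequency bins;
# Rule Engine equivalent: readable labels instead of interval names.
df["SatisfactionLevel_binned"] = pd.qcut(
    df["SatisfactionLevel"], q=3, labels=["low", "avg", "high"]
)
print(df["SatisfactionLevel_binned"].value_counts().to_dict())  # 3 rows per bin
```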

Dummy Encoding

Note that the apriori algorithm only accepts Boolean values. Hence, converting each category into different columns represented by binary values (0, 1) is essential. We will use the One to Many node to transform all possible category values in the selected columns into a new column. It is worth noting that the number of new columns created for a particular column is the same as the number of categories in the column. Subsequently, we will use the Column Filter node to retain only the necessary columns for further analysis.

The resultant dataset now has 14999 observations and 40 columns; all added columns have Boolean (0,1) values.
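The One to Many step behaves like one-hot encoding. A minimal pandas sketch on a hypothetical miniature of the Salary column (the real workflow uses the KNIME node on all categorical columns):

```python
import pandas as pd

# Hypothetical miniature of the categorical table.
df = pd.DataFrame({"Salary": ["low", "medium", "high", "low"]})

# One new 0/1 column per category, as the One to Many node produces.
encoded = pd.get_dummies(df["Salary"], prefix="Salary", dtype=int)
print(list(encoded.columns))  # ['Salary_high', 'Salary_low', 'Salary_medium']
```

Note that the number of new columns equals the number of categories, which is why the 10-column dataset grows to 40 columns after encoding and filtering.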

Apriori Algorithm for Association Rule Mining

Using the Python Script node, the apriori algorithm is executed via the apriori() and association_rules() functions available in the “mlxtend” library. The apriori() function determines the frequent factor sets based on their minimum support and maximum length. These factor sets are then passed to the association_rules() function, which filters them according to the chosen metric. The support and length of the factor sets play an essential role in the apriori algorithm.

Syntax of apriori()

apriori(data, use_colnames=, min_support=, max_len=)

where

  • data represents the data on which the algorithm is executed

  • use_colnames has a Boolean value representing whether colnames are considered or not

  • min_support is the minimum support required and has a value between 0 and 1

  • max_len specifies the maximum number of factors that will be considered

Syntax of association_rules()

association_rules(freq_fact, metric=, min_threshold=)

where

  • freq_fact are frequent factors that are determined

  • metric specifies the measurement that needs to be considered

  • min_threshold specifies the threshold value that should be considered

The call apriori(df, use_colnames=True, min_support=0.1, max_len=3) sets the minimum support to 0.1 and the maximum length of factor sets to 3. This means that factor sets appearing in fewer than 10% of the records are discarded, which is reasonable because factors that rarely co-occur are unlikely to yield meaningful associations. At the same time, filtering on support alone can still leave hundreds of candidate factor sets. To avoid this, the lift metric is used as an additional filter: association_rules() is called with the metric set to lift and a minimum threshold of 1. Subsequently, we will use the Sorter node to order the resultant association rules in descending order of “confidence”.
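For illustration, the two mlxtend calls can be mirrored by a small hand-rolled, brute-force version on a toy one-hot table. This is only a sketch of the logic (the real workflow calls mlxtend, which additionally prunes candidates using the apriori property); the column names and values below are hypothetical:

```python
from itertools import combinations

import pandas as pd

# Toy one-hot table; each column is a 0/1 factor (hypothetical values).
df = pd.DataFrame({
    "NumberProject_2": [1, 1, 1, 0, 0, 0],
    "Low_AverageMonthlyHours": [1, 1, 0, 0, 0, 0],
    "Low_SatisfactionLevel": [1, 1, 1, 0, 0, 1],
})
n = len(df)

def support(cols):
    """Fraction of rows where every listed column equals 1."""
    return df[list(cols)].all(axis=1).sum() / n

# apriori(df, use_colnames=True, min_support=0.1, max_len=3) equivalent:
# keep every itemset of up to 3 columns whose support is at least 0.1.
frequent = {
    frozenset(c): support(c)
    for k in (1, 2, 3)
    for c in combinations(df.columns, k)
    if support(c) >= 0.1
}

# association_rules(frequent, metric="lift", min_threshold=1) equivalent:
# split each frequent itemset into antecedent/consequent, keep lift > 1.
rules = []
for items in frequent:
    for r in range(1, len(items)):
        for ante in combinations(items, r):
            ante, cons = frozenset(ante), items - frozenset(ante)
            conf = frequent[items] / support(ante)
            lift = conf / support(cons)
            if lift > 1:
                rules.append((set(ante), set(cons), conf, lift))
```

On this toy table, the rule {NumberProject_2, Low_AverageMonthlyHours} → {Low_SatisfactionLevel} survives the lift filter with confidence 1.0 and lift 1.5.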

Identify the Association of Low Satisfaction Using Apriori Algorithm

Objective: To identify the association of low satisfaction level with different factors

We will determine the association between a low satisfaction level and the other factors. Note that the higher the lift, the stronger the association and the greater the rule's impact. We will use the Row Filter node with the consequents column equal to “Low_SatisfactionLevel” as the filter criterion.
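In pandas terms, the Row Filter plus Sorter steps look roughly like this. The rules table below is a toy stand-in with hypothetical values; mlxtend's association_rules() output stores antecedents and consequents as frozensets:

```python
import pandas as pd

# Toy stand-in for the association_rules() output (hypothetical values).
rules = pd.DataFrame({
    "antecedents": [frozenset({"NumberProject_2", "Low_AverageMonthlyHours"}),
                    frozenset({"NumberProject_3"}),
                    frozenset({"NumberProject_2"})],
    "consequents": [frozenset({"Low_SatisfactionLevel"}),
                    frozenset({"Avg_SatisfactionLevel"}),
                    frozenset({"Low_SatisfactionLevel"})],
    "confidence": [0.78, 0.46, 0.61],
    "lift": [1.9, 1.3, 1.5],
})

# Row Filter equivalent: keep rules whose consequent is Low_SatisfactionLevel;
# Sorter equivalent: order by confidence, descending.
low = rules[rules["consequents"].apply(lambda c: c == frozenset({"Low_SatisfactionLevel"}))]
low = low.sort_values("confidence", ascending=False)
print(low["confidence"].tolist())  # [0.78, 0.61]
```

The same pattern applies to the average- and high-satisfaction analyses below, changing only the consequent value.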

A lift value greater than one means that the antecedents shown in each record are positively associated with the consequent (Low_SatisfactionLevel). A confidence greater than 0.5 means that, among records containing the antecedents, more than 50% also show low satisfaction. The first rule indicates that having 2 projects combined with low average monthly hours is associated with a low satisfaction level. The results also suggest that having 2 projects, 3 years spent in the company, and a low last-evaluation score are some of the critical factors for low satisfaction. Thus, to increase satisfaction, the organization should revisit project allocation and design targeted strategies for employees who have completed 3 years. Steps should also be taken for employees whose last evaluation score is low.

Identify the Association of Average Satisfaction Using Apriori Algorithm

Objective: To identify the association of average satisfaction level with different factors

We will determine the association between average satisfaction level and the factors of association. As the filter criterion, we will use the Row Filter node with the consequents column equal to “Avg_SatisfactionLevel.”

A lift value greater than one means that the antecedents shown in each record are positively associated with the consequent (Avg_SatisfactionLevel). A confidence greater than 0.4 means that, among records containing the antecedents, more than 40% also show average satisfaction. The first rule indicates that having 3 projects is strongly associated with an average satisfaction level. The results also suggest that having 3 projects, no promotion in the last five years, and average monthly hours are some of the critical factors for average satisfaction. Thus, to increase satisfaction, the organization should revisit project allocation and design targeted strategies for employees who have not been promoted. Steps should also be taken for employees whose monthly hours are average.

Identify the Association of High Satisfaction Using Apriori Algorithm

Objective: To identify the association of high satisfaction level with different factors

We will determine the association between high satisfaction level and the factors of association. As the filter criterion, we will use the Row Filter node with the consequents column equal to “High_SatisfactionLevel.”

A lift value greater than one means that the antecedents shown in each record are positively associated with the consequent (High_SatisfactionLevel). A confidence greater than 0.38 means that, among records containing the antecedents, more than 38% also show high satisfaction. The first rule indicates that having 4 projects combined with no workplace accident is strongly associated with high satisfaction. The results also suggest that having 4 projects, a high last-evaluation score, and no workplace accidents are some of the critical factors for high satisfaction. Thus, to increase satisfaction, the organization should aim for a workload of around four projects and formulate strategies that support high evaluation scores and prevent workplace accidents.

Summary

In conclusion, evaluating job satisfaction through association rule mining provides valuable insights into the factors influencing employee contentment. By utilizing the apriori algorithm, organizations can identify key associations between job satisfaction levels and various workplace parameters. This analysis helps in understanding the critical factors that contribute to low, average, and high satisfaction levels among employees. By addressing these factors, organizations can implement targeted strategies to enhance job satisfaction, leading to improved employee retention, performance, and overall workplace morale. The approach outlined in this article offers a systematic method for HR departments to assess and improve job satisfaction, ultimately contributing to a more positive and productive work environment.


Written by

Vijaykrishna

I’m a data science enthusiast who loves to build projects in KNIME and share valuable tips on this blog.