Problem Statement: Why Predict Job Applicants on LinkedIn?

The job market is highly competitive, and understanding job application trends can support smarter career decisions. For students and job seekers, predicting the number of applicants on platforms like LinkedIn is useful for several reasons:

Analyzing & Understanding the Competition: If two similar roles appear, one with zero applicants and another with 120, the first clearly offers better odds. Applicant counts let students gauge the level of competition for a given job.
Targeted Applications: Predictive insights help students target job opportunities with lower competition, so they can apply to relevant jobs rather than blindly submitting applications to every available role.
Smarter Skill Development: By analyzing which skills (e.g., machine learning, Tableau) attract more applicants, students can align their learning efforts with current market demands.
Increased Awareness for Better Fit: Predictive insights help students find companies with recurring internship opportunities or those that match their skills and values, increasing their chances of success.

By leveraging these predictions, students can make more informed decisions about where to apply, what skills to learn, and which companies to target. In short, predictive job application insights empower students to make smarter, data-driven career choices.

https://www.kaggle.com/code/surav12/linkden-prediction

Dataset Description

The dataset used for this project consists of over 800 job listings scraped from LinkedIn. Each job posting includes various features related to the company, the job role, and the skills required for the position. Here are the key attributes in the dataset:

Company Information: Includes company name, industry, employee count, and LinkedIn followers.
Job Details: Includes job designation, location, seniority level, and employment type (e.g., full-time, part-time).
Required Skills: Binary indicators for technical skills such as ReactJS, AI, Tableau, NodeJS, Power BI, etc.
Target Variable: The total number of applicants (Total_applicants) for the job posting, which is the value we aim to predict.

The goal of this project was to predict the total number of applicants (Total_applicants) for a job posting based on various company and job-related features.

Problem Type: Regression

This is a regression problem: the target variable, Total_applicants, is a non-negative count that is treated here as a continuous numerical value.

1. Exploratory Data Analysis (EDA)

Before diving into the model-building process, I first performed some exploratory data analysis (EDA) to understand the distribution of features and their relationship with the target variable:

Distribution of Total_applicants: The target variable was highly skewed. Many job postings had very few applicants, while some attracted a large number of applicants. This imbalance posed challenges for model performance.
Correlation Analysis: I examined the correlation between Total_applicants and other features like company size (e.g., LinkedIn_Followers, Employee_count), job role details, and required skills. Some interesting patterns emerged, such as certain high-demand job roles in the tech industry (e.g., ReactJS, AI roles) attracting more applicants.
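
The two checks above can be sketched with pandas. This is a minimal illustration on synthetic data, not the scraped dataset: the column names Total_applicants, LinkedIn_Followers, Employee_count, and ReactJS come from the post, but every value below is randomly generated, so the printed numbers say nothing about the real listings.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the scraped listings; every value here is made up.
rng = np.random.default_rng(0)
n = 800
df = pd.DataFrame({
    "LinkedIn_Followers": rng.lognormal(mean=10, sigma=1.5, size=n),
    "Employee_count": rng.integers(10, 10_000, size=n),
    "ReactJS": rng.integers(0, 2, size=n),
    # Right-skewed target: most postings get few applicants, a few get many.
    "Total_applicants": rng.negative_binomial(1, 0.05, size=n),
})

# Skewness well above 0 indicates a long right tail.
target_skew = df["Total_applicants"].skew()

# Pearson correlation of each feature with the target, strongest first.
corr_with_target = (
    df.corr()["Total_applicants"]
    .drop("Total_applicants")
    .sort_values(ascending=False)
)

print(f"skewness: {target_skew:.2f}")
print(corr_with_target)
```

On a skewed target like this one the median sits below the mean, which is a quick sanity check before choosing metrics or transformations.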

2. Data Preprocessing

Data preprocessing is a crucial step to prepare the dataset for machine learning models. Here’s how I processed the data:

Understanding Data Types (dtypes)
Descriptive Statistics (describe())
Handling Missing Values (isnull() and isna())
Counting Value Frequencies (bincount())
Identifying and Handling Duplicates
Data Scaling and Normalization
Encoding Categorical Variables
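
A minimal sketch of how these steps might look with pandas and scikit-learn. The frame is a tiny made-up example; Employment_type is a hypothetical column name, and the choices of median imputation, one-hot encoding, and StandardScaler are assumptions, since the post does not say which methods were used. value_counts() stands in for the frequency-counting step.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Tiny made-up frame; Employment_type is a hypothetical column name.
raw = pd.DataFrame({
    "Employee_count": [50, 50, 1200, np.nan, 300],
    "LinkedIn_Followers": [1_000.0, 1_000.0, 250_000.0, 40_000.0, 8_000.0],
    "Employment_type": ["Full-time", "Full-time", "Part-time", "Full-time", None],
    "Total_applicants": [3, 3, 120, 15, 0],
})

print(raw.dtypes)                              # data types
print(raw.describe())                          # descriptive statistics
print(raw.isnull().sum())                      # missing values per column
print(raw["Employment_type"].value_counts())   # category frequencies

# Drop exact duplicate rows (the first two rows are identical).
clean = raw.drop_duplicates().copy()

# Impute: median for the numeric column, a placeholder label for the categorical one.
clean["Employee_count"] = clean["Employee_count"].fillna(clean["Employee_count"].median())
clean["Employment_type"] = clean["Employment_type"].fillna("Unknown")

# One-hot encode the categorical column.
clean = pd.get_dummies(clean, columns=["Employment_type"])

# Standardize the numeric features (the target is left untouched).
num_cols = ["Employee_count", "LinkedIn_Followers"]
clean[num_cols] = StandardScaler().fit_transform(clean[num_cols])

print(clean)
```

In a real pipeline the scaler and any imputation statistics should be fitted on the training split only, so that no information leaks from the test set.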

3. Modeling

I experimented with various regression models to predict the number of applicants. Below is a summary of the models tested:

3.1 Support Vector Regressor (SVR)

I began with the Support Vector Regressor (SVR), known for handling non-linear relationships. However, SVR struggled with this dataset, and the results were not impressive:

MAE: 17.59
MSE: 788.61
RMSE: 28.08
R² Score: -0.0030

The negative R² score indicates that SVR performed slightly worse than a baseline that always predicts the mean number of applicants.
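
To see why a negative R² means "worse than the mean", here is a small sketch in plain Python using the standard definition R² = 1 - SS_res / SS_tot. The applicant counts are made up for illustration.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up applicant counts, skewed like the real target.
y_true = [0, 2, 3, 5, 10, 120]
mean_y = sum(y_true) / len(y_true)

mean_baseline = [mean_y] * len(y_true)   # always predict the mean
low_constant = [4.0] * len(y_true)       # a constant far below the mean

r2_mean = r2_score(y_true, mean_baseline)
r2_const = r2_score(y_true, low_constant)

print(r2_mean)    # the mean predictor scores exactly zero
print(r2_const)   # any worse predictor drops below zero
```

A model that hugs the bulk of small values and ignores the rare large ones can end up in the second situation, which is one plausible reading of the SVR result.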

3.2 K-Nearest Neighbors Regressor (KNN)

Next, I tried K-Nearest Neighbors (KNN), a simple, instance-based model for regression tasks. It improved on SVR in MSE, RMSE, and R², although its MAE was slightly higher, and it still lacked predictive power:

MAE: 17.81
MSE: 645.88
RMSE: 25.41
R² Score: 0.1785

3.3 Random Forest Regressor (RF)

The Random Forest Regressor (RF) performed the best among the three classical models tested. Random forests are ensembles of decision trees that handle numerical and encoded categorical features well:

MAE: 16.05
MSE: 595.85
RMSE: 24.41
R² Score: 0.2421

Although the R² score was still low, it was the highest among all models, suggesting that Random Forest was able to capture some of the underlying patterns in the data.
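
The comparison across the three models can be sketched with scikit-learn as follows. This is an illustration under assumptions, not the original notebook code: the data is synthetic, the hyperparameters are library defaults or arbitrary picks, and scaling SVR and KNN inside a pipeline is my choice. The scores it prints will not match the figures reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic, right-skewed regression data standing in for the encoded job features.
rng = np.random.default_rng(42)
X = rng.normal(size=(800, 6))
y = np.exp(1.0 + 0.8 * X[:, 0] + 0.5 * X[:, 1] * X[:, 2]) + rng.exponential(5.0, size=800)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# SVR and KNN are distance-based, so they get a scaler; trees do not need one.
models = {
    "SVR": make_pipeline(StandardScaler(), SVR()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
    "RF": RandomForestRegressor(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    results[name] = {
        "MAE": mean_absolute_error(y_test, pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "R2": r2_score(y_test, pred),
    }

for name, m in results.items():
    print(f"{name}: MAE={m['MAE']:.2f} MSE={m['MSE']:.2f} RMSE={m['RMSE']:.2f} R2={m['R2']:.4f}")
```

Evaluating every model on the same held-out split keeps the metrics comparable across rows.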

4. Evaluation Metrics

For this regression task, I used the following metrics to evaluate model performance:

Mean Absolute Error (MAE): Represents the average absolute difference between predicted and actual values.
Mean Squared Error (MSE): Measures the average of the squared errors, giving more weight to larger errors.
Root Mean Squared Error (RMSE): Provides a measure of the magnitude of the error in the same units as the target variable.
R² Score: Indicates how well the model explains the variance in the target variable. A higher R² score signifies a better model fit.
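
As a small worked example, the error metrics can be computed by hand in plain Python. The values in y_true and y_pred are made up; the final loop only checks that the MSE and RMSE figures reported for SVR, KNN, and Random Forest are consistent with each other.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))

# Made-up example: three small misses and one large one.
y_true = [10, 20, 30, 100]
y_pred = [12, 18, 31, 60]

print(mae(y_true, y_pred))    # (2 + 2 + 1 + 40) / 4
print(mse(y_true, y_pred))    # (4 + 4 + 1 + 1600) / 4
print(rmse(y_true, y_pred))

# Consistency check on the reported figures: RMSE should equal sqrt(MSE).
for reported_mse, reported_rmse in [(788.61, 28.08), (645.88, 25.41), (595.85, 24.41)]:
    assert abs(math.sqrt(reported_mse) - reported_rmse) < 0.01
```

The single large miss contributes 40 of the 45 units of absolute error but 1600 of the 1609 units of squared error, which is what "giving more weight to larger errors" means in practice.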

Challenges and Insights

Despite trying different models, the prediction accuracy was limited. Key observations include:

Model Limitations: Even the best-performing model (Random Forest) had modest predictive power, suggesting that the data might be noisy, or important variables (like job post age, salary range, or job description content) were missing.
Importance of Skill Features: Binary skills like AI, ReactJS, and Tableau played a noticeable role in distinguishing job postings, but adding more context (e.g., job descriptions) could improve predictions.
Data Sparsity: The distribution of applicants was highly skewed, making it harder for the models to generalize well. Many jobs had only a few applicants, while others attracted a lot, causing imbalance in the data.

Deep Learning Models

To improve the model’s performance, I also explored deep learning techniques by building a neural network model. Here’s a summary of the deep learning model:

Optimizer: Adam
Loss Function: Mean Squared Error (MSE)
Evaluation Metric: Mean Absolute Error (MAE)
Epochs: 30

Training Results:

MAE: 15.77
MSE: 496.52
R² Score: 0.2043

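The neural network achieved the lowest MAE (15.77) and MSE (496.52) of all the models, but its R² score (0.2043) was still below Random Forest’s (0.2421). Because these figures are reported as training results, they may not be directly comparable with the earlier scores. Either way, the R² remained low: the model captured some trends but left most of the variance unexplained.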
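
A rough sketch of a comparable setup, with two loud caveats. The post does not name the framework or the architecture, so I am substituting scikit-learn's MLPRegressor, and the two hidden layers of 64 and 32 units are an assumption. What does match the summary above: the Adam solver, a squared-error loss (which MLPRegressor always minimizes), MAE as the evaluation metric, and 30 epochs. The data is synthetic.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the preprocessed job features.
rng = np.random.default_rng(0)
X = rng.normal(size=(800, 6))
y = 20 + 10 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=5.0, size=800)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

net = make_pipeline(
    StandardScaler(),
    MLPRegressor(
        hidden_layer_sizes=(64, 32),  # assumed architecture, not from the post
        solver="adam",                # Adam optimizer
        max_iter=30,                  # with adam, max_iter is the number of epochs
        random_state=0,
    ),
)

with warnings.catch_warnings():
    # 30 epochs is usually short of convergence, so silence that warning.
    warnings.simplefilter("ignore", ConvergenceWarning)
    net.fit(X_train, y_train)

pred = net.predict(X_test)
nn_mae = mean_absolute_error(y_test, pred)
nn_mse = mean_squared_error(y_test, pred)
nn_r2 = r2_score(y_test, pred)

print(f"MAE={nn_mae:.2f} MSE={nn_mse:.2f} R2={nn_r2:.4f}")
```

Scoring on a held-out split, as done here, makes the numbers comparable with the classical models.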

Future Work and Improvements

To enhance the model's performance, I plan to explore the following approaches:

Feature Engineering: Adding new features, such as the job post’s age, salary range, or detailed job descriptions, could help the model capture more relevant patterns.
Hyperparameter Tuning: Tuning hyperparameters like learning rate, batch size, and model architecture could potentially improve model accuracy.
Complex Models: Exploring sequence models such as Long Short-Term Memory (LSTM) networks or attention mechanisms might help capture sequential trends and market dynamics, provided time-ordered posting data is available.
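
For the hyperparameter tuning idea, one possible starting point is a small grid search. This sketch assumes scikit-learn's GridSearchCV over an MLPRegressor; the grid values, the synthetic data, and the choice of MAE as the selection metric are all illustrative assumptions rather than settings from the project.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Synthetic data; the real features would come from the preprocessed listings.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = 15 + 8 * X[:, 0] + 4 * X[:, 1] ** 2 + rng.normal(scale=3.0, size=300)

# Learning rate, batch size, and architecture, as named in the list above.
param_grid = {
    "learning_rate_init": [0.001, 0.01],
    "batch_size": [32, 64],
    "hidden_layer_sizes": [(32,), (64, 32)],
}

search = GridSearchCV(
    MLPRegressor(solver="adam", max_iter=30, random_state=1),
    param_grid,
    scoring="neg_mean_absolute_error",  # scikit-learn maximizes, hence the sign
    cv=3,
)

with warnings.catch_warnings():
    # Short training runs trigger convergence warnings; ignore them here.
    warnings.simplefilter("ignore", ConvergenceWarning)
    search.fit(X, y)

print(search.best_params_)
print(f"cross-validated MAE: {-search.best_score_:.2f}")
```

With 2 x 2 x 2 settings and 3 folds this fits 24 small networks; a larger grid would be better served by a randomized search.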

Conclusion

Predicting the number of applicants for LinkedIn job listings is a challenging task due to the complexity and dynamic nature of the job market. Despite the limited predictive accuracy, this project has provided valuable insights into the factors influencing job application trends. With more data, feature engineering, and advanced modeling techniques, it’s possible to build more robust recruitment intelligence tools that can help students and job seekers make better, data-driven decisions.

By understanding the nuances of job demand prediction, students can align their career efforts more effectively, avoid saturated job markets, and focus on opportunities that best match their skillset. The potential of predictive analysis in the job market is vast, and future advancements in AI and machine learning could make such tools even more impactful.


Published by: Saurav Rai | Hashnode / Personal Website
