Machine Learning Chapter 2.3: Polynomial Regression

Fatima JannetFatima Jannet
5 min read

Welcome back to Machine Learning! Today we’ll talk about Polynomial Regression

Polynomial Regression Intuition

If you look carefully, polynomial regression is very similar to the multiple linear regression. But, instead of different variable, x1 is different in power. So that’s something we are going to learn.

If you see an observation, like this, you are almost sure that this is going to be a linear regression.

What if the dataset is tilted, like a slope? A linear regression doesn't fit well. In the middle, data is below the line, and as you go up, data goes above the line.

In this case, instead of a linear regression, we can generate a curve which perfectly fits the data set. In the equation, it’s the b2X1² which is giving the parabolic effect.

So instead of using linear, try polynomial too, as it might fit better. For spreading diseases or epidemics across areas or populations, polynomial regression can be useful. It's always about finding what works best, so it's good to try different methods and have more tools available.

(Linear, non linear name refers to the coefficients)

Python Implementation

Resources

  1. Colab file: https://colab.research.google.com/drive/1Oco5Yankkx16SPdL4Dl8YwmEfw3PmZv0

  2. Data sheet: https://drive.google.com/file/d/1-FwufMp4sI2yL2zCW_DHLoSMekZHXGNz/view

  3. Data preprocessing template: https://colab.research.google.com/drive/17Rhvn-G597KS3p-Iztorermis__Mibcz

Let's talk about this data. Imagine you're in the HR department and you want to hire someone. You find a candidate who fits the job really well. After the interview, the common question comes up: "What is your salary expectation?" Let's say the person is experienced and asks for $160,000 per year. As a negotiator, you ask why he expects such a high salary. He replies, "I received the same amount at my previous company, so I expect at least $160,000 from your company." Now, is this true or just a bluff? We're going to find out using a polynomial regression model. We will build a polynomial regression model to predict his previous salary.

The data you have includes salaries from a previous company for various positions, ranging from business analysts to the CEO. We collected this data from websites that show salaries for different roles. Now, we need to determine his previous position. That's simple. Let's say we went to LinkedIn, checked his profile, and found that this person was a regional manager. We also saw that he has been a regional manager for about 2 years. So, you know his salary should not be $150,000 (as shown in our data sheet) and should be between $150,000 and $200,000. We can estimate his position level as between six and seven, so we'll consider it as 6.5. Now we can deploy our model. After that, we can compare the predicted salary to the salary expected by the person to see if it’s true or a bluff.

Code

Importing

First, we are going to build our linear regression model because we have only one feature, the position levels, and the dependent variable, which is the salary to predict.

in x, we have left the first column cause the name of the positions were looking good. Also, it’s a redundant column, there’s no need to take them.

Training the Linear Regression model on the whole dataset

We didn't split the dataset into a training set and a test set because we want to use all the data to train our model. So, we'll use the entire matrix of features x and the whole dependent variable vector y. [Fit method trains the model on the data]

Training the Polynomial Regression model on the whole dataset

Visualizing the Linear Regression results

Now, we will call the matplotlib.pyplot module; which has a shortcut “plt”, from where we’ll call scatter function. First, we will plot the the real position levels going from 1 to 10 and the real salaries. And then we'll plot the predictions.

So this is the linear regression plot.

Visualizing the Polynomial Regression results

If we compare the points where we previously had issues with the linear regression model, we can clearly see that the problem is resolved. The predictions on this blue curve are much closer to the actual salaries, and this is achieved with just N equals two. With higher power, like N = 3,4; the result are going to be even better

Change the degree to 4 and run the polynomial regression model. As you can see, it’s already a perfect fit

Visualizing the Polynomial Regression results (for higher resolution and smoother curve)

For the sake of smoother curve (best fit), copy the code from the file and paste and run. (it’s not mandatory)

Predicting a new result with Linear Regression

Now the final step which is predicting the salary of position 6.5 with both linear regression and polynomial regression.

If you add a single pair of square brackets, it creates a list or a vector. To create an array, which has multiple dimensions, you need a double pair of square brackets. The first pair corresponds to the rows, and the second pair corresponds to the columns.

However, let’s run (Before we execute, remember that the person asked for 160k, which he justified by saying he received it from his previous company.)

With the linear regression model, it predicts a salary of $330,000 per year. If we use this model to negotiate with that person, it would seem very strange, right? This predicted salary is much higher than what the person earned at their previous company. So, it's clear that this prediction is incorrect, as we can see on the graph above, which shows the results of the linear regression model.

So, that’s a really bad prediction.

Predicting a new result with Polynomial Regression

Now we will try to predict with polynomial regression (the good predictor)

Wow! The predicted salary is $158,000, actually $159,000, which is really close to what the person said they earned before. Now, we can feel totally confident about hiring this person because they are not only a great fit for the job but also very honest. We figured this out using polynomial regression. Congratulations, we've cracked this case study! Not only did we solve it, but you also learned how to build a polynomial regression model.

Our next blog will be on Support Vector Regression (SVR)

Enjoy Machine Learning!

0
Subscribe to my newsletter

Read articles from Fatima Jannet directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Fatima Jannet
Fatima Jannet