Real-Life House Price Prediction with Linear Regression
Predicting house prices is a key part of real estate analytics. In this project, I’ll walk you through how I built a machine learning model that uses linear regression to estimate them.
Project Overview
We start with a dataset that contains information such as house square footage, number of bedrooms, and location. Our task is to predict the house price using these features.
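The post doesn't show how the dataset was loaded, so the steps below can't be run as-is. Here is a minimal sketch that builds a hypothetical synthetic `df` with the same column names (`square_footage`, `bedrooms`, `location`, `price`). The value ranges, the location names, and the pricing rule are all invented for illustration and do not reproduce the author's data. A few values are blanked out so the preprocessing step has something to fill.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: column names mirror the post, values are invented.
rng = np.random.default_rng(42)
n = 200

df = pd.DataFrame({
    "square_footage": rng.uniform(600, 3500, n).round(),
    "bedrooms": rng.integers(1, 6, n).astype(float),  # 1 to 5 bedrooms
    "location": rng.choice(["Downtown", "Suburb", "Rural"], n),
})

# Invented pricing rule plus Gaussian noise, so the model has something to learn.
location_premium = df["location"].map({"Downtown": 60_000, "Suburb": 25_000, "Rural": 0})
df["price"] = (
    150 * df["square_footage"]
    + 10_000 * df["bedrooms"]
    + location_premium
    + rng.normal(0, 20_000, n)
)

# Blank out 10 random values per feature so there is missing data to impute.
df.loc[rng.choice(n, 10, replace=False), "square_footage"] = np.nan
df.loc[rng.choice(n, 10, replace=False), "bedrooms"] = np.nan

print(df.head())
```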
Steps Involved:
1. Data Preprocessing
The dataset contained missing values and categorical variables such as location. I handled missing values by filling them with the column mean, and converted the location feature into numerical values using one-hot encoding.
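```python
import pandas as pd

# Handle missing values: fill each numeric column with its own mean
df['square_footage'] = df['square_footage'].fillna(df['square_footage'].mean())
df['bedrooms'] = df['bedrooms'].fillna(df['bedrooms'].mean())
# One-hot encode 'location'; drop_first=True avoids a redundant dummy column
df = pd.get_dummies(df, columns=['location'], drop_first=True)
```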
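To see exactly what these two preprocessing calls do, here is a sketch on a tiny made-up frame (the values are illustrative only). Mean imputation replaces each `NaN` with the mean of the non-missing values in that column. With `drop_first=True`, pandas sorts the categories and drops the first one, which becomes the baseline encoded as all zeros; this keeps the dummy columns from being perfectly collinear with the regression intercept.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame with one missing value per numeric column.
toy = pd.DataFrame({
    "square_footage": [1000.0, np.nan, 2000.0],
    "bedrooms": [2.0, 3.0, np.nan],
    "location": ["Rural", "Suburb", "Downtown"],
})

# Mean imputation: square_footage NaN -> (1000 + 2000) / 2 = 1500,
# bedrooms NaN -> (2 + 3) / 2 = 2.5.
for col in ["square_footage", "bedrooms"]:
    toy[col] = toy[col].fillna(toy[col].mean())

# Categories sort as Downtown, Rural, Suburb; Downtown is dropped and
# becomes the baseline row where every location_* column is 0/False.
toy = pd.get_dummies(toy, columns=["location"], drop_first=True)
print(toy)
```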
2. Splitting Data
Next, I split the data into training and testing sets so the model's performance could be evaluated on data it had not seen during training.
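```python
from sklearn.model_selection import train_test_split

X = df.drop('price', axis=1)  # features: every column except the target
y = df['price']               # target
# Hold out 20% of rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```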
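A quick way to check what `test_size=0.2` means in practice is to split a small frame whose size is easy to reason about. This sketch uses ten invented rows: 20% of 10 rows go to the test set and the remaining 8 to training, and the features and target stay aligned by index.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Ten made-up rows so the split sizes are easy to verify by hand.
df = pd.DataFrame({
    "square_footage": np.arange(1000, 2000, 100),
    "bedrooms": [1, 2, 2, 3, 3, 3, 4, 4, 5, 5],
    "price": np.arange(100_000, 200_000, 10_000),
})

X = df.drop("price", axis=1)   # every column except the target
y = df["price"]                # the target to predict

# Same call as above: 20% held out, fixed shuffle via random_state.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))
```

Because `random_state` is fixed, rerunning the split selects the same test rows every time, which keeps evaluation results comparable across runs.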
3. Model Training
Using linear regression, I trained the model to predict house prices based on the available features.
```python
from sklearn.linear_model import LinearRegression

# Fit ordinary least squares on the training set only
lr = LinearRegression()
lr.fit(X_train, y_train)
```
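The post doesn't inspect the fitted model, but its coefficients are what make linear regression interpretable: each one is the change in predicted price for a one-unit increase in that feature, holding the others fixed. As a minimal sketch, assuming an invented, noise-free pricing rule (price = 20,000 + 100 × square footage + 5,000 × bedrooms), `LinearRegression` recovers the rule's numbers through its `coef_` and `intercept_` attributes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Invented, noise-free data: price = 20,000 + 100 * sqft + 5,000 * bedrooms.
X = pd.DataFrame({
    "square_footage": [800, 1200, 1500, 2000, 2500, 3000],
    "bedrooms": [1, 2, 3, 3, 4, 5],
})
y = 20_000 + 100 * X["square_footage"] + 5_000 * X["bedrooms"]

lr = LinearRegression()
lr.fit(X, y)  # ordinary least squares: minimizes the sum of squared errors

# coef_ holds one slope per feature column; intercept_ is the baseline price.
for name, coef in zip(X.columns, lr.coef_):
    print(f"{name}: {coef:,.2f}")
print(f"intercept: {lr.intercept_:,.2f}")
```

On real, noisy data with one-hot location columns, each `location_*` coefficient would read as the price difference relative to the dropped baseline location.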
4. Evaluation
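The model's performance was evaluated using Mean Squared Error (MSE): the average of the squared differences between predicted and actual prices on the test set. Lower is better. Because the errors are squared, MSE is expressed in squared price units and penalizes large misses heavily.
```python
from sklearn.metrics import mean_squared_error

# Predict on the held-out test set, then compare with the true prices
y_pred = lr.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
```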
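To make the metric concrete, here is a small worked example with three made-up prices (illustrative numbers, not results from this project). It computes MSE by hand and with `mean_squared_error` to show they agree, then takes the square root (RMSE) to get back to dollars, which is usually easier to interpret.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up actual and predicted prices (in dollars) for three houses.
y_true = np.array([200_000, 300_000, 250_000])
y_pred = np.array([210_000, 290_000, 270_000])

# Errors: -10,000, 10,000, -20,000
# MSE = (10,000^2 + 10,000^2 + 20,000^2) / 3 = 200,000,000 (dollars squared)
errors = y_true - y_pred
mse_manual = np.mean(errors ** 2)
mse_sklearn = mean_squared_error(y_true, y_pred)

# RMSE is back in dollars: sqrt(200,000,000), roughly 14,142.
rmse = np.sqrt(mse_sklearn)
print(mse_manual, mse_sklearn, rmse)
```

The single 20,000 miss contributes two thirds of the total squared error here, which illustrates how MSE weights large errors.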
5. Visualization
Here’s a plot showing actual vs predicted prices:
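```python
import matplotlib.pyplot as plt

plt.scatter(y_test, y_pred, alpha=0.6)
# Dashed y = x reference line: points on it are perfect predictions
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
plt.plot(lims, lims, 'r--')
plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.title('Actual vs Predicted House Prices')
plt.show()
```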
Conclusion
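This project demonstrates how linear regression can be used to predict house prices from features such as square footage, number of bedrooms, and location. It provides a simple, interpretable baseline; further improvement could involve experimenting with other algorithms such as Random Forest or XGBoost, which can capture nonlinear relationships that a linear model misses.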
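One way to run that comparison fairly is to score both models with the same cross-validation folds. The sketch below uses hypothetical synthetic data with a deliberately nonlinear term so the two models can actually differ; it does not reproduce the post's dataset or results. XGBoost lives in the separate `xgboost` package and is left out to keep the example scikit-learn only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical data: a linear term plus a quadratic term that a straight line can't fit.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 2))
y = 3 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 1, 300)

models = {
    "LinearRegression": LinearRegression(),
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# 5-fold cross-validation. scikit-learn reports *negated* MSE so that higher
# scores are better, so flip the sign to get MSE back.
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    results[name] = -scores.mean()
    print(f"{name}: mean CV MSE = {results[name]:.2f}")
```

Whether a tree ensemble helps on real house data depends on how nonlinear the price relationships are; cross-validated MSE on the same folds is a reasonable way to decide.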
Written by
Roemai
At Roemai we are empowering individuals through education, innovation, and technology solutions with robotics, embedded systems, and AI.