Linear Regression in MATLAB: A Comprehensive Guide
Linear regression is a powerful statistical technique widely used in data analysis and machine learning. MATLAB provides robust tools and functions for performing linear regression, making it a popular choice among researchers, engineers, and data scientists. This guide explores the key aspects of linear regression in MATLAB and how it can be effectively utilized.
Understanding Linear Regression in MATLAB
Linear regression aims to model the relationship between one or more independent variables and a dependent variable by fitting a linear equation to the observed data. In MATLAB, polyfit ships with the base product, while fitlm, regress, robustfit, stepwiselm, and lasso belong to the Statistics and Machine Learning Toolbox.
Types of Linear Regression
MATLAB supports various types of linear regression:
Simple Linear Regression: This involves one independent variable and one dependent variable. The general equation is:
$$Y = \beta_0 + \beta_1X + \epsilon$$
where Y is the dependent variable, X is the independent variable, β₀ is the y-intercept, β₁ is the slope, and ε is the error term.
Multiple Linear Regression: This involves multiple independent variables. The equation expands to:
$$Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + ... + \beta_nX_n + \epsilon$$
where X₁, X₂, ..., Xₙ are the independent variables.
Multivariate Linear Regression: This involves multiple dependent variables and their relationships with multiple independent variables. The general equations are:
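$$Y_1 = \beta_{01} + \beta_{11}X_1 + \beta_{21}X_2 + ... + \beta_{n1}X_n + \epsilon_1$$
$$Y_2 = \beta_{02} + \beta_{12}X_1 + \beta_{22}X_2 + ... + \beta_{n2}X_n + \epsilon_2$$
$$\vdots$$
$$Y_m = \beta_{0m} + \beta_{1m}X_1 + \beta_{2m}X_2 + ... + \beta_{nm}X_n + \epsilon_m$$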
where Y₁, Y₂, ..., Yₘ are the dependent variables, X₁, X₂, ..., Xₙ are the independent variables, βᵢⱼ represents the coefficient for the ith predictor in the jth equation, and ε₁, ε₂, ..., εₘ are the error terms.
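All three forms reduce to a least-squares solve, which the following minimal sketch illustrates with base MATLAB only. The data are hypothetical and deliberately noise-free so the recovered coefficients can be checked by eye; the variable names (`A`, `Btrue`, and so on) are illustrative, not part of any MATLAB API.

```matlab
% Hypothetical noise-free data, chosen so the estimates can be checked by eye
X = [1 2; 2 1; 3 5; 4 3; 5 8];          % two predictors, five observations
A = [ones(size(X,1),1) X];              % design matrix with an intercept column

% Simple linear regression: one predictor, closed-form slope and intercept
x = X(:,1);
ySimple = 1 + 2*x;
xc = x - mean(x);                                            % centered predictor
beta1 = sum(xc .* (ySimple - mean(ySimple))) / sum(xc.^2);   % slope
beta0 = mean(ySimple) - beta1*mean(x);                       % intercept

% Multiple linear regression: backslash solves min ||A*beta - y||
yMultiple = 1 + 2*X(:,1) - 3*X(:,2);
beta = A \ yMultiple;                   % [intercept; coefficient of X1; coefficient of X2]

% Multivariate linear regression: one solve covers all responses;
% column j of B holds the coefficients of the jth equation
Btrue = [1 0; 2 -1; -3 4];              % rows: intercept, X1, X2
Y = A * Btrue;
B = A \ Y;

fprintf('Simple: beta0 = %.2f, beta1 = %.2f\n', beta0, beta1);
disp(beta')
disp(B)
```

With real, noisy data the same calls return least-squares estimates rather than the exact generating coefficients.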
Implementing Linear Regression in MATLAB
Core Functions
fitlm(): Creates a linear regression model with comprehensive statistics
mdl = fitlm(X, y); % X is a numeric predictor matrix, y is the response; for table input use fitlm(tbl)
Key properties and methods:
mdl.Coefficients: Table of coefficient estimates
mdl.Rsquared: R² and adjusted R² values
mdl.predict(newX): Make predictions
mdl.plotDiagnostics: Generate diagnostic plots
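When the data live in a table, a formula-based call is often more readable. The sketch below assumes hypothetical variable names (`x1`, `x2`, `y`) and simulated data; it is one possible usage pattern, not the only way to call fitlm.

```matlab
% Hypothetical data; the variable names are for illustration only
rng(0);                                  % reproducible noise
n = 50;
tbl = table(randn(n,1), randn(n,1), 'VariableNames', {'x1', 'x2'});
tbl.y = 1 + 2*tbl.x1 - 3*tbl.x2 + 0.1*randn(n,1);

% Wilkinson formula: response ~ predictors (an intercept is included by default)
mdl = fitlm(tbl, 'y ~ x1 + x2');

disp(mdl.Coefficients)                   % columns: Estimate, SE, tStat, pValue
fprintf('R^2 = %.3f\n', mdl.Rsquared.Ordinary);

% Predict for new observations given as a table with the same predictor names
newTbl = table([0; 1], [0; 1], 'VariableNames', {'x1', 'x2'});
yNew = predict(mdl, newTbl);
disp(yNew)
```

Naming the response in the formula avoids relying on fitlm's default of treating the last table variable as the response.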
regress(): Performs multiple linear regression using matrix operations
[b, bint, r, rint, stats] = regress(y, X); % X needs a column of ones to fit an intercept
Returns:
b: Coefficient estimates
bint: 95% confidence intervals for the coefficients
r: Residuals
rint: Intervals for diagnosing outlying residuals
stats: R², F-statistic, p-value, error variance
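Unlike fitlm, regress does not add an intercept term on its own. The following sketch, on simulated data with made-up coefficients, shows the column of ones being added by hand and the four entries of `stats` being unpacked.

```matlab
% Simulated data with hypothetical true coefficients [1; 2; -3]
rng(1);
n = 40;
x = randn(n, 2);
y = 1 + 2*x(:,1) - 3*x(:,2) + 0.1*randn(n,1);

% regress fits exactly the columns it is given, so add the intercept column
X = [ones(n,1) x];
[b, bint, r, rint, stats] = regress(y, X);

disp([b bint])   % estimate, lower bound, upper bound for each coefficient
fprintf('R^2 = %.3f, F = %.1f, p = %.3g, error variance = %.4f\n', stats);
```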
polyfit(): Fits polynomial models
p = polyfit(x, y, n); % n is the polynomial degree
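For a straight-line fit, polyfit with degree 1 is enough, and polyval evaluates the result. A minimal sketch on hypothetical noise-free points:

```matlab
% Hypothetical points lying exactly on y = 1 + 2x
x = (0:5)';
y = 1 + 2*x;

% Coefficients come back highest power first: p(1) is the slope, p(2) the intercept
p = polyfit(x, y, 1);

% Evaluate the fitted line at a new point
yhat = polyval(p, 6);
fprintf('slope = %.2f, intercept = %.2f, prediction at x = 6: %.2f\n', p(1), p(2), yhat);
```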
Advanced Functions
robustfit(): Performs robust regression
[b, stats] = robustfit(X, y);
stepwiselm(): Performs stepwise regression
mdl = stepwiselm(X, y);
lasso(): Performs LASSO regression
[B, FitInfo] = lasso(X, y);
Key Steps in Linear Regression Analysis
1. Data Preparation
% Load data
data = readtable('mydata.csv');
% Handle missing values
data = rmmissing(data);
% Split features and response
X = data{:, {'feature1', 'feature2'}}; % curly braces return a numeric matrix, as fitlm(X, y) expects
y = data.response;
% Split into training and testing sets
cv = cvpartition(height(data), 'HoldOut', 0.3);
Xtrain = X(cv.training, :);
ytrain = y(cv.training);
Xtest = X(cv.test, :);
ytest = y(cv.test);
2. Model Fitting
% Fit model
mdl = fitlm(Xtrain, ytrain);
% Display model summary
disp(mdl)
3. Model Evaluation
% Calculate R-squared
Rsquared = mdl.Rsquared.Ordinary
% Calculate MSE
predictions = predict(mdl, Xtest);
MSE = mean((ytest - predictions).^2)
% Calculate RMSE
RMSE = sqrt(MSE)
% Perform residual analysis
residuals = mdl.Residuals.Raw;
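Note that mdl.Rsquared describes the training data. To judge held-out performance on the same scale, R² and the mean absolute error can be computed from test-set predictions. The sketch below uses small made-up vectors in place of `ytest` and `predictions` so that it runs on its own.

```matlab
% Hypothetical held-out responses and model predictions
ytest = [3; 5; 7; 9];
predictions = [2.5; 5.5; 7; 9.5];

res = ytest - predictions;

% Mean absolute error: less sensitive to large errors than MSE
MAE = mean(abs(res));

% Out-of-sample R^2: 1 - SS_residual / SS_total on the test set
SSres = sum(res.^2);
SStot = sum((ytest - mean(ytest)).^2);
R2test = 1 - SSres / SStot;

fprintf('MAE = %.4f, test R^2 = %.4f\n', MAE, R2test);
```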
4. Visualization
% Plot the fit (scatter plot with regression line for one predictor, added-variable plot for several)
figure;
plot(mdl)
title('Regression Analysis')
xlabel('Predictor')
ylabel('Response')
% Residual plots
figure;
plotResiduals(mdl, 'histogram')
figure;
plotResiduals(mdl, 'probability')
Advanced Techniques
1. Robust Regression
Reduces the influence of outliers through iteratively reweighted least squares:
% Perform robust regression
[b, stats] = robustfit(X, y);
% Compare with ordinary least squares
ols = fitlm(X, y);
disp('Robust coefficients:')
disp(b)
disp('OLS coefficients:')
disp(ols.Coefficients.Estimate)
2. Stepwise Regression
Adds and removes terms automatically according to a selection criterion (AIC in this case):
% Perform stepwise regression
mdl = stepwiselm(X, y, 'Upper', 'quadratic', ...
'Criterion', 'aic');
% Display final model
disp(mdl)
3. Ridge Regression
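Addresses multicollinearity by shrinking coefficients with an L2 penalty. lasso() only accepts Alpha in (0, 1], so Alpha = 0 (pure ridge) is rejected; a small Alpha gives an elastic net that approximates ridge:
% Standardize predictors
[Xstd, mu, sigma] = zscore(X);
% Elastic net close to ridge, with 10-fold cross-validation
[B, FitInfo] = lasso(Xstd, y, 'Alpha', 0.01, ...
'CV', 10);
% Find optimal lambda and the matching coefficients
lambda_opt = FitInfo.LambdaMinMSE;
b_opt = B(:, FitInfo.IndexMinMSE);
% For an exact ridge fit, see ridge(y, X, k)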
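For reference, the ridge estimate on standardized predictors has a closed form, (X'X + λI)⁻¹X'y, which can be sketched in base MATLAB. The data, the penalty value `lambda = 5`, and the variable names below are assumptions for illustration; in practice the penalty would be chosen by cross-validation.

```matlab
% Simulated data in which the first two predictors are nearly collinear
rng(2);
n = 60;
x1 = randn(n,1);
X = [x1, x1 + 0.05*randn(n,1), randn(n,1)];
y = 3*x1 + X(:,3) + 0.5*randn(n,1);

% Standardize predictors and center the response, so no intercept is needed
Xs = (X - mean(X)) ./ std(X);
yc = y - mean(y);

% Closed-form ridge estimate with a hypothetical penalty
lambda = 5;
p = size(Xs, 2);
bRidge = (Xs'*Xs + lambda*eye(p)) \ (Xs'*yc);

% Ordinary least squares on the same standardized data, for comparison
bOLS = Xs \ yc;

disp([bOLS bRidge])   % ridge coefficients are pulled toward zero
```

The OLS coefficients of the two collinear columns tend to be unstable, while the ridge penalty spreads the effect across them more evenly.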
Examples of Linear Regression in MATLAB
Example 1: Simple Linear Regression with Diagnostic Plots
% Generate sample data
x = (1:10)';
y = 2*x + 1 + randn(10,1);
% Fit model
mdl = fitlm(x, y);
% Create diagnostic plots
figure('Position', [100 100 1200 400]);
subplot(1,3,1)
plot(mdl)
title('Regression Line')
subplot(1,3,2)
plotResiduals(mdl,'histogram')
title('Residual Histogram')
subplot(1,3,3)
plotResiduals(mdl,'probability')
title('Normal Probability Plot')
Example 2: Multiple Linear Regression with Cross-Validation
% Generate sample data
n = 100;
X = randn(n,2); % predictors only: fitlm adds the intercept itself
y = 2 + X*[3; 4] + randn(n,1);
% Perform k-fold cross-validation
k = 5;
cv = cvpartition(n, 'KFold', k);
mse = zeros(k,1);
for i = 1:k
% Get training and testing indices
trainIdx = training(cv, i);
testIdx = test(cv, i);
% Fit model and calculate MSE
mdl = fitlm(X(trainIdx,:), y(trainIdx));
yPred = predict(mdl, X(testIdx,:));
mse(i) = mean((y(testIdx) - yPred).^2);
end
% Display average MSE
fprintf('Average MSE across folds: %.4f\n', mean(mse))
Example 3: Multivariate Linear Regression
% Generate sample data
X = randn(100, 3); % 3 predictors
Y = X*[1 2; 3 4; 5 6] + randn(100, 2); % 2 responses
% Fit separate models for each response
mdl1 = fitlm(X, Y(:,1));
mdl2 = fitlm(X, Y(:,2));
% Display results
disp('Model for Response 1:')
disp(mdl1)
disp('Model for Response 2:')
disp(mdl2)
% Create side-by-side plots
figure('Position', [100 100 800 400]);
subplot(1,2,1)
plot(mdl1)
title('Response 1')
subplot(1,2,2)
plot(mdl2)
title('Response 2')
Statistical Analysis and Model Diagnostics
1. Coefficient Analysis
% Get coefficient table
coefTable = mdl.Coefficients;
% Extract p-values
pValues = coefTable.pValue;
% Calculate 95% confidence intervals (one row per coefficient: [lower upper])
confInt = coefCI(mdl);
2. ANOVA Analysis
% Perform ANOVA
anova_table = anova(mdl);
disp(anova_table)
3. Model Selection Metrics
% Calculate AIC and BIC
aic = mdl.ModelCriterion.AIC;
bic = mdl.ModelCriterion.BIC;
% Calculate adjusted R-squared
adjRsq = mdl.Rsquared.Adjusted;
4. Residual Diagnostics
% Calculate standardized residuals
stdRes = mdl.Residuals.Standardized;
% Calculate Cook's distances
cooksDist = mdl.Diagnostics.CooksDistance;
% Create residual plots
figure('Position', [100 100 1200 400]);
subplot(1,3,1)
plotResiduals(mdl, 'fitted')
title('Residuals vs. Fitted')
subplot(1,3,2)
plotResiduals(mdl, 'lagged')
title('Residual Autocorrelation')
subplot(1,3,3)
plot(cooksDist, 'o')
title('Cook''s Distance')
xlabel('Observation')
ylabel('Cook''s Distance')
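Beyond plotting, Cook's distances can be used to flag observations for closer inspection. The sketch below injects one artificial outlier into simulated data and applies a common rule of thumb (three times the mean Cook's distance); the threshold is an assumption, not a fixed standard.

```matlab
% Simulated data with one hypothetical outlier at the last observation
rng(4);
x = (1:20)';
y = 2*x + 1 + randn(20,1);
y(20) = y(20) + 30;

mdl = fitlm(x, y);
cooksDist = mdl.Diagnostics.CooksDistance;

% Rule-of-thumb threshold (an assumption; other cutoffs such as 4/n are also used)
threshold = 3 * mean(cooksDist);
flagged = find(cooksDist > threshold);

fprintf('Flagged observations: %s\n', mat2str(flagged'));
```

Flagged points are candidates for review, not automatic deletions; refitting with and without them shows how much they drive the estimates.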
Best Practices and Tips
Data Preprocessing
Always check for missing values and outliers
Consider standardizing predictors for better numerical stability
Check for multicollinearity using VIF (Variance Inflation Factor)
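One way to compute VIFs without extra tooling is to take the diagonal of the inverse of the predictors' correlation matrix. The sketch below uses simulated predictors in which two columns are nearly collinear; the cutoff of 10 mentioned in the comment is a common rule of thumb rather than a strict threshold.

```matlab
% Simulated predictors: x2 is almost a copy of x1, x3 is independent
rng(3);
n = 200;
x1 = randn(n,1);
x2 = x1 + 0.1*randn(n,1);
x3 = randn(n,1);
X = [x1 x2 x3];

% VIF_j = 1 / (1 - R_j^2), which equals the jth diagonal entry of inv(corrcoef(X))
R = corrcoef(X);
vif = diag(inv(R))';

disp(vif)   % values well above roughly 5 to 10 suggest problematic collinearity
```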
Model Selection
Use stepwise regression for a large number of predictors
Consider cross-validation for model validation
Compare different models using AIC/BIC
Diagnostic Checks
Always check residual plots
Look for influential observations using Cook's distance
Verify assumptions of linearity, normality, and homoscedasticity
Reporting Results
Include coefficient estimates with confidence intervals
Report R-squared and adjusted R-squared
Include relevant diagnostic plots