Linear Regression in MATLAB: A Comprehensive Guide


Linear regression is a powerful statistical technique widely used in data analysis and machine learning. MATLAB provides robust tools and functions for performing linear regression, making it a popular choice among researchers, engineers, and data scientists. This guide explores the key aspects of linear regression in MATLAB and how it can be effectively utilized.

Understanding Linear Regression in MATLAB

Linear regression aims to model the relationship between one or more independent variables and a dependent variable by fitting a linear equation to the observed data. In MATLAB, this process is streamlined through built-in functions and toolboxes.

Types of Linear Regression

MATLAB supports various types of linear regression:

  1. Simple Linear Regression: This involves one independent variable and one dependent variable. The general equation is:

    $$Y = \beta_0 + \beta_1X + \epsilon$$

    where Y is the dependent variable, X is the independent variable, β₀ is the y-intercept, β₁ is the slope, and ε is the error term.

  2. Multiple Linear Regression: This involves multiple independent variables. The equation expands to:

    $$Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + ... + \beta_nX_n + \epsilon$$

    where X₁, X₂, ..., Xₙ are the independent variables.

  3. Multivariate Linear Regression: This involves multiple dependent variables and their relationships with multiple independent variables. The general equations are:

    $$Y_1 = \beta_{01} + \beta_{11}X_1 + \beta_{21}X_2 + ... + \beta_{n1}X_n + \epsilon_1$$

    $$Y_2 = \beta_{02} + \beta_{12}X_1 + \beta_{22}X_2 + ... + \beta_{n2}X_n + \epsilon_2$$

    $$\vdots$$

    $$Y_m = \beta_{0m} + \beta_{1m}X_1 + \beta_{2m}X_2 + ... + \beta_{nm}X_n + \epsilon_m$$

where Y₁, Y₂, ..., Yₘ are the dependent variables, X₁, X₂, ..., Xₙ are the independent variables, βᵢⱼ represents the coefficient for the ith predictor in the jth equation, and ε₁, ε₂, ..., εₘ are the error terms.
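When all m responses share the same design matrix, the coefficients for every equation can be estimated in one step. As a minimal sketch with made-up data, MATLAB's backslash operator solves the least-squares problem for each response column at once:

% Least-squares fit for several responses simultaneously (illustrative data)
n = 50;
X = randn(n, 3);        % n observations, 3 predictors
Y = randn(n, 2);        % 2 response variables
Xd = [ones(n,1) X];     % prepend an intercept column
B = Xd \ Y;             % 4-by-2 matrix; column j holds the coefficients
                        % (intercept first) for response j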

Implementing Linear Regression in MATLAB

Core Functions

  1. fitlm(): Creates a linear regression model with comprehensive statistics

     mdl = fitlm(X, y);  % X is a numeric predictor matrix, y is the response
                         % (for tables, use fitlm(tbl) with the response last)
    

    Key properties and methods (see the usage sketch after this list):

    • mdl.Coefficients: Table of coefficient estimates

    • mdl.Rsquared: R² and adjusted R² values

    • predict(mdl, newX): Make predictions

    • plotDiagnostics(mdl): Generate diagnostic plots

  2. regress(): Performs multiple linear regression using matrix operations. Unlike fitlm(), it does not add an intercept term automatically, so include a column of ones in X.

     [b, bint, r, rint, stats] = regress(y, [ones(size(X,1),1) X]);
    

    Returns:

    • b: Coefficient estimates

    • bint: 95% confidence intervals

    • r: Residuals

    • rint: Residual intervals

    • stats: R², F-statistic, p-value, error variance

  3. polyfit(): Fits polynomial models

     p = polyfit(x, y, n);  % n is the degree; p holds coefficients in descending powers
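
As a quick usage sketch with synthetic data (variable names are illustrative), the three core functions compare as follows:

% Synthetic data for comparing the core functions
rng(1)                                    % reproducible noise
x = (1:20)';
y = 5 + 3*x + randn(20,1);

mdl = fitlm(x, y);                        % full model object
disp(mdl.Coefficients)                    % coefficient table
yhat = predict(mdl, (21:25)');            % predictions for new x values

[b, bint] = regress(y, [ones(20,1) x]);   % regress needs the explicit
                                          % intercept column

p = polyfit(x, y, 1);                     % [slope, intercept], highest
                                          % power first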
    

Advanced Functions

  1. robustfit(): Performs robust regression

     [b, stats] = robustfit(X, y);  % intercept term is added automatically
    
  2. stepwiselm(): Performs stepwise regression

     mdl = stepwiselm(X, y);
    
  3. lasso(): Performs LASSO regression

     [B, FitInfo] = lasso(X, y);
    

Key Steps in Linear Regression Analysis

1. Data Preparation

% Load data
data = readtable('mydata.csv');

% Handle missing values
data = rmmissing(data);

% Split features and response
X = data{:, {'feature1', 'feature2'}};  % braces return a numeric matrix
y = data.response;

% Split into training and testing sets
cv = cvpartition(height(data), 'HoldOut', 0.3);
Xtrain = X(training(cv), :);
ytrain = y(training(cv));
Xtest = X(test(cv), :);
ytest = y(test(cv));

2. Model Fitting

% Fit model
mdl = fitlm(Xtrain, ytrain);

% Display model summary
disp(mdl)

3. Model Evaluation

% Calculate R-squared
Rsquared = mdl.Rsquared.Ordinary

% Calculate MSE
predictions = predict(mdl, Xtest);
MSE = mean((ytest - predictions).^2)

% Calculate RMSE
RMSE = sqrt(MSE)

% Perform residual analysis
residuals = mdl.Residuals.Raw;

4. Visualization

% Basic scatter plot with regression line
figure;
plot(mdl)
title('Regression Analysis')
xlabel('Predictor')
ylabel('Response')

% Residual plots
figure;
plotResiduals(mdl, 'histogram')
figure;
plotResiduals(mdl, 'probability')

Advanced Techniques

1. Robust Regression

Handles outliers using iteratively reweighted least squares:

% Perform robust regression
[b, stats] = robustfit(X, y);

% Compare with ordinary least squares
ols = fitlm(X, y);
disp('Robust coefficients:')
disp(b)
disp('OLS coefficients:')
disp(ols.Coefficients.Estimate)
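
The stats structure returned by robustfit also contains the final robust weights in stats.w, which reveal how strongly each observation was downweighted. A short follow-up sketch (the 0.5 cutoff is illustrative):

% Observations with low robust weights were treated as near-outliers
lowWeight = find(stats.w < 0.5);
fprintf('Downweighted observations: %s\n', mat2str(lowWeight'))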

2. Stepwise Regression

Automatically selects significant predictors:

% Perform stepwise regression
mdl = stepwiselm(X, y, 'Upper', 'quadratic', ...
    'Criterion', 'aic');

% Display final model
disp(mdl)

3. Ridge Regression

Addresses multicollinearity:

% Standardize predictors
[Xstd, mu, sigma] = zscore(X);

% Approximate ridge regression with cross-validation: lasso requires
% 'Alpha' in (0,1], so a value near zero mimics pure ridge (the
% dedicated ridge() function is an alternative)
[B, FitInfo] = lasso(Xstd, y, 'Alpha', 1e-3, ...
    'CV', 10);

% Find optimal lambda
lambda_opt = FitInfo.LambdaMinMSE;
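
The fitted model at the optimal lambda can then be read out of B and FitInfo; because the predictors were standardized, the coefficients are on the z-score scale. A short continuation sketch:

% Extract the model at the cross-validated optimum
idx = FitInfo.IndexMinMSE;         % index of the lambda with minimum CV MSE
coef = B(:, idx);                  % standardized coefficients
coef0 = FitInfo.Intercept(idx);    % corresponding intercept
yhat = Xstd*coef + coef0;          % fitted values on the training data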

Examples of Linear Regression in MATLAB

Example 1: Simple Linear Regression with Diagnostic Plots

% Generate sample data
x = (1:10)';
y = 2*x + 1 + randn(10,1);

% Fit model
mdl = fitlm(x, y);

% Create diagnostic plots
figure('Position', [100 100 1200 400]);

subplot(1,3,1)
plot(mdl)
title('Regression Line')

subplot(1,3,2)
plotResiduals(mdl,'histogram')
title('Residual Histogram')

subplot(1,3,3)
plotResiduals(mdl,'probability')
title('Normal Probability Plot')

Example 2: Multiple Linear Regression with Cross-Validation

% Generate sample data (fitlm adds the intercept term itself)
n = 100;
X = randn(n,2);
y = 2 + X*[3; 4] + randn(n,1);  % true intercept 2, slopes 3 and 4

% Perform k-fold cross-validation
k = 5;
cv = cvpartition(n, 'KFold', k);
mse = zeros(k,1);

for i = 1:k
    % Get training and testing indices
    trainIdx = training(cv, i);
    testIdx = test(cv, i);

    % Fit model and calculate MSE
    mdl = fitlm(X(trainIdx,:), y(trainIdx));
    yPred = predict(mdl, X(testIdx,:));
    mse(i) = mean((y(testIdx) - yPred).^2);
end

% Display average MSE
fprintf('Average MSE across folds: %.4f\n', mean(mse))

Example 3: Multivariate Linear Regression

% Generate sample data
X = randn(100, 3);  % 3 predictors
Y = X*[1 2; 3 4; 5 6] + randn(100, 2);  % 2 responses

% Fit separate models for each response
mdl1 = fitlm(X, Y(:,1));
mdl2 = fitlm(X, Y(:,2));

% Display results
disp('Model for Response 1:')
disp(mdl1)
disp('Model for Response 2:')
disp(mdl2)

% Create side-by-side plots
figure('Position', [100 100 800 400]);

subplot(1,2,1)
plot(mdl1)
title('Response 1')

subplot(1,2,2)
plot(mdl2)
title('Response 2')

Statistical Analysis and Model Diagnostics

1. Coefficient Analysis

% Get coefficient table
coefTable = mdl.Coefficients;

% Extract p-values
pValues = coefTable.pValue;

% Calculate 95% confidence intervals for the coefficients
confInt = coefCI(mdl);

2. ANOVA Analysis

% Perform ANOVA
anova_table = anova(mdl);
disp(anova_table)

3. Model Selection Metrics

% Calculate AIC and BIC
aic = mdl.ModelCriterion.AIC;
bic = mdl.ModelCriterion.BIC;

% Calculate adjusted R-squared
adjRsq = mdl.Rsquared.Adjusted;
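
These criteria are most useful for comparing candidate models, with lower values preferred. A minimal comparison sketch, assuming X and y as before:

% Compare a purely linear model against one with quadratic terms
mdlLin = fitlm(X, y);
mdlQuad = fitlm(X, y, 'quadratic');
if mdlQuad.ModelCriterion.AIC < mdlLin.ModelCriterion.AIC
    disp('Quadratic terms improve the model (by AIC)')
else
    disp('The linear model is preferred (by AIC)')
end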

4. Residual Diagnostics

% Calculate standardized residuals
stdRes = mdl.Residuals.Standardized;

% Calculate Cook's distances
cooksDist = mdl.Diagnostics.CooksDistance;

% Create residual plots
figure('Position', [100 100 1200 400]);

subplot(1,3,1)
plotResiduals(mdl, 'fitted')
title('Residuals vs. Fitted')

subplot(1,3,2)
plotResiduals(mdl, 'lagged')
title('Residual Autocorrelation')

subplot(1,3,3)
plot(cooksDist, 'o')
title('Cook''s Distance')
xlabel('Observation')
ylabel('Cook''s Distance')
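
Observations with unusually large Cook's distance deserve closer inspection. One common rule of thumb (the cutoff is illustrative, not canonical) flags points exceeding three times the mean distance:

% Flag potentially influential observations
threshold = 3*mean(cooksDist, 'omitnan');
influential = find(cooksDist > threshold);
fprintf('Influential observations: %s\n', mat2str(influential'))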

Best Practices and Tips

  1. Data Preprocessing

    • Always check for missing values and outliers

    • Consider standardizing predictors for better numerical stability

    • Check for multicollinearity using VIF (Variance Inflation Factor); see the sketch after this list

  2. Model Selection

    • Use stepwise regression when screening a large number of predictors

    • Consider cross-validation for model validation

    • Compare different models using AIC/BIC

  3. Diagnostic Checks

    • Always check residual plots

    • Look for influential observations using Cook's distance

    • Verify assumptions of linearity, normality, and homoscedasticity

  4. Reporting Results

    • Include coefficient estimates with confidence intervals

    • Report R-squared and adjusted R-squared

    • Include relevant diagnostic plots
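
For the VIF check mentioned under Data Preprocessing, base MATLAB has no single built-in VIF function, but the values can be computed from the inverse of the predictor correlation matrix. A minimal sketch, assuming X is a numeric predictor matrix:

% Variance Inflation Factors: VIF_j = 1/(1 - R_j^2), equivalently the
% diagonal of the inverse correlation matrix of the predictors
R = corrcoef(X);
vif = diag(inv(R));   % one VIF per predictor
disp(table((1:numel(vif))', vif, 'VariableNames', {'Predictor', 'VIF'}))
% values above roughly 5-10 are commonly read as multicollinearity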
