Week 6: Bias and Variance

garv aggarwal

Error, in the context of machine learning, is the overall discrepancy between the predicted values of the model and the actual values in the dataset. It encompasses both bias and variance and is a crucial metric for assessing model performance.

Bias:

Bias is the error introduced when an overly simplified model fails to capture the complexity of the data. A high-bias model makes strong assumptions and oversimplifies the underlying patterns, leading to systematic errors on both the training data and unseen data; this situation is known as underfitting.

Strategies to address bias:

  • Increase Model Complexity: Use more sophisticated algorithms or increase the complexity of existing models to better capture the underlying patterns in the data.

  • Feature Engineering: Introduce additional features or transform existing ones to provide the model with more information to learn from.

  • Reduce Regularization: Relax constraints imposed by regularization techniques to allow the model to fit the training data more closely.

  • Gather More Data: Increasing the size of the training dataset can provide the model with more examples to learn from and help reduce bias.
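A minimal sketch of the first strategy, using numpy on made-up data: a degree-1 polynomial is too simple for a quadratic signal and underfits, while raising the model complexity to degree 3 brings the training error down. (The dataset and degrees here are illustrative choices, not from any particular example.)

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = x**2 + rng.normal(scale=0.3, size=x.shape)  # quadratic signal plus noise

def train_mse(degree):
    # Fit a polynomial of the given degree and measure error on the training set.
    coeffs = np.polyfit(x, y, degree)
    pred = np.polyval(coeffs, x)
    return np.mean((y - pred) ** 2)

mse_linear = train_mse(1)  # overly simple: high bias, underfits
mse_cubic = train_mse(3)   # complex enough to capture the curve
print(mse_linear > mse_cubic)  # the linear model's training error is larger
```

Increasing the degree further would keep lowering the training error, but as the next section explains, that eventually trades bias for variance.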

Variance:

Variance, on the other hand, refers to how much the model's predictions change when it is trained on different samples of the data. It measures the model's sensitivity to fluctuations in the training set. A high-variance model captures the noise in the training data instead of the underlying patterns, which leads to overfitting: high accuracy on the training data but poor accuracy on unseen data.

Strategies to address variance:

  • Simplify the Model: Reduce the complexity of the model to make it less sensitive to noise and focus on capturing the underlying patterns.

  • Regularization: Introduce regularization techniques such as L1 or L2 regularization to penalize large model weights and prevent overfitting.

  • Cross-Validation: Use techniques like k-fold cross-validation to assess the model's performance on multiple subsets of the data and identify overfitting.

  • Feature Selection: Identify and remove irrelevant or redundant features that may contribute to overfitting.
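As a sketch of the regularization strategy, here is L2 (ridge) regression in closed form with numpy: the penalty term `lam * I` added to the normal equations shrinks the learned weights toward zero, making the model less free to chase noise. The synthetic data and the value of `lam` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 30, 10
X = rng.normal(size=(n, d))
true_w = np.zeros(d)
true_w[0] = 2.0                                  # only one informative feature
y = X @ true_w + rng.normal(scale=0.5, size=n)   # the rest is noise

def ridge(X, y, lam):
    # Closed-form L2-regularized least squares: (X^T X + lam*I)^-1 X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_plain = ridge(X, y, lam=0.0)   # ordinary least squares
w_reg = ridge(X, y, lam=10.0)    # penalized: weights shrink toward zero
print(np.linalg.norm(w_reg) < np.linalg.norm(w_plain))
```

In practice you would pick `lam` by cross-validation (the third strategy above) rather than fixing it by hand.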

Finding the balance:

The goal is to find the right balance between these two sources of error. Neither can be driven to zero without inflating the other, so training a good model means managing the bias-variance tradeoff: a model complex enough to capture the underlying patterns (low bias), but not so complex that it fits the noise (low variance).
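The tradeoff can be seen empirically by refitting a simple and a complex model on many freshly sampled training sets and comparing their predictions at one fixed test point. In this sketch (the true function, noise level, and polynomial degrees are all illustrative assumptions), the degree-1 model's predictions barely move between resamples but are systematically off (high bias), while the degree-9 model's predictions center near the truth but swing widely (high variance).

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 20)
x0 = 0.5                             # a fixed test point
f = lambda t: np.sin(np.pi * t)      # hypothetical "true" function

def preds_at_x0(degree, trials=200):
    # Refit on fresh noisy training sets; collect each model's prediction at x0.
    out = []
    for _ in range(trials):
        y = f(x) + rng.normal(scale=0.3, size=x.shape)
        out.append(np.polyval(np.polyfit(x, y, degree), x0))
    return np.array(out)

simple = preds_at_x0(1)    # high bias, low variance
complex_ = preds_at_x0(9)  # low bias, high variance
print(simple.var() < complex_.var())                              # complex model varies more
print(abs(simple.mean() - f(x0)) > abs(complex_.mean() - f(x0)))  # simple model is more biased
```

A well-balanced model would sit between these two extremes, keeping both the spread and the systematic offset small.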
