The Balancing Act: Choosing the Right Lambda in Regularization (Linear Regression)

Paul Omagbemi

Since we can't be sure which of the parameters to penalize, we penalize all of them by adding a regularization term to the cost function. The regularized cost function is defined by:

$$J(\mathbf{w}, b) = \frac{1}{2m} \sum_{i=1}^m \left( f_{\mathbf{w}, b}(\mathbf{x}^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^n w_j^2 $$

$$\frac{\lambda}{2m} \sum_{j=1}^n w_j^2$$

The second sum is the regularization term, where $\lambda$ is the regularization parameter and $w_j$ are the weights.
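As a concrete illustration, here is a minimal NumPy sketch of this cost function. The function name and data shapes are my own choices, not from any particular library:

```python
import numpy as np

def regularized_cost(X, y, w, b, lambda_):
    """Regularized squared-error cost for linear regression.

    X: (m, n) feature matrix, y: (m,) targets,
    w: (n,) weights, b: scalar bias, lambda_: regularization strength.
    """
    m = X.shape[0]
    predictions = X @ w + b  # f_wb(x^(i)) for all examples at once
    squared_error = np.sum((predictions - y) ** 2) / (2 * m)
    reg_term = (lambda_ / (2 * m)) * np.sum(w ** 2)  # penalizes the weights, not b
    return squared_error + reg_term
```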

Note: if λ = 0, the regularization term vanishes and the model is free to overfit. If λ is huge (say 10^10), the penalty forces every weight $w_j$ to be nearly 0, so the model collapses to f(x) = b and underfits.
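To see both extremes numerically, here is a hedged sketch using scikit-learn's `Ridge`. Its `alpha` parameter plays the role of λ (its penalty scaling differs slightly from the cost function above, but the qualitative effect is the same):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.1, size=50)

for alpha in [1e-8, 1.0, 1e10]:  # near-zero alpha ≈ no regularization
    model = Ridge(alpha=alpha).fit(X, y)
    print(alpha, model.coef_)
# With alpha ≈ 0 the weights match the unregularized fit;
# with alpha = 1e10 they are driven almost to 0, so f(x) ≈ b.
```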

Gradient descent changes accordingly. The gradient for w picks up an extra term:

$$\frac{\partial J(\mathbf{w}, b)}{\partial w_j} = \frac{1}{m} \sum_{i=1}^m \left( f_{\mathbf{w}, b}(\mathbf{x}^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} w_j$$

Gradients for b:

$$\frac{\partial J(\mathbf{w}, b)}{\partial b} = \frac{1}{m} \sum_{i=1}^m \left( f_{\mathbf{w}, b}(\mathbf{x}^{(i)}) - y^{(i)} \right)$$
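Note that b is not penalized, which is why its gradient has no λ term. A minimal NumPy sketch of both gradients (the function name is my own):

```python
import numpy as np

def compute_gradients(X, y, w, b, lambda_):
    """Gradients of the regularized cost with respect to w and b."""
    m = X.shape[0]
    errors = X @ w + b - y                           # f_wb(x^(i)) - y^(i)
    dj_dw = (X.T @ errors) / m + (lambda_ / m) * w   # regularization only on w
    dj_db = np.sum(errors) / m                       # b is not regularized
    return dj_dw, dj_db
```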

Repeat until convergence:

$$w_j = w_j - \alpha \left( \frac{1}{m} \sum_{i=1}^m \left( f_{\mathbf{w}, b}(\mathbf{x}^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} w_j \right)$$

$$ b = b - \alpha \left( \frac{1}{m} \sum_{i=1}^m \left( f_{\mathbf{w}, b}(\mathbf{x}^{(i)}) - y^{(i)} \right) \right)$$
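Putting the updates together, a minimal training loop might look like the sketch below (it reuses `compute_gradients` from above, and a fixed iteration count stands in for a real convergence check):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, lambda_=1.0, num_iters=1000):
    """Batch gradient descent with L2 regularization (a minimal sketch)."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(num_iters):
        dj_dw, dj_db = compute_gradients(X, y, w, b, lambda_)
        w = w - alpha * dj_dw  # simultaneous update of all w_j
        b = b - alpha * dj_db
    return w, b
```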

Summary: Picking the right lambda matters! Too small and the model overfits; too large and it underfits. The sweet spot lies in between.
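One common way to actually pick λ is cross-validation: try a grid of candidate values and keep the one with the best held-out error. A hedged sketch with scikit-learn's `RidgeCV` (again, `alpha` stands in for λ):

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.1, size=50)

# Try a grid of candidate lambdas and keep the best by 5-fold cross-validation.
model = RidgeCV(alphas=np.logspace(-4, 4, 9), cv=5).fit(X, y)
print("chosen lambda (alpha):", model.alpha_)
```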
