The Balancing Act: Choosing the Right Lambda in Regularization (Linear Regression)

Paul Omagbemi
1 min read

Since we can't be sure which of the parameters to penalize, we penalize all of them by adding a regularization term to the cost function. The regularized cost function is defined by:

$$J(\mathbf{w}, b) = \frac{1}{2m} \sum_{i=1}^m \left( f_{\mathbf{w}, b}(\mathbf{x}^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^n w_j^2 $$

$$\frac{\lambda}{2m} \sum_{j=1}^n w_j^2$$

This is the regularization term, where λ is the regularization parameter and $w_j$ are the weights. Note that the bias $b$ is conventionally not penalized.
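
As a quick sketch, here is how the regularized cost above might look in NumPy (the function and variable names are mine, not from any particular library):

```python
import numpy as np

def compute_cost(X, y, w, b, lambda_):
    """Regularized cost J(w, b) for linear regression.

    X: (m, n) feature matrix, y: (m,) targets,
    w: (n,) weight vector, b: scalar bias,
    lambda_: regularization parameter (lambda).
    """
    m = X.shape[0]
    err = X @ w + b - y                         # f_wb(x^(i)) - y^(i) for all i
    cost = np.sum(err ** 2) / (2 * m)           # squared-error term
    reg = (lambda_ / (2 * m)) * np.sum(w ** 2)  # penalty on the weights only
    return cost + reg
```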

Note: if λ = 0, the regularization term drops out entirely and the model is free to overfit. If λ is very large (say 10^10), the penalty dominates and forces every $w_j$ to a tiny number, almost tantamount to 0, so the model collapses to $f(\mathbf{x}) = b$ and underfits.
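
Both extremes are easy to check numerically. The sketch below uses the closed-form ridge solution, a standard result obtained by setting the gradient of the regularized cost to zero (not something stated in this article; the bias is omitted to keep it short):

```python
import numpy as np

# Closed-form ridge solution: setting the gradient of
# (1/2m)||Xw - y||^2 + (lambda/2m)||w||^2 to zero gives
# w = (X^T X + lambda * I)^(-1) X^T y   (bias omitted for brevity)
X = np.linspace(1, 10, 20).reshape(-1, 1)
y = 3 * X.ravel()                 # data generated with a true slope of 3

for lam in [0.0, 1e10]:
    n = X.shape[1]
    w = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
    print(f"lambda = {lam:g}: w = {w[0]:.8f}")
# lambda = 0    recovers w = 3 exactly (no penalty, free to fit the data)
# lambda = 1e10 crushes w to ~0, leaving only the (omitted) bias
```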

Gradient descent is not left out either; it must account for the regularization term. The gradient with respect to each $w_j$ is:

$$\frac{\partial J(\mathbf{w}, b)}{\partial w_j} = \frac{1}{m} \sum_{i=1}^m \left( f_{\mathbf{w}, b}(\mathbf{x}^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} w_j$$

The gradient with respect to $b$ (unchanged, since $b$ is not penalized):

$$\frac{\partial J(\mathbf{w}, b)}{\partial b} = \frac{1}{m} \sum_{i=1}^m \left( f_{\mathbf{w}, b}(\mathbf{x}^{(i)}) - y^{(i)} \right)$$
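
A matching NumPy sketch of both gradients (again with names of my own choosing; note that only the weights pick up the λ term):

```python
def compute_gradient(X, y, w, b, lambda_):
    """Gradients of the regularized cost with respect to w and b."""
    m = X.shape[0]
    err = X @ w + b - y                          # f_wb(x^(i)) - y^(i), shape (m,)
    dj_dw = (X.T @ err) / m + (lambda_ / m) * w  # each w_j gets the (lambda/m) w_j term
    dj_db = np.sum(err) / m                      # b is not regularized
    return dj_dw, dj_db
```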

Repeat until convergence:

$$w_j = w_j - \alpha \left( \frac{1}{m} \sum_{i=1}^m \left( f_{\mathbf{w}, b}(\mathbf{x}^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} w_j \right)$$

$$b = b - \alpha \left( \frac{1}{m} \sum_{i=1}^m \left( f_{\mathbf{w}, b}(\mathbf{x}^{(i)}) - y^{(i)} \right) \right)$$
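
Putting the update rules together, a minimal training loop built on the compute_gradient sketch above (the values of alpha, lambda_, and the iteration count are illustrative, not prescribed here):

```python
def gradient_descent(X, y, w, b, alpha, lambda_, num_iters):
    """Batch gradient descent with L2 regularization."""
    for _ in range(num_iters):
        dj_dw, dj_db = compute_gradient(X, y, w, b, lambda_)
        w = w - alpha * dj_dw    # simultaneous update of all w_j
        b = b - alpha * dj_db
    return w, b

# Example usage on the toy data from earlier:
# w, b = gradient_descent(X, y, np.zeros(X.shape[1]), 0.0,
#                         alpha=0.01, lambda_=1.0, num_iters=1000)
```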

Summary: picking the right lambda matters! Too small and the model overfits; too large and it underfits.
