1. Gradient Descent: My First Implementation 🚀


Exploring Derivatives in Practice
Context 📚
After completing a Linear Algebra course, I moved on to Calculus. Unexpectedly, I found myself thinking about derivatives and integrals at work, which sparked an idea: why not apply derivatives in practice? 🤔
I had heard that we can compute the gradient (essentially the derivative of the loss function with respect to the parameters) and step against it, downhill, to reach a minimum. So, about a week ago, I decided to implement a simple Linear Regression model using Gradient Descent.
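In symbols, this is the textbook update rule (with θ standing for the parameters, L for the loss, and η for the learning rate, nothing specific to my notebook):

$$
\theta \leftarrow \theta - \eta \, \nabla_{\theta} L(\theta)
$$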
Main 🔧
I successfully implemented Gradient Descent for a Linear Regression model! 🏆 The algorithm works, but it still needs optimization. There’s room for improvement, and I’ll be refining it as my understanding of the topic grows. Stay tuned for updates! 🔄
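Here's a minimal sketch of the core loop. It is not the exact code from my notebook: the synthetic data, true coefficients, learning rate, and epoch count below are illustrative stand-ins, but the update rule is the same idea.

```python
import numpy as np

# Synthetic data: y = 2x + 5 plus Gaussian noise (mean 0, std 1).
# The true coefficients and the data itself are illustrative stand-ins.
rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, size=200)
y = 2.0 * x + 5.0 + rng.normal(0.0, 1.0, size=200)

k, b = 0.0, 0.0   # slope and intercept, both start at zero
lr = 0.05         # learning rate (illustrative, not tuned)

for epoch in range(450):
    error = k * x + b - y              # prediction minus target
    grad_k = 2.0 * np.mean(error * x)  # dL/dk for MSE loss
    grad_b = 2.0 * np.mean(error)      # dL/db for MSE loss
    k -= lr * grad_k                   # step against the gradient
    b -= lr * grad_b

loss = np.mean((k * x + b - y) ** 2)
print(f"k={k:.3f}, b={b:.3f}, loss={loss:.4f}")
```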
Challenges ⚠️
One of the unexpected challenges I encountered was the differing convergence rates of the parameters: the slope *k* was converging much faster than the intercept *b*. It took considerable time to understand the underlying cause and devise a solution.
The issue arises because the gradient with respect to *k* is scaled by *x*, so even a small change in *k* has a large effect on the loss. As *k* is updated and the loss falls, the gradients, and with them the step sizes, shrink in later iterations. This is a problem for *b*: *k* drives the loss down quickly, often before *b* has had a chance to adjust substantially, so the effective step size for *b* collapses.
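Writing out the gradients makes the asymmetry concrete (assuming the standard MSE loss, which is what the sketch above uses):

$$
L = \frac{1}{n}\sum_{i=1}^{n}\bigl(k x_i + b - y_i\bigr)^2,
\qquad
\frac{\partial L}{\partial k} = \frac{2}{n}\sum_{i=1}^{n}\bigl(k x_i + b - y_i\bigr)\,x_i,
\qquad
\frac{\partial L}{\partial b} = \frac{2}{n}\sum_{i=1}^{n}\bigl(k x_i + b - y_i\bigr)
$$

The factor $x_i$ appears only in the gradient for *k*, so whenever the inputs are large in magnitude, *k* takes much bigger steps per iteration than *b* does.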
To address this, possible solutions include:

- implementing an optimizer
- increasing the number of epochs
- setting appropriate initial learning rates for *k* and *b* separately (see the sketch after this list)
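As a sketch of the third option, the two parameters can simply get their own learning rates. The values below are illustrative guesses rather than tuned ones, and the data setup mirrors the earlier sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, size=200)
y = 2.0 * x + 5.0 + rng.normal(0.0, 1.0, size=200)

# Separate learning rates: lr_b is larger so b keeps moving
# even after k's gradient has shrunk. Illustrative values only.
k, b = 0.0, 0.0
lr_k, lr_b = 0.05, 0.3

for epoch in range(450):
    error = k * x + b - y
    k -= lr_k * 2.0 * np.mean(error * x)
    b -= lr_b * 2.0 * np.mean(error)

print(f"k={k:.3f}, b={b:.3f}")
```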
Results 📉
Here’s the link to my Jupyter Notebook on GitHub: GitHub Link
Below, you can see the results and relevant graphs from the training:
Observations 👀:

- The model was trained over 450 epochs.
- The final loss is 0.8027. This is expected: with noise of standard deviation 1 and mean 0, the expected MSE cannot fall below the noise variance σ² = 1, and on a finite sample the empirical loss can land somewhat below that, which is where the 0.8027 comes from.
- The model follows the negative gradient and steadily reduces the loss over time.
More improvements coming soon! 🔥