Neural networks according to me

Keerthi K V

In this article, I will explain how a neural network is built and the underlying mechanism that lets us train one. I will try to explain it as simply and intuitively as possible by making up an imaginary game, and we will get to the in-depth details as we proceed. So here we go.

You are given a city-building game. The rules of the game are as follows:

  • You are given a fixed number of workers to build your city.

  • Workers use tools provided by you to complete their tasks.

  • You can modify the tools as per your wish (for simplicity, you can either increase or decrease the size of a tool).

  • By modifying the tools to the needs of the workers, you should maximize their efficiency.

  • You are provided with a helper, who can go to each worker and collect information about whether they need their tool size to be increased or decreased.

  • The end goal is to minimize the time taken to build the entire city.

Let's play this game. To reach the end goal, we need to maximize the efficiency of each worker by modifying the tools. But how can we modify the tools to increase each worker's efficiency?

At the start of the game, each worker has tools of random sizes. You are also not sure which workers are taking up heavier tasks and which take lighter ones. Just to check how badly we are doing with random tool sizes, we can run the game and see how much time it takes to complete the city. I know this is an imaginary game, but I'll just say the time taken to build the city with random tool sizes is 10 years.

But wait, we have a helper with us who can collect information about the tool-size changes the workers need. Once this helper gives us that information, we can alter each and every tool. Remember that we only know whether we should increase or decrease a tool's size, not by how much. So, to be careful, we alter each tool by a very small amount and run the game again. The time taken to build the city should now be reduced, since we modified the tools according to each worker's request. Let's say the time taken is now 8 years. We can repeat this cycle of collecting information and modifying the tools to the workers' needs multiple times, and get to a point where each worker has a tool fit to perform their task efficiently, thereby reducing the overall time taken to build the entire city, let's say to 5 years.

Great! We have just figured out how to optimize city building. The method we used here is pretty similar to how we train a neural network; it is called stochastic gradient descent (SGD). Let's understand how we use SGD to train neural networks.

Connecting the dots..

Taking the analogy of the game, let's see how neural network training is similar. We have:

  • Input data - workers

  • Weights - tools for the workers

  • Updating weights - increasing/decreasing tool size

  • Reducing the loss - minimizing the time taken to build city

In a nutshell, this is what we need to do:

[Figure: the neural network training loop]

We have some input data collected and randomly initialized weights. We predict the output using these inputs and weights, calculate the loss, update the weights, make predictions using the updated weights, and repeat the same steps until we reach a very small loss.

Loss is a quantitative measure of how far the predictions are from the actual data. If the model is predicting values close to the actual data, we will have a small loss. This is the basis of training a neural network.

Technical explanation

For the sake of simplicity, let's look at a neural net with just one neuron. You can assume that the same logic applies to much bigger neural networks.

[Figure: a single neuron]

Here is what a single neuron does:

  • Takes multiple inputs

  • Takes the same number of weights as inputs

  • Multiplies each input value by its corresponding weight and adds up all the products

  • Adds a bias value to the weighted sum

Mathematically, it computes:

$$z = \sum_{i=1}^{n} w_i x_i + b$$
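To make this concrete, here is a minimal sketch of a single neuron in plain Python. The input values, weights, and bias below are made-up numbers, purely for illustration:

```python
def neuron(inputs, weights, bias):
    """Compute z = w1*x1 + w2*x2 + ... + wn*xn + b for a single neuron."""
    # Multiply each input by its corresponding weight, sum the products,
    # then add the bias.
    return sum(w * x for w, x in zip(weights, inputs)) + bias

# Made-up example: three inputs, three matching weights, and a bias.
inputs = [2.0, 3.0, 1.0]
weights = [0.5, -0.2, 0.8]
bias = 0.1
print(neuron(inputs, weights, bias))  # 0.5*2.0 - 0.2*3.0 + 0.8*1.0 + 0.1 = 1.3
```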

The output of a neuron (or of a neural net) is a prediction for the given input data; we can take this prediction and compare it with the actual value. By doing so, we can understand how well the neural net is predicting.

Let's take a simple method to calculate the effectiveness of a neural net: the mean absolute error (MAE). All we do here is, for each prediction the model has made, calculate the absolute error (the absolute difference between the prediction and the actual value) and take the mean of these absolute errors. The MAE gives us an idea of how far off the model's predictions are from the actual values.
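As a quick sketch, with made-up predictions and actual values:

```python
def mean_absolute_error(predictions, actuals):
    """Average of |prediction - actual| over all data points."""
    errors = [abs(p - a) for p, a in zip(predictions, actuals)]
    return sum(errors) / len(errors)

# Made-up predictions vs. actual values.
predictions = [12.0, 7.5, 3.1]
actuals = [10.0, 8.0, 3.0]
print(mean_absolute_error(predictions, actuals))  # (2.0 + 0.5 + 0.1) / 3 ≈ 0.87
```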

I often use the words “neural net” and “model” interchangeably while explaining concepts; just know that I mean the same thing.

At this point, we have a model that takes inputs and weights and predicts an output, and we have a way to calculate how well or badly the model is predicting. The input is fixed and cannot be modified, but we can modify the weights and bias. We need to find a way to change these weights so that the loss is minimized. Let's look at how we update the weights and bias.

The derivative!!

There is a concept of derivatives in mathematics that you might have studied in high-school math. I don't want to get into all the math around it now, but simply put, the derivative of a function with respect to a value tells us the effect that value has on the function. What I mean is that the derivative tells us what happens to the result if you increase or decrease the value. The derivative of a function with respect to any value can be either positive or negative.

In the case of our neural net, we need to calculate the derivative of the loss function (MAE in our case) with respect to each weight. We take derivatives of the loss function because that is what gives us a measure of our model's performance. Now, if the derivative is positive, increasing the weight will increase the loss; if the derivative is negative, increasing the weight will decrease the loss. So we increase or decrease each weight in whichever direction reduces the loss. These derivatives are also called gradients.

If you remember the game we played earlier, you might notice that the helper was actually doing this work of getting the gradients from the workers.
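One way to picture what the helper does is a finite-difference check: nudge each weight by a tiny amount and see whether the loss goes up or down. This numerical approach is just for intuition (real frameworks compute gradients analytically via backpropagation), and it reuses the hypothetical `neuron` and `mean_absolute_error` sketches from above:

```python
def loss_for_weights(weights, bias, data):
    """MAE of the single neuron over a small dataset of (inputs, actual) pairs."""
    predictions = [neuron(x, weights, bias) for x, _ in data]
    actuals = [y for _, y in data]
    return mean_absolute_error(predictions, actuals)

def estimate_gradients(weights, bias, data, eps=1e-5):
    """Approximate the derivative of the loss w.r.t. each weight by nudging it."""
    base = loss_for_weights(weights, bias, data)
    grads = []
    for i in range(len(weights)):
        nudged = list(weights)
        nudged[i] += eps  # increase just this one weight, very slightly
        # Positive value: increasing this weight increases the loss (and vice versa).
        grads.append((loss_for_weights(nudged, bias, data) - base) / eps)
    return grads
```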

OK, now that we know how to optimize the weights to reduce the loss, all we need to do is create a loop that looks like this:
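Here is a minimal sketch of that loop in plain Python, reusing the hypothetical pieces from above. The dataset, learning rate, and stopping threshold are all made-up choices:

```python
import random

# Made-up dataset: pairs of (inputs, actual value).
data = [
    ([2.0, 3.0, 1.0], 1.5),
    ([1.0, 0.5, 2.0], 0.9),
    ([0.0, 1.0, 1.0], 0.3),
]

weights = [random.uniform(-1, 1) for _ in range(3)]  # random sizes, as in the game
bias = 0.0
learning_rate = 0.01  # the "very small amount" we alter each tool by

for step in range(1000):
    loss = loss_for_weights(weights, bias, data)
    if loss < 0.05:  # stop once we are satisfied with the loss
        break
    grads = estimate_gradients(weights, bias, data)
    # Move each weight a small step opposite to its gradient, so the loss goes down.
    weights = [w - learning_rate * g for w, g in zip(weights, grads)]
```

(For simplicity, this sketch leaves the bias fixed and measures the loss on the whole tiny dataset at every step; SGD proper estimates the gradient from a random sample or mini-batch of the data at each step, which is much cheaper on large datasets.)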

The loop ends when we are satisfied with the loss.

This process of updating the weights to reduce the loss by following their gradients is what's called stochastic gradient descent. And this is how neural networks are built.


If you have read this far, thank you for your time. I hope the article was informative. Feel free to comment corrections, suggestions or your thoughts on the article. Bye 👋
