AI isn’t magic — it’s just math!


Can computers really think?
Maybe yes, but not just yet 🥲. For now it's just a bunch of algorithms running and trying to give us humans what we want from them.
But then how exactly do ChatGPT, Gemini, Llama, etc. work?
They all run different types of algorithms which these billion-dollar companies think are the right ones to give the best answers to user queries. Let's look a little deeper into the mechanics of it.
Mechanics behind returning output
When we ask any so-called AI model a query, let's say "How are you?", we both understand it because it's in English. But what if someone wrote it in Chinese, "你好吗?" We can't understand it, right! Computers have the same problem: they can't understand human languages either, but they are great at understanding numbers. That's what they were made for. Back to the topic…
When a user writes the query "How are you?", it is first broken into parts, for example: "how", "are", "you", "?". We have broken this sentence into 4 parts. Now we convert them into numbers, i.e. computer language, which could look like 10 15 20 25. These numbers are called Tokens, and this whole process of breaking into parts and converting them to numbers is called Tokenization.
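The step above can be sketched in a few lines of Python. The vocabulary and token IDs here are made up just for illustration (real tokenizers like BPE learn their vocabulary from huge amounts of text and often split words into sub-word pieces):

```python
# Toy tokenizer: a made-up vocabulary mapping each part to a token ID.
vocab = {"how": 10, "are": 15, "you": 20, "?": 25}

def tokenize(query: str) -> list[int]:
    # Separate the trailing "?" so it becomes its own part, then split on spaces.
    parts = query.lower().replace("?", " ?").split()
    return [vocab[p] for p in parts]

print(tokenize("How are you?"))  # → [10, 15, 20, 25]
```

Four parts in, four token IDs out. Real models use vocabularies with tens of thousands of entries, but the idea is the same.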
In the next step, we give the computer a huge multi-dimensional array (the special dictionary I mentioned in the diagram above); in technical terms we call it Vector Embeddings (how it is made and where it comes from, let's understand some other day). This dictionary is used to figure out the semantic meaning of the tokens we gave it. Example: in "the river bank" and "the icici bank", the word "bank" means different things, so based on the sentence the model finds the meaning of the word. Now the computer uses this dictionary to match possible answers and give an output. But what's interesting is how the computer matches.
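To see how "meaning" becomes numbers, here is a tiny sketch using cosine similarity. The 3-number embeddings below are invented just for the demo (real embeddings have hundreds or thousands of dimensions and are learned during training); the point is that words with similar meanings get vectors pointing in similar directions:

```python
import math

# Made-up 3-dimensional embeddings, purely illustrative.
embeddings = {
    "river": [0.9, 0.1, 0.0],
    "water": [0.8, 0.2, 0.1],
    "icici": [0.0, 0.1, 0.9],
}

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # 1.0 means the vectors point the same way; near 0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# "river" lands much closer to "water" than to "icici".
print(cosine_similarity(embeddings["river"], embeddings["water"]))
print(cosine_similarity(embeddings["river"], embeddings["icici"]))
```

With context like "river" nearby, the "water body" sense of "bank" wins; with "icici" nearby, the "financial institution" sense wins.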
How Computer Matches
It's going to get a little technical here. We know that "bank" comes after "river", but does the computer know it? Absolutely not. So to overcome this we give each token a position, called Positional Encoding.
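One common recipe is the sinusoidal positional encoding from the original Transformer paper: each position gets a unique pattern of sine and cosine values, which is added to the token's embedding. A small sketch (d_model is kept tiny here just for readability):

```python
import math

def positional_encoding(position: int, d_model: int) -> list[float]:
    # Even indices use sin, odd indices use cos, at different frequencies,
    # so every position gets its own distinctive fingerprint.
    pe = []
    for i in range(d_model):
        angle = position / (10000 ** ((2 * (i // 2)) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

print(positional_encoding(0, 4))  # → [0.0, 1.0, 0.0, 1.0]
print(positional_encoding(1, 4))  # a different pattern for position 1
```

Because position 0 and position 1 get different fingerprints, "river bank" and "bank river" no longer look identical to the model.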
Now that positions are added, does it mean everything is correct? Take "the bank river": it may be a typo, but the user has given it anyway, so we have to deal with it. For this, we give every token a chance to talk to every other token and update its representation; this is called Self-Attention. Doing this in parallel across multiple aspects (how, why, when, etc.) is called Multi-Head Attention. Example: assume a brownish-black, thick-haired, big dog is coming your way. Multiple things would come to your mind in parallel: why is he coming towards me, will he bite me, who is the owner of this dog, should I run, etc. This is what we do in Multi-Head Attention: we find different characteristics of a token and let them update the representations of other tokens.
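Here is a stripped-down sketch of the "tokens talking to each other" step. In this toy version each token's vector serves as its own query, key, and value; real models learn separate Q, K, V weight matrices, and Multi-Head Attention runs several of these in parallel:

```python
import math

def softmax(xs: list[float]) -> list[float]:
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens: list[list[float]]) -> list[list[float]]:
    # Scaled dot-product attention: every token scores itself against
    # every other token, then becomes a weighted mix of all of them.
    d = len(tokens[0])
    output = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in tokens]
        weights = softmax(scores)
        mixed = [sum(w * value[i] for w, value in zip(weights, tokens))
                 for i in range(d)]
        output.append(mixed)
    return output

# Three toy token vectors; after attention, each one carries some
# information from the other two.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
updated = self_attention(vectors)
```

Because every token attends to every other token, even an odd ordering like "the bank river" still lets "bank" and "river" exchange information.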
readers: uff… enough already!
writer: no no, I'm not giving you a break just yet 😂
Going forward, we pass all the updated tokens to a black box which we call a Neural Network (the Feed-Forward layer). It predicts what the next set of tokens should be for the given tokens, which would essentially be the answer to the user's query. But wait, hold on!! We are still in the training phase. Oh, yes. While training, when we get the final output from the Neural Network, we calculate a loss, i.e. how far we are from the correct answer. Let's say for "how are you?" it outputs "hii hhaaa". So we calculate a loss, update the weights, and rerun the full Transformer again until we get what is correct: "I am fine." But remember, all this is just training. Let's learn about inferencing, i.e. actually using it. But before that, let's take a quick peek at what a Transformer is.
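The "how far are we from the answer" part is usually measured with cross-entropy loss. A minimal sketch, with made-up probabilities just to show the idea:

```python
import math

def cross_entropy(predicted_probs: list[float], correct_index: int) -> float:
    # The loss is the negative log of the probability the model
    # assigned to the correct next token: confident and right → small
    # loss; confident and wrong → big loss.
    return -math.log(predicted_probs[correct_index])

# Model put 80% probability on the correct token → small loss.
print(cross_entropy([0.1, 0.8, 0.1], correct_index=1))
# Model put only 20% on the correct token → larger loss.
print(cross_entropy([0.7, 0.2, 0.1], correct_index=1))
```

Training repeatedly nudges the weights in whatever direction shrinks this number, which is how "hii hhaaa" gradually turns into "I am fine."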
A type of architecture which gives a meaningful output for a given input is known as a Transformer. And a Transformer which is pretrained to predict the next set of tokens is known as a Generative Pretrained Transformer (GPT).
Inferencing
Here, when the user gives a query, there could be multiple possible outputs for it.
q: “how are you?“
ans1: “I am fine.”
ans2: “I am good.“
ans3: “I am doing great! how about you?“
To choose one, we have a Linear layer. It gives out a score (a logit) for each answer, like
ans1: 10
ans2: 50
ans3: 90
Then finally comes Softmax, which turns those scores into probabilities, and an answer is chosen based on them. We can control this using a parameter known as Temperature: the higher the temperature, the more random the chosen answer will be.
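Using the three scores above, here is a sketch of softmax with temperature. Dividing the logits by the temperature before applying softmax is the standard trick: a low temperature makes the biggest score dominate, while a high temperature flattens the probabilities out:

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float = 1.0) -> list[float]:
    # Divide by temperature, then apply a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [10.0, 50.0, 90.0]  # scores for ans1, ans2, ans3 from the Linear layer

# Low temperature: nearly all the probability lands on ans3.
print(softmax_with_temperature(logits, temperature=1.0))
# High temperature: probabilities spread out, so sampling gets more random.
print(softmax_with_temperature(logits, temperature=50.0))
```

That is why chat UIs with a low temperature keep giving you "I am fine." while a high temperature sometimes surprises you with "I am doing great! how about you?"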
Conclusion
AI = Data + Algorithms
Current AI is designed to predict the next set of words/tokens, not the answer to our query.
AI can't really think, so all creative jobs are safe, chill 😎.
#chaicode
Written by Priyansh Khandelwal