AI: From Language Translator to the Biggest Revolution

Prabhutva gupta
2 min read

Well, we have all used ChatGPT or some kind of "AI" in one way or another. What we call AI today is built on a framework that was developed years ago at Google for translation (Google Translate). Different languages have different constraints, definitions, and contexts for each word, so translating something from English into a native language is not just a matter of translating the words one by one; it requires understanding the sentence and the context in which the words are said. To meet this challenge, Google developed the Transformer model architecture, which later became the basis of "the next word predictor", as OpenAI's chief scientist Ilya Sutskever has described these models.

No matter how complicated this diagram may look, it is as simple as it can get.

Step 1: It involves turning the words into something we call tokens. In the input embedding part, the words are embedded and linked with numbers called tokens, and this process is called tokenisation.

Since transformers don't understand order by default, positional encoding is added to indicate the position of each word, as sketched below.
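Here is a minimal Python sketch of both ideas. The vocabulary, sentence, and sizes are made up purely for illustration; real models use learned subword tokenisers rather than a hand-written dictionary.

```python
import numpy as np

# Toy tokenisation: a made-up vocabulary just to show the idea of
# mapping words to numbers (tokens).
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
sentence = "the cat sat on the mat"
token_ids = [vocab[w] for w in sentence.split()]
print(token_ids)  # [0, 1, 2, 3, 0, 4]

# Sinusoidal positional encoding (the scheme from the original Transformer
# paper): each position gets a unique pattern of sine/cosine values that is
# added to the token embedding so the model knows the order of the words.
def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]           # positions 0 .. seq_len-1
    i = np.arange(d_model)[None, :]              # embedding dimensions
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])        # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])        # odd dimensions use cosine
    return pe

print(positional_encoding(len(token_ids), 8).shape)  # (6, 8)
```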

Step 2: Now we have words linked with numbers, but those numbers don't carry any real meaning yet. For the model to understand the meaning and context of the words, and to work out what is important for forming an output, it uses vector embeddings. The embeddings feed into multi-head attention, which looks at all the words at once and figures out what matters. A vector embedding stores the semantic relationship of a word (what it means in real life) as a list of numbers, which can be visualised as points in a graph.

[Embedding projector view: 10,000 points in 200 dimensions.] This is what a basic vector-embedding graph looks like; you can visit https://projector.tensorflow.org/ to explore it yourself.
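To give a feel for what one attention head does with those embeddings, here is a rough sketch. Random numbers stand in for the learned embedding table and weight matrices; multi-head attention simply runs several of these heads in parallel.

```python
import numpy as np

np.random.seed(0)
vocab_size, d_model = 5, 8
token_ids = [0, 1, 2, 3, 0, 4]                  # from the tokenisation sketch above

# Embedding table: one vector of numbers per token. Here it is random;
# in a trained model these vectors capture meaning, so related words
# end up close together in the space.
embeddings = np.random.randn(vocab_size, d_model)
x = embeddings[token_ids]                        # (seq_len, d_model)

# One attention head (scaled dot-product attention): every word scores
# every other word, and the scores decide what to pay attention to.
Wq, Wk, Wv = (np.random.randn(d_model, d_model) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv
scores = Q @ K.T / np.sqrt(d_model)              # relevance of word j to word i
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
output = weights @ V                             # context-aware word representations
print(output.shape)                              # (6, 8)
```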

Now the model applies the learnt information to generate the output: the input and the output of each block are added together, and the block is repeated several times until the desired result is generated. A rough sketch of that step follows.
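In code, that "added together" step is a residual connection. The block body below is just a stand-in for the real attention and feed-forward layers (which also apply normalisation), used only to show the add-and-repeat pattern.

```python
import numpy as np

def transformer_block(x):
    # Stand-in for attention + feed-forward; the key point is that the
    # block's input is added back to its output (a residual connection).
    return x + np.tanh(x @ np.random.randn(x.shape[-1], x.shape[-1]))

x = np.random.randn(6, 8)       # word representations from the previous step
for _ in range(6):              # e.g. 6 stacked layers, as in the original paper
    x = transformer_block(x)
```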

Step 3: The probabilities of the candidate words are calculated, and the word with the highest probability is accepted as the output.
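Concretely, the final layer produces one score per word in the vocabulary, and a softmax turns those scores into probabilities. The numbers below are toy values for illustration only.

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]
logits = np.array([2.0, 0.5, 0.1, -1.0, 3.2])   # one score per vocabulary word

# Softmax turns scores into probabilities; picking the highest one
# (greedy decoding) gives the next word.
probs = np.exp(logits) / np.exp(logits).sum()
next_word = vocab[int(np.argmax(probs))]
print(next_word, probs.max())                    # "mat" is the most probable next word
```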

With every input, the learnings are applied and used to generate and calculate probabilities more efficiently, and the outputs are refined.

