Working of Transformers

Chethan G

#chaicode
In 2017, Google researchers published the paper “Attention Is All You Need”, which introduced the Transformer architecture. It was originally created to enhance Google Translate’s capabilities.

Architecture of Transformers

This is the architecture proposed in the research paper, and it is the brain behind all the next-word predictors out there, e.g. ChatGPT, Gemini 2.5, Claude, etc.

  1. Step 1: Tokenisation converts the input string into ‘tokens’: each chunk of text is assigned a numeric ID from the model’s vocabulary (the vocab size is the number of distinct tokens the model knows). A small code sketch of Steps 1–3 follows this list.

  2. Step 2: Vector embedding maps each token to a vector in a high-dimensional space (often visualised in 3D), so that tokens with related meanings or semantics end up close together.

  3. Step 3: Positional encoding adds information about each token’s position in the input, so the same word at a different position produces a different vector.

    Ex: “Chethan bunked college” and “College bunked Chethan” are two different things.

  4. Step 4: Self-attention is the crucial mechanism that lets the token vectors communicate with each other, each position weighing every other position, and it is what ultimately drives the probabilities for the next word (sketched after this list).

  5. Step 5: Decoding begins: the model produces the output one token at a time, feeding each predicted token back in to predict the next.
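
To get a rough feel for Steps 1–3, here is a minimal sketch. The toy vocabulary, the random embedding matrix, and the small embedding dimension are illustrative assumptions, not the setup of any real model; only the sinusoidal positional-encoding formula comes from the original paper:

```python
import numpy as np

# Step 1: Tokenisation -- map each word to an integer ID from a toy vocabulary.
vocab = {"chethan": 0, "bunked": 1, "college": 2}        # vocab size = 3 (real models use ~50k+)
tokens = [vocab[w] for w in "chethan bunked college".split()]

# Step 2: Vector embeddings -- look each token ID up in a learned matrix.
d_model = 8                                              # embedding dimension (assumed, tiny for demo)
embedding_matrix = np.random.randn(len(vocab), d_model)  # learned during training; random here
x = embedding_matrix[tokens]                             # shape: (seq_len, d_model)

# Step 3: Positional encoding -- sinusoidal pattern added to the embeddings
# so that word order is visible to the model.
pos = np.arange(len(tokens))[:, None]                    # positions 0..seq_len-1
i = np.arange(d_model)[None, :]                          # embedding dimensions 0..d_model-1
angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
pe = np.where(i % 2 == 0, np.sin(angles), np.cos(angles))
x = x + pe                                               # same word, different position -> different vector
print(x.shape)                                           # (3, 8)
```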

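Steps 4 and 5 boil down to scaled dot-product attention followed by picking the next token from the predicted probabilities. The sketch below uses randomly initialised weight matrices, a made-up output projection, and simple greedy decoding purely for illustration; a trained model would have learned all of these weights:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, d_k=8):
    # Step 4: every position builds a query, key, and value vector,
    # then mixes in information from every other position.
    rng = np.random.default_rng(0)
    W_q, W_k, W_v = (rng.standard_normal((x.shape[1], d_k)) for _ in range(3))
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(d_k)          # how strongly each token attends to the others
    return softmax(scores) @ V               # weighted blend of value vectors

# Step 5: project the last position onto the vocabulary and pick the next token.
x = np.random.randn(3, 8)                    # stand-in for embeddings + positional encoding
attended = self_attention(x)
W_out = np.random.randn(8, 3)                # (d_k, vocab_size) output projection (assumed)
probs = softmax(attended[-1] @ W_out)        # probability of each vocabulary word coming next
next_token = int(np.argmax(probs))           # greedy decoding: take the most likely token
print(probs, next_token)
```
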
Conclusion

In reality, GPT, or Generative Pre-trained Transformer, is just a next-word predictor, but one with immense capability to change the future of work.
