Analyzing the Architectural Components in Large Language Models (LLMs)

Arnab Dey

We’ll explore the architecture of the GPT-2 model and understand how it generates text, diving into the intricacies of LLMs and their pivotal role in the realm of GenAI. I’ll be doing this in Google Colab with Python and a couple of packages called transformers and torch. The reason I’m looking at GPT-2 rather than a later model such as GPT-3.5 or GPT-4 is that it lets us follow the data step by step as it passes through the model and see how it predicts the next word in the text. GPT-2 is simply a transformer-based model designed for text generation.

!pip install transformers
!pip install torch

Loading the GPT-2 Model

GPT-2 is a transformer-based model designed for text generation. Its architecture includes:

Embedding Layer: Converts input tokens into embeddings.

Transformer Blocks: Process embeddings to generate contextual representations.

Output Layer: Produces probabilities for each token in the vocabulary.

From those probabilities, the model selects the next token (a word or a piece of a word), and that completes one pass through the transformer.

So, let’s install transformers and torch, which contain the actual GPT-2 model for us to play with. The model is already pre-trained, and I won’t be doing any training here, but I’ll show you how it takes input text and gives us output. Go ahead and install both with pip install transformers and pip install torch.

from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

Let’s import the GPT-2 tokenizer and the language-model head (GPT2LMHeadModel), and instantiate both from the pre-trained checkpoint named “gpt2”.
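
To connect the loaded model back to the three components listed above, we can peek at its submodules. This is just an optional inspection sketch; the attribute names come from the Hugging Face transformers implementation of GPT-2:

# Optional: inspect the model's main components
print(model.transformer.wte)      # embedding layer: token IDs -> embeddings
print(model.transformer.wpe)      # positional embeddings
print(len(model.transformer.h))   # number of transformer blocks (12 for 'gpt2')
print(model.lm_head)              # output layer: hidden states -> vocabulary logits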

Tokenization

Tokenization converts input text into tokens, which are numerical IDs the model understands.

So, I start with the text "Machine learning is". That's the input we're trying to pass into the model. Let’s see what the tokenizer does and how it turns this text into token IDs.

We get the input IDs by calling tokenizer.encode on the text and asking for the result as a PyTorch tensor with return_tensors='pt'. The returned tensor has a batch dimension, so the tokens for our single sentence sit at index 0; we convert that row to a list so we can print it.

text = "Machine learning is"
input_ids = tokenizer.encode(text, return_tensors='pt')
print(f"Input Text: {text}")
print(f"Tokens: {input_ids[0].tolist()}")
print(f"Tokenized Text: {tokenizer.convert_ids_to_tokens(input_ids[0])}")

Now you'll see three representations of our text. The first is the human-readable form, which says, Machine learning is, exactly what we’d expect. The second is the numerical representation of 'Machine learning is': numbers instead of words, and this is what the model actually operates on. The third is the tokenized text.

In the tokenized text, the second and third tokens appear as 'Ġlearning' and 'Ġis'. The 'Ġ' prefix isn’t an error; it’s how GPT-2’s byte-level BPE tokenizer marks a space before a token, so decoding the IDs recovers the original spacing.
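
As a quick check (reusing the tokenizer and input_ids from above), decoding the IDs gives back the original string, spaces included:

# Decoding the token IDs turns the Ġ markers back into ordinary spaces
print(tokenizer.decode(input_ids[0]))  # -> Machine learning is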

Model inference

When we pass those tokenized inputs through the model, it produces a probability distribution over the vocabulary for the next token.

By using torch.no_grad(), we avoid computing gradients, making the process faster. The argmax function finds the most likely next token ID, which we then decode back into a word using the tokenizer. For example, given the input "Machine learning is", the model predicts the next word to be "a".
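
A minimal sketch of that step, reusing the model, tokenizer, and input_ids defined above:

import torch

with torch.no_grad():                       # skip gradient tracking; inference only
    outputs = model(input_ids)
    logits = outputs.logits                 # shape: (batch, sequence_length, vocab_size)

next_token_id = torch.argmax(logits[0, -1, :]).item()  # most likely next token ID
next_word = tokenizer.decode([next_token_id])
print(f"Predicted next token: {next_word!r}")           # e.g. ' a'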

Next, we define a function generate_text, which generates text from the prompt passed to it. The first step is to encode the input text into token IDs with tokenizer.encode(text, return_tensors='pt'). We then feed these token IDs into model.generate, which accepts various parameters that control the output, such as the maximum length, beam search, blocking repeated n-grams, and early stopping. The model returns output tokens, which we convert back to human-readable text with tokenizer.decode(output[0], skip_special_tokens=True). This is how the text is generated. For example, if we provide the prompt "Machine learning is" and call generate_text with a maximum length of 50, it generates additional words to complete the sentence, which are then displayed as output. A sketch of this function is shown below.
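
Here is a sketch of that generate_text helper; the specific parameter values (beam width, n-gram size) are illustrative choices rather than necessarily the ones in the original notebook:

def generate_text(prompt, max_length=50):
    # Encode the prompt into token IDs (PyTorch tensor with a batch dimension)
    input_ids = tokenizer.encode(prompt, return_tensors='pt')
    # Generate a continuation; parameter values here are illustrative
    output = model.generate(
        input_ids,
        max_length=max_length,      # cap on total tokens (prompt + generated)
        num_beams=5,                # beam search instead of greedy decoding
        no_repeat_ngram_size=2,     # block repeated 2-grams
        early_stopping=True,        # stop when all beams reach an end token
    )
    # Convert the generated token IDs back into human-readable text
    return tokenizer.decode(output[0], skip_special_tokens=True)

print(generate_text("Machine learning is", max_length=50))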

Output:

Machine learning is one of the most promising areas of research in the field of artificial intelligence. In the past few years, researchers have been working on ways to train neural networks to perform tasks that are difficult for humans to do. For example,

Code : https://colab.research.google.com/drive/11znleU7iKbP3AhcWxbUYqjECsD7SlL6b?usp=sharing

In my next blog post, we will implement techniques to fine-tune large language models for specific generative tasks.

If you enjoyed this article, share it with your friends and colleagues! 🎉🎉🎉
