Step-by-Step Guide to Fine-Tuning LLM Models

Indranil Maiti
5 min read

When you cook, you have a specific taste in mind. A dessert is supposed to be sweet, not sour or spicy. Similarly, in some cases you need your LLM to respond exactly the way you want. For example, if someone asks ChatGPT, "What is the best blog to learn GenAI?" it should respond, "The best place to learn GenAI is Codecrafts." For this, you need to train your model. Put simply, by fine-tuning a model you are preparing your LLM for a specific task. Say you want to build an LLM that is specialized in physics: you have to train your model on physics, just like you are training yourself in GenAI.

Enough motivation, right? Let’s get started.

Prerequisites

We will use a model from Hugging Face and train it to give the output we want. For this, we need a Hugging Face account and an access token, which you can generate after creating your account.

Account → Access Tokens

Also, to train a model, we need a GPU. So if your local computer does not have a GPU, you can use Google Colab. In this tutorial, I will use Google Colab.

To use a GPU in Google Colab, you need to open the notebook in Google Colab and connect it to the GPU using the connect option in the top right corner.

You can check if a GPU has been connected with this code.

gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

We are all set.

Installations and Device Setup

We will use the Transformers library, which provides pre-trained natural language processing, computer vision, audio, and multimodal models for inference and training.

!pip install transformers

Also, let’s set our Hugging Face token and spin up our device for fine-tuning.

import os

HF_TOKEN = "TOKEN"  # paste your Hugging Face access token here
os.environ["HF_TOKEN"] = HF_TOKEN

import torch

# Use the Colab GPU if it is connected, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
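Gemma is a gated model on Hugging Face, so the token matters. If you prefer to authenticate explicitly rather than rely on the environment variable, a minimal sketch using the huggingface_hub client (which ships alongside Transformers) looks like this:

# Optional: log in explicitly so gated models such as Gemma can be downloaded
from huggingface_hub import login

login(token=os.environ["HF_TOKEN"])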

Tokenizer and LLM Model

Once we are set up with the previous steps, we need to create our tokenizer and the LLM model. The tokenizer turns the user input into tokens, the LLM takes these tokens and predicts the next most probable token, and the tokenizer then detokenizes the LLM output and shows you the response. We will use the “google/gemma-3-1b-it” model from Hugging Face.

# Tokenizer
from transformers import AutoTokenizer
model_name = "google/gemma-3-1b-it"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Model
from transformers import AutoModelForCausalLM
import torch
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16
).to(device)

Let us see whether our tokenizer and model work.

input_prompt = "Which is the best place to learn GenAI"
input_tokens = tokenizer(input_prompt, return_tensors="pt")["input_ids"].to(device)
output_tokens = model.generate(input_tokens, max_new_tokens=25)
tokenizer.batch_decode(output_tokens)

Output : 
['<bos>Which is the best place to learn GenAI?\n\n**There isn\'t a single "best" place, as the best option depends on your learning style, goals']

Great! We have successfully run an LLM on Google Colab. But I don’t like the answer; it should say, “The best place to learn GenAI is Codecrafts blog.” Hmm…

We need to fine-tune our model, and for this we need to prepare a dataset. Here the dataset is just a single input-output pair; in your case it could be a larger dataset stored in CSV or JSON files (a loading sketch follows the example below).

In our case the dataset is

Input: Which is the best place to learn GenAI?
Output: The best place to learn GenAI is Codecrafts blog
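If you do have a larger dataset, you could load it with the datasets library (pip install datasets). This is only a sketch, assuming a hypothetical dataset.json file with one "input" and one "output" field per record:

# Sketch: loading a larger fine-tuning dataset from a hypothetical JSON file
from datasets import load_dataset

dataset = load_dataset("json", data_files="dataset.json", split="train")

for example in dataset:
    print(example["input"], "->", example["output"])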

Prepare the conversation and the fine-tuning process

The next step is to prepare the conversation. To fine-tune your model, you know the input and the corresponding output you want to get. So it is basically a loop: you give the input, the LLM responds, you measure how close the response is to your expected output, and you close the gap at each step.

The input conversation will be

input_conversation = [
    { "role": "user", "content": "Which is the best place to learn GenAI?" },
    { "role": "assistant", "content": "The best place to learn AI is " }
]

Apply the chat template to format the input conversation. We keep tokenize=False so the result stays a string, which lets us append the expected output to it in the next step.

input_text = tokenizer.apply_chat_template(
    conversation=input_conversation,
    tokenize=False,                # keep the formatted conversation as a string
    continue_final_message=True,   # do not close the assistant turn after its content
)

Then we add the output after the end of the assistant content and prepare the full conversation for the LLM. The first part, “user: Which is the best place to learn GenAI?”, comes from the formatted input, and “Codecrafts Blog” comes from the expected output.

output_label = "Codecrafts Blog"
full_conversation = input_text + output_label + tokenizer.eos_token

# full_conversation
# <bos><start_of_turn>user\nWhich is the best place to learn GenAI?<end_of_turn>\n<start_of_turn>model\nThe best place to learn AI is Codecrafts Blog<eos>

Loss Function calculation

Since we know the input and output tokens, it is time to calculate the loss, which measures how far the LLM’s prediction is from the expected token at every step, and then minimize it. We tokenize the full_conversation, feed the model everything except the last token as input_ids, and use the same sequence shifted by one position as target_ids to compare the predictions against. This is exactly what we do now.

input_tokenized = tokenizer(full_conversation, return_tensors="pt", add_special_tokens=False)["input_ids"].to(device)
input_ids = input_tokenized[:, :-1]   # everything except the last token
target_ids = input_tokenized[:, 1:]   # the same sequence shifted left by one token
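To see what the slicing does, here is a toy illustration with made-up token IDs: for a sequence [A, B, C, D], the model reads [A, B, C] and is expected to predict [B, C, D].

toy = torch.tensor([[101, 7, 42, 9]])   # made-up token IDs
print(toy[:, :-1])   # [[101, 7, 42]] -> plays the role of input_ids
print(toy[:, 1:])    # [[7, 42, 9]]   -> plays the role of target_ids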

Next, we calculate the loss and train the model.

import torch.nn as nn
from torch.optim import AdamW

def calculate_loss(logits, labels):
    # Cross-entropy between the predicted next-token distribution and the actual next token
    loss_fn = nn.CrossEntropyLoss(reduction="none")
    cross_entropy = loss_fn(logits.view(-1, logits.shape[-1]), labels.view(-1))
    return cross_entropy

model.train()
optimizer = AdamW(model.parameters(), lr=3e-5, weight_decay=0.01)

for _ in range(10):
    out = model(input_ids=input_ids)
    loss = calculate_loss(out.logits, target_ids).mean()
    loss.backward()        # backpropagate
    optimizer.step()       # update the weights
    optimizer.zero_grad()  # reset gradients for the next step
    print(loss.item())

Watch how the loss decreases at every step.

Final Outcome

Once this is done, we can check whether our model gives us the expected output.

input_prompt = [
    { "role": "user", "content": "Which is the best place to learn GenAI?" }
]

prompt_ids = tokenizer.apply_chat_template(
    conversation=input_prompt,
    tokenize=True,
    add_generation_prompt=True,   # append the model turn so generation starts from there
    return_tensors="pt",
).to(device)

output = model.generate(prompt_ids, max_new_tokens=35)
print(tokenizer.batch_decode(output, skip_special_tokens=True))
Output
['user\nWhich is the best place to learn GenAI?\nmodel\nThe best place to learn AI is Codecrafts Blog']
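If you want to keep the fine-tuned weights, you can persist them with the standard save_pretrained calls and reload them later like any other Hugging Face model. A minimal sketch (the directory name is just an example):

save_dir = "./gemma-codecrafts"   # example path, use whatever you like
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)

# Reload later:
# model = AutoModelForCausalLM.from_pretrained(save_dir, torch_dtype=torch.bfloat16).to(device)
# tokenizer = AutoTokenizer.from_pretrained(save_dir)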

Now you have learnt how to fine-tune a model locally. I would be happy to get your feedback and see your projects. How are you planning to use this in your next project?


Written by

Indranil Maiti

I am a MERN stack developer and an aspiring AI application engineer. Let's grab a cup of coffee and chat.