Fine-Tuning a Pretrained Model for Beginners

Tarun Sharma
5 min read

A Step-by-Step Guide to Training Your Own AI Model


1. What is Fine-Tuning?

Fine-tuning is like teaching a well-read scholar to specialize in a new field. Imagine someone who’s read thousands of books (the pretrained model) – we’re giving them a crash course in computer science using your PDFs.

Key Buzzwords Demystified

  1. Pretrained Model: A model already trained on massive general data (e.g., GPT-2 trained on internet text).

  2. Tokenization: Converting text to numbers the model understands (like translating words to a secret code); a tiny example follows this list.

  3. Epoch: One full pass through your training data (like studying a textbook cover-to-cover once).

  4. Loss: A measure of how wrong the model’s predictions are (lower = better).
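
To make tokenization concrete, here's a tiny sketch using the same GPT-2 tokenizer that appears later in Step 2:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
ids = tokenizer.encode("Fine-tuning is fun!")   # Text -> a list of integer token IDs
print(ids)                                      # The "secret code" the model actually reads
print(tokenizer.decode(ids))                    # And back to readable text again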


2. Step-by-Step Process

Let’s use your computer science books example:

Step 1: Prepare Your Data

  • Extract Text from PDFs:

      from pypdf import PdfReader
    
      def extract_text(pdf_path):
          text = ""
          reader = PdfReader(pdf_path)
          for page in reader.pages:
              # Guard against pages with no extractable text (e.g., scanned images)
              text += (page.extract_text() or "") + "\n"
          return text
    

    Why? Models can’t read PDFs directly – we convert them to raw text.
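
The tokenizer in Step 2 expects a list of reasonably short strings rather than one giant blob, so it helps to split the extracted text into chunks first. Here is a minimal sketch; chunk_text is a hypothetical helper (the 1,000-character size is an arbitrary choice), and "clean_code.pdf" is just a placeholder filename:

def chunk_text(text, chunk_size=1000):
    # Naive fixed-size character chunks; smarter splitting (by paragraph or heading) also works
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

text_chunks = chunk_text(extract_text("clean_code.pdf"))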

Step 2: Tokenization

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token, so reuse the end-of-text token for padding short sequences

def tokenize(text_chunks):
    return tokenizer(
        text_chunks, 
        truncation=True,    # Cut long text to 512 tokens
        padding="max_length",  # Pad short text to 512 tokens
        max_length=512
    )

Why 512? GPT-2’s context window (its short-term memory limit) is 1,024 tokens; 512 is a common, memory-friendly choice that still gives the model plenty of context per training example.
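
Step 4 below feeds a tokenized_dataset to the Trainer, but that object has to be built first. A minimal sketch, assuming your chunks live in a Python list called text_chunks (as in Step 1) and that the datasets library is installed:

from datasets import Dataset

# Wrap the raw text chunks in a Hugging Face Dataset
raw_dataset = Dataset.from_dict({"text": text_chunks})

# Apply the tokenize() function above to every chunk
tokenized_dataset = raw_dataset.map(
    lambda batch: tokenize(batch["text"]),
    batched=True,
    remove_columns=["text"]   # Keep only input_ids and attention_mask
)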


Step 3: Load Pretrained Model

from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
  • GPT2LMHeadModel: A version of GPT-2 specifically designed for text generation.
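
As a quick sanity check that the right model loaded, you can print its size; the base "gpt2" checkpoint has roughly 124 million parameters:

print(f"Loaded GPT-2 with {model.num_parameters():,} parameters")  # ~124 million for "gpt2"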

Step 4: Training

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="my-trained-model",  # Save location
    num_train_epochs=3,             # Study the books 3 times
    per_device_train_batch_size=2,  # Process 2 text chunks at once
    learning_rate=5e-5              # Speed of learning (too fast → mistakes)
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset
)
trainer.train()

Key Decisions:

  • Batch Size: Depends on your GPU memory (larger = faster, but needs more GPU memory).

  • Learning Rate: Start with small values (1e-5 to 5e-5) to avoid overwriting existing knowledge.
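
One detail the Trainer call above glosses over: for causal language modeling, each batch also needs labels to compute the loss. A common way to supply them is Hugging Face's built-in collator, which copies input_ids into labels (mlm=False means "causal LM, not masked LM"); a minimal sketch:

from transformers import DataCollatorForLanguageModeling

data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator   # Builds labels from input_ids for every batch
)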


3. Generating Text: The model.generate() Parameters

After training, you’ll use these key parameters to control text generation:

Core Parameters for Text Generation

Each parameter below is listed with what it does and why you’d adjust it:

  • temperature (0-1) – Controls randomness: low (0.2) = safe, predictable answers; high (0.8) = creative but risky. Why adjust? Technical content → lower values; creative writing → higher values.

  • max_length – Maximum number of words/tokens to generate. Why adjust? Prevents endless rambling.

  • top_k – Only considers the top K most likely next words (e.g., top_k=50). Why adjust? Balances quality (higher K) vs focus (lower K).

  • top_p – Chooses from the smallest set of words covering P% probability (e.g., top_p=0.9). Why adjust? Dynamic alternative to top_k – better for unpredictable topics.

  • repetition_penalty – Penalizes repeated phrases (1.0 = no penalty, 1.5 = strong penalty). Why adjust? Fixes looping issues ("Clean code is clean code is clean code...").

  • num_beams – Number of parallel searches for optimal text (beam search). Why adjust? Higher values (3-5) = better coherence but slower.

Why These Parameters Matter

  1. Temperature

    • Think of it as the "creativity dial":

      • Low (0.2-0.5): For factual answers ("What is Big O notation?")

      • High (0.7-1.0): For brainstorming ("Suggest novel CS project ideas")

  2. top_p vs top_k

    • top_k=50 = "Choose from 50 best options"

    • top_p=0.9 = "Choose from options covering 90% likelihood"

    • Use top_p for technical accuracy, top_k for creative tasks

  3. Repetition Penalty

    • Essential for long-form content – prevents the model from getting stuck in loops

Example Configuration for Technical Content

generation_config = {
    "temperature": 0.3,          # Factual accuracy
    "top_p": 0.85,               # Focused but flexible
    "repetition_penalty": 1.5,   # Strong anti-repetition
    "max_length": 150,           # 1-2 paragraph answers
    "num_beams": 4               # Balanced quality/speed
}

Example Setup

generation_config = {
    "temperature": 0.7,
    "max_length": 200,
    "top_p": 0.9,
    "repetition_penalty": 1.2,
    "num_beams": 4,
    "do_sample": True              # Without sampling, temperature and top_p are ignored
}

input_prompt = "Explain object-oriented programming:"
model.to("cuda")  # Assumes a GPU is available (e.g., Google Colab); use "cpu" otherwise
inputs = tokenizer(input_prompt, return_tensors="pt").to("cuda")
output = model.generate(**inputs, **generation_config)
print(tokenizer.decode(output[0], skip_special_tokens=True))
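
If you want to reuse the fine-tuned model later without retraining, save both the model and the tokenizer, then load them back the same way you loaded the original checkpoint. A minimal sketch, reusing the output_dir from Step 4:

# Save the fine-tuned weights and the tokenizer next to them
trainer.save_model("my-trained-model")
tokenizer.save_pretrained("my-trained-model")

# Later (or in another script): load them back like any other checkpoint
from transformers import GPT2LMHeadModel, AutoTokenizer

model = GPT2LMHeadModel.from_pretrained("my-trained-model")
tokenizer = AutoTokenizer.from_pretrained("my-trained-model")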

4. Why This Matters for Interviews

Understanding these concepts helps answer questions like:

  • “How would you adapt a general AI model to a specific domain?” → Fine-tuning!

  • “What parameters control text generation quality?” → Temperature, top_p, etc.


5. Try It Yourself!

  1. Start with small PDFs (1-2 chapters)

  2. Use Google Colab’s free GPU (a quick availability check is shown after this list)

  3. Experiment with different generation parameters
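
Before kicking off training on Colab, it’s worth confirming that a GPU runtime is actually active (Runtime → Change runtime type → GPU); a quick check:

import torch

print("GPU available:", torch.cuda.is_available())   # Should print True on a GPU runtime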

Pro Tip: For technical content, use:

{
    "temperature": 0.3,  
    "top_p": 0.85,  
    "repetition_penalty": 1.5  
}

This gives focused, factual answers while avoiding repetition.


6. Learn More About LLMs

Want to dive deeper? Here are hand-picked resources:

A. Hugging Face LLM Course

The Hugging Face LLM Course is the perfect next step. You’ll learn:

  • How transformers work under the hood

  • Advanced fine-tuning techniques

  • Deploying models to production

  • Ethics in LLM development

Perfect for:

  • Beginners transitioning to intermediate

  • Developers building real-world LLM apps


B. Next Steps for Your Project

  1. Add Retrieval-Augmented Generation (RAG)
    Combine your fine-tuned model with a searchable knowledge base for factual accuracy.

  2. Optimize for Deployment
    Convert your model to ONNX format for faster inference.

  3. Monitor Performance
    Track metrics like response relevance and latency.


Remember: fine-tuning models is like raising a bookworm AI—feed it great data, be patient with training, and soon it’ll start quoting Clean Code at you in its sleep. Happy coding! 🚀📚

Written by

Tarun Sharma

Hi there! I’m Tarun, a Senior Software Engineer with a passion for technology and coding. With experience in Python, Java, and various backend development practices, I’ve spent years honing my skills and working on exciting projects. On this blog, you’ll find insights, tips, and tutorials on topics ranging from object-oriented programming to tech trends and interview prep. My goal is to share valuable knowledge and practical advice to help fellow developers grow and succeed. When I’m not coding, you can find me exploring new tech trends, working on personal projects, or enjoying a good cup of coffee. Thanks for stopping by, and I hope you find my content helpful!