Trainer and SFTTrainer: Key Differences Explained

The transformers and trl libraries provide Trainer and SFTTrainer, respectively, for training transformer models. The two classes serve different purposes.
Trainer
The Trainer class offers an API for training transformer models using the TrainingArguments class, which provides various options to customize the training process.
It is well suited to general-purpose training of models from scratch, which usually requires large datasets to be effective.
It also manages complex training tasks like checkpointing, gradient accumulation, and distributed training in the background.
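As a rough sketch of how those background features are switched on, the snippet below builds a TrainingArguments object with checkpointing and gradient accumulation enabled. The output directory, batch sizes, and step counts are illustrative assumptions, not values from this article, and effective_batch_size is a hypothetical helper that only shows the arithmetic behind gradient accumulation.

```python
def effective_batch_size(per_device_batch: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of samples that contribute to one optimizer step."""
    return per_device_batch * accumulation_steps * num_devices


def build_training_args(output_dir: str = "trainer-demo"):
    """Return TrainingArguments with checkpointing and gradient accumulation set."""
    # Imported lazily so the helper above also works without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(
        output_dir=output_dir,            # checkpoints are written here (assumed path)
        per_device_train_batch_size=8,
        gradient_accumulation_steps=4,    # accumulate 4 batches before each update
        num_train_epochs=1,
        save_strategy="steps",            # save a checkpoint every `save_steps` steps
        save_steps=500,
        save_total_limit=2,               # keep only the two newest checkpoints
        logging_steps=50,
    )


# 8 samples per device x 4 accumulation steps x 2 devices
print(effective_batch_size(8, 4, num_devices=2))  # 64
```

The returned object would then be passed along the lines of Trainer(model=model, args=build_training_args(), train_dataset=train_dataset). Distributed training is usually a matter of starting the same script with a launcher such as torchrun or accelerate launch rather than changing the training code.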
SFTTrainer
SFTTrainer offers an easy-to-use API that lets you train models in just a few lines of code using the SFTConfig class. It automates many tasks, so you can write training code with less effort.
The SFTTrainer class is optimized for fine-tuning pre-trained models with smaller datasets.
It includes built-in support for PEFT, which reduces memory use during training, and for packing, which cuts the padding wasted on short examples.
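To make the idea of packing more concrete, here is a minimal sketch in plain Python. It is a simplified illustration and not TRL's actual implementation: pack_sequences and padding_needed are hypothetical helpers, and the token IDs, block size, and EOS ID are made up for the example.

```python
from typing import List


def pack_sequences(sequences: List[List[int]], block_size: int, eos_id: int) -> List[List[int]]:
    """Concatenate token sequences (separated by EOS) and cut them into fixed-size blocks."""
    stream: List[int] = []
    for seq in sequences:
        stream.extend(seq)
        stream.append(eos_id)  # mark the boundary between examples
    # Keep only full blocks; the trailing remainder is dropped in this sketch.
    n_blocks = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]


def padding_needed(sequences: List[List[int]], block_size: int) -> int:
    """Padding tokens required if every sequence is padded to block_size on its own."""
    return sum(block_size - len(seq) for seq in sequences)


examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
packed = pack_sequences(examples, block_size=4, eos_id=0)

print(packed)                       # [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
print(padding_needed(examples, 4))  # 6
```

Padding each example separately would spend 6 of 16 token slots on padding, while the packed blocks contain none. In this simple version an example can straddle two blocks. With TRL itself, packing is enabled through the packing option of SFTConfig rather than written by hand.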
The SFTTrainer and SFTConfig classes are subclasses of the Trainer and TrainingArguments classes, respectively, so the same options can be used.
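One way to picture this relationship is to feed the same dictionary of options to both configuration classes. The sketch below assumes transformers and trl are installed when the two builder functions are called. The option values and the names SHARED_OPTIONS, SFT_ONLY_OPTIONS, build_training_arguments, and build_sft_config are illustrative and not part of either library.

```python
# Options defined by TrainingArguments. Values are illustrative assumptions.
SHARED_OPTIONS = {
    "output_dir": "sft-demo",
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-5,
    "num_train_epochs": 1,
    "logging_steps": 10,
}

# Option that exists only on SFTConfig.
SFT_ONLY_OPTIONS = {"packing": True}


def build_training_arguments():
    """Configuration for the general-purpose Trainer."""
    from transformers import TrainingArguments  # imported lazily

    return TrainingArguments(**SHARED_OPTIONS)


def build_sft_config():
    """The same shared options, plus the SFT-specific ones."""
    from trl import SFTConfig  # imported lazily

    return SFTConfig(**SHARED_OPTIONS, **SFT_ONLY_OPTIONS)


# The SFT configuration is the shared options plus the extras.
sft_options = {**SHARED_OPTIONS, **SFT_ONLY_OPTIONS}
print(sorted(set(sft_options) - set(SHARED_OPTIONS)))  # ['packing']
```

Keeping the shared options in one place like this makes it fairly easy to move a script from Trainer to SFTTrainer, since only the SFT-specific additions change.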
Aspect | Trainer | SFTTrainer |
Purpose | General-purpose training from scratch | Supervised fine-tuning of pre-trained models |
Data requirements | Larger datasets | Smaller datasets |
Customization | Highly customizable | Simpler interface with fewer options to set |
Which one to choose?
Trainer: Your goal is to train a model from scratch with large datasets, and you need extensive customization of the training process.
SFTTrainer: Your goal is to fine-tune a pre-trained model with smaller datasets, and you want simple training code.
Written by Baku Kim