How to Build an LLM Fine-Tuning System
Large Language Models (LLMs) trained on extensive generic corpora excel in general AI tasks. However, their performance may not always align with specific business needs, leading to reduced accuracy.
Full fine-tuning (instruction fine-tuning) can enhance a model’s performance. However, it updates all model weights, and therefore requires significant memory, compute resources, and time.
I propose a strategy to fine-tune LLMs that significantly improves model performance while minimizing training costs by utilizing Parameter Efficient Fine-Tuning (PEFT).
A game-changing example of what fine-tuning an LLM can do for you:
Ask your database a question in plain English and the AI will return the result.
You never need to write a single line of SQL code in your application again!
Here is how to do it:
- Fine-tune the LLM with SQL query training data generated by an LLM.
- Write a single generic function that takes any SQL query, executes it against your database, and returns the result (a guardrail must be provided to reject any SQL queries that are considered unsafe).
- When a user asks a question, use the fine-tuned LLM to generate the SQL query.
- Use the LLM's function-calling capability to execute the SQL query and return the result.
- Using the result above, ask the LLM to generate the final answer to the user's question (a minimal sketch of this flow is shown after this list).
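To make the flow concrete, here is a minimal sketch of the question-to-SQL-to-answer loop. The helper names (generate_sql, answer_from_rows), the guardrail check, and the SQLite database are illustrative assumptions, not part of any specific library; the fine-tuned model and prompts come from the steps described below.

```python
import sqlite3

def is_safe_sql(sql: str) -> bool:
    # Guardrail: allow read-only queries only (illustrative check, not exhaustive).
    return sql.strip().lower().startswith("select")

def run_query(db_path: str, sql: str) -> list[tuple]:
    # Single generic function: execute any (safe) SQL query and return the rows.
    if not is_safe_sql(sql):
        raise ValueError("Rejected unsafe SQL query")
    with sqlite3.connect(db_path) as conn:
        return conn.execute(sql).fetchall()

def generate_sql(question: str, schema: str) -> str:
    # Placeholder for the fine-tuned LLM call that turns a question into SQL.
    raise NotImplementedError("call the fine-tuned model here")

def answer_from_rows(question: str, rows: list[tuple]) -> str:
    # Placeholder for the LLM call that phrases the final answer from the query result.
    raise NotImplementedError("call the LLM here")

def answer_question(question: str, schema: str, db_path: str) -> str:
    sql = generate_sql(question, schema)     # question -> SQL via the fine-tuned LLM
    rows = run_query(db_path, sql)           # function-calling step: run the SQL
    return answer_from_rows(question, rows)  # rows -> natural-language answer
```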
Below are the steps to build such a system.
Fine-tuning (supervised learning) on the pre-trained base model:
Fine-tuning an LLM is supervised learning for a specific task or domain (e.g., Q&A, SQL generation) using labeled training data (you can take advantage of private data), i.e. input-output (prompt-response) pairs, to adjust the model’s weights and enhance its proficiency in that specific task.
The training input is the prompt (user inquiry), and the label is the expected response from the LLM.
For example:
input data: as a SQL expert, given a table schema and a question, generate a SQL statement
label: the valid and correct SQL statement.
Full fine-tuning (instruction fine-tuning) updates all model weights. It demands enormous memory and compute resources and is time-consuming, so I will avoid it.
On the other hand, Parameter-Efficient Fine-Tuning (PEFT) is a form of instruction fine-tuning that is much more efficient than full fine-tuning. PEFT updates only a small subset (e.g., around 1%) of the parameters and keeps the rest unchanged. Unlike full fine-tuning, PEFT preserves the original LLM weights, avoiding the loss of previously learned information. There are various ways of achieving parameter-efficient fine-tuning; LoRA and QLoRA are among the most effective.
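To see where the efficiency comes from, consider LoRA: instead of updating a full weight matrix, it trains two small low-rank matrices whose product approximates the weight update. A quick back-of-the-envelope calculation (the 4096 x 4096 layer size and rank r = 8 are illustrative assumptions, not tied to any particular model):

```python
# Illustrative LoRA parameter count for one 4096 x 4096 weight matrix with rank r = 8.
d_in, d_out, r = 4096, 4096, 8

full_params = d_in * d_out        # parameters updated by full fine-tuning
lora_params = r * (d_in + d_out)  # parameters in the low-rank A (d_in x r) and B (r x d_out)

print(full_params)                # 16777216
print(lora_params)                # 65536
print(lora_params / full_params)  # ~0.0039, i.e. roughly 0.4% of the original weights
```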
Use an LLM to automatically generate the training data used for fine-tuning the model
- generate both the input data and the labels from an LLM through prompt engineering (see the sketch after this list)
- clean the generated data (e.g., validate the labels) with human intervention
- tip: spotting LLM mistakes and hallucinations for your task (or domain) and providing correct labels is crucial to achieving high accuracy
- do a train/validation data split
- in the next iteration of fine-tuning, find and correct issues with the previous data, generate more data, and add it to the training set for breadth
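As an illustration, here is a minimal sketch of generating (question, SQL) training pairs with an LLM. It assumes the OpenAI Python client purely as an example; the prompt wording, model name, and table schema are placeholder assumptions, and every generated label still needs human validation as noted above.

```python
import json
from openai import OpenAI  # assumption: using the OpenAI client purely as an example

client = OpenAI()
SCHEMA = "CREATE TABLE orders (id INT, customer TEXT, total REAL, created_at DATE);"  # placeholder schema

def generate_training_pair(seed_question: str) -> dict:
    # Ask the LLM to produce a question/SQL pair for the given schema (prompt wording is illustrative).
    prompt = (
        "You are a SQL expert. Given this table schema:\n"
        f"{SCHEMA}\n"
        "Rewrite the user question and produce the correct SQL.\n"
        f"Question: {seed_question}\n"
        'Respond as JSON: {"question": ..., "sql": ...}'
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(response.choices[0].message.content)

# Each pair becomes one labeled example: the prompt is the input, the SQL is the label.
pair = generate_training_pair("How much did each customer spend last month?")
print(pair["question"], "->", pair["sql"])
```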
Build a pipeline to fine-tune the base LLM model with PEFT
Building the pipeline is a one-time effort; it can be reused in subsequent iterations of fine-tuning.
It includes the following steps:
- training data generation via LLM
  - prompt engineering
  - input data generation
  - label generation
  - label validation
  - data cleaning
- fine-tune the LLM
  - load the pre-trained model
  - load the training data
  - tokenize the training data
  - set up the PEFT model for fine-tuning
  - train the PEFT adapter
- evaluate the fine-tuned model and generate performance metrics
Full code is not included here since this story is for a broader audience, but a minimal sketch of the core fine-tuning step is shown below.
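The sketch outlines the load / tokenize / PEFT-setup / train steps using Hugging Face transformers and peft. The base model name, LoRA hyperparameters, and dataset files are illustrative assumptions; a real pipeline would parameterize all of them.

```python
from datasets import load_dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

BASE_MODEL = "meta-llama/Llama-2-7b-hf"  # placeholder base model

# Load the pre-trained model and tokenizer.
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Load and tokenize the LLM-generated training data (JSONL files with a "text" field are an assumption).
dataset = load_dataset("json", data_files={"train": "train.jsonl", "validation": "val.jsonl"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=dataset["train"].column_names,
)

# Set up the PEFT (LoRA) adapter: only the low-rank matrices are trained.
lora_config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16,
                         lora_dropout=0.05, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights

# Train the adapter and evaluate on the validation split.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sql-lora", num_train_epochs=3,
                           per_device_train_batch_size=4, learning_rate=2e-4),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
print(trainer.evaluate())
```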
Run the pipeline to fine-tune the model
Measure the performance of the fine-tuned model and compare it with the baseline model
- run inference on the validation data with both the base model and the fine-tuned model and compare the results (a sketch of a simple comparison is shown below).
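One simple way to compare the two models on a text-to-SQL task is exact-match accuracy over the validation pairs. The generate_sql helper below is a hypothetical placeholder for running inference with either model; execution-based accuracy (running both queries and comparing results) would be a stricter alternative.

```python
def exact_match_accuracy(model, validation_pairs) -> float:
    """Share of validation examples where the generated SQL exactly matches the label."""
    correct = 0
    for prompt, expected_sql in validation_pairs:
        predicted_sql = generate_sql(model, prompt)  # hypothetical inference helper
        if predicted_sql.strip().lower() == expected_sql.strip().lower():
            correct += 1
    return correct / len(validation_pairs)

# Compare the baseline and fine-tuned models on the same held-out data.
# base_acc = exact_match_accuracy(base_model, validation_pairs)
# tuned_acc = exact_match_accuracy(finetuned_model, validation_pairs)
# print(f"baseline: {base_acc:.2%}, fine-tuned: {tuned_acc:.2%}")
```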
Do the fine-tuning iteratively until the desired performance threshold is achieved
- repeat the steps above (training data generation, fine-tuning, and evaluation).
Build a production-grade LLM fine-tuning framework
For users to easily fine-tune their LLMs and apply them to various GenAI tasks, a fine-tuning LLMOps framework must be made available to them. It should:
- be reusable for various GenAI tasks and use cases
- help users generate training datasets via LLM
- let users easily switch to a different base LLM, training dataset, fine-tuning strategy, configuration, etc. (see the sketch after this list)
- support fine-tuning iterations and versioning
- provide built-in automatic performance evaluation and metrics
- provide a UI for human evaluation and intervention
- etc.
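To illustrate what "easily switch base model, dataset, and strategy" could look like in such a framework, here is a hypothetical configuration object; the field names and values are assumptions for the sake of the sketch, not an existing API.

```python
from dataclasses import dataclass, field

@dataclass
class FineTuneConfig:
    # Hypothetical framework configuration: every field can be swapped per run.
    base_model: str = "meta-llama/Llama-2-7b-hf"   # placeholder base model
    train_data: str = "data/sql_pairs_v3.jsonl"    # placeholder dataset path
    strategy: str = "qlora"                        # e.g. "lora" | "qlora" | "full"
    lora_rank: int = 8
    learning_rate: float = 2e-4
    num_epochs: int = 3
    eval_metrics: list[str] = field(default_factory=lambda: ["exact_match", "execution_accuracy"])
    run_version: str = "v3"                        # supports iteration / versioning

# A new iteration only needs a new config, not new pipeline code.
config = FineTuneConfig(base_model="mistralai/Mistral-7B-v0.1", run_version="v4")
```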
Fine-tuning Large Language Models (LLMs) is a powerful technique to tailor these models to specific business needs, enhancing their performance and accuracy. By leveraging Parameter Efficient Fine-Tuning (PEFT), such as LoRA and QLoRA, we can achieve significant improvements while minimizing resource consumption. The process involves generating and validating training data, building a fine-tuning pipeline, and iteratively refining the model until the desired performance is reached. Establishing a robust LLMOps framework ensures that users can easily fine-tune models for various tasks, providing a scalable and efficient solution for deploying customized AI applications.