How to fine-tune a Flux.1 LoRA in Python 3.12


This is a short walkthrough of fine-tuning a LoRA for the Flux.1 model. We are using kohya-ss/sd-scripts as a starting point.
1. Environment Setup
Prerequisites:
- GPU with ≥24GB VRAM
I'm using:
- Nvidia RTX Pro 6000
- Python 3.12
First, clone the repo:
git clone git@github.com:kohya-ss/sd-scripts.git
cd sd-scripts
We should switch to the sd3 branch, since the Flux LoRA training script is only available in that branch.
git checkout sd3
Let's set up a venv for the scripts for better environment isolation:
python3.12 -m venv venv
source venv/bin/activate
Install the CUDA version of PyTorch as instructed here:
pip install --upgrade pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
Now, let's install the rest of the dependencies:
pip install -r requirements.txt
We should be ready to go with the environment.
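Before moving on to the caveats below, a quick sanity check (a minimal sketch, nothing sd-scripts-specific) confirms that the CUDA build of PyTorch actually sees the GPU:
# sanity_check.py -- verify the CUDA build of PyTorch sees the GPU
import torch

print(torch.__version__)             # should report a +cu128 build
print(torch.cuda.is_available())     # should print True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. the RTX Pro 6000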
⚠️ Caveat 1: At the time of writing, opencv-python in requirements.txt was pinned to version 4.8.1.78, which is incompatible with numpy 2.x. I had to change the following line in requirements.txt and reinstall the dependencies.
- numpy<=2.0
+ numpy<2.0
Then, run the following to re-install numpy:
pip uninstall -y numpy
pip install -r requirements.txt
⚠️ Caveat 2: The bitsandbytes version in requirements.txt was causing an ImportError: No bitsandbytes error, so I had to update to bitsandbytes==0.46.1.
- bitsandbytes==0.44.0
+ bitsandbytes==0.46.1
Then, run:
pip uninstall -y bitsandbytes
pip install -r requirements.txt
⚠️ Caveat 3: At the time of writing, the captioning-related dependencies are commented out in requirements.txt. You will need them for Section 3.2, so uncomment them (and bump timm):
# for BLIP captioning
- # requests==2.28.2
- # timm==0.6.12
- # fairscale==0.4.13
+ requests==2.28.2
+ timm==1.0.17 # 0.6.12 was causing type issues, so had to upgrade.
+ fairscale==0.4.13
Then:
pip install -r requirements.txt
2. Downloading the Models
Download the Flux.1 model and autoencoder into the project root directory:
- flux1-dev.safetensors: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/flux1-dev.safetensors
- ae.safetensors: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/ae.safetensors
Download the text encoders into the sd3 directory (you will have to create this directory inside the project root):
- sd3/clip_l.safetensors: https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/text_encoders/clip_l.safetensors
- sd3/t5xxl_fp16.safetensors: https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/text_encoders/t5xxl_fp16.safetensors
Note: If you already have these files downloaded somewhere, it may be easier to simply change the arguments in Section 4.
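If you prefer scripting the downloads, here is a minimal sketch using huggingface_hub (my own addition, not part of sd-scripts; both repos are gated, so you need to accept their licenses on Hugging Face and log in with huggingface-cli login first):
# download_models.py -- fetch the four model files into the expected layout
import shutil
from pathlib import Path
from huggingface_hub import hf_hub_download

Path("sd3").mkdir(exist_ok=True)

files = [
    ("black-forest-labs/FLUX.1-dev", "flux1-dev.safetensors", "flux1-dev.safetensors"),
    ("black-forest-labs/FLUX.1-dev", "ae.safetensors", "ae.safetensors"),
    ("stabilityai/stable-diffusion-3-medium", "text_encoders/clip_l.safetensors", "sd3/clip_l.safetensors"),
    ("stabilityai/stable-diffusion-3-medium", "text_encoders/t5xxl_fp16.safetensors", "sd3/t5xxl_fp16.safetensors"),
]
for repo_id, filename, dest in files:
    cached = hf_hub_download(repo_id=repo_id, filename=filename)  # downloads into the HF cache
    shutil.copy(cached, dest)                                     # copy into the project layout
    print("saved", dest)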
3. Prepare Dataset
3.1 Add Images
Now, let's create a data/lora folder and place some images into it.
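Any image format PIL can open should work. Here is a quick sketch (assuming Pillow, which sd-scripts installs) to confirm every file loads and to eyeball the sizes:
# check_images.py -- confirm each file in data/lora opens and report its size
from pathlib import Path
from PIL import Image

for path in sorted(Path("data/lora").iterdir()):
    if path.suffix.lower() in {".png", ".jpg", ".jpeg", ".webp"}:
        with Image.open(path) as im:
            print(path.name, im.size)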
3.2 Generate Captions
sd-scripts contains useful utilities for generating captions. You can use these to caption your images automatically:
python finetune/make_captions.py data/lora --caption_extension .txt
This will generate a .txt file next to each image. You can edit these captions as you see fit.
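To review them all in one pass before moving on, something like this works:
# review_captions.py -- print each image's caption for a quick edit pass
from pathlib import Path

for txt in sorted(Path("data/lora").glob("*.txt")):
    print(f"{txt.stem}: {txt.read_text().strip()}")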
3.3 Metadata
Finally, we need to create a metadata.json file. If you generated captions in 3.2 above, you should have a .txt caption file next to each image. We will now run a script to combine those into a metadata.json file.
python finetune/merge_captions_to_metadata.py data/lora data/lora/metadata.json --caption_extension .txt
This will generate a metadata.json file containing the mapping of images to captions.
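You can peek at the result to make sure every image got a caption. (The exact key format depends on the sd-scripts version and flags; the comment below is illustrative.)
# inspect_metadata.py -- peek at the image-to-caption mapping
import json

with open("data/lora/metadata.json") as f:
    metadata = json.load(f)

# Entries look roughly like: {"data/lora/sprite_001": {"caption": "..."}}
for key, entry in metadata.items():
    print(key, "->", entry["caption"])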
3.4 Toml File
Create and save a dataset_1024_bs2.toml file in the project root directory. The schema can be found here.
[general]
shuffle_caption = true
caption_extension = '.txt'
keep_tokens = 1
flip_aug = false
color_aug = false
keep_tokens_separator = "|||"
caption_tag_dropout_rate = 0

[[datasets]]
resolution = [1024, 1024]
batch_size = 2
keep_tokens_separator = "|||"

[[datasets.subsets]]
image_dir = './data/lora'
metadata_file = './data/lora/metadata.json'
num_repeats = 1
caption_prefix = "[SPRITE 128x128 GAME ASSET] |||"
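The caption_prefix ends in ||| because keep_tokens_separator treats everything before the separator as fixed tokens that are never shuffled or dropped, so the sprite tag always stays at the front of the caption. And since we're on Python 3.12, the standard-library tomllib makes for a quick parse check before kicking off training:
# check_toml.py -- make sure the dataset config parses (tomllib is stdlib in 3.11+)
import tomllib

with open("dataset_1024_bs2.toml", "rb") as f:  # tomllib requires binary mode
    cfg = tomllib.load(f)

dataset = cfg["datasets"][0]
print("resolution:", dataset["resolution"], "batch_size:", dataset["batch_size"])
print("image_dir:", dataset["subsets"][0]["image_dir"])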
4. Train LoRA
Now, as instructed in the README, let's run Hugging Face's accelerate command:
accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 \
flux_train_network.py \
--pretrained_model_name_or_path flux1-dev.safetensors --clip_l sd3/clip_l.safetensors \
--t5xxl sd3/t5xxl_fp16.safetensors --ae ae.safetensors --cache_latents_to_disk \
--save_model_as safetensors --sdpa --persistent_data_loader_workers \
--max_data_loader_n_workers 2 --seed 42 --gradient_checkpointing --mixed_precision bf16 \
--save_precision bf16 --network_module networks.lora_flux --network_dim 64 \
--network_train_unet_only --optimizer_type adamw8bit --learning_rate 2e-4 \
--cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --fp8_base \
--highvram --max_train_epochs 50 --save_every_n_epochs 1 \
--dataset_config dataset_1024_bs2.toml --output_dir path/to/output/dir \
--output_name flux-lora-name --timestep_sampling shift --discrete_flow_shift 3.1582 \
--model_prediction_type raw --guidance_scale 1.0
After a few minutes, you should see .safetensors files appear in the output directory (one per epoch, since we passed --save_every_n_epochs 1).
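You can also peek inside a checkpoint to confirm it actually contains LoRA weights. A minimal sketch using the safetensors library (the key names in the comment are what I'd expect from networks.lora_flux, not something guaranteed across versions):
# inspect_lora.py -- list a few tensors from the trained LoRA checkpoint
from safetensors import safe_open

with safe_open("path/to/output/dir/flux-lora-name.safetensors",
               framework="pt", device="cpu") as f:
    keys = list(f.keys())
    print(len(keys), "tensors")
    for key in keys[:5]:  # expect names like lora_unet_..._lora_down.weight
        print(key, f.get_slice(key).get_shape())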
5. Inference with LoRA
To generate images with the LoRA applied, you can run the following command:
python flux_minimal_inference.py \
--ckpt flux1-dev.safetensors \
--clip_l sd3/clip_l.safetensors \
--t5xxl sd3/t5xxl_fp16.safetensors \
--ae ae.safetensors \
--dtype bf16 \
--prompt "[SPRITE 128x128 GAME ASSET] chubby boy scout standing full body" \
--out path/to/output/dir \
--seed 1 \
--flux_dtype fp8 \
--offload \
--lora 'path/to/output/dir/flux-lora-name.safetensors;2.0'
If you want to check the result without the LoRA applied, run the same command without the last --lora line. (The value after the semicolon, 2.0 here, is the LoRA weight multiplier.)
6. Results
The quality actually degrades in my example, but it does produce near-perfect 128×128 grid alignment.
Without LoRA: The produced image is pixelated, but it is not aligned to the 128×128 grid. ❌
With LoRA: The produced image is near-perfectly aligned to the 128×128 grid. ✅
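If you want a number instead of eyeballing it, here is a rough heuristic of my own (not part of sd-scripts): nearest-neighbor downscale the 1024px output to 128×128, scale it back up, and measure what was lost. A well-aligned sprite survives the round trip almost unchanged:
# grid_check.py -- rough heuristic for 128x128 grid alignment of a 1024px image
import numpy as np
from PIL import Image

img = Image.open("output.png").convert("RGB")       # hypothetical output file
small = img.resize((128, 128), Image.NEAREST)       # collapse each 8x8 cell to one pixel
round_trip = small.resize(img.size, Image.NEAREST)  # expand back to 1024x1024

error = np.abs(np.asarray(img, np.float32) - np.asarray(round_trip, np.float32)).mean()
print(f"mean round-trip error: {error:.2f} (lower = better aligned)")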
7. Conclusion
Even fine-tuned on just three samples, you can immediately see the grid-alignment improvement. Go ahead: add more data, bump the hyperparameters, and watch the results improve.
— Sprited Dev