In this post, we try running black-forest-labs/FLUX.1-dev on a machine with an NVIDIA RTX PRO 6000 GPU.

AMD Ryzen 7 9800X3D 8-Core Processor
NVIDIA RTX PRO 6000 Blackwell Workstation Edition
Ubuntu 22.04

We are using nvidia-driver-570-open.

NVIDIA-SMI 570.153.02
Driver Version: 570.153.02
CUDA Version: 12.8

We also use Python 3.12.

First, log in to Hugging Face:

huggingface-cli login

The model is gated, so you also have to accept its license terms at https://huggingface.co/black-forest-labs/FLUX.1-dev.

Let’s set up the repo:

mkdir sprite-flux
cd sprite-flux
poetry init --name sprite-flux --python ^3.12 --no-interaction
poetry env use python3.12
eval $(poetry env activate)
poetry source add --priority=explicit torch-cu128 https://download.pytorch.org/whl/cu128
poetry add --source torch-cu128 torch torchvision torchaudio
poetry add protobuf sentencepiece
poetry run python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())" # -> 2.7.1+cu128 12.8 True
poetry add diffusers transformers accelerate xformers
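
The one-line torch check in the setup commands above can also be written as a small script that reports the GPU name and VRAM size. This is only a sketch: the function name `describe_environment` is made up for illustration, and it skips the torch import entirely when torch is not installed so that it never crashes.

```python
import importlib.util
import platform


def describe_environment() -> dict:
    """Collect the facts the one-liner prints, plus GPU name and VRAM size."""
    info = {
        "python": platform.python_version(),
        "torch": None,
        "cuda_available": False,
    }
    # Only import torch if it is actually installed.
    if importlib.util.find_spec("torch") is None:
        return info
    import torch

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # CUDA version the wheel was built for
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        info["gpu_name"] = props.name
        info["vram_gib"] = round(props.total_memory / 2**30, 1)
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` comes back as False here, the cu128 wheel or the driver is the first thing to check before touching the model.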

Python Script

# scripts/inference/run_flux_retro_lora.py
from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained("black-forest-labs/FLUX.1-dev")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
image.save("output.png")  # save the result so the run produces a file

Run it inside the Poetry environment:

poetry run python scripts/inference/run_flux_retro_lora.py

This just takes forever since it’s not using the GPU. Let’s move it to the GPU.

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained("black-forest-labs/FLUX.1-dev")

pipe.to("cuda")  # 👈 THIS is what makes it actually use your GPU

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
image.save("output.png")

Now it should produce an image within a reasonable amount of time.

If you run nvidia-smi while the script is generating, you should see something like:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.153.02             Driver Version: 570.153.02     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX PRO 6000 Blac...    On  |   00000000:01:00.0 Off |                  Off |
| 48%   84C    P1            599W /  600W |   67227MiB /  97887MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1202      G   /usr/lib/xorg/Xorg                      165MiB |
|    0   N/A  N/A            1362      G   /usr/bin/gnome-shell                     23MiB |
|    0   N/A  N/A           14768      C   ...ux-b5L0JzLd-py3.12/bin/python      66986MiB |
+-----------------------------------------------------------------------------------------+
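
The roughly 67 GB used by the Python process looks plausible if the weights are held in 32-bit floats, which we assume here because the script passes no dtype. A back-of-the-envelope sketch, using the publicly stated 12 billion parameters for the FLUX.1-dev transformer (the helper name `weights_gib` is made up for illustration):

```python
def weights_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return num_params * bytes_per_param / 2**30


TRANSFORMER_PARAMS = 12e9  # publicly stated size of the FLUX.1-dev transformer

for name, nbytes in [("float32", 4), ("bfloat16", 2)]:
    print(f"{name}: {weights_gib(TRANSFORMER_PARAMS, nbytes):.1f} GiB")
# float32: 44.7 GiB
# bfloat16: 22.4 GiB
```

The transformer alone accounts for about 45 GiB at float32. The rest of the reported usage is presumably the text encoders, the VAE, and activations, though we did not measure that breakdown.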

Completed in: 01 minute and 13 seconds.
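
The post does not show how this time was measured. One way to get a comparable number from inside the script is a small timing wrapper. This is a sketch with made-up helper names (`format_elapsed`, `timed`), not the original measurement code.

```python
import time
from contextlib import contextmanager


def format_elapsed(seconds: float) -> str:
    """Format a duration in seconds as 'MM min SS s'."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes:02d} min {secs:02d} s"


@contextmanager
def timed(label: str):
    """Print how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label} completed in: {format_elapsed(elapsed)}")


# Usage inside the inference script (pipe and prompt as defined there):
#
#     with timed("Generation"):
#         image = pipe(prompt).images[0]
```

Wrapping only the `pipe(prompt)` call keeps model loading out of the measurement, which makes it easier to tell whether the time goes into loading or into sampling.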

The result is kinda amazing, but generation is quite slow.
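
One thing that might help with speed and memory is loading the weights in bfloat16 instead of the default float32. We have not benchmarked this on this machine, so treat it as a sketch. `torch_dtype` and `num_inference_steps` are existing diffusers arguments, while the function name, output filename, and step count of 28 are our own choices for illustration.

```python
def generate(prompt: str, out_path: str = "output_bf16.png", steps: int = 28) -> str:
    """Load FLUX.1-dev in bfloat16 on the GPU, generate one image, and save it."""
    # Imported lazily so the heavy dependencies load only when the function runs.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev",
        torch_dtype=torch.bfloat16,  # assumption: 16-bit weights instead of float32
    )
    pipe.to("cuda")

    # Fewer steps are faster but may reduce quality; 28 is just an example value.
    image = pipe(prompt, num_inference_steps=steps).images[0]
    image.save(out_path)
    return out_path
```

Calling `generate("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k")` would run the same prompt as above, so the output and timing can be compared directly against the float32 run.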

Testing black-forest-labs/FLUX.1-dev


Sprited Dev
