Journey into Running Comfy Headless on RunPod Serverless


Yesterday, I tried deploying Comfy on Fly.io and ran into machine timeouts. Perhaps there's a shortage of machines there, or maybe my Docker image is just too large.
Instead of continuing down that path, I decided to switch focus and try out RunPod Serverless. It already has a base Docker image for headless ComfyUI, which would fit our needs perfectly—if it delivers on performance.
https://github.com/runpod-workers/worker-comfyui
Prerequisites
Computer with Internet access.
Lunch money.
Easy Mode — Watch out for Rabbit Hole…
Here are the steps I took to get it up and running with the Flux.1-dev model. I just wanted to see it working. The instructions are based on the official documentation.
First, get a RunPod account and put your lunch money in there.
Go to https://hub.docker.com/r/runpod/worker-comfyui
Select from the recent tags. I chose 5.3.0-flux1-dev. Copy the string after "docker pull":
runpod/worker-comfyui:5.3.0-flux1-dev
Go to RunPod > Templates > Go to My Templates > New Template
Name: comfy-worker-poc
Serverless
Container Image:
runpod/worker-comfyui:5.3.0-flux1-dev
Container Disk: 30 GB
Then “Save Template.”
Now, go to RunPod > Serverless > New Endpoint
Click on “Import from Docker Registry”
Choose a template > comfy-worker-poc
Endpoint Name: comfy-worker-poc
Endpoint Type: Queue
Worker Type: GPU
GPU Configuration: 24GB PRO
Then “Deploy“
Once you see workers in "Idle," you can go to Serverless > comfy-worker-poc > Requests.
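As an aside, you don't have to use the console's Requests tab: RunPod Serverless endpoints are also callable over plain HTTP. A minimal sketch of building such a call (the endpoint ID and API key here are placeholders, and you should double-check the route against RunPod's own docs):

```python
# Sketch: hitting a RunPod Serverless endpoint over HTTP instead of the
# console. ENDPOINT_ID / RUNPOD_API_KEY are placeholders from the dashboard;
# the /runsync route blocks until the job completes and returns the output.
import json

API_BASE = "https://api.runpod.ai/v2"

def build_runsync_request(endpoint_id: str, api_key: str, workflow: dict):
    """Return the URL, headers, and JSON body for a synchronous run."""
    url = f"{API_BASE}/{endpoint_id}/runsync"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"input": {"workflow": workflow}})
    return url, headers, body

url, headers, body = build_runsync_request("ENDPOINT_ID", "RUNPOD_API_KEY", {})
print(url)  # https://api.runpod.ai/v2/ENDPOINT_ID/runsync
```

POSTing that body to that URL with those headers (curl, urllib, whatever) should be equivalent to pasting JSON into the Requests tab.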
I tried copying and pasting test_input.json from the repo directly, but ran into an error: 'type': 'value_not_in_list', 'message': 'Value not in list', 'details': "ckpt_name: 'flux1-dev-fp8.safetensors' not in []"
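The empty `[]` in that error is the list of checkpoints ComfyUI actually sees. You can't shell into a Serverless worker, but on any reachable Comfy instance (local, or run as a Pod), GET /object_info reports the valid values. A sketch of pulling them out of that response (the response shape is my assumption from poking a local instance, so verify it against yours):

```python
# The "[]" in the error is the set of checkpoints Comfy reports. Given a
# parsed /object_info response, extract the selectable ckpt_name values.
def valid_checkpoints(object_info: dict) -> list:
    """Extract the ckpt_name choices from a /object_info response."""
    node = object_info.get("CheckpointLoaderSimple", {})
    # First element of the input spec is the list of selectable values.
    return node.get("input", {}).get("required", {}).get("ckpt_name", [[]])[0]

sample = {
    "CheckpointLoaderSimple": {
        "input": {"required": {"ckpt_name": [["flux1-dev-fp8.safetensors"]]}}
    }
}
print(valid_checkpoints(sample))  # ['flux1-dev-fp8.safetensors']
```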
Rabbit Hole
Feel free to skip this section entirely; it's just a log of me embarrassing myself.
Looks like the checkpoint is not found. Let's decipher the Dockerfile to see which Flux model variant is embedded in the image. It has the following lines, which suggest that since we selected the flux1-dev model type, it won't download the quantized flux1-dev-fp8 version of the model.
RUN if [ "$MODEL_TYPE" = "flux1-dev" ]; then \
wget -q --header="Authorization: Bearer ${HUGGINGFACE_ACCESS_TOKEN}" -O models/unet/flux1-dev.safetensors https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/flux1-dev.safetensors && \
wget -q -O models/clip/clip_l.safetensors https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors && \
wget -q -O models/clip/t5xxl_fp8_e4m3fn.safetensors https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp8_e4m3fn.safetensors && \
wget -q --header="Authorization: Bearer ${HUGGINGFACE_ACCESS_TOKEN}" -O models/vae/ae.safetensors https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/ae.safetensors; \
fi
RUN if [ "$MODEL_TYPE" = "flux1-dev-fp8" ]; then \
wget -q -O models/checkpoints/flux1-dev-fp8.safetensors https://huggingface.co/Comfy-Org/flux1-dev/resolve/main/flux1-dev-fp8.safetensors; \
fi
Let's go back to the Requests tab and replace flux1-dev-fp8 with flux1-dev.
- "ckpt_name": "flux1-dev-fp8.safetensors"
+ "ckpt_name": "flux1-dev.safetensors"
Then click “Run.” Unfortunately, that didn’t work either.
"ckpt_name: 'flux1-dev.safetensors' not in []"
Let's check back on the Dockerfile. In the fine print, it is looking for HUGGINGFACE_ACCESS_TOKEN, and since we haven't provided that Hugging Face access token as an environment variable, it would have failed to download the model. (Interestingly, the flux1-dev-fp8 model is distributed without Hugging Face authentication. A little weird that the non-quantized version requires authentication but the quantized one doesn't.)
I was hoping to see the wget failure in the logs, but it looks like the logs got trimmed and I didn't see anything emitted during bring-up.
Anyhow, we now need to provide a Hugging Face token. Let's create one. (It would be easier to just switch to the quantized model, but let's try this anyway.)
Go to Hugging Face and create your account (free). Then navigate to the FLUX.1-dev page here: https://huggingface.co/black-forest-labs/FLUX.1-dev.
Access is only granted to people who agree to the terms and conditions, so accept the terms and you should have access to the weights.
Now go to Hugging Face > Access Tokens > Create new token.
Token type: Read
Token name: runpod-comfy-worker
Copy the generated token and hop on over to RunPod > Serverless > comfy-worker-poc > Manage > Edit Endpoint > Environment Variables.
key:
HUGGINGFACE_ACCESS_TOKEN
value: paste in your token
Once you save it, RunPod will start to roll the change out.
Once the rollout completes, let’s run the request again. I still got the same issue though.
ckpt_name: 'flux1-dev.safetensors' not in []
Perhaps the "rollout" isn't going to re-run the commands in the Dockerfile. … Actually, yeah: the Docker image is already pre-built, so the RUN steps inside the Dockerfile won't run when we bring up the instance 🙈.
Let’s actually run the docker image locally to see what is going on.
docker pull --platform linux/amd64 runpod/worker-comfyui:5.3.0-flux1-dev
docker run --platform linux/amd64 -it runpod/worker-comfyui:5.3.0-flux1-dev
This will take some time depending on your network speed. It’s gonna take several hours on my Panera Bread Free Wifi 🙈.
Okay, either I move my ass to my office, which has slightly better internet, or I switch to runpod/worker-comfyui:5.3.0-base instead.
Go to RunPod > Serverless > comfy-worker-poc > Manage > Edit Endpoint > Docker Configuration
Switch from runpod/worker-comfyui:5.3.0-flux1-dev to runpod/worker-comfyui:5.3.0-base.
Once the rollout completes, let’s copy paste the exact text from test_input.json into Requests. 🥁 Nah. That didn’t work either. 🙈
Okay, okay, let's breathe in and out a few times and try it again. The person at the table next to me in Panera Bread has been fighting with their sister over the phone for an hour, but it's okay, let's just breathe in and out. It's going to be okay.
The main issue is that I don't have visibility into the running process on RunPod Serverless, since there is no server to connect to. I can't inspect what is inside the folders unless I run it as a Pod or run it locally.
I actually have an Ubuntu server with a better network connection; let me SSH in.
docker pull --platform linux/amd64 runpod/worker-comfyui:5.3.0-flux1-dev
Hold on ✋ The image is 29.99 GB. That's really big. Hmm, this one includes the Flux1-dev model, which is already 23.8 GB by itself. Let's switch to the base variant, which I assume comes with the fp8-quantized model. That one's only 8.6 GB.
docker pull --platform linux/amd64 runpod/worker-comfyui:5.3.0-base
Okay good. Now let’s run it just to check what’s inside the models folder.
docker run --rm --platform linux/amd64 -it runpod/worker-comfyui:5.3.0-base bash
--rm → ensures the container is deleted when you exit.
-it → attaches an interactive TTY.
bash (or sh) → gives you a shell inside the container.
Now, let’s check.
ls /comfyui/models/checkpoints
It returns nothing other than the put_checkpoint_here file.
ls /comfyui/models/diffusion_models
No luck here either.
find /comfyui/models -type f -name "*flux*"
Looking at the README again, it actually says the base variant does not come with any models. So my assumption that it came with flux1-dev-fp8 was wrong 😅. I should have seen it coming… My bad.
runpod/worker-comfyui:<version>-base: Clean ComfyUI install with no models.
Okay, back to downloading the Flux1-dev variant again. 🥹
I'm gonna try 5.2.0 instead of 5.3.0, in case 5.3.0 has a regression.
docker pull --platform linux/amd64 runpod/worker-comfyui:5.2.0-flux1-dev
23.8 GB of the 29.99 GB is the Flux model, so I'm expecting it to be in there somewhere.
While the Docker image downloads, let's talk about the SpriteDX architecture a little. My inclination is to use headless Comfy as a workflow orchestration layer for SpriteDX image/video generation. We will create Comfy workflows and store the JSON exports inside another Python-based API service. This API service will basically gatekeep headless Comfy. We could simply expose Comfy's /prompt endpoint, but each of those calls is expensive and not vetted for security, so it's best to keep them behind a gating API layer. The gating API service will live in the same Docker image and interface with the Comfy service. The worker-comfyui probably already has some sort of API frontend that gates Comfy; I shall look into it next.
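To make the gating idea concrete, here's a rough sketch of the shape I have in mind. Everything in it (the template registry, the parameter paths) is made up for illustration, not an existing API:

```python
# Sketch of a gating layer: workflow templates live server-side, callers
# pick a template id and supply only whitelisted parameters, and we build
# the payload for Comfy's /prompt endpoint. TEMPLATES is a stand-in.
import copy

# template id -> (exported workflow JSON, allowed param -> (node id, field))
TEMPLATES = {
    "flux-txt2img": (
        {"6": {"inputs": {"text": ""}, "class_type": "CLIPTextEncode"}},
        {"prompt": ("6", "text")},
    )
}

def build_prompt_payload(template_id: str, params: dict) -> dict:
    """Validate caller input and produce a /prompt request body."""
    if template_id not in TEMPLATES:
        raise ValueError(f"unknown template: {template_id}")
    workflow, allowed = TEMPLATES[template_id]
    workflow = copy.deepcopy(workflow)  # never mutate the stored template
    for name, value in params.items():
        if name not in allowed:
            raise ValueError(f"parameter not allowed: {name}")
        node_id, field = allowed[name]
        workflow[node_id]["inputs"][field] = value
    return {"prompt": workflow}  # shape Comfy's /prompt endpoint expects
```

The point of the whitelist is that callers can never inject arbitrary nodes, only fill in blessed slots.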
Okay, download is done. Let’s check.
docker run --platform linux/amd64 -it runpod/worker-comfyui:5.2.0-flux1-dev bash
Where is it?
> find /comfyui/models -type f -name "*flux*"
/comfyui/models/unet/flux1-dev.safetensors
A little odd that it's in the unet folder. The CheckpointLoaderSimple node mentioned in test_input.json only looks at the checkpoints folder, so it will not find it.
The unet folder appears to be a legacy folder; it is currently mapped to diffusion_models. And diffusion models are loaded by the "Load Diffusion Model" node (UNETLoader).
Unfortunately, UNETLoader only loads the diffusion model, not the VAE and text encoders, so I can't simply swap it in… We will need to find the VAE and text encoders and load them separately. Did they even test this before putting up the Docker images? 🫠
Let’s look for VAEs and text encoders now.
> find /comfyui/models -type f -name "*.safetensors"
/comfyui/models/unet/flux1-dev.safetensors
/comfyui/models/clip/clip_l.safetensors
/comfyui/models/clip/t5xxl_fp8_e4m3fn.safetensors
/comfyui/models/vae/ae.safetensors
Okay, at least they exist. Now we have to craft a workflow that can load these models. Luckily, I have a Comfy instance running on my server. Let me create one quickly.
Open ComfyUI and pick the Flux1-dev template. It uses the Load Checkpoint node (CheckpointLoaderSimple).
Let's swap it out for the "Load Diffusion Model," "Load VAE," and "Load CLIP" nodes.
Then click on ComfyUI > Workflow > Export (API).
{
"6": {
"inputs": {
"text": "cute anime girl with massive fluffy fennec ears and a big fluffy tail blonde messy long hair blue eyes wearing a maid outfit with a long black gold leaf pattern dress and a white apron mouth open placing a fancy black forest cake with candles on top of a dinner table of an old dark Victorian mansion lit by candlelight with a bright window to the foggy forest and very expensive stuff everywhere there are paintings on the walls",
"clip": [
"41",
0
]
},
"class_type": "CLIPTextEncode",
"_meta": {
"title": "CLIP Text Encode (Positive Prompt)"
}
},
"8": {
"inputs": {
"samples": [
"31",
0
],
"vae": [
"40",
0
]
},
"class_type": "VAEDecode",
"_meta": {
"title": "VAE Decode"
}
},
"9": {
"inputs": {
"filename_prefix": "ComfyUI",
"images": [
"8",
0
]
},
"class_type": "SaveImage",
"_meta": {
"title": "Save Image"
}
},
"27": {
"inputs": {
"width": 1024,
"height": 1024,
"batch_size": 1
},
"class_type": "EmptySD3LatentImage",
"_meta": {
"title": "EmptySD3LatentImage"
}
},
"31": {
"inputs": {
"seed": 880604085770567,
"steps": 20,
"cfg": 1,
"sampler_name": "euler",
"scheduler": "simple",
"denoise": 1,
"model": [
"39",
0
],
"positive": [
"35",
0
],
"negative": [
"33",
0
],
"latent_image": [
"27",
0
]
},
"class_type": "KSampler",
"_meta": {
"title": "KSampler"
}
},
"33": {
"inputs": {
"text": "",
"clip": [
"41",
0
]
},
"class_type": "CLIPTextEncode",
"_meta": {
"title": "CLIP Text Encode (Negative Prompt)"
}
},
"35": {
"inputs": {
"guidance": 3.5,
"conditioning": [
"6",
0
]
},
"class_type": "FluxGuidance",
"_meta": {
"title": "FluxGuidance"
}
},
"39": {
"inputs": {
"unet_name": "flux1-dev.safetensors",
"weight_dtype": "default"
},
"class_type": "UNETLoader",
"_meta": {
"title": "Load Diffusion Model"
}
},
"40": {
"inputs": {
"vae_name": "ae.safetensors"
},
"class_type": "VAELoader",
"_meta": {
"title": "Load VAE"
}
},
"41": {
"inputs": {
"clip_name": "clip_l.safetensors",
"type": "stable_diffusion",
"device": "default"
},
"class_type": "CLIPLoader",
"_meta": {
"title": "Load CLIP"
}
}
}
We need to wrap this in the request format expected by RunPod Serverless.
{
"input": {
"workflow": …above stuff…
}
}
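Since I'll be doing this wrapping every time I export a workflow, a trivial helper:

```python
# Wrap an exported ComfyUI API workflow in the envelope RunPod expects.
import json

def wrap_for_runpod(workflow: dict) -> dict:
    """Produce the {"input": {"workflow": ...}} request body."""
    return {"input": {"workflow": workflow}}

exported = {"9": {"class_type": "SaveImage",
                  "inputs": {"filename_prefix": "ComfyUI"}}}
print(json.dumps(wrap_for_runpod(exported)))
```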
Now go back to RunPod > Serverless > comfy-worker-poc > Requests.
Replace the workflow section with the JSON above. You can also remove the images section.
Then click "Run." Fingers crossed 🫰. If things are working correctly, you should see "Running" and a $/s figure appear.
Not quite sure why, but my request is just sitting in the queue and not executing. I can see one worker has been "running" for the last five minutes. Perhaps the cold start takes a while.
Oh, shoot. Actually, I've configured the endpoint to use the base variant, which contains no models. 😇 🐇🐇🐇🐇🐇🐇🐇🐇🐇🐇🐇
Updated to runpod/worker-comfyui:5.2.0-flux1-dev and redeployed.
Even with this, the workflow didn’t work.
As much as I hate to admit, I don’t see a good way to run proper Flux text-to-image on this prebuilt image.
TL;DR: Couldn’t run it in Easy Mode.
Hard Mode
Let's fix up the Dockerfile ourselves so that the Flux file lands in the correct location.
git clone git@github.com:runpod-workers/worker-comfyui.git
cd worker-comfyui
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Let's then replace unet with checkpoints in the Dockerfile.
RUN if [ "$MODEL_TYPE" = "flux1-schnell" ]; then \
- wget -q --header="Authorization: Bearer ${HUGGINGFACE_ACCESS_TOKEN}" -O models/unet/flux1-schnell.safetensors https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/flux1-schnell.safetensors && \
+ wget -q --header="Authorization: Bearer ${HUGGINGFACE_ACCESS_TOKEN}" -O models/checkpoints/flux1-schnell.safetensors https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/flux1-schnell.safetensors && \
wget -q -O models/clip/clip_l.safetensors https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors && \
wget -q -O models/clip/t5xxl_fp8_e4m3fn.safetensors https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp8_e4m3fn.safetensors && \
wget -q --header="Authorization: Bearer ${HUGGINGFACE_ACCESS_TOKEN}" -O models/vae/ae.safetensors https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/ae.safetensors; \
fi
RUN if [ "$MODEL_TYPE" = "flux1-dev" ]; then \
- wget -q --header="Authorization: Bearer ${HUGGINGFACE_ACCESS_TOKEN}" -O models/unet/flux1-dev.safetensors https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/flux1-dev.safetensors && \
+ wget -q --header="Authorization: Bearer ${HUGGINGFACE_ACCESS_TOKEN}" -O models/checkpoints/flux1-dev.safetensors https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/flux1-dev.safetensors && \
wget -q -O models/clip/clip_l.safetensors https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors && \
wget -q -O models/clip/t5xxl_fp8_e4m3fn.safetensors https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp8_e4m3fn.safetensors && \
wget -q --header="Authorization: Bearer ${HUGGINGFACE_ACCESS_TOKEN}" -O models/vae/ae.safetensors https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/ae.safetensors; \
fi
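For what it's worth, the same substitution can be scripted instead of hand-edited, e.g. with this small patcher (the regex assumes the exact wget lines shown above):

```python
# Apply the unet -> checkpoints move programmatically: rewrite the wget
# output directory for the two flux1 model downloads in the Dockerfile.
import re

def patch_dockerfile(text: str) -> str:
    """Move flux1-dev/flux1-schnell wget targets from unet/ to checkpoints/."""
    return re.sub(
        r"-O models/unet/(flux1-(?:dev|schnell)\.safetensors)",
        r"-O models/checkpoints/\1",
        text,
    )

line = "-O models/unet/flux1-dev.safetensors https://huggingface.co/..."
print(patch_dockerfile(line))
# -O models/checkpoints/flux1-dev.safetensors https://huggingface.co/...
```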
Then push it to a fork (in my case: https://github.com/kndlt/worker-comfyui).
Then, we must provide some environment variables. We create .env
file (already gitignored) with following:
HUGGINGFACE_ACCESS_TOKEN=<hugging face access token from earlier>
Let's then trigger the build, starting with the flux1-dev-fp8 variant.
docker buildx bake flux1-dev-fp8 --env-file .env
It should take 10-20 minutes to build the image. Let’s test it once it is built.
docker run --rm -it -p 8000:8000 runpod/worker-comfyui:latest-flux1-dev-fp8
This actually failed with this message:
Traceback (most recent call last):
File "/comfyui/main.py", line 132, in <module>
import execution
File "/comfyui/execution.py", line 14, in <module>
import comfy.model_management
File "/comfyui/comfy/model_management.py", line 221, in <module>
total_vram = get_total_memory(get_torch_device()) / (1024 * 1024)
^^^^^^^^^^^^^^^^^^
File "/comfyui/comfy/model_management.py", line 172, in get_torch_device
return torch.device(torch.cuda.current_device())
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/cuda/__init__.py", line 1071, in current_device
_lazy_init()
File "/opt/venv/lib/python3.12/site-packages/torch/cuda/__init__.py", line 412, in _lazy_init
torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
It looks like using CUDA inside containers requires the NVIDIA Container Toolkit on the host, so I installed it on my server following their instructions. Let's confirm that the container can detect CUDA.
> docker run --rm --gpus all nvidia/cuda:12.6.3-runtime-ubuntu24.04 nvidia-smi
==========
== CUDA ==
==========
CUDA Version 12.6.3
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
Tue Aug 19 21:41:46 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.153.02 Driver Version: 570.153.02 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX PRO 6000 Blac... On | 00000000:01:00.0 Off | Off |
| 30% 30C P8 14W / 300W | 24716MiB / 97887MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Looks good so far. Let’s try running the built container again.
docker run --rm -it -p 8000:8000 --gpus all runpod/worker-comfyui:latest-flux1-dev-fp8
Finally, it worked, at least on my node! 🧚
DEBUG | local_test | Handler output: {'images': [{'filename': 'ComfyUI_00001_.png', 'type': 'base64', 'data': 'iVBORw0KGgoAAAANSUhEUgAAAgAAAAIACAIAAAB7GkOtAAEAAElEQVR4Xlz92bNsSZbeh/2+5b4j4kx3yLyZNXR1dTW6GxQI0ACIEsUHmkwPetEDX/Uv0wyCiZTBREoY2EQ3Gqiurs7KvMM5EbHd16eH5XEyoX3vOSdib/fla/jW4L59R+i/+OZJ6ZRR
…
Let's push this to Docker Hub now. First, we need to tag it under our repo name.
docker tag runpod/worker-comfyui:latest-flux1-dev-fp8 sprited/worker-comfyui:latest-flux1-dev-fp8
Then, push.
docker login # login
docker push sprited/worker-comfyui:latest-flux1-dev-fp8
The image is pretty large, so pushing will take some time.
Pushed sprited/worker-comfyui:latest-flux1-dev-fp8. Yay!
Now, go back to RunPod > Serverless > comfy-worker-poc > Manage > Edit Endpoint > Docker Configuration, then switch to this image.
Then deploy 🚀.
While it's deploying, let me pull up a quick comparison between RunPod and Fly.io.
RunPod is mostly a GPU hosting service that caters to AI workloads, while Fly.io is mostly for app hosting. So when I tried to run a 30 GB Docker image, Fly.io was probably always going to choke.
My thinking was that I would use other inference service providers like fal.ai and Replicate to do the heavy lifting, so I wouldn't need GPUs for the orchestration layer. However, headless Comfy basically requires NVIDIA CUDA, so no luck there.
We could, in theory, fork ComfyUI to work with the CPU version of PyTorch. That might reduce the image size substantially, which could be a good cost-saving measure. At this time, though, I need infra and a testbed that just works, so I will stick with the full CUDA Comfy. Comfy being open source, there is also the possibility of forking it into a more lightweight version without the CUDA requirement, which would cater to folks on Apple devices. That would be an interesting approach.
Why RunPod Serverless? It works out pretty well because if there are no calls, I don't pay a penny. Deployments don't cost anything either. So far today, I've spent something like 5 cents.
Another benefit is that it works in a stateless, ephemeral fashion, so there are fewer privacy and security concerns: it systemically can't retain any data beyond its ephemeral lifetime.
Rollout is complete. Let's test it. Go to Requests and copy-paste the test_input.json from earlier. The request succeeds now.
The generated image will be in the "images" section, returned as base64.
Extract output.images[0].data, prefix it with data:image/png;base64, and use a tool like https://www.site24x7.com/tools/datauri-to-image.html to render the image.
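Or skip the online tool and decode it locally; per the response shape described above, it's just base64-encoded PNG bytes:

```python
# Decode the base64 image payload from the worker response to a PNG file.
import base64

def save_image(b64_data: str, path: str) -> None:
    """Write a base64-encoded image payload to disk as binary."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(b64_data))

# e.g. save_image(response["output"]["images"][0]["data"], "out.png")
```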
So, now I have a serverless RunPod that can run image generations. 🥳
Next steps:
Post a pull request to https://hub.docker.com/r/runpod/worker-comfyui for the unet-to-checkpoints fix.
Try calling it directly from the SpriteDX Web UI to do a sample generation.
Gotta go.
— Sprited Dev 🌱
