How to Install ComfyUI + Nunchaku FLUX.1-dev - Lightning Fast AI Image Generation

Introduction
ComfyUI + Nunchaku FLUX.1-dev represents a breakthrough in AI image generation performance. By combining ComfyUI's node-based workflow interface with MIT Han Lab's SVDQuant 4-bit quantization technology, this setup delivers a 3.0× speedup (measured against an NF4 W4A16 baseline) and a 3.6× memory reduction compared to the 16-bit FLUX.1-dev model. In my testing on Windows 11 with an RTX 3080 10GB, image generation times dropped from 40+ seconds to around 11-12 seconds while maintaining exceptional quality. This makes Nunchaku FLUX.1-dev one of the most practical solutions for local AI image generation in 2025.
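Although this guide focuses on ComfyUI, the same SVDQuant engine can also be driven directly from Python through Hugging Face diffusers. The sketch below is adapted from the upstream Nunchaku README; the repository id and class name may differ between Nunchaku releases, so verify them against the current documentation:
import torch
from diffusers import FluxPipeline
from nunchaku import NunchakuFluxTransformer2dModel

# Load the SVDQuant 4-bit transformer (repo id per the upstream README;
# verify against the current Nunchaku collection on Hugging Face)
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    "mit-han-lab/svdq-int4-flux.1-dev"
)
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")
image = pipeline(
    "A cat holding a sign that says hello world",
    num_inference_steps=28,  # standard step count; the Turbo LoRA below enables 8
    guidance_scale=3.5,
).images[0]
image.save("flux.1-dev-nunchaku.png")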
Features
- Revolutionary Performance: SVDQuant's 4-bit quantization delivers 3.0× speedups over NF4 W4A16 baseline while maintaining visual fidelity
- Memory Efficiency: 3.6× memory reduction enables 12B FLUX.1-dev to run comfortably on 8GB+ RTX cards without CPU offloading
- Easy Installation: Unlike traditional quantization methods requiring hours of compilation, Nunchaku provides pre-built wheels for instant deployment
- Broad GPU Compatibility: Native support for RTX 20xx, 30xx, 40xx, and 50xx series cards through optimized CUDA kernels
- Professional Workflow Integration: Seamless ComfyUI integration with LoRA, ControlNet, and multi-model support
- Production-Ready Stability: Backed by an ICLR 2025 Spotlight paper, which lends the method peer-reviewed rigor and reliability
Prerequisites
- Operating System: Windows 11 (tested) or Windows 10 with latest updates
- GPU: NVIDIA RTX series with 8GB+ VRAM (10GB+ recommended for FLUX.1-dev)
- System RAM: 16GB minimum, 32GB recommended
- Storage: 15GB+ free space for models and dependencies
- Python: Python 3.12 recommended (ComfyUI Desktop handles this automatically)
Installing ComfyUI Desktop
ComfyUI Desktop provides the most streamlined installation experience, eliminating Python environment management complexities. [Download Link]
Essential File Downloads
- The following models and workflow files are required for Nunchaku FLUX.1-dev operation. Download each file to its specified directory within your ComfyUI installation:
- Nunchaku FLUX.1-dev Model (6.77GB) → models/diffusion_models/
- Nunchaku FLUX.1-Krea-dev Model (6.77GB) → models/diffusion_models/
- Nunchaku FLUX.1-Kontext-dev Model (6.77GB) → models/diffusion_models/
- PuLID Flux Model v0.9.1 (1.14GB) → models/pulid/
- VAE (Variational Autoencoder) → models/vae/
- Text Encoder: t5xxl_fp16 → models/clip/
- Text Encoder: clip_l → models/clip/
- Vision Encoder: EVA02_CLIP_L_336_psz14_s6B → models/clip/
- FLUX.1-Turbo LoRA for Even Faster Generation → models/loras/
- Nunchaku Wheel Installer Workflow → user/default/workflows/
- Nunchaku FLUX.1-dev Example Workflow → user/default/workflows/
- Nunchaku FLUX.1-Kontext-dev Example Workflow → user/default/workflows/
- Nunchaku FLUX.1-dev PuLID Example Workflow → user/default/workflows/
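- If you prefer scripted downloads, the checkpoints can also be fetched with huggingface_hub. The following is a minimal sketch; the repo_id shown is an assumption, so confirm it against the Nunchaku collection on Hugging Face first:
from huggingface_hub import hf_hub_download

# repo_id is an assumption; the filename matches the DiT loader setting
# used later in this guide
path = hf_hub_download(
    repo_id="mit-han-lab/nunchaku-flux.1-dev",
    filename="svdq-int4_r32-flux.1-dev.safetensors",
    local_dir="ComfyUI/models/diffusion_models",
)
print("Saved to:", path)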
Installing ComfyUI-nunchaku Plugin
- The Nunchaku plugin provides essential nodes for 4-bit quantized model loading and inference.
Run [ComfyUI]
→ [Manager]
→ [Custom Nodes Manager]
→ [ComfyUI-nunchaku] (Check)
→ [Install]
→ Restart [ComfyUI]
Installing Nunchaku Backend
- This step installs the actual quantization engine that powers the performance improvements.
Run [ComfyUI]
→ [Workflow]
→ [Open]
→ install_wheel.json (Double Click)
→ [Nunchaku Wheel Installer] (Click)
→ version: [v0.3.1] (Select)
→ [Preview Any] (Click)
→ [▷ Execute] (Click)
→ Wait for confirmation: "Successfully installed nunchaku..."
→ Restart [ComfyUI]
[Advanced] Manual Nunchaku Backend Installation
- For users requiring manual control or troubleshooting installation issues:
# Open PowerShell as Administrator
# Navigate to ComfyUI directory
PS> cd .\ComfyUI\
PS> .\.venv\Scripts\Activate.ps1
# Install Nunchaku dependencies
PS> pip install -r custom_nodes\ComfyUI-nunchaku\requirements.txt
PS> pip install nunchaku --upgrade
# Install additional dependencies if needed
PS> pip install facexlib insightface onnxruntime
# Verify installation
PS> python -c "import nunchaku; print(nunchaku.__version__)"
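- Nunchaku publishes separate INT4 and FP4 checkpoints (and per-Python/PyTorch wheels on its releases page), so it helps to confirm what your GPU supports: INT4 kernels target RTX 20xx-40xx, while FP4 targets the Blackwell-based RTX 50xx series. A small sketch; the svdq-int4/svdq-fp4 naming is inferred from the published checkpoints:
import torch

# RTX 20xx/30xx/40xx (compute capability < 12) use the INT4 checkpoints;
# Blackwell RTX 50xx (compute capability 12.x) uses the FP4 checkpoints
major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")
print("Use svdq-fp4_* models" if major >= 12 else "Use svdq-int4_* models")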
Running Your First Nunchaku FLUX.1-dev Generation
Run [ComfyUI]
→ [Workflow]
→ [Open]
→ nunchaku-flux.1-dev.json (select)
→ Set your prompt in the text input node
→ [▷ Run]
- I applied the following additional configurations to the example workflow provided by Nunchaku and ran multiple image generation tests, which confirmed very fast generation averaging 11-12 seconds per image with high-quality output.
Nunchaku Flux DiT Loader
* model_path: [svdq-int4_r32-flux.1-dev.safetensors] # INT4 quantized model
* cache_threshold: 0
# Performance optimization with FP16 attention
* attention: [nunchaku-fp16]
# Mixed precision computation
* data_type: [bfloat16]
Nunchaku Flux.1 LoRA Loader
# Speed enhancement, high-quality generation with fewer steps
* lora_name: [flux-1.turbo-alpha.safetensors]
* lora_strength: 1.0
Nunchaku Flux.1 LoRA Loader
# Enhanced realistic human representation
* lora_name: [flux_realism_lora.safetensors]
* lora_strength: 0.7
Nunchaku Text Encoder Loader
* text_encoder1: [t5xxl_fp16.safetensors]
* text_encoder2: [clip_l.safetensors]
FluxGuidance
# Balance between prompt adherence and creativity
# Values below [5] cause watercolor effects due to under-guidance artifacts.
* guidance: 5
BasicScheduler
# Stable noise reduction
# [beta] scheduler removes noise more efficiently at beginning/end steps, preserving high-frequency details vs [simple] scheduler
* scheduler: [beta]
# Low-step generation enabled by Turbo LoRA
* steps: 8
Multiply Sigmas
# Fine-tuning sigma values for detail enhancement
* factor: 0.960
* start: 0.950
* end: 0.980
Width
* value: 896
Height
* value: 1152
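- A note on the 896 × 1152 resolution: FLUX latents are downscaled 8× by the VAE and then patchified 2×2, so both dimensions should be multiples of 16. A quick sketch illustrates this; the token arithmetic is an approximation of the DiT's input:
def check_flux_dims(width, height):
    # 8x VAE downscale followed by 2x2 patchify -> multiples of 16
    assert width % 16 == 0 and height % 16 == 0, "use multiples of 16"
    return (width // 8) * (height // 8) // 4  # DiT patch tokens

print(check_flux_dims(896, 1152))  # 4032 tokens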
[Tip] Multiply Sigmas: Maximizing Detail in Mechanical and Portrait Generation
- Multiply Sigmas functions as an independent node in ComfyUI that significantly enhances detail quality in mechanical objects and portraits, effectively reducing the characteristic AI-generated appearance. [Related Link]
- The most recommended configuration is: Guidance: 4.5 + Scheduler: Beta + Multiply Sigmas: 0.96.
- This feature becomes available after installing the ComfyUI-Detail-Daemon custom node package in ComfyUI.
# Installing [ComfyUI-Detail-Daemon]
Launch [ComfyUI]
→ [Manager]
→ [Custom Nodes Manager]
→ Search [ComfyUI-Detail-Daemon]
→ [Install]
→ Restart [ComfyUI]
- After installation, you can add the Multiply Sigmas node to your workflow as follows:
# [1] Adding [Multiply Sigmas] node to workflow
(Right-click on empty space in workflow canvas)
→ [Add Node]
→ [sampling]
→ [custom_sampling]
→ [sigmas]
→ [Multiply Sigmas (stateless)]
→ factor: 0.96
→ start: 0.95
→ end: 0.98
# [2] Connect [BasicScheduler]'s SIGMAS output to [Multiply Sigmas] input
# [3] Connect [Multiply Sigmas] output to [SamplerCustomAdvanced]'s sigmas input
# Correct Node Connection Sequence
# [BasicScheduler] → [Multiply Sigmas] → [SamplerCustomAdvanced]
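- Conceptually, the node scales part of the sigma schedule produced by BasicScheduler before it reaches the sampler; slightly lowered sigmas leave noise headroom that the sampler turns into extra fine detail. A minimal sketch of that idea, assuming start/end select a normalized slice of the schedule (the node's exact indexing may differ):
def multiply_sigmas(sigmas, factor=0.96, start=0.95, end=0.98):
    # Scale every sigma whose normalized position lies in [start, end]
    n = len(sigmas)
    return [
        s * factor if start <= i / (n - 1) <= end else s
        for i, s in enumerate(sigmas)
    ]

# Uniform scaling over the whole schedule, for illustration
print(multiply_sigmas([1.0, 0.8, 0.6, 0.4, 0.2, 0.05], start=0.0, end=1.0))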
[Tip] Face Detailer: Maximizing Facial Detail Enhancement for Characters
- Face Detailer is a powerful feature that detects and enhances facial details in generated images. It is particularly useful for full-body character shots, where facial detail tends to be significantly degraded, helping to restore and improve these crucial details.
- This feature becomes available after installing both the ComfyUI Impact Pack and ComfyUI Impact Subpack custom node packages in ComfyUI.
# Installing [ComfyUI Impact Pack] and [ComfyUI Impact Subpack]
Launch [ComfyUI]
→ [Manager]
→ [Custom Nodes Manager]
→ Search [ComfyUI Impact Pack]
→ [Install]
→ Search [ComfyUI Impact Subpack]
→ [Install]
→ Restart [ComfyUI]
- After installation, you can add the FaceDetailer node to your workflow as follows:
# Adding [FaceDetailer] node to workflow
(Right-click on empty space in workflow canvas)
→ [Add Node]
→ [ImpactPack]
→ [FaceDetailer]
# Recommended parameters for [Nunchaku FLUX.1-dev]
→ guide_size: 512
→ guide_size_for: [crop_region]
→ max_size: 1024
→ steps: 8
→ cfg: 1.0
→ sampler_name: [euler]
→ scheduler: [beta]
→ denoise: 0.50
→ feather: 5
→ drop_size: 10
# Adding [CLIP Text Encode (Negative Prompt)] node to workflow and type below text
low quality, blurry, bad anatomy, worst quality, low resolution, heavy makeup, rough skin, harsh texture, skin imperfections, overly detailed skin, artificial skin, dirty skin, skin imperfections, acne, blackheads, wrinkles, aged skin, damaged skin, oily skin, uneven skin tone, overly detailed skin, harsh skin texture, artificial skin, large pores, visible pores, textured skin, coarse skin, bumpy skin, weathered skin, leathery skin, sun damaged skin, scarred skin, blemished skin, unsmooth skin, grainy skin, patchy skin, peach fuzz, vellus hair
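- Under the hood, a FaceDetailer pass is essentially detect → crop → re-denoise → paste. The sketch below illustrates the concept only, not the Impact Pack implementation; img2img is a hypothetical stand-in for a partial-denoise sampler call:
from PIL import Image, ImageDraw, ImageFilter

def detail_face(image, face_box, img2img, guide_size=512, denoise=0.5, feather=5):
    crop = image.crop(face_box)
    # Upscale the face so the sampler can add detail at guide_size resolution
    scale = guide_size / max(crop.size)
    big = crop.resize((round(crop.width * scale), round(crop.height * scale)))
    refined = img2img(big, denoise).resize(crop.size)
    # Feathered mask (solid center, blurred border) so the paste blends in
    mask = Image.new("L", crop.size, 0)
    ImageDraw.Draw(mask).rectangle(
        (feather, feather, crop.width - feather, crop.height - feather), fill=255
    )
    mask = mask.filter(ImageFilter.GaussianBlur(feather))
    out = image.copy()
    out.paste(refined, face_box[:2], mask)
    return out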
[Tip] res_2s + bong_tangent: Superior Image Generation with Advanced Sampling
- Sampler res_2s combined with Scheduler bong_tangent delivers the highest-quality image generation. [Related Link]
- Technical Details:
- res_2s: Uses two substeps per step, requiring two model calls per step (slower but higher quality than single-stage samplers)
- bong_tangent: BONGMATH technology enables bidirectional denoising, processing forward and backward directions simultaneously for more accurate sampling
- These features become available after installing the RES4LYF custom node package in ComfyUI.
# Installing [RES4LYF]
Launch [ComfyUI]
→ [Manager]
→ [Custom Nodes Manager]
→ Search [RES4LYF]
→ [Install]
→ Restart [ComfyUI]
- Once installed, you can configure them in KSamplerSelect and BasicScheduler as follows:
KSamplerSelect
# Performs Multistage Sampling (RES Multistage Exponential Integrator)
* sampler_name: [res_2s]
BasicScheduler
# Performs bidirectional denoising (BONGMATH Technology)
* scheduler: [bong_tangent]
* steps: 8
* denoise: 1.00
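- Because res_2s makes two model calls per step, budget roughly twice the sampling time of a single-stage sampler at the same step count, as this toy cost model shows:
def model_calls(steps, substeps_per_step=2):
    return steps * substeps_per_step

print(model_calls(8))     # res_2s: 16 model calls
print(model_calls(8, 1))  # euler:   8 model calls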
[Tip] FLUX.1-Krea-dev Best Practices & Optimization
- FLUX.1-Krea-dev is a collaborative model released by Black Forest Labs and Krea AI. It features an opinionated aesthetic that emphasizes natural texture, realistic tone, and enhanced detail rendering, aiming to eliminate the characteristic AI look of FLUX models, including plastic-like skin and oversaturation, in pursuit of extreme photorealism.
- The model demonstrates improved prompt adherence compared to the base FLUX.1-dev model. Detailed descriptions of temporal context, color grading, composition, and fine details particularly leverage its strengths in natural texture and realistic rendering.
- Maintains 100% architectural compatibility with FLUX.1-dev as a drop-in replacement. Recommended settings:
  - model: svdq-int4_r32-flux.1-krea.dev.safetensors (Nunchaku version)
  - sampler_name: res_2s
  - scheduler: bong_tangent
  - steps: 8
  - denoise: 1.0
  - guidance: 5.0
  - width x height: 864 x 1152
  - loras:
    - lora_name: Flux_Krea_Blaze_Lora-rank32.safetensors, lora_strength: 1.00
    - lora_name: [your-style-lora], lora_strength: 0.50
    - lora_name: [your-character-lora], lora_strength: 0.50
    - lora_name: SameFace_Fix.safetensors, lora_strength: -0.70
[Tip] FLUX.1-Kontext-dev Best Practices & Optimization
- Preserve Original Image Size: Set the FluxKontextImageScale node to Bypass mode to maintain the input image's original dimensions. This node normally scales images to resolutions optimal for FLUX processing (usually under 2.1MP) and reduces VRAM usage, but bypassing it preserves your desired output size.
- Minimize Facial Changes: Set the denoise strength parameter to 0.85 or lower in the KSampler or BasicScheduler nodes. The default value of 1.0 completely replaces the input image with noise, while lower values preserve more of the original image. Values between 0.75 and 0.85 provide the best balance between edit quality and identity preservation.
- Use Multiple FLUX.1-dev LoRAs: You can load and combine multiple LoRA models trained on the FLUX.1-dev base model. Connect Nunchaku FLUX LoRA Loader nodes to the output of the Nunchaku FLUX DiT Loader node and specify your desired LoRA files (see the sketch after this list).
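- Chaining loaders works because each LoRA contributes an independent, scaled low-rank update to the same base weights. A toy illustration of the underlying math; shapes and strengths are arbitrary:
import torch

def apply_loras(W, loras):
    # loras: iterable of (A, B, strength); A: (rank, d_in), B: (d_out, rank)
    for A, B, s in loras:
        W = W + s * (B @ A)
    return W

W = torch.randn(64, 64)                                    # base weight
style = (torch.randn(8, 64), torch.randn(64, 8), 0.5)      # e.g. style LoRA
character = (torch.randn(8, 64), torch.randn(64, 8), 0.5)  # e.g. character LoRA
print(apply_loras(W, [style, character]).shape)            # torch.Size([64, 64])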
Personal Note
- After extensive testing across various hardware configurations, Nunchaku FLUX.1-dev has become my go-to solution for high-quality, fast AI image generation. The combination of academic rigor (ICLR 2025 Spotlight), practical performance gains, and seamless ComfyUI integration makes this the most compelling FLUX.1-dev implementation available in 2025. The 12-20 second generation times on an RTX 3080 10GB make AI image generation genuinely practical for iterative creative workflows.
References
- https://github.com/mit-han-lab/nunchaku
- https://hanlab.mit.edu/blog/svdquant
- https://github.com/mit-han-lab/ComfyUI-nunchaku
- https://huggingface.co/black-forest-labs/FLUX.1-dev
- https://docs.comfy.org/
- https://comfy.icu/extension/mit-han-lab__ComfyUI-nunchaku
- https://huggingface.co/collections/mit-han-lab/nunchaku-6837e7498f680552f7bbb5ad
- FLUX.1-Krea & the Rise of Opinionated Models - Drew Breunig