How to Install ComfyUI + Nunchaku FLUX.1-dev - Lightning Fast AI Image Generation

Introduction
ComfyUI + Nunchaku FLUX.1-dev represents a breakthrough in AI image generation performance. By combining ComfyUI's node-based workflow interface with MIT Han Lab's SVDQuant 4-bit quantization technology, this setup delivers a 3.0× speedup (measured against an NF4 W4A16 baseline) and a 3.6× memory reduction compared to the 16-bit FLUX.1-dev model. In my testing on Windows 11 with an RTX 3080 10GB, image generation times dropped from 40+ seconds to around 11-12 seconds while maintaining exceptional quality. This makes Nunchaku FLUX.1-dev one of the most practical solutions for local AI image generation in 2025.
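Although this guide focuses on ComfyUI, the same SVDQuant engine can also be driven directly from Python through Hugging Face diffusers. The sketch below is adapted from the upstream Nunchaku README; the repository id and class name may differ between Nunchaku releases, so verify them against the current documentation:
import torch
from diffusers import FluxPipeline
from nunchaku import NunchakuFluxTransformer2dModel

# Load the SVDQuant 4-bit transformer (repo id per the upstream README;
# verify against the current Nunchaku collection on Hugging Face)
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    "mit-han-lab/svdq-int4-flux.1-dev"
)
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")
image = pipeline(
    "A cat holding a sign that says hello world",
    num_inference_steps=28,  # standard step count; the Turbo LoRA below enables 8
    guidance_scale=3.5,
).images[0]
image.save("flux.1-dev-nunchaku.png")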
Features
- Revolutionary Performance: SVDQuant's 4-bit quantization delivers 3.0× speedups over NF4 W4A16 baseline while maintaining visual fidelity
- Memory Efficiency: 3.6× memory reduction enables 12B FLUX.1-dev to run comfortably on 8GB+ RTX cards without CPU offloading
- Easy Installation: Unlike traditional quantization methods requiring hours of compilation, Nunchaku provides pre-built wheels for instant deployment
- Broad GPU Compatibility: Native support for RTX 20xx, 30xx, 40xx, and 50xx series cards through optimized CUDA kernels
- Professional Workflow Integration: Seamless ComfyUI integration with LoRA, ControlNet, and multi-model support
- Production-Ready Stability: Backed by an ICLR 2025 Spotlight paper, which lends the method peer-reviewed rigor and reliability
Prerequisites
- Operating System: Windows 11 (tested) or Windows 10 with latest updates
- GPU: NVIDIA RTX series with 8GB+ VRAM (10GB+ recommended for FLUX.1-dev)
- System RAM: 16GB minimum, 32GB recommended
- Storage: 15GB+ free space for models and dependencies
- Python: Python 3.12 recommended (ComfyUI Desktop handles this automatically)
Installing ComfyUI Desktop
ComfyUI Desktop provides the most streamlined installation experience, eliminating Python environment management complexities. [Download Link]
Essential File Downloads
- The following models and workflow files are required for Nunchaku FLUX.1-dev operation. Download each file to its specified directory within your ComfyUI installation:
- Nunchaku FLUX.1-dev Model (6.77GB) → models/diffusion_models/
- Nunchaku FLUX.1-Krea-dev Model (6.77GB) → models/diffusion_models/
- Nunchaku FLUX.1-Kontext-dev Model (6.77GB) → models/diffusion_models/
- PuLID Flux Model v0.9.1 (1.14GB) → models/pulid/
- VAE (Variational Autoencoder) → models/vae/
- Text Encoder: t5xxl_fp16 → models/clip/
- Text Encoder: clip_l → models/clip/
- Vision Encoder: EVA02_CLIP_L_336_psz14_s6B → models/clip/
- FLUX.1-Turbo LoRA for Even Faster Generation → models/loras/
- Nunchaku Wheel Installer Workflow → user/default/workflows/
- Nunchaku FLUX.1-dev Example Workflow → user/default/workflows/
- Nunchaku FLUX.1-Kontext-dev Example Workflow → user/default/workflows/
- Nunchaku FLUX.1-dev PuLID Example Workflow → user/default/workflows/
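- If you prefer scripted downloads, the checkpoints can also be fetched with huggingface_hub. The following is a minimal sketch; the repo_id shown is an assumption, so confirm it against the Nunchaku collection on Hugging Face first:
from huggingface_hub import hf_hub_download

# repo_id is an assumption; the filename matches the DiT loader setting
# used later in this guide
path = hf_hub_download(
    repo_id="mit-han-lab/nunchaku-flux.1-dev",
    filename="svdq-int4_r32-flux.1-dev.safetensors",
    local_dir="ComfyUI/models/diffusion_models",
)
print("Saved to:", path)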
Installing ComfyUI-nunchaku Plugin
- The Nunchaku plugin provides essential nodes for 4-bit quantized model loading and inference.
Run [ComfyUI]
→ [Manager]
→ [Custom Nodes Manager]
→ [ComfyUI-nunchaku] (Check)
→ [Install]
→ Restart [ComfyUI]
Installing Nunchaku Backend
- This step installs the actual quantization engine that powers the performance improvements.
Run [ComfyUI]
→ [Workflow]
→ [Open]
→ install_wheel.json (Double Click)
→ [Nunchaku Wheel Installer] (Click)
→ version: [v0.3.1] (Select)
→ [Preview Any] (Click)
→ [▷ Execute] (Click)
→ Wait for confirmation: "Successfully installed nunchaku..."
→ Restart [ComfyUI]
[Advanced] Manual Nunchaku Backend Installation
- For users requiring manual control or troubleshooting installation issues:
# Open PowerShell as Administrator
# Navigate to ComfyUI directory
PS> cd .\ComfyUI\
PS> .\.venv\Scripts\Activate.ps1
# Install Nunchaku dependencies
PS> pip install -r custom_nodes\ComfyUI-nunchaku\requirements.txt
PS> pip install nunchaku --upgrade
# Install additional dependencies if needed
PS> pip install facexlib insightface onnxruntime
# Verify installation
PS> python -c "import nunchaku; print(nunchaku.__version__)"
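- Nunchaku publishes separate INT4 and FP4 checkpoints (and per-Python/PyTorch wheels on its releases page), so it helps to confirm what your GPU supports: INT4 kernels target RTX 20xx-40xx, while FP4 targets the Blackwell-based RTX 50xx series. A small sketch; the svdq-int4/svdq-fp4 naming is inferred from the published checkpoints:
import torch

# RTX 20xx/30xx/40xx (compute capability < 12) use the INT4 checkpoints;
# Blackwell RTX 50xx (compute capability 12.x) uses the FP4 checkpoints
major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")
print("Use svdq-fp4_* models" if major >= 12 else "Use svdq-int4_* models")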
Running Your First Nunchaku FLUX.1-dev Generation
Run [ComfyUI]
→ [Workflow]
→ [Open]
→ nunchaku-flux.1-dev.json (select)
→ Set your prompt in the text input node
→ [▷ Run]
- I applied the following additional configurations to the example workflow provided by Nunchaku and ran multiple image generation tests, which confirmed very fast generation averaging 11-12 seconds per image with high-quality output.
Nunchaku Flux DiT Loader
* model_path: [svdq-int4_r32-flux.1-dev.safetensors] # INT4 quantized model
* cache_threshold: 0
# Performance optimization with FP16 attention
* attention: [nunchaku-fp16]
# Mixed precision computation
* data_type: [bfloat16]
Nunchaku Flux.1 LoRA Loader
# Speed enhancement, high-quality generation with fewer steps
* lora_name: [flux-1.turbo-alpha.safetensors]
* lora_strength: 1.0
Nunchaku Flux.1 LoRA Loader
# Enhanced realistic human representation
* lora_name: [flux_realism_lora.safetensors]
* lora_strength: 0.7
Nunchaku Text Encoder Loader
* text_encoder1: [t5xxl_fp16.safetensors]
* text_encoder2: [clip_l.safetensors]
FluxGuidance
# Balance between prompt adherence and creativity
# Values below [5] cause watercolor effects due to under-guidance artifacts.
* guidance: 5
BasicScheduler
# Stable noise reduction
# [beta] scheduler removes noise more efficiently at beginning/end steps, preserving high-frequency details vs [simple] scheduler
* scheduler: [beta]
# Low-step generation enabled by Turbo LoRA
* steps: 8
Multiply Sigmas
# Fine-tuning sigma values for detail enhancement
* factor: 0.960
* start: 0.950
* end: 0.980
Width
* value: 896
Height
* value: 1152
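- A note on the 896 × 1152 resolution: FLUX latents are downscaled 8× by the VAE and then patchified 2×2, so both dimensions should be multiples of 16. A quick sketch illustrates this; the token arithmetic is an approximation of the DiT's input:
def check_flux_dims(width, height):
    # 8x VAE downscale followed by 2x2 patchify -> multiples of 16
    assert width % 16 == 0 and height % 16 == 0, "use multiples of 16"
    return (width // 8) * (height // 8) // 4  # DiT patch tokens

print(check_flux_dims(896, 1152))  # 4032 tokens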
[Tip] Multiply Sigmas: Maximizing Detail in Mechanical and Portrait Generation
- Multiply Sigmas functions as an independent node in ComfyUI that significantly enhances detail quality in mechanical objects and portraits, effectively reducing the characteristic AI-generated appearance. [Related Link]
- The most recommended configuration is: Guidance: 4.5 + Scheduler: Beta + Multiply Sigmas: 0.96.
- This feature becomes available after installing the ComfyUI-Detail-Daemon custom node package in ComfyUI.
# Installing [ComfyUI-Detail-Daemon]
Launch [ComfyUI]
→ [Manager]
→ [Custom Nodes Manager]
→ Search [ComfyUI-Detail-Daemon]
→ [Install]
→ Restart [ComfyUI]
- After installation, you can add the Multiply Sigmas node to your workflow as follows:
# [1] Adding [Multiply Sigmas] node to workflow
(Right-click on empty space in workflow canvas)
→ [Add Node]
→ [sampling]
→ [custom_sampling]
→ [sigmas]
→ [Multiply Sigmas (stateless)]
→ factor: 0.96
→ start: 0.95
→ end: 0.98
# [2] Connect [BasicScheduler]'s SIGMAS output to [Multiply Sigmas] input
# [3] Connect [Multiply Sigmas] output to [SamplerCustomAdvanced]'s sigmas input
# Correct Node Connection Sequence
# [BasicScheduler] → [Multiply Sigmas] → [SamplerCustomAdvanced]
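- Conceptually, the node scales part of the sigma schedule produced by BasicScheduler before it reaches the sampler; slightly lowered sigmas leave noise headroom that the sampler turns into extra fine detail. A minimal sketch of that idea, assuming start/end select a normalized slice of the schedule (the node's exact indexing may differ):
def multiply_sigmas(sigmas, factor=0.96, start=0.95, end=0.98):
    # Scale every sigma whose normalized position lies in [start, end]
    n = len(sigmas)
    return [
        s * factor if start <= i / (n - 1) <= end else s
        for i, s in enumerate(sigmas)
    ]

# Uniform scaling over the whole schedule, for illustration
print(multiply_sigmas([1.0, 0.8, 0.6, 0.4, 0.2, 0.05], start=0.0, end=1.0))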
[Tip] Face Detailer: Maximizing Facial Detail Enhancement for Characters
- Face Detailer is a powerful feature that detects and enhances facial details in generated images. It is particularly useful for full-body character shots, where facial detail tends to be significantly degraded, helping to restore and improve these crucial details.
- This feature becomes available after installing both the ComfyUI Impact Pack and ComfyUI Impact Subpack custom node packages in ComfyUI.
# Installing [ComfyUI Impact Pack] and [ComfyUI Impact Subpack]
Launch [ComfyUI]
→ [Manager]
→ [Custom Nodes Manager]
→ Search [ComfyUI Impact Pack]
→ [Install]
→ Search [ComfyUI Impact Subpack]
→ [Install]
→ Restart [ComfyUI]
- After installation, you can add the FaceDetailer node to your workflow as follows:
# Adding [FaceDetailer] node to workflow
(Right-click on empty space in workflow canvas)
→ [Add Node]
→ [ImpactPack]
→ [FaceDetailer]
# Recommended parameters for [Nunchaku FLUX.1-dev]
→ guide_size: 512
→ guide_size_for: [crop_region]
→ max_size: 1024
→ steps: 8
→ cfg: 1.0
→ sampler_name: [euler]
→ scheduler: [beta]
→ denoise: 0.50
→ feather: 5
→ drop_size: 10
# Adding [CLIP Text Encode (Negative Prompt)] node to workflow and type below text
low quality, blurry, bad anatomy, worst quality, low resolution, heavy makeup, rough skin, harsh texture, skin imperfections, overly detailed skin, artificial skin, dirty skin, skin imperfections, acne, blackheads, wrinkles, aged skin, damaged skin, oily skin, uneven skin tone, overly detailed skin, harsh skin texture, artificial skin, large pores, visible pores, textured skin, coarse skin, bumpy skin, weathered skin, leathery skin, sun damaged skin, scarred skin, blemished skin, unsmooth skin, grainy skin, patchy skin, peach fuzz, vellus hair
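- Under the hood, a FaceDetailer pass is essentially detect → crop → re-denoise → paste. The sketch below illustrates the concept only, not the Impact Pack implementation; img2img is a hypothetical stand-in for a partial-denoise sampler call:
from PIL import Image, ImageDraw, ImageFilter

def detail_face(image, face_box, img2img, guide_size=512, denoise=0.5, feather=5):
    crop = image.crop(face_box)
    # Upscale the face so the sampler can add detail at guide_size resolution
    scale = guide_size / max(crop.size)
    big = crop.resize((round(crop.width * scale), round(crop.height * scale)))
    refined = img2img(big, denoise).resize(crop.size)
    # Feathered mask (solid center, blurred border) so the paste blends in
    mask = Image.new("L", crop.size, 0)
    ImageDraw.Draw(mask).rectangle(
        (feather, feather, crop.width - feather, crop.height - feather), fill=255
    )
    mask = mask.filter(ImageFilter.GaussianBlur(feather))
    out = image.copy()
    out.paste(refined, face_box[:2], mask)
    return out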
[Tip] res_2s + bong_tangent: Superior Image Generation with Advanced Sampling
- Sampler res_2s combined with Scheduler bong_tangent delivers the highest-quality image generation. [Related Link]
- Technical Details:
- res_2s: Uses two substeps per step, requiring two model calls per step (slower but higher quality than single-stage samplers)
- bong_tangent: BONGMATH technology enables bidirectional denoising, processing forward and backward directions simultaneously for more accurate sampling
- These features become available after installing the RES4LYF custom node package in ComfyUI.
# Installing [RES4LYF]
Launch [ComfyUI]
→ [Manager]
→ [Custom Nodes Manager]
→ Search [RES4LYF]
→ [Install]
→ Restart [ComfyUI]
- Once installed, you can configure them in KSamplerSelect and BasicScheduler as follows:
KSamplerSelect
# Performs Multistage Sampling (RES Multistage Exponential Integrator)
* sampler_name: [res_2s]
BasicScheduler
# Performs bidirectional denoising (BONGMATH Technology)
* scheduler: [bong_tangent]
* steps: 8
* denoise: 1.00
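- Because res_2s makes two model calls per step, budget roughly twice the sampling time of a single-stage sampler at the same step count, as this toy cost model shows:
def model_calls(steps, substeps_per_step=2):
    return steps * substeps_per_step

print(model_calls(8))     # res_2s: 16 model calls
print(model_calls(8, 1))  # euler:   8 model calls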
[Tip] FLUX.1-Krea-dev Best Practices & Optimization
- FLUX.1-Krea-dev is a collaborative model released by Black Forest Labs and Krea AI. It features an opinionated aesthetic that emphasizes natural texture, realistic tone, and enhanced detail rendering, aiming to eliminate the characteristic AI look of FLUX models, including plastic-like skin and oversaturation, in pursuit of extreme photorealism.
- The model demonstrates improved prompt adherence compared to the base FLUX.1-dev model. Detailed descriptions of temporal context, color grading, composition, and fine details particularly leverage its strengths in natural texture and realistic rendering.
- Maintains 100% architectural compatibility with FLUX.1-dev as a drop-in replacement. Recommended settings:
  - model: svdq-int4_r32-flux.1-krea.dev.safetensors (Nunchaku version)
  - sampler_name: res_2s
  - scheduler: bong_tangent
  - steps: 8
  - denoise: 1.0
  - guidance: 5.0
  - width x height: 864 x 1152
  - loras:
    - lora_name: Flux_Krea_Blaze_Lora-rank32.safetensors, lora_strength: 1.00
    - lora_name: [your-style-lora], lora_strength: 0.50
    - lora_name: [your-character-lora], lora_strength: 0.50
    - lora_name: SameFace_Fix.safetensors, lora_strength: -0.70
[Tip] FLUX.1-Kontext-dev Best Practices & Optimization
- Preserve Original Image Size: Set the FluxKontextImageScale node to Bypass mode to maintain the input image's original dimensions. This node normally scales images to resolutions optimal for FLUX processing (usually under 2.1MP) and reduces VRAM usage, but bypassing it preserves your desired output size.
- Minimize Facial Changes: Set the denoise strength parameter to 0.85 or lower in the KSampler or BasicScheduler nodes. The default value of 1.0 completely replaces the input image with noise, while lower values preserve more of the original image. Values between 0.75 and 0.85 provide the best balance between edit quality and identity preservation.
- Use Multiple FLUX.1-dev LoRAs: You can load and combine multiple LoRA models trained on the FLUX.1-dev base model. Connect Nunchaku FLUX LoRA Loader nodes to the output of the Nunchaku FLUX DiT Loader node and specify your desired LoRA files (see the sketch after this list).
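- Chaining loaders works because each LoRA contributes an independent, scaled low-rank update to the same base weights. A toy illustration of the underlying math; shapes and strengths are arbitrary:
import torch

def apply_loras(W, loras):
    # loras: iterable of (A, B, strength); A: (rank, d_in), B: (d_out, rank)
    for A, B, s in loras:
        W = W + s * (B @ A)
    return W

W = torch.randn(64, 64)                                    # base weight
style = (torch.randn(8, 64), torch.randn(64, 8), 0.5)      # e.g. style LoRA
character = (torch.randn(8, 64), torch.randn(64, 8), 0.5)  # e.g. character LoRA
print(apply_loras(W, [style, character]).shape)            # torch.Size([64, 64])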
Personal Note
- After extensive testing across various hardware configurations, Nunchaku FLUX.1-dev has become my go-to solution for high-quality, fast AI image generation. The combination of academic rigor (ICLR 2025 Spotlight), practical performance gains, and seamless ComfyUI integration makes this the most compelling FLUX.1-dev implementation available in 2025. The 12-20 second generation times on an RTX 3080 10GB make AI image generation genuinely practical for iterative creative workflows.
References
- https://github.com/mit-han-lab/nunchaku
- https://hanlab.mit.edu/blog/svdquant
- https://github.com/mit-han-lab/ComfyUI-nunchaku
- https://huggingface.co/black-forest-labs/FLUX.1-dev
- https://docs.comfy.org/
- https://comfy.icu/extension/mit-han-lab__ComfyUI-nunchaku
- https://huggingface.co/collections/mit-han-lab/nunchaku-6837e7498f680552f7bbb5ad
- FLUX.1-Krea & the Rise of Opinionated Models - Drew Breunig