How to Deploy DeepSeek-R1 Locally: A Complete Hardware Guide

KS Mooi

DeepSeek-R1:671B is a large language model (LLM) developed by DeepSeek AI, boasting an impressive 671 billion parameters. It excels at a wide range of conversational and generative tasks—including complex reasoning—and supports multilingual interactions. You can interact with DeepSeek-R1 via its official chat interface at chat.deepseek.com or through its OpenAI-compatible API at platform.deepseek.com.

This guide provides a comprehensive overview of the hardware specifications required for installing DeepSeek-R1:671B locally. It covers minimum and recommended hardware configurations, the impact of quantization on performance, and practical hardware setups for different use cases.


1. Overview

DeepSeek-R1:671B uses a Mixture of Experts (MoE) architecture: although the model stores 671 billion parameters, only a fraction of them (roughly 37 billion) is activated for each token. This design reduces the compute required per token and, combined with offloading, can shrink the effective memory footprint compared to a dense model of the same size. Even so, hardware demands remain high, particularly for the full 671B-parameter version.


2. Hardware Requirements

2.1 CPU

  • Requirements:
    A powerful multi-core CPU is crucial—especially when running the model on the CPU or with limited GPU resources.

  • Recommendations:
    A modern multi-core processor with a high clock speed and a large cache. For instance, dual EPYC CPUs with substantial RAM configurations have been reported to perform well.

2.2 System RAM

  • Requirements:
    Sufficient system RAM is essential for holding the model weights, the KV cache, and intermediate activations. (A rough cache-sizing sketch follows this list.)

  • Recommendations:

    • Minimum: 16GB

    • Preferred: 32GB or more for larger models and longer context lengths.

    • Note: Larger models require proportionally more RAM.
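
A rough way to size the KV cache for a given context length is shown below. This is a generic transformer estimate: DeepSeek-R1 actually uses Multi-head Latent Attention (MLA), which compresses the cache, so treat the layer and head shapes here as illustrative assumptions only.

```python
# Rough KV-cache size for a generic transformer: K and V tensors for
# every layer, KV head, and token. The shapes below are hypothetical
# and do NOT match DeepSeek-R1's real (MLA-compressed) configuration.

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, bytes_per_el: int = 2, batch: int = 1) -> float:
    elems = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch
    return elems * bytes_per_el / 1e9

# Hypothetical 60-layer model, 8 KV heads of dim 128, 32k context,
# FP16 cache: roughly 8 GB on top of the weights.
print(f"{kv_cache_gb(60, 8, 128, 32_768):.1f} GB")
```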

2.3 GPU

  • Role:
    While not mandatory, a powerful GPU (or multiple GPUs) can greatly accelerate inference. NVIDIA GPUs are preferred due to their CUDA support and compatibility with deep learning frameworks.

  • VRAM Requirements:
    The full 671B model typically requires a multi-GPU setup with roughly 1,342 GB of combined VRAM (671 billion FP16 weights at 2 bytes each). Distilled and quantized versions lower these requirements dramatically; a quick sanity check follows the table.

VRAM Requirements by Model Variant

Model Variant                  | Parameters (B) | VRAM Requirement (GB) | Recommended GPU Configuration
DeepSeek-R1                    | 671            | ~1,342                | Multi-GPU setup (e.g., 16× NVIDIA A100 80GB)
DeepSeek-R1-Distill-Qwen-1.5B  | 1.5            | ~0.7                  | NVIDIA RTX 3060 12GB or higher
DeepSeek-R1-Distill-Qwen-7B    | 7              | ~3.3                  | NVIDIA RTX 3070 8GB or higher
DeepSeek-R1-Distill-Llama-8B   | 8              | ~3.7                  | NVIDIA RTX 3070 8GB or higher
DeepSeek-R1-Distill-Qwen-14B   | 14             | ~6.5                  | NVIDIA RTX 3080 10GB or higher
DeepSeek-R1-Distill-Qwen-32B   | 32             | ~14.9                 | NVIDIA RTX 4090 24GB
DeepSeek-R1-Distill-Llama-70B  | 70             | ~32.7                 | 2× NVIDIA RTX 4090 24GB

For multi-GPU setups, ensure your motherboard supports PCIe bifurcation and provides enough lanes and physical slots to avoid compatibility issues.
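
The headline numbers in the table are easy to sanity-check: weight memory is just the parameter count times bytes per parameter. The ~1,342 GB figure corresponds to FP16 weights (2 bytes each), while the distilled rows line up with roughly 4-bit weights. A minimal sketch:

```python
# Back-of-the-envelope VRAM estimate: weights only, ignoring the KV
# cache and activation overhead.

def weight_vram_gb(params_billion: float, bits_per_param: float) -> float:
    """Weights occupy params * bits / 8 bytes; with params expressed
    in billions this comes out directly in GB."""
    return params_billion * bits_per_param / 8

print(f"671B @ FP16 : ~{weight_vram_gb(671, 16):,.0f} GB")  # ~1,342 GB
print(f"32B  @ 4-bit: ~{weight_vram_gb(32, 4):.0f} GB")     # ~16 GB
```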

2.4 Storage

  • Requirements:
    DeepSeek-R1 models require substantial storage—from tens to hundreds of gigabytes—depending on model size and quantization.

  • Recommendations:
    A fast NVMe SSD is essential for loading the model quickly and for handling data swaps during inference. (A simple free-space pre-flight check is sketched after the table.)

Approximate Storage Requirements by Model Variant

Model Variant                  | Parameters (B) | Approximate Storage (GB)
DeepSeek-R1                    | 671            | ~720
DeepSeek-R1-Distill-Qwen-1.5B  | 1.5            | ~4
DeepSeek-R1-Distill-Qwen-7B    | 7              | ~14
DeepSeek-R1-Distill-Llama-8B   | 8              | ~16
DeepSeek-R1-Distill-Qwen-14B   | 14             | ~28
DeepSeek-R1-Distill-Qwen-32B   | 32             | ~64
DeepSeek-R1-Distill-Llama-70B  | 70             | ~140
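
Before downloading a checkpoint, it is worth verifying that the target drive actually has room for it, plus headroom for temporary files. A small pre-flight check; the mount point below is a hypothetical placeholder:

```python
# Pre-flight check before pulling a large checkpoint: confirm the
# target drive has enough free space. "/models" is a hypothetical
# mount point; set REQUIRED_GB from the table above.

import shutil

REQUIRED_GB = 140  # e.g., DeepSeek-R1-Distill-Llama-70B at FP16
free_gb = shutil.disk_usage("/models").free / 1e9
if free_gb < REQUIRED_GB * 1.2:  # keep ~20% headroom for temp files
    raise SystemExit(f"Need ~{REQUIRED_GB} GB, only {free_gb:.0f} GB free")
print(f"OK: {free_gb:.0f} GB free")
```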

3. Additional Hardware Considerations

  • Cooling:
    Large language models generate significant heat. Ensure adequate cooling (potentially custom water cooling loops for GPUs/CPUs) to prevent thermal throttling and maintain optimal performance.

  • Power Consumption:
    DeepSeek-R1 can draw high power—especially with multi-GPU setups. Monitor system power and use a power supply with sufficient capacity.

  • CPU Offload:
    For the 671B model, offloading roughly 80 GB of the model to CPU memory is recommended: layers that do not fit in GPU VRAM stay in system RAM and run on the CPU. A minimal sketch follows this list.
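
One concrete way to realize this split is llama.cpp's layer offload, exposed in Python via llama-cpp-python: only `n_gpu_layers` layers live in VRAM, and the rest stay in system RAM. A minimal sketch, assuming a locally downloaded GGUF file (the path and layer count are placeholders):

```python
# Partial offload with llama.cpp via llama-cpp-python
# (`pip install llama-cpp-python`). Only `n_gpu_layers` layers are
# placed in VRAM; the remainder stays in system RAM and runs on the
# CPU. The GGUF path and layer count are placeholders.

from llama_cpp import Llama

llm = Llama(
    model_path="/models/deepseek-r1.gguf",  # hypothetical local GGUF file
    n_gpu_layers=20,   # layers resident in VRAM; the rest are CPU-offloaded
    n_ctx=8192,        # context window; larger values grow the KV cache
    n_threads=32,      # CPU threads for the offloaded layers
)
out = llm("Explain mixture-of-experts in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```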


4. Quantization for Lower Hardware Requirements

If hardware is limited, quantized versions (e.g., 4-bit or 8-bit) can substantially reduce VRAM and RAM requirements. However, note that quantization may impact output quality.

  • Example:
    A 7B model quantized to 4-bit might only require 4–6 GB VRAM.

  • Tools:
    Tools like llama.cpp or Hugging Face's bitsandbytes can assist with quantization.
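
As a concrete example, here is a minimal sketch of loading a distilled variant in 4-bit with Hugging Face's bitsandbytes integration; it assumes a CUDA GPU and `pip install transformers accelerate bitsandbytes`:

```python
# Minimal sketch: load a distilled DeepSeek-R1 variant in 4-bit using
# the bitsandbytes integration in transformers.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for quality
)
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)
inputs = tok("What is 17 * 23?", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```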

Hardware Requirements for 4-bit Quantized DeepSeek-R1

Model Variant | System RAM | GPU VRAM
14B           | 32 GB      | 12–16 GB
32B           | 64 GB      | 16–24 GB
70B           | 128 GB     | 24–32 GB
671B          | 512 GB+    | 40+ GB

Note: Balance the reduced hardware requirements against potential performance degradation when choosing quantization levels.


5. Distilled Models

DeepSeek offers distilled versions with fewer parameters (ranging from 1.5B to 70B) that present several advantages:

  • Reduced Hardware Requirements:
    For example, DeepSeek-R1-Distill-Qwen-1.5B requires only ~0.7 GB of VRAM.

  • Efficient Yet Powerful:
    Despite being smaller, distilled models maintain robust reasoning capabilities and often outperform similarly sized models from other architectures.

  • Cost-Effective Deployment:
    They allow experimentation and deployment on lower-end hardware, reducing the need for expensive multi-GPU setups.

If you have limited resources or are beginning with DeepSeek-R1, consider using a distilled model.
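
Trying a distilled variant takes only a few lines once Ollama is installed; the sketch below assumes the `ollama` Python client and that you have pulled the `deepseek-r1:1.5b` tag beforehand:

```python
# Chatting with the 1.5B distill through the Ollama Python client
# (`pip install ollama`; run `ollama pull deepseek-r1:1.5b` first).

import ollama

resp = ollama.chat(
    model="deepseek-r1:1.5b",  # ~0.7 GB of VRAM per the table above
    messages=[{"role": "user",
               "content": "Summarize model distillation in two sentences."}],
)
print(resp["message"]["content"])
```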


6. Running DeepSeek-R1 with Docker

Docker can simplify the installation process by containerizing DeepSeek-R1. To run it with Docker:

  1. Install Docker on your system.

  2. Download the DeepSeek-R1 Docker image.

  3. Run the container using the docker run command.

Recommendation:
For Docker setups, use the GGUF (GPT-Generated Unified Format) versions of the model for compatibility with llama.cpp and other inference engines.
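
For illustration, here is a hedged sketch using the Docker SDK for Python. The image tag and file paths are assumptions: llama.cpp publishes a server image, and the GGUF filename is a placeholder for whichever variant you downloaded.

```python
# Launching a containerized GGUF server with the Docker SDK for Python
# (`pip install docker`). The image tag and paths are illustrative.

import docker

client = docker.from_env()
container = client.containers.run(
    "ghcr.io/ggerganov/llama.cpp:server",       # llama.cpp HTTP server image
    command=["-m", "/models/deepseek-r1.gguf",  # placeholder GGUF path
             "--host", "0.0.0.0", "--port", "8080"],
    volumes={"/path/to/models": {"bind": "/models", "mode": "ro"}},
    ports={"8080/tcp": 8080},
    detach=True,
)
print(container.logs(tail=10).decode())  # inspect startup output
```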


7. Example Hardware Setups

7.1 CPU-Based Rig

  • CPU: AMD Ryzen 9 or Intel Core i9

  • RAM: 64–128GB DDR5

  • GPU: None

  • Storage: Fast NVMe SSD

  • Inference Engine: llama.cpp or KTransformers with Unsloth's 1.58-bit dynamic quantization

7.2 Hybrid CPU/GPU System

  • CPU: AMD Ryzen Threadripper or Intel Xeon

  • RAM: 128GB – 256GB DDR5

  • GPU: NVIDIA RTX 4090 or similar

  • Storage: Fast NVMe SSD

  • Inference Engine: llama.cpp or KTransformers with Unsloth's dynamic quantization (1.58-bit or higher)

7.3 High-End Multi-GPU Server

  • CPU: Dual Intel Xeon or AMD EPYC

  • RAM: 512GB – 1TB DDR5

  • GPU: 8× NVIDIA RTX 4090 (connected over PCIe; the RTX 4090 does not support NVLink)

  • Storage: Multiple fast NVMe SSDs (RAID0 optional)

  • Inference Engine: vLLM or a custom implementation with tensor parallelism
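
On a server like this, a vLLM launch with tensor parallelism might look like the sketch below. The distilled 70B model is used because the full 671B model far exceeds 8× 24 GB of VRAM; even the 70B fit is tight, so the context length is capped:

```python
# Sketch: serving a distilled model with vLLM tensor parallelism
# across 8 GPUs (`pip install vllm`).

from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
    tensor_parallel_size=8,  # shard the weights across all 8 GPUs
    dtype="bfloat16",
    max_model_len=4096,      # cap context to keep the KV cache within VRAM
)
params = SamplingParams(temperature=0.6, max_tokens=128)
outputs = llm.generate(["Why does tensor parallelism cut per-GPU VRAM?"], params)
print(outputs[0].outputs[0].text)
```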

7.4 Cost-Effective Server

  • CPU: Dual AMD EPYC 7702 or 7V13

  • RAM: 512GB – 1TB DDR4 ECC

  • GPU: None

  • Storage: 1 TB NVMe SSD

  • Inference Engine: llama.cpp with 4-bit quantization

7.5 Mid-Tier Rig

  • Platform: Apple M2 Ultra

  • RAM: 192GB unified memory

  • GPU: Apple M2 Ultra (integrated)

  • Storage: Built-in SSD

  • Inference Engine: llama.cpp with Unsloth's 2.51-bit pre-quantized model


8. Training DeepSeek-R1

Training DeepSeek-R1 requires significantly more resources than inference. Below is a summary of the minimum and recommended hardware for training:

Component | Minimum Specification                 | Recommended Specification
GPUs      | Single NVIDIA A100, RTX 4090, or H100 | 24× H100 (expert parallelism = 8, pipeline parallelism = 3) or equivalent configurations
RAM       | 32GB                                  | 64GB minimum; 128GB+ preferred
CPU       | Single CPU                            | Multiple high-end CPUs (e.g., dual Intel Xeon or AMD EPYC)

Note: Storage for training DeepSeek-R1 ranges from 10–20 GB for smaller models (e.g., 7B) to 200+ GB for larger ones (e.g., 70B). SSDs are highly recommended for faster loading times.
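
For GPU memory during training, a common rule of thumb for mixed-precision Adam is about 16 bytes per parameter (2 for FP16 weights, 2 for gradients, and 12 for FP32 master weights plus two optimizer moments), before activations. This is a generic heuristic, not DeepSeek's actual training recipe:

```python
# Rule-of-thumb GPU memory for mixed-precision Adam: ~16 bytes per
# parameter (2 FP16 weights + 2 gradients + 12 optimizer state),
# before activations. A generic heuristic, not DeepSeek's recipe.

def training_state_gb(params_billion: float, bytes_per_param: float = 16.0) -> float:
    return params_billion * bytes_per_param

for size in (7, 70):
    print(f"{size}B: ~{training_state_gb(size):,.0f} GB of training state")
# 7B : ~112 GB  -> already a multi-GPU job without sharding or offload
# 70B: ~1,120 GB
```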


9. Conclusion

DeepSeek-R1:671B is a powerful LLM suitable for a variety of applications. Despite its size, its Mixture of Experts architecture and quantization techniques make it more accessible than expected. However, installing the full model locally demands a robust system with:

  • A high-end multi-GPU setup with substantial VRAM

  • A powerful multi-core CPU

  • Ample system RAM and fast NVMe storage

For users with limited hardware resources, consider:

  • Quantized Versions: Lower VRAM and RAM requirements with a potential trade-off in output quality.

  • Distilled Models: More accessible deployment while still providing strong performance.

Looking ahead, advancements in hardware—such as AMD MI300X GPUs with 192GB HBM3 and more affordable 256GB DDR5 kits—may further enhance the accessibility of running DeepSeek-R1.

By carefully evaluating your use case and available resources, you can select the optimal hardware configuration to install and utilize DeepSeek-R1:671B for your applications.


