How to Deploy DeepSeek-R1 Locally: A Complete Hardware Guide


DeepSeek-R1:671B is a large language model (LLM) developed by DeepSeek AI, boasting an impressive 671 billion parameters. It excels at a wide range of conversational and generative tasks—including complex reasoning—and supports multilingual interactions. You can interact with DeepSeek-R1 via its official chat interface at chat.deepseek.com or through its OpenAI-compatible API at platform.deepseek.com.
This guide provides a comprehensive overview of the hardware specifications required for installing DeepSeek-R1:671B locally. It covers minimum and recommended hardware configurations, the impact of quantization on performance, and practical hardware setups for different use cases.
1. Overview
DeepSeek-R1:671B uses a Mixture of Experts (MoE) architecture: only a subset of its parameters (roughly 37B of the 671B) is activated for any given token. This cuts the compute and memory bandwidth needed per token and makes CPU/GPU offloading strategies more practical, although the full set of weights must still be stored somewhere. Hardware demands therefore remain high, particularly for the full 671B-parameter version.
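To make the MoE point concrete, here is a rough sketch of how much weight data is touched per token versus what a dense 671B model would touch. It assumes FP16 weights and DeepSeek's reported ~37B activated parameters; treat both figures as approximations.

```python
# Rough sketch: weight bytes read per token in an MoE model vs. a dense
# model of the same total size. The 37B "active parameters" figure is an
# assumption taken from DeepSeek's published materials; FP16 = 2 bytes.

TOTAL_PARAMS_B = 671   # total parameters, in billions
ACTIVE_PARAMS_B = 37   # parameters activated per token, in billions (assumption)

def weights_read_per_token_gb(params_b, bytes_per_param=2):
    # billions of parameters * bytes per parameter = gigabytes
    return params_b * bytes_per_param

dense = weights_read_per_token_gb(TOTAL_PARAMS_B)   # what a dense 671B would touch
moe = weights_read_per_token_gb(ACTIVE_PARAMS_B)    # what the MoE actually touches
print(f"dense: {dense} GB/token, MoE: {moe} GB/token")
```

The ~18x gap is why MoE inference is far cheaper per token than the raw parameter count suggests, even though all experts still occupy storage.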
2. Hardware Requirements
2.1 CPU
Requirements:
A powerful multi-core CPU is crucial, especially when running the model on the CPU or with limited GPU resources.
Recommendations:
A modern multi-core processor with a high clock speed and a large cache. For instance, dual AMD EPYC CPUs paired with substantial RAM have been reported to perform well.
2.2 System RAM
Requirements:
Sufficient system RAM is essential for holding the model weights, the KV cache, and intermediate computations.
Recommendations:
Minimum: 16GB (enough only for the smallest distilled models)
Preferred: 32GB or more for larger models and longer context lengths.
Note: Larger models require proportionally more RAM; the full 671B model calls for hundreds of gigabytes (see the example setups in Section 7).
2.3 GPU
Role:
While not mandatory, a powerful GPU (or several) greatly accelerates inference. NVIDIA GPUs are preferred for their CUDA support and compatibility with deep learning frameworks.
VRAM Requirements:
The full 671B model at FP16 precision typically requires a multi-GPU setup with roughly 1,342 GB of combined VRAM. Distilled and quantized versions lower these requirements significantly.
VRAM Requirements by Model Variant
| Model Variant | Parameters (B) | VRAM Requirement (GB) | Recommended GPU Configuration |
| --- | --- | --- | --- |
| DeepSeek-R1 | 671 | ~1,342 | Multi-GPU setup (e.g., NVIDIA A100 80GB ×16) |
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5 | ~0.7 | NVIDIA RTX 3060 12GB or higher |
| DeepSeek-R1-Distill-Qwen-7B | 7 | ~3.3 | NVIDIA RTX 3070 8GB or higher |
| DeepSeek-R1-Distill-Llama-8B | 8 | ~3.7 | NVIDIA RTX 3070 8GB or higher |
| DeepSeek-R1-Distill-Qwen-14B | 14 | ~6.5 | NVIDIA RTX 3080 10GB or higher |
| DeepSeek-R1-Distill-Qwen-32B | 32 | ~14.9 | NVIDIA RTX 4090 24GB |
| DeepSeek-R1-Distill-Llama-70B | 70 | ~32.7 | NVIDIA RTX 4090 24GB ×2 |
For multi-GPU setups, make sure your motherboard provides enough PCIe lanes and supports bifurcation where needed, to avoid compatibility issues.
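The table's figures follow directly from parameter count and precision: the ~1,342 GB for the full model is simply 671 billion parameters at 2 bytes (FP16) each, and the distilled rows appear to assume roughly 4-bit weights. A quick estimator, covering weights only and ignoring KV cache and activation overhead:

```python
def vram_gb(params_billion, bits_per_param):
    """Rough VRAM needed for the weights alone (no KV cache/activations)."""
    return params_billion * bits_per_param / 8

print(vram_gb(671, 16))  # full model at FP16 -> 1342.0 GB
print(vram_gb(32, 4))    # 32B distill at ~4-bit -> 16.0 GB
```

Add headroom on top of these numbers for the KV cache, which grows with context length and batch size.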
2.4 Storage
Requirements:
DeepSeek-R1 models require substantial storage, from tens to hundreds of gigabytes, depending on model size and quantization.
Recommendations:
A fast NVMe SSD is essential for quickly loading the model and handling data swaps during inference.
Approximate Storage Requirements by Model Variant
| Model Variant | Parameters (B) | Approximate Storage Requirement (GB) |
| --- | --- | --- |
| DeepSeek-R1 | 671 | ~720 |
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5 | ~4 |
| DeepSeek-R1-Distill-Qwen-7B | 7 | ~14 |
| DeepSeek-R1-Distill-Llama-8B | 8 | ~16 |
| DeepSeek-R1-Distill-Qwen-14B | 14 | ~28 |
| DeepSeek-R1-Distill-Qwen-32B | 32 | ~64 |
| DeepSeek-R1-Distill-Llama-70B | 70 | ~140 |
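Before starting a download of this size, it is worth verifying that the target drive actually has room. A small helper using only Python's standard library:

```python
import shutil

def enough_disk(path, required_gb):
    """Return True if the filesystem containing `path` has at least
    `required_gb` gigabytes free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb

# e.g., the full DeepSeek-R1 download is ~720 GB:
print(enough_disk(".", 720))
```

Leave extra margin beyond the model file itself for temporary files created during download and conversion.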
3. Additional Hardware Considerations
Cooling:
Large language models generate significant heat. Ensure adequate cooling (potentially custom water-cooling loops for GPUs/CPUs) to prevent thermal throttling and maintain optimal performance.
Power Consumption:
DeepSeek-R1 can draw high power, especially with multi-GPU setups. Monitor system power and use a power supply with sufficient capacity.
CPU Offload:
For the 671B model, around 80 GB of CPU offload is recommended: system RAM holds and serves the parts of the model that do not fit in GPU VRAM.
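In practice, CPU offload means assigning as many layers as fit to VRAM and leaving the remainder in system RAM; this is the idea behind llama.cpp's `--n-gpu-layers` option. A toy split calculation, using illustrative per-layer sizes rather than DeepSeek-R1's real ones:

```python
# Sketch: split transformer layers between GPU VRAM and CPU RAM.
# `total_layers` and `layer_gb` below are illustrative assumptions,
# not DeepSeek-R1's actual values.

def split_layers(total_layers, layer_gb, vram_gb):
    """Return (layers on GPU, layers offloaded to CPU RAM)."""
    gpu_layers = min(total_layers, int(vram_gb // layer_gb))
    cpu_layers = total_layers - gpu_layers
    return gpu_layers, cpu_layers

gpu, cpu = split_layers(total_layers=61, layer_gb=2.2, vram_gb=24)
print(f"{gpu} layers on GPU, {cpu} offloaded to CPU RAM")
```

The more layers stay on the GPU, the faster inference runs; offloaded layers are bound by CPU memory bandwidth.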
4. Quantization for Lower Hardware Requirements
If hardware is limited, quantized versions (e.g., 4-bit or 8-bit) can substantially reduce VRAM and RAM requirements. However, note that quantization may impact output quality.
Example:
A 7B model quantized to 4-bit might require only 4–6 GB of VRAM.
Tools:
Tools such as llama.cpp or Hugging Face's bitsandbytes can assist with quantization.
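To see why quantization can cost quality, here is a minimal, self-contained sketch of symmetric absmax quantization to 4-bit integers. Real tools (bitsandbytes' NF4, llama.cpp's K-quants) quantize per block with more sophisticated schemes, but the round-trip below illustrates the basic trade-off: every weight is snapped to one of only 15 levels.

```python
def quantize_4bit(values):
    """Map floats to the symmetric int4 range -7..7 with a single scale."""
    scale = max(abs(v) for v in values) / 7
    return [round(v / scale) for v in values], scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.12, -0.55, 0.98, -0.33]
q, s = quantize_4bit(weights)
restored = dequantize(q, s)
print(q)                                                   # int4 codes
print([round(abs(w - r), 3) for w, r in zip(weights, restored)])  # per-weight error
```

The reconstruction error is bounded by half the scale step; larger outliers in a block force a coarser scale, which is why production quantizers work on small blocks rather than whole tensors.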
Hardware Requirements for 4-bit Quantized DeepSeek-R1
| Model Variant | System RAM | GPU VRAM |
| --- | --- | --- |
| 14B | 32 GB | 12–16 GB |
| 32B | 64 GB | 16–24 GB |
| 70B | 128 GB | 24–32 GB |
| 671B | 512 GB+ | 40+ GB |
Note: Balance the reduced hardware requirements against potential performance degradation when choosing quantization levels.
5. Distilled Models
DeepSeek offers distilled versions with fewer parameters (ranging from 1.5B to 70B) that present several advantages:
Reduced Hardware Requirements:
For example, DeepSeek-R1-Distill-Qwen-1.5B requires only ~0.7 GB of VRAM.
Efficient Yet Powerful:
Despite being smaller, distilled models maintain robust reasoning capabilities and often outperform similarly sized models from other architectures.
Cost-Effective Deployment:
They allow experimentation and deployment on lower-end hardware, reducing the need for expensive multi-GPU setups.
If you have limited resources or are beginning with DeepSeek-R1, consider using a distilled model.
6. Running DeepSeek-R1 with Docker
Docker can simplify the installation process by containerizing DeepSeek-R1. To run it with Docker:
Install Docker on your system.
Download the DeepSeek-R1 Docker image.
Run the container using the `docker run` command.
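As a sketch of step 3, the command can be assembled programmatically. The image name below (`ollama/ollama`, which can serve GGUF builds of DeepSeek-R1) and the port are assumptions; substitute whatever image and inference engine you actually use.

```python
def docker_run_cmd(image="ollama/ollama", port=11434, use_gpu=True):
    """Build a `docker run` command line for a local inference container."""
    cmd = ["docker", "run", "-d",
           "-p", f"{port}:11434",          # expose the API port
           "-v", "ollama:/root/.ollama"]   # persist downloaded models in a volume
    if use_gpu:
        cmd.insert(2, "--gpus=all")        # requires the NVIDIA Container Toolkit
    cmd.append(image)
    return " ".join(cmd)

print(docker_run_cmd())
```

Passing through GPUs with `--gpus=all` only works if the NVIDIA Container Toolkit is installed on the host; otherwise run CPU-only with `use_gpu=False`.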
Recommendation:
For Docker setups, use GGUF-format versions of the model for compatibility with llama.cpp and other inference engines.
7. Example Hardware Setups
7.1 CPU-Based Rig
CPU: AMD Ryzen 9 or Intel Core i9
RAM: 64–128GB DDR5
GPU: None
Storage: Fast NVMe SSD
Inference Engine: llama.cpp or KTransformers with Unsloth's 1.58-bit dynamic quantization
7.2 Hybrid CPU/GPU System
CPU: AMD Ryzen Threadripper or Intel Xeon
RAM: 128GB – 256GB DDR5
GPU: NVIDIA RTX 4090 or similar
Storage: Fast NVMe SSD
Inference Engine: llama.cpp or KTransformers with Unsloth's dynamic quantization (1.58-bit or higher)
7.3 High-End Multi-GPU Server
CPU: Dual Intel Xeon or AMD EPYC
RAM: 512GB – 1TB DDR5
GPU: 8× NVIDIA RTX 4090 (NVLinked in pairs if possible)
Storage: Multiple fast NVMe SSDs (RAID0 optional)
Inference Engine: vLLM or a custom implementation with tensor parallelism
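A quick feasibility check for this rig: with tensor parallelism, the weights are sharded roughly evenly across GPUs, so each of the eight 24 GB cards must hold its slice plus working overhead. The ~131 GB model size used below is Unsloth's reported 1.58-bit dynamic quant of the 671B model; treat it, and the 4 GB per-GPU overhead reserve, as assumptions.

```python
def fits_tensor_parallel(model_gb, num_gpus, vram_per_gpu_gb, reserve_gb=4):
    """True if each GPU's weight shard plus a working reserve fits in VRAM.
    reserve_gb approximates KV-cache and activation overhead (assumption)."""
    return model_gb / num_gpus + reserve_gb <= vram_per_gpu_gb

print(fits_tensor_parallel(131, 8, 24))    # ~1.58-bit quant on 8x RTX 4090
print(fits_tensor_parallel(1342, 8, 24))   # full FP16 weights do not fit
```

This is why the 8x RTX 4090 server is paired with aggressive quantization: the unquantized weights alone would need roughly 168 GB per card.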
7.4 Cost-Effective Server
CPU: Dual AMD EPYC 7702 or 7V13
RAM: 512GB – 1TB DDR4 ECC
GPU: None
Storage: 1 TB NVMe SSD
Inference Engine: llama.cpp with 4-bit quantization
7.5 Mid-Tier Rig
Platform: Apple M2 Ultra
RAM: 192GB unified memory
GPU: Apple M2 Ultra (integrated)
Storage: Built-in SSD
Inference Engine: llama.cpp with Unsloth's 2.51-bit pre-quantized model
8. Training DeepSeek-R1
Training DeepSeek-R1 requires significantly more resources than inference. Below is a summary of the minimum and recommended hardware for training:
| Component | Minimum Specification | Recommended Specification |
| --- | --- | --- |
| GPUs | NVIDIA A100, RTX 4090, or H100 (single) | 24 H100 GPUs (ep=8, pp=3) or equivalent configurations |
| RAM | 32GB | 64GB minimum, 128GB+ preferred |
| CPU | Single CPU | Multiple high-end CPUs (e.g., dual Intel Xeon or AMD EPYC) |
Note: Storage for training DeepSeek-R1 ranges from 10–20 GB for smaller models (e.g., 7B) to 200+ GB for larger models (e.g., 65B). SSDs are highly recommended for faster loading times.
9. Conclusion
DeepSeek-R1:671B is a powerful LLM suitable for a variety of applications. Despite its size, its Mixture of Experts architecture and quantization techniques make it more accessible than expected. However, installing the full model locally demands a robust system with:
A high-end multi-GPU setup with substantial VRAM
A powerful multi-core CPU
Ample system RAM and fast NVMe storage
For users with limited hardware resources, consider:
Quantized Versions: Lower VRAM and RAM requirements with a potential trade-off in output quality.
Distilled Models: More accessible deployment while still providing strong performance.
Looking ahead, advancements in hardware—such as AMD MI300X GPUs with 192GB HBM3 and more affordable 256GB DDR5 kits—may further enhance the accessibility of running DeepSeek-R1.
By carefully evaluating your use case and available resources, you can select the optimal hardware configuration to install and utilize DeepSeek-R1:671B for your applications.
Written by KS Mooi
AI enthusiast exploring the forefront of AI with a focus on deep learning, reinforcement learning, and agentic AI. Passionate about creating intelligent, adaptive models and applying retrieval-augmented generation (RAG) techniques to push the boundaries of what's possible in real-world applications.