How to Deploy DeepSeek-R1 Locally: A Complete Hardware Guide

KS Mooi

DeepSeek-R1:671B is a large language model (LLM) developed by DeepSeek AI, boasting an impressive 671 billion parameters. It excels at a wide range of conversational and generative tasks—including complex reasoning—and supports multilingual interactions. You can interact with DeepSeek-R1 via its official chat interface at chat.deepseek.com or through its OpenAI-compatible API at platform.deepseek.com.

This guide provides a comprehensive overview of the hardware specifications required for installing DeepSeek-R1:671B locally. It covers minimum and recommended hardware configurations, the impact of quantization on performance, and practical hardware setups for different use cases.


1. Overview

DeepSeek-R1:671B uses a Mixture of Experts (MoE) architecture: although the model stores 671 billion parameters, only a fraction of them (roughly 37 billion) is activated for each token. This design reduces the compute required per token and, combined with offloading, can shrink the effective memory footprint compared to a dense model of the same size. Even so, hardware demands remain high, particularly for the full 671B-parameter version.


2. Hardware Requirements

2.1 CPU

  • Requirements:
    A powerful multi-core CPU is crucial—especially when running the model on the CPU or with limited GPU resources.

  • Recommendations:
    A modern multi-core processor with a high clock speed and a large cache. For instance, dual EPYC CPUs with substantial RAM configurations have been reported to perform well.

2.2 System RAM

  • Requirements:
    Sufficient system RAM is essential for holding the model weights, the KV cache, and intermediate activations. (A rough cache-sizing sketch follows this list.)

  • Recommendations:

    • Minimum: 16GB

    • Preferred: 32GB or more for larger models and longer context lengths.

    • Note: Larger models require proportionally more RAM.
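
A rough way to size the KV cache for a given context length is shown below. This is a generic transformer estimate: DeepSeek-R1 actually uses Multi-head Latent Attention (MLA), which compresses the cache, so treat the layer and head shapes here as illustrative assumptions only.

```python
# Rough KV-cache size for a generic transformer: K and V tensors for
# every layer, KV head, and token. The shapes below are hypothetical
# and do NOT match DeepSeek-R1's real (MLA-compressed) configuration.

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, bytes_per_el: int = 2, batch: int = 1) -> float:
    elems = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch
    return elems * bytes_per_el / 1e9

# Hypothetical 60-layer model, 8 KV heads of dim 128, 32k context,
# FP16 cache: roughly 8 GB on top of the weights.
print(f"{kv_cache_gb(60, 8, 128, 32_768):.1f} GB")
```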

2.3 GPU

  • Role:
    While not mandatory, a powerful GPU (or multiple GPUs) can greatly accelerate inference. NVIDIA GPUs are preferred due to their CUDA support and compatibility with deep learning frameworks.

  • VRAM Requirements:
    The full 671B model typically requires a multi-GPU setup with roughly 1,342 GB of combined VRAM (671 billion FP16 weights at 2 bytes each). Distilled and quantized versions lower these requirements dramatically; a quick sanity check follows the table.

VRAM Requirements by Model Variant

Model Variant                  | Parameters (B) | VRAM Requirement (GB) | Recommended GPU Configuration
DeepSeek-R1                    | 671            | ~1,342                | Multi-GPU setup (e.g., 16× NVIDIA A100 80GB)
DeepSeek-R1-Distill-Qwen-1.5B  | 1.5            | ~0.7                  | NVIDIA RTX 3060 12GB or higher
DeepSeek-R1-Distill-Qwen-7B    | 7              | ~3.3                  | NVIDIA RTX 3070 8GB or higher
DeepSeek-R1-Distill-Llama-8B   | 8              | ~3.7                  | NVIDIA RTX 3070 8GB or higher
DeepSeek-R1-Distill-Qwen-14B   | 14             | ~6.5                  | NVIDIA RTX 3080 10GB or higher
DeepSeek-R1-Distill-Qwen-32B   | 32             | ~14.9                 | NVIDIA RTX 4090 24GB
DeepSeek-R1-Distill-Llama-70B  | 70             | ~32.7                 | 2× NVIDIA RTX 4090 24GB

For multi-GPU setups, ensure your motherboard supports PCIe bifurcation and provides enough lanes and physical slots to avoid compatibility issues.
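
The headline numbers in the table are easy to sanity-check: weight memory is just the parameter count times bytes per parameter. The ~1,342 GB figure corresponds to FP16 weights (2 bytes each), while the distilled rows line up with roughly 4-bit weights. A minimal sketch:

```python
# Back-of-the-envelope VRAM estimate: weights only, ignoring the KV
# cache and activation overhead.

def weight_vram_gb(params_billion: float, bits_per_param: float) -> float:
    """Weights occupy params * bits / 8 bytes; with params expressed
    in billions this comes out directly in GB."""
    return params_billion * bits_per_param / 8

print(f"671B @ FP16 : ~{weight_vram_gb(671, 16):,.0f} GB")  # ~1,342 GB
print(f"32B  @ 4-bit: ~{weight_vram_gb(32, 4):.0f} GB")     # ~16 GB
```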

2.4 Storage

  • Requirements:
    DeepSeek-R1 models require substantial storage—from tens to hundreds of gigabytes—depending on model size and quantization.

  • Recommendations:
    A fast NVMe SSD is essential for loading the model quickly and for handling data swaps during inference. (A simple free-space pre-flight check is sketched after the table.)

Approximate Storage Requirements by Model Variant

Model Variant                  | Parameters (B) | Approximate Storage (GB)
DeepSeek-R1                    | 671            | ~720
DeepSeek-R1-Distill-Qwen-1.5B  | 1.5            | ~4
DeepSeek-R1-Distill-Qwen-7B    | 7              | ~14
DeepSeek-R1-Distill-Llama-8B   | 8              | ~16
DeepSeek-R1-Distill-Qwen-14B   | 14             | ~28
DeepSeek-R1-Distill-Qwen-32B   | 32             | ~64
DeepSeek-R1-Distill-Llama-70B  | 70             | ~140
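
Before downloading a checkpoint, it is worth verifying that the target drive actually has room for it, plus headroom for temporary files. A small pre-flight check; the mount point below is a hypothetical placeholder:

```python
# Pre-flight check before pulling a large checkpoint: confirm the
# target drive has enough free space. "/models" is a hypothetical
# mount point; set REQUIRED_GB from the table above.

import shutil

REQUIRED_GB = 140  # e.g., DeepSeek-R1-Distill-Llama-70B at FP16
free_gb = shutil.disk_usage("/models").free / 1e9
if free_gb < REQUIRED_GB * 1.2:  # keep ~20% headroom for temp files
    raise SystemExit(f"Need ~{REQUIRED_GB} GB, only {free_gb:.0f} GB free")
print(f"OK: {free_gb:.0f} GB free")
```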

3. Additional Hardware Considerations

  • Cooling:
    Large language models generate significant heat. Ensure adequate cooling (potentially custom water cooling loops for GPUs/CPUs) to prevent thermal throttling and maintain optimal performance.

  • Power Consumption:
    DeepSeek-R1 can draw high power—especially with multi-GPU setups. Monitor system power and use a power supply with sufficient capacity.

  • CPU Offload:
    For the 671B model, offloading roughly 80 GB of the model to CPU memory is recommended: layers that do not fit in GPU VRAM stay in system RAM and run on the CPU. A minimal sketch follows this list.
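
One concrete way to realize this split is llama.cpp's layer offload, exposed in Python via llama-cpp-python: only `n_gpu_layers` layers live in VRAM, and the rest stay in system RAM. A minimal sketch, assuming a locally downloaded GGUF file (the path and layer count are placeholders):

```python
# Partial offload with llama.cpp via llama-cpp-python
# (`pip install llama-cpp-python`). Only `n_gpu_layers` layers are
# placed in VRAM; the remainder stays in system RAM and runs on the
# CPU. The GGUF path and layer count are placeholders.

from llama_cpp import Llama

llm = Llama(
    model_path="/models/deepseek-r1.gguf",  # hypothetical local GGUF file
    n_gpu_layers=20,   # layers resident in VRAM; the rest are CPU-offloaded
    n_ctx=8192,        # context window; larger values grow the KV cache
    n_threads=32,      # CPU threads for the offloaded layers
)
out = llm("Explain mixture-of-experts in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```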


4. Quantization for Lower Hardware Requirements

If hardware is limited, quantized versions (e.g., 4-bit or 8-bit) can substantially reduce VRAM and RAM requirements. However, note that quantization may impact output quality.

  • Example:
    A 7B model quantized to 4-bit might only require 4–6 GB VRAM.

  • Tools:
    Tools like llama.cpp or Hugging Face's bitsandbytes can assist with quantization.
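
As a concrete example, here is a minimal sketch of loading a distilled variant in 4-bit with Hugging Face's bitsandbytes integration; it assumes a CUDA GPU and `pip install transformers accelerate bitsandbytes`:

```python
# Minimal sketch: load a distilled DeepSeek-R1 variant in 4-bit using
# the bitsandbytes integration in transformers.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for quality
)
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)
inputs = tok("What is 17 * 23?", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```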

Hardware Requirements for 4-bit Quantized DeepSeek-R1

Model Variant | System RAM | GPU VRAM
14B           | 32 GB      | 12–16 GB
32B           | 64 GB      | 16–24 GB
70B           | 128 GB     | 24–32 GB
671B          | 512 GB+    | 40+ GB

Note: Balance the reduced hardware requirements against potential performance degradation when choosing quantization levels.


5. Distilled Models

DeepSeek offers distilled versions with fewer parameters (ranging from 1.5B to 70B) that present several advantages:

  • Reduced Hardware Requirements:
    For example, DeepSeek-R1-Distill-Qwen-1.5B requires only ~0.7 GB of VRAM.

  • Efficient Yet Powerful:
    Despite being smaller, distilled models maintain robust reasoning capabilities and often outperform similarly sized models from other architectures.

  • Cost-Effective Deployment:
    They allow experimentation and deployment on lower-end hardware, reducing the need for expensive multi-GPU setups.

If you have limited resources or are beginning with DeepSeek-R1, consider using a distilled model.
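
Trying a distilled variant takes only a few lines once Ollama is installed; the sketch below assumes the `ollama` Python client and that you have pulled the `deepseek-r1:1.5b` tag beforehand:

```python
# Chatting with the 1.5B distill through the Ollama Python client
# (`pip install ollama`; run `ollama pull deepseek-r1:1.5b` first).

import ollama

resp = ollama.chat(
    model="deepseek-r1:1.5b",  # ~0.7 GB of VRAM per the table above
    messages=[{"role": "user",
               "content": "Summarize model distillation in two sentences."}],
)
print(resp["message"]["content"])
```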


6. Running DeepSeek-R1 with Docker

Docker can simplify the installation process by containerizing DeepSeek-R1. To run it with Docker:

  1. Install Docker on your system.

  2. Download the DeepSeek-R1 Docker image.

  3. Run the container using the docker run command.

Recommendation:
For Docker setups, use the GGUF (GPT-Generated Unified Format) versions of the model for compatibility with llama.cpp and other inference engines.
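
For illustration, here is a hedged sketch using the Docker SDK for Python. The image tag and file paths are assumptions: llama.cpp publishes a server image, and the GGUF filename is a placeholder for whichever variant you downloaded.

```python
# Launching a containerized GGUF server with the Docker SDK for Python
# (`pip install docker`). The image tag and paths are illustrative.

import docker

client = docker.from_env()
container = client.containers.run(
    "ghcr.io/ggerganov/llama.cpp:server",       # llama.cpp HTTP server image
    command=["-m", "/models/deepseek-r1.gguf",  # placeholder GGUF path
             "--host", "0.0.0.0", "--port", "8080"],
    volumes={"/path/to/models": {"bind": "/models", "mode": "ro"}},
    ports={"8080/tcp": 8080},
    detach=True,
)
print(container.logs(tail=10).decode())  # inspect startup output
```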


7. Example Hardware Setups

7.1 CPU-Based Rig

  • CPU: AMD Ryzen 9 or Intel Core i9

  • RAM: 64–128GB DDR5

  • GPU: None

  • Storage: Fast NVMe SSD

  • Inference Engine: llama.cpp or KTransformers with Unsloth's 1.58-bit dynamic quantization

7.2 Hybrid CPU/GPU System

  • CPU: AMD Ryzen Threadripper or Intel Xeon

  • RAM: 128GB – 256GB DDR5

  • GPU: NVIDIA RTX 4090 or similar

  • Storage: Fast NVMe SSD

  • Inference Engine: llama.cpp or KTransformers with Unsloth's dynamic quantization (1.58-bit or higher)

7.3 High-End Multi-GPU Server

  • CPU: Dual Intel Xeon or AMD EPYC

  • RAM: 512GB – 1TB DDR5

  • GPU: 8× NVIDIA RTX 4090 (connected over PCIe; the RTX 4090 does not support NVLink)

  • Storage: Multiple fast NVMe SSDs (RAID0 optional)

  • Inference Engine: vLLM or a custom implementation with tensor parallelism
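
On a server like this, a vLLM launch with tensor parallelism might look like the sketch below. The distilled 70B model is used because the full 671B model far exceeds 8× 24 GB of VRAM; even the 70B fit is tight, so the context length is capped:

```python
# Sketch: serving a distilled model with vLLM tensor parallelism
# across 8 GPUs (`pip install vllm`).

from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
    tensor_parallel_size=8,  # shard the weights across all 8 GPUs
    dtype="bfloat16",
    max_model_len=4096,      # cap context to keep the KV cache within VRAM
)
params = SamplingParams(temperature=0.6, max_tokens=128)
outputs = llm.generate(["Why does tensor parallelism cut per-GPU VRAM?"], params)
print(outputs[0].outputs[0].text)
```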

7.4 Cost-Effective Server

  • CPU: Dual AMD EPYC 7702 or 7V13

  • RAM: 512GB – 1TB DDR4 ECC

  • GPU: None

  • Storage: 1 TB NVMe SSD

  • Inference Engine: llama.cpp with 4-bit quantization

7.5 Mid-Tier Rig

  • Platform: Apple M2 Ultra

  • RAM: 192GB unified memory

  • GPU: Apple M2 Ultra (integrated)

  • Storage: Built-in SSD

  • Inference Engine: llama.cpp with Unsloth's 2.51-bit pre-quantized model


8. Training DeepSeek-R1

Training DeepSeek-R1 requires significantly more resources than inference. Below is a summary of the minimum and recommended hardware for training:

Component | Minimum Specification                 | Recommended Specification
GPUs      | Single NVIDIA A100, RTX 4090, or H100 | 24× H100 (expert parallelism = 8, pipeline parallelism = 3) or equivalent configurations
RAM       | 32GB                                  | 64GB minimum; 128GB+ preferred
CPU       | Single CPU                            | Multiple high-end CPUs (e.g., dual Intel Xeon or AMD EPYC)

Note: Storage for training DeepSeek-R1 ranges from 10–20 GB for smaller models (e.g., 7B) to 200+ GB for larger ones (e.g., 70B). SSDs are highly recommended for faster loading times.
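
For GPU memory during training, a common rule of thumb for mixed-precision Adam is about 16 bytes per parameter (2 for FP16 weights, 2 for gradients, and 12 for FP32 master weights plus two optimizer moments), before activations. This is a generic heuristic, not DeepSeek's actual training recipe:

```python
# Rule-of-thumb GPU memory for mixed-precision Adam: ~16 bytes per
# parameter (2 FP16 weights + 2 gradients + 12 optimizer state),
# before activations. A generic heuristic, not DeepSeek's recipe.

def training_state_gb(params_billion: float, bytes_per_param: float = 16.0) -> float:
    return params_billion * bytes_per_param

for size in (7, 70):
    print(f"{size}B: ~{training_state_gb(size):,.0f} GB of training state")
# 7B : ~112 GB  -> already a multi-GPU job without sharding or offload
# 70B: ~1,120 GB
```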


9. Conclusion

DeepSeek-R1:671B is a powerful LLM suitable for a variety of applications. Despite its size, its Mixture of Experts architecture and quantization techniques make it more accessible than expected. However, installing the full model locally demands a robust system with:

  • A high-end multi-GPU setup with substantial VRAM

  • A powerful multi-core CPU

  • Ample system RAM and fast NVMe storage

For users with limited hardware resources, consider:

  • Quantized Versions: Lower VRAM and RAM requirements with a potential trade-off in output quality.

  • Distilled Models: More accessible deployment while still providing strong performance.

Looking ahead, advancements in hardware—such as AMD MI300X GPUs with 192GB HBM3 and more affordable 256GB DDR5 kits—may further enhance the accessibility of running DeepSeek-R1.

By carefully evaluating your use case and available resources, you can select the optimal hardware configuration to install and utilize DeepSeek-R1:671B for your applications.


