GPU Configurations for DeepSeek R1 Distilled Models

KS Mooi
5 min read

Introduction

This article examines several GPU configurations for running the distilled DeepSeek R1 models deepseek-r1.14b, deepseek-r1.32b, and deepseek-r1.70b. It focuses on cost, performance measured in tokens per second (tps), and operational considerations such as power draw, with the goal of identifying the best hardware choice for different needs.


Key Points

  • Comparison Focus: The chart compares GPU setups for DeepSeek models, emphasizing cost and performance (tps).

  • deepseek-r1.70b Options:

    • Performance-focused: NVIDIA RTX 6000 ADA offers 19 tps at $6199.

    • Cost-effective: 2 x NVIDIA RTX 3090 offers 17 tps at $1600.

  • Electricity Costs: Although the 2 x RTX 3090 setup draws more power, its total cost over 3 years remains lower than that of the RTX 6000 ADA.


GPU Options and Performance

The provided chart outlines various GPU configurations, listing cost in USD and performance in tps for running different DeepSeek models:

  • NVIDIA RTX 3090:

    • deepseek-r1 (14b): $800 (58 tps).

    • deepseek-r1 (32b): $800 (31 tps).

    • deepseek-r1 (70b): Requires 2 units (2 x RTX 3090), totaling $1600, delivering 17 tps.

  • NVIDIA RTX 4070 Ti Super 16GB:

    • deepseek-r1 (14b): $800 (52 tps).

  • NVIDIA RTX Titan 24GB:

    • deepseek-r1 (14b): $789 (44 tps).

    • deepseek-r1 (32b): $789 (23 tps).

  • NVIDIA RTX 6000 ADA 48GB:

    • deepseek-r1 (32b): $6199 (36 tps).

    • deepseek-r1 (70b): $6199 (19 tps).

  • Apple M3 Max (128GB, 40-core GPU):

    • deepseek-r1 (32b): $4000 (23 tps).

    • deepseek-r1 (70b): $4000 (4 tps).


Cost vs. Performance Trade-offs

For deepseek-r1.70b:

  • RTX 6000 ADA:

    • Offers 19 tps at $6199.

    • Lower power consumption (300 W).

  • 2 x RTX 3090:

    • Offers 17 tps at $1600.

    • Higher power consumption (700 W total), resulting in increased electricity costs.

Despite higher electricity usage, the lower initial cost of 2 x RTX 3090 results in a lower total cost over 3 years.


Surprising Detail: M3 Max Performance

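Although the Apple M3 Max has 128GB of unified memory, its performance on deepseek-r1.70b is notably poor at just 4 tps. The chart does not explain the gap. Plausible factors are that Apple Silicon runs these models through Metal rather than CUDA, and that the M3 Max's memory bandwidth (400 GB/s) is well below that of the RTX 3090 (936 GB/s) or the RTX 6000 ADA (960 GB/s).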


Survey Note: Comprehensive Analysis of GPU Configurations for DeepSeek Models

Chart Data and Verification

The chart includes key data points organized in the table below:

| GPU Configuration | Model | Cost (USD) | Tokens per Second (tps) |
|---|---|---|---|
| Ollama RTX 3090 (24GB) | deepseek-r1 (14b) | 800 | 58 |
| Ollama RTX 3090 (24GB) | deepseek-r1 (32b) | 800 | 31 |
| 2 x RTX 3090 (48GB) | deepseek-r1 (70b) | 1600 | 17 |
| Ollama RTX 4070 Ti Super (16GB) | deepseek-r1 (14b) | 800 | 52 |
| Ollama RTX Titan (24GB) | deepseek-r1 (14b) | 789 | 44 |
| Ollama RTX Titan (24GB) | deepseek-r1 (32b) | 789 | 23 |
| Ollama RTX 6000 ADA (48GB) | deepseek-r1 (32b) | 6199 | 36 |
| Ollama RTX 6000 ADA (48GB) | deepseek-r1 (70b) | 6199 | 19 |
| Ollama M3 Max (128GB) | deepseek-r1 (32b) | 4000 | 23 |
| Ollama M3 Max (128GB) | deepseek-r1 (70b) | 4000 | 4 |

Additional insights include:

  • deepseek-r1.14b: Runs on all three GPUs tested with it (16GB to 24GB VRAM), with moderate performance differences (44 to 58 tps).

  • deepseek-r1.32b: Runs well on a single GPU with ~24GB VRAM. The RTX 3090 offers the best price/performance ratio (31 tps at $800), while the far more expensive RTX 6000 ADA is fastest (36 tps).

  • deepseek-r1.70b: 2 x RTX 3090 (17 tps) gives the best cost/performance, although its electricity costs are higher than those of the RTX 6000 ADA (19 tps).
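The price/performance claims above can be checked directly against the table. The following is a minimal sketch, assuming "price/performance" means tokens per second divided by purchase price (electricity is ignored here); the names `CHART` and `rank_by_value` are illustrative, not part of any tool.

```python
# Chart data: (configuration, model, cost_usd, tokens_per_second)
CHART = [
    ("RTX 3090 (24GB)", "14b", 800, 58),
    ("RTX 3090 (24GB)", "32b", 800, 31),
    ("2 x RTX 3090 (48GB)", "70b", 1600, 17),
    ("RTX 4070 Ti Super (16GB)", "14b", 800, 52),
    ("RTX Titan (24GB)", "14b", 789, 44),
    ("RTX Titan (24GB)", "32b", 789, 23),
    ("RTX 6000 ADA (48GB)", "32b", 6199, 36),
    ("RTX 6000 ADA (48GB)", "70b", 6199, 19),
    ("M3 Max (128GB)", "32b", 4000, 23),
    ("M3 Max (128GB)", "70b", 4000, 4),
]


def rank_by_value(model: str) -> list:
    """Return (configuration, tps per dollar) pairs for one model, best first."""
    rows = [(cfg, tps / cost) for cfg, m, cost, tps in CHART if m == model]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for model in ("14b", "32b", "70b"):
    best_cfg, best_value = rank_by_value(model)[0]
    print(f"deepseek-r1 ({model}): best value is {best_cfg} at {best_value:.4f} tps/$")
```

For every model size, the RTX 3090 (or a pair of them for 70b) comes out on top under this purchase-price-only metric.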

GPU Analysis and Suitability

  • NVIDIA RTX 3090:

    • Offers 24GB VRAM and excellent cost-effectiveness at $800.

    • Delivers high tps on the 14b and 32b models; deepseek-r1.70b does not fit in a single card's 24GB, so two units are required.

  • NVIDIA RTX 4070 Ti Super 16GB:

    • Offers 16GB VRAM and is suitable for deepseek-r1.14b.

    • The chart has no results for larger models; with only 16GB of VRAM, the 32b and 70b models would not fit entirely in GPU memory.

  • NVIDIA RTX Titan 24GB:

    • Comparable to the RTX 3090 in cost and VRAM but slower (44 vs. 58 tps on 14b, 23 vs. 31 tps on 32b). The chart has no deepseek-r1.70b result for it, so it cannot be recommended for that model.

  • NVIDIA RTX 6000 ADA 48GB:

    • A professional-grade GPU with 48GB VRAM, providing high stability and lower power consumption.

    • Delivers strong performance for deepseek-r1.70b but comes with a high upfront cost.

  • Apple M3 Max (128GB, 40-core GPU):

    • Despite its large unified memory, it reaches only 4 tps on deepseek-r1.70b, the slowest result in the chart, possibly because of its lower memory bandwidth and the lack of CUDA support.
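The VRAM requirements implied above can be approximated from parameter count. This is a rough sketch under stated assumptions that do not come from the chart: about 4.5 bits per weight for a 4-bit quantized model, plus a flat, hypothetical 2GB allowance for KV cache and runtime overhead. Real model files and context lengths will shift these numbers.

```python
# ASSUMPTIONS (not from the chart): ~4.5 bits per weight for a 4-bit
# quantization, plus a flat 2 GB for KV cache and runtime overhead.
BITS_PER_WEIGHT = 4.5
OVERHEAD_GB = 2.0


def estimated_vram_gb(params_billion: float) -> float:
    """Approximate memory needed to keep the whole model on the GPU(s)."""
    # (billions of params) * bits per weight / 8 bits per byte = gigabytes
    weights_gb = params_billion * BITS_PER_WEIGHT / 8
    return weights_gb + OVERHEAD_GB


def fits(params_billion: float, vram_gb: float) -> bool:
    """True if the estimated requirement fits within the given VRAM."""
    return estimated_vram_gb(params_billion) <= vram_gb


for size in (14, 32, 70):
    need = estimated_vram_gb(size)
    print(f"{size}b needs ~{need:.1f} GB: "
          f"16GB={fits(size, 16)}, 24GB={fits(size, 24)}, 48GB={fits(size, 48)}")
```

Under these assumptions, 14b fits on a 16GB card, 32b needs a 24GB card, and 70b needs about 48GB. Treating 2 x RTX 3090 as a 48GB pool is only an approximation of splitting layers across two GPUs, since each card also carries some overhead of its own.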

Electricity Cost Analysis

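Electricity costs significantly affect the total cost of ownership (TCO) over time. The figures below assume an electricity cost of $0.15 per kWh and 8 hours per day of operation at full GPU power for 3 years (8,760 hours in total):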

  • 2 x RTX 3090:

    • Total power consumption: ~700 W.

    • Estimated 3-year electricity cost: ~$919.80.

    • Total TCO: $1600 + ~$919.80 = ~$2519.80.

  • RTX 6000 ADA:

    • Power consumption: ~300 W.

    • Estimated 3-year electricity cost: ~$394.20.

    • Total TCO: $6199 + ~$394.20 = ~$6593.20.

The lower initial cost of the 2 x RTX 3090 makes it more attractive over a 3-year period despite its higher power draw.
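The TCO arithmetic above can be reproduced as follows. This is a minimal sketch covering GPU power only (not the rest of the system), using the stated $0.15 per kWh and 8 hours per day over 3 years. The break-even calculation at the end is an extension not given in the article: it estimates how many hours of full-load use it would take for the extra 400 W of the dual RTX 3090 setup to erase its $4,599 purchase-price advantage.

```python
PRICE_PER_KWH = 0.15        # USD per kWh, as assumed in the article
HOURS = 8 * 365 * 3         # 8 hours/day for 3 years = 8,760 hours


def electricity_cost(watts: float, hours: float = HOURS) -> float:
    """Cost in USD of running a load of `watts` for `hours`."""
    return watts / 1000 * hours * PRICE_PER_KWH


def tco(purchase_usd: float, watts: float) -> float:
    """Purchase price plus electricity over the assumed period."""
    return purchase_usd + electricity_cost(watts)


dual_3090 = tco(1600, 700)  # 1600 + 919.80
rtx_6000 = tco(6199, 300)   # 6199 + 394.20
print(f"2 x RTX 3090: ${dual_3090:,.2f}")
print(f"RTX 6000 ADA: ${rtx_6000:,.2f}")

# Extension: full-load hours before the extra 400 W cancels the price gap.
break_even_hours = (6199 - 1600) / electricity_cost(700 - 300, hours=1)
print(f"Break-even after ~{break_even_hours:,.0f} hours")
```

Under these assumptions the break-even point is roughly 76,650 hours, or about 8.75 years of continuous 24/7 use, well beyond the 3-year horizon.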

Performance Per Dollar

For deepseek-r1.70b:

  • 2 x RTX 3090: 17 tps / $1600 ≈ 0.0106 tps per dollar.

  • RTX 6000 ADA: 19 tps / $6199 ≈ 0.0031 tps per dollar.

This calculation shows that the 2 x RTX 3090 offers a better performance per dollar ratio, reinforcing its cost-effectiveness.

Limitations and Potential Issues

  • RTX 3090:

    • High power consumption (700 W for two units) may require a larger power supply and additional cooling.

  • RTX 6000 ADA:

    • The high upfront cost may not be feasible for budget-conscious users, despite its energy efficiency.

  • M3 Max:

    • Its poor performance on deepseek-r1.70b (4 tps) makes it unsuitable for the largest model. Apple Silicon does not support CUDA, so models run through Metal-based backends instead, which may be less optimized for this workload.

Recommendations and Future Considerations

  • For deepseek-r1.70b:

    • Performance Priority: Choose RTX 6000 ADA (19 tps) if performance and energy efficiency are critical.

    • Cost-Effectiveness: Opt for 2 x RTX 3090 (17 tps, $1600) for a lower total cost over 3 years.

  • For the smaller 14b and 32b models, a single 24GB GPU such as the RTX 3090 or RTX Titan is recommended, with the RTX 3090 being faster at a similar price.

  • Future GPU releases (e.g., NVIDIA's RTX 50 series or AMD's RDNA 4 series) may offer improved performance, but the current recommendations are based only on existing data.





Written by

KS Mooi

AI Enthusiast Exploring the forefront of AI with a focus on deep learning, reinforcement learning, and agentic AI. Passionate about creating intelligent, adaptive models and applying retrieval-augmented generation (RAG) techniques to push the boundaries of what's possible in real-world applications.