How to Deploy DeepSeek-R1-0528-Qwen3-8B on Novita AI GPU Instances


What if an 8B-parameter model could match a model nearly 30 times its size?
DeepSeek-R1-0528-Qwen3-8B delivers strong reasoning performance, matching the much larger Qwen3-235B-thinking on the AIME 2024 math benchmark while running efficiently on a single RTX 4090.
This guide shows you how to deploy the model on Novita AI in minutes.
What is DeepSeek-R1-0528-Qwen3-8B?
DeepSeek-R1-0528-Qwen3-8B is a reasoning model created by distilling the chain-of-thought of DeepSeek-R1-0528 into the Qwen3 8B Base model. The result is an open-source model that achieves state-of-the-art results on mathematical and reasoning benchmarks, including AIME 2024, where it surpasses Qwen3 8B by 10.0 percentage points and matches the performance of the much larger Qwen3-235B-thinking model.
The model demonstrates exceptional capabilities across various evaluation metrics, scoring 86.0 on AIME 24, 76.3 on AIME 25, and 61.5 on HMMT Feb 25. What makes this model particularly valuable is its ability to deliver reasoning performance comparable to much larger models while maintaining the efficiency and deployability of an 8B parameter model.
Why Run DeepSeek-R1-0528-Qwen3-8B on Novita AI GPU Instances?
1. Significant Price Advantage and Flexible Pricing Models
Novita AI offers competitive pricing for GPU computing, making advanced AI models like DeepSeek-R1-0528-Qwen3-8B accessible to researchers, businesses, and developers at any scale.
Choose between On-Demand and Subscription pricing based on your usage patterns. For DeepSeek-R1-0528-Qwen3-8B running on an RTX 4090:
On-Demand: $0.35/hour — Suitable for testing and variable workloads
1–5 months: $226.80/month (10% OFF) — Medium-term projects
6–11 months: $206.64/month (18% OFF) — Extended development cycles
12 months: $189.00/month (25% OFF) — Greater savings for long-term commitments
The annual subscription can save you hundreds of dollars while ensuring guaranteed resource availability. Learn more about pricing models.
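As a rough check on these numbers, the sketch below reproduces the subscription prices from the on-demand rate. It assumes a 30-day month of continuous use (720 hours), which is an inference from the listed figures rather than something the pricing page states here; the function names are illustrative.

```python
# Assumption: subscription prices are the on-demand rate for a 30-day month
# (720 hours) minus the quoted discount. This matches the listed figures.
HOURS_PER_MONTH = 24 * 30
ON_DEMAND_HOURLY = 0.35  # USD per hour for one RTX 4090, from the list above

# Discount tiers quoted above: commitment length -> fractional discount.
DISCOUNTS = {"1-5 months": 0.10, "6-11 months": 0.18, "12 months": 0.25}


def monthly_price(discount: float) -> float:
    """Monthly subscription price in USD after a fractional discount."""
    baseline = ON_DEMAND_HOURLY * HOURS_PER_MONTH
    return round(baseline * (1 - discount), 2)


def break_even_hours(discount: float) -> float:
    """On-demand hours per month that cost as much as the subscription."""
    return round(monthly_price(discount) / ON_DEMAND_HOURLY, 1)


if __name__ == "__main__":
    for label, discount in DISCOUNTS.items():
        print(f"{label}: ${monthly_price(discount):.2f}/month, "
              f"break-even at {break_even_hours(discount)} on-demand hours")
```

Under this assumption, a subscription only pays off if the instance runs for more than the break-even number of hours each month; below that, on-demand is cheaper.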
2. Multiple GPU Choices for Performance Optimization
Novita AI provides comprehensive GPU options to match your computational needs and budget:
RTX 3090 24GB: Cost-effective for development and testing
RTX 4090 24GB: Recommended for DeepSeek-R1-0528-Qwen3-8B (balanced performance and cost)
RTX 6000 Ada 48GB: Enhanced VRAM for larger context lengths
L40S 48GB: Professional-grade performance with extended memory capacity
A100 SXM 80GB: High-performance computing with substantial memory bandwidth
H100 SXM 80GB: Enterprise-grade performance for production deployments
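To see why a 24GB card is a reasonable fit, here is a back-of-the-envelope estimate of the memory taken by the model weights alone. It assumes a nominal 8 billion parameters stored in 16-bit precision (2 bytes each) and treats the advertised VRAM as GiB. It ignores the KV cache, activations, and runtime overhead, so treat it as a rough lower bound rather than a sizing guarantee.

```python
GIB = 1024 ** 3


def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


def headroom_gib(vram_gib: float, num_params: float, bytes_per_param: int = 2) -> float:
    """VRAM left after loading the weights (for KV cache, activations, overhead)."""
    return vram_gib - weight_memory_gib(num_params, bytes_per_param)


if __name__ == "__main__":
    params = 8e9  # assumption: nominal parameter count of an "8B" model
    print(f"weights: {weight_memory_gib(params):.1f} GiB")
    for name, vram in [("RTX 4090 24GB", 24), ("L40S 48GB", 48)]:
        print(f"{name}: about {headroom_gib(vram, params):.1f} GiB of headroom")
```

The headroom figure is what remains for context: the 48GB and 80GB cards leave more room for longer contexts or more concurrent requests.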
3. Ready-to-Use Templates and Custom Flexibility
Pre-configured templates for popular models like DeepSeek-R1-0528-Qwen3-8B eliminate manual setup complexity, including optimized container configurations, environment variables, and tested deployment parameters. Advanced users can create fully custom templates with specialized configurations and personalized deployment scripts, ensuring both ease of use for beginners and full customization for experienced developers.
4. Global Deployment Network
Deploy GPU instances closer to your users through Novita AI's worldwide network, with 15 regions across the Americas (US, Canada, Brazil), Asia-Pacific (Japan, Singapore, India, UAE, Hong Kong), and Europe (Germany, UK). This global infrastructure reduces latency and provides dependable access to your DeepSeek-R1-0528-Qwen3-8B deployment regardless of user location.
How to Deploy DeepSeek-R1-0528-Qwen3-8B on Novita AI
Step 1: Template Selection
Select the DeepSeek-R1-0528-Qwen3-8B template from the model library. Choose one RTX 4090 as your GPU type and click Deploy.
Step 2: Parameter Confirmation
Review the deployment parameters displayed on the configuration screen. Verify all settings are correct and click Next to proceed.
Step 3: Instance Deployment
Click Deploy to initiate the instance creation process. The system will begin provisioning your GPU instance.
Step 4: Monitor Deployment Progress
Navigate to Instance Management to open the control console. This dashboard lets you track the deployment status in real time.
Step 5: View Image Pulling Status
Click on your specific instance to monitor the container image download progress. This process may take several minutes depending on network conditions.
Step 6: Track Model Download
After the instance starts, it will begin pulling the model. Click "Logs" -> "Instance Logs" to monitor the model download progress.
Step 7: Verify Successful Deployment
Look for the message "Application startup complete." in the instance logs. This indicates that the deployment process has finished successfully.
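If you would rather check readiness from a script than watch the logs, a polling loop along these lines may help. It is a minimal sketch that assumes the service exposes the OpenAI-style `/v1/models` route on the address you obtain in Step 8. That route is common for OpenAI-compatible servers but is not confirmed by this guide, and the function names are illustrative.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, timeout, ...


def wait_until_ready(
    base_url: str,
    get_status: Callable[[str], int] = http_status,
    attempts: int = 30,
    delay_s: float = 10.0,
) -> bool:
    """Poll the (assumed) /v1/models route until it returns 200 or attempts run out."""
    url = base_url.rstrip("/") + "/v1/models"
    for attempt in range(attempts):
        if get_status(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

Call `wait_until_ready("http://<your-exposed-address>")` with your own address; it returns `True` once the endpoint responds with HTTP 200.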
Step 8: Obtain Access URL
Click "Connect", then click "Connect to HTTP Service [Port 8000]". Since this is an API service rather than a web page, copy the address that opens instead of browsing to it.
Step 9: Access Your Deployed Model
To make requests to your model, replace "http://7a65a32b51e37482-8000.jp-tyo-1.gpu-instance.novita.ai" in the command below with the address you copied in Step 8, then run it to access your private model.
# Call the server using curl:
curl -X POST "http://7a65a32b51e37482-8000.jp-tyo-1.gpu-instance.novita.ai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
# Example response:
{"id":"chatcmpl-56d12c91edbb46fcb93ccbbc0ecddd2c","object":"chat.completion","created":1748588145,"model":"deepseek-ai/DeepSeek-R1-0528-Qwen3-8B","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"<think>\nOkay, the user is asking for the capital of France. Let me start by recalling the basic answer. Paris is definitely the correct response, so I'll start with that.\n\nBut why is the user asking this? They might be a student preparing for a test, or maybe someone traveling who needs to know the main city for planning. Alternatively, they could be testing my knowledge. But since it's a straightforward question, I'll focus on providing accurate information.\n\nWait, maybe they need more context. Should I mention some points of interest to add value? Like Eiffel Tower or Louvre. That could help if they're interested in tourism or education. \n\nI should check if there's any recent political changes or administrative updates but nothing seems off with Paris' status as a capital. Alright, keep it simple but informative. Let me structure the answer first, then decide on the optional details. \n\nAlso, considering the user might not want a long answer. But including key landmarks might make it more engaging. They didn't ask for historical info, so maybe just stick to the status and one or two unique facts. \n\nYes, \"city of love\" is a common nickname, so that adds a nice touch. Alright, final answer will confirm Paris, mention the nicknames, and list two landmarks to cover possible interests without being overwhelming.\n</think>\nThe capital of France is **Paris**.","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":10,"total_tokens":294,"completion_tokens":284,"prompt_tokens_details":null},"prompt_logprobs":null}
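Notice that in the sample response `reasoning_content` is null and the model's reasoning arrives inside the `content` field, wrapped in `<think>...</think>` tags. If your application should show only the final answer, you can separate the two on the client side. The helper below is a minimal sketch based on that sample response; its name is illustrative, and other server configurations may return the reasoning differently.

```python
import re

# Matches the first <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(content: str) -> tuple[str, str]:
    """Split a message into (reasoning, answer).

    If no <think> block is present, the reasoning is an empty string and
    the whole message is returned as the answer.
    """
    match = THINK_BLOCK.search(content)
    if match is None:
        return "", content.strip()
    reasoning = match.group(1).strip()
    answer = content[match.end():].strip()
    return reasoning, answer


if __name__ == "__main__":
    sample = "<think>\nParis is the capital.\n</think>\nThe capital of France is **Paris**."
    reasoning, answer = split_reasoning(sample)
    print(answer)
```

Applied to the `content` string of the response above, the second element of the returned tuple is just the sentence that follows `</think>`.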
Configure the API address in your applications like Chatbox, and you’ll have your own personal assistant!
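If you are calling the endpoint from your own code rather than a chat client, the same request as the curl command can be sent from Python using only the standard library. This is a minimal sketch: the base URL is a placeholder for your own exposed address, and the function names are illustrative rather than part of any Novita AI SDK.

```python
import json
import urllib.request

MODEL = "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"


def build_chat_request(base_url: str, prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the assistant message content."""
    request = build_chat_request(base_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Placeholder: replace with the address you copied in Step 8.
    print(ask("http://<your-exposed-address>", "What is the capital of France?"))
```

The generous timeout is deliberate: a reasoning model can produce a long chain of thought before its final answer, so responses may take a while.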
Novita AI is an AI cloud platform that gives developers an easy way to deploy AI models through a simple API, along with an affordable and reliable GPU cloud for building and scaling.
