DeepSeek V3: Advancing Open-Source Code Models, Now Available on Novita AI


The world of open-source innovation is taking a major leap forward with the release of DeepSeek V3, an advanced large language model with standout code generation capabilities that promises to redefine how developers approach programming tasks. Now available on Novita AI, this groundbreaking model is set to empower developers, researchers, and tech enthusiasts alike.
What is DeepSeek V3?
DeepSeek V3 is a state-of-the-art Mixture-of-Experts (MoE) large language model with 671 billion total parameters, of which only 37 billion are activated per token during inference (roughly 5.5% of the network). Released as an open-source model, it enables developers to tackle complex challenges in coding, reasoning, mathematics, and text generation.
DeepSeek V3 stands out for its efficient architecture and cost-effective training. Its full training run required only 2.788 million H800 GPU hours, or approximately $5.6 million at an assumed rental price of $2 per GPU hour, far less than the resources reported for comparable closed-source models like GPT-4. By incorporating innovative techniques like Multi-Head Latent Attention (MLA) and Multi-Token Prediction (MTP), DeepSeek V3 delivers exceptional performance while remaining scalable and accessible.
Key Features of DeepSeek V3
1. Mixture-of-Experts (MoE) Architecture
DeepSeek V3 employs an MoE framework with fine-grained experts and a dynamic load-balancing strategy. Unlike traditional MoE models, it avoids the auxiliary balancing loss: a per-expert bias term is adjusted during training to steer routing, ensuring computational resources are evenly distributed without the performance degradation that auxiliary losses can introduce.
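To make the idea concrete, here is a minimal, illustrative sketch of bias-based routing in NumPy. The function names, the sign-based update rule, and the gamma step size are assumptions chosen for clarity, not DeepSeek's implementation:

import numpy as np

def route_tokens(scores, expert_bias, k=2):
    """Pick top-k experts per token; the bias steers selection only."""
    biased = scores + expert_bias              # (num_tokens, num_experts)
    return np.argsort(-biased, axis=1)[:, :k]  # indices of chosen experts

def update_bias(expert_bias, chosen, num_experts, gamma=0.001):
    """After each step, nudge biases toward a balanced expert load."""
    load = np.bincount(chosen.ravel(), minlength=num_experts)
    target = chosen.size / num_experts         # ideal tokens per expert
    # lower the bias of overloaded experts, raise it for underloaded ones
    return expert_bias - gamma * np.sign(load - target)

# Example: 8 tokens routed across 4 experts, bias updated once.
scores = np.random.randn(8, 4)
bias = update_bias(np.zeros(4), route_tokens(scores, np.zeros(4)), 4)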
2. Multi-Head Latent Attention (MLA)
MLA enhances inference efficiency by compressing attention keys and values, reducing memory overhead while maintaining high attention quality. This enables DeepSeek V3 to handle long context windows of up to 128K tokens, making it ideal for tasks involving extended text input.
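The sketch below illustrates the compression idea in plain NumPy: per token, only a small latent vector is cached, and keys and values are expanded from it on the fly. The dimensions and weight names are illustrative assumptions, not DeepSeek V3's actual configuration:

import numpy as np

d_model, d_latent, n_heads, d_head = 4096, 512, 32, 128

W_down = np.random.randn(d_model, d_latent) * 0.02           # compress hidden state
W_up_k = np.random.randn(d_latent, n_heads * d_head) * 0.02  # expand latent to keys
W_up_v = np.random.randn(d_latent, n_heads * d_head) * 0.02  # expand latent to values

h = np.random.randn(1, d_model)   # hidden state for one token
latent = h @ W_down               # (1, d_latent): this is all that gets cached
k = (latent @ W_up_k).reshape(n_heads, d_head)
v = (latent @ W_up_v).reshape(n_heads, d_head)

# Cache cost per token: d_latent floats instead of 2 * n_heads * d_head.
print(d_latent, "vs", 2 * n_heads * d_head)  # 512 vs 8192: a 16x reduction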
3. Multi-Token Prediction (MTP)
The MTP objective allows DeepSeek V3 to predict multiple tokens simultaneously, improving both training efficiency and inference speed. This feature is particularly useful for generating long-form content or solving complex problems.
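A rough sketch of what a multi-token objective can look like during training is shown below; the head shapes, the loss weighting lam, and all names are illustrative assumptions rather than DeepSeek's actual module:

import numpy as np

def cross_entropy(logits, targets):
    """Mean cross-entropy of logits (seq, vocab) against integer targets."""
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean()

def mtp_loss(hidden, W_next, W_next2, tokens, lam=0.3):
    """hidden: (seq, d_model). Combine next-token and next-next-token losses."""
    loss1 = cross_entropy(hidden[:-1] @ W_next, tokens[1:])   # predict t+1
    loss2 = cross_entropy(hidden[:-2] @ W_next2, tokens[2:])  # predict t+2
    return loss1 + lam * loss2  # lam weights the auxiliary prediction depth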
4. FP8 Mixed Precision Training
DeepSeek V3 uses FP8 (8-bit floating-point) precision for training, reducing memory and computational costs while maintaining numerical stability. This innovation enables the model to scale efficiently without requiring a large hardware footprint.
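A quick back-of-the-envelope calculation shows why the precision choice matters at this scale (weights only, ignoring activations, gradients, and optimizer state):

# Rough memory footprint of the weights alone at different precisions.
params = 671e9                # total parameters in DeepSeek V3
bf16_gb = params * 2 / 1e9    # BF16: 2 bytes per parameter
fp8_gb = params * 1 / 1e9     # FP8: 1 byte per parameter
print(f"BF16: ~{bf16_gb:.0f} GB, FP8: ~{fp8_gb:.0f} GB")  # ~1342 GB vs ~671 GB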
5. English and Chinese Language Support
DeepSeek V3 is optimized for English and Chinese, making it an excellent choice for applications targeting these two widely used languages. While it doesn't cover a broad range of languages, it excels in these two.
Benchmark Performance of DeepSeek V3
DeepSeek V3 consistently outperforms many open-source and even closed-source models across a variety of benchmarks. Below is a comparison of its performance:
Coding Excellence:
- DeepSeek V3 achieves 82.6% on HumanEval-Mul (Pass@1) and a 51.6 percentile rating on Codeforces, making it a robust solution for code generation and evaluation tasks.
- On LiveCodeBench (Pass@1-COT) it scores 40.5%, significantly outpacing its predecessors and performing competitively against other leading models.
Mathematical Reasoning:
- DeepSeek V3 leads with 90.2% on MATH-500 and shows strong performance on CNMO 2024 (Pass@1) and AIME 2024, demonstrating its ability to handle advanced mathematical problem-solving.
Multilingual Capabilities:
- With 90.9% on CLUEWSC and 86.5% on C-Eval, DeepSeek V3 solidifies itself as an excellent model for Chinese-specific tasks, while retaining robust performance in English benchmarks like MMLU (88.5%).
General Knowledge and Reasoning:
- It achieves 91.6% on DROP (3-shot F1) and 89.1% on MMLU-Redux, showcasing its usefulness in knowledge-intensive applications.
Deployment Options for DeepSeek V3
DeepSeek V3 offers flexibility in deployment, allowing users to integrate it seamlessly into their workflows. Whether you prefer running it locally or on the cloud, the model supports a variety of hardware and open-source community software tools. Here are the recommended options for deployment:
- DeepSeek-Infer Demo: a simple, lightweight demo for FP8 and BF16 inference, providing an easy way to test the model.
- SGLang: fully supports DeepSeek-V3 in both BF16 and FP8 inference modes, with Multi-Token Prediction support coming soon.
- LMDeploy: enables efficient FP8 and BF16 inference for both local and cloud deployment.
- TensorRT-LLM: currently supports BF16 inference and INT4/INT8 quantization, with FP8 support planned for future updates.
- vLLM: supports DeepSeek V3 in FP8 and BF16 modes, enabling tensor parallelism and pipeline parallelism for efficient scaling (see the example after this list).
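As an example of the local route, once an engine such as vLLM is serving the model through its OpenAI-compatible endpoint (port 8000 and path /v1 by default), you can query it with the standard OpenAI client. The following is a minimal sketch, assuming the server was launched with the official weights (for instance, vllm serve deepseek-ai/DeepSeek-V3 --tensor-parallel-size 8); adjust the model name and parallelism to your setup:

from openai import OpenAI

# Minimal sketch: querying a locally served DeepSeek V3 through vLLM's
# OpenAI-compatible API. The endpoint and model name below are assumptions
# based on vLLM defaults and the official Hugging Face repository.
client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's default serving endpoint
    api_key="EMPTY",                      # vLLM does not require a real key
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
    max_tokens=512,
)
print(response.choices[0].message.content)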
Access the DeepSeek V3 API via Novita AI
Novita AI’s platform simplifies the deployment of DeepSeek V3 by providing pre-configured APIs and affordable GPU cloud infrastructure. Developers can integrate the model seamlessly into their applications without worrying about hardware setup or scalability.
To get started with DeepSeek V3 on Novita AI, follow these steps:
Step 1: Go to Novita AI and log in with your Google account, GitHub account, or email address.
Step 2: Try the DeepSeek V3 Demo.
Step 3: Monitor the LLM Metrics Console of the model on Novita AI.
Step 4: Get your API Key:
- Navigate to “Key Management” in the settings.
- A default key is created upon your first login.
- Generate additional keys by clicking “+ Add New Key”.
Step 5: Set up your development environment and configure request options such as the message content, role, name, and prompt.
API Integration
Novita AI provides client examples for curl, Python, and JavaScript, making it easy to integrate DeepSeek V3 into your projects:
For Python users:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR Novita AI API Key>",
)

model = "deepseek/deepseek_v3"
stream = True  # or False
max_tokens = 8192
system_content = """Be a helpful assistant"""
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = { "type": "text" }

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system_content,
        },
        {
            "role": "user",
            "content": "Hi there!",
        },
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    extra_body={
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "min_p": min_p,
    },
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
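One design note on the example above: top_k, min_p, and repetition_penalty are not part of the OpenAI SDK's standard parameter set, so they are passed through extra_body, which the SDK forwards verbatim in the request payload to Novita AI's endpoint.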
For JavaScript users:
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://api.novita.ai/v3/openai",
  apiKey: "<YOUR Novita AI API Key>",
});

const stream = true; // or false

async function run() {
  const completion = await openai.chat.completions.create({
    messages: [
      {
        role: "system",
        content: "Be a helpful assistant",
      },
      {
        role: "user",
        content: "Hi there!",
      },
    ],
    model: "deepseek/deepseek_v3",
    stream,
    response_format: { type: "text" },
    max_tokens: 8192,
    temperature: 1,
    top_p: 1,
    min_p: 0,
    top_k: 50,
    presence_penalty: 0,
    frequency_penalty: 0,
    repetition_penalty: 1,
  });

  if (stream) {
    for await (const chunk of completion) {
      if (chunk.choices[0].finish_reason) {
        console.log(chunk.choices[0].finish_reason);
      } else {
        console.log(chunk.choices[0].delta.content);
      }
    }
  } else {
    console.log(JSON.stringify(completion));
  }
}

run();
For Curl users:
curl "https://api.novita.ai/v3/openai/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <YOUR Novita AI API Key>" \
-d @- << 'EOF'
{
"model": "deepseek/deepseek_v3",
"messages": [
{
"role": "system",
"content": "Be a helpful assistant"
},
{
"role": "user",
"content": "Hi there!"
}
],
"response_format": { "type": "text" },
"max_tokens": 8192,
"temperature": 1,
"top_p": 1,
"min_p": 0,
"top_k": 50,
"presence_penalty": 0,
"frequency_penalty": 0,
"repetition_penalty": 1
}
EOF
Unlock the Power of DeepSeek V3 Today
DeepSeek V3 represents a breakthrough in open-source AI, combining scalability, cost-effectiveness, and exceptional performance. With versatile deployment options spanning local GPUs and cloud platforms, it is a powerful tool for developers and businesses alike.
Get started with DeepSeek V3 on Novita AI today and unlock the potential of advanced AI for your projects.
About Novita AI
Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing an affordable and reliable GPU cloud for building and scaling.