Qwen 3 Now Available on Novita AI — Claim Your $10 Free Credits

Table of contents
- What is Qwen 3?
- Key Features of Qwen 3
- Benchmarks and Performance
- Flagship Model: Qwen3-235B-A22B
- Other Smaller Models
- How to Access Qwen 3 on Novita AI
- Use the Playground (No Coding Required)
- Integrate via API (For Developers)
- Connect Qwen 3 API on Third-Party Platforms
- Best Practices for Optimal Qwen 3 Performance
- Conclusion

Alibaba’s cutting-edge Qwen 3 large language models are now live on Novita AI’s Model API platform!
For a limited time, new users can claim $10 in free credits to explore and build with Qwen 3.
Here’s the current Qwen 3 lineup and pricing on Novita AI:
- Qwen3-235B-A22B: $0.20 / M input tokens, $0.80 / M output tokens
- Qwen3-30B-A3B: $0.10 / M input tokens, $0.45 / M output tokens
- Qwen3-32B: $0.10 / M input tokens, $0.45 / M output tokens
- Qwen3-14B: $0.07 / M input tokens, $0.275 / M output tokens
- Qwen3-8B: $0.035 / M input tokens, $0.138 / M output tokens
- Qwen3-4B: free
- Qwen3-1.7B: free
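To get a feel for these rates, here is a quick back-of-the-envelope cost estimate in Python. The default prices are the Qwen3-235B-A22B rates listed above; swap in the rates of whichever model you actually call:

```python
def estimate_cost(input_tokens, output_tokens, in_price=0.20, out_price=0.80):
    """Estimate the cost of one call in USD.

    Prices are USD per million tokens; the defaults are the
    Qwen3-235B-A22B rates from the table above.
    """
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 10k-token prompt with a 2k-token completion.
print(f"${estimate_cost(10_000, 2_000):.4f}")  # $0.0036
```

At these prices, even heavy prototyping fits comfortably inside the $10 free-credit allowance.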
Power your chatbots, apps, and workflows with state-of-the-art language models — Qwen 3 is just an API call away.
What is Qwen 3?
Qwen 3 is the latest and most advanced family of large language models developed by Alibaba Cloud’s Qwen team. Building on the experience of QwQ and Qwen2.5, Qwen 3 sets a new standard for open-source AI with major improvements in reasoning, multilingualism, and agentic abilities.
Key Features of Qwen 3
Dense and Mixture-of-Experts (MoE) models in various sizes: Qwen 3 is available in both dense and MoE architectures, ranging from lightweight 0.6B and 1.7B models up to large-scale 32B (dense) and flagship 30B-A3B and 235B-A22B (MoE) variants.
Hybrid thinking modes: The model allows seamless switching between thinking mode (for complex, step-by-step logical reasoning, math, and code generation) and non-thinking mode (for fast, efficient, general-purpose chat).
Significantly enhanced reasoning: Qwen 3 surpasses previous Qwen models in mathematics, code generation, and commonsense logical reasoning. It also offers more stable and controllable reasoning budgets for different tasks.
Superior human preference alignment: The model excels in creative writing, role-playing, multi-turn dialogues, and instruction following, resulting in more natural, engaging conversations.
Advanced agentic capabilities: Qwen 3 is designed for agent-based workflows, supporting seamless integration with external tools and precise function calling in both reasoning modes. This enables state-of-the-art performance in complex, agent-driven tasks.
Robust multilingual support: Supporting 119 languages and dialects, Qwen 3 is capable of high-quality multilingual instruction following and translation, opening the door for truly global applications.
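The hybrid thinking modes can also be toggled per turn. Qwen 3's published usage notes describe soft-switch tags (`/think` and `/no_think`) appended to a user message; whether a given serving stack honors them is an assumption worth verifying, so treat this as a sketch:

```python
def user_turn(content: str, thinking: bool = True) -> dict:
    """Build a user message with Qwen 3's per-turn soft switch.

    /think and /no_think come from Qwen 3's usage notes; support on a
    particular endpoint is an assumption to verify.
    """
    tag = "/think" if thinking else "/no_think"
    return {"role": "user", "content": f"{content} {tag}"}

messages = [
    {"role": "system", "content": "Be a helpful assistant"},
    user_turn("Prove that the square root of 2 is irrational."),  # reasoning on
    user_turn("Just say hi.", thinking=False),                    # fast reply
]
```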
Benchmarks and Performance
The Qwen 3 series demonstrates industry-leading performance across a comprehensive suite of AI benchmarks, excelling in coding, mathematics, general reasoning, and multilingual understanding.
Flagship Model: Qwen3-235B-A22B
The flagship model, Qwen3-235B-A22B, consistently achieves top or near-top results when compared with the most advanced models available today, such as DeepSeek-R1, OpenAI-o1, OpenAI-o3-mini, Grok-3 Beta, and Gemini-2.5-Pro.
Complex Reasoning: Highest score on ArenaHard (95.6), outperforming or matching all competitors.
Mathematics: Leading results on AIME’24 (85.7) and AIME’25 (81.5), well ahead of most commercial and open-source models.
Coding: Exceptional performance on LiveCodeBench (70.7) and CodeForces Elo (2056), confirming its strength in software and algorithmic tasks.
Multilingual & General Capabilities: Qwen3-235B-A22B achieves strong results on LiveBench and MultiIF, demonstrating robust real-world and multilingual understanding.
Other Smaller Models
Qwen 3’s architectural innovations also translate to outstanding performance at smaller model sizes:
Qwen3-32B (Dense): Delivers results just behind the flagship, still outperforming most alternative models across all categories.
Qwen3-30B-A3B (MoE): Outperforms QwQ-32B, despite using only a tenth of the activated parameters — showcasing Qwen’s efficiency and smart scaling.
Qwen3-4B (Dense): Even this compact model can rival the performance of much larger models like Qwen2.5-72B-Instruct, especially on reasoning and multilingual tasks.
How to Access Qwen 3 on Novita AI
Getting started with Qwen 3 is fast, simple, and risk-free on Novita AI. Thanks to the Referral Program, you’ll receive $10 in free credits — enough to fully explore Qwen 3’s power, build prototypes, and even launch your first use case without any upfront cost.
Use the Playground (No Coding Required)
Instant Access: Sign up, claim your free credits, and start experimenting with Qwen 3 and other top models in seconds.
Interactive UI: Test prompts, chain-of-thought reasoning, and visualize results in real time.
Model Comparison: Effortlessly switch between Qwen 3, Llama 4, DeepSeek, and more to find the perfect fit for your needs.
Integrate via API (For Developers)
Seamlessly connect Qwen 3 to your applications, workflows, or chatbots with Novita AI’s unified REST API — no need to manage model weights or infrastructure. Novita AI offers multi-language SDKs (Python, Node.js, cURL, and more) and advanced parameter controls for power users.
Option 1: Direct API Integration (Python Example)
To get started, simply use the code snippet below:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR Novita AI API Key>",
)

model = "qwen/qwen3-235b-a22b-fp8"
stream = True  # or False
max_tokens = 2048
system_content = """Be a helpful assistant"""
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = {"type": "text"}

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": system_content},
        {"role": "user", "content": "Hi there!"},
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    # Sampler options outside the OpenAI spec are passed via extra_body.
    extra_body={
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "min_p": min_p,
    },
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
```
Key Features:
Unified endpoint: /v3/openai supports OpenAI’s Chat Completions API format.
Flexible controls: Adjust temperature, top-p, penalties, and more for tailored results.
Streaming & batching: Choose your preferred response mode.
Option 2: Multi-Agent Workflows with OpenAI Agents SDK
Build advanced multi-agent systems by integrating Novita AI with the OpenAI Agents SDK:
Plug-and-play: Use Novita AI’s LLMs in any OpenAI Agents workflow.
Supports handoffs, routing, and tool use: Design agents that can delegate, triage, or run functions, all powered by Novita AI’s models.
Python integration: Simply point the SDK to Novita’s endpoint (https://api.novita.ai/v3/openai) and use your API key.
Connect Qwen 3 API on Third-Party Platforms
Hugging Face: Use Qwen 3 in Spaces, pipelines, or with the Transformers library via Novita AI endpoints.
Agent & Orchestration Frameworks: Easily connect Novita AI with partner platforms like Continue, AnythingLLM, LangChain, Dify and Langflow through official connectors and step-by-step integration guides.
OpenAI-Compatible API: Enjoy hassle-free migration and integration with tools such as Cline and Cursor, designed for the OpenAI API standard.
Best Practices for Optimal Qwen 3 Performance
- Sampling Parameter Settings
Thinking Mode (enable_thinking=True):
Temperature: 0.6
TopP: 0.95
TopK: 20
MinP: 0
Tip: Avoid greedy decoding to prevent degraded performance or repetitive outputs.
Non-Thinking Mode (enable_thinking=False):
Temperature: 0.7
TopP: 0.8
TopK: 20
MinP: 0
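These presets are easy to capture once and reuse. A minimal sketch in Python — routing top_k and min_p through extra_body mirrors the earlier API example and applies to OpenAI-compatible clients:

```python
# Recommended sampling presets from the guidance above.
THINKING = {"temperature": 0.6, "top_p": 0.95, "extra": {"top_k": 20, "min_p": 0}}
NON_THINKING = {"temperature": 0.7, "top_p": 0.8, "extra": {"top_k": 20, "min_p": 0}}

def sampling_kwargs(thinking: bool) -> dict:
    """Return keyword arguments for client.chat.completions.create().

    top_k and min_p are not part of the OpenAI spec, so they go
    through extra_body on OpenAI-compatible clients.
    """
    preset = THINKING if thinking else NON_THINKING
    return {
        "temperature": preset["temperature"],
        "top_p": preset["top_p"],
        "extra_body": dict(preset["extra"]),
    }
```

For example, `client.chat.completions.create(model=model, messages=messages, **sampling_kwargs(thinking=True))` applies the thinking-mode preset in one call.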
Repetition Control
For supported frameworks, adjust presence_penalty between 0 and 2 to reduce repetition.
Note: Higher values may cause some language mixing or a slight decrease in model performance.
Output Length Recommendations
For most queries, set the output length to 32,768 tokens.
For complex benchmarking tasks (such as math or programming competitions), increase the max output length to 38,912 tokens for more comprehensive responses.
Standardizing Output Format
Math Problems: Include this in your prompt: “Please reason step by step, and put your final answer within \boxed{}.”
Multiple-Choice Questions: Standardize responses using a JSON field by including this in your prompt: “Please show your choice in the answer field with only the choice letter, e.g., "answer": "C".”
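Once the math format is standardized, parsing the final answer back out of a response is straightforward. A small sketch — the regex assumes the boxed answer contains no nested braces:

```python
import re

def extract_boxed(text: str):
    """Return the last \\boxed{...} answer in a response, or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None

print(extract_boxed(r"2 and 3 are prime, so the product is \boxed{6}."))  # 6
```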
Conversation History Management
In multi-turn conversations, include only the final output in the chat history. Omit any intermediate “thinking” content.
If using a Jinja2 chat template, this is handled automatically. For other frameworks, ensure this practice is followed manually.
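When handling history manually, stripping the reasoning trace before appending the assistant turn can be done with a small helper. Qwen 3's released chat template wraps the trace in think tags; the exact tag names below are an assumption from that template, so verify them against your serving stack:

```python
import re

def strip_thinking(text: str) -> str:
    """Remove <think>...</think> blocks before saving a turn to history.

    The <think> tag names are an assumption based on Qwen 3's released
    chat template.
    """
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL).strip()

reply = "<think>The user greeted me.</think>Hello! How can I help?"
history_entry = {"role": "assistant", "content": strip_thinking(reply)}
print(history_entry["content"])  # Hello! How can I help?
```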
By following these recommendations, you’ll ensure Qwen 3 consistently delivers accurate, high-quality results across all use cases.
Conclusion
Qwen 3 delivers best-in-class performance for coding, reasoning, and multilingual tasks — no matter the project size. Ready to see it in action?
Try the Qwen 3 demo on Novita AI now and claim your free credits!
Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing the affordable and reliable GPU cloud for building and scaling.