Chinese AI Model Qwen 3 Drops an Update, Outshining DeepSeek V3

The pace of innovation in the AI large-model space has become breathtaking. Just last week, Kimi K2 was updated. Then, yesterday, Alibaba dropped a surprise release: a major update to the Qwen 3 series. The new version's benchmark numbers directly challenge, and in places surpass, Kimi K2 and DeepSeek V3.

What's New in the Qwen 3 Update?

The Qwen team has updated its flagship Qwen3 model, releasing an enhanced version of the Qwen3-235B-A22B-FP8 non-thinking model, now named Qwen3-235B-A22B-Instruct-2507-FP8. While the name is a mouthful, it’s packed with information.

This update focuses on several key areas:

  1. Drastically Improved General Capabilities: Whether handling complex instructions, rigorous logical reasoning, deep text comprehension, mathematical calculations, scientific Q&A, or even code generation and tool use, Qwen 3 demonstrates superior comprehensive strength.

  2. Broader Knowledge Coverage: The model now has enhanced coverage of niche knowledge across multiple languages, meaning it can understand and answer questions in more specialized and obscure domains.

  3. More Powerful Long-Context Capabilities: For users who need to process massive amounts of information, Qwen 3’s enhanced understanding of up to 256K long contexts makes it more adept at handling lengthy documents and analyzing complex reports.

  4. Better Alignment with User Preferences: Through extensive optimization and fine-tuning, the responses generated by Qwen 3 are higher in both validity and quality, more closely matching the practical needs of users.

The official evaluation used GPT-4o as a referee, a method that provides a valuable point of reference.

In my own hands-on testing, I used several specific tasks to assess its real-world capabilities:

  • Instruction Following: For complex prompts with multiple constraints, Qwen 3 showed a high completion rate, reducing the need for repeated prompt adjustments.

  • Coding Ability: When generating Python and Go code for data processing tasks, the code was highly usable with a lower error rate than the control models.

  • Long-Text Reasoning: After inputting a technical document of roughly 200K tokens, Qwen 3 was able to accurately locate and summarize information when questioned about specific details.

My conclusion is that Qwen 3’s overall performance is highly competitive. The rapid iteration of its tech stack is certainly worth our attention.

The Core Technical Highlight: FP8 Quantization

FP8 is the key to this update.

FP8 is a model compression technique. It reduces the precision of model parameters from the traditional FP16 or BF16 down to 8-bit floating-point (FP8) without significantly impacting performance. This dramatically lowers VRAM usage and computational requirements.

The benefits are clear:

  • Lowering the Barrier: It allows this powerful model, which once required top-tier hardware, to run on a wider range of devices.

  • Boosting Efficiency: Inference is faster, and response times are shorter.

  • Enabling Local Deployment: It provides immense convenience for individual developers and researchers to experiment with and deploy the model locally.
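The VRAM savings behind these benefits can be estimated with simple arithmetic: at FP8, each parameter takes one byte instead of the two bytes needed for FP16/BF16. A minimal sketch (weights only, ignoring KV cache and activations):

```python
# Rough VRAM estimate for model weights at different precisions.
# Qwen3-235B-A22B is a ~235B-parameter MoE model (~22B active per token),
# but all weights must still be stored.
TOTAL_PARAMS = 235e9

def weight_vram_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GiB (weights only)."""
    return num_params * bytes_per_param / 2**30

bf16 = weight_vram_gib(TOTAL_PARAMS, 2)  # BF16/FP16: 2 bytes per parameter
fp8 = weight_vram_gib(TOTAL_PARAMS, 1)   # FP8: 1 byte per parameter

print(f"BF16 weights: ~{bf16:.0f} GiB")  # ~438 GiB
print(f"FP8 weights:  ~{fp8:.0f} GiB")   # ~219 GiB, i.e. half
```

Halving the weight footprint is what moves a model of this size from "multi-node only" toward a single well-equipped server.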

Qwen 3's Agent (Tool-Use) Capabilities

Beyond its powerful foundational abilities, Qwen 3 also excels as an Agent, capable of more accurately understanding user intent and calling external tools (like APIs or database queries) to complete complex tasks.

Qwen 3 is complemented by the Qwen-Agent framework, which supports tool use and can be used to build automated task workflows. This expands the model's applications far beyond that of a simple chatbot.
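To make the tool-use loop concrete, here is a minimal sketch of the client-side half: the model returns a tool call in the OpenAI-compatible format (which Ollama, vLLM, and Qwen-Agent all follow), and your code dispatches it to a local function. The `get_weather` tool and its return value are hypothetical, and the `tool_call` dict is mocked rather than fetched from a live model:

```python
import json

# Hypothetical tool the model may call; name and signature are illustrative.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# Registry mapping tool names (as advertised to the model) to functions.
TOOLS = {"get_weather": get_weather}

# A tool-call message in the shape returned by OpenAI-compatible endpoints.
# In real use, this comes from the model's response; here it is mocked.
tool_call = {
    "function": {
        "name": "get_weather",
        "arguments": json.dumps({"city": "Paris"}),
    }
}

# Dispatch: look up the function and apply the model-supplied arguments.
fn = TOOLS[tool_call["function"]["name"]]
args = json.loads(tool_call["function"]["arguments"])
result = fn(**args)
print(result)  # Sunny in Paris
```

The result would then be sent back to the model as a tool message so it can compose the final answer; frameworks like Qwen-Agent automate this loop for you.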

How to Quickly Deploy Qwen 3 Locally

Deploying large models has never been easy, traditionally involving tedious environment setup, dependency management, and hardware adaptation.

My recommendation is to use ServBay + Ollama to simplify the entire deployment process.

  1. Install ServBay: Get the application from the official ServBay website (https://www.servbay.com). It’s a local development environment that integrates common tools and manages services and dependencies, making it easy to set up on both macOS and Windows.

  2. Install Ollama: In the left navigation menu, click "Packages," find Ollama, and click install. ServBay will automatically handle its environment configuration. Once installed, don't forget to click the activation button to start Ollama.

  3. Install Qwen 3: Click "AI" in the left navigation menu, find qwen3, and install it with a single click.

This process bypasses most manual configuration. You don't need to worry about complex dependencies or config files; ServBay and Ollama have paved the way for you.
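Once Ollama is running, you can talk to the model over its local REST API. A minimal sketch, assuming Ollama is listening on its default port (11434) and the model was pulled under the tag `qwen3` (adjust the tag to match your installation):

```python
import json
import urllib.request

# Build a request against Ollama's /api/generate endpoint.
# Assumptions: default port 11434, model tag "qwen3".
payload = {
    "model": "qwen3",
    "prompt": "Summarize FP8 quantization in one sentence.",
    "stream": False,  # return one complete JSON response instead of a stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment once Ollama is running locally:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

The same endpoint works from any language; this is just the stdlib-only Python version.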

Deployment Options for Advanced Users

Of course, for professional users seeking higher throughput and customized deployments, Qwen 3 also offers more advanced solutions, such as using vLLM and SGLang.

  • Deploying with vLLM:

    vllm serve Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 --tensor-parallel-size 4 --max-model-len 262144

  • Deploying with SGLang:

    python -m sglang.launch_server --model-path Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 --tp 4 --context-length 262144


Note: When deploying long-context models, you may encounter out-of-memory (OOM) issues. The official recommendation is to try reducing the context length (--max-model-len or --context-length) to decrease VRAM consumption if this occurs.
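The reason shrinking the context length helps is that the KV cache grows linearly with it. A back-of-the-envelope sketch, using illustrative architecture values (check the model's config.json for the exact layer and head counts):

```python
# Why reducing --max-model-len / --context-length relieves OOM:
# the KV cache scales linearly with context length.
# Layer/head values below are illustrative, not authoritative.
def kv_cache_gib(ctx_len, layers=94, kv_heads=4, head_dim=128, bytes_per=1):
    """KV cache size in GiB for one sequence (K + V; FP8 = 1 byte/element)."""
    return 2 * layers * kv_heads * head_dim * bytes_per * ctx_len / 2**30

full = kv_cache_gib(262144)  # the full 256K context window
half = kv_cache_gib(131072)  # halving the context halves the cache

print(f"256K context: ~{full:.1f} GiB per sequence")  # ~23.5 GiB
print(f"128K context: ~{half:.1f} GiB per sequence")  # ~11.75 GiB
```

And that is per concurrent sequence: serving many long-context requests at once multiplies this cost, which is why the context-length knob is the first thing to reach for when you hit OOM.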

Conclusion

In conclusion, the latest Qwen 3 update has sent a new shockwave through the large model landscape. With significant improvements in general capabilities, long-context understanding, and Agent functionality, it poses a formidable challenge to existing popular models like DeepSeek and Kimi, making it one of the most compelling models to watch and try today.

And for those who want to experience Qwen 3's powerful features on their local machine right away, installing Ollama through ServBay with a single click is undoubtedly the simplest and most efficient route. It lets you skip the tedious setup and get straight to the main event: experiencing the power of a top-tier AI model firsthand.


Written by

Lamri Abdellah Ramdane