AI Without Downtime

Mrunmay Shelar
3 min read

Recently, the DeepSeek-R1 model gained massive popularity, leading to an overwhelming surge in traffic. As demand skyrocketed, the model began experiencing slowdowns and even outages, making it unreliable for production applications.

LangDB’s Fallback Routing ensures uninterrupted AI service by automatically switching to a backup model when the primary model struggles with high traffic. Instead of facing downtime, applications leveraging DeepSeek-Reasoner could seamlessly reroute requests to alternative models like GPT-4o or even other providers of DeepSeek-R1, ensuring smooth operations even during peak demand.

Why Fallback Matters

AI reliability is crucial, but high traffic, model failures, or server outages can cause disruptions. Instead of leaving applications vulnerable, LangDB’s Fallback Routing ensures continuity by rerouting requests to a backup model in real time.

How it Works

When a request arrives, LangDB's routing system first attempts to process it using the preferred model. If that model is down, experiencing delays, or overloaded, the system seamlessly reroutes the request to a predefined backup model. This prevents downtime, reduces latency issues, and improves reliability.
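The core idea is simple: try each target in order and return the first successful response. The sketch below illustrates that loop in Python; it is a conceptual illustration, not LangDB's actual implementation, and the `call_model` stub stands in for a real provider call.

```python
# Illustrative sketch of fallback routing: try each target in order
# until one succeeds. Not LangDB's internal implementation.

def route_with_fallback(request, targets, call_model):
    """Try each target model in turn; return the first successful response."""
    errors = []
    for target in targets:
        try:
            return call_model(target, request)
        except Exception as exc:  # timeout, overload, outage, ...
            errors.append((target["model"], str(exc)))
    raise RuntimeError(f"All targets failed: {errors}")

# Hypothetical provider call: here the primary model is "overloaded",
# so the request falls through to the backup.
def call_model(target, request):
    if target["model"] == "deepseek-reasoner":
        raise TimeoutError("model overloaded")
    return f'{target["model"]} answered: {request}'

targets = [
    {"model": "deepseek-reasoner"},
    {"model": "gpt-4o"},
]
print(route_with_fallback("hello", targets, call_model))
# -> gpt-4o answered: hello
```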

Setting Up Fallback Routing in LangDB

LangDB provides an easy way to configure Fallback Routing via the UI or API.

You can configure it directly from the LangDB dashboard, or set it up programmatically with a router configuration like this:

{
    "model": "router/dynamic",
    "router": {
        "name": "fallback-router",
        "type": "fallback",
        "targets": [
            { "model": "deepseek-reasoner", "temperature": 0.7, "max_tokens": 400 },
            { "model": "gpt-4o", "temperature": 0.8, "max_tokens": 500 }
        ]
    }
}

With this configuration, if DeepSeek-Reasoner is overloaded or unavailable, requests automatically switch to GPT-4o, maintaining uninterrupted service.
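Since the router is just part of the request body, sending it from code is straightforward. The sketch below builds the chat-completion payload around the fallback router; the endpoint URL and auth header are illustrative placeholders, so check LangDB's docs for the exact values.

```python
import json

# Chat-completion payload carrying the fallback router configuration.
payload = {
    "model": "router/dynamic",
    "router": {
        "name": "fallback-router",
        "type": "fallback",
        "targets": [
            {"model": "deepseek-reasoner", "temperature": 0.7, "max_tokens": 400},
            {"model": "gpt-4o", "temperature": 0.8, "max_tokens": 500},
        ],
    },
    "messages": [{"role": "user", "content": "Summarize fallback routing."}],
}

# To actually send it (endpoint and key are placeholders, not executed here):
# import requests
# resp = requests.post(
#     "https://<your-langdb-endpoint>/v1/chat/completions",
#     headers={"Authorization": f"Bearer {LANGDB_API_KEY}"},
#     json=payload,
# )

print(json.dumps(payload, indent=2)[:80])
```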

Fallback Routing with Percentage-Based Load Balancing

In the previous example, we implemented a simple fallback mechanism. However, a more robust solution would be to distribute queries across multiple providers of DeepSeek-R1 while maintaining a fallback to GPT-4o if both providers fail. This method helps balance traffic efficiently while ensuring uninterrupted AI services.

Here’s how you can configure Fallback Routing with Percentage-Based Load Balancing:

{
    "model": "router/dynamic",
    "router": {
        "name": "fallback-percentage-router",
        "type": "fallback",
        "targets": [
            {
                "model": "router/dynamic",
                "router": {
                    "name": "percentage-balanced",
                    "type": "percentage",
                    "model_a": [
                        { "model": "fireworksai/deepseek-r1", "temperature": 0.7, "max_tokens": 400 },
                        0.5
                    ],
                    "model_b": [
                        { "model": "deepseek/deepseek-reasoner", "temperature": 0.7, "max_tokens": 400 },
                        0.5
                    ]
                }
            },
            { "model": "gpt-4o", "temperature": 0.8, "max_tokens": 500 }
        ]
    }
}

How This Works:

  • Primary Route: The system distributes requests evenly (50/50) between two providers of DeepSeek-R1 to balance the load.

  • Fallback Route: If both DeepSeek-R1 providers are unavailable or fail, all requests are automatically rerouted to GPT-4o, ensuring continuous service.

This approach combines load balancing with reliable failover protection, making it ideal for AI applications facing high demand and occasional model unavailability.

In more complex scenarios, you can configure a multi-level fallback system with percentage-based distribution. This approach allows requests to be routed dynamically based on pricing, performance, or reliability, ensuring efficiency while preventing downtime. Check out Routing Strategies for more details.

By leveraging dynamic routing, you can:

  • Prevent downtime by automatically switching to backup models.

  • Optimize performance and cost with smart load balancing.

  • Ensure scalability without manual intervention.

With LangDB’s flexible and powerful routing capabilities, you can build AI applications that are not only intelligent but also robust and fail-safe.

Get Started Today

Ready to implement fallback routing in your AI stack? Check out the LangDB Routing Docs and deploy your AI applications on LangDB to ensure reliability, scalability, and seamless failover.
