Choosing the right LLM to use Cline & Roo Code without going broke

Daniel Rosehill
6 min read

The fast-growing world of agentic IDEs offers breathtaking autonomous code generation capabilities.

However, as anybody who has become highly enthusiastic about these tools will have quickly learned, using them with the commonly recommended model, Claude 3.5 Sonnet, quickly becomes very expensive.

My OpenRouter invoice history tells a story that is probably familiar to many:

Using Roo Code on a large codebase, it would be extremely easy to rack up daily charges of hundreds of dollars, if not far more. Simply using these tools for a few modest Python programs, my daily spend has exceeded $20.

I still regard that as a bargain.

I've been able to redesign my personal website, set up a simple landing page for my business, create highly useful business tools, all for this $150 spend. But although my initial splurge has been a lot of fun and has brought me quickly up to speed with this exciting new crop of development assistants, I know that going forward I'll need to find an alternative that can fit within my budget.

2 Things LLMs Need For Good Agentic Code Gen Performance

If you find yourself in a similar situation, I can offer a few notes on what trying out the alternatives has been like.

Using LLMs for this kind of work poses a few decisive requirements.

Fast Inference Is A Must

First, the model needs fast inference. Writing out hundreds of lines of Python, or whatever language you're working in, needs to take seconds, not minutes. This is particularly important given that many models use the write-to-file tool to avoid corrupting good code.

Qwen Coder 32B is one of my favourite coding models. I appreciate that it's direct and no-nonsense in fixing or generating code. However, it is a comparatively small model, and my attempt at using it via OpenRouter in Cline was, overall, a flat failure.

Reasoning Is Non-Negotiable Too

The second essential ingredient in using LLMs for this kind of work is high reasoning capabilities.

However, reasoning in this regard is, oddly, a double-edged sword. You don't want a model that agonises for minutes over what approach to take to a development challenge, which is why the ability to select the reasoning effort, available on some frontends, is particularly useful.

OpenAI's o1 series has a lot to offer for this kind of work, and its o1-mini is priced at an affordable price point. So long as it can be coaxed into expending enough effort on reasoning, but not too much, it can do this kind of work well.
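As a sketch of what capping reasoning effort looks like in practice: some OpenAI reasoning models accept a `reasoning_effort` parameter in chat-completions requests (support varies by model, so treat this as illustrative and check the docs for the model you use). The snippet just builds the request body rather than calling the API.

```python
# Sketch of a chat-completions request body with a capped reasoning effort.
# NOTE: `reasoning_effort` is supported on some OpenAI reasoning models but
# not all of them; verify against the model's documentation before relying on it.
def build_request(prompt: str, effort: str = "low") -> dict:
    """Build an (illustrative) request body for a reasoning model."""
    assert effort in ("low", "medium", "high")
    return {
        "model": "o1-mini",            # illustrative model name
        "reasoning_effort": effort,    # cap time spent "thinking"
        "messages": [{"role": "user", "content": prompt}],
    }
```

A lower effort setting keeps latency (and reasoning-token spend) down for routine edits, while "high" is worth the wait for genuinely hard refactors.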

With Agentic Code Gen, Rate Limits Are Enemy Number 1

Google is hot on the heels of Anthropic and OpenAI in trying to build usage among its Gemini models. For those reeling from the cost of using Anthropic for all their projects, the fact that some of these highly capable models are currently offered for free is hugely attractive.

But, as everybody knows, there is no such thing as a free lunch - or a free code-generation LLM.

Google is offering some of its Gemini models for free in order to gather user data, and one gets the feeling that the kind of vast context exchange required for code generation was never quite the usage data it was hoping to gather.

Perhaps for this reason, or anticipating the kind of use they would be put to, Google imposes strict rate limits on the Gemini models currently available for free.
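If you do lean on the free tiers, the practical consequence is a steady stream of HTTP 429 responses mid-session. A minimal sketch of the standard mitigation, exponential backoff with jitter (here the rate-limit error is stood in for by a `RuntimeError`; in a real client you'd catch your SDK's 429 exception type):

```python
import random
import time


def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call` on rate-limit errors with exponential backoff and jitter.

    In this sketch any RuntimeError stands in for an HTTP 429 response;
    substitute the rate-limit exception your API client actually raises.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the rate-limit error
            # Double the wait each attempt, plus jitter to avoid thundering herds.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

This won't raise a free tier's ceiling, of course - it just stops one burst of 429s from killing an otherwise productive agentic session.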

Gemini Flash 2.0 vs. DeepSeek: Which For Code Gen?

The model that I'm currently using the most for code generation and editing is Gemini Flash 2.0, generally accessed through OpenRouter, simply for the convenience of being able to centralise my billing.

Both Cline and Roo Code allow you to access the Google models either directly (from Google) or via OpenRouter. So if you're not an OpenRouter customer, you can just use these models directly from Google.

The somewhat confusing part, or perhaps this is just how I find it, is making sure that you are using the intended model from within the selection.
In the image below, for example, flash-001 is the paid model, whereas the one with exp (for experimental) is the free version.

If you're not sure which is which, as I was for a long time, you can look up OpenRouter's very useful documentation, where the model endpoint names are correlated with the marketing names for the various models.
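To illustrate the paid-vs-experimental distinction, the mapping can be pinned down in a small lookup (the endpoint IDs below are what I believe OpenRouter uses at the time of writing, but IDs change as models are promoted or retired, so verify against OpenRouter's model list):

```python
# Marketing name -> assumed OpenRouter endpoint ID. Illustrative only:
# confirm against OpenRouter's documentation before hard-coding these.
OPENROUTER_IDS = {
    "Gemini Flash 2.0 (paid)": "google/gemini-2.0-flash-001",
    "Gemini Flash 2.0 (free, experimental)": "google/gemini-2.0-flash-exp:free",
}


def endpoint_for(name: str) -> str:
    """Look up the endpoint ID for a marketing name."""
    return OPENROUTER_IDS[name]
```

The `-001` suffix marks the stable paid release, while `exp` (with the `:free` variant) is the experimental free one - exactly the distinction that tripped me up in the model picker.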

Flash and DeepSeek vs. Sonnet: API Pricing Compared

| Model | Context Length | Max Output | Input Price (per 1M) | Input % of Sonnet | Output Price (per 1M) | Output % of Sonnet | Latency |
|---|---|---|---|---|---|---|---|
| Claude 3.5 Sonnet | 200K | 8K | $3.00 | 100% | $15.00 | 100% | 1.28-1.55s |
| DeepSeek V3 | 64K | 8K | $0.27 | 9% | $1.10 | 7.3% | 3.90s |
| DeepSeek Reasoner | 64K | 8K | $0.55 | 18.3% | $2.19 | 14.6% | N/A |
| Gemini Flash 2.0 | 1M | 8K | $0.10 | 3.3% | $0.40 | 2.7% | 0.57s |

As the table makes clear, both DeepSeek and Flash are significantly cheaper than Sonnet on both input and output token costs. Flash is the cheapest of the alternatives.
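To make those percentages concrete, here is a small sketch that turns the table's per-million prices into an estimated daily spend (the 2M-input / 150K-output workload is my illustrative assumption for a heavy agentic day, where the same codebase context is re-sent on every request):

```python
# Per-1M-token API prices (USD), taken from the comparison table above.
PRICES = {
    "Claude 3.5 Sonnet": {"input": 3.00, "output": 15.00},
    "DeepSeek V3": {"input": 0.27, "output": 1.10},
    "DeepSeek Reasoner": {"input": 0.55, "output": 2.19},
    "Gemini Flash 2.0": {"input": 0.10, "output": 0.40},
}


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one session from token counts."""
    p = PRICES[model]
    return (input_tokens / 1_000_000) * p["input"] + (
        output_tokens / 1_000_000
    ) * p["output"]


# An assumed heavy day: ~2M input tokens (repeated codebase context), 150K output.
for model in PRICES:
    print(f"{model}: ${session_cost(model, 2_000_000, 150_000):.2f}")
```

On those assumed numbers, the same workload that costs a little over $8 on Sonnet comes to roughly a quarter of a dollar on Flash 2.0 - which is the whole argument of this article in one loop.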

The OpenRouter leaderboard paints an interesting picture of what's going on and how developers are adapting to new model releases.

Here's how the top-models-by-month timeline looked today, February 11th, comparing model usage across all users over the past month. We can see that Sonnet is a clear leader, with Flash 1.5 the second most preferred.

Looking at data just from today, we can see that Flash 2.0 has displaced 1.5 in popularity. This trend will likely be sustained as users shift to Google’s more performant model.

OpenRouter also produces a rankings chart showing which applications and models are being used. It shows that although some users rely on these models for general-purpose frontends and fiction generators, code-generation IDEs remain, by a wide margin, the biggest use case. Interestingly (or so I thought), Cline retains a strong lead over Roo Code, despite the latter project having a more active development cycle.

Takeaways And My Two Cents

While there's no doubt that Sonnet currently provides unmatched performance with agentic IDE tools that leverage MCP servers, the huge amount of context consumed when sharing entire codebases with models means that users face API costs that are unsustainable for many private and small-business developers. A common refrain on subreddits discussing agentic code generation is that these tools eat API tokens for breakfast. They do, and they spit out large credit card bills.

My suggestion is that Google's affordable Flash models provide a good alternative at the moment.


Written by

Daniel Rosehill

I believe that open source is a way of life. If I figure something out, I try to pass on what I know, even if it's the tiniest unit of contribution to the vast sum of human knowledge. And .. that's what led me to set up this page!