What percentage of your code should be AI-generated?

Table of contents
- AI vanity metrics are the darlings of investors and execs
- What percentage of AI code is too much?
- Should percentage of code be a metric that’s tracked?
- ‘Percentage of code’ metrics seem focused on cost cutting
- Metrics developers actually need
- Creating useful AI adoption metrics: best practices
- Should we track AI-generated code percentages at all?

We’ll come clean: That title is mostly clickbait.
This isn’t an article where we tell you that 20% or 30% or even 50% of your codebase should be AI-generated. We’re writing it because it looks like someone could be telling you that very soon. Maybe your boss. Or your C-suite.
In April, both Google and Microsoft publicly claimed that up to 30% of their new or existing code was AI-generated. It’s interesting that two companies chose to quantify their AI usage with such similar percentages in the same month. Even more notable is where they made these claims.
Google’s Sundar Pichai shared that stat in an earnings call – and this wasn’t the first time he talked about AI-generated code at Google in this way. During Google’s November 2024 earnings call, Pichai stated that 25% of Google’s new code was AI-generated. So, he explicitly framed it as an update for investors on Google’s progress at implementing AI.
Not to be outdone, Microsoft’s Satya Nadella came out a few days later during a fireside chat with Mark Zuckerberg at LlamaCon with the claim that 20% to 30% of Microsoft’s codebase was AI-generated.
What we likely saw in those two updates was the birth of a new metric that investors and executives will believe says something important about the state or future of a business. But what does a percentage like that really say? And, if it’s going to be used more widely as a way to measure AI adoption or a business’ competitiveness, how would companies and developers even decide what the ‘right’ percentage looks like?
AI vanity metrics are the darlings of investors and execs
If you’ve wondered why so many publicly traded companies rushed in 2023 and 2024 to share their ‘AI strategy,’ it’s because the stock market rewarded them for it. With investors projecting both revenue increases and cost savings from adopting AI, the stock prices of companies that announced an AI strategy rose, on average, 2% more than those of companies that didn’t.
And 67% of those companies did even better – their stock prices climbed more than 6%. BuzzFeed’s stock went up a whopping 120% just for announcing plans to use generative AI to create content. Companies that didn’t articulate an AI strategy were generally punished by the public markets.
AI adoption has since become a key priority for executives – both for the actual business benefits it promises AND for the stock price increases they now rely on whenever they announce new AI investments or adoption.
That’s leading to what some have characterized as a toxic culture at certain companies, where all AI adoption is counted as good AI adoption because it makes investors and executives happy – and gives the veneer of increased productivity and velocity.
In April, LeadDev wrote about how companies are now instituting AI coding mandates and how that’s – in their words – “driving developers to the brink.” These mandates range from requests to accept more suggestions from AI coding agents, to public leaderboards ranking AI usage by employee, to vague performance-based OKRs where devs are simply expected to use AI ‘more’ from quarter to quarter.
The problem? Like many metrics, they’re blunt ways to measure and incentivize behaviors that have complex outcomes. This is demonstrated in the comments of a Reddit post the LeadDev article links to. Says one dev: “At our monthly engineering all hands, they give us a report on our org’s usage of Copilot (which has slowly been increasing) and tell us that we need to be using it more. Then, a few slides later we see that our severe incidents are also increasing.”
It’s clear there’s a disconnect between engineers and executives on the benefits of AI coding tool use. An Atlassian survey of over 2,000 IT managers and developers in 2024 showed that leaders listed AI as the most important factor in improving developer productivity and satisfaction, but only a third of developers reported experiencing AI-related productivity gains.
By only measuring AI use – and not the quality of the code AI generates, or the actual time saved once debugging and more involved code reviews are factored in – you could be incentivizing AI use even when it hurts your company.
In that case, you might achieve a codebase that’s 50% AI-generated while also adding significantly more bugs to your code and increasing issues and customer complaints. To make it worse, you might ALSO not be saving any time, since your devs could be spending as much time reviewing and fixing that code as they would have spent writing it from scratch.
But your AI usage would sound impressive during an earnings call, right?
What percentage of AI code is too much?
While companies like Google and Microsoft are racing to make as much of their codebase as possible AI-generated – and Microsoft’s CTO is even predicting that 95% of all code will be AI-generated by 2030 – it’s unlikely other companies are aiming that high.
On a YC podcast back in March, Y Combinator managing partner Jared Friedman claimed that a quarter of the accelerator’s current cohort have codebases that are 95% generated by AI.
We’ll summarize (and sanitize) the YouTube comments on that one for you: Most devs, unsurprisingly, felt that having a codebase that was 95% AI-generated was a recipe for disaster.
It appears we have a Goldilocks dilemma here: there is, supposedly, a magic number for AI-generated code that is neither too low nor too high, but just right. But what is it?
At what point does your code become too AI-generated? Is it 40%? 50%? 75%? Over 75%? Does it depend on the application? The language? Does it vary from one company to another? And what are you actually saying about a company by sharing this percentage?
Should percentage of code be a metric that’s tracked?
To answer this question, let’s dig into what this metric might actually be measuring.
When companies like Google and Microsoft say 30% of their code is AI-generated, what are they counting? Usually, this refers to the number of lines or commits that originated from AI coding tools like Copilot, Cursor, Claude, or Windsurf. But raw lines of code are a notoriously poor measure of productivity or value, and a line count doesn’t account for the lines AI wrote that were then heavily edited.
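To make the counting concrete, here’s a minimal sketch of how such a percentage could be produced, assuming – purely hypothetically – that developers tag AI-assisted commits with a commit-message trailer like AI-Assisted: true and that main is the branch being measured. There’s no standard convention for any of this.

```python
# Hypothetical sketch: estimate the share of added lines that came from
# AI-assisted commits, assuming (purely for illustration) that such commits
# carry a commit-message trailer like "AI-Assisted: true".
import subprocess

def added_lines(since: str = "90.days", grep: str | None = None) -> int:
    """Sum lines added on main since a given time, optionally filtering commits by message."""
    cmd = ["git", "log", "main", f"--since={since}", "--numstat", "--pretty=format:"]
    if grep:
        cmd += ["--grep", grep]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    total = 0
    for line in out.splitlines():
        parts = line.split("\t")  # numstat rows look like "<added>\t<deleted>\t<path>"
        if len(parts) == 3 and parts[0].isdigit():  # "-" marks binary files; skip them
            total += int(parts[0])
    return total

all_added = added_lines()
ai_added = added_lines(grep="AI-Assisted: true")
print(f"AI-assisted share of added lines: {100 * ai_added / max(all_added, 1):.1f}%")
```

Even a careful version of this arithmetic treats boilerplate and core logic identically, which leads to the next problem.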
AI coding tools often excel at writing boilerplate or repetitive code – exactly the kind of low-complexity, low-value code that developers could produce rapidly themselves anyway. Counting these easy wins inflates AI adoption numbers without necessarily indicating meaningful productivity gains. Another challenge is that developers report frequent and concerning hallucinations, like references to APIs that don’t exist.
More critically, a metric based purely on volume doesn’t capture the complexity or quality of the code generated. It doesn’t tell you how much developer time was needed to debug or review the AI-generated code. Without these nuances, a 30% metric means almost nothing about actual efficiency or quality outcomes.
Developer forums and surveys consistently highlight frustration at AI-generated code’s tendency to introduce more bugs and vulnerabilities, with one survey estimating that AI coding tools add up to 41% more bugs. Harness, for example, recently released a survey revealing that 67% of developers spend more time debugging AI-generated code than human-generated code, and 68% spend more time resolving security issues. Even worse, 59% of respondents in the Harness survey said they experienced problems with deployments at least half the time they used AI coding tools.
Perhaps for this reason, it’s not surprising that companies selling coding agents, like Microsoft, might want teams to adopt ‘percentage of the codebase’ as a success metric over others that give a more holistic view of how AI is used in development.
But misguided OKRs, rigid quotas, and public leaderboards of each developer’s AI use ignore context and quality, leading developers to spend more time chasing meaningless metrics than delivering high-quality software. If you’re trying to measure the value of your AI investment, there are much better metrics to track.
‘Percentage of code’ metrics seem focused on cost cutting
What’s more concerning is that many devs report that AI mandates like these are often connected to hiring freezes at their companies – with entry-level jobs hit hardest. That suggests many execs share the ‘optimism’ that AI will replace developers – something that people like Meta CEO Mark Zuckerberg, Salesforce CEO Marc Benioff, and AWS CEO Matt Garman have been talking about publicly a lot these days. In early 2025, Benioff shared his plans on a podcast, saying, “Maybe we aren’t going to hire anybody this year. We have seen such incredible productivity gains because of the agents.”
If that seems like an overly optimistic view of AI’s potential to you, you’re right to feel that way. After all, Microsoft’s Nadella even admitted during the LlamaCon chat that they were seeing ‘mixed results’ with AI-generated code in certain languages with their own Copilot tool.
The problem with metrics like this is that many companies now believe that the use of AI-generated code is a way to reduce costs by replacing developers – or, at least, reduce their numbers. But, if you're measuring what percentage of your codebase is AI-generated because you believe you’ll eventually be able to cut your workforce by 50% once you achieve 50% AI code, you’re going to be sadly mistaken.
In that case, adopting this metric appears like it could put companies on a collision course – not just to create more technical debt and issues – but also to disappoint investors. Either those layoffs won’t materialize or, if they do, they’ll lead to increased issues and noticeable quality degradations that will impact a company’s bottom line.
Metrics developers actually need
Instead of just measuring raw code generated, companies should be measuring the quality of that code and the true productivity impact of AI adoption. For example, in addition to AI usage and adoption metrics, companies should also track the following (see the sketch after this list for what such a comparison might look like):
- Bug rates before and after adopting AI coding tools.
- Deployment stability (frequency of production incidents) against AI usage.
- Actual time saved across the full development lifecycle once debugging and more involved code reviews are factored in.
- Developer satisfaction and productivity, based on qualitative feedback.
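Purely for illustration, here’s that sketch: a before/after comparison of the kind the first two bullets describe. The sprint data and field names are invented; in practice, the inputs would come from your issue tracker and incident tooling.

```python
# Illustrative sketch: compare average bug counts and production incidents
# per sprint before vs. after AI coding tools were adopted.
# All data below is invented for the example.
from statistics import mean

sprints = [
    {"bugs": 14, "incidents": 1, "ai_in_use": False},
    {"bugs": 11, "incidents": 0, "ai_in_use": False},
    {"bugs": 18, "incidents": 2, "ai_in_use": True},
    {"bugs": 16, "incidents": 1, "ai_in_use": True},
]

def per_sprint_avg(metric: str, ai_in_use: bool) -> float:
    """Average a metric over the sprints where AI tools were (or weren't) in use."""
    values = [s[metric] for s in sprints if s["ai_in_use"] == ai_in_use]
    return mean(values) if values else 0.0

for metric in ("bugs", "incidents"):
    before = per_sprint_avg(metric, ai_in_use=False)
    after = per_sprint_avg(metric, ai_in_use=True)
    print(f"{metric}: {before:.1f}/sprint before AI tools, {after:.1f}/sprint after")
```

If the ‘after’ numbers climb while the AI-usage dashboard looks great, that’s exactly the disconnect the Copilot anecdote above captured.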
Another strategy is to follow ChargeLab’s example. Also mentioned in the LeadDev article, the company takes a more dev-focused approach to AI: its developers choose their AI tools freely, and the company has measured a 40% productivity increase. That increase wasn’t driven by mandates, but by empowering developers to set context-specific, meaningful metrics themselves and giving them a choice of tools.
Another LeadDev article also suggested that AI adoption shouldn’t be narrowly focused on code generation, since productivity gains can equally be had in other parts of the software development lifecycle, like code reviews, refactoring, testing, and documentation. Metrics around how much of your codebase is AI-generated ignore the potential savings from those areas.
Indeed, the DORA report on the Impact of AI in Software Development outlined five strategies for ensuring AI actually helps with productivity gains. The first strategy? Use AI at all stages of the development cycle, not just for code generation. The use of AI throughout the entire development cycle is becoming so common that we flagged it as the main development trend we expect to see in 2025 in a post last month.
Creating useful AI adoption metrics: best practices
Effective AI adoption metrics should come directly from engineering teams themselves, not from executives disconnected from the realities of the company’s codebase.
Metrics should:
- Be developed in collaboration with engineers who understand day-to-day workflows.
- Align with real productivity and business outcomes, not superficial adoption targets.
- Encourage flexible, context-aware experimentation rather than rigid enforcement.
ChargeLab's strategy, for example, involved setting a broad organizational goal (e.g., saving $1 million annually in dev time by using AI) but giving teams freedom in how to achieve it.
That approach balances clear direction with developer empowerment, focusing on measurable, meaningful outcomes instead of simplistic quotas narrowly tied to the use of one universal tool.
Such a goal lets developers write code manually when that makes more sense, and save time instead at the code review or QA/testing phase by deploying AI tools there.
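To give a feel for how a broad goal like that breaks down, here’s some back-of-the-envelope arithmetic. The headcount, hourly cost, and working weeks are hypothetical placeholders, not ChargeLab’s actual figures.

```python
# Back-of-the-envelope: what a $1M/year dev-time savings goal might mean
# per developer. All inputs are hypothetical placeholders.
annual_goal_usd = 1_000_000
developers = 80              # hypothetical engineering headcount
loaded_cost_per_hour = 120   # hypothetical fully loaded cost of a dev hour (USD)
working_weeks = 46           # hypothetical working weeks per year

hours_to_save = annual_goal_usd / loaded_cost_per_hour
hours_per_dev_per_week = hours_to_save / developers / working_weeks
print(f"~{hours_to_save:,.0f} dev hours/year, or ~{hours_per_dev_per_week:.1f} hours per dev per week")
```

Framed that way, a couple of hours saved per developer per week – anywhere in the lifecycle – gets you there, which is why the choice of tools and stages can be left to the teams themselves.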
Should we track AI-generated code percentages at all?
Ultimately, "percentage of AI-generated code" as a standalone metric has limited value. It’s too simplistic, incentivizes the wrong behaviors, and risks causing developer frustration.
Instead, engineering leaders and developers should focus on metrics tied explicitly to productivity, code quality, and developer satisfaction. These nuanced, outcome-oriented metrics provide true insight into AI’s impact far beyond what a simplistic “percentage of your codebase” metric could ever convey.
Want to try an AI tool that will help you ship better code faster? Start a 14-day CodeRabbit trial today!