Stop Trusting Code Coverage: Mutation Testing with Stryker Will Change How You Write Unit Tests


Unit test coverage is like flossing. You say you do it, but deep down, we know you’re not doing it well enough (not me though, I'm perfect). Allow me to introduce you to mutation testing, the slightly evil cousin of code coverage that intentionally breaks your code to see if your test suite even notices. In this post, I’ll walk through my first experience using Stryker Mutator… the tool that gleefully mutates your production code and then judges your test suite for sport.
Stryker Mutation Testing works by making small changes (called mutants) to your production source code and then rerunning your test suite to see if the unit tests catch the changes. If a test fails, the mutant is “killed”, which indicates that the test suite is effectively validating the behavior of your production code. If the mutant survives, it means the tests did not detect the change, telling you there are gaps in your test coverage and overall test quality.
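To make that concrete, here’s a minimal sketch of my own (AgeRules is a hypothetical class, not from my repository): a mutant can be as small as a single flipped operator.

```csharp
public static class AgeRules
{
    // Original logic. A typical mutant Stryker could generate here is
    // the boundary flip: "age > 17" becomes "age >= 17".
    public static bool IsAdult(int age) => age > 17;
}

// A test that only checks far from the boundary, e.g.
// Assert.True(AgeRules.IsAdult(30)), passes against both the original
// and the mutant, so the mutant survives. Boundary tests kill it:
//   Assert.False(AgeRules.IsAdult(17));
//   Assert.True(AgeRules.IsAdult(18));
```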
The First Report
I wrote a project with some basic unit tests, a mixture of good tests and not-so-useful ones. I have an entire repository around this, but here’s the commit that I used for the first test. Following the Getting Started instructions provided by Stryker, I installed Stryker globally, which let me execute the commands via CLI. I ran dotnet new tool-manifest because that’s what the instructions said to do, then executed Stryker against the testing project by running dotnet stryker with the unit test project as the working directory. That generated a new directory at the root of the repository called StrykerOutput. Opening the HTML report gave me this summary page:
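For reference, the whole setup boils down to a few commands (taken from Stryker’s Getting Started docs; package names and output paths may vary by Stryker release):

```shell
# Run these from the unit test project's directory.
dotnet new tool-manifest              # creates .config/dotnet-tools.json
dotnet tool install dotnet-stryker    # installs Stryker as a local tool
dotnet stryker                        # mutates, tests, and writes the report

# The HTML report lands under StrykerOutput/ at the repository root.
```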
That was easy. Right away, I liked what I saw. This is a clear summary broken down by file. Since my project only had one class, all the statistics naturally bubbled up to the top level. That said, I wasn't immediately sure how it came up with certain numbers in the "Killed" and "Survived" columns.
Digging into SomeBusinessClass.cs, I was greeted with a more detailed view:
This view showed some insightful metrics. I knew there were bad unit tests in the mix, and even though I had around 75% code coverage, it was clear that metric wasn’t telling the whole story. Stryker breaks things down into several categories:
Category | Definition |
--- | --- |
Killed | At least one test failed when the code was mutated (this is what we strive for). |
Survived | The mutation passed all unit tests, which likely means a missing or incomplete test. |
Timeout | Tests took too long, possibly due to an infinite loop. |
No Coverage | The mutated code wasn’t hit by any test at all. |
Ignored | Mutants that were explicitly ignored. |
Runtime Errors | Mutants that caused exceptions (e.g., out-of-memory). Interesting to compare with Timeout. |
Compile Errors | Mutants that didn’t even compile. |
Detected | Any mutant that was caught (i.e., Killed, Timeout, etc.). |
Undetected | Mutants that snuck through because they weren’t covered or weren’t asserted on correctly. |
Total | All mutants, minus runtime and compile errors. |
These definitions are provided directly in the Stryker report. At first, I wasn’t sure how to locate each mutant or understand exactly what led them to survive. Still, it clearly reported back that my tests needed work. That said, one of my early frustrations was that the report didn’t clearly explain why a mutant survived or what the change was.
Looking further into the test coverage, I found lines that weren’t hit, and sure enough, Stryker highlighted them too. Stryker also highlighted vulnerabilities in your logic based on the mutations:
So I wrote a test to cover that line, but just that line, not much else. This actually increased the number of surviving mutants. That’s Stryker’s way of telling you that you wrote a low-quality test. While the unit test technically passed, it wasn’t comprehensive: I didn’t verify that SomeCallAsync("no") was invoked, only the result. Once I added an assertion to verify the method call, my “Killed” numbers jumped up and the mutation score improved dramatically.
One thing I did notice is that even with this small project, the run took about 15 seconds. That made me wonder how long it would take on a larger, enterprise-level codebase. I also started to wonder whether Stryker could distinguish between different types of tests, like focusing only on unit tests and skipping integration ones. Then I remembered that the tool is scoped to whichever test project you run it from, so as long as your integration tests live in a different project, you’re good to go.
As I spent more time in the report view, things started to click. Stryker generates a report of your code base and adds small color-coded dots to your source code, showing which lines were tested, mutated, or missed. At the top, toggles let you filter different mutation types, like Killed or Survived, which made the insights easier to parse:
Looking at the surviving mutants (your problem children), I received clues that were especially helpful. For example, it flagged that I didn’t test the right side of a null-coalescing operator (a common one to miss). It also showed a mutation related to an arithmetic operation that slipped past my tests. Another survivor was a call to a dependency where I had mocked the input too generically.
This showed me that Stryker doesn’t just mutate logic; it also checks whether calls to dependencies are verified. In my original tests, I used Moq with It.IsAny<string>(), and while the test passed, it wasn’t precise. Stryker picked up on that, which pushed me to write more explicit tests.
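Here’s a sketch of the difference (xUnit and Moq assumed; the interface and class names are placeholders that mirror the post, not the actual repository code):

```csharp
using System.Threading.Tasks;
using Moq;
using Xunit;

// Hypothetical dependency and business class mirroring the post.
public interface ISomeDependency
{
    Task<string> SomeCallAsync(string input);
}

public class SomeBusinessClass
{
    private readonly ISomeDependency _dependency;

    public SomeBusinessClass(ISomeDependency dependency) => _dependency = dependency;

    public Task<string> DoWorkAsync() => _dependency.SomeCallAsync("no");
}

public class SomeBusinessClassTests
{
    [Fact]
    public async Task DoWorkAsync_CallsDependencyWithExactArgument()
    {
        var mock = new Mock<ISomeDependency>();
        mock.Setup(m => m.SomeCallAsync(It.IsAny<string>()))
            .ReturnsAsync("ok");

        var sut = new SomeBusinessClass(mock.Object);
        await sut.DoWorkAsync();

        // Loose verification -- a mutant that changes the "no" argument
        // still passes:
        //   mock.Verify(m => m.SomeCallAsync(It.IsAny<string>()), Times.Once);

        // Precise verification -- that same mutant now fails the test:
        mock.Verify(m => m.SomeCallAsync("no"), Times.Once);
    }
}
```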
Interestingly, I even wrote a failing unit test on purpose, and Stryker didn’t flag it. Stryker isn’t about test results; it’s about test effectiveness. You can have passing tests that don’t catch regressions, and Stryker helps uncover exactly that.
Overall, I’m really impressed with Stryker so far. It was easy to set up (just make sure you have nuget.org as a package source because for some reason it was removed from one of my machines), the CLI experience was smooth, and the reports are packed with actionable insights. It helps shine a light on the weaker parts of your test suite, especially when paired with something like dotCover. Sure, it takes a little getting used to (killed = good, survived = bad), and the feedback isn’t always crystal clear, but it does its job well.
There’s also functionality to ignore specific files or methods, which is handy when you’ve inherited code or need to set team-wide mutation coverage standards. Stryker intentionally skips mutating constants, which I’m totally fine with.
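I haven’t used those options yet, but per the Stryker.NET configuration docs it looks something like this (a stryker-config.json next to the test project; option names and defaults may differ between versions, so treat this as a sketch):

```json
{
  "stryker-config": {
    "mutate": [
      "**/*.cs",
      "!**/Legacy/**/*.cs"
    ],
    "ignore-methods": ["ToString", "*Logger*"],
    "thresholds": { "high": 80, "low": 60, "break": 50 }
  }
}
```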
As I write this post and continue exploring the Stryker documentation, I find myself asking more and more questions about how to get the most out of mutation testing. There’s a lot to unpack, and I’m still deep in the weeds of experimenting, including kicking off a run against one of my enterprise-level solutions to see how it performs at scale. Here’s what’s on my mind:
Goodhart’s Law - Business, business, business. Numbers.
Every time I hear someone talk about unit test coverage percentages, I instinctively cringe a little. Yes, coverage matters but almost no one talks about test quality.
This is where Goodhart’s Law comes into play:
"When a measure becomes a target, it ceases to be a good measure."
- Marilyn Strathern
If you tell engineers they need to hit 80% code coverage, they’ll hit it. But will they actually test meaningful logic? Probably not. You’ll get quick assertions like “make sure the return value isn’t null,” but that doesn’t truly verify behavior. Mutation testing helps raise the bar by holding our tests accountable to catch real issues.
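As a sketch (PriceCalculator is hypothetical, xUnit assumed), here’s the kind of test that satisfies a coverage target but barely constrains behavior, next to one that actually kills mutants:

```csharp
using Xunit;

public static class PriceCalculator
{
    public static decimal ApplyDiscount(decimal price, decimal percent)
        => price - (price * percent / 100m);
}

public class PriceCalculatorTests
{
    // 100% line coverage, near-zero value: a mutant that flips
    // "-" to "+" (or "*" to "/") still returns a positive number.
    [Fact]
    public void ApplyDiscount_ReturnsSomething()
    {
        var result = PriceCalculator.ApplyDiscount(100m, 10m);
        Assert.True(result > 0); // survives most arithmetic mutants
    }

    // Same coverage, far more value: asserts the exact expected result,
    // so the arithmetic mutants above all get killed.
    [Fact]
    public void ApplyDiscount_TakesTenPercentOff()
    {
        Assert.Equal(90m, PriceCalculator.ApplyDiscount(100m, 10m));
    }
}
```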
Execution Time: A Reality Check
That enterprise-level test run I mentioned? I started it over two hours ago and it’s still going. Stryker analyzes the project, generates mutations, compiles each mutant, and runs the full test suite for every one. It then bundles all that into a neat little report.
If each test takes 10 seconds and you’ve got hundreds of mutants, and thousands of tests, the total runtime starts to feel... exponential. It’s clear this isn’t something I’d want in the CI pipeline, at least not without some serious optimization. That’s an area I still need to explore.
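Some back-of-the-envelope math (illustrative numbers only, not measurements from my run) shows why the naive run-everything-per-mutant model hurts:

```csharp
using System;

// Worst-case model: every mutant triggers a full test-suite run.
int mutants = 300;            // mutants Stryker generated (made up)
double suiteSeconds = 12;     // one full test-suite run (made up)

double totalHours = mutants * suiteSeconds / 3600.0;
Console.WriteLine($"Estimated mutation run: {totalHours:F1} hours"); // 1.0 hours
```

Stryker does have coverage-analysis options that limit which tests run per mutant, so real numbers should beat this worst case, but the scaling concern stands.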
Where and When Do I Do This?
I’m a huge believer in automating what I can and delivering feedback to developers early and often. But now that I’ve seen the time and resources this takes, I’m asking: When should mutation testing actually run?
It might make sense as part of a regular quality audit, or in pre-release cycles, rather than per-commit. Still figuring that out.
Mutant Schemata
Earlier, I mentioned the different mutant categories, but it left me wondering… how can a mutant cause a runtime exception?
The answer hit me once I read up on Mutant Schemata. Stryker compiles all the mutants into your production code at once and switches them on one at a time via an environment variable during test runs. That means something like a string concatenation mutation could cause a runtime error:
```csharp
if (Environment.GetEnvironmentVariable("ActiveMutation") == "1")
{
    return "hello " - "world"; // mutated code
}
else
{
    return "hello " + "world"; // original code
}
```
It’s a great example, and it shows how deep the tool goes. There are ways to tune this behavior, but I haven’t gotten that far yet.
As I dig deeper into mutation testing, I’m excited to learn more. What shows up in the reports? How are other teams using it, and where are the real pain points? If you’ve used Stryker, another mutation tool, or are thinking about it, I’d love to hear your thoughts.
Ask me questions. Challenge assumptions. Give me info!