The rule of three
Every developer is a software architect.
Every time we write code, we follow design patterns, choose between them, take software principles into consideration, and weigh pros and cons. We aim for cleaner, more maintainable code and want to be good boy/girl scouts. We do this all the time, every day, every hour.
Software architecture principles and concepts are second nature to us. They are of course not set in stone, but they give us solid ground to base our decisions on.
One of these principles, which we apply continuously and probably without even thinking about it, is the DRY principle. It's almost omnipresent. But like every principle, it has its pitfalls, and when misused it can do more harm than good. Let me explain.
Reusing as a best practice
One of the things we do daily is avoid repeating what has been done before (the so-called Don’t Repeat Yourself, or DRY, principle): always maintain one source of truth. This is essential to:
improve maintainability (or to keep the level of maintainability stable):
The more duplicated code there is, the harder it is to read, comprehend, refactor, and fix. Fixes will target one copy and miss the other. When the time comes to refactor, we will have to deal with clones that were branched years ago, with tens of references to each and a change history that is alien to us. We have all been there.
follow the existing patterns, guidelines, and styles:
There are cases where we know we could have done it differently or better, but reusing the existing code also helps unify the programming practices in the code base we are working on. How many times have you encountered two test frameworks being used side by side? Or REST APIs alongside GraphQL? Sometimes this is the outcome of an ongoing migration or refactoring. In all other cases, leaning towards reuse helps keep the architecture in shape.
reduce the chance of introducing new issues:
Existing code is tested code. It has endured many tests in the past. It was probably not always as good as it is now: its issues were fixed and its implementation was improved. It was hardened over time by testing and feedback.
On the other hand, we need to accept that our own fresh version of the code will start from scratch and might have to pass through the same stages to reach similar maturity.
Branching everywhere
Take a data access layer. It’s a good candidate to always reuse, right? Data access layers follow specific, well-known usage patterns, and the previous developers on the project have most probably already covered everything you need to access your database: authentication, authorization, caching, performance, data integrity, transaction management, you name it.
Yet in a past project I worked on (where the app was already some 15 years old), there were four(!) different data access layers accessing the very same database. Needless to say, this repeatedly produced bugs that were not easy to trace or fix.
Once we started the refactoring, we immediately recognized the reason behind the branching. It is a familiar story for many of you: someone decides that the existing code doesn't provide what she needs, doesn't want to impact anything else, branches or rewrites her own version, and adds the missing piece on top. Give it five years, and half of the code base uses the old version and half the new one.
Gray zones
On the other hand, the decision is not always as clear as in the example above. Even so, many developers have an unconditional tendency to abstract and reuse. The emphasis is on the word "unconditional": no matter what, duplication must be avoided.
More often than not, however, deciding whether to reuse becomes a gray zone. How do I recognize a gray zone, and how do I define my approach? Here is the question I ask myself to assess whether abstracting and reusing is the better approach:
How sure are we that these code pieces will continue to share the same characteristics?
Will they continue to have similar requirements, or will they deviate from each other? Is there already a plan for them? Are we in the middle of a long-running refactoring? If you are not sure, or if you know that the duplicated blocks are already drifting apart, insisting on reuse might leave you with a more complex implementation instead of helping you.
The message you give
Before going any further, let me give an example to make this problem more concrete. In a past project, I submitted a code block similar to the one below for code review (simplified here for demonstration purposes):
Point Next(Point p) {
    var x = p.x * 2;
    var y = p.y * 2;
    return new Point(x, y);
}
As you can guess, my reviewer told me to extract the two lines (with the arithmetic operations) to a separate function. Because, of course, I was repeating myself and this should not be allowed.
To explain my perspective in detail, let's refactor the code above as follows:
Point Next2(Point p) {
    var x = getNextCoordinate(p.x);
    var y = getNextCoordinate(p.y);
    return new Point(x, y);
}

int getNextCoordinate(int coord) {
    return coord * 2;
}
Now, the code repetition is avoided, just as my reviewer once asked.
Now imagine that you implement Next (or Next2) and leave the team after some time. A future developer comes along and looks at your code. Do you recognize the different messages your code gives this new developer? The Next method says: "The next coordinates are calculated by multiplying the current coordinates by two, but the calculations are independent. I assert no opinions." The Next2 method says instead: "The coordinates are intended to be multiplied by the same multiplier."
Which message did you have in mind in the first place?
The DRY principle (just like any other design choice) is not without impact. As with any other code change you make, remember that you imprint your messages on your code and, by doing so, shape its direction.
And this is exactly why and how every developer is an architect.
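To make the difference between the two messages tangible, here is a minimal sketch of what could happen when a hypothetical new requirement asks for y to be tripled instead of doubled. The Point record, the Stepper class, and the factor parameter are my assumptions for illustration only (the record syntax needs C# 9 or later); they are not taken from the original project.

```csharp
// Hypothetical Point type; the article does not show its definition.
public record Point(int X, int Y);

public static class Stepper
{
    // Inline version: the new requirement touches exactly one line.
    public static Point Next(Point p)
    {
        var x = p.X * 2;
        var y = p.Y * 3; // was: p.Y * 2
        return new Point(x, y);
    }

    // Extracted version: the helper has to grow a parameter, and the
    // original message ("same multiplier for both") no longer holds.
    public static Point Next2(Point p)
    {
        var x = GetNextCoordinate(p.X, 2);
        var y = GetNextCoordinate(p.Y, 3);
        return new Point(x, y);
    }

    // Hypothetical helper; once it takes a factor it is little more than "*".
    private static int GetNextCoordinate(int coord, int factor) => coord * factor;
}
```

Both versions return the same points, but the extracted helper has turned into a thin wrapper around multiplication. The abstraction claimed a relationship between x and y that the requirements did not guarantee.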
The Rule of Three
In a reuse scenario, if I find myself unsure about the characteristics of the duplicated code blocks and their references, here is what I do: I simply wait.
I wait for the code to mature further. When I later come back to the same code block, there are two possibilities: either the clones are still there, untouched, or they have changed. If they have already deviated from each other, good for me: I saved the effort, I avoided unnecessary coupling, I avoided weirdly long variable names, I avoided asserting opinions in the code, and maybe I avoided some cyclomatic complexity. If, on the other hand, they have stayed the same, then fine, I (or someone else) can do the same assessment again.
There is a certain metric I find useful in this assessment: I look for further similar requirements that make use of the same code block, or a variation of it. Each further requirement clarifies how this code block should actually work. So, to put it simply, I wait for a third reference point before extracting or refactoring the cloned code. This is the Rule of Three.
The idea behind the Rule of Three is simple: before deciding to abstract and reuse, make sure that there is a sufficient number of duplicates or references to give a good idea of what the best abstraction looks like.
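As a rough sketch of the moment the third reference point arrives, consider the following. Every name here (Labels, UserLabel, OrderLabel, ProductLabel, FormatLabel) is hypothetical and chosen only for illustration. The sketch shows the state after the extraction, with comments describing how it got there.

```csharp
public static class Labels
{
    // First and second reference: these were once two inline copies of the
    // same expression, left alone while only two samples existed.
    public static string UserLabel(string name, int id) => FormatLabel(name, id);

    public static string OrderLabel(string title, int id) => FormatLabel(title, id);

    // Third reference: it needed the same shape again, which confirmed what
    // is stable (trim, then append the id) and what varies (text and id).
    public static string ProductLabel(string name, int id) => FormatLabel(name, id);

    // Extracted only after the third reference point appeared.
    private static string FormatLabel(string text, int id) => $"{text.Trim()} (#{id})";
}
```

With only the first two methods, it would have been a guess whether trimming and the id suffix belong to a shared concept or just happen to coincide. The third caller is what turns the guess into evidence.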
Going down the rabbit hole
Extracted and reused code blocks sometimes get more complex over time, because:
A reused code block joins the fates of previously independent pieces of code and carries their combined burden.
If I refactor, abstract, and reuse, will it simplify or complicate my code? This depends on whether the characteristics of the references remain similar in the future. Simply waiting, or applying the Rule of Three, can be a good strategy to identify these cases.
In the data access layer example above, the outlook seems clear. That big legacy application was not likely to change its data access characteristics in the foreseeable future.
But there are also other, more obscure cases. Consider an application that specializes in a particular task. It embodies considerable domain knowledge. At some point, the product owners decide to branch from this application to create a sister application that targets a similar, yet different task.
It can be a medical application for ankle surgeries, where the new application will target knee surgeries. It can be an e-learning application for universities, where the new application will target high schools. It can be a finance application for the US market, where the owners now want to target the EU market.
How often do you encounter code blocks like this:
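double calculate(
    double x,
    int factor,
    bool isEUMarket,
    bool isGermany,
    bool isMultiply,
    int isSpecialCaseMultiplicationFactor = 1) {
    // x is a double, so the result has to be a double as well.
    double result = isMultiply ? x * factor : x / factor;
    if (isEUMarket) {
        // EU-only rule, with a Germany-specific exception.
        if (isMultiply && !isGermany) {
            result *= 2;
        }
    }
    else if (isMultiply) {
        // Non-EU rule: apply the special-case factor.
        result *= isSpecialCaseMultiplicationFactor;
    }
    return result;
}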
This is of course a hypothetical example and might not make much sense in isolation, but it should be sufficient to make my point. It starts with a good intention to reuse, and over time the requirements deviate from each other. This puts pressure on the shared code. More and more specific business logic and specific variables are introduced. The code block becomes less readable, less logical, and less maintainable. The complexity increases.
The example above should actually be two separate methods, used in two different contexts, probably with nothing in common. The immature abstraction (and the message the developers gave with it) led down a path where the benefit is lost and only the complexity remains.
And the crucial thing is, this path goes only in one direction. In time, the refactoring cost and the impact risk will only increase. Countless times have I found myself breaking down these anomalies, applying better suited design patterns, reducing complexity and removing unnecessary couplings.
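For comparison, here is a minimal sketch of how the hypothetical calculate method above could look when split by market. The class names EuCalculator and NonEuCalculator and the shortened parameter name specialCaseFactor are my own, and the sketch assumes that calculate is meant to return its result. It is one possible split, not the definitive one.

```csharp
public static class EuCalculator
{
    // EU rule from the example: multiplied results are doubled,
    // except for Germany.
    public static double Calculate(double x, int factor, bool isGermany, bool isMultiply)
    {
        if (!isMultiply)
        {
            return x / factor;
        }
        var result = x * factor;
        return isGermany ? result : result * 2;
    }
}

public static class NonEuCalculator
{
    // Non-EU rule from the example: multiplied results are scaled
    // by the special-case factor (1 by default).
    public static double Calculate(double x, int factor, bool isMultiply, int specialCaseFactor = 1)
    {
        return isMultiply ? x * factor * specialCaseFactor : x / factor;
    }
}
```

Both methods repeat the multiply-or-divide step. That duplication is deliberate: each method now takes only the parameters its own market cares about, and a change to the EU rules can no longer break non-EU callers.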
You can (sometimes) repeat yourself
As Sandi Metz beautifully states in her seminal blog post:
"Duplication is far cheaper than the wrong abstraction."
The cost of a wrong, immature abstraction can trap you in a "sunk cost fallacy" that is not always easy to escape.
Here is the checklist behind my decision process for the DRY principle:
Do the requirements (of the to-be-reused candidates) share the same characteristics?
Are the dependencies we're about to create logical?
Are we sure that we assert no opinions that we do not intend?
Is the code mature enough?
Are there sufficient reference points?
If not, or if you are not sure about the future of your duplicated code, do not hesitate to leave it as it is. Make use of static code analysis tools. Add specific comments to track the duplicated code. Waiting for the code to mature, or waiting for further references, can help you keep your code simple and lean.
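As a sketch of what such a tracking comment could look like: the DUPLICATE-OF marker below is a convention I made up for illustration, not something a particular tool understands out of the box, and the ShippingRules and BillingRules classes are hypothetical.

```csharp
public static class ShippingRules
{
    // DUPLICATE-OF: BillingRules.IsDomestic (kept separate on purpose;
    // revisit when a third usage appears).
    public static bool IsDomestic(string countryCode) => countryCode == "DE";
}

public static class BillingRules
{
    // DUPLICATE-OF: ShippingRules.IsDomestic (kept separate on purpose;
    // revisit when a third usage appears).
    public static bool IsDomestic(string countryCode) => countryCode == "DE";
}
```

A plain text search for the marker then lists every deliberate clone, so the next assessment does not depend on anyone's memory.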
You can sometimes repeat yourself, and it’s OK.
Written by
Kemal Taskin
Developer, architect, Ph.D.