On Premature Abstraction

Intro

When evaluating code similarity, it's important to think twice before making hasty decisions. Functions or classes may appear identical at first glance, but they often serve distinct purposes. These purposes are likely to evolve independently over time.

Rather than focusing solely on reducing code length or eliminating duplication, ask yourself these three questions to help decide whether an abstraction is premature:

Are you sure that duplicated code serves the same purpose?

Or might it diverge? Imagine you’re working on a new project where you’re tasked with creating a web app for a pet hotel. You need to create a component which displays details of a single animal checked into the hotel. Currently, the customer has mentioned they only take care of dogs, but they might change their mind and include more animals in the future.

For a dog, let’s say you need to track its name, breed, and walkDuration for the day.

Most of us would begin with something like this:

interface Dog {
  name: string;
  breed: string;
  walkDuration: number;
}

const getDogDetails = (dog: Dog) => {
  return `${dog.name}, ${dog.breed}`;
}

const DogCard = ({ dog }: { dog: Dog }) => (
  <div>
    <h2>Dog Details</h2>
    <p>{getDogDetails(dog)}</p>
  </div>
);

So far, everything is going well, and no changes are needed for a couple of months. However, the customer returns and requests that the web app should now also support adding cats, as there are many cat owners in the area, and the manager has decided to accommodate them in the hotel.

You need to add a new type of animal to your app. Having heard in the meeting that more animals will be added in the future, you decide to be proactive and create a generic animal card component that accepts any type of animal as a prop.

interface Animal {
  name: string;
  type: 'dog' | 'cat';
  breed: string;
}

interface Dog extends Animal {
  type: 'dog';
  walkDuration: number;
}

interface Cat extends Animal {
  type: 'cat';
}

// Dog | Cat is a discriminated union, so checking `type` narrows to the matching interface.
// (With a plain `Animal` parameter, `animal.walkDuration` would not type-check.)
const getAnimalDetails = (animal: Dog | Cat) => {
  const details = `${animal.name}, ${animal.breed}`;
  if (animal.type === 'dog') {
    return `${details}, ${animal.walkDuration}`;
  }
  return details;
};

const AnimalCard = ({ animal }: { animal: Dog | Cat }) => {
  const title = animal.type === 'dog' ? 'Dog Details' : 'Cat Details';
  return (
    <div>
      <h2>{title}</h2>
      <p>{getAnimalDetails(animal)}</p>
    </div>
  );
};

Now we have a generic animal card component that can display details of any type of animal. Brilliant! Or is it?

Weeks go by, and you discover that cats actually require more care than dogs, so you need to track a new property: the last time the cat had its nails clipped.

Naturally, you search for your generic getAnimalDetails function and update it to include the new property.

interface Cat extends Animal {
  type: 'cat';
  nailsClippedAt: Date;
}

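// The parameter is the union Dog | Cat so that each `type` check narrows
// to the interface that actually declares the extra property.
const getAnimalDetails = (animal: Dog | Cat) => {
  const details = `${animal.name}, ${animal.breed}`;
  if (animal.type === 'dog') {
    return `${details}, ${animal.walkDuration}`;
  } else if (animal.type === 'cat') {
    return `${details}, ${animal.nailsClippedAt}`;
  }
  return details;
};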

Things have become a bit more complicated in the getAnimalDetails function. When you compare it with the original implementation of getDogDetails, the latter was much more straightforward.

const getDogDetails = (dog: Dog) => {
  return `${dog.name}, ${dog.breed}`;
}

Yes, we saved some code from being repeated, but at the cost of making the function harder to understand. This is just an example, but in real projects you can easily imagine similar cruft accumulating across a codebase of thousands of lines. Things can get out of hand very quickly once an entire team of developers is contributing.
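To make that accumulation concrete, here is a sketch of how the shared function might look after one more animal arrives. The Parrot type and its vocabularySize field are invented for illustration and are not part of the customer's requirements above. The ISO date formatting is likewise an assumption, made only to keep the output stable.

```typescript
interface BaseAnimal {
  name: string;
  breed: string;
}

interface Dog extends BaseAnimal {
  type: 'dog';
  walkDuration: number;
}

interface Cat extends BaseAnimal {
  type: 'cat';
  nailsClippedAt: Date;
}

// Assumption: Parrot and vocabularySize are hypothetical, added for this sketch only.
interface Parrot extends BaseAnimal {
  type: 'parrot';
  vocabularySize: number;
}

type AnyAnimal = Dog | Cat | Parrot;

// Every new animal type adds another branch to the one shared function.
const getAnimalDetails = (animal: AnyAnimal): string => {
  const details = `${animal.name}, ${animal.breed}`;
  if (animal.type === 'dog') {
    return `${details}, ${animal.walkDuration}`;
  }
  if (animal.type === 'cat') {
    // Assumption: format the date as YYYY-MM-DD for a stable string.
    return `${details}, ${animal.nailsClippedAt.toISOString().slice(0, 10)}`;
  }
  // Only Parrot remains in the union at this point.
  return `${details}, ${animal.vocabularySize}`;
};
```

Each branch is small on its own, but anyone reading the function now has to hold three unrelated sets of rules in mind at once.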

At this point, you can separate the logic for each animal type into its own function and see how much it simplifies the code.

const getDogDetails = (dog: Dog) => {
  return `${dog.name}, ${dog.breed}, ${dog.walkDuration}`;
}

const getCatDetails = (cat: Cat) => {
  return `${cat.name}, ${cat.breed}, ${cat.nailsClippedAt}`;
}

const DogCard = ({ dog }: { dog: Dog }) => (
  <div>
    <h2>Dog Details</h2>
    <p>{getDogDetails(dog)}</p>
  </div>
);

const CatCard = ({ cat }: { cat: Cat }) => (
  <div>
    <h2>Cat Details</h2>
    <p>{getCatDetails(cat)}</p>
  </div>
);

Now you can understand this code in a couple of seconds, and it costs only a few more lines in total. Was the abstraction worth it?
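One way to see the benefit of the split is to imagine a cat-only change. The sketch below assumes a hypothetical follow-up request to show the clipping date as YYYY-MM-DD. The sample animals are made up, and the interfaces are restated without React so the snippet stands alone.

```typescript
interface Dog {
  name: string;
  breed: string;
  walkDuration: number;
}

interface Cat {
  name: string;
  breed: string;
  nailsClippedAt: Date;
}

const getDogDetails = (dog: Dog): string =>
  `${dog.name}, ${dog.breed}, ${dog.walkDuration}`;

// Hypothetical requirement: show the clipping date as YYYY-MM-DD.
// Only the cat formatter changes; the dog formatter is untouched.
const getCatDetails = (cat: Cat): string => {
  const clippedOn = cat.nailsClippedAt.toISOString().slice(0, 10);
  return `${cat.name}, ${cat.breed}, ${clippedOn}`;
};

// Made-up sample data.
const rex: Dog = { name: 'Rex', breed: 'Beagle', walkDuration: 45 };
const misty: Cat = {
  name: 'Misty',
  breed: 'Siamese',
  nailsClippedAt: new Date('2024-03-01T00:00:00Z'),
};

console.log(getDogDetails(rex));   // Rex, Beagle, 45
console.log(getCatDetails(misty)); // Misty, Siamese, 2024-03-01
```

Because the change is confined to getCatDetails, there is no risk of accidentally altering how dogs are displayed.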

How many times is your code duplicated?

We all know that duplication is considered a bad practice in programming because it makes the code harder to maintain in the long run. However, some folks tend to focus on this issue way too early in the development process.

Instead of DRYing your code at all costs, ask yourself: Is it really such a big deal to duplicate this? Is it worth spending time right now on a clever abstraction that might turn out to be the wrong one? This approach is also called AHA (Avoid Hasty Abstractions) programming, and you can explore it in an awesome article by Kent C. Dodds.

An important thing to remember: getting rid of duplicated code is usually far cheaper than getting rid of a wrong abstraction. This is especially true in larger projects, where bad code design is already deeply entangled through thousands of lines of code.

There is a well-known rule of thumb popularized by Martin Fowler in his book Refactoring. It’s called the Rule of Three, and it’s very simple to follow. It states that:

Two instances of similar code do not require refactoring, but when similar code is used three times, it should be extracted into a new procedure.

This way, you give yourself a chance to accumulate more examples of duplication and perhaps notice slight variations among them. That additional information will most likely help you come up with a better abstraction and avoid frustration down the road.
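Here is a sketch of how that might play out in the pet hotel example, assuming a hypothetical third animal (Parrot, with an invented vocabularySize field). With three formatters side by side, it becomes clear that the only shared part is the name and breed prefix, so only that part gets extracted. The helper name formatBaseDetails is also made up for this illustration.

```typescript
interface BaseAnimal {
  name: string;
  breed: string;
}

interface Dog extends BaseAnimal {
  walkDuration: number;
}

interface Cat extends BaseAnimal {
  nailsClippedAt: Date;
}

// Assumption: a hypothetical third animal, not part of the original example.
interface Parrot extends BaseAnimal {
  vocabularySize: number;
}

// Extracted only after the third occurrence: the prefix every formatter shares.
const formatBaseDetails = (animal: BaseAnimal): string =>
  `${animal.name}, ${animal.breed}`;

// The animal-specific parts stay in their own small functions.
const getDogDetails = (dog: Dog): string =>
  `${formatBaseDetails(dog)}, ${dog.walkDuration}`;

const getCatDetails = (cat: Cat): string =>
  `${formatBaseDetails(cat)}, ${cat.nailsClippedAt.toISOString().slice(0, 10)}`;

const getParrotDetails = (parrot: Parrot): string =>
  `${formatBaseDetails(parrot)}, ${parrot.vocabularySize}`;
```

The extracted helper knows nothing about animal types, so adding a fourth animal would not require touching it.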

Is it likely this implementation might evolve soon?

Usually, I advise against thinking too far into the future when developing a piece of software. Making immediate decisions about abstraction based on assumptions about the future can be dangerous. Priorities change too often for that, and YAGNI (You Aren't Gonna Need It) applies.

But if you already know with fairly high certainty that the implementation will evolve in the upcoming weeks or months (e.g. you are still in the exploration phase of your product), then it’s probably better to wait and think about the abstraction once you have more context.

After the next development iteration, you will hopefully have a much clearer understanding of the whole picture. Waiting also costs you almost nothing: you have only postponed solving the duplication problem until you have more information to base the decision on. And this is completely fine.

Conclusion

So, what's the takeaway here? While cutting down on duplicate code is usually a good thing, we shouldn't rush into abstraction without thinking it through. Ask yourself: Is this code really doing the same thing? How many times have I copied it? Is it likely to change soon? Sometimes, it's actually better to have a bit of duplication than to create a messy abstraction too early. Remember the Rule of Three and the AHA (Avoid Hasty Abstractions) approach; they're pretty handy.

In the end, it's all about finding that sweet spot between reusable code and something you won't dread maintaining later on.