The 4 stages of Software Integration Grief


It’s a common decision point in software development. A typical dilemma. You are moving happily and blissfully along the rainbow road of steady development when business comes and snaps you out of it with a need that was due yesterday, if not last year.
Business didn’t think it was worth mentioning, since so many systems they use seem to have the same thing that it’s assumed to be a feature that comes for free. “Look! This web page has it!”, so one thinks there’s no effort behind it. And to give another excuse, this feature is not really central to the business. A nice-to-have that suddenly -according to the latest strategy analysis done between a bourbon-induced burp and a puff of the cigar (exaggeration done for entertainment purposes)- is what will give the company that missing competitive edge.
That’s how you’re given the task to build a chat that allows users to communicate with each other. Or add an AI agent to “help” -most likely piss off- users. Or a map that shows how to reach your destination to do some transaction. Or suddenly there’s a shopping cart in the system -when there was never a concept or product to sell-. Or you need to manage pictures. Or anything actually that might not be related to the core of your business.
So then you start to look at possible solutions and your journey begins:
Stage 1 - Excitement
“Oh, there’s no way we can develop this in time, but wait, haven’t I seen the same thing already developed somewhere? What was the name of the product? Yes, Zumba Chat. Oh, there are similar products. Let’s investigate them!”.
And the excitement builds as you see how you could potentially get some big development done by just using some external tool. “All of those products do what we need! They all say they are easy to use!”.
“Here it says we can use their cloud, but it looks expensive, and also our GDPR policy forbids us to use external clouds unless they comply with X, Y and Z. Installing this in house seems like a viable option. It’s the option explained here in the documentation”. You give it some thought, but you finally give in.
“That is so good, the development is already done, why should we re-invent the wheel if the product already exists! The setup is also simple. I just need to download the app, which is just a simple Docker image! Maybe I have to pay for some license, or I could stick to the free tier. But all in all I am saving a lot of development time! It will be just some needed integrations with the rest of the ecosystem, and probably I should pay some attention to the operational side, after all, we will have to maintain the infra.”
Stage 2 - Commitment and perseverance
The decision is taken. We are going to integrate with Zumba Chat!
It’s installation time now. “It was a single Docker image, right? That should be easy! All of our services are Dockerized, so it’s just like using the already existing pipeline our awesome devops guys did for us, but with this already built image! Easy peasy”.
“Of course, I am aware, it’s not just the Docker image, it’s a whole service, it needs a database. This is nothing out of the ordinary, I can just use our cloud database like the rest of the services!”. But then the versions aren’t exactly compatible. Or the cloud version is not the one that the app expects, so it works glitchily. Yet, you still manage to overcome the problems - though not before gaining level 10 experience in the quirks and intricacies of the database.
Oh, and the database isn’t the only thing the app needs. Zumba Chat also requires a cache like Redis, some queue mechanism, an SMTP server to send emails and a key to access your bank account. Nothing will deter you though! But being consistent with the spirit of saving time, you see that the tool offers a way to install all of those if you were to use a whole EC2 instance. Yes, it installs all of that infra on that machine, not only the app, but the database, the queue mechanism, Redis, MongoDB, and all the circus, but it’s the way they say it surely works.
So an exception in the otherwise repetitive boring ecosystem is born. A piece that is not like the others, it won’t benefit from any kind of central platform or maintenance the devops guys can give you since they can’t be bothered by such a different piece of infrastructure. But, for sure we’re still saving time.. right?
Well, yes, it has still taken less time than it would have if we had to develop the whole thing in-house. But of course it took double the time from the initial expectation, which only took into consideration the development bits needed to integrate the rest of the system with this piece and an easy deployment of the tool.
But so far so good. Here’s some piece of functioning software in production. Yay!
Stage 3 - Frustration
And we enter maintenance mode. At first there are no signs that you should worry. The decision was a good decision. The product was what you needed, the exact features. You needed something somewhat generic that wasn’t too related to the core of your business, so you couldn’t justify developing such a tangential feature yourself. Maybe the installation was difficult but it was successful.
But the first maintenance problems appear. This is a feature that uses a completely different set of technologies than the rest of the system, so some typical maintenance issues were overlooked. No problem, let’s add some custom alarms to make sure there’s a little bit of maintenance there.
Oh, here comes the first security issue. Let’s upgrade the tool. “I have to install it again. Well, nothing I didn’t manage to do before, right? Let’s just pray three ave-marias and deploy the latest. Phew, everything is still there and working”.
Then there’s a small bug. “How come these notifications aren’t being sent?”. Oh yes, the developers of Zumba are aware of that, they have that issue open. They will solve it in the next release. Another three ave-marias with the deploy and all good. Oh, the last deploy introduced another bug that breaks another feature, as it’s not really compatible with the version of the cloud database we use. Ouch, maybe we can roll back.. In the meantime just don’t use that feature…
These small accidents add up little by little, and you feel you have little control over what you can do.
And then the turning point, the unexpected. From business you receive a message: “Can we change the subject of this email notification to say something else instead?”. The doors to customization opened just a little bit there. Yet your professional answer is “let’s see what I can do”. And so you reverse engineer the product and create the chats with that specific subject, so the notifications are sent with that message.
But then business says “it would be nice if we could also do this instead of what’s happening now”, slamming the customization door wide open. “The deal, and I asked like several times, was that you would never ask what you are asking me right now! I even used the exact same scenario in my hypothetical example!”.
Still, your urge to please and the desire to keep your job keep you from starting any fight. And you are a problem solver. You spend time learning how it all works. Maybe there’s a way to use the product that will give the expected result. Maybe we can have a single user that sends all the chat messages. Or any other workaround or reverse engineering that will bring us there. But no luck. Then you go deeper, you start taking a look at the code. Eventually you find the piece of code that might solve what business is asking of you.
But this is not your code, you didn’t develop it, you are just the operations guy in this scenario. You might open an issue, or even an MR/PR to that project. But that’s all you can do. You don’t dare to make that change manually in the installation you have in place, as you don’t have the tests and other safety mechanisms the creators of the tool might have. It’s too risky, other things might break, not to mention that we were already in a weak position because of the security and maintenance issues above.
Stage 4 - Realization
And in the midst of the frustration of not being able to properly help, you realize something.
You already learned a lot by trying to reverse-engineer that tool so it would behave the way your business expected, a way the tool was never originally designed for. And later you learned the code, the innards. You’ve learned how this thing works, all the workflows, all the use cases, and you even see how you could have done some stuff better.
And you see that all of this time could have been spent developing the exact same tool in-house. You realize it wasn’t that difficult after all. There was just some learning of the domain. You could have done something better. And you would have had the power to change it later. You could have been, now, in a situation where changes of business needs on a whim could be solved, like the rest of the things you’ve developed so far, without giving it a second thought. And it would have taken less time than it took to reach the current state, where moving forward is not possible anymore.
And so you promise yourself that you will never fall for this again.
Dodging the bullet
That’s a grim experience that might or might not have happened to me, in a way that I might or might not have deliberately brought upon myself just to learn from it. And I would like to explain how to avoid it. But for that, I need to explain some concepts, from my point of view, of course.
Integration levels
While explaining the experience above, I have been focusing on integrating a full external system, which is what gives the most pain. But software integration happens at many levels, and not all of them are so painful. Actually, the example above is the only one that will bring you to a hair-pulling situation.
Here’s a broad graph to understand (from my point of view, of course), the different levels of integrating with external pieces of software:
First it’s important to notice that the Y axis contemplates the land of development NOT done in-house. Artifacts developed by others. Now we all use libraries in our projects, most languages come with a standard library that gives many features and it would be stupid of us to not take advantage of it. I consider the use of those libraries equal to development in-house actually.
The next level is libraries developed by others that are not part of a language’s standard library. Things that are somewhat unit testable, like encryption algorithms, are also no reason for concern. Likewise, you don’t want to reinvent the wheel by writing your own library to deal with HTTP requests. All of that, while technically an integration with a third party, can be considered just in-house development. There’s little justification for building your own web server if the available ones work just as well for you. That’s what I call “framework-libraries” there, though they could also be called “tool-libraries”.
Now, more concerning would be to actually integrate the whole framework and allow it to dictate how you develop. I have already written many articles about this and pasted the same diagrams over and over, and I will repeat it here again. The main reason clean (/onion/ports and adapters) architecture exists is to decouple technical concerns from business concerns, and coupling them is something frameworks are very good at: frameworks dictate how you have to design and persist data. The proper way is to not let frameworks dictate how you develop, but to use them to your advantage, taking only the pieces that you need. If you let a framework dictate how you develop, you will soon reach its limitations.
Up to here we’ve seen things that could be considered in-house development. In order to develop your features, you will make use of all of that. So these integrations are not really concerning (except relying fully on the framework).
Then we enter the land of things that are commonly understood as integrations, the ones that give you frustration. First there’s integration with a third party service. That one, if done properly, might not give you much frustration. Think about API calls to get some information, using patterns like ACL (anticorruption layer), facades and others to translate from the external domain to yours, so that your domain stays proper, isolated and not influenced by those external factors. This, after all, is exactly the same as integrating with other services that might be developed in-house, so you can try to use the same communication mechanisms you might have in place: webhooks, messaging (event driven), BFF and others.
And finally, at the worst level, there’s the integration with a full system. I am referring in this case to hosting in-house a full system developed by others. Something that you install on your premises. That has its own database and internal dynamics that can’t be edited or modified unless you install the next version. Something like this:
Entering the danger zone
Everything below the danger line falls into somewhat regular development. Integrating with a third party service is a quite common job. You might need to be somewhat extra careful, encapsulating the part that communicates with the external service in a place where it doesn’t leak to your main service’s domain, behind an anticorruption layer for example, but otherwise it’s like communicating or integrating with any other service.
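To make the encapsulation idea concrete, here is a minimal Python sketch of an anticorruption layer, not a definitive implementation. Everything in it is made up for illustration: the `ChatMessage` model, the `ChatGateway` port, the `ZumbaChatAdapter` class, and in particular the vendor payload fields (`usr`, `body`, `ts`), which are assumptions and not the API of any real product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Protocol


@dataclass(frozen=True)
class ChatMessage:
    """Our own domain model: only the fields our service cares about."""
    author_id: str
    text: str
    sent_at: datetime


class ChatGateway(Protocol):
    """The port our domain depends on, phrased in our own terms."""

    def recent_messages(self, room_id: str) -> list[ChatMessage]: ...


class ZumbaChatAdapter:
    """Anticorruption layer: the only place that knows the vendor's payload shape."""

    def __init__(self, fetch_raw: Callable[[str], list[dict]]) -> None:
        # fetch_raw is a hypothetical callable wrapping the vendor's HTTP API;
        # it returns a list of dicts in the vendor's own format.
        self._fetch_raw = fetch_raw

    def recent_messages(self, room_id: str) -> list[ChatMessage]:
        return [self._translate(raw) for raw in self._fetch_raw(room_id)]

    @staticmethod
    def _translate(raw: dict) -> ChatMessage:
        # Assumed vendor fields: "usr", "body", "ts" (epoch seconds, UTC).
        return ChatMessage(
            author_id=str(raw["usr"]),
            text=raw["body"],
            sent_at=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
        )


# Usage with a stand-in for the vendor call.
def fake_fetch(room_id: str) -> list[dict]:
    return [{"usr": 42, "body": "hello", "ts": 86400}]


adapter: ChatGateway = ZumbaChatAdapter(fake_fetch)
messages = adapter.recent_messages("lobby")
```

If the vendor renames a field or the tool gets swapped for another one, only `_translate` and the fetch function should need to change; the rest of the service keeps talking in terms of `ChatMessage`.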
You might manage to use the same communication patterns you use for in-house services. After all, it’s just a relationship between two bounded contexts, so the kinds of relationships explained in Implementing Domain-Driven Design apply (you know I like to back up my blog articles with quotes from my favorite books):
There are several DDD organizational and integration patterns, one of which commonly exists between any two Bounded Contexts. Each of the following definitions is largely quoted from Evans (the blue book)
[..]
Conformist: When two development teams have an upstream/downstream relationship in which the upstream team has no motivation to provide for the downstream team’s needs.
Anticorruption Layer: [..] As a downstream client, create an isolating layer to provide your system with functionality of the upstream system in terms of your own domain model. This layer talks to the other system through its existing interface, requiring little or no modification to the other system.[..]
Open Host Service: Define a protocol that gives access to your subsystem as a set of services. [..]
Published Language: the translations between the models of two Bounded Contexts requires a common language.
I’ve only listed the kinds that commonly match a typical integration with a third party system from our service. So a common setup for this kind of integration is something that looks like this:
So that’s one thing, but the other thing, above the danger line, is when you also need to take care of that external service: when you need to install it on your premises, configure the database, the infrastructure, etc., and even after all of that you have zero control over what this service does. It’s like a black box to you.
That situation is objectively the worst situation you can be in as a developer. I say objectively because I am basing this on a quote from Accelerate (another of my favorite books), which reached some conclusions from real data produced by their surveys.
We examined a large number of types of systems to discover if there was a correlation between the type of system and team performance. […]
We discovered that low performers were more likely to say that the software they were building-or the set of services they had to interact with-was custom software developed by another company (e.g., an outsourcing partner). Low performers were also more likely to be working on mainframe systems. Interestingly, having to integrate against mainframe systems was not significantly correlated with performance.
Now, they went on to say that the important point is not really the type of system: whatever your architecture, whatever your setup (monolith, legacy mainframe, etc.), the important thing is to focus on deployability and testability, which is what drives performance:
Although in most cases the type of system you are building is not important in terms of achieving high performance, two architectural characteristics are. Those who agreed with the following statements were more likely to be in the high-performing group:
We can do most of our testing without requiring an integrated environment
We can and do deploy or release our application independently of other applications/services it depends on
Maybe the last statement can be true, but it’s quite hard to agree with the first one when you have to maintain software developed by another company.
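As a rough illustration of what “testing without an integrated environment” can look like for code you own, here is a small Python sketch. The names (`NotificationSender`, `InMemorySender`, `notify_new_message`) and the email address are hypothetical. The point is only that the business rule runs against a test double instead of a deployed notification system, which is exactly the kind of seam you don’t get inside a black box developed by someone else.

```python
from typing import Protocol


class NotificationSender(Protocol):
    """Port owned by our service; a real adapter would call the external tool."""

    def send(self, recipient: str, subject: str) -> None: ...


class InMemorySender:
    """Test double: records notifications instead of reaching a real system."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send(self, recipient: str, subject: str) -> None:
        self.sent.append((recipient, subject))


def notify_new_message(sender: NotificationSender, recipient: str, author: str) -> None:
    """Business rule under test: we decide the subject line ourselves."""
    sender.send(recipient, f"New message from {author}")


# Exercise the rule with no database, queue or SMTP server running.
sender = InMemorySender()
notify_new_message(sender, "ana@example.com", "Bob")
```

Changing the subject line here is a one-line change covered by a fast test, compared with the reverse engineering described in the frustration stage.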
Informed decision
Now that we have more context, it’s time to review the decision. What should we take into account before jumping straight to the excitement phase?
In general you should be very wary of using external tools for anything. It’s not only that the state they will leave you in is very uncomfortable, it also kills any possible evolution, iteration, change, any kind of development there.
So the first thing you have to make sure of, absolutely sure, is that the feature will never ever need to evolve. A confirmation of that from business is insufficient. They will be the first ones to change their mind. Your judgement will work better. And for that you have to take into account some things about the problem at hand. Let’s review the graph:
The first thing to take into account is how far away from the core business this feature is. The further away, the less concerning it is to go with whatever solution there. Forks balanced on top of wooden toothpicks can become acceptable solutions in that case.
The problem here is that many fail to assess how far away from the core business something is. For example, one can think that user management is something generic. After all, all systems have to deal with users, right? So why not use a third party tool? Well, users are commonly a thing that requires customization specific to your system, so unless that third party tool allows for some way of customization, you’re in deep trouble.
Things that really are generic are already commonly solved by third party tools, and the integration is already a solved problem: logging (exporting logs and then using a tool like Kibana or Grafana for example), notifications (emails, or push notifications with Firebase or any other)… any technical “horizontal” concern (something that can be useful in all the services). And in those cases you really will need to go to the “full external system” kind of integration. All of those offer APIs, which helps to avoid the painful scenario I’ve described above.
So that leaves us with things that are not really generic. Which is basically your domain. Even if it’s not core domain. In that case, you should avoid at all costs integrating with a third party tool. Nobody will design the domain you want because they will not have your exact same needs. Some might try to make something generic out of it. Continuing with the example of users, they might add meaningless “attributes” and “values” to attach to the user. Still those attributes will not work the exact way you need. If somebody goes forward with that idea, they will eventually solve the world, making the universal app, that is useful for nobody. It will be the nth framework. Frameworks should just be used to solve technical concerns, not business concerns.
Once you have identified what kind of feature you’re dealing with, you can choose what kind of integration, if any, you’re willing to spend time on.
So.. shall we develop it in house?
My personal rule of thumb is basically the following.
For domain kinds of features (remember that many fall into that category even when it looks like they don’t. Even things that are actually technical concerns for one can be domain for others):
Close to core domain: definitely do NOT depend on external tools
Tangential - small (things that one can think are generic, like users): my experience so far is that, if you have a good development platform, developing a brand new small tangential domain normally takes the same time or less than just learning whatever third party tool you would like to integrate.
Tangential - big: if we’re speaking about more than 4 possible aggregate roots (close to 10) with complicated business rules - like accounting - then a third party tool is the way to go. Of course something that you can easily integrate with, always trying to avoid the “deploy the full tool in house” scenario
Far away from core. Like I said, whatever thing is OK. Ideally, you should NOT even be the one to maintain that tool. Probably your company uses a CRM, or ERP systems, even the landing page might be just some Wordpress that you install on a computer that’s balancing on a circus ball on top of a wooden board that is on top of an elephant trunk, who’s floating on top of a water mattress in an Olympic swimming pool.
For technical kind of features it’s all a balance of available resources and their expertise. There’s little excuse that justifies customizing stuff in this area (would you build your own web server, container image builder, or logger?). Only if there’s an ultra specific need you might do any of those things. Here, using a generic tool will save you lots of time.
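The rule of thumb can be sketched as a small Python function, purely as a way to see the branches side by side. This is one possible reading, not a formula: the `Kind` enum, the `recommend` function and the returned strings are invented for illustration, “big” is read as more than 4 aggregate roots combined with complicated business rules, and “build in-house” for small tangential domains is an inference from the remark that developing them tends to be as fast as learning a third party tool.

```python
from enum import Enum


class Kind(Enum):
    CORE_DOMAIN = "close to core domain"
    TANGENTIAL_DOMAIN = "tangential domain"
    FAR_FROM_CORE = "far away from core"
    TECHNICAL = "technical concern"


def recommend(kind: Kind, aggregate_roots: int = 0, complicated_rules: bool = False) -> str:
    """Return a build-or-integrate suggestion following the rule of thumb above."""
    if kind is Kind.CORE_DOMAIN:
        return "build in-house"
    if kind is Kind.TANGENTIAL_DOMAIN:
        # Assumption: "big" means more than 4 aggregate roots with complicated rules.
        if aggregate_roots > 4 and complicated_rules:
            return "integrate a third party service, avoid self-hosting"
        return "build in-house"
    if kind is Kind.FAR_FROM_CORE:
        return "use any external tool, ideally maintained by someone else"
    # Technical concerns: customize only for an ultra specific need.
    return "use a generic tool"


chat_for_a_bank = recommend(Kind.TANGENTIAL_DOMAIN, aggregate_roots=2)
accounting = recommend(Kind.TANGENTIAL_DOMAIN, aggregate_roots=8, complicated_rules=True)
```

The hard part remains the input, not the function: deciding which `Kind` a feature really is, since many things that look generic turn out to be domain.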
Written by Francesc Travesa
