Application Monitoring: Black-Box vs White-Box

Application monitoring, or just monitoring, is the practice of detecting when something is failing in your application. Failures include performance degradation, capacity shortfalls, security anomalies, and abnormal resource usage.
These failures can directly affect your users, or third parties that rely on you.
So, with monitoring, we want to ensure that our system functions according to its expected behavior.
In this spirit, there are two complementary ways to perform this monitoring: black-box and white-box.
Black-box monitoring
With black-box monitoring, we look at the system from the perspective of a user or, more broadly, of any external observer.
We don’t care about the internal implementation details; we are only interested in the observable outputs and behavior of the system.
Key characteristics
Focus on external behavior: As stated previously, the intention is to monitor the system from the point of view of an external observer, like a user. So we look at things like how fast we get a response, whether the services are available, and whether they behave as expected, meaning that they deliver the promised set of features. This might include performing the same actions a user would.
No internal access: Every metric and alert is based exclusively on external observations, without any need to see the internal implementation details.
Black-box monitoring has the advantage of being user-centric: it lets us identify and fix issues that arise in the user journeys. Simplicity is another big advantage, since we don’t have to instrument or access any of the internal systems.
This simplicity comes with drawbacks, though. Because we observe the system only from the outside, the data is not very granular, which makes it harder to find the root cause when something goes wrong.
Another disadvantage is that it only alerts us once problems are already impacting customers. This makes us more reactive, and less proactive, than we should be.
Now let’s have a look at this concept in practice. Assume that we are in the context of a streaming application. With a black-box approach, the only thing you care about is whether the video loads quickly on your screen. You don’t care how the streaming service has implemented it.
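To make this more concrete, here is a minimal sketch of a black-box probe in Python. It only does what a client would do: request a URL, wait for the first byte, and judge the result by status and elapsed time. The names probe and ProbeResult, the 2-second latency budget, and the injectable opener parameter are all assumptions made for illustration; they do not come from any particular monitoring tool.

```python
import time
import urllib.request
from dataclasses import dataclass


@dataclass
class ProbeResult:
    available: bool    # did the request succeed with a 2xx status?
    latency_s: float   # how long the caller had to wait, in seconds
    fast_enough: bool  # available AND within the latency budget


def probe(url, latency_budget_s=2.0, timeout_s=5.0,
          opener=urllib.request.urlopen):
    """Check a URL purely from the outside, the way a user would.

    `opener` is injectable so the probe can be exercised without a
    network; by default it is the standard library's urlopen.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout_s) as response:
            response.read(1)  # wait until the first byte arrives
            available = 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        available = False
    latency_s = time.monotonic() - start
    fast_enough = available and latency_s <= latency_budget_s
    return ProbeResult(available, latency_s, fast_enough)
```

Note that the probe can say that the video endpoint is slow or down, but nothing about why. That is exactly the limitation described above.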
So, in summary, black-box monitoring is used to tell you something is wrong and might be impacting others.
White-box monitoring
In this approach, you monitor your system in a transparent way: you can directly instrument your code or gather metrics from the infrastructure it runs on. Because you can see the inner workings of your system, you can be much more granular about what you use as a source of monitoring data.
Key characteristics
Focus on internal behavior: In contrast with black-box monitoring, here we are focused on the inner workings of the system. Collecting logs, metrics, and traces is our priority. The sources for such data range from infrastructure resources, like CPU and memory, to application-level data, like the number of HTTP requests processed.
Developer-centric: These metrics are often used by developers and the operations team to understand the health and performance of the system.
The white-box approach is more cumbersome than black-box monitoring: there is more to instrument, and it generates a much bigger volume of data. Still, it has some clear advantages.
First, it lets us go deeper in our analysis, because the monitoring data is much more granular.
Second, based on the data we collect, we can spot problems before they reach users. This makes the approach more proactive, even predictive, rather than reactive.
Examples of this approach include monitoring resources like CPU and RAM, or checking how your database queries perform. Tracing is another example, since it lets you follow the flow of a request inside your system.
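As a rough illustration of instrumenting your own code, the sketch below wraps a function with a decorator that counts calls and records how long each one took. Everything here is hypothetical: the instrumented decorator, the in-memory REQUEST_COUNT and REQUEST_SECONDS stores, and the get_video_metadata function stand in for whatever metrics library and handlers a real system would use.

```python
import time
from collections import defaultdict
from functools import wraps

# In-memory stand-ins for a real metrics backend (assumption for this sketch).
REQUEST_COUNT = defaultdict(int)     # name -> number of calls
REQUEST_SECONDS = defaultdict(list)  # name -> list of call durations


def instrumented(name):
    """Record a call count and a duration for every call of the function."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs whether the call returned or raised.
                REQUEST_COUNT[name] += 1
                REQUEST_SECONDS[name].append(time.perf_counter() - start)
        return wrapper
    return decorator


@instrumented("get_video_metadata")
def get_video_metadata(video_id):
    # Hypothetical handler; a real one might query a database here.
    return {"id": video_id, "title": "placeholder"}
```

Unlike an external probe, this only works because we can change the code itself, and the recorded durations point at the specific function that is slow.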
In summary, white-box monitoring is used to tell you why something is not ok and helps you to understand the root causes of such failures.
Which approach to use?
In my opinion, the discussion should not be about which one to use, but rather how we can create some symbiosis between the two.
These two approaches should be seen as complementary in the sense that, when used together, they give you a deeper understanding of the state and performance of your system.
As stated previously, one gives us the notion that something is not going right, and the other one lets us dig deeper and understand why that’s happening.
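One way to picture this combination is a small triage function: the black-box signal decides whether users are affected, and the white-box metrics suggest where to look. This is only a sketch under assumed inputs; the triage function, the metric names, and the thresholds are made up for illustration.

```python
def triage(user_facing_ok, internal_metrics, thresholds):
    """Combine a black-box signal with white-box metrics.

    user_facing_ok: result of an external check (black-box).
    internal_metrics: current internal readings, e.g. {"cpu_percent": 95}.
    thresholds: per-metric limits; metrics without a limit are ignored.
    Returns (status, suspects), where suspects are metrics over their limit.
    """
    suspects = sorted(
        name
        for name, value in internal_metrics.items()
        if name in thresholds and value > thresholds[name]
    )
    if not user_facing_ok:
        # Users are affected: black-box says "what", suspects hint at "why".
        return "incident", suspects
    if suspects:
        # Users are fine for now, but internals look unhealthy.
        return "warning", suspects
    return "healthy", []


limits = {"cpu_percent": 90, "db_query_ms": 200}
status, suspects = triage(False, {"cpu_percent": 95, "db_query_ms": 40}, limits)
# status is "incident" and suspects is ["cpu_percent"]
```

The "warning" case is where white-box data pays off: it flags a problem before the external check starts failing.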
Written by Rafael Câmara
