Software Architecture - The Hard Parts [Chapter 10] Distributed Data Access
Introduction
In this chapter we’re going to be diving into the different ways services can read data they do not own. In a monolithic system with a single database, developers don’t give a second thought to reading database tables; but when data is broken into separate databases owned by distinct services, read access to that data becomes complex.
Data Access Patterns
Below are some of the most commonly used patterns that let a service access data it doesn’t own.
Interservice Communication Pattern
This is by far the most common pattern for accessing data. If one service needs data it doesn’t have direct access to, it simply asks the owning service for it using some sort of remote access protocol.
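As a minimal sketch of the idea (the service names, endpoint URL, and payload shape here are assumptions, not from the book), a hypothetical Wishlist service could fetch an item description from a Catalog service over HTTP:

```python
# Minimal sketch: one service asking the owning service for data it doesn't own.
# Service names, URL, and response shape are hypothetical.
import requests

CATALOG_URL = "http://catalog-service/items"  # hypothetical endpoint

def get_item_description(item_id: str) -> str:
    # Synchronous remote call: simple, but adds network/security/data latency
    # and couples this service's availability to the Catalog service.
    response = requests.get(f"{CATALOG_URL}/{item_id}", timeout=2)
    response.raise_for_status()
    return response.json()["description"]
```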
| Pros | Cons |
| --- | --- |
| Simplicity | Slower performance due to latency, especially when a user’s business request depends on this style of communication. The latency comes mainly from network, security, and data latency. |
| No data volume issues (direct service calls) | Services are tightly coupled: one service relies on the other being available to fulfill its needs, so the absence of the service that owns the data directly impacts the calling service. They must also scale together to meet high demand. |
Column Schema Replication Pattern
Here, specific columns are replicated across tables, duplicating the data and making it available to other bounded contexts.
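A rough sketch of how the replicated column might be kept up to date (the table name, event shape, and the use of SQLite are assumptions made for illustration): the consuming service applies change events published by the owning service to its own copy of the column.

```python
# Minimal sketch: keeping a replicated column in sync from a change event
# published by the owning service. Table/event names are hypothetical.
import sqlite3

conn = sqlite3.connect("wishlist.db")
conn.execute("""CREATE TABLE IF NOT EXISTS wishlist_item (
    item_id TEXT PRIMARY KEY,
    item_description TEXT  -- column replicated from the owning service
)""")

def on_item_description_changed(event: dict) -> None:
    # Asynchronous consumer: the replica is eventually consistent, not immediate.
    conn.execute(
        "UPDATE wishlist_item SET item_description = ? WHERE item_id = ?",
        (event["description"], event["item_id"]),
    )
    conn.commit()
```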
| Pros | Cons |
| --- | --- |
| The service requiring read access has immediate access to the data, which improves performance, fault tolerance, and scalability. | Data synchronization and consistency: the replicated columns must be kept in sync, usually through asynchronous communication. |
| Very useful in some data aggregation and reporting scenarios. | Data ownership is very hard to govern: since the data is replicated into other services, they can update it even though they don’t officially own it, which can cause data consistency issues. |
Replicated Caching Pattern
While caching is a well-known pattern for increasing performance and responsiveness, it can also be an effective way to access and share data in a distributed system. By leveraging in-memory caching, data needed by other services is made available to each service without them having to ask for it.
There are different caching models between services. The basic one is single in-memory caching, where each service has its own cache separate from the others. This isn’t very useful for sharing data between services because there is no synchronization between the caches.
Another caching model is distributed caching, where data is held externally in a caching server; services make requests to that external server to retrieve or update the shared cache (see the sketch after the list below). However, it’s not that useful for data access for the following reasons:
- It’s no different from the interservice communication pattern (the service is still tightly coupled, just to a caching server instead of another service).
- Different services can update the data, breaking the bounded context around data ownership, which can cause inconsistencies between the cache and the owning database.
- Latency issues, since the cache is accessed through network calls as described earlier.
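To make that trade-off concrete, here is a minimal sketch of reading shared data from a distributed cache, assuming Redis as the external caching server and a hypothetical key scheme; note that the read still goes over the network, just like a service call.

```python
# Minimal sketch: reading shared data from an external distributed cache.
# Redis and the key naming scheme are assumptions for illustration.
import json
import redis

cache = redis.Redis(host="cache-server", port=6379)

def get_item_description(item_id: str) -> str | None:
    # Still a remote call over the network, plus the owning service's
    # bounded context is no longer enforced around this data.
    raw = cache.get(f"catalog:item:{item_id}")
    return json.loads(raw)["description"] if raw else None
```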
Another model is replicated caching, where each service holds its own in-memory copy of the data that is kept in sync across services, allowing the same data to be shared by multiple services. Any update made to the cache is asynchronously propagated to the caches in the other services.
Of the caching models mentioned, replicated caching is the most suitable for addressing distributed data access.
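A rough sketch of the idea, with the replication mechanism left abstract (in practice, products such as Hazelcast or Apache Ignite handle the replication; the function and message names below are hypothetical): reads hit the local in-memory replica, and updates broadcast by the owning service are applied asynchronously.

```python
# Minimal sketch of a replicated cache: reads are served from local memory,
# while updates from the owning service arrive asynchronously via some
# broker/replication mechanism (left abstract here; names are hypothetical).
local_cache: dict[str, str] = {}  # in-memory replica inside this service

def get_item_description(item_id: str) -> str | None:
    # No network call on the read path: fast and fault tolerant.
    return local_cache.get(item_id)

def on_cache_update(message: dict) -> None:
    # Invoked when the owning service broadcasts a change to all replicas.
    local_cache[message["item_id"]] = message["description"]
```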
| Pros | Cons |
| --- | --- |
| Each service has its own in-memory replica, so it no longer needs to make external calls to other services for data. | Service dependency with regard to cache data and startup timing: the service holding the replicated cache must start after the service owning the cache. If the owning service is unavailable, the other service has to wait until the cache is populated. It’s only a startup problem, though. |
| Updates made by the cache-owning service are reflected in all other services holding the replica. | If data volumes are too high, the feasibility of this pattern diminishes quickly. Every service instance also has its own replicated cache, so 5 instances means the cache size multiplied by 5; careful analysis is needed to avoid hogging memory resources. |
| Great responsiveness, fault tolerance, and scalability. | Very hard to keep the caches fully in sync if the data changes too frequently; the pattern is better suited to relatively static data (data that doesn’t change that often). |
| Services can scale independently. | Configuration and setup management: wiring up the replication mechanism between services is not that straightforward. |
Data Domain Pattern
In a previous chapter, one way to resolve joint ownership was to have both services share ownership of the database. The same pattern can be used for data access too.
The solution is to create a data domain: combine the needed tables into a shared schema accessible to both services, which creates a broader bounded context.
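As a minimal sketch (the connection string, table names, and the use of SQLite are assumptions), both services simply query the shared schema directly instead of calling each other:

```python
# Minimal sketch: services read the shared data domain schema directly
# rather than calling each other. Schema and table names are hypothetical.
import sqlite3

SHARED_DATA_DOMAIN = "shared_data_domain.db"  # stands in for the shared schema

def read_item(item_id: str):
    # Either service in the data domain can run this query against the same
    # schema: no contracts or replication needed, but any schema change now
    # ripples into every service bound to the data domain.
    with sqlite3.connect(SHARED_DATA_DOMAIN) as conn:
        return conn.execute(
            "SELECT item_id, description FROM item WHERE item_id = ?",
            (item_id,),
        ).fetchone()
```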
| Pros | Cons |
| --- | --- |
| Services are completely decoupled from each other. | Sharing data is generally discouraged in distributed architectures. |
| Very high data consistency and integrity (no need for replication, synchronization, etc.). | Since multiple services have direct access to the schema, any schema change directly impacts those services and they must change accordingly. |
| No additional contracts needed to transfer data between services. | Potential security pitfall concerning data access, since each service has complete access to all the data in the domain. |
Summary
In this chapter we went through some of the popular data access patterns where one service needs data owned by another, along with the trade-offs of each. The answer to “which one should I pick?” is a big “it depends”, as always 🤣
In the next chapter we’re going to be talking about the famous distributed architecture sagas! Stay tuned.