What is a Distributed System (Computing)?


A distributed system (computing) is a group of programs that run on multiple computers (called nodes), working together to accomplish a shared goal.

These computers communicate and stay in sync over a shared network. The nodes can be separate physical machines, or separate software processes running on the same or different machines. This setup underlies both distributed computing and distributed databases. The main goal of such systems is to avoid a single point of failure and to spread the workload, making the system more reliable and efficient.

Key Characteristics

  1. Decentralization: No single node controls the entire system; nodes operate autonomously.

  2. Concurrency: Multiple nodes execute tasks simultaneously.

  3. Scalability: Systems can grow by adding more nodes to handle increased load.

  4. Fault Tolerance: The system can continue functioning even if some nodes fail.

  5. Heterogeneity: Nodes may have different hardware, software, or configurations.

  6. Resource Sharing: A distributed system can share hardware, software, or data across its nodes.

Difference between a Centralized System and a Distributed System

In a centralized computing system, all processing, data storage, and control reside in a single computer or a tightly coupled group of servers at one location.

In contrast, a distributed computing system consists of multiple independent computers (nodes) that collaborate over a network to function as a cohesive system.

The primary distinction lies in their communication patterns: in centralized systems, all nodes (e.g., clients or peripherals) interact directly with the central node, which holds the system’s state (data and logic). This can cause network congestion and delays under heavy load due to the central node’s limited capacity. Conversely, distributed systems enable nodes to communicate with each other, distributing tasks and data to avoid bottlenecks.

Failure behavior differs as well. A centralized system is vulnerable to a single point of failure: if the central node goes down, the entire system halts. Distributed systems, by contrast, are designed for resilience, continuing to operate even if some nodes fail, because tasks and data are spread across multiple nodes.

Difference between Distributed Computing and Microservices

Distributed systems and microservices are not the same, though they are related concepts and often overlap in practice.

Microservices are an architectural and organizational approach to software development where software is composed of small independent services that communicate over well-defined APIs.

Distributed systems are a general paradigm for coordinating multiple computers, while microservices are a specific way to design applications by breaking them into small, independent services.

Think of distributed systems as the infrastructure or foundation, and microservices as a software design pattern that can (and often does) leverage that foundation.

Key difference:

  • Distributed system = "How" things are spread across machines.

  • Microservices = "How" software is designed and organized.

You can have a monolith on a distributed system. And you can build microservices that all run on a single machine during development.

| Aspect | Distributed Systems | Microservices |
| --- | --- | --- |
| Scope | Broad: any system with multiple nodes | Narrow: application architecture |
| Focus | Coordination across machines | Modular, independent services |
| Granularity | Can involve entire systems (e.g., DBs) | Fine-grained services within an app |
| Examples | Cassandra, Bitcoin, Apache Spark | Netflix’s user service, payment service |
| Implementation | Hardware + software coordination | Software design, often on distributed infra |
| Dependency | General concept, not tied to microservices | Often relies on distributed systems |

Core Concepts

  • Communication: Nodes exchange messages via networks (e.g., TCP/IP, RPC, or message queues). Protocols like HTTP, gRPC, or MQTT are common (a minimal socket-level sketch follows this list).

  • Coordination: Nodes synchronize actions using techniques like distributed locks, consensus algorithms (e.g., Paxos, Raft), or leader election.

    💡
    A distributed lock ensures that if one actor (node, service instance, etc.) changes a shared resource - like a database record, file, or external service - no other node can step in until the first one is finished (see the lock sketch after this list).
  • Data Distribution: Data is partitioned (sharding) or replicated across nodes to ensure availability and performance.

    💡
    Sharding is a method for distributing a single dataset across multiple databases, which can then be stored on multiple machines. Larger datasets are split into smaller chunks held by multiple data nodes, increasing the total storage capacity of the system (see the sharding sketch after this list).
  • Consistency: Systems balance consistency, availability, and partition tolerance (CAP theorem):

    • Consistency: All nodes see the same data at the same time.

    • Availability: Every request gets a response, even if some nodes are down.

    • Partition Tolerance: The system functions despite network failures.

    • No system can guarantee all three simultaneously; when a network partition occurs, a system must trade consistency against availability (e.g., eventual consistency in many NoSQL databases).

  • Time and Ordering: Without a global clock, systems use logical clocks (e.g., Lamport timestamps) or vector clocks to order events.

    💡
    In a distributed system there is no global clock, which makes it hard to order activities across nodes. Even if every node’s local clock is set to a common time at startup, the clocks drift apart as the system runs (see the Lamport clock sketch after this list).
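
The four sketches below illustrate the communication, coordination, data distribution, and ordering points above. First, communication: a minimal sketch, assuming Python’s standard socket module, of two nodes on one machine exchanging a message over TCP (the port 9090 and the message contents are illustrative, not from any particular system):

```python
import socket
import threading

# Bind and listen before the client connects, so there is no startup race.
srv = socket.create_server(("localhost", 9090))

def server_node() -> None:
    # The server node answers each message it receives on the socket.
    conn, _ = srv.accept()
    with conn:
        data = conn.recv(1024)          # receive the peer's message
        conn.sendall(b"ack: " + data)   # reply over the same connection

threading.Thread(target=server_node, daemon=True).start()

# A second node connects and exchanges one message with the first.
with socket.create_connection(("localhost", 9090)) as sock:
    sock.sendall(b"hello")
    print(sock.recv(1024).decode())  # prints "ack: hello"
srv.close()
```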
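
Next, coordination: a minimal distributed-lock sketch, assuming a local Redis server and the redis-py client (the key name, timeout, and helper names are illustrative assumptions, not a production recipe such as Redlock):

```python
import uuid

import redis  # assumed dependency: redis-py, talking to an assumed local Redis

r = redis.Redis(host="localhost", port=6379)

def acquire_lock(name: str, ttl_seconds: int = 10) -> str | None:
    """Try to take the lock; return an owner token on success, None otherwise."""
    token = str(uuid.uuid4())
    # SET key value NX EX ttl: succeeds only if the key does not exist yet;
    # the expiry releases the lock automatically if the holder crashes.
    if r.set(f"lock:{name}", token, nx=True, ex=ttl_seconds):
        return token
    return None

def release_lock(name: str, token: str) -> None:
    # Delete the lock only if we still own it (a real implementation would
    # do this compare-and-delete atomically, e.g., with a Lua script).
    if r.get(f"lock:{name}") == token.encode():
        r.delete(f"lock:{name}")

token = acquire_lock("invoice-42")
if token:
    try:
        pass  # mutate the shared resource safely here
    finally:
        release_lock("invoice-42", token)
```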
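
Then, data distribution: a minimal hash-based sharding sketch (the node names are hypothetical; real systems often use consistent hashing instead of a plain modulo so that adding or removing nodes moves less data):

```python
import hashlib

NODES = ["db-node-0", "db-node-1", "db-node-2"]  # hypothetical shard nodes

def shard_for(key: str) -> str:
    """Map a record key to a shard with a stable hash, modulo the node count."""
    # md5 is used here only for an even key distribution, not for security.
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

print(shard_for("user:1001"))  # every node computes the same placement
```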
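
Finally, time and ordering: a minimal Lamport-timestamp sketch. Each node keeps a counter, increments it on every local event, and on receiving a message jumps ahead of both its own counter and the sender’s, which yields a consistent event order without a global clock:

```python
class LamportClock:
    """Logical clock: orders events without a synchronized physical clock."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        # Any local event simply advances the counter.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Stamp an outgoing message with the current logical time.
        return self.tick()

    def receive(self, sender_time: int) -> int:
        # Jump ahead of both our own history and the sender's.
        self.time = max(self.time, sender_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
msg = a.send()         # a.time is now 1
print(b.receive(msg))  # 2: the receive is ordered after the send
```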

Types of Distributed Systems

  1. Client-Server: A central server handles requests from clients (e.g., web applications). A truly distributed client-server setup has multiple server nodes to distribute client connections. In most modern client-server architectures, the clients connect to an encapsulated distributed system on the server side.

  2. Peer-to-Peer (P2P): Nodes act as both clients and servers (e.g., BitTorrent). Peer-to-peer systems have the benefit of extreme redundancy. When a peer-to-peer node is initialized and brought online, it discovers and connects to other peers and synchronizes its local state with the state of the greater system. Because of this, the failure of one node won’t disrupt the others, and the system as a whole persists as long as some peers remain online (a minimal sync sketch follows this list).

  3. Distributed Databases: Data is stored across multiple nodes (e.g., Cassandra, DynamoDB). This means that rather than putting all data on one server or on one computer, data is placed on multiple servers or in a cluster of computers consisting of individual nodes.

  4. Distributed File Systems: Files are stored and accessed across nodes (e.g., HDFS, Google File System). A distributed file system (DFS) spans multiple file servers or locations, such as file servers situated in different physical places. Files are accessible just as if they were stored locally, from any device and from anywhere on the network.
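
To make the peer-to-peer synchronization above concrete, a toy sketch (the Peer class and its pull-based sync method are illustrative assumptions; real P2P systems use discovery and gossip protocols over the network):

```python
class Peer:
    """Toy peer: holds local state and syncs it with other peers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state: set[str] = set()

    def sync(self, other: "Peer") -> None:
        # Both peers merge each other's state, so knowledge spreads.
        merged = self.state | other.state
        self.state, other.state = merged, set(merged)

a, b, c = Peer("a"), Peer("b"), Peer("c")
a.state.add("tx1")
c.state.add("tx2")

a.sync(b)  # b learns tx1
b.sync(c)  # b and c now both know tx1 and tx2
print(sorted(c.state))  # ['tx1', 'tx2'] even though a and c never connected
```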

Examples

  • Web Services: Netflix uses microservices on AWS, distributing tasks like streaming, recommendations, and billing.

  • Databases: MongoDB shards data across nodes for scalability; Raft ensures consensus in etcd.

  • Blockchain: Decentralized ledgers like Bitcoin use distributed consensus for trustless transactions.

  • Big Data: Apache Kafka distributes event streams for real-time processing.

Many modern applications rely on distributed systems. High-traffic web and mobile applications are distributed systems: users connect in a client-server manner, where the client is a web browser or a mobile application, and the server side is itself a distributed system. Modern web servers follow a multi-tier pattern in which a load balancer delegates requests to many server-logic nodes that communicate over message queues.
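
As a hedged sketch of the load-balancing step in that multi-tier pattern, assuming a simple round-robin policy (the backend names are hypothetical; production balancers such as NGINX or HAProxy add health checks and connection management):

```python
import itertools

# Hypothetical pool of server logic nodes sitting behind the balancer.
BACKENDS = ["app-node-1:8000", "app-node-2:8000", "app-node-3:8000"]
rotation = itertools.cycle(BACKENDS)

def route(request_id: int) -> str:
    """Round-robin: each request goes to the next backend in the cycle."""
    return f"request {request_id} -> {next(rotation)}"

for i in range(5):
    print(route(i))  # requests spread evenly across the three nodes
```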

Challenges of Distributed Systems

  1. Network Failures: Delays, packet loss, or partitions can disrupt communication.

  2. Partial Failures: Some nodes may fail while others continue, complicating coordination.

  3. Consistency vs. Performance: Strong consistency slows systems; eventual consistency risks stale data.

  4. Security: Nodes must authenticate and secure communications to prevent attacks.

  5. Debugging: Distributed systems are harder to monitor and debug due to their complexity.

Distributed systems are widely adopted and power most modern software experiences: social media apps, video streaming services, e-commerce sites, and more. Centralized systems naturally evolve into distributed systems to handle scaling, and microservices are a popular, widely adopted pattern for building them. However, distributed systems come at the cost of careful design to handle their complexity and failure modes.
