What is a Distributed System (Computing)?


A distributed system (computing) is a group of programs that run on multiple computers (called nodes), working together to accomplish a shared goal.

These computers communicate and stay in sync over a shared network. The nodes can be separate physical machines, or separate software processes running on the same or different machines. This setup underlies both distributed computing and distributed databases. The main goal of such systems is to avoid a single point of failure and to spread the workload, making the system more reliable and efficient.

Key Characteristics

  1. Decentralization: No single node controls the entire system; nodes operate autonomously.

  2. Concurrency: Multiple nodes execute tasks simultaneously.

  3. Scalability: Systems can grow by adding more nodes to handle increased load.

  4. Fault Tolerance: The system can continue functioning even if some nodes fail.

  5. Heterogeneity: Nodes may have different hardware, software, or configurations.

  6. Resource Sharing: A distributed system can share hardware, software, or data across its nodes.

Difference between a Centralized System and a Distributed System

In a centralized computing system, all processing, data storage, and control reside in a single computer or a tightly coupled group of servers at one location.

In contrast, a distributed computing system consists of multiple independent computers (nodes) that collaborate over a network to function as a cohesive system.

The primary distinction lies in their communication patterns: in centralized systems, all nodes (e.g., clients or peripherals) interact directly with the central node, which holds the system’s state (data and logic). This can cause network congestion and delays under heavy load due to the central node’s limited capacity. Conversely, distributed systems enable nodes to communicate with each other, distributing tasks and data to avoid bottlenecks.

Failure behavior differs as well. A centralized system is vulnerable to a single point of failure: if the central node goes down, the entire system halts. Distributed systems, by contrast, are designed for resilience, continuing to operate even if some nodes fail, because tasks and data are spread across multiple nodes.

Difference between Distributed Computing and Microservices

Distributed systems and microservices are not the same, though they are related concepts and often overlap in practice.

Microservices are an architectural and organizational approach to software development where software is composed of small independent services that communicate over well-defined APIs.

Distributed systems are a general paradigm for coordinating multiple computers, while microservices are a specific way to design applications by breaking them into small, independent services.

Think of distributed systems as the infrastructure or foundation, and microservices as a software design pattern that can (and often does) leverage that foundation.

Key difference:

  • Distributed system = "How" things are spread across machines.

  • Microservices = "How" software is designed and organized.

You can have a monolith on a distributed system. And you can build microservices that all run on a single machine during development.

| Aspect | Distributed Systems | Microservices |
| --- | --- | --- |
| Scope | Broad: any system with multiple nodes | Narrow: application architecture |
| Focus | Coordination across machines | Modular, independent services |
| Granularity | Can involve entire systems (e.g., DBs) | Fine-grained services within an app |
| Examples | Cassandra, Bitcoin, Apache Spark | Netflix’s user service, payment service |
| Implementation | Hardware + software coordination | Software design, often on distributed infra |
| Dependency | General concept, not tied to microservices | Often relies on distributed systems |

Core Concepts

  • Communication: Nodes exchange messages via networks (e.g., TCP/IP, RPC, or message queues). Protocols like HTTP, gRPC, or MQTT are common (a minimal socket-level sketch follows this list).

  • Coordination: Nodes synchronize actions using techniques like distributed locks, consensus algorithms (e.g., Paxos, Raft), or leader election.

    💡
    A distributed lock ensures that if one actor (node, service instance, etc.) changes a shared resource - like a database record, file, or external service - no other node can step in until the first one is finished (see the lock sketch after this list).
  • Data Distribution: Data is partitioned (sharding) or replicated across nodes to ensure availability and performance.

    💡
    Sharding is a method for distributing a single dataset across multiple databases, which can then be stored on multiple machines. Larger datasets are split into smaller chunks held by multiple data nodes, increasing the total storage capacity of the system (see the sharding sketch after this list).
  • Consistency: Systems balance consistency, availability, and partition tolerance (CAP theorem):

    • Consistency: All nodes see the same data at the same time.

    • Availability: Every request gets a response, even if some nodes are down.

    • Partition Tolerance: The system functions despite network failures.

    • No system can guarantee all three simultaneously; when a network partition occurs, a system must trade consistency against availability (e.g., eventual consistency in many NoSQL databases).

  • Time and Ordering: Without a global clock, systems use logical clocks (e.g., Lamport timestamps) or vector clocks to order events.

    💡
    In a distributed system there is no global clock, which makes it hard to order activities across nodes. Even if every node’s local clock is set to a common time at startup, the clocks drift apart as the system runs (see the Lamport clock sketch after this list).
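
The four sketches below illustrate the communication, coordination, data distribution, and ordering points above. First, communication: a minimal sketch, assuming Python’s standard socket module, of two nodes on one machine exchanging a message over TCP (the port 9090 and the message contents are illustrative, not from any particular system):

```python
import socket
import threading

# Bind and listen before the client connects, so there is no startup race.
srv = socket.create_server(("localhost", 9090))

def server_node() -> None:
    # The server node answers each message it receives on the socket.
    conn, _ = srv.accept()
    with conn:
        data = conn.recv(1024)          # receive the peer's message
        conn.sendall(b"ack: " + data)   # reply over the same connection

threading.Thread(target=server_node, daemon=True).start()

# A second node connects and exchanges one message with the first.
with socket.create_connection(("localhost", 9090)) as sock:
    sock.sendall(b"hello")
    print(sock.recv(1024).decode())  # prints "ack: hello"
srv.close()
```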
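
Next, coordination: a minimal distributed-lock sketch, assuming a local Redis server and the redis-py client (the key name, timeout, and helper names are illustrative assumptions, not a production recipe such as Redlock):

```python
import uuid

import redis  # assumed dependency: redis-py, talking to an assumed local Redis

r = redis.Redis(host="localhost", port=6379)

def acquire_lock(name: str, ttl_seconds: int = 10) -> str | None:
    """Try to take the lock; return an owner token on success, None otherwise."""
    token = str(uuid.uuid4())
    # SET key value NX EX ttl: succeeds only if the key does not exist yet;
    # the expiry releases the lock automatically if the holder crashes.
    if r.set(f"lock:{name}", token, nx=True, ex=ttl_seconds):
        return token
    return None

def release_lock(name: str, token: str) -> None:
    # Delete the lock only if we still own it (a real implementation would
    # do this compare-and-delete atomically, e.g., with a Lua script).
    if r.get(f"lock:{name}") == token.encode():
        r.delete(f"lock:{name}")

token = acquire_lock("invoice-42")
if token:
    try:
        pass  # mutate the shared resource safely here
    finally:
        release_lock("invoice-42", token)
```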
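
Then, data distribution: a minimal hash-based sharding sketch (the node names are hypothetical; real systems often use consistent hashing instead of a plain modulo so that adding or removing nodes moves less data):

```python
import hashlib

NODES = ["db-node-0", "db-node-1", "db-node-2"]  # hypothetical shard nodes

def shard_for(key: str) -> str:
    """Map a record key to a shard with a stable hash, modulo the node count."""
    # md5 is used here only for an even key distribution, not for security.
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

print(shard_for("user:1001"))  # every node computes the same placement
```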
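
Finally, time and ordering: a minimal Lamport-timestamp sketch. Each node keeps a counter, increments it on every local event, and on receiving a message jumps ahead of both its own counter and the sender’s, which yields a consistent event order without a global clock:

```python
class LamportClock:
    """Logical clock: orders events without a synchronized physical clock."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        # Any local event simply advances the counter.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Stamp an outgoing message with the current logical time.
        return self.tick()

    def receive(self, sender_time: int) -> int:
        # Jump ahead of both our own history and the sender's.
        self.time = max(self.time, sender_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
msg = a.send()         # a.time is now 1
print(b.receive(msg))  # 2: the receive is ordered after the send
```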

Types of Distributed Systems

  1. Client-Server: A central server handles requests from clients (e.g., web applications). A truly distributed client-server setup has multiple server nodes to distribute client connections. In most modern client-server architectures, the clients connect to an encapsulated distributed system on the server side.

  2. Peer-to-Peer (P2P): Nodes act as both clients and servers (e.g., BitTorrent). Peer-to-peer systems have the benefit of extreme redundancy. When a peer-to-peer node is initialized and brought online, it discovers and connects to other peers and synchronizes its local state with the state of the greater system. Because of this, the failure of one node won’t disrupt the others, and the system as a whole persists as long as some peers remain online (a minimal sync sketch follows this list).

  3. Distributed Databases: Data is stored across multiple nodes (e.g., Cassandra, DynamoDB). This means that rather than putting all data on one server or on one computer, data is placed on multiple servers or in a cluster of computers consisting of individual nodes.

  4. Distributed File Systems: Files are stored and accessed across nodes (e.g., HDFS, Google File System). A distributed file system (DFS) spans multiple file servers or locations, such as file servers situated in different physical places. Files are accessible just as if they were stored locally, from any device and from anywhere on the network.
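
To make the peer-to-peer synchronization above concrete, a toy sketch (the Peer class and its pull-based sync method are illustrative assumptions; real P2P systems use discovery and gossip protocols over the network):

```python
class Peer:
    """Toy peer: holds local state and syncs it with other peers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state: set[str] = set()

    def sync(self, other: "Peer") -> None:
        # Both peers merge each other's state, so knowledge spreads.
        merged = self.state | other.state
        self.state, other.state = merged, set(merged)

a, b, c = Peer("a"), Peer("b"), Peer("c")
a.state.add("tx1")
c.state.add("tx2")

a.sync(b)  # b learns tx1
b.sync(c)  # b and c now both know tx1 and tx2
print(sorted(c.state))  # ['tx1', 'tx2'] even though a and c never connected
```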

Examples

  • Web Services: Netflix uses microservices on AWS, distributing tasks like streaming, recommendations, and billing.

  • Databases: MongoDB shards data across nodes for scalability; Raft ensures consensus in etcd.

  • Blockchain: Decentralized ledgers like Bitcoin use distributed consensus for trustless transactions.

  • Big Data: Apache Kafka distributes event streams for real-time processing.

Many modern applications rely on distributed systems. High-traffic web and mobile applications are distributed systems: users connect in a client-server manner, where the client is a web browser or a mobile application, and the server side is itself a distributed system. Modern web servers follow a multi-tier pattern in which a load balancer delegates requests to many server-logic nodes that communicate over message queues.
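
As a hedged sketch of the load-balancing step in that multi-tier pattern, assuming a simple round-robin policy (the backend names are hypothetical; production balancers such as NGINX or HAProxy add health checks and connection management):

```python
import itertools

# Hypothetical pool of server logic nodes sitting behind the balancer.
BACKENDS = ["app-node-1:8000", "app-node-2:8000", "app-node-3:8000"]
rotation = itertools.cycle(BACKENDS)

def route(request_id: int) -> str:
    """Round-robin: each request goes to the next backend in the cycle."""
    return f"request {request_id} -> {next(rotation)}"

for i in range(5):
    print(route(i))  # requests spread evenly across the three nodes
```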

Challenges of Distributed Systems

  1. Network Failures: Delays, packet loss, or partitions can disrupt communication.

  2. Partial Failures: Some nodes may fail while others continue, complicating coordination.

  3. Consistency vs. Performance: Strong consistency slows systems; eventual consistency risks stale data.

  4. Security: Nodes must authenticate and secure communications to prevent attacks.

  5. Debugging: Distributed systems are harder to monitor and debug due to their complexity.

Distributed systems are widely adopted and power most modern software experiences: social media apps, video streaming services, e-commerce sites, and more. Centralized systems naturally evolve into distributed systems to handle scaling, and microservices are a popular, widely adopted pattern for building them. However, distributed systems come at the cost of careful design to handle their complexity and failure modes.
