The CAP theorem (Consistency, Availability, Partition tolerance) is a fundamental concept in distributed systems that describes the trade-offs among three key system properties. It states that a distributed system can only guarantee at most two out of the three properties simultaneously.
Here’s a detailed explanation of the CAP theorem with an example and a diagram:
The Three Properties
- Consistency (C):
- Every read receives the most recent write or an error.
- This ensures that all nodes in the distributed system return the same data at any time.
- Example: In a banking system, if you transfer money from one account to another, any subsequent read operation should reflect the updated balances immediately.
- Availability (A):
- Every request (read/write) receives a response, regardless of whether it is the latest version.
- This means the system remains operational even if some nodes fail.
- Example: If one node in a system goes down, the system should still handle requests using the remaining nodes.
- Partition Tolerance (P):
- The system continues to operate despite network partitions (failures in communication between nodes).
- This ensures the system is resilient to network issues and does not completely fail.
- Example: If a network link between two nodes is broken, both sides should still function independently.
CAP Theorem Trade-Offs
According to CAP theorem, a distributed system can provide at most two of these properties simultaneously in the presence of a network partition:
- CP (Consistency + Partition Tolerance):
- Ensures data consistency across all nodes even during a partition, but some nodes may not be available.
- Example: HBase or MongoDB (with strict consistency).
- AP (Availability + Partition Tolerance):
- Ensures the system is always available even during a partition, but data consistency may be compromised.
- Example: Cassandra or DynamoDB.
- CA (Consistency + Availability):
- Ensures consistency and availability as long as there are no network partitions, but sacrifices partition tolerance.
- Example: Traditional RDBMS systems like MySQL or PostgreSQL in a single-node setup.
Real-World Example
Scenario: Online Shopping System
- Imagine a distributed system for an e-commerce platform.
- A user adds a product to their cart. This operation is replicated across multiple servers to ensure availability and fault tolerance.
- Consistency:
- If the system is consistent, all servers must have the same updated cart data before responding to the user.
- During a network partition, the system may block updates until all servers synchronize, reducing availability.
- Availability:
- The system remains available even if some servers are disconnected.
- However, the cart data may differ between servers (inconsistency) during the partition.
- Partition Tolerance:
- During a partition, the system must tolerate the failure and continue working.
- It may prioritize either availability (AP) or consistency (CP), but not both.
Diagram of CAP Theorem
Here is a conceptual representation of the CAP theorem:
Partition Tolerance
/ \
/ \
/ \
Consistency Availability
In practice:
- CA systems work well without partitions but fail when one occurs.
- CP systems ensure correctness but may reject requests during a partition.
- AP systems prioritize availability but may return outdated or inconsistent data during a partition.
Detailed Example with CAP Trade-Offs
Banking System Example:
- Consistency (C): All branches of the bank show the same account balance at all times.
- Availability (A): ATMs and online banking always provide service.
- Partition Tolerance (P): The system can handle communication breakdowns between branches.
Trade-Off:
- During a network partition:
- CP: The system halts all transactions (availability is compromised) until partitions resolve to ensure consistent account balances.
- AP: Transactions continue (availability is maintained), but balances may be inconsistent across branches.
- CA: Without partition tolerance, the system cannot guarantee service during communication failures.