Anant Jain

CAP Twelve Years Later: How the “Rules” Have Changed

Paper Review

Abstract

The CAP theorem asserts that any networked shared-data system can have only two of three desirable properties. However, by explicitly handling partitions, designers can optimize consistency and availability, thereby achieving some trade- off of all three.

What is the CAP Theorem?

The CAP theorem, proposed by Eric Brewer in 2000, is a fundamental principle in distributed systems. It states that a distributed database system can only guarantee two of the following three properties simultaneously:

  • Consistency (C): All nodes in the system see the same data at the same time. Every read returns the most recent write or an error.
  • Availability (A): Every request receives a response (success or failure), even if some nodes are down. The system remains operational.
  • Partition Tolerance (P): The system continues to function even when network failures cause communication breakdowns between nodes.

In this 2012 paper, Brewer revisits the CAP theorem twelve years after its introduction, clarifying common misconceptions and providing practical guidance for building distributed systems.

Key Insights

The "2 of 3" Misconception

The traditional understanding that you must choose "2 of 3" properties has always been misleading. Here's why:

  • Partitions are rare: Network failures don't happen constantly. When the network is functioning normally, you can have both consistency and availability. The trade-off between C and A only matters during actual partition events.

  • Fine-grained choices: You don't make a single choice for your entire system. Different subsystems can make different decisions. Even within the same operation, the choice can vary based on the specific data or user involved.

  • Properties are continuous, not binary: Availability isn't just "on" or "off" - it ranges from 0% to 100%. Similarly, there are many levels of consistency, not just "consistent" or "inconsistent." Even partitions have nuances.

The Modern CAP Approach

Instead of rigidly choosing between consistency and availability, the modern goal is to maximize both for your specific application by:

  1. Detecting when partitions occur
  2. Entering a partition mode that may limit some operations
  3. Having a recovery plan to restore consistency and fix any problems that occurred during the partition

This is particularly important for wide-area systems (systems spanning multiple geographic locations), where network partitions are more likely to occur and must be tolerated.

CAP vs. ACID: Understanding the Difference

It's important to understand how CAP relates to traditional database ACID properties (Atomicity, Consistency, Isolation, Durability):

  • Consistency in ACID vs. CAP: In ACID, consistency means that a transaction preserves all database rules (like unique keys or foreign key constraints). In CAP, consistency only refers to "single-copy consistency" - meaning all nodes see the same data. This is a much narrower definition. When partitions occur, ACID consistency must also be restored during recovery.

  • Isolation matters most: Isolation (the "I" in ACID) is at the core of CAP. If your system requires strict ACID isolation (serializability), it can only operate on one side of a partition. This is because serializability requires nodes to communicate with each other, which is impossible during a partition. However, weaker forms of isolation can work across partitions by using compensation strategies during recovery.

  • Strategy: Running ACID transactions independently on each side of a partition makes recovery easier. You can then use compensating transactions to fix inconsistencies once the partition heals.

Understanding Partitions in Practice

Latency and partitions are deeply related: A partition isn't just a total network failure. Pragmatically, a partition is any communication delay that exceeds your acceptable timeout. This practical definition has important consequences:

  • No global notion of partitions: Some nodes might detect a partition while others don't, depending on which connections are affected.
  • Nodes can enter partition mode: When a node detects a partition, it can switch to a special operating mode - this is key to optimizing both C and A.

The partition decision: When a partition occurs, you face a choice: cancel the operation (decreasing availability) or proceed with the operation (risking inconsistency).

Real-World Examples and Strategies

Geographic considerations:

  • Within a single datacenter, network partitions are less likely but still possible
  • Across wide-area networks (multiple datacenters), partitions are more common
  • Google uses Paxos (a consensus algorithm) across geographic regions to maintain global consistency in systems like Chubby (distributed locking) and Megastore (distributed storage)

The hidden cost of forfeiting consistency: When you choose availability over consistency, you need to deeply understand your system's invariants (the rules that should never be violated, like "account balance cannot be negative" or "usernames must be unique"). Managing these invariants during and after partitions is challenging - similar to how concurrent programming is harder than sequential programming.

Scope of consistency: Within some boundary (like a datacenter), state might be consistent, but outside that boundary, consistency guarantees may not hold. This allows systems to optimize for different requirements in different contexts.

Managing Partitions: A Three-Step Strategy

The key to handling CAP trade-offs is to manage partitions explicitly:

  1. Detect the start of a partition: Monitor network health and latency to identify when a partition occurs
  2. Enter partition mode: Limit or modify certain operations based on your system's invariants
  3. Initiate partition recovery: When communication is restored, reconcile state and fix inconsistencies

Examples of Partition-Aware Systems

Quorum-based systems: One side of the partition has a quorum (majority) and can proceed with operations, while the other side cannot. This is a form of choosing consistency over availability for the minority side.

Disconnected operation systems: Mobile apps that work offline are a perfect example - they clearly operate in "partition mode" when disconnected and sync when reconnected.

Deciding Which Operations to Allow

Which operations you allow during partition mode depends on your system's invariants (business rules):

Example 1 - Unique keys:

  • Invariant: All keys in a table must be unique
  • Strategy: Allow duplicate keys during partition (prioritizing availability)
  • Recovery: Duplicate keys are easy to detect and merge after recovery

Example 2 - External events:

  • Example: Charging a credit card
  • Strategy: Record the intent during partition but don't execute it
  • Recovery: Execute the recorded intents after partition recovery

The design challenge: Create a matrix of all operations versus all invariants. For each combination, decide whether to:

  • Prohibit the operation (choose consistency)
  • Delay the operation until after recovery
  • Modify the operation to reduce risk
  • Allow the operation and fix conflicts later (choose availability)

User experience consideration: During partition mode, you face a UI challenge - how do you communicate that tasks are in progress but not yet complete? This is similar to how mobile apps show "pending" status for actions taken offline.

Partition Recovery: The Technical Details

Tracking Operations with Version Vectors

Version vectors are the best way to track the history of operations on both sides of a partition. They capture the causal dependencies among operations - basically, which operations happened before others, and which happened concurrently (at the same time on different sides).

With version vector history from both sides, the system can easily determine:

  • Which operations already have a known order (happened sequentially)
  • Which operations executed concurrently (happened at the same time on different sides and may conflict)

Two Hard Problems to Solve

During recovery, designers must address:

  1. State convergence: Make the state on both sides consistent again
  2. Compensation: Fix the mistakes or invariant violations that occurred during partition mode

Strategies for Automatic Conflict Resolution

Google Docs example: Text editing in Google Docs can automatically merge conflicts because it constrains operations to simple actions (applying styles, adding or deleting text). By limiting what users can do, the system can automatically resolve conflicts. This principle applies broadly: if you constrain operations during partitions, you can often merge state automatically during recovery.

CRDTs: Conflict-Free Replicated Data Types

Marc Shapiro and colleagues at INRIA developed CRDTs - special data structures that are guaranteed to converge to the same state after a partition. They work by ensuring operations are commutative (the order doesn't matter).

How CRDTs work: There are two approaches:

  1. Ensure all operations are commutative (A then B produces the same result as B then A)
  2. Represent values on a lattice structure where operations only move in one direction (monotonically increasing)

Example - CRDT sets: To create a set that can both add and delete items while remaining partition-tolerant:

  • Maintain two internal sets: one for additions, one for deletions
  • The actual set membership is the difference between them
  • Both sides can add and delete independently, and they'll converge when merged

Real-World Case Study: ATM Networks

ATM design provides an excellent real-world example of managing partitions and compensating for invariant violations.

The ATM Partition Problem

When an ATM loses connection to the central banking system, it faces a partition. The invariant at risk: "account balance cannot go negative beyond the overdraft limit."

Modern ATM strategy: Instead of becoming completely unavailable, ATMs enter a "stand-in mode" (partition mode) with intelligent limits:

  • Allow withdrawals up to a limit (e.g., $200)
  • Below this limit: withdrawals work normally
  • At the limit: deny further withdrawals
  • This bounds the risk while maintaining availability for most users

Key insight: The banking system doesn't rely on perfect consistency for correctness. Instead, it depends on auditing and compensation - the ability to detect and fix problems after they occur.

Compensation Strategies

When invariants are violated during partition, there are several ways to fix them:

  1. Simple approaches: "Last writer wins" - pick one update and ignore the other (loses data but simple)
  2. Smart merging: Combine operations intelligently when possible
  3. Human escalation: Have humans resolve conflicts that can't be automatically resolved

Compensating Transactions

Compensating transactions are a formal technique for handling long-lived transactions that might span partitions:

  • Break a large transaction into a saga - a series of smaller subtransactions
  • Each subtransaction commits immediately (unlike traditional transactions that hold locks)
  • If you need to "undo" the overall transaction later, issue compensating transactions that reverse the effects of each committed subtransaction
  • This is similar to how accounting works: instead of erasing a mistake, you add a correcting entry

The core principle: Designers must create compensating operations that both restore system invariants and correct any externalized mistakes (like money already dispensed from an ATM).

Looking Forward: The Future of CAP

As newer techniques like version vectors and CRDTs become available in developer-friendly frameworks, optimizing for both consistency and availability should become more widespread.

Important caveat: Unlike traditional ACID transactions (which provide automatic guarantees), this approach requires more thoughtful design:

  • You must understand your system's specific invariants
  • You need to carefully consider which operations can run during partitions
  • Solutions are not one-size-fits-all - they depend heavily on your application's specific requirements

The key takeaway: CAP is not about making a single binary choice. It's about understanding the trade-offs, planning for partitions explicitly, and designing recovery strategies that make sense for your application.

Key Takeaways

  1. The "2 of 3" rule is oversimplified: You don't permanently choose 2 properties. The trade-off between C and A only matters during actual partition events.

  2. Partitions are manageable: Detect them, enter partition mode, and have a recovery plan.

  3. Understand your invariants: Know which business rules must never be violated and which can be temporarily relaxed.

  4. Design for recovery: Use techniques like version vectors, CRDTs, and compensating transactions to handle conflicts.

  5. Real systems are nuanced: Properties like consistency and availability exist on a spectrum, not as binary states.

PDF


Over the next few Saturdays, I'll be going through some of the foundational papers in Computer Science, and publishing my notes here. This is #22 in this series.