Design a System for Geo-Distributed Data Storage

System Design
Hard
Apple

Discuss options for storing data across multiple continents (e.g., DynamoDB Global Tables, CockroachDB). Focus on multi-master conflicts and eventual consistency.

Why Interviewers Ask This

Interviewers at Apple ask this to evaluate your ability to balance strong consistency guarantees with global latency requirements. They specifically test your understanding of the CAP theorem in a real-world context, focusing on how you handle multi-master write conflicts and eventual consistency patterns across continents without compromising user experience or data integrity.

How to Answer This Question

1. Clarify Requirements: Immediately define the trade-off between consistency and availability (AP vs CP) and establish the SLA for data freshness across regions.
2. High-Level Architecture: Propose a geo-replicated schema using services like DynamoDB Global Tables or CockroachDB, explaining why you chose them over single-region solutions.
3. Conflict Resolution Strategy: Detail specific algorithms like Last-Write-Wins (LWW), Vector Clocks, or CRDTs to resolve concurrent writes from different masters.
4. Consistency Model: Explain the path from eventual consistency to strong consistency, discussing read-your-writes guarantees and anti-entropy mechanisms.
5. Edge Cases: Address network partitions, split-brain scenarios, and how you would handle catastrophic data loss or regional outages while maintaining system resilience.
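
To make step 3 concrete, here is a minimal sketch of how vector clocks can tell a causally ordered pair of writes from a truly concurrent pair. The region names and the helper functions (increment, compare, merge) are assumptions made for illustration; they are not the API of DynamoDB, CockroachDB, or any other product.

```python
from typing import Dict

VectorClock = Dict[str, int]  # region id -> number of writes seen from that region


def increment(clock: VectorClock, region: str) -> VectorClock:
    """Return a copy of `clock` advanced by one local write in `region`."""
    updated = dict(clock)
    updated[region] = updated.get(region, 0) + 1
    return updated


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify a relative to b: 'before', 'after', 'equal', or 'concurrent'."""
    regions = set(a) | set(b)
    a_le_b = all(a.get(r, 0) <= b.get(r, 0) for r in regions)
    b_le_a = all(b.get(r, 0) <= a.get(r, 0) for r in regions)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither side has seen the other's write


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum, applied once a conflict has been resolved."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in set(a) | set(b)}


# Hypothetical scenario: both regions start from the same version, then
# each accepts a write before hearing from the other.
base = increment({}, "us-east")
from_us = increment(base, "us-east")
from_eu = increment(base, "eu-west")
```

In this scenario compare(base, from_us) reports "before", so the newer version can safely replace the older one, while compare(from_us, from_eu) reports "concurrent", which is the signal to fall back to an application-level merge, a CRDT, or an LWW tiebreak.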

Key Points to Cover

  • Explicitly choosing between AP and CP models based on specific use case requirements
  • Detailing concrete conflict resolution strategies like Vector Clocks or CRDTs
  • Explaining the mechanism of anti-entropy processes for state reconciliation
  • Demonstrating awareness of clock drift issues in timestamp-based comparisons
  • Defining clear failure modes and recovery strategies during network partitions
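
The anti-entropy point can be illustrated with a small sketch. It assumes each replica is an in-memory dictionary, compares per-key hashes to find divergent records, and resolves each one with an LWW rule on a (timestamp, region id) pair. All names here are hypothetical; production systems commonly use Merkle trees rather than flat per-key digests, and the sketch does not handle deletes (no tombstones).

```python
import hashlib
from typing import Dict, Tuple

# key -> (value, timestamp, region_id); the region id breaks timestamp ties
Record = Tuple[str, int, str]
Replica = Dict[str, Record]


def digest(replica: Replica) -> Dict[str, str]:
    """Per-key hash of each record, cheap to exchange between regions."""
    return {
        key: hashlib.sha256(repr(record).encode()).hexdigest()
        for key, record in replica.items()
    }


def lww_winner(a: Record, b: Record) -> Record:
    """Pick the record with the larger (timestamp, region_id) pair."""
    return a if (a[1], a[2]) >= (b[1], b[2]) else b


def anti_entropy(local: Replica, remote: Replica) -> int:
    """Reconcile both replicas in place; return how many keys differed."""
    local_digest, remote_digest = digest(local), digest(remote)
    differing = [
        key
        for key in set(local_digest) | set(remote_digest)
        if local_digest.get(key) != remote_digest.get(key)
    ]
    for key in differing:
        # A key may exist on only one side, so collect whatever is present.
        candidates = [replica[key] for replica in (local, remote) if key in replica]
        winner = candidates[0]
        for candidate in candidates[1:]:
            winner = lww_winner(winner, candidate)
        local[key] = remote[key] = winner
    return len(differing)


# Hypothetical replicas that diverged during a partition.
us = {"user:1": ("alice", 100, "us-east"), "user:2": ("bob", 105, "us-east")}
eu = {"user:1": ("alicia", 120, "eu-west"), "user:3": ("carol", 90, "eu-west")}
repaired = anti_entropy(us, eu)
```

After the call both replicas hold the same three keys, and a second pass finds nothing left to repair. Because the winner is chosen by timestamp, a region whose clock runs fast would win every conflict, which is the clock drift concern listed above.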

Sample Answer

To design a geo-distributed storage system for a global product like Apple Maps or iCloud, we must prioritize low-latency reads while managing complex write conflicts. I would start by selecting an AP system like DynamoD…

Common Mistakes to Avoid

  • Assuming strong consistency is always possible globally without significant latency penalties
  • Overlooking the problem of clock drift when proposing simple timestamp-based conflict resolution
  • Failing to discuss how the system behaves during a network partition or region outage
  • Ignoring the complexity of merging data types that, unlike strings or numbers, cannot simply be overwritten (for example, sets, lists, or counters that accumulate updates)
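
As an illustration of the last point, the sketch below shows a grow-only counter (G-Counter), one of the simplest CRDTs. It assumes each region increments only its own slot and that replicas merge by taking the per-region maximum. The class and region names are hypothetical and not taken from any particular database.

```python
from typing import Dict


class GCounter:
    """Grow-only counter: one slot per region, merged by element-wise max."""

    def __init__(self, region: str) -> None:
        self.region = region
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        """Record local increments in this region's own slot only."""
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.region] = self.counts.get(self.region, 0) + amount

    def merge(self, other: "GCounter") -> None:
        """Fold in another replica's state; safe to repeat or reorder."""
        for region, count in other.counts.items():
            self.counts[region] = max(self.counts.get(region, 0), count)

    def value(self) -> int:
        """Total across all regions."""
        return sum(self.counts.values())


# Two regions accept increments concurrently, then exchange state.
us = GCounter("us-east")
eu = GCounter("eu-west")
us.increment(3)
eu.increment(2)
us.merge(eu)
eu.merge(us)
```

Both replicas converge on a total of 5. Overwriting one replica's count with the other's, as a plain LWW rule would, leaves 3 or 2 and silently drops increments.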
