Design a Distributed File Locking System

System Design
Hard
Google

Design a system that provides exclusive read/write access to shared files across a distributed cluster. Discuss using ZooKeeper or a dedicated lock service.

Why Interviewers Ask This

Interviewers ask this to evaluate your ability to handle distributed consistency, race conditions, and fault tolerance in high-concurrency environments. They specifically want to see if you understand the CAP theorem trade-offs when designing coordination services like ZooKeeper. The question tests your capacity to translate theoretical consensus algorithms into a practical, production-grade locking mechanism that prevents data corruption across a cluster.

How to Answer This Question

1. Clarify requirements immediately: define whether locks are exclusive or shared, whether acquisition blocks or is asynchronous, and which failure modes to expect, such as network partitions or node crashes.
2. Propose a logically centralized coordinator backed by an external service like ZooKeeper or etcd rather than building a custom consensus layer from scratch, as this aligns with Google's preference for leveraging mature infrastructure.
3. Detail the lock acquisition flow: explain how clients create ephemeral sequential nodes to request access and how the client whose node has the lowest sequence number holds the lock.
4. Discuss release mechanisms: describe how a client's ephemeral nodes are deleted automatically when its session ends, so a crashed holder cannot block everyone else forever and liveness is preserved.
5. Address edge cases explicitly: cover split-brain scenarios, leader election failures, and how to handle clock skew or network delays during lock contention.
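
The acquisition and release flow from steps 3 and 4 can be sketched with a small in-memory model. This is not ZooKeeper client code: `LockDirectory`, its method names, and the `lock-` node prefix are hypothetical stand-ins chosen for illustration, and a real deployment would use the coordination service's own client library.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LockDirectory:
    """Hypothetical in-memory stand-in for the parent node of one lock."""

    next_seq: int = 0
    # Node name -> owning session id. Every node here is ephemeral.
    nodes: Dict[str, str] = field(default_factory=dict)

    def create_ephemeral_sequential(self, session_id: str) -> str:
        # The service, not the client, assigns the increasing suffix.
        name = f"lock-{self.next_seq:010d}"
        self.next_seq += 1
        self.nodes[name] = session_id
        return name

    def children(self) -> List[str]:
        # Zero-padded suffixes make lexical order match sequence order.
        return sorted(self.nodes)

    def holder(self) -> Optional[str]:
        # The node with the lowest sequence number holds the lock.
        kids = self.children()
        return kids[0] if kids else None

    def delete(self, name: str) -> None:
        # Explicit release by a healthy client.
        self.nodes.pop(name, None)

    def expire_session(self, session_id: str) -> None:
        # Implicit release: ephemeral nodes disappear with their session.
        for name in [n for n, s in self.nodes.items() if s == session_id]:
            del self.nodes[name]


directory = LockDirectory()
first = directory.create_ephemeral_sequential("client-a")
second = directory.create_ephemeral_sequential("client-b")
directory.expire_session("client-a")  # client-a crashes while holding the lock
current_holder = directory.holder()   # now client-b's node
```

The point to make in the interview is that the crashed client never has to call `delete`: the service removes its node when the session ends, and the next node in sequence order becomes the holder.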

Key Points to Cover

  • Explicitly choosing a proven coordination service like ZooKeeper over building a custom consensus layer
  • Using ephemeral sequential nodes to handle automatic lock release on client failure
  • Explaining the logic of comparing sequence numbers to determine lock ownership fairly
  • Addressing the thundering herd problem through targeted notifications rather than broadcasts
  • Demonstrating understanding of CAP theorem trade-offs: a lock service must favor strict consistency, so during a partition the side without a quorum cannot grant locks and availability suffers
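
To make the thundering herd point concrete, here is a minimal sketch comparing the two notification strategies. The function names are illustrative assumptions, and the node names follow a hypothetical `lock-<sequence>` pattern rather than any specific service's output.

```python
from typing import Dict, List, Optional


def predecessor_watches(children: List[str]) -> Dict[str, Optional[str]]:
    """Map each lock node to the node it should watch (None = holds the lock)."""
    ordered = sorted(children)
    watches: Dict[str, Optional[str]] = {}
    for i, name in enumerate(ordered):
        # Each waiter watches only the node immediately ahead of it.
        watches[name] = ordered[i - 1] if i > 0 else None
    return watches


def notified_chained(children: List[str], deleted: str) -> List[str]:
    """Waiters woken when `deleted` disappears under predecessor watching."""
    watches = predecessor_watches(children)
    return [name for name, target in watches.items() if target == deleted]


def notified_broadcast(children: List[str], deleted: str) -> List[str]:
    """Waiters woken if every waiter watches the current holder instead."""
    ordered = sorted(children)
    return ordered[1:] if ordered and deleted == ordered[0] else []


queue = [f"lock-{i:010d}" for i in range(5)]
woken_chained = notified_chained(queue, queue[0])      # one waiter
woken_broadcast = notified_broadcast(queue, queue[0])  # all four waiters
```

With chained watches, a release wakes exactly one client. One subtlety worth mentioning: if a waiter in the middle of the queue disappears, its successor is notified even though the lock has not been released, so a woken client must re-read the children and re-check its position rather than assume it now holds the lock.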

Sample Answer

To design a distributed file locking system, I would first clarify that we need exclusive write locks and potentially shared read locks, with a strong guarantee of safety even under network partitions. Given the scale at…

Common Mistakes to Avoid

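  • Suggesting a naive timeout-based approach where clients check if a lock is stale, which fails during network partitions because a partitioned or paused holder may still believe it owns the lock after others have declared it stale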
  • Ignoring the difference between exclusive and shared locks, leading to potential data corruption scenarios
  • Failing to mention ephemeral nodes, resulting in deadlocks if the client process crashes unexpectedly
  • Overlooking the performance cost of having every waiting client watch the lock holder instead of chaining notifications
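
The difference between exclusive and shared locks can be sketched as a grant rule over sequenced nodes. This is a simplified model loosely based on the shared-lock recipe in ZooKeeper's documentation; the `read-` and `write-` prefixes and the `lock_decision` helper are assumptions made for illustration, not a client API.

```python
from typing import List, Optional, Tuple


def parse(name: str) -> Tuple[str, int]:
    """Split a node name such as 'write-0000000002' into (kind, sequence)."""
    kind, seq = name.rsplit("-", 1)
    return kind, int(seq)


def lock_decision(children: List[str], me: str) -> Optional[str]:
    """Return None if `me` holds its lock, otherwise the node it should watch."""
    my_kind, my_seq = parse(me)
    # Only nodes queued ahead of us can block us.
    ahead = sorted(
        (c for c in children if parse(c)[1] < my_seq),
        key=lambda c: parse(c)[1],
    )
    if my_kind == "read":
        # Readers share with other readers; only earlier writers block them.
        blockers = [c for c in ahead if parse(c)[0] == "write"]
    else:
        # A writer needs exclusivity, so every earlier node blocks it.
        blockers = ahead
    # Watch the closest blocker, which keeps notifications chained.
    return blockers[-1] if blockers else None


children = [
    "read-0000000000",
    "read-0000000001",
    "write-0000000002",
    "read-0000000003",
]
decisions = {name: lock_decision(children, name) for name in children}
```

In this example the first two readers hold the lock together, the writer waits on the second reader, and the last reader waits on the writer, so no reader ever overlaps with a writer.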
