Design a Distributed Semaphore

System Design
Hard
Stripe

Design a system that acts as a distributed lock or semaphore to control access to a shared resource across multiple servers. Discuss using ZooKeeper or Redis.

Why Interviewers Ask This

Interviewers at Stripe ask this to evaluate your ability to handle distributed consensus and race conditions in high-throughput financial systems. They specifically test your understanding of eventual consistency, leader election, and failure scenarios like network partitions or node crashes. The goal is to see if you can design a solution that guarantees mutual exclusion without creating a single point of failure.

How to Answer This Question

1. Clarify Requirements: Define the semaphore's capacity (e.g., allow N concurrent holders), the latency constraints typical of payment processing, and durability needs.
2. Choose a Consensus Model: Decide between Redis with Lua scripts for speed and ZooKeeper for strong consistency and watch mechanisms.
3. Design the Core Logic: Explain how clients acquire a permit by creating an ephemeral node (ZooKeeper) or incrementing a counter (Redis), and how the permit is released on explicit unlock or when its timeout expires.
4. Handle Failures: Detail how locks held by crashed clients are auto-released via TTLs or heartbeat mechanisms so they cannot cause deadlocks.
5. Discuss Trade-offs: Compare the CAP theorem implications of your choice, weighing availability at Stripe's global scale against strict consistency for transaction integrity.
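The core logic and failure handling in steps 3 and 4 can be sketched as a counting semaphore built from expiring leases. This is a minimal in-memory simulation, not a Redis client: the class name `LeaseSemaphore` and its methods are hypothetical, and the local mutex stands in for the atomicity that a Lua script would give you in Redis. The comments map each step to the sorted-set commands (`ZREMRANGEBYSCORE`, `ZCARD`, `ZADD`) that one common Redis variant of this pattern uses.

```python
import threading
import time
import uuid
from typing import Callable, Dict, Optional


class LeaseSemaphore:
    """Hypothetical in-memory stand-in for a Redis-backed counting semaphore.

    Each permit is a lease mapping holder_id -> last heartbeat time. Leases older
    than `ttl` seconds are treated as abandoned by a crashed client and purged.
    The internal mutex stands in for the atomicity a Lua script gives in Redis.
    """

    def __init__(self, capacity: int, ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity  # N: maximum concurrent holders (N = 1 is a plain lock)
        self.ttl = ttl
        self.clock = clock
        self._leases: Dict[str, float] = {}
        self._mutex = threading.Lock()

    def _purge_expired(self, now: float) -> None:
        # Analogous to ZREMRANGEBYSCORE key -inf (now - ttl) on a sorted set.
        cutoff = now - self.ttl
        for holder in [h for h, t in self._leases.items() if t <= cutoff]:
            del self._leases[holder]

    def try_acquire(self) -> Optional[str]:
        """Return a holder id if a permit is free, else None (caller backs off and retries)."""
        with self._mutex:
            now = self.clock()
            self._purge_expired(now)
            if len(self._leases) >= self.capacity:  # analogous to ZCARD >= N
                return None
            holder = str(uuid.uuid4())
            self._leases[holder] = now  # analogous to ZADD key now holder
            return holder

    def renew(self, holder: str) -> bool:
        """Heartbeat: extend a live lease; False means the permit was already lost."""
        with self._mutex:
            now = self.clock()
            self._purge_expired(now)
            if holder not in self._leases:
                return False
            self._leases[holder] = now
            return True

    def release(self, holder: str) -> None:
        """Explicit unlock; releasing an expired or unknown holder is a no-op."""
        with self._mutex:
            self._leases.pop(holder, None)


# Usage with a fake clock so lease expiry is deterministic.
now = [0.0]
sem = LeaseSemaphore(capacity=2, ttl=10.0, clock=lambda: now[0])
a = sem.try_acquire()          # permit 1
b = sem.try_acquire()          # permit 2
print(sem.try_acquire())       # None: both permits are held
now[0] = 5.0
sem.renew(b)                   # b heartbeats; a has crashed and never renews
now[0] = 11.0                  # a's lease (taken at t=0) is now past its TTL
c = sem.try_acquire()          # succeeds because a's lease is purged
print(c is not None, sem.renew(a))  # True False
```

Setting `capacity=1` turns the same structure into a mutual-exclusion lock. The TTL is the design knob: too short and slow but healthy clients lose their permit; too long and a crashed client blocks others for that long.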

Key Points to Cover

  • Explicitly defining the difference between a lock (binary) and a semaphore (count-based) before starting
  • Demonstrating knowledge of ephemeral nodes and sequential ordering in ZooKeeper
  • Addressing the deadlock scenario where a crashed client holds a lock forever
  • Discussing the trade-off between strong consistency (ZooKeeper) and high throughput (Redis)
  • Explaining how to handle network partitions and split-brain scenarios, where two clients each believe they hold the same permit
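The ZooKeeper points above (ephemeral nodes, sequential ordering, auto-release when a client crashes, and watches instead of polling) can be illustrated with a toy model. This sketch does not use a real ZooKeeper client: `FakeZooKeeper` and `ZkSemaphoreClient` are hypothetical names, and the model only imitates the behaviors the recipe relies on. Each client creates an ephemeral sequential node; the first N nodes in sequence order hold permits, and everyone else registers a one-shot child watch and re-checks its rank when the children change.

```python
import itertools
from typing import Callable, Dict, List, Optional


class FakeZooKeeper:
    """Toy model of ephemeral sequential nodes and one-shot child watches.
    No networking, no replication: it only imitates the semantics the recipe needs."""

    def __init__(self) -> None:
        self._counter = itertools.count()
        self._nodes: Dict[str, str] = {}  # node name -> owning session
        self._watchers: List[Callable[[], None]] = []

    def create_ephemeral_sequential(self, prefix: str, session: str) -> str:
        name = f"{prefix}{next(self._counter):010d}"  # zero-padded so sorting = creation order
        self._nodes[name] = session
        self._fire_watches()
        return name

    def get_children(self, watch: Optional[Callable[[], None]] = None) -> List[str]:
        if watch is not None:
            self._watchers.append(watch)  # fires once on the next change, like a ZooKeeper watch
        return sorted(self._nodes)

    def delete(self, name: str) -> None:
        if self._nodes.pop(name, None) is not None:
            self._fire_watches()

    def expire_session(self, session: str) -> None:
        # A crashed client's session times out and its ephemeral nodes are removed.
        for name in [n for n, s in self._nodes.items() if s == session]:
            self.delete(name)

    def _fire_watches(self) -> None:
        watchers, self._watchers = self._watchers, []
        for w in watchers:
            w()


class ZkSemaphoreClient:
    def __init__(self, zk: FakeZooKeeper, session: str, capacity: int) -> None:
        self.zk = zk
        self.session = session
        self.capacity = capacity  # N permits
        self.node: Optional[str] = None
        self.holds_permit = False

    def acquire(self) -> None:
        self.node = self.zk.create_ephemeral_sequential("permit-", self.session)
        self._check()

    def _check(self) -> None:
        if self.node is None or self.holds_permit:
            return
        # Read children and re-arm the watch in one call, so no change is missed.
        children = self.zk.get_children(watch=self._check)
        if self.node not in children:  # our own node vanished (session expired)
            return
        if children.index(self.node) < self.capacity:
            self.holds_permit = True  # among the first N nodes: permit granted

    def release(self) -> None:
        if self.node is not None:
            node, self.node = self.node, None
            self.holds_permit = False
            self.zk.delete(node)


zk = FakeZooKeeper()
a = ZkSemaphoreClient(zk, session="s1", capacity=2)
b = ZkSemaphoreClient(zk, session="s2", capacity=2)
c = ZkSemaphoreClient(zk, session="s3", capacity=2)
for client in (a, b, c):
    client.acquire()
print(a.holds_permit, b.holds_permit, c.holds_permit)  # True True False
zk.expire_session("s1")  # client a crashes; its ephemeral node disappears
print(c.holds_permit)    # True: the watch fired and c re-checked its rank
```

Watching the whole child list is simple and correct but wakes every waiter on each change (the herd effect); production recipes narrow what each waiter watches. Note also that `a.holds_permit` is still `True` after its session expired: a real client must treat session loss as losing the permit, which is exactly the split-brain risk listed above.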

Sample Answer

To design a distributed semaphore for a system like Stripe, I would first clarify the requirements. We need to limit access to a shared resource, such as a database connection pool or a specific API endpoint, across thou…

Common Mistakes to Avoid

  • Ignoring the need for automatic lock expiration, leading to permanent deadlocks if a client crashes
  • Focusing solely on the happy path and failing to discuss what happens during a network partition
  • Proposing a central coordinator that acts as a single point of failure without a failover strategy
  • Confusing mutual exclusion locks with semaphores by forgetting to implement the counting logic
  • Overlooking the performance cost of polling instead of using efficient watch/notification mechanisms
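The network-partition mistake above has a concrete failure mode: a client pauses or is partitioned, its TTL expires, another client is granted the permit, and then the first client wakes up and writes anyway. One widely used mitigation, fencing tokens (a technique not named in the original list), has the coordinator attach a strictly increasing number to every grant and the protected resource reject writes carrying an older number. The sketch below is illustrative only; `TokenIssuer` and `FencedStore` are hypothetical, and a plain in-process counter is assumed as the token source.

```python
import itertools
from typing import Dict


class TokenIssuer:
    """Hypothetical lock service that stamps each grant with a strictly increasing
    fencing token. A plain counter is assumed here; with ZooKeeper, the zxid is
    sometimes suggested for the same purpose."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def grant(self) -> int:
        return next(self._counter)


class FencedStore:
    """Hypothetical protected resource that enforces fencing itself: it rejects
    any write carrying a token older than the newest one it has accepted."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.data: Dict[str, str] = {}

    def write(self, token: int, key: str, value: str) -> bool:
        if token < self.highest_token:
            return False  # stale holder whose lease already expired: refuse the write
        self.highest_token = token
        self.data[key] = value
        return True


issuer = TokenIssuer()
store = FencedStore()
token_a = issuer.grant()  # client A acquires the lock (token 1)
# A is partitioned away or paused, and its lease expires.
token_b = issuer.grant()  # client B acquires the lock (token 2)
print(store.write(token_b, "balance", "90"))   # True
print(store.write(token_a, "balance", "100"))  # False: A's token is stale
print(store.data)                              # {'balance': '90'}
```

The key design choice is that the check lives in the resource, not in the client: a client cannot reliably know that its own lease has expired, but the resource can always compare numbers.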
