Design a Distributed Unique ID Generator (Snowflake)
Design a service to generate globally unique, monotonically increasing IDs (like Twitter's Snowflake). Discuss structure, time component, and high availability.
Why Interviewers Ask This
Interviewers at Google ask this to evaluate your ability to design scalable systems under strict constraints. They specifically test your understanding of clock synchronization, collision avoidance in distributed environments, and how to handle node failures without compromising ID uniqueness or monotonicity.
How to Answer This Question
1. Clarify Requirements: Define the scale (e.g., billions per day), latency needs, and whether IDs must be strictly monotonically increasing or just sortable. 2. Propose Structure: Outline a bit-field layout similar to Snowflake, allocating bits for timestamp, machine ID, and sequence number. 3. Address Time Handling: Discuss using high-resolution clocks and handling clock skew by falling back to sequence numbers if time doesn't advance. 4. Ensure Uniqueness: Explain how to generate unique worker IDs and handle sequence number rollover within a millisecond. 5. High Availability: Detail strategies like pre-allocating IDs or using a centralized service with replication for fault tolerance. 6. Trade-offs: Conclude by comparing your solution against alternatives like UUIDs or database auto-increments, highlighting why your approach fits Google's scale.
Key Points to Cover
- Explicitly defining the bit-field allocation strategy for timestamp, machine ID, and sequence
- Demonstrating a concrete strategy for handling clock skew and negative time deltas
- Explaining how to assign unique worker IDs without creating a single point of failure
- Discussing the trade-off between strict monotonicity and system availability during clock issues
- Calculating capacity based on the proposed bit distribution to prove scalability
Sample Answer
To design a globally unique, monotonically increasing ID generator, I would adapt the Snowflake algorithm while addressing its specific challenges. First, I'd define the requirements: we need roughly 10 billion IDs daily…
Common Mistakes to Avoid
- Ignoring clock synchronization issues entirely, assuming all servers have perfectly synchronized time
- Failing to explain how to prevent duplicate IDs when the sequence number overflows within a single millisecond
- Proposing a centralized database for every ID generation request, which creates a bottleneck and violates scalability goals
- Overlooking the need for a unique identifier per node, leading to potential collisions if multiple services run on the same host
Sound confident on this question in 5 minutes
Answer once and get a 30-second AI critique of your structure, content, and delivery. First attempt is free — no signup needed.