Design a System for A/B Testing
Design a service that allows feature toggles to route users into different experimental groups (A/B testing). Focus on user bucketing, state persistence, and analysis integration.
Why Interviewers Ask This
Interviewers at Google ask this to evaluate your ability to design distributed systems with strict consistency and low-latency requirements. They specifically assess how you handle user bucketing logic, ensure feature flags remain consistent across sessions, and manage the trade-offs between availability and correctness in a high-scale environment.
How to Answer This Question
1. Clarify Requirements: Define scale (requests per second), latency constraints, and whether experiments are short-term or long-running. Ask about consistency needs for bucketing.
2. High-Level Design: Propose a microservices architecture with an API Gateway, a Bucketing Service, and a Configuration Store.
3. Detail User Bucketing: Explain how hashing the user ID together with the experiment ID (with a function like MurmurHash) deterministically assigns each user to a variant, so they stay in the same group without any per-user lookup.
4. Address State Persistence: Discuss storing experiment configurations in a low-latency key-value store like Redis or a strongly consistent distributed database like Spanner, and explain the trade-off between eventual and strong consistency.
5. Data Pipeline: Outline how clickstream data flows to a warehouse for analysis, mentioning schema design for experiment metadata.
6. Edge Cases: Cover handling new users, traffic shifting, and rollback mechanisms.
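The bucketing step above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it uses SHA-256 from Python's standard library as a stand-in for MurmurHash (which needs a third-party package), and the names `bucket`, `assign_variant`, and the 10,000-bucket granularity are assumptions made for the example.

```python
import hashlib

NUM_BUCKETS = 10_000  # assumed granularity: 0.01% of traffic per bucket


def bucket(user_id: str, experiment_id: str, buckets: int = NUM_BUCKETS) -> int:
    """Map (experiment, user) to a stable bucket in [0, buckets).

    Including the experiment ID in the key keeps assignments in one
    experiment independent of assignments in another.
    """
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()  # stand-in for MurmurHash
    return int.from_bytes(digest[:8], "big") % buckets


def assign_variant(user_id: str, experiment_id: str, weights: list) -> str:
    """Pick a variant from (name, percent) pairs whose percents sum to 100."""
    b = bucket(user_id, experiment_id)
    threshold = 0
    for name, percent in weights:
        threshold += percent * (NUM_BUCKETS // 100)
        if b < threshold:
            return name
    return weights[-1][0]  # defensive default if percents sum to under 100


weights = [("control", 50), ("treatment", 50)]
variant = assign_variant("user-42", "new-checkout", weights)
```

Because the assignment is a pure function of the two IDs, any server can compute it without shared state, which is what keeps a user in the same group across sessions and machines.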
Key Points to Cover
- Use deterministic hashing algorithms like MurmurHash to guarantee user consistency across servers
- Leverage distributed databases like Spanner or sharded Redis for low-latency state persistence
- Implement asynchronous logging pipelines (e.g., Pub/Sub to BigQuery) for scalable data collection
- Define clear fallback strategies to maintain service availability during infrastructure failures
- Address traffic shifting and immediate rollback capabilities for real-time experiment management
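As a rough sketch of the asynchronous logging point, the example below keeps exposure events off the request path with an in-process bounded queue and a background worker. The class `ExposureLogger`, its `sink` callback, and the drop-when-full policy are assumptions for illustration; in a real deployment the sink would publish to a message bus such as Pub/Sub rather than call a local function.

```python
import queue
import threading


class ExposureLogger:
    """Hypothetical non-blocking logger: requests enqueue, a worker delivers."""

    def __init__(self, sink, max_pending: int = 1000):
        self._queue = queue.Queue(maxsize=max_pending)
        self._sink = sink          # callable that ships one event downstream
        self.dropped = 0           # events shed because the queue was full
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def log(self, event: dict) -> bool:
        """Enqueue without blocking; drop the event if the queue is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1      # prefer losing an event to slowing a request
            return False

    def _run(self) -> None:
        while True:
            event = self._queue.get()
            try:
                self._sink(event)
            except Exception:
                pass               # a real pipeline would retry or dead-letter
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        """Block until every queued event has been handed to the sink."""
        self._queue.join()


received = []
logger = ExposureLogger(sink=received.append)
logger.log({"user_id": "user-42", "experiment": "new-checkout", "variant": "treatment"})
logger.flush()
```

The design choice worth calling out in an interview is the failure mode: when the pipeline backs up, this sketch sheds analytics events instead of adding latency to flag resolution, and the `dropped` counter makes that loss visible.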
Sample Answer
To design an A/B testing system, I'd start by clarifying that we need to support millions of daily active users with sub-10ms latency for flag resolution. The core component is the Bucketing Service. When a request arriv…
Common Mistakes to Avoid
- Ignoring the requirement for deterministic bucketing, leading to users seeing different variants on subsequent visits
- Focusing only on the UI implementation while neglecting the backend data pipeline for statistical analysis
- Proposing synchronous writes for analytics events, which would create unacceptable latency bottlenecks
- Failing to discuss how to handle edge cases like sudden traffic spikes or partial rollout failures
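The partial-rollout and rollback cases in the list above can be made concrete with a small sketch. Everything here is an assumption for illustration: the `Experiment` fields, the `"default"` label for users outside the experiment, and SHA-256 standing in for MurmurHash. The idea is that exposure is decided by one hash and the variant by a second, so raising the exposure percentage only adds users and never reshuffles existing ones, while a kill switch or missing config falls back to the default experience.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Experiment:
    experiment_id: str
    exposure_percent: int   # 0..100, share of users enrolled in the experiment
    enabled: bool = True    # kill switch for immediate rollback


def _bucket(key: str, buckets: int) -> int:
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def resolve(exp: Optional[Experiment], user_id: str) -> str:
    """Return 'default', 'control', or 'treatment' for this user."""
    # Fallback: missing config (e.g. store unreachable) or kill switch on.
    if exp is None or not exp.enabled:
        return "default"
    # Exposure gate: a user enrolled at 10% is still enrolled at 50%.
    if _bucket(f"{exp.experiment_id}:exposure:{user_id}", 100) >= exp.exposure_percent:
        return "default"
    # Variant split uses a separate salt so it is independent of the ramp.
    arm = _bucket(f"{exp.experiment_id}:variant:{user_id}", 2)
    return "treatment" if arm == 1 else "control"


ramped = Experiment("new-checkout", exposure_percent=10)
rolled_back = Experiment("new-checkout", exposure_percent=10, enabled=False)
```

Users who resolve to `"default"` are outside the experiment and would be excluded from analysis, which keeps the control group comparable to the treatment group as traffic ramps up.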