Design the Twitter News Feed
Design the system that generates the news feed for Twitter/X. Focus on the fan-out mechanism (push vs. pull), feed ranking, and handling celebrity users (hot spots).
Why Interviewers Ask This
Interviewers ask this to evaluate your ability to balance scalability with real-time performance in high-traffic systems. They specifically test your understanding of fan-out patterns, how to handle hot spots like celebrity users without crashing the system, and your capacity to prioritize trade-offs between consistency and availability.
How to Answer This Question
1. Clarify Requirements: Define scope (read vs. write QPS), latency goals, and scale (e.g., 500M daily active users).
2. High-Level Architecture: Sketch a flow from Tweet creation to Feed retrieval, identifying core components like APIs, databases, and caches.
3. Fan-Out Strategy: Debate Push vs. Pull models; recommend Hybrid for Twitter's specific mix of casual and celebrity users.
4. Hot Spot Handling: Detail how to isolate celebrity feeds using pre-computation or specialized queues to prevent cache stampedes.
5. Ranking & Refinement: Briefly explain how to integrate ML-based ranking logic post-retrieval.
6. Trade-offs: Conclude by discussing consistency, storage costs, and failure scenarios.
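The hybrid fan-out in step 3 can be made concrete with a small in-memory sketch. It is an illustration under stated assumptions, not Twitter's actual implementation: the class name `FeedService`, the `CELEBRITY_THRESHOLD` and `TIMELINE_CAP` values, and the logical clock are all hypothetical. Regular authors push each tweet into their followers' precomputed timelines at write time; authors above the follower threshold only append to their own list, and readers pull and merge those tweets at read time.

```python
import heapq
from collections import defaultdict, deque

CELEBRITY_THRESHOLD = 3  # hypothetical cutoff; a real system would use a far larger value
TIMELINE_CAP = 800       # hypothetical bound on each precomputed timeline


class FeedService:
    """Toy hybrid fan-out: push for regular authors, pull for celebrities."""

    def __init__(self):
        self.followers = defaultdict(set)   # author -> users following them
        self.following = defaultdict(set)   # user -> authors they follow
        # Precomputed timelines, newest first; the oldest entry drops off when full.
        self.timelines = defaultdict(lambda: deque(maxlen=TIMELINE_CAP))
        self.pull_store = {}                # author -> tweets that were not fanned out
        self.clock = 0                      # logical timestamp, assumed for the sketch

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def post(self, author, text):
        self.clock += 1
        tweet = (self.clock, author, text)
        if len(self.followers[author]) >= CELEBRITY_THRESHOLD:
            # Write path for celebrities: one append, no per-follower writes.
            self.pull_store.setdefault(author, []).append(tweet)
        else:
            # Write path for regular users: fan out to every follower's timeline.
            for follower in self.followers[author]:
                self.timelines[follower].appendleft(tweet)
        return tweet

    def feed(self, user, limit=10):
        # Read path: merge the precomputed timeline with tweets pulled on demand.
        pulled = [
            tweet
            for author in self.following[user]
            for tweet in self.pull_store.get(author, [])
        ]
        return heapq.nlargest(limit, list(self.timelines[user]) + pulled)
```

Each tweet lives in exactly one place, so the merge needs no de-duplication. A ranking stage (step 5) would replace the timestamp ordering in `feed` with a scoring function applied to the merged candidates.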
Key Points to Cover
- Propose a Hybrid Fan-Out strategy to balance load between push and pull
- Explicitly address the 'Celebrity Hot Spot' problem with isolation techniques
- Differentiate between write optimization and read optimization paths
- Demonstrate awareness of caching layers (Redis/Memcached) for low latency
- Articulate clear trade-offs regarding data consistency versus availability
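To illustrate the caching point above, here is a minimal cache-aside sketch for the read path. It assumes an in-memory dictionary as a stand-in for Redis or Memcached; `TimelineCache`, `read_feed`, and the `load_from_store` callback are hypothetical names, and the TTL is an arbitrary example value. A short TTL is one simple way to trade freshness (consistency) for availability and low read latency.

```python
import time


class TimelineCache:
    """In-memory stand-in for a Redis/Memcached feed cache with per-entry TTL."""

    def __init__(self, ttl_seconds, now=time.monotonic):
        self.ttl = ttl_seconds
        self.now = now      # injectable clock, which keeps the sketch testable
        self.store = {}     # user_id -> (expires_at, feed)

    def get(self, user_id):
        entry = self.store.get(user_id)
        if entry is None or entry[0] <= self.now():
            self.store.pop(user_id, None)   # drop missing or expired entries
            return None
        return entry[1]

    def set(self, user_id, feed):
        self.store[user_id] = (self.now() + self.ttl, feed)


def read_feed(user_id, cache, load_from_store):
    """Cache-aside read: serve from cache, fall back to the slower store on a miss."""
    feed = cache.get(user_id)
    if feed is None:
        feed = load_from_store(user_id)     # e.g. rebuild from the timeline database
        cache.set(user_id, feed)
    return feed
```

A production design would also need to guard against many concurrent misses for the same key, for example by letting a single request rebuild the entry while the others wait.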
Sample Answer
To design Twitter's feed, I first clarify that we need sub-second latency for billions of reads while handling massive write spikes. The core challenge is the fan-out mechanism. A pure pull model is too slow for millions…
Common Mistakes to Avoid
- Ignoring the scale difference between regular users and celebrities, leading to an inefficient all-push or all-pull solution
- Focusing solely on database schema without explaining the real-time data propagation mechanism
- Overlooking the impact of network latency when aggregating feeds from multiple sources
- Failing to define clear metrics for success, such as tail latency requirements or throughput targets
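As an example of stating a success metric precisely, the sketch below checks a tail-latency target against measured samples using the nearest-rank percentile. The function names and the sample values are hypothetical, and the 200 ms target is only an example, not a figure from any real service.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def meets_slo(samples_ms, p, target_ms):
    """True if the p-th percentile latency is within the target."""
    return percentile(samples_ms, p) <= target_ms
```

For instance, `meets_slo([10, 20, 30, 400], 99, 200)` is false because the single slow request dominates the tail, while the same samples pass a median (p50) check. This is why an answer should name the percentile, not just an average.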