Design a Twitter Feed (Conceptual Data Storage)
Describe the data structure required for a user's chronological Twitter/X feed, supporting billions of posts. Focus on the fan-out-on-write model using specialized storage like Redis or Cassandra.
Why Interviewers Ask This
Interviewers ask this to evaluate your ability to design scalable data systems that handle massive write throughput. Specifically, they test whether you understand the trade-offs between the fan-out-on-write and fan-out-on-read models when serving billions of users under low-latency requirements.
How to Answer This Question
1. Clarify constraints immediately: assume 500 million daily active users and strict read/write latency targets typical of Microsoft's scale.
2. Define the core problem: fetching a chronological feed without scanning the entire database for every request.
3. Propose the Fan-Out-On-Write model as the primary strategy, explaining how posts are pushed to follower caches upon creation.
4. Detail the storage architecture: use Redis for hot follower timelines due to sub-millisecond access speeds and Cassandra or HBase for persistent post storage.
5. Address edge cases like 'ghost followers' (users who follow but don't see new posts) and high-profile accounts with millions of followers; handle the latter by switching to fan-out-on-read for those accounts only.
6. Conclude by summarizing the consistency vs. availability trade-off inherent in this distributed design.
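The write path in steps 3 and 4 can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: the `FeedStore` class, its method names, and the `TIMELINE_CAP` value are hypothetical, a plain dict stands in for the persistent post store (Cassandra or HBase), and capped deques stand in for per-user Redis lists.

```python
from collections import defaultdict, deque

TIMELINE_CAP = 800  # assumed cap on post IDs kept per cached timeline


class FeedStore:
    """Toy fan-out-on-write store; every name here is illustrative."""

    def __init__(self):
        self.followers = defaultdict(set)  # author_id -> set of follower IDs
        self.posts = {}                    # post_id -> (author_id, text); stand-in for Cassandra/HBase
        # user_id -> newest-first post IDs; stand-in for a capped Redis list
        self.timelines = defaultdict(lambda: deque(maxlen=TIMELINE_CAP))
        self._next_id = 0

    def follow(self, follower_id, author_id):
        self.followers[author_id].add(follower_id)

    def publish(self, author_id, text):
        """Store the post once, then push only its ID to each follower's timeline."""
        self._next_id += 1
        post_id = self._next_id
        self.posts[post_id] = (author_id, text)
        for follower_id in self.followers[author_id]:
            # appendleft keeps newest first; a full deque drops its oldest entry
            self.timelines[follower_id].appendleft(post_id)
        return post_id

    def read_feed(self, user_id, limit=10):
        """Read the precomputed ID list, then look up the content for each ID."""
        post_ids = list(self.timelines[user_id])[:limit]
        return [(pid, *self.posts[pid]) for pid in post_ids]


store = FeedStore()
store.follow("bob", "alice")
store.publish("alice", "hello")
store.publish("alice", "second post")
print(store.read_feed("bob"))
```

The read path never scans the post store: it reads one short list and then fetches content by key, which is where the low read latency of this model comes from. The cost moves to `publish`, whose work grows with the author's follower count.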
Key Points to Cover
- Explicitly choosing Fan-Out-On-Write over Fan-Out-On-Read for standard users
- Justifying Redis usage for hot data caching to meet latency SLAs
- Identifying the celebrity account bottleneck and proposing a hybrid solution
- Explaining how to separate metadata from full content storage
- Acknowledging the trade-off between strong consistency and system scalability
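One way to picture the hybrid solution for celebrity accounts is a merge at read time. The sketch below assumes each timeline is a list of `(timestamp, post_id)` pairs already sorted newest first; `CELEBRITY_THRESHOLD`, `should_fan_out`, and `merged_feed` are hypothetical names, and the threshold is kept tiny for illustration.

```python
import heapq
from itertools import islice

CELEBRITY_THRESHOLD = 3  # assumed cutoff; a real system would pick a far larger value


def should_fan_out(follower_count, threshold=CELEBRITY_THRESHOLD):
    """Push on write for ordinary accounts; skip the push for celebrity accounts."""
    return follower_count < threshold


def merged_feed(cached_timeline, celebrity_timelines, limit=10):
    """Merge the precomputed timeline with celebrity posts pulled at read time.

    Every input is a list of (timestamp, post_id) sorted newest first.
    """
    merged = heapq.merge(cached_timeline, *celebrity_timelines, reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]


cached = [(105, "p5"), (101, "p1")]          # pushed by fan-out-on-write
celebrity = [[(110, "c2"), (103, "c1")]]     # pulled by fan-out-on-read
print(merged_feed(cached, celebrity, limit=3))  # ['c2', 'p5', 'c1']
```

Because the timelines carry only timestamps and post IDs, the merge stays cheap, and the full content is fetched afterwards from the persistent store. This is also a small instance of separating metadata from content.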
Sample Answer
To design a Twitter feed capable of handling billions of posts, I would prioritize the Fan-Out-On-Write model to ensure low-latency reads. When a user posts, we push that content ID into the pre-computed timeline lists o…
Common Mistakes to Avoid
- Suggesting a simple SQL join for every feed request, which would cause database collapse at scale
- Ignoring the 'celebrity problem' where one user has millions of followers
- Failing to distinguish between storing the tweet content versus just the reference ID
- Overlooking the need for sharding strategies to distribute the write load across servers
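As a rough illustration of the sharding point, the following sketch maps each follower's timeline to a shard with a stable hash and batches a fan-out by shard. The shard count and the function names `shard_for` and `group_by_shard` are assumptions made for this example, not part of any specific system.

```python
import hashlib

NUM_SHARDS = 16  # assumed number of timeline shards


def shard_for(user_id, num_shards=NUM_SHARDS):
    """Map a user ID to a shard with a hash that is stable across processes."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def group_by_shard(follower_ids, num_shards=NUM_SHARDS):
    """Batch follower IDs by shard so each shard receives a single write request."""
    batches = {}
    for follower_id in follower_ids:
        batches.setdefault(shard_for(follower_id, num_shards), []).append(follower_id)
    return batches


print(group_by_shard(["alice", "bob", "carol", "dave"]))
```

Plain modulo hashing is the simplest choice, but changing the shard count remaps most keys; consistent hashing is a common alternative when shards are added or removed often.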