Design a Twitter Feed (Conceptual Data Storage)

Data Structures
Medium
Microsoft

Describe the data structure required for a user's chronological Twitter/X feed, supporting billions of posts. Focus on the fan-out-on-write model using specialized storage like Redis or Cassandra.

Why Interviewers Ask This

Interviewers use this question to evaluate whether you can design scalable data systems that handle massive write throughput. Specifically, they test whether you understand the trade-offs between the fan-out-on-write and fan-out-on-read models when serving hundreds of millions of users under low-latency requirements.

How to Answer This Question

1. Clarify constraints immediately: assume 500 million daily active users and strict read/write latency targets typical of Microsoft's scale.
2. Define the core problem: fetching a chronological feed without scanning the entire database for every request.
3. Propose the Fan-Out-On-Write model as the primary strategy, explaining how posts are pushed to follower caches upon creation.
4. Detail the storage architecture: use Redis for hot follower timelines due to sub-millisecond access speeds, and Cassandra or HBase for persistent post storage.
5. Address edge cases such as 'ghost followers' (users who follow but don't see new posts) and high-profile accounts with millions of followers, which you handle by switching to fan-out-on-read for those accounts only.
6. Conclude by summarizing the consistency vs. availability trade-off inherent in this distributed design.
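
The write and read paths from steps 3 and 4 can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: plain Python dicts stand in for Redis (timelines) and Cassandra or HBase (post store), and the names `FeedService`, `TIMELINE_CAP`, and the cap value of 800 are assumptions made for the example.

```python
from collections import defaultdict, deque
from itertools import count

TIMELINE_CAP = 800  # assumed cap on entries kept per cached timeline


class FeedService:
    """In-memory stand-in: dicts play the role of Redis and the post store."""

    def __init__(self, timeline_cap=TIMELINE_CAP):
        self._next_id = count(1)           # monotonically increasing post IDs
        self.posts = {}                    # post_id -> (author, text); persistent store stand-in
        self.followers = defaultdict(set)  # author -> set of followers
        # follower -> newest-first post IDs; a full deque drops its oldest entry
        self.timelines = defaultdict(lambda: deque(maxlen=timeline_cap))

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, text):
        post_id = next(self._next_id)
        self.posts[post_id] = (author, text)
        # Fan-out on write: push only the post ID to each follower's timeline.
        for follower in self.followers[author]:
            self.timelines[follower].appendleft(post_id)
        return post_id

    def timeline(self, user, limit=20):
        # Read path: fetch precomputed IDs, then hydrate them from the post store.
        ids = list(self.timelines[user])[:limit]
        return [self.posts[post_id] for post_id in ids]
```

The cost of a post grows with the author's follower count, while a read touches a single precomputed list. That asymmetry is what makes the celebrity case in step 5 a problem.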

Key Points to Cover

  • Explicitly choosing Fan-Out-On-Write over Fan-Out-On-Read for standard users
  • Justifying Redis usage for hot data caching to meet latency SLAs
  • Identifying the celebrity account bottleneck and proposing a hybrid solution
  • Explaining how to separate metadata from full content storage
  • Acknowledging the trade-off between strong consistency and system scalability
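
One way to picture the hybrid solution for celebrity accounts is a read-time merge: the user's cached timeline is combined with recent posts pulled directly from the celebrities they follow. The sketch below assumes post IDs increase with creation time, and the names `CELEBRITY_THRESHOLD`, `is_celebrity`, and `hybrid_timeline`, as well as the threshold value, are hypothetical.

```python
import heapq
from itertools import islice

CELEBRITY_THRESHOLD = 10_000  # assumed follower count above which fan-out is skipped


def is_celebrity(follower_count, threshold=CELEBRITY_THRESHOLD):
    """Decide whether an author's posts are pulled at read time instead of pushed."""
    return follower_count >= threshold


def hybrid_timeline(cached_ids, celebrity_post_ids, limit=20):
    """Merge a precomputed timeline with posts pulled from followed celebrities.

    cached_ids: newest-first post IDs already fanned out to this user.
    celebrity_post_ids: one newest-first list of post IDs per followed celebrity.
    Assumes a larger ID means a newer post.
    """
    # heapq.merge lazily merges already-sorted inputs; reverse=True expects
    # each input sorted from largest to smallest.
    merged = heapq.merge(cached_ids, *celebrity_post_ids, reverse=True)
    return list(islice(merged, limit))
```

Because the merge is lazy, only the first `limit` entries are consumed, so the extra read cost depends on how many celebrities a user follows rather than on how many followers each celebrity has.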

Sample Answer

To design a Twitter feed capable of handling billions of posts, I would prioritize the Fan-Out-On-Write model to ensure low-latency reads. When a user posts, we push that content ID into the pre-computed timeline lists o…

Common Mistakes to Avoid

  • Suggesting a simple SQL join for every feed request, which would cause database collapse at scale
  • Ignoring the 'celebrity problem' where one user has millions of followers
  • Failing to distinguish between storing the tweet content versus just the reference ID
  • Overlooking the need for sharding strategies to distribute the write load across servers
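
The last two points can be illustrated together: timelines carry only post IDs, and fan-out writes are grouped by the shard that owns each follower's timeline. This is a sketch under stated assumptions. The shard count, the modulo scheme, and the function names `shard_for` and `group_fanout_by_shard` are illustrative choices, not a description of any real deployment.

```python
import hashlib

NUM_SHARDS = 16  # assumed shard count, for illustration only


def shard_for(user_id, num_shards=NUM_SHARDS):
    """Map a user ID to a shard with a stable hash.

    Python's built-in hash() is salted per process for strings, so a digest
    is used to keep the mapping stable across servers and restarts.
    """
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def group_fanout_by_shard(follower_ids, post_id, num_shards=NUM_SHARDS):
    """Batch one post's fan-out writes by destination shard.

    Each timeline entry is just the post ID; the content is stored once elsewhere.
    """
    batches = {}
    for follower in follower_ids:
        shard = shard_for(follower, num_shards)
        batches.setdefault(shard, []).append((follower, post_id))
    return batches
```

Plain modulo sharding remaps most keys when the shard count changes. Consistent hashing is a common alternative when shards are added or removed often.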
