Design a Chat System (WhatsApp/WeChat)

System Design
Hard
Meta

Design a large-scale 1-on-1 and group chat application. Focus on real-time messaging, message delivery guarantees, state management, and the trade-offs between WebSockets and other long-lived connection options.

Why Interviewers Ask This

Meta asks this to evaluate your ability to architect real-time systems handling millions of concurrent connections. They specifically test your understanding of WebSocket persistence, message delivery guarantees like at-least-once semantics, and how to manage state consistency across distributed servers for both private and group chats without data loss.

How to Answer This Question

1. Clarify requirements: Define scale (DAU), latency targets (sub-second), and features like read receipts or typing indicators.
2. Estimate capacity: Calculate messages per second and storage needs using Meta's typical traffic patterns.
3. Design the API: Outline REST endpoints for initial sync and define the WebSocket protocol for real-time streams.
4. Architecture core: Propose a sharded microservices model where users connect to specific chat nodes based on user ID hashing.
5. Ensure reliability: Discuss message queues (Kafka) for durability, acknowledgment protocols, and conflict resolution for group edits.
6. Address scaling: Explain horizontal scaling strategies for connection managers and database sharding for message history.
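The capacity estimate in step 2 is simple arithmetic, and it helps to show the working. The sketch below uses made-up inputs (500 million DAU, 40 messages per user per day, 200 bytes per message, a 3x peak factor); these are illustrative assumptions, not real Meta figures, and the function name `estimate` is hypothetical.

```python
# Back-of-envelope capacity estimate.
# Every input below is an assumed, illustrative figure, not a real traffic number.
DAU = 500_000_000            # assumed daily active users
MSGS_PER_USER_PER_DAY = 40   # assumed average messages sent per user
AVG_MESSAGE_BYTES = 200      # assumed payload plus metadata per message
PEAK_FACTOR = 3              # assumed peak-to-average traffic ratio
SECONDS_PER_DAY = 86_400


def estimate(dau, msgs_per_user, msg_bytes, peak_factor):
    """Return daily volume, average/peak throughput, and daily storage."""
    messages_per_day = dau * msgs_per_user
    avg_msgs_per_sec = messages_per_day / SECONDS_PER_DAY
    return {
        "messages_per_day": messages_per_day,
        "avg_msgs_per_sec": avg_msgs_per_sec,
        "peak_msgs_per_sec": avg_msgs_per_sec * peak_factor,
        "storage_bytes_per_day": messages_per_day * msg_bytes,
    }


result = estimate(DAU, MSGS_PER_USER_PER_DAY, AVG_MESSAGE_BYTES, PEAK_FACTOR)
for name, value in result.items():
    print(f"{name}: {value:,.0f}")
```

With these assumed inputs the result is 20 billion messages per day, roughly 231K messages per second on average, and 4 TB of new message data per day before replication. In an interview, the point is to state the assumptions out loud and let the interviewer adjust them.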

Key Points to Cover

  • Explicitly choosing WebSockets over HTTP polling for low-latency bidirectional communication
  • Persisting each message to a durable log such as Kafka before attempting delivery, so that an accepted message is not lost
  • Using consistent hashing to shard users across servers for efficient load distribution
  • Defining an acknowledgment protocol to ensure at-least-once delivery semantics
  • Addressing the complexity of fan-out optimization for group messaging scenarios
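The consistent hashing point above can be made concrete with a small hash ring. This is a minimal sketch, assuming SHA-256 as the hash and 100 virtual nodes per server; the class name `HashRing` and the server names are hypothetical, and a production system would also need replication and membership management.

```python
import bisect
import hashlib


class HashRing:
    """Maps user IDs to chat servers using a ring of virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._keys = []    # sorted ring positions
        self._owners = {}  # ring position -> server name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        # First 8 bytes of SHA-256 give a stable 64-bit ring position.
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def add(self, node):
        for i in range(self._vnodes):
            pos = self._hash(f"{node}#{i}")
            if pos in self._owners:  # extremely unlikely collision: keep first owner
                continue
            bisect.insort(self._keys, pos)
            self._owners[pos] = node

    def remove(self, node):
        for i in range(self._vnodes):
            pos = self._hash(f"{node}#{i}")
            if self._owners.get(pos) == node:
                del self._owners[pos]
                del self._keys[bisect.bisect_left(self._keys, pos)]

    def node_for(self, user_id):
        if not self._keys:
            raise LookupError("ring is empty")
        # Walk clockwise to the first position after the user's hash, wrapping around.
        idx = bisect.bisect_right(self._keys, self._hash(user_id)) % len(self._keys)
        return self._owners[self._keys[idx]]


ring = HashRing(["chat-1", "chat-2", "chat-3"])
print(ring.node_for("user-42"))
```

The property worth calling out is that removing a server only remaps the users that were on it; everyone else keeps their existing connection node, which a plain `hash(user_id) % N` scheme does not guarantee.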

Sample Answer

To design a scalable chat system like WhatsApp, I start by defining non-functional requirements: sub-50ms latency globally and 99.99% availability. First, we need an API gateway that handles authentication via OAuth, the…

Common Mistakes to Avoid

  • Focusing only on database schema while ignoring the critical role of connection management and WebSocket heartbeats
  • Proposing simple polling mechanisms which fail under the high concurrency demands of a platform like Meta
  • Neglecting to discuss message ordering and potential race conditions in group chat updates
  • Overlooking the strategy for handling offline users and syncing message history upon reconnection
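The last two mistakes, ordering and offline sync, are easier to avoid with a concrete model in mind. The sketch below is one possible approach, assuming the server assigns a per-conversation sequence number and clients send cumulative acknowledgments; the names `Conversation`, `Client`, and `pending_for` are hypothetical, and the in-memory lists stand in for a durable message store.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    seq: int     # per-conversation sequence number assigned by the server
    sender: str
    body: str


@dataclass
class Conversation:
    """Server-side log for one chat; sequence numbers define the order."""
    log: list = field(default_factory=list)
    acked: dict = field(default_factory=dict)  # user -> highest acked seq

    def append(self, sender, body):
        msg = Message(seq=len(self.log) + 1, sender=sender, body=body)
        self.log.append(msg)
        return msg

    def pending_for(self, user):
        # Everything after the user's last ack is (re)sent on each reconnect.
        return self.log[self.acked.get(user, 0):]

    def ack(self, user, seq):
        self.acked[user] = max(self.acked.get(user, 0), seq)


@dataclass
class Client:
    user: str
    last_seen: int = 0
    inbox: list = field(default_factory=list)

    def receive(self, msg):
        """Apply the next expected message; return the cumulative ack."""
        if msg.seq == self.last_seen + 1:
            self.inbox.append(msg.body)
            self.last_seen = msg.seq
        # Duplicates and gaps are ignored; the returned ack tells the
        # server where to resume.
        return self.last_seen


conv = Conversation()
bob = Client("bob")
conv.append("alice", "hi")
conv.append("alice", "are you there?")

for msg in conv.pending_for("bob"):
    bob.receive(msg)
conv.ack("bob", 1)  # simulate: only the first ack reached the server

for msg in conv.pending_for("bob"):  # reconnect: server resends seq 2
    conv.ack("bob", bob.receive(msg))

print(bob.inbox)
```

The server delivers at least once because it resends anything unacknowledged, and the client turns that into effectively-once display by dropping sequence numbers it has already applied.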
