Design a Chat System (WhatsApp/WeChat)
Design a large-scale 1-on-1 and group chat application. Focus on real-time messaging, message delivery guarantees, state management, and the choice between persistent WebSocket connections and alternatives such as HTTP polling.
Why Interviewers Ask This
Meta asks this to evaluate your ability to architect real-time systems handling millions of concurrent connections. They specifically test your understanding of WebSocket persistence, message delivery guarantees like at-least-once semantics, and how to manage state consistency across distributed servers for both private and group chats without data loss.
How to Answer This Question
1. Clarify requirements: Define scale (DAU), latency targets (sub-second), and features like read receipts or typing indicators.
2. Estimate capacity: Calculate messages per second and storage needs from explicitly stated assumptions about traffic at Meta's scale.
3. Design the API: Outline REST endpoints for initial sync and define the WebSocket protocol for real-time streams.
4. Core architecture: Propose a sharded microservices model where users connect to specific chat nodes based on user ID hashing.
5. Ensure reliability: Discuss message queues (Kafka) for durability, acknowledgment protocols, and conflict resolution for group edits.
6. Address scaling: Explain horizontal scaling strategies for connection managers and database sharding for message history.
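The capacity estimate in step 2 is simple arithmetic, and it helps to show the working. The sketch below is one way to do it. Every input (DAU, messages per user, message size, peak multiplier) is an assumed, illustrative figure, not a published number for any real service, and the `estimate` helper is hypothetical.

```python
# Back-of-envelope capacity estimate for a chat system.
# All inputs are assumptions chosen for illustration only.
SECONDS_PER_DAY = 86_400


def estimate(dau, messages_per_user, avg_message_bytes, peak_factor):
    """Return daily volume, average and peak throughput, and daily storage."""
    messages_per_day = dau * messages_per_user
    avg_per_sec = messages_per_day / SECONDS_PER_DAY
    return {
        "messages_per_day": messages_per_day,
        "avg_messages_per_sec": avg_per_sec,
        "peak_messages_per_sec": avg_per_sec * peak_factor,
        "storage_tb_per_day": messages_per_day * avg_message_bytes / 1e12,
    }


result = estimate(
    dau=500_000_000,        # assumed daily active users
    messages_per_user=40,   # assumed average sends per user per day
    avg_message_bytes=200,  # assumed payload plus metadata
    peak_factor=3,          # assumed peak-to-average ratio
)

for name, value in result.items():
    print(f"{name}: {value:,.0f}")
```

Under these assumptions the system handles 20 billion messages per day, roughly 231,000 messages per second on average and about 694,000 at peak, and writes about 4 TB of new message data per day before replication. In an interview, state the inputs out loud so the interviewer can adjust them.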
Key Points to Cover
- Explicitly choosing WebSockets over HTTP polling for low-latency bidirectional communication
- Persisting each message to a durable log such as Kafka before delivery, so that an accepted message survives a chat server failure
- Using consistent hashing to shard users across servers for efficient load distribution
- Defining an acknowledgment protocol to ensure at-least-once delivery semantics
- Addressing the complexity of fan-out optimization for group messaging scenarios
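The consistent hashing point above can be illustrated with a small hash ring. This is a minimal sketch, not a production design: the `HashRing` class, the node names, and the choice of 100 virtual nodes per server are all assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Maps user IDs to chat servers using consistent hashing."""

    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes  # virtual nodes per server (assumed value)
        self._keys = []        # sorted ring positions
        self._owner = {}       # ring position -> server name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        # First 8 bytes of SHA-256 give a stable 64-bit ring position.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node):
        for i in range(self._vnodes):
            pos = self._hash(f"{node}#{i}")
            if pos in self._owner:
                continue  # extremely unlikely collision: keep existing owner
            bisect.insort(self._keys, pos)
            self._owner[pos] = node

    def remove_node(self, node):
        for i in range(self._vnodes):
            pos = self._hash(f"{node}#{i}")
            if self._owner.get(pos) == node:
                del self._owner[pos]
                self._keys.pop(bisect.bisect_left(self._keys, pos))

    def node_for(self, user_id):
        if not self._keys:
            raise LookupError("ring has no nodes")
        # First ring position clockwise from the user's hash, wrapping around.
        idx = bisect.bisect_right(self._keys, self._hash(user_id))
        return self._owner[self._keys[idx % len(self._keys)]]


ring = HashRing(["chat-1", "chat-2", "chat-3"])
print(ring.node_for("user-42"))
```

The property worth mentioning in the interview is that adding a server only moves the users who now hash to that new server. Everyone else keeps their existing connection, which is what a plain `hash(user_id) % server_count` scheme fails to provide.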
Sample Answer
To design a scalable chat system like WhatsApp, I start by defining non-functional requirements: sub-50ms latency globally and 99.99% availability. First, we need an API gateway that handles authentication via OAuth, the…
Common Mistakes to Avoid
- Focusing only on database schema while ignoring the critical role of connection management and WebSocket heartbeats
- Proposing simple polling mechanisms that fail under the high concurrency demands of a platform at Meta's scale
- Neglecting to discuss message ordering and potential race conditions in group chat updates
- Overlooking the strategy for handling offline users and syncing message history upon reconnection
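The last two mistakes, ordering and offline sync, are easier to avoid with a concrete mechanism in mind. The sketch below is one possible approach, assuming a server-assigned sequence number per conversation and a client-generated message ID. The `ConversationLog`, `Client`, and `sync` names are hypothetical, and the in-memory list stands in for a durable store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    msg_id: str  # client-generated ID, lets the server drop retried sends
    seq: int     # server-assigned, increases by 1 within a conversation
    body: str


class ConversationLog:
    """Server side: append-only log for a single conversation."""

    def __init__(self):
        self._messages = []
        self._by_id = {}

    def append(self, msg_id, body):
        # A retried send with a known msg_id returns the original entry.
        if msg_id in self._by_id:
            return self._by_id[msg_id]
        msg = Message(msg_id, len(self._messages) + 1, body)
        self._messages.append(msg)
        self._by_id[msg_id] = msg
        return msg

    def since(self, last_acked_seq):
        # seq starts at 1, so the entry with seq n sits at index n - 1.
        return self._messages[last_acked_seq:]


class Client:
    """Receiver side: applies messages in order, ignores duplicates and gaps."""

    def __init__(self):
        self.last_acked_seq = 0
        self.inbox = []

    def receive(self, msg):
        # Only the next expected message is applied. Duplicates (seq too low)
        # and gaps (seq too high) leave state unchanged.
        if msg.seq == self.last_acked_seq + 1:
            self.inbox.append(msg.body)
            self.last_acked_seq = msg.seq
        return self.last_acked_seq  # the acknowledgment sent to the server


def sync(log, client):
    """On reconnect, replay everything after the client's last acknowledgment."""
    for msg in log.since(client.last_acked_seq):
        client.receive(msg)
    return client.last_acked_seq


log = ConversationLog()
first = log.append("m1", "hi")
log.append("m2", "there")

client = Client()
client.receive(first)      # delivered while online
log.append("m3", "bye")    # sent while the client is offline
sync(log, client)          # reconnect: server replays seq 2 and 3
print(client.inbox)
```

Because the server keeps resending until the client's acknowledged sequence number advances, delivery is at-least-once, and the sequence check on the client turns that into exactly-once application in order. A real design would also need to bound how far back the log is retained and decide how sequence numbers are assigned across shards.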