Design an Email Service (SMTP/Sending)

System Design
Medium
Meta

Design a system that can reliably send billions of emails. Focus on queueing, dealing with spam filters, bounce handling, and throttling.

Why Interviewers Ask This

Meta asks this to evaluate your ability to design highly available, distributed systems that handle massive scale. They specifically test your understanding of reliability patterns like idempotency and backoff strategies, not just the basics of the SMTP protocol. The goal is to see if you can balance throughput with deliverability while managing complex state transitions in a distributed environment.

How to Answer This Question

1. Clarify requirements immediately: Ask about daily volume, latency SLAs, and target bounce and spam-complaint rates so that 'billions' becomes a concrete number.
2. Define the high-level architecture: Sketch a flow from User -> API Gateway -> Message Queue (like Kafka) -> Worker Pool -> SMTP Servers.
3. Deep dive into the queue: Explain how partitioning queues by recipient domain keeps one slow or failing domain from blocking the rest, and how it preserves ordering where needed.
4. Address deliverability: Discuss handling bounces asynchronously via webhooks, implementing exponential backoff for retries of temporary failures, and managing IP reputation pools.
5. Discuss throttling and limits: Explain rate limiting strategies per recipient domain to avoid blacklisting, and tie this to Meta's need for globally consistent limits.
6. Conclude with monitoring: Highlight tracking key metrics like delivery rate, bounce rate, and queue depth to ensure system health.
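
The partitioning idea in step 3 can be sketched as follows. This is a minimal illustration, not a prescribed design: the partition count and the function names `recipient_domain` and `partition_for` are assumptions made for the example. It derives a stable partition from a hash of the recipient's domain, so all mail for one domain lands on the same partition.

```python
import hashlib

NUM_PARTITIONS = 64  # assumed partition count, for illustration only


def recipient_domain(address: str) -> str:
    """Return the lowercased domain part of an email address."""
    if "@" not in address:
        raise ValueError(f"not an email address: {address!r}")
    domain = address.rpartition("@")[2]
    if not domain:
        raise ValueError(f"missing domain: {address!r}")
    return domain.lower()


def partition_for(address: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a recipient to a partition using a stable hash of its domain.

    A cryptographic hash is used only because it is stable across
    processes and restarts; Python's built-in hash() is not.
    """
    digest = hashlib.sha256(recipient_domain(address).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


if __name__ == "__main__":
    # Both addresses share a domain, so they map to the same partition.
    print(partition_for("alice@example.com") == partition_for("bob@EXAMPLE.com"))
```

One trade-off worth raising in the interview: a very large recipient domain would concentrate on a single partition under this scheme, so a real design would likely need to sub-shard such domains.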

Key Points to Cover

  • Explicitly mention partitioning the message queue by recipient domain to isolate failures
  • Describe an asynchronous bounce handling mechanism with exponential backoff for retries
  • Explain the separation of IP pools to protect overall sender reputation
  • Detail the implementation of idempotency keys to prevent duplicate email sends
  • Highlight real-time monitoring of queue depth and delivery metrics for operational visibility
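
To make the bounce handling and backoff point concrete, here is a minimal sketch. It assumes the usual SMTP convention that 4xx replies are temporary failures and 5xx replies are permanent ones. The delay constants, the attempt limit, and the names `classify`, `next_retry_delay`, and `handle_result` are hypothetical choices for the example, and the jitter strategy (a uniform random delay up to the exponential cap) is one common option rather than the only one.

```python
import random
from typing import Optional, Tuple

BASE_DELAY_S = 60.0        # assumed first retry ceiling: 1 minute
MAX_DELAY_S = 6 * 3600.0   # assumed ceiling on any single delay: 6 hours
MAX_ATTEMPTS = 8           # assumed limit before giving up


def classify(smtp_code: int) -> str:
    """Classify an SMTP reply code by its first digit."""
    if 200 <= smtp_code < 300:
        return "delivered"
    if 400 <= smtp_code < 500:
        return "soft_bounce"   # temporary failure, worth retrying
    if 500 <= smtp_code < 600:
        return "hard_bounce"   # permanent failure, do not retry
    return "unknown"


def next_retry_delay(attempt: int) -> Optional[float]:
    """Return a jittered delay in seconds, or None when attempts are exhausted.

    `attempt` is the 1-based number of attempts made so far.
    """
    if attempt >= MAX_ATTEMPTS:
        return None
    cap = min(MAX_DELAY_S, BASE_DELAY_S * 2 ** (attempt - 1))
    return random.uniform(0.0, cap)


def handle_result(smtp_code: int, attempt: int) -> Tuple[str, Optional[float]]:
    """Decide what to do with a message after a delivery attempt."""
    outcome = classify(smtp_code)
    if outcome == "delivered":
        return ("done", None)
    if outcome == "hard_bounce":
        return ("suppress", None)  # e.g. add the address to a suppression list
    delay = next_retry_delay(attempt)  # soft bounce or unknown reply
    if delay is None:
        return ("give_up", None)
    return ("retry", delay)
```

In a queue-based design, the `("retry", delay)` result would typically translate into re-enqueueing the message with a not-before timestamp, so workers never block while waiting.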

Sample Answer

To design an email service capable of sending billions of emails reliably, I would start by defining the core components: an ingestion API, a distributed message queue, a worker pool, and a reputation management system.…
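
As one illustration of the ingestion API named above, the sketch below shows how an idempotency key could stop a retried request from enqueueing the same email twice. Everything here is hypothetical: `InMemoryIdempotencyStore` stands in for a shared store with an atomic set-if-absent operation, a plain list stands in for the message queue, and `client_request_id` is an assumed field supplied by the caller.

```python
import hashlib
from typing import List, Set, Tuple


class InMemoryIdempotencyStore:
    """Stand-in for a shared store offering atomic set-if-absent."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def put_if_absent(self, key: str) -> bool:
        """Record the key; return True only if it was not seen before."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


def idempotency_key(sender: str, recipient: str, client_request_id: str) -> str:
    """Derive a stable key from the fields that identify one logical send."""
    raw = "\x1f".join([sender, recipient, client_request_id])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def accept_send(
    store: InMemoryIdempotencyStore,
    queue: List[dict],
    sender: str,
    recipient: str,
    client_request_id: str,
    body: str,
) -> Tuple[str, bool]:
    """Enqueue the message once; repeated calls return the same message id."""
    key = idempotency_key(sender, recipient, client_request_id)
    message_id = "msg-" + key[:16]
    is_new = store.put_if_absent(key)
    if is_new:
        queue.append({"id": message_id, "to": recipient, "body": body})
    return message_id, is_new
```

Deriving the message id from the key means a client that retries after a timeout gets back the same id, whether or not its first request succeeded.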

Common Mistakes to Avoid

  • Focusing only on the SMTP protocol details without addressing the distributed system architecture required for scale
  • Ignoring the critical need for asynchronous processing when handling bounces and hard errors
  • Overlooking the importance of rate limiting per domain, which is essential to avoid IP blacklisting
  • Forgetting to discuss idempotency, leading to potential duplicate emails sent during network failures
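
The per-domain rate limiting mentioned above is often implemented with a token bucket per recipient domain. The following is a single-process sketch under stated assumptions: the default rate and burst values are invented for the example, the class names `TokenBucket` and `DomainThrottle` are hypothetical, and a real deployment would need shared state so that limits hold across all workers.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Refills at `rate` tokens per second up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = now

    def try_acquire(self, now: float) -> bool:
        """Take one token if available, refilling for the elapsed time first."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class DomainThrottle:
    """Keeps one token bucket per recipient domain."""

    def __init__(
        self,
        default_rate: float = 10.0,    # assumed: 10 messages/second per domain
        default_burst: float = 20.0,   # assumed burst allowance
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._default_rate = default_rate
        self._default_burst = default_burst
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, domain: str) -> bool:
        """Return True if a send to `domain` may proceed right now."""
        now = self._clock()
        key = domain.lower()
        bucket = self._buckets.get(key)
        if bucket is None:
            bucket = TokenBucket(self._default_rate, self._default_burst, now)
            self._buckets[key] = bucket
        return bucket.try_acquire(now)
```

A worker would call `allow(domain)` before opening an SMTP connection and, when it returns False, put the message back on the queue with a short delay instead of sending.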
