Design a System to Handle Schema Migration

System Design
Medium
LinkedIn
61.8K views

Design a robust process for executing non-blocking database schema migrations on a live production system. Discuss blue/green migration strategies and rollback plans.

Why Interviewers Ask This

Interviewers ask this to evaluate your ability to balance data integrity with service availability in high-scale environments like LinkedIn's feed. They assess whether you understand the risks of locking tables during traffic spikes and if you can architect a strategy that allows zero-downtime updates while maintaining backward compatibility between old and new application versions.

How to Answer This Question

1. Clarify constraints: Ask about current traffic volume, database type (e.g., PostgreSQL vs. Cassandra), and acceptable downtime thresholds typical for LinkedIn's scale. 2. Define the core challenge: Explain that direct ALTER TABLE commands block writes, which is unacceptable for a live social network. 3. Propose the two-phase approach: First, add the column as nullable without migrating existing data. Second, run a background job to backfill data asynchronously. 4. Discuss versioning: Describe how to deploy code that reads both old and new columns simultaneously to ensure smooth transitions. 5. Detail rollback and monitoring: Outline how to revert code changes quickly if errors occur and emphasize using feature flags to toggle migrations safely.

Key Points to Cover

  • Explicitly rejecting single-command ALTER TABLE due to lock risks
  • Implementing a dual-write pattern for backward compatibility
  • Using asynchronous backfill jobs to avoid performance degradation
  • Designing a reversible rollback mechanism via feature flags
  • Leveraging blue/green deployment for safe traffic shifting

Sample Answer

To handle schema migration on a live system like LinkedIn's, I would avoid blocking DDL operations entirely. My strategy follows a four-step phased rollout. First, I would modify the schema to add the new column as nulla…

Common Mistakes to Avoid

  • Suggesting a maintenance window to stop all traffic, ignoring real-world availability needs
  • Failing to mention backward compatibility between old and new application versions
  • Overlooking the risk of partial data migration causing consistency errors
  • Not defining a concrete plan for reverting changes if the migration fails mid-process

Sound confident on this question in 5 minutes

Answer once and get a 30-second AI critique of your structure, content, and delivery. First attempt is free — no signup needed.

Try it free

Related Interview Questions

Browse all 190 System Design questionsBrowse all 26 LinkedIn questions