Design a Video Recommendation Engine (Short Form)

System Design
Hard
Google

Design the system for personalized short-video recommendations (like TikTok). Focus on low latency, real-time feature extraction, and candidate generation from a massive corpus.

Why Interviewers Ask This

Interviewers ask this to evaluate your ability to architect high-throughput, low-latency systems handling massive scale. They specifically test your understanding of the two-stage funnel (candidate generation vs. ranking), real-time feature processing for short-form content, and how to balance personalization with exploration in a cold-start scenario.

How to Answer This Question

1. Clarify requirements immediately: Define latency targets (under 200ms end to end), daily active users, and video ingestion rates typical of Google-scale products.
2. Propose a Two-Stage Architecture: Start with Candidate Generation, using the user embedding as the query for an Approximate Nearest Neighbor (ANN) search over video embeddings to narrow millions of videos down to hundreds.
3. Detail Real-Time Feature Extraction: Explain how you capture immediate user signals, such as swipe speed and watch time, to update user vectors dynamically.
4. Describe the Ranking Model: Discuss a deep neural network that ingests both static features and real-time context to score each candidate.
5. Address Scale and Consistency: Mention sharding strategies, caching layers (such as Redis), and A/B testing frameworks that validate model improvements before global rollout.
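The two-stage funnel from steps 2 and 4 can be sketched in a few lines. This is a minimal sketch under stated assumptions, not a production design: an exact brute-force top-k scan stands in for ANN search (a real deployment would query an ANN index such as ScaNN or FAISS), and the ranker is a hypothetical hand-weighted linear score rather than a trained deep network. The names `Video`, `generate_candidates`, `rank`, `avg_completion`, and `session_affinity` are illustrative only.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Video:
    video_id: str
    embedding: list[float]
    avg_completion: float  # hypothetical static feature in [0, 1]


def dot(a: list[float], b: list[float]) -> float:
    """Inner product used as the user/video affinity score."""
    return sum(x * y for x, y in zip(a, b))


def generate_candidates(user_vec: list[float], corpus: list[Video], k: int) -> list[Video]:
    """Stage 1: return the k videos with the highest inner product to user_vec.

    Exact brute-force scan standing in for ANN search; an ANN index would
    return approximately the same top-k without touching every video.
    """
    return heapq.nlargest(k, corpus, key=lambda v: dot(user_vec, v.embedding))


def rank(user_vec: list[float], candidates: list[Video],
         session_affinity: dict[str, float], n: int) -> list[Video]:
    """Stage 2: rescore only the short list and keep the best n.

    Hypothetical hand-set weights blend the retrieval score, a static
    feature, and a real-time session feature; a production ranker would
    be a trained deep network.
    """
    def score(v: Video) -> float:
        return (0.6 * dot(user_vec, v.embedding)
                + 0.3 * v.avg_completion
                + 0.1 * session_affinity.get(v.video_id, 0.0))

    return sorted(candidates, key=score, reverse=True)[:n]


# Toy usage with a four-video corpus and 2-D embeddings.
corpus = [
    Video("a", [1.0, 0.0], 0.2),
    Video("b", [0.9, 0.1], 0.9),
    Video("c", [0.0, 1.0], 0.5),
    Video("d", [0.7, 0.7], 0.4),
]
user = [1.0, 0.0]
candidates = generate_candidates(user, corpus, k=3)
feed = rank(user, candidates, session_affinity={"d": 1.0}, n=2)
print([v.video_id for v in feed])
```

Stage 1 keeps `a`, `b`, and `d` by inner product; the ranker then promotes `b` above `a` because of its higher completion rate, so the feed is `['b', 'a']`. The design point is that the expensive scoring function only ever sees the short list, never the full corpus.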

Key Points to Cover

  • Explicitly defining the two-stage funnel (Candidate Generation followed by Ranking) as the core architectural pattern
  • Demonstrating knowledge of Approximate Nearest Neighbor (ANN) algorithms for efficient retrieval from massive datasets
  • Addressing the specific challenge of real-time feature extraction for short-form content engagement signals
  • Proposing concrete solutions for latency reduction through caching and optimized data structures
  • Discussing mechanisms to handle cold-start problems and content diversity within the ranking logic
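The real-time feedback and cold-start points above can be illustrated with two small helpers. This is a sketch under stated assumptions: `update_user_vector` and `interleave_exploration` are hypothetical names, and both the mapping from watch ratio to a signed signal and the fixed exploration cadence are illustrative policies rather than a known production design. In practice, updates like these would run in a streaming pipeline that writes to a low-latency feature store read at request time.

```python
from collections import deque


def update_user_vector(user_vec: list[float], video_vec: list[float],
                       watch_ratio: float, alpha: float = 0.2) -> list[float]:
    """Move the user embedding toward or away from a just-watched video.

    watch_ratio is watched seconds / video length, clipped to [0, 1] and
    mapped to a signal in [-1, 1]: near-complete views pull the profile
    toward the video, fast swipes push it away, and a 50% view is neutral.
    alpha sets how strongly a single interaction moves the profile.
    """
    ratio = max(0.0, min(1.0, watch_ratio))
    signal = 2.0 * ratio - 1.0
    return [u + alpha * signal * (v - u) for u, v in zip(user_vec, video_vec)]


def interleave_exploration(ranked_ids: list[str], fresh_ids: list[str],
                           every: int = 4) -> list[str]:
    """Reserve every `every`-th feed slot for a fresh (cold-start) video.

    New videos lack engagement history, so a purely exploitative ranker
    rarely surfaces them; guaranteed slots collect the signal needed to
    learn their embeddings. Leftover fresh videos are not appended once
    the ranked list runs out.
    """
    ranked, fresh = deque(ranked_ids), deque(fresh_ids)
    feed: list[str] = []
    while ranked:
        if fresh and (len(feed) + 1) % every == 0:
            feed.append(fresh.popleft())
        else:
            feed.append(ranked.popleft())
    return feed


profile = update_user_vector([1.0, 0.0], [0.0, 1.0], watch_ratio=1.0, alpha=0.5)
print(profile)  # [0.5, 0.5]
print(interleave_exploration(["r1", "r2", "r3", "r4"], ["f1"], every=3))
```

A fixed cadence is the simplest exploration policy; epsilon-greedy or bandit-based slot allocation are common alternatives when you want exploration to adapt to how well fresh content performs.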

Sample Answer

To design a TikTok-style recommendation engine at Google scale, I would prioritize a two-stage funnel architecture to handle the massive corpus while maintaining sub-200ms latency. First, in the Candidate Generation phas…

Common Mistakes to Avoid

  • Focusing solely on the machine learning model without explaining the data pipeline and infrastructure required for real-time inference
  • Ignoring the latency constraints inherent in short-form video apps, leading to suggestions that are computationally too heavy
  • Overlooking the candidate generation step and proposing to rank all available videos, which is impossible at scale
  • Failing to mention how to handle real-time user feedback, resulting in a system that feels static and unresponsive
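Back-of-the-envelope arithmetic makes the third mistake concrete. Every number below is a hypothetical assumption chosen for illustration, not a measured figure: a corpus of one billion videos, a heavy ranker costing 10 µs per item, 100 parallel workers per request, and a 500-item short list from candidate generation. The helper `ranking_time_ms` is likewise illustrative and assumes perfectly even parallelism.

```python
LATENCY_TARGET_MS = 200  # end-to-end target from the requirements step

# Hypothetical assumptions, not measured figures.
PER_ITEM_RANKER_US = 10      # heavy ranker cost per candidate on one worker
CORPUS_SIZE = 1_000_000_000  # videos eligible for recommendation
CANDIDATES = 500             # short list returned by candidate generation
WORKERS = 100                # parallel scoring workers per request


def ranking_time_ms(num_items: int, per_item_us: float, workers: int) -> float:
    """Ideal wall-clock time to score num_items split evenly across workers."""
    return num_items * per_item_us / workers / 1000.0


full = ranking_time_ms(CORPUS_SIZE, PER_ITEM_RANKER_US, WORKERS)
funnel = ranking_time_ms(CANDIDATES, PER_ITEM_RANKER_US, WORKERS)
print(f"rank whole corpus: {full:,.0f} ms")
print(f"rank short list:   {funnel:.2f} ms")
```

Under these assumptions, scoring the whole corpus takes about 100,000 ms (100 s) per request, far beyond the 200 ms target, while the short list takes about 0.05 ms. The exact figures shift with hardware and model size, but a gap of several orders of magnitude is why the candidate generation step cannot be skipped.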
