Design a Time Series Database (TSDB)

System Design
Hard
Tesla

Design a database optimized for storing and querying time-series data (e.g., sensor readings, stock prices). Discuss compression and indexing strategies.

Why Interviewers Ask This

Tesla interviewers use this question to assess your ability to architect systems for high-velocity IoT data from vehicles. They specifically test your understanding of write-heavy workloads, time-series compression such as delta-of-delta timestamp encoding and the XOR-based value encoding used in Gorilla, and time-based indexing strategies that enable rapid aggregation without inflating storage costs.

How to Answer This Question

1. Clarify requirements: Define write throughput (potentially millions of events/sec across the fleet), retention policies, and query patterns like range scans or aggregations over specific time windows.
2. Propose a schema: Suggest a columnar storage format optimized for time-series, separating metadata from metrics to maximize compression ratios.
3. Detail ingestion: Describe a write-ahead log followed by a memory buffer (memtable) that flushes to immutable disk segments to handle burst traffic.
4. Explain compression: Discuss encoding techniques such as delta-of-delta for timestamps, run-length encoding for constant values, and bit-packing for sensor IDs to reduce storage costs across Tesla's fleet.
5. Address querying: Outline an inverted index on tags (e.g., VIN, sensor type) combined with sorted timestamp indexes to accelerate range scans and point-in-time lookups.
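
Steps 3 and 5 can be made concrete with a small in-memory sketch. It is a toy model, not a production design: the class name `MiniTSDB`, the `flush_threshold` parameter, and the tag names used below are all hypothetical, the write-ahead log is a Python list standing in for an append-only file, and segments stay in memory rather than on disk.

```python
import bisect
import json
from collections import defaultdict


class MiniTSDB:
    """Toy write path: WAL -> memtable -> immutable sorted segments."""

    def __init__(self, flush_threshold=4):
        self.wal = []                      # stand-in for an append-only log file
        self.memtable = defaultdict(list)  # series key -> [(ts, value), ...]
        self.memtable_size = 0
        self.segments = []                 # each segment: {series key: sorted tuple}
        self.tag_index = defaultdict(set)  # (tag name, tag value) -> series keys
        self.flush_threshold = flush_threshold

    def write(self, tags, ts, value):
        key = tuple(sorted(tags.items()))
        # 1. Durability first: log the point before buffering it.
        self.wal.append(json.dumps({"tags": tags, "ts": ts, "value": value}))
        # 2. Buffer in memory and index the series under each of its tags.
        self.memtable[key].append((ts, value))
        self.memtable_size += 1
        for tag in tags.items():
            self.tag_index[tag].add(key)
        # 3. Flush once the buffer reaches the threshold.
        if self.memtable_size >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Sort each series by timestamp and freeze it as an immutable segment.
        segment = {k: tuple(sorted(pts)) for k, pts in self.memtable.items()}
        self.segments.append(segment)
        self.memtable.clear()
        self.memtable_size = 0
        self.wal.clear()  # flushed points no longer need replay

    def query(self, tags, start, end):
        """Return points with start <= ts < end for series matching all tags."""
        key_sets = [self.tag_index.get(tag, set()) for tag in tags.items()]
        keys = set.intersection(*key_sets) if key_sets else set()
        results = []
        for key in keys:
            for segment in self.segments:
                points = segment.get(key, ())
                # Binary search on the sorted timestamps bounds the scan.
                lo = bisect.bisect_left(points, (start,))
                hi = bisect.bisect_left(points, (end,))
                results.extend(points[lo:hi])
            buffered = self.memtable.get(key, [])
            results.extend(p for p in buffered if start <= p[0] < end)
        return sorted(results)


db = MiniTSDB(flush_threshold=4)
for ts in range(5):
    db.write({"vin": "VIN_A", "sensor": "battery_temp"}, ts, 20.0 + ts)
db.write({"vin": "VIN_B", "sensor": "battery_temp"}, 2, 99.0)
print(db.query({"vin": "VIN_A"}, 1, 5))
```

A query intersects the tag posting sets to find matching series, then binary-searches each sorted segment and scans the small unflushed buffer. A real system would also compact segments and replay the log after a crash, which this sketch omits.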

Key Points to Cover

  • Explicitly mention compression schemes suited to sensor data, such as delta-of-delta timestamp encoding and Gorilla-style XOR value compression
  • Propose a columnar storage architecture rather than row-based SQL tables
  • Address the write-heavy nature of IoT telemetry with memtables and immutable segments
  • Explain how to balance latency for real-time monitoring versus cost for long-term storage
  • Design a partitioning strategy keyed on unique identifiers such as VINs, typically combined with time ranges, for data isolation
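
To show why these encodings suit sensor data, here is a minimal sketch of delta-of-delta encoding for timestamps and XOR encoding for float values. It stops at the integer level: Gorilla-style compressors go on to pack the resulting small numbers into variable-length bit fields, which is omitted here. The function names are illustrative, not taken from any library.

```python
import struct


def delta_of_delta_encode(timestamps):
    """Encode integer timestamps as [first, first delta, delta-of-deltas...]."""
    if not timestamps:
        return []
    out = [timestamps[0]]
    prev_delta = 0
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        out.append(delta - prev_delta)  # near zero for regular sampling
        prev_delta = delta
    return out


def delta_of_delta_decode(encoded):
    """Invert delta_of_delta_encode."""
    if not encoded:
        return []
    timestamps = [encoded[0]]
    delta = 0
    for dod in encoded[1:]:
        delta += dod
        timestamps.append(timestamps[-1] + delta)
    return timestamps


def float_to_bits(x):
    """Reinterpret a float as its 64-bit IEEE 754 pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def bits_to_float(bits):
    """Reinterpret a 64-bit pattern as a float."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def xor_encode(values):
    """XOR each value's bit pattern with the previous one; repeats become 0."""
    out = []
    prev = 0
    for v in values:
        bits = float_to_bits(v)
        out.append(bits ^ prev)
        prev = bits
    return out


def xor_decode(encoded):
    """Invert xor_encode."""
    values = []
    prev = 0
    for x in encoded:
        prev ^= x
        values.append(bits_to_float(prev))
    return values


# A sensor sampled every 10 s with one slightly late reading.
print(delta_of_delta_encode([1000, 1010, 1020, 1030, 1041]))
# A reading that stays constant XORs to zero after the first value.
print(xor_encode([21.5, 21.5, 21.5])[1:])
```

With regular sampling intervals most delta-of-delta values are zero, and slowly changing readings produce XOR results with long runs of zero bits, which is what makes a compact bit-level encoding possible.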

Sample Answer

To design a TSDB for Tesla's fleet, I would prioritize write throughput and storage efficiency given the volume of telemetry from millions of vehicles. First, I'd define the schema using a column-oriented store where eac…

Common Mistakes to Avoid

  • Focusing solely on relational database features like ACID transactions instead of write optimization
  • Ignoring the massive scale of data ingestion expected from a fleet of autonomous vehicles
  • Suggesting generic compression methods like ZIP instead of domain-specific time-series encodings
  • Overlooking the need for automatic data expiration or tiered storage for old telemetry data
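
The last point can be illustrated with a small retention pass. This is a sketch under assumed parameters: `hot_window`, `max_age`, and `bucket` are hypothetical settings in seconds, and averaging is only one possible downsampling choice. Recent points stay raw, older points are averaged into fixed-width buckets, and points past the maximum age are dropped.

```python
from collections import defaultdict
from statistics import mean


def apply_retention(points, now, hot_window, max_age, bucket):
    """Split (timestamp, value) points into a raw hot tier and a
    downsampled cold tier, dropping anything older than max_age."""
    hot = []
    buckets = defaultdict(list)
    for ts, value in points:
        age = now - ts
        if age > max_age:
            continue  # expired: not kept in either tier
        if age <= hot_window:
            hot.append((ts, value))  # recent data stays at full resolution
        else:
            buckets[ts - ts % bucket].append(value)  # align to bucket start
    cold = [(start, mean(vals)) for start, vals in sorted(buckets.items())]
    return sorted(hot), cold


points = [(100, 1.0), (600, 2.0), (630, 4.0), (700, 6.0), (950, 8.0)]
hot, cold = apply_retention(points, now=1000, hot_window=100, max_age=500, bucket=60)
print(hot)
print(cold)
```

In a full design the cold tier would typically live on cheaper storage, and expiration would drop whole time-partitioned segments instead of filtering individual points.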
