Design a Public Transit Monitoring System
Design a real-time system that tracks the location of buses/trains and provides accurate ETA predictions to users. Focus on sensor ingestion and prediction model deployment.
Why Interviewers Ask This
Interviewers at Uber ask this to evaluate your ability to architect real-time systems under latency constraints. They specifically assess how you handle high-velocity sensor ingestion, manage data consistency for location tracking, and design scalable prediction models that account for dynamic traffic variables while maintaining system reliability.
How to Answer This Question
1. Clarify Requirements: Define scale (number of vehicles and location updates per second), latency targets (sub-second updates), and core features such as live ETA accuracy versus schedule-based routing.
2. High-Level Architecture: Propose a microservices approach that separates ingestion from computation, using Kafka or Pulsar for event streaming.
3. Data Ingestion Strategy: Detail how GPS sensors publish coordinates via MQTT or gRPC to an edge gateway before the data enters the stream processor.
4. Prediction Engine Design: Explain how a sliding window of historical traffic data is combined with real-time congestion feeds and fed into a machine learning model deployed on Kubernetes for auto-scaling.
5. Storage & Retrieval: Describe using a store suited to time-series workloads, such as Cassandra, for raw telemetry, and Redis for caching live ETAs so that user requests are served with low latency.
6. Edge Cases: Address network failures by implementing local buffering on devices and fallback logic for stale data.
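To make step 4 concrete, here is a minimal Python sketch of blending a historical travel time with a sliding window of live observations for one route segment. It is a weighted average standing in for the ML model, not the model itself; the class name `SegmentEtaEstimator`, the window size, and the `live_weight` blend factor are all illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SegmentEtaEstimator:
    """Blend a historical travel time with recent live observations (hypothetical)."""

    historical_seconds: float      # assumed long-run average for this segment
    window_size: int = 5           # assumed number of recent traversals to keep
    live_weight: float = 0.7       # assumed blend factor; would be tuned from data
    _recent: deque = field(default_factory=deque)

    def observe(self, travel_seconds: float) -> None:
        """Record one live traversal and evict the oldest beyond the window."""
        self._recent.append(travel_seconds)
        if len(self._recent) > self.window_size:
            self._recent.popleft()

    def estimate(self) -> float:
        """Return the blended estimate, or the historical value if no live data."""
        if not self._recent:
            return self.historical_seconds
        live = sum(self._recent) / len(self._recent)
        return self.live_weight * live + (1 - self.live_weight) * self.historical_seconds


def route_eta(estimators) -> float:
    """Sum per-segment estimates for the segments remaining on the route."""
    return sum(e.estimate() for e in estimators)
```

In an interview, this is a useful baseline to name before describing the real model: it shows what the features are (historical and live travel times per segment) and gives a fallback if the model service is unavailable.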
Key Points to Cover
- Demonstrating knowledge of streaming technologies like Kafka or Flink for real-time data processing
- Addressing the specific challenge of latency between sensor data and user-facing ETA display
- Proposing a hybrid storage solution combining time-series databases with in-memory caches
- Explaining how the prediction model integrates both historical trends and live traffic conditions
- Designing for fault tolerance through device-side buffering and graceful degradation strategies
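The fault-tolerance point can be illustrated with a short Python sketch covering both halves: a bounded on-device buffer that replays readings once the network returns, and a server-side fallback that serves the scheduled time when the cached ETA is too old. The names `DeviceBuffer` and `eta_for_display`, the buffer capacity, and the 60-second staleness threshold are assumptions chosen for illustration.

```python
from collections import deque


class DeviceBuffer:
    """Bounded on-device queue of readings awaiting upload (hypothetical)."""

    def __init__(self, capacity=1000):
        # With maxlen set, the oldest reading is dropped when the buffer is full.
        self._pending = deque(maxlen=capacity)

    def record(self, reading):
        self._pending.append(reading)

    def flush(self, send):
        """Send readings in order; stop at the first failure.

        `send` is any callable returning True on success. Returns the count sent.
        """
        sent = 0
        while self._pending:
            if not send(self._pending[0]):
                break
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._pending)


def eta_for_display(cached_eta, cached_at, scheduled_eta, now, max_age_seconds=60.0):
    """Return (eta, source): the live ETA if fresh, else the scheduled time."""
    if cached_eta is not None and now - cached_at <= max_age_seconds:
        return cached_eta, "live"
    return scheduled_eta, "scheduled"
```

Returning the source label alongside the ETA lets the client tell users when they are seeing a scheduled time rather than a live prediction.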
Sample Answer
To design this system, I would start by defining non-functional requirements: sub-second latency for location updates and 95% accuracy for ETAs within a 10-minute window. The architecture relies on an event-driven pipeli…
Common Mistakes to Avoid
- Focusing solely on database schema without explaining the real-time data flow and ingestion pipeline
- Ignoring the need for a caching layer, leading to unrealistic latency expectations for end users
- Treating the prediction problem as a simple average calculation rather than a complex ML deployment scenario
- Overlooking edge cases such as poor network connectivity or GPS drift in urban canyons
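As a sketch of how GPS drift might be handled, the Python below drops any fix that would require an implausible speed relative to the last accepted fix. The function names and the 40 m/s speed ceiling are assumptions, and the first fix is trusted unconditionally; a production system would more likely use map matching or a Kalman filter.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def filter_implausible(fixes, max_speed_mps=40.0):
    """Keep fixes reachable from the last accepted fix at or below max_speed_mps.

    `fixes` is a time-ordered list of (timestamp_seconds, lat, lon) tuples.
    The first fix is accepted as-is (an assumption of this sketch).
    """
    accepted = []
    for fix in fixes:
        if not accepted:
            accepted.append(fix)
            continue
        t0, lat0, lon0 = accepted[-1]
        t1, lat1, lon1 = fix
        dt = t1 - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp
        if haversine_m(lat0, lon0, lat1, lon1) / dt <= max_speed_mps:
            accepted.append(fix)
    return accepted
```

A filter like this would sit in the stream processor ahead of the ETA computation, so that a single reflected signal between tall buildings does not make a bus appear to jump a kilometre.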