Find Median from Data Stream
Design a data structure that supports adding new integers and finding the median efficiently. Use two Heaps (Min-Heap and Max-Heap).
Why Interviewers Ask This
IBM evaluates candidates on their ability to optimize real-time data processing, a core competency for their cloud and cognitive systems. This specific problem tests your mastery of heap data structures, understanding of time complexity trade-offs between insertion and retrieval, and capacity to design balanced algorithms for streaming data scenarios common in enterprise environments.
How to Answer This Question
1. Clarify Requirements: Confirm input constraints, whether the stream is infinite, and if duplicates are allowed. IBM values precision before coding. 2. Define Strategy: Propose using two heaps—a max-heap for the lower half and a min-heap for the upper half—to maintain median access in O(1) time. 3. Explain Balancing Logic: Detail how you will ensure the size difference between heaps never exceeds one, ensuring the median is always at the root of one or both heaps. 4. Walk Through Operations: Step-by-step trace adding a number: push to appropriate heap, rebalance if sizes differ, then show how to calculate the median based on current heap roots. 5. Analyze Complexity: Explicitly state O(log n) for addNum and O(1) for findMedian, discussing space complexity as well.
Key Points to Cover
- Explicitly defining the invariant that heap sizes must differ by no more than one
- Demonstrating clear logic for determining which heap receives the new element
- Explaining the rebalancing mechanism to maintain O(log n) insertion time
- Correctly handling edge cases like empty streams or single elements
- Articulating the final time and space complexity analysis clearly
Sample Answer
To solve this efficiently, I would use a dual-heap approach. We maintain a max-heap for the smaller half of numbers and a min-heap for the larger half. The key invariant is that the size of these heaps differs by at most…
Common Mistakes to Avoid
- Using a sorted array instead of heaps, resulting in O(n) insertion time
- Failing to rebalance heaps after every insertion, breaking the median logic
- Not handling the case where total elements are even versus odd correctly
- Overlooking duplicate values when comparing against heap roots
Sound confident on this question in 5 minutes
Answer once and get a 30-second AI critique of your structure, content, and delivery. First attempt is free — no signup needed.