Find Median from Data Stream

Algorithms
Hard
IBM
136.3K views

Design a data structure that supports adding new integers and finding the median efficiently. Use two Heaps (Min-Heap and Max-Heap).

Why Interviewers Ask This

IBM evaluates candidates on their ability to optimize real-time data processing, a core competency for their cloud and cognitive systems. This specific problem tests your mastery of heap data structures, understanding of time complexity trade-offs between insertion and retrieval, and capacity to design balanced algorithms for streaming data scenarios common in enterprise environments.

How to Answer This Question

1. Clarify Requirements: Confirm input constraints, whether the stream is infinite, and if duplicates are allowed. IBM values precision before coding. 2. Define Strategy: Propose using two heaps—a max-heap for the lower half and a min-heap for the upper half—to maintain median access in O(1) time. 3. Explain Balancing Logic: Detail how you will ensure the size difference between heaps never exceeds one, ensuring the median is always at the root of one or both heaps. 4. Walk Through Operations: Step-by-step trace adding a number: push to appropriate heap, rebalance if sizes differ, then show how to calculate the median based on current heap roots. 5. Analyze Complexity: Explicitly state O(log n) for addNum and O(1) for findMedian, discussing space complexity as well.

Key Points to Cover

  • Explicitly defining the invariant that heap sizes must differ by no more than one
  • Demonstrating clear logic for determining which heap receives the new element
  • Explaining the rebalancing mechanism to maintain O(log n) insertion time
  • Correctly handling edge cases like empty streams or single elements
  • Articulating the final time and space complexity analysis clearly

Sample Answer

To solve this efficiently, I would use a dual-heap approach. We maintain a max-heap for the smaller half of numbers and a min-heap for the larger half. The key invariant is that the size of these heaps differs by at most…

Common Mistakes to Avoid

  • Using a sorted array instead of heaps, resulting in O(n) insertion time
  • Failing to rebalance heaps after every insertion, breaking the median logic
  • Not handling the case where total elements are even versus odd correctly
  • Overlooking duplicate values when comparing against heap roots

Sound confident on this question in 5 minutes

Answer once and get a 30-second AI critique of your structure, content, and delivery. First attempt is free — no signup needed.

Try it free

Related Interview Questions

Browse all 145 Algorithms questionsBrowse all 29 IBM questions