Introduction
LeetCode 295: Find Median from Data Stream is a challenging problem in which you are tasked with calculating the median from a data stream. The key challenge is that the data comes in continuously, and you need an efficient way to maintain and calculate the median after each new number is added.
To solve this efficiently, the Heap (Priority Queue) data structure is commonly used. You can maintain the data stream in two heaps: one max-heap for the lower half of the data, and one min-heap for the upper half. By maintaining these heaps, you can efficiently calculate the median after each insertion.
Problem Statement
The problem involves a data stream, and you need to calculate the median of all elements in the stream so far.
The MedianFinder
class should implement:
addNum(num: int)
: Adds the integer num
from the data stream to the data structure.
findMedian()
: Returns the median of all elements so far.
For this problem, the median is defined as:
- If the list has an odd number of elements, the median is the middle element.
- If the list has an even number of elements, the median is the average of the two middle elements.
Example
python
# Example 1:
Input:
["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"]
[[], [1], [2], [], [3], []]
Output:
[null, null, null, 1.5, null, 2.0]
Explanation:
addNum(1)
– adds the number 1 to the data stream.
addNum(2)
– adds the number 2 to the data stream.
findMedian()
– calculates the median of the numbers [1, 2], which is (1 + 2) / 2 = 1.5
.
addNum(3)
– adds the number 3 to the data stream.
findMedian()
– calculates the median of the numbers [1, 2, 3], which is 2.0
(the middle number).
✅ Step-by-Step Solution
🧠 Intuition
The key idea here is to use two heaps:
- Max-Heap (
low
) to store the smaller half of the numbers.
- Min-Heap (
high
) to store the larger half of the numbers.
By maintaining the two heaps, we can efficiently:
- Add a new number to one of the heaps.
- Balance the heaps (ensure that the size difference between the heaps is at most 1).
- Find the median:
- If the heaps are of equal size, the median is the average of the root elements from both heaps.
- If the heaps are not of equal size, the root of the heap with more elements is the median.
This ensures that both adding numbers and finding the median are efficient, with operations being done in logarithmic time.
✅ Approach
- Add Number (
addNum(num: int)
):
- First, decide which heap (max or min) the number should go into based on the value.
- Add the number to the correct heap.
- Balance the heaps if necessary. The size difference between the heaps should be at most 1.
- Find Median (
findMedian()
):
- If the heaps are of equal size, the median is the average of the roots of both heaps.
- If one heap has more elements than the other, the median is the root of the heap with more elements.
- Balancing the Heaps:
- If the size of
low
exceeds the size of high
by more than one, move the root from low
to high
.
- If the size of
high
exceeds the size of low
by more than one, move the root from high
to low
.
✅ Python Code
python
import heapq
class MedianFinder:
def __init__(self):
# Max-heap to store the smaller half (invert the value for max-heap simulation)
self.low = [] # max-heap (inverted values)
# Min-heap to store the larger half
self.high = [] # min-heap
def addNum(self, num: int) -> None:
# Add number to max-heap (low)
heapq.heappush(self.low, -num)
# Balance: Ensure that all elements in low are <= all elements in high
if self.low and self.high and (-self.low[0] > self.high[0]):
# Move the root of low to high
heapq.heappush(self.high, -heapq.heappop(self.low))
# Balance the size of heaps (low and high differ by at most 1 element)
if len(self.low) > len(self.high) + 1:
heapq.heappush(self.high, -heapq.heappop(self.low))
if len(self.high) > len(self.low):
heapq.heappush(self.low, -heapq.heappop(self.high))
def findMedian(self) -> float:
# If the heaps are of equal size, the median is the average of the two roots
if len(self.low) == len(self.high):
return (-self.low[0] + self.high[0]) / 2.0
# Otherwise, the median is the root of the heap with more elements
return float(-self.low[0])
🧪 How It Works
- Initialization (
__init__
):
- We initialize two heaps:
low
as a max-heap (by pushing negative values).
high
as a min-heap.
- Adding a Number (
addNum
):
- We push the number into the
low
heap (as a negative value to simulate a max-heap).
- We then check if the largest value in
low
is greater than the smallest value in high
. If so, we move the largest value from low
to high
to maintain the order.
- We then balance the sizes of the heaps to ensure that the size difference between
low
and high
is at most 1.
- Finding the Median (
findMedian
):
- If the heaps are of equal size, the median is the average of the roots of
low
and high
.
- If the heaps are not of equal size, the median is the root of the heap with more elements (which will always be
low
).
⏱️ Time and Space Complexity
MetricValueTime Complexity (addNum)O(log n), where n
is the number of elements added so far. The time complexity arises from inserting a number into the heap and possibly balancing the heaps.Time Complexity (findMedian)O(1), because finding the median only requires accessing the roots of the heaps.Space ComplexityO(n), where n
is the number of elements added to the data structure.
- Time Complexity for
addNum(num: int)
: Inserting into a heap takes O(log n) time, where n
is the current size of the data stream.
- Time Complexity for
findMedian()
: Accessing the root of the heaps is a constant-time operation, so O(1).
- Space Complexity: The space complexity is O(n) due to the storage required for the heaps.
🔍 Edge Cases
- Single Number: If only one number has been added, the median is simply that number.
- For example, after adding
1
, the median is 1
.
- Even Number of Elements: When there is an even number of elements, the median is the average of the two middle numbers.
- For example, after adding
1
, 2
, the median is (1 + 2) / 2 = 1.5
.
- Odd Number of Elements: When there is an odd number of elements, the median is the middle number.
- For example, after adding
1
, 2
, 3
, the median is 2
.
- Balanced Data Stream: Even if numbers are added in increasing or decreasing order, the median calculation will still work correctly.
✅ Conclusion
LeetCode 295: Find Median from Data Stream challenges us to efficiently calculate the median from a continuous stream of data. Using two heaps—one max-heap and one min-heap—allows us to maintain a dynamic data stream and calculate the median in O(log n) time for each addition and O(1) time for finding the median. This approach ensures that the solution is both time-efficient and space-efficient, making it suitable for real-time data processing applications.