Programming & Development / April 10, 2025

LeetCode 295: Find Median from Data Stream – Efficient Median Calculation in a Stream

LeetCode 295 Median from Data Stream dynamic data stream heap median calculation priority queue real-time data processing Python heapq

Introduction

LeetCode 295: Find Median from Data Stream is a challenging problem in which you are tasked with calculating the median from a data stream. The key challenge is that the data comes in continuously, and you need an efficient way to maintain and calculate the median after each new number is added.

To solve this efficiently, the Heap (Priority Queue) data structure is commonly used. You can maintain the data stream in two heaps: one max-heap for the lower half of the data, and one min-heap for the upper half. By maintaining these heaps, you can efficiently calculate the median after each insertion.

Problem Statement

The problem involves a data stream, and you need to calculate the median of all elements in the stream so far.
The MedianFinder class should implement:
  1. addNum(num: int): Adds the integer num from the data stream to the data structure.
  2. findMedian(): Returns the median of all elements so far.
For this problem, the median is defined as:
  • If the list has an odd number of elements, the median is the middle element.
  • If the list has an even number of elements, the median is the average of the two middle elements.

Example

python

# Example 1:
Input:
["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"]
[[], [1], [2], [], [3], []]

Output:
[null, null, null, 1.5, null, 2.0]

Explanation:

  1. addNum(1) – adds the number 1 to the data stream.
  2. addNum(2) – adds the number 2 to the data stream.
  3. findMedian() – calculates the median of the numbers [1, 2], which is (1 + 2) / 2 = 1.5.
  4. addNum(3) – adds the number 3 to the data stream.
  5. findMedian() – calculates the median of the numbers [1, 2, 3], which is 2.0 (the middle number).

✅ Step-by-Step Solution

🧠 Intuition

The key idea here is to use two heaps:

  1. Max-Heap (low) to store the smaller half of the numbers.
  2. Min-Heap (high) to store the larger half of the numbers.

By maintaining the two heaps, we can efficiently:

  • Add a new number to one of the heaps.
  • Balance the heaps (ensure that the size difference between the heaps is at most 1).
  • Find the median:
  • If the heaps are of equal size, the median is the average of the root elements from both heaps.
  • If the heaps are not of equal size, the root of the heap with more elements is the median.

This ensures that both adding numbers and finding the median are efficient, with operations being done in logarithmic time.

✅ Approach

  1. Add Number (addNum(num: int)):
  • First, decide which heap (max or min) the number should go into based on the value.
  • Add the number to the correct heap.
  • Balance the heaps if necessary. The size difference between the heaps should be at most 1.
  1. Find Median (findMedian()):
  • If the heaps are of equal size, the median is the average of the roots of both heaps.
  • If one heap has more elements than the other, the median is the root of the heap with more elements.
  1. Balancing the Heaps:
  • If the size of low exceeds the size of high by more than one, move the root from low to high.
  • If the size of high exceeds the size of low by more than one, move the root from high to low.

✅ Python Code

python

import heapq

class MedianFinder:

    def __init__(self):
        # Max-heap to store the smaller half (invert the value for max-heap simulation)
        self.low = []  # max-heap (inverted values)
        # Min-heap to store the larger half
        self.high = []  # min-heap

    def addNum(self, num: int) -> None:
        # Add number to max-heap (low)
        heapq.heappush(self.low, -num)
        
        # Balance: Ensure that all elements in low are <= all elements in high
        if self.low and self.high and (-self.low[0] > self.high[0]):
            # Move the root of low to high
            heapq.heappush(self.high, -heapq.heappop(self.low))
        
        # Balance the size of heaps (low and high differ by at most 1 element)
        if len(self.low) > len(self.high) + 1:
            heapq.heappush(self.high, -heapq.heappop(self.low))
        if len(self.high) > len(self.low):
            heapq.heappush(self.low, -heapq.heappop(self.high))

    def findMedian(self) -> float:
        # If the heaps are of equal size, the median is the average of the two roots
        if len(self.low) == len(self.high):
            return (-self.low[0] + self.high[0]) / 2.0
        # Otherwise, the median is the root of the heap with more elements
        return float(-self.low[0])

🧪 How It Works

  1. Initialization (__init__):
  • We initialize two heaps:
  • low as a max-heap (by pushing negative values).
  • high as a min-heap.
  1. Adding a Number (addNum):
  • We push the number into the low heap (as a negative value to simulate a max-heap).
  • We then check if the largest value in low is greater than the smallest value in high. If so, we move the largest value from low to high to maintain the order.
  • We then balance the sizes of the heaps to ensure that the size difference between low and high is at most 1.
  1. Finding the Median (findMedian):
  • If the heaps are of equal size, the median is the average of the roots of low and high.
  • If the heaps are not of equal size, the median is the root of the heap with more elements (which will always be low).

⏱️ Time and Space Complexity

MetricValueTime Complexity (addNum)O(log n), where n is the number of elements added so far. The time complexity arises from inserting a number into the heap and possibly balancing the heaps.Time Complexity (findMedian)O(1), because finding the median only requires accessing the roots of the heaps.Space ComplexityO(n), where n is the number of elements added to the data structure.

  • Time Complexity for addNum(num: int): Inserting into a heap takes O(log n) time, where n is the current size of the data stream.
  • Time Complexity for findMedian(): Accessing the root of the heaps is a constant-time operation, so O(1).
  • Space Complexity: The space complexity is O(n) due to the storage required for the heaps.

🔍 Edge Cases

  1. Single Number: If only one number has been added, the median is simply that number.
  • For example, after adding 1, the median is 1.
  1. Even Number of Elements: When there is an even number of elements, the median is the average of the two middle numbers.
  • For example, after adding 1, 2, the median is (1 + 2) / 2 = 1.5.
  1. Odd Number of Elements: When there is an odd number of elements, the median is the middle number.
  • For example, after adding 1, 2, 3, the median is 2.
  1. Balanced Data Stream: Even if numbers are added in increasing or decreasing order, the median calculation will still work correctly.

✅ Conclusion

LeetCode 295: Find Median from Data Stream challenges us to efficiently calculate the median from a continuous stream of data. Using two heaps—one max-heap and one min-heap—allows us to maintain a dynamic data stream and calculate the median in O(log n) time for each addition and O(1) time for finding the median. This approach ensures that the solution is both time-efficient and space-efficient, making it suitable for real-time data processing applications.


Comments

No comments yet

Add a new Comment

NUHMAN.COM

Information Technology website for Programming & Development, Web Design & UX/UI, Startups & Innovation, Gadgets & Consumer Tech, Cloud Computing & Enterprise Tech, Cybersecurity, Artificial Intelligence (AI) & Machine Learning (ML), Gaming Technology, Mobile Development, Tech News & Trends, Open Source & Linux, Data Science & Analytics

Categories

Tags

©{" "} Nuhmans.com . All Rights Reserved. Designed by{" "} HTML Codex