So here’s an interesting problem: Given an NxM integer matrix in which each row is sorted, find the overall median of the matrix assuming N*M is odd. For example,
It’s interesting because there are some obvious solutions which are sub-optimal and the optimal solution is non-obvious. There are a few obvious solutions:
- Copy the matrix values in an array, sort and return the middle element – Space complexity: O(N*M), Time complexity: O(N*M*log NM)
- Copy the matrix values in an array and use selection algorithm to find the median – Space complexity: O(N*M), Time complexity: O(N*M)
- Modify the sort procedure/selection algorithm to work with a matrix instead of a 1-D array (a matrix is actually stored in row-major fashion as a 1-D array) – This wouldn’t require any extra memory but the time complexity would still be O(N*M*log NM) with sorting and O(N*M) with selection algorithm.
All of these solutions are solving a more general problem by throwing away the information that each row in the matrix is sorted. We should be able to reduce the time complexity by using this information.
Consider this insight: Given an integer x, we can count the number of matrix elements ≤ x in O(N*log M), by simply doing a binary search on each row. x will be the median if x is an element of the matrix and number of matrix elements ≤ x equals 1 + N*M/2. Let mn and mx be the minimum and maximum elements of the matrix respectively, then a boolean function
is monotonic in x. The smallest x for which it becomes positive is the median because number of elements ≤ x would equal 1 + N*M/2 (would be greater if x is repeated but that’s not a problem) and since f(x) became true starting at x, x must be an element of the matrix.
Let’s work with the same example.
|x||count(elements ≤ x)||f(x) = count >= 5|
x = 8 is the inflection point at which f(x) first becomes positive and is the median.
It’s rare to find a solution with a binary search within a binary search. We can run a discrete binary search on the range [mn, mx] to find the first x for which f(x) becomes true. The overall time complexity is O(N*log M * log mx-mn) = O(32*N* log M) since an integer is at most 32 bits.
Here’s the solution: