Median of Two Sorted Arrays
Problem Statement:
There are two sorted arrays nums1 and nums2 of size m and n respectively.
Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).
You may assume nums1 and nums2 cannot be both empty.
Example 1:
nums1 = [1, 3] nums2 = [2] The median is 2.0
Example 2:
nums1 = [1, 2] nums2 = [3, 4] The median is (2 + 3)/2 = 2.5
Solution One: Merge Together
This is the most intuitive solution. We first merge the two sorted array together, then we return the median based on whether the merged array has even elements or odd elements. The only thing we need to take care is the special cases like empty array or one array run out of index during the loop.
The code is pretty straight forward… Since we only used several variables to store the intermediate results, so the space complexity is O(1).
Time Complexity: O(m+n)
Space Complexity: O(1)
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
class Solution { | |
public: | |
double findMedianSortedArrays(vector<int>& nums1, vector<int>& nums2) { | |
int pre = –1, current = –1; | |
int N1 = nums1.size(), N2 = nums2.size(); | |
int aStart = 0, bStart = 0; | |
int total = N1 + N2; | |
for(int i=0; i<total/2 + 1; i++) | |
{ | |
pre = current; | |
if(aStart < N1 && (bStart >= N2||nums1[aStart] < nums2[bStart])) | |
{ | |
current = nums1[aStart++]; | |
} | |
else | |
current = nums2[bStart++]; | |
} | |
if((total&1) == 0) | |
return (pre+current)/2.0; | |
else | |
return current; | |
} | |
}; |
Solution Two: K Element
Since we should guarantee O(log(m+n)) running time. The brute force can not meet the requirement. We we see the logarithm time complexity, we should immediately recognize that we should try to use divide and conquer. Let’s take a close look at this problem:
If we want to find the median of two sorted arrays, it’s a special case of finding the kth smallest element. Let’s assume we want to find the kth smallest element from two sorted arrays, for each iteration, we can get rid of about k/2 elements from the two arrays.
For example: find the 7th smallest number from the two sorted arrays
If we want to find the 7th smallest elements of the two arrays (array A and B), we only need to compare the A[k/2](A[3])(here k/2 is the k/2th element, not index) and B[k/2](B[3]), since A[3](8) > B[3](3), we can say that the first 3 elements from B can not be the 7th smallest element. In our case, they are 1, 2, 3. We get rid of them from B, and compare the rest of the elements:
More generally, if A[k/2] < B[k/2], then the k/2 elements from A cannot be the kth smallest element. A simple proof: if A[k/2] is indeed smaller than B[k/2], let’s assume that elements before B[k/2] are all smaller than A[k/2], which means that A[k/2] is at most (k/2 – 1)*2 + 1 = k – 1 smallest from the two sorted arrays. All elements from A before A[k/2] is smaller than A[k/2], so they must be impossible to be the k smallest element.
After we get rid of the first 3 elements from B, what will we do next?
We need to update k, which is 7 in this case, since we already get rid of 3 impossible cases, we need to update k = 7 – 3 = 4 now. Then we continue to compare the A[2](3) and B[2](5), since A[2] < B[2], we need to get rid of 1 and 3 from A and update k = 4 – 2 = 2 again.
Next we compare A[1] and B[1], since they are equal, we can safely get rid of one. Let’s delete 4 from B instead, and update k = k – 1 = 1.
Now we have k = 1, this should be the first smallest element from the remaining two arrays. We simply compare A[1] and B[1] again, we can find the 7th smallest element is 4 in A.
OK, everything seems OK now. There is still some special case we need to consider:
Since every time we divide the k by 2 and then we compare A[k/2] and B[k/2]. What if the length of A (or B) is smaller than k/2. Then we should compare the last element of A with the B[k/2].
In this case, since B[3] > A[2], we get rid of the whole A array. And then k = k – 2(length of A) = 5. Then we only need to return the 5th element in B, and that will be the 7th element in both arrays.
The most beautiful thing of this algorithm is we do not need to worry about the even or odd number of array elements. We implement this algorithm using recursion. For each recursion, we have to check whether the length of array is less than current k/2. If the length is smaller, we get rid of the smaller number of the length number elements from the array, else we get rid of k/2 elements from the array. After that, we update the k value and send the new arrays to new recursion. The termination of recursion is simple: either k == 1 or one of the array is empty.
In our case, k == (m + n + 1) /2. The time complexity should be O(log(k)), which means O(log(m+n)). As for the space complexity, since our recursion is tail recursion, so the space complexity is O(1).
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
class Solution { | |
public: | |
double findMedianSortedArrays(vector<int>& nums1, vector<int>& nums2) { | |
int N1 = nums1.size(), N2 = nums2.size(); | |
int total = N1 + N2; | |
int left = (total+1)/2; | |
int right = (total + 2) / 2; | |
//If the final array is odd, the median will calculate twice | |
return (getKth(nums1, 0, N1, nums2, 0, N2, left) + getKth(nums1, 0, N1, nums2, 0, N2, right))/2.0; | |
} | |
double getKth(vector<int> &nums1, int start1, int end1, vector<int> &nums2, int start2, int end2, int k) | |
{ | |
int len1 = end1 – start1, len2 = end2 – start2; | |
if(len1>len2) | |
return getKth(nums2, start2, end2, nums1, start1, end1, k); | |
if(len1 == 0) | |
return nums2[k + start2 –1]; | |
if(k == 1) return min(nums1[start1], nums2[start2]); | |
int index1 = start1 + min(len1, k/2) – 1; | |
int index2 = start2 + min(len2, k/2) – 1; | |
if(nums1[index1]>nums2[index2]) | |
return getKth(nums1, start1,end1,nums2, index2+1, end2, k – (index2 – start2 + 1)); | |
else | |
return getKth(nums1, index1+1, end1, nums2, start2, end2, k – (index1 – start1 + 1)); | |
} | |
}; |
Time Complexity: O(log(m+n))
Space Complexity: O(1)
Solution Three: Divide
Can we do better?
Another interesting approach involves division of the array. It’s a little bit more complex.
If an array with its length equals to m, we can always have m+1 positions to divide the array, see pic below:
We can divide array A and array B in i and j position, respectively, and divide the two arrays into both L and R parts.
If we combine both L parts of the two arrays and R parts of the two arrays together, we can see following characteristics:
- If the total length of A and B is even, if we can guarantee:
1. Number of elements from L part and R part are equal, which means:
i + j = m – i + n – j or j = (m + n ) / 2 – i
2. The maximum number of left part is less or equal than the minimum number of right part, max(A[i – 1], B[j – 1]) <= min(A[i], B[j]), then the median is :
(max(A[i – 1], B[j – 1]) + min(A[i], B[j])) / 2.0
- If the total length of A and B is odd, if we can guarantee:
1. Number of elements from L part is 1 greater than number of elements from R part:
i + j – 1 = m – i + n – j or j = (m + n + 1) /2 – i
2. The maximum number of left part is less or equal than the minimum number of right part, max(A[i – 1], B[j – 1]) <= min(A[i], B[j]), then the median is :
max(A[i – 1], B[j – 1])
OK, let’s take a close look at the j = (m + n + 1) /2 – i, basically, this condition works well for both even and odd. Since the int 1/2 is 0, so we can use this as the final formula. The only thing we need to guarantee is m <= n, because 0 <= i <= m, in order to guarantee that 0 <= j <= n, that m <= n.
OK, so far so good.
If we need to guarantee max(A[i – 1], B[j – 1]) <= min(A[i], B[j]), we already know that array A and B are sorted, so the A[ i – 1 ] < A[ i ] and B[ j – 1] < B[j] can be guaranteed. The only thing we need to guarantee is A[ i – 1 ] < B[j] and B[j – 1] < A[i]. We have two conditions:
- B[j – 1] > A[i], and in order to guarantee that i and j are not out of the bounds, then j!= 0 and i!=m.
We can see clearly from the above picture, if we are under this situation, we need to increase i and decrease j. Lucky for us, since i and j are bound together, when we increase i, j is automatically decreased.
- A[i – 1] > B[j], the same as above: i!=0 and j!=n.
At this situation, we need to decrease i and increase j. Easy enough.
Special cases related to boundaries:
- If i == 0 or j == 0, which means the division line is at the beginning of one array.
Then the maximum of left part is A[i-1] or B[j-1], while the minimum value from the right part is still min(A[i], B[j]). If A[i-1] > B[j] or B[j -1] > A[i], we still need to move i like we discussed before.
- If i == m or j == n, which means the division line is at the end of one array.
Then the minimum of right is A[i] or B[j], the maximum of left is still max(A[i – 1], B[j – 1]). If A[i-1] > B[j] or B[j -1] > A[i], we still need to move i like we discussed before.
Almost done!
When it comes to our code, we need to increase our i, what should we do? Should we move i one by one? No! We can always divide i by 2, and continue until we find the final answer.
The time complexity is log(min (m, n) ), since we binary search the shorter array.
Time Complexity: O(log(min( m, n )))
Space Complexity: O(1)
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
class Solution { | |
public: | |
double findMedianSortedArrays(vector<int>& nums1, vector<int>& nums2) { | |
int N1 = nums1.size(), N2 = nums2.size(); | |
if(N1 > N2) | |
return findMedianSortedArrays(nums2, nums1); | |
int iMin = 0, iMax = N1; | |
while(iMin <= iMax) | |
{ | |
int i = (iMin + iMax) / 2; | |
int j = (N1 + N2 +1)/2 – i; | |
if(i!=N1 && j!=0 && nums2[j-1] > nums1[i]) | |
iMin = i + 1; | |
else if(j!=N2 && i!=0 && nums1[i-1] > nums2[j]) | |
iMax = i – 1; | |
else | |
{ | |
int maxLeft = 0; | |
if(i == 0) { | |
maxLeft = nums2[j – 1]; | |
} | |
else if(j == 0) | |
{ | |
maxLeft = nums1[i – 1]; | |
} | |
else{ | |
maxLeft = max(nums1[i-1], nums2[j-1]); | |
} | |
if((N1 + N2) %2 == 1) return maxLeft; | |
int minRight = 0; | |
if(i == N1){ | |
minRight = nums2[j]; | |
} | |
else if(j == N2){ | |
minRight = nums1[i]; | |
} | |
else{ | |
minRight = min(nums1[i], nums2[j]); | |
} | |
return (maxLeft + minRight)/2.0; | |
} | |
} | |
return 0; | |
} | |
}; |
That’s all for this problem. Divide and conquer is truly powerful in algorithm design.