Wednesday, January 14, 2015

QuickSelect - Medium

Problem : Given an unsorted array, find the kth smallest element.

This is known as the Selection problem. It has been studied intensively, and there are a couple of very interesting algorithms that solve it. I'll be describing one called QuickSelect. The algorithm takes its name from QuickSort, and you will probably recognise that most of the code is borrowed directly from it. The only difference is that QuickSelect makes a single recursive call where QuickSort makes two.

The naive solution is obvious: simply sort the array in `O(nlogn)` and return the kth element. In fact, you can sort it only partially, running just the first k passes of Selection sort, to get the solution in `O(nk)`.
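Both naive approaches can be sketched in a few lines of Python. The function names are made up for illustration, and k is assumed to be 1-based (k = 1 means the smallest element); both work on a copy so the input is left untouched.

```python
def kth_smallest_by_sorting(arr, k):
    """Sort a copy of the array and index into it: O(n log n). k is 1-based (assumption)."""
    return sorted(arr)[k - 1]


def kth_smallest_by_partial_selection(arr, k):
    """Run only the first k passes of selection sort on a copy: O(nk). k is 1-based (assumption)."""
    a = list(arr)
    for i in range(k):
        # Find the index of the smallest element in a[i:] and swap it into position i.
        m = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[m] = a[m], a[i]
    return a[k - 1]
```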

An interesting side effect of finding the kth smallest element is you end up finding the k smallest elements. This also effectively gives you (n - k) largest elements in the array as well. These elements are not in any particular order though.

The version I'm using picks the pivot at random. Pivot selection is the part of the algorithm that usually decides how fast it will be. Ideally you would pick the median, but calculating the median of an array is basically a selection problem itself! (where k = n/2)

So we usually take the cheap route and pick a random pivot. We use the `partition` function from `quickSort` to separate the array into 2 subarrays. The left one will contain all elements less than or equal to the pivot value. The right one will have elements greater than the pivot.

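At this point the pivot is in its final sorted position, pivotIndex. If that is the position we are looking for, we are done. Otherwise we check whether the target position lies within [0, pivotIndex - 1] or [pivotIndex + 1, n - 1].
We then recurse in that direction and ignore the other subarray; it doesn't interest us anymore since it can't contain our solution.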
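Here is a minimal sketch of the steps above in Python. It assumes a Lomuto-style `partition` (one of several ways to write it), a 1-based k with 1 <= k <= len(a), and it rearranges the list in place; the names and signatures are my own choices for illustration.

```python
import random


def partition(a, lo, hi):
    """Partition a[lo..hi] around a randomly chosen pivot; return the pivot's final index."""
    p = random.randint(lo, hi)
    a[p], a[hi] = a[hi], a[p]            # park the pivot at the end
    pivot = a[hi]
    store = lo
    for i in range(lo, hi):
        if a[i] <= pivot:                # elements <= pivot go to the left part
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[hi] = a[hi], a[store]    # move the pivot into its sorted position
    return store


def quickselect(a, k, lo=0, hi=None):
    """Return the kth smallest element of a (k is 1-based, an assumption of this sketch)."""
    if hi is None:
        hi = len(a) - 1
    p = partition(a, lo, hi)
    if p == k - 1:
        return a[p]
    if k - 1 < p:
        return quickselect(a, k, lo, p - 1)   # answer is in the left subarray
    return quickselect(a, k, p + 1, hi)       # answer is in the right subarray
```

After `quickselect(a, k)` returns, `a[:k]` holds the k smallest elements in no particular order, which is the side effect mentioned earlier.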

Complexity: O(n) on average, O(n^2) in the worst case (when the pivots are consistently bad).

Theoretically, it is possible to do this in O(n) even in the worst case (the median of medians algorithm). But the implementation hides huge constants and complexities, which make it not worth the effort.

Friday, August 29, 2014

Inversion Count - Medium

Given an array A[1..n] of distinct integers, find out how many pairs there are where A[i] > A[j] and i < j. Such pairs are called inversions.

Say we have [100,20,10,30]: the inversions are (100,20), (100,10), (100,30), (20,10), and the count is of course 4.

The question appears as an exercise (Problem 2-4) in Chapter 2 of Introduction to Algorithms (3rd Edition).

As per our tradition we'll start with the worst algorithm and incrementally improve upon it. So on with it.
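A minimal sketch of the brute-force approach in Python, simply checking every pair (i, j) with i < j (the function name is chosen here for illustration):

```python
def count_inversions_naive(a):
    """Count pairs (i, j) with i < j and a[i] > a[j] by checking every pair."""
    count = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            if a[i] > a[j]:
                count += 1
    return count
```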

The brute-force algorithm, which simply checks every pair (i, j), has complexity O(n^2). We can do this even faster using divide and conquer, more precisely with a variation of Merge Sort. Unfortunately, to do this we will have to mutate the array.

The key to the algorithm is to understand that in an array sorted in ascending order the inversion count is zero. Say, [10,20,30].

If we put an element 100 before the rest of the array, the inversion count becomes 3: [100,10,20,30]. If we placed the 100 at the second position, the inversion count would be 2: [10,100,20,30].

So here's what we're going to do: we write a recursive function which returns the inversion count. The function first recursively does it for the first half and the second half of the array; the byproduct of these recursive calls is that both halves become sorted. (Just like Merge Sort!) Now that we have two sorted arrays which should be merged into one, we can piggyback on the merging process (of merge sort!) and count the inversions by merely counting the number of times an element from the first half is greater than an element in the second half.

An example, for those who found my explanation confusing.

Say, at some point of the recursion, we have the following
First half : [20,40,60,80] , Second half : [10,30,50,70]

If you recall the merge subroutine, we first compare 20 and 10. Since 20 > 10, that counts as an inversion. But also note that, since the first half is sorted, it is guaranteed that [20, 40, 60, 80] will all be greater than 10, because 20 is the minimum element of the first half. Therefore the inversion count for 10 in this instance becomes 4 (the size of the unexplored first half). Important to note: the problem assumes the integers are all distinct. If duplicates were allowed, equal elements would have to be taken from the first half first, so that ties are not counted as inversions.

Speaking generally, let's say i is the index for the first half, j is the index for the second, and Aux is the auxiliary array that stores the result. Then,

If A[i] > A[j] then    // implies A[j] < A[i…mid]
    invCount += mid - i + 1
    Aux[k++] = A[j++]
Else
    Aux[k++] = A[i++]
End If

Now for the code; notice its uncanny resemblance to merge sort.
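A minimal sketch in Python of the merge-based count described above. The function name, the default arguments and the shared `aux` buffer are choices made for this sketch; as noted earlier, the input list ends up sorted as a side effect.

```python
def count_inversions(a, lo=0, hi=None, aux=None):
    """Sort a[lo..hi] in place and return the number of inversions it contained."""
    if hi is None:
        hi = len(a) - 1
    if aux is None:
        aux = [None] * len(a)            # auxiliary array shared by all recursive calls
    if lo >= hi:
        return 0                         # zero or one element: nothing to count
    mid = (lo + hi) // 2
    count = count_inversions(a, lo, mid, aux) + count_inversions(a, mid + 1, hi, aux)

    i, j, k = lo, mid + 1, lo
    while i <= mid and j <= hi:
        if a[i] > a[j]:
            # a[j] is smaller than everything left in the first half: a[i..mid].
            count += mid - i + 1
            aux[k] = a[j]
            j += 1
        else:
            aux[k] = a[i]
            i += 1
        k += 1
    while i <= mid:                      # copy the rest of the first half
        aux[k] = a[i]
        i += 1
        k += 1
    while j <= hi:                       # copy the rest of the second half
        aux[k] = a[j]
        j += 1
        k += 1
    a[lo:hi + 1] = aux[lo:hi + 1]        # write the merged run back into a
    return count
```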

And that's Inversion Count! :)

Friday, September 20, 2013

Shortest Interval in k-sorted list - Hard

Given k sorted arrays, find the minimum interval such that there is at least one element from each array within the interval. Eg. [1,10,14],[2,5,10],[3,40,50]. Output : 1-3

To solve this problem, we perform a k-way merge as described here. Each time we 'pop' an element from an array, we keep track of the minimum and maximum head element (the first remaining element) across all the k lists. The interval between that minimum and maximum obviously contains the rest of the head elements of the k arrays, so it holds at least one element from every array. As we keep doing this, we record the smallest interval (max - min) seen so far. That will be our solution. Here's a pictorial working of the algorithm.

And here's the Python code. Time Complexity : O(nlogk), where n is the total number of elements across the k arrays.
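A minimal sketch of the approach using the standard `heapq` module as the k-way merge heap. The function name and the (min, max) tuple return value are choices made for this sketch, and it assumes every input list is non-empty and sorted in ascending order.

```python
import heapq


def smallest_interval(lists):
    """Return (lo, hi), the shortest interval holding at least one element of every list."""
    # Heap entries are (value, list index, position within that list).
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists)]
    heapq.heapify(heap)
    current_max = max(lst[0] for lst in lists)
    best = (heap[0][0], current_max)
    while True:
        value, i, pos = heapq.heappop(heap)      # smallest head element
        if current_max - value < best[1] - best[0]:
            best = (value, current_max)
        if pos + 1 == len(lists[i]):
            # This list is exhausted, so no later interval can still cover it.
            return best
        nxt = lists[i][pos + 1]
        current_max = max(current_max, nxt)
        heapq.heappush(heap, (nxt, i, pos + 1))
```

Each of the n elements is pushed and popped at most once on a heap of size k, which is where the O(nlogk) bound comes from.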