
Inversion Count - Medium

Given an array A[1..n] of distinct integers, find out how many instances there are where A[i] > A[j] and i < j. Such instances are called inversions.

Say, for [100,20,10,30], the inversions are (100,20), (100,10), (100,30), (20,10), so the count is of course 4.

The question appears as an exercise in Chapter 4 of Introduction to Algorithms (3rd Edition).

As per our tradition we'll start with the worst algorithm and incrementally improve upon it. So on with it.
def inversionCount_1(A):
    c = 0
    for i in range(len(A)):
        for j in range(i + 1, len(A)):
            if A[i] > A[j]:
                c += 1
    return c
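As a quick sanity check, the same brute-force count can be written as a single comprehension; on the example array from earlier it gives 4 (`brute_inversions` is just an illustrative name, not from the post):

```python
def brute_inversions(A):
    # count pairs (i, j) with i < j and A[i] > A[j]
    return sum(1 for i in range(len(A))
                 for j in range(i + 1, len(A))
                 if A[i] > A[j])

print(brute_inversions([100, 20, 10, 30]))  # 4
```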

The two nested loops give this algorithm a complexity of O(n^2). We can do better using divide and conquer, more precisely with a variation of Merge Sort. To do this, unfortunately, we will have to mutate the array.

The key to the algorithm is to understand that in an ascending sorted array the inversion count is zero. Say [10,20,30].

If we place an element 100 before the rest of the array, the inversion count becomes 3: [100,10,20,30]. If we placed the 100 at the second position, the inversion count would be 2: [10,100,20,30].

So here's what we're going to do: we write a recursive function which returns the inversion count. The function first recurses on the first half and the second half of the array; the byproduct of these recursive calls is that both halves become sorted (just like Merge Sort!). Now that we have two sorted arrays to merge into one, we can piggyback on the merging process (of Merge Sort!) and count the inversions by merely counting the number of times an element from the first half was greater than an element from the second half.

An example, for those who found my explanation confusing.

Say, at some point of the recursion, we have the following
First half : [20,40,60,80] , Second half : [10,30,50,70]

If you recall the merge subroutine, we first compare 20 and 10; since 20 > 10, that counts as an inversion. But also note that since the first half is sorted, it is guaranteed that all of [20, 40, 60, 80] are greater than 10, because 20 is the minimum element of the unexplored first half. Therefore the inversion count contributed by 10 in this instance is 4 (the size of the unexplored first half). Important to note: this algorithm, as written, only works if the integers are all distinct (with duplicates, equal pairs would wrongly be counted as inversions).
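That counting step can be seen in isolation. Here is a small sketch of a standalone merge-and-count helper (`merge_count` is a hypothetical name for illustration); run on the two halves above, it counts 10 cross-pair inversions: 4 for 10, 3 for 30, 2 for 50, and 1 for 70.

```python
def merge_count(left, right):
    # merge two sorted lists, counting inversions between them
    merged, inv = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            merged.append(left[i])
            i += 1
        else:
            inv += len(left) - i   # every remaining left element beats right[j]
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv

print(merge_count([20, 40, 60, 80], [10, 30, 50, 70]))
# ([10, 20, 30, 40, 50, 60, 70, 80], 10)
```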

Speaking generally, let's say i is the index into the first half, j the index into the second, and Aux the auxiliary array to store the result. Then,

If A[i] > A[j] Then    // implies A[j] < A[i..mid]
    invCount += mid - i + 1
    Aux[k++] = A[j++]
Else
    Aux[k++] = A[i++]
End If

Now for the code; notice its uncanny resemblance to Merge Sort.

def inversionCount_2(A):
    def rec(l, r):
        if l >= r:
            return 0
        m = (l + r) // 2
        # recurse on the halves; as a side effect each half becomes sorted
        invCount = rec(l, m) + rec(m + 1, r)
        merged = []
        i, j = l, m + 1
        while i <= m and j <= r:
            if A[i] < A[j]:
                merged.append(A[i])
                i += 1
            else:
                invCount += m - i + 1  # A[j] < A[i..m]
                merged.append(A[j])
                j += 1
        merged.extend(A[i:m + 1])  # rest of the first half
        merged.extend(A[j:r + 1])  # rest of the second half
        for k in range(l, r + 1):  # copy from aux back to A
            A[k] = merged[k - l]
        return invCount
    return rec(0, len(A) - 1)
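A quick way to gain confidence in the divide-and-conquer version is to cross-check it against the brute force on random inputs. Here is a sketch of such a check; `brute` and `fast` are condensed stand-ins for the two functions above, not the post's exact code:

```python
import random

def brute(A):
    return sum(A[i] > A[j] for i in range(len(A)) for j in range(i + 1, len(A)))

def fast(A):
    # same merge-sort-based count as inversionCount_2, condensed
    def rec(l, r):
        if l >= r:
            return 0
        m = (l + r) // 2
        inv = rec(l, m) + rec(m + 1, r)
        merged, i, j = [], l, m + 1
        while i <= m and j <= r:
            if A[i] < A[j]:
                merged.append(A[i]); i += 1
            else:
                inv += m - i + 1
                merged.append(A[j]); j += 1
        merged.extend(A[i:m + 1])
        merged.extend(A[j:r + 1])
        A[l:r + 1] = merged
        return inv
    return rec(0, len(A) - 1)

for _ in range(100):
    A = random.sample(range(1000), 50)   # distinct integers, as required
    assert fast(list(A)) == brute(A)
print("both implementations agree")
```

Note that `fast` sorts its argument in place, so we pass it a copy.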
And that's Inversion Count! :)
