
Longest Increasing Subsequence - Part 2 - Hard

As promised, this is the follow-up to the earlier post, where I mentioned that there is a more efficient algorithm for finding the longest increasing subsequence, better than the O(n^2) DP solution we came up with.

It's not easy to understand at first, but once you get what the variables mean, it's pretty straightforward. We'll be implementing another DP solution, similar to the previous one, except with an additional data structure that reduces the overall complexity to O(nlogn).

The Wikipedia article isn't much help; the explanation is contrived and hard to visualize. I hope this post clears up a lot of doubts. We need three arrays to solve an instance of LIS.

M[i] - where M[i] is the minimum ending element of an increasing subsequence of length i. This is the key part of the new algorithm. Unlike the earlier solution, where we kept the length of the longest subsequence ending at index i, we store an array which holds the minimum ending value over all increasing subsequences of length i.

I[i] - where M[i] = A[I[i]]. Simply put, it's the index of M[i] in the input array. I[] always has the same length as M[]; both grow together. To define it, I[i] is the index of the minimum ending element of an increasing subsequence of length i.

P[i] - where P[i] is the index of the previous member of the increasing subsequence ending at A[i]. It's used for reconstructing the subsequence; you don't need it if you're only interested in the length.

Initially, M[0] = 0 (or Integer.MIN_VALUE) and I[0] = -1, because the zero-length subsequence acts as a sentinel for the empty set.

So the algorithm goes like this:


  1. For i = 0 to n - 1
  2.     j = binarySearch(M, A[i])  # such that M[j-1] < A[i] < M[j]
  3.     M[j] = A[i]
  4.     I[j] = i
  5.     P[i] = I[j-1]  # the index of the last element of the subsequence of length j-1
Very well, what does it all mean, though?

OK, so let's understand the motivation behind the M[] array. Whenever we see a new element A[i], we try to "place it" at the end of the longest increasing subsequence we have encountered yet without breaking the increasing property. But where do we place it? Since A[i] is the last element we have seen so far, there is no place for it but the end of a known subsequence. If we decided to insert A[i] at an earlier position, the subsequence would still be legal, but it wouldn't be optimal. The M[] array is basically keeping track of len(M) increasing subsequences at any time, and A[i] can be appended to one of them; the best candidate is the subsequence of length j - 1, where M[j-1] < A[i] < M[j], which A[i] extends to length j.

Clearly, we can't extend the subsequence of length j, since M[j] > A[i] (it wouldn't be increasing otherwise). We could append A[i] to a subsequence of length k < j - 1, but that wouldn't be the optimal one. So instead we modify the existing subsequence of length j so that its last element is A[i] instead of the previously recorded element. At this point we assign M[j] = A[i]. What this effectively means is that we have discovered an increasing subsequence of length j that ends with A[i]. If we ever want to extend it, i.e., set M[j+1], the new element has to be greater than M[j] (i.e., A[i]).
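To see the replacement rule in action, here's a length-only sketch of the idea above that prints M[] after each element (the input array is just a made-up example, and I use Python's bisect_left to find j):

```python
import bisect

def lis_length_trace(A):
    """Length-only version of the algorithm; prints M[] after each element."""
    M = [float("-inf")]  # sentinel for the empty subsequence
    for v in A:
        j = bisect.bisect_left(M, v)  # M[j-1] < v <= M[j]
        if j == len(M):
            M.append(v)   # v extends the longest subsequence found so far
        else:
            M[j] = v      # v becomes a smaller ending element for length j
        print(v, M[1:])
    return len(M) - 1

print(lis_length_trace([3, 1, 4, 1, 5, 9, 2, 6]))  # 4, e.g. 1, 4, 5, 9
```

Watching the trace, notice that M[] only ever gets longer or gets a smaller value in an existing slot; it never loses length.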

Notice that when we insert or replace elements in M[], the array stays sorted. So we can find j with a binary search on it. This is the O(logn) component of the algorithm.

Assigning I[j] = i is pretty straightforward: we simply remember the index in the input array of the element we just placed in M[].

P[i] = I[j - 1], this requires an explanation. Remember, the P[] array holds the index of the previous member of the increasing subsequence. So if we have just inserted/replaced the last element of a subsequence of length j, its previous element is the last element of the optimal increasing subsequence of length j - 1, which is conveniently at I[j - 1]. Finally, we simply backtrack through the P array to obtain the subsequence in reverse order.
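To make the backtracking concrete, here's a small hand-worked sketch. The I and P arrays below are what the main loop produces for this particular W (worked out by hand, not part of the original post):

```python
W = [3, 1, 4, 5, 2]
I = [-1, 1, 4, 3]       # I[-1] = 3 is the index of the last element of the LIS
P = [-1, -1, 1, 2, 1]   # P[i]: index of the element before W[i] in its subsequence

lis = []
index = I[-1]
while index >= 0:       # -1 marks the start of the subsequence
    lis.append(W[index])
    index = P[index]
lis.reverse()
print(lis)  # [1, 4, 5]
```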


import random
import bisect

# Ten distinct random three-digit numbers, shuffled.
R = list(set(random.randint(100, 999) for _ in range(10)))
random.shuffle(R)

def longest_increasing_subsequence(W):
    M = [float("-inf")]  # M[j]: minimum ending element of an increasing subsequence of length j
    I = [-1]             # I[j]: index of M[j] in W
    P = []               # P[i]: index of the previous member of the subsequence ending at W[i]
    for i, v in enumerate(W):
        # Find j such that M[j-1] < v <= M[j]; bisect_left keeps the
        # subsequence strictly increasing even if W contains duplicates.
        j = bisect.bisect_left(M, v)
        if j == len(M):
            M.append(v)   # v extends the longest subsequence found so far
            I.append(i)
        else:
            M[j] = v      # v is a smaller ending element for length j
            I[j] = i
        P.append(I[j - 1])

    # Backtrack through P, starting from the end of the longest subsequence.
    lis = []
    index = I[-1]
    while index >= 0:
        lis.append(W[index])
        index = P[index]
    lis.reverse()
    return lis

print(R)
print(longest_increasing_subsequence(R))

