
The Missing Number - Easy

For my first post, let's start with an easy one. It's a popular question among interviewers, and you've probably heard it before. Nevertheless, let's solve it, for old times' sake. :)

Problem: Given an array of N integers that contains all the numbers from 1 to N in arbitrary order, except that one number (say X) has been replaced with 0. Find the missing number X in the most efficient way possible.

Solution:

Before we get to the solution, we'll build a test case using the following Python Snippet.

import random
N = 10
arr = list(range(1, N + 1))  # initialize arr with [1, 2, 3, ..., N]
random.shuffle(arr)          # randomize order
r = random.randrange(N)      # pick a random index r in [0, N)
arr[r] = 0                   # set the element at index r to 0
The naive method is an O(n^2) algorithm: for each value v from 1 to N, we perform a linear search (i.e. O(n)) over the array.
def missing_number_naive(arr):
    n = len(arr)
    for v in range(1, n + 1):  # for each v in [1, N]
        if v not in arr:       # linear search: is v present in arr?
            return v
    return None                # nothing is missing
For small values of N, this is perfectly fine, but can we do better?

Of course, with a small bit of help from 8th grade maths. Hopefully, you remember that,
1 + 2 + 3 + ... + n = n(n+1)/2
So how does that help? Say we didn't have any missing number and the problem was simply to sum all the elements of the array. Since we know the array contains all the values in the range [1,N], we know the sum will be N * (N + 1) / 2.

If we replace one arbitrary element X with 0, the actual sum of the array will be (N * (N + 1) / 2) - X.
But we can calculate the actual sum by iterating through the array and adding all the elements, so we finally have
X = (N * (N+1) / 2) - sum(array)
Here's the Python solution
def missing_number_linear(arr):
    n = len(arr)
    return n * (n + 1) // 2 - sum(arr)  # integer division keeps the result an int
Now, the inbuilt sum() function in Python is linear, so the whole solution runs in O(n) time.
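To make the arithmetic concrete, here is a small worked example. The array below is a made-up test case (not from the test-case snippet above) with N = 5 and the 4 replaced by 0.

```python
# Hypothetical example input: the numbers 1..5 with the 4 replaced by 0.
arr = [3, 0, 1, 5, 2]

n = len(arr)
expected = n * (n + 1) // 2   # sum of 1..5 with nothing missing
actual = sum(arr)             # sum of the array as given
missing = expected - actual   # the difference is the replaced number

print(expected, actual, missing)
```

The expected sum is 15, the actual sum is 11, and the difference, 4, is the missing number.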

This solution is perfectly fine for Python, but if we translate it to a language with fixed-width integers, like Java or C, we end up with this.
long sum = 0;
for (int i = 0; i < N; i++)
    sum += arr[i];
return N * (N + 1) / 2 - sum;
Can you spot the problem with the above snippet? How would you solve it? Answer in the next update.

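Update: Python supports arbitrary-precision integers by default, unlike Java. Therefore the above code might fail for large values of N. More specifically, if N is an int, the product N * (N + 1) is computed in 32-bit arithmetic and overflows once N > sqrt(Integer.MAX_VALUE), i.e. from N = 46341 onwards. How do we solve it? Do the multiplication in a bigger datatype (long, BigInteger).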
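Since Python integers never overflow, the failure can't be reproduced directly, but it can be simulated. The sketch below assumes N is a signed 32-bit int, as in Java, and wraps the product N * (N + 1) the way two's-complement arithmetic would. The helper names `to_int32` and `product_int32` are made up for this illustration.

```python
def to_int32(x):
    """Wrap a Python int to a signed 32-bit value (two's complement)."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def product_int32(n):
    """N * (N + 1) as a 32-bit int multiplication would compute it."""
    return to_int32(n * (n + 1))


for n in (46340, 46341):
    exact = n * (n + 1)
    wrapped = product_int32(n)
    status = "ok" if wrapped == exact else "OVERFLOW"
    print(n, exact, wrapped, status)
```

For N = 46340 the wrapped product still matches the exact one; for N = 46341 it wraps around to a negative number, so the computed "missing number" would be garbage.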

Alternate Solutions:

Using a Hash Table/Bit Array: This solution is quite obvious. Have a bit array with one bit for each value from 1 to N, iterate over the array, and set the bit for each value you see. The missing number is the only value whose bit is still false.
Note: If there were multiple numbers missing, this would find all of them. So this is probably the best method for multiple missing numbers. We'll discuss other methods in the future.
Space Complexity: O(n) bits (about n/8 bytes), Time: O(n)
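Here is a minimal sketch of the bit array idea, assuming a plain `bytearray` as the bit store (one possible representation, not necessarily the one in the linked source). The function name `missing_numbers_bitarray` is made up for this example. It returns a list, so it also covers the case where several numbers were replaced with 0.

```python
def missing_numbers_bitarray(arr):
    """Return every value in [1, N] that does not appear in arr."""
    n = len(arr)
    # One bit per value 0..n, packed 8 per byte (bit 0 belongs to the
    # placeholder value 0 and is simply ignored).
    seen = bytearray((n + 8) // 8)

    for v in arr:
        seen[v >> 3] |= 1 << (v & 7)       # mark value v as present

    return [v for v in range(1, n + 1)
            if not (seen[v >> 3] & (1 << (v & 7)))]


print(missing_numbers_bitarray([3, 0, 1, 5, 2]))   # one missing number
print(missing_numbers_bitarray([0, 2, 0, 4]))      # two missing numbers
```

The first call finds the single missing value 4; the second finds both 1 and 3.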

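The XOR method: This is a pretty smart method and doesn't even have the possibility of integer overflow. The idea is that we can determine (1 ^ 2 ^ 3 ^ ... ^ N) in O(1) (see below). Say P = (1 ^ 2 ^ ... ^ N). We then XOR all the values in the array, which gives Q = (1 ^ 2 ^ ... ^ (X-1) ^ (X+1) ^ ... ^ N), since XOR with 0 changes nothing. Every value other than X appears in both P and Q and cancels out, leaving only X.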
Our missing X = P ^ Q

To find the XOR of the range 1 to N, we observe the following pattern.

N    P
0    0
1    1
2    3
3    0
4    4
...
84   84
85   1
86   87
87   0
88   88


The pattern repeats with a period of 4,

  1. When N is divisible by 4, the Xored value is N itself
  2. When N % 4 is 1, the Xored value is 1
  3. When N % 4 is 2, the Xored value is N+1
  4. When N % 4 is 3, the Xored value is 0
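The four rules above can be checked against a brute-force XOR. This is just a quick sanity check, and the function names are made up for this example.

```python
from functools import reduce
from operator import xor


def xor_upto_bruteforce(n):
    """1 ^ 2 ^ ... ^ n computed the slow way, in O(n)."""
    return reduce(xor, range(1, n + 1), 0)


def xor_upto_closed_form(n):
    """1 ^ 2 ^ ... ^ n from the period-4 pattern, in O(1)."""
    return [n, 1, n + 1, 0][n % 4]


# Compare both versions for the first thousand values of n.
mismatches = [n for n in range(1000)
              if xor_upto_bruteforce(n) != xor_upto_closed_form(n)]
print(mismatches)
```

An empty list means the closed form agrees with the brute-force value for every n tested.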
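Here's a small Python snippet for the XOR method. Time Complexity: O(n)
def missing_number_xor(arr):
    n = len(arr)
    P = [n, 1, n + 1, 0][n % 4]  # XOR of 1..N, from the period-4 pattern
    Q = 0
    for v in arr:                # XOR of all array values (the 0 has no effect)
        Q = Q ^ v
    return P ^ Q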

You can find the entire source code here.
