 # Beating Binary Search: The Interpolation Search

Interpolation search is an algorithm for finding a given key in a sorted array. While binary search always chooses the middle element for comparison, discarding one half of the search space or the other, interpolation search tries to predict where the key would lie in the search space via a linear interpolation, reducing the search space to the part before or after the estimated position if the key is not found there. This method will only work if calculations on difference between key values are sensible.

The estimated position of the key is calculated as follows:

Let the current search space be A[i…j] and we are looking for x, then by a linear interpolation of values, x should be found at position:

`                  p = i + ((j - i) * (x - A[i])) / (A[j] - A[i])`

Pseudocode:

Average case complexity of interpolation search is O(log log N) if the keys are uniformly distributed, where N is the number of keys. However, the worst case complexity is O(N) e.g., searching for 1000 in 1, 2, 3, …, 999, 1000, 109.

Take a look at C++ implementation.

## 7 thoughts on “Beating Binary Search: The Interpolation Search”

1. Anonymous says:

how do you calculate average and worst case complexity here?

• kartik kukreja says:

The worst case complexity is easy to calculate. The case when the keys increase exponentially forms the worst case for the algorithm and can take a linear number of comparisons, hence O(N) worst case complexity.
Average case complexities are always difficult to compute. Please refer to this paper for average complexity of interpolation search.

2. Koustav Chatterjee says:

p = i + ((j – i) * (x – A[i])) / (A[j] – A[i])

How do you arrive at this equation ?

• kartik kukreja says:

That’s a simple linear interpolation. [i..j] is the range of indices, [A[i]..A[j]] is the range of values. If we are looking for x, we should look at the index which is (x-A[i])/(A[j]-A[i]) fraction away from the index i. We get that index as i + (j-i)*(x-A[i])/(A[j]-A[i]).

3. swaggerjeevan says:

The worst case happens when A[r]-A[l] is huge generally or when the array elements increase exponentially.
The best case is when the search key is either on A[l] or A[r].
Is this true?

• Kartik Kukreja says:

When we are using interpolation search, we make an assumption about the input distribution. If the input satisfies that distribution, the search will be fast, otherwise not.

In this post, the assumption that we have used is that the numbers are roughly drawn from a linear distribution. The numbers being huge doesn’t have anything to do with this. As long as they are close to linear distribution, interpolation search will be very fast.

For this version of interpolation search that uses linear interpolation, an exponential distribution will be the worst case. If we knew that our input will have an exponential distribution, we could use exponential interpolation instead.

The best case is when the input is according to our assumption. For this version of the interpolation search, that means that the number we are looking for lies at the position we linearly interpolate from A[l] and A[r] i.e., we find the element in one lookup.

4. swaggerjeevan says:

That helped..thanks for the detailed explanation !