I came across this incredibly interesting problem that took me quite a while to solve: *implementing a queue that supports the minimum operation*, i.e., aside from the normal queue operations (enqueue, dequeue), it should also support a minimum operation, whereby a client can ask for the minimum element in the queue at any point. Before someone jumps in and says that’s just a min priority queue – no, there’s a difference. Priority queues don’t return elements in the order in which they were inserted but in the order of their priority. This queue has to return elements in the order in which they were inserted and also support querying for the current minimum element in the queue.

Now, I had to solve this problem for an online judge, so I just had to return the right sequence of outputs for the given sequence of inputs, i.e., I could read and preprocess the entire input before returning any output. This led me off in the wrong direction with this approach:

- Simulate all enqueue/dequeue operations on a vector by just maintaining the queue boundaries and actually keeping the deleted elements in their place
- Mark the ranges for all the minimum queries and maintain them in a queue
- Preprocess the vector by building a segment tree for answering min queries in a range
- Process all min queries in the queue and write output

The expected time complexity was O(N) and this approach has O(N log N) time complexity, but an online judge would have a hard time distinguishing the two, so I expected this solution to pass. Still, this is not the right approach: if I’m implementing a data structure, my clients want answers to their queries when they ask them, not after I’ve seen all the queries.
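For reference, here is roughly what that offline approach looks like in code. This is a sketch under my own assumptions – the operation encoding (`'E'`/`'D'`/`'M'` pairs) and all names are invented, and the segment tree is the standard iterative range-minimum one:

```cpp
#include <cassert>
#include <climits>
#include <utility>
#include <vector>
using namespace std;

// Standard iterative segment tree answering range-minimum queries.
struct MinSegTree {
    int n;
    vector<int> t;
    MinSegTree(const vector<int>& a) : n(a.size()), t(2 * a.size(), INT_MAX) {
        for (int i = 0; i < n; i++) t[n + i] = a[i];
        for (int i = n - 1; i > 0; i--) t[i] = min(t[2*i], t[2*i+1]);
    }
    int query(int l, int r) {  // min of a[l..r], inclusive
        int res = INT_MAX;
        for (l += n, r += n + 1; l < r; l >>= 1, r >>= 1) {
            if (l & 1) res = min(res, t[l++]);
            if (r & 1) res = min(res, t[--r]);
        }
        return res;
    }
};

// Offline processing: ops are ('E', value) enqueue, ('D', 0) dequeue, ('M', 0) min query.
vector<int> offlineMinQueries(const vector<pair<char,int> >& ops) {
    vector<int> elems;              // all enqueued elements, never physically removed
    int head = 0;                   // current queue front within elems
    vector<pair<int,int> > ranges;  // [head, tail] snapshot for each min query
    for (size_t i = 0; i < ops.size(); i++) {
        if (ops[i].first == 'E') elems.push_back(ops[i].second);
        else if (ops[i].first == 'D') head++;
        else ranges.push_back(make_pair(head, (int)elems.size() - 1));
    }
    MinSegTree tree(elems);
    vector<int> answers;
    for (size_t i = 0; i < ranges.size(); i++)
        answers.push_back(tree.query(ranges[i].first, ranges[i].second));
    return answers;
}
```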

Another approach is to use a normal queue and a min indexed priority queue. The normal queue would store the elements and indices of those elements in the indexed priority queue, so that it can delete them in order. The min operation would take O(1) but enqueue and dequeue operations would now have O(log N) complexity. It’s a reasonable solution and might be perfectly acceptable – it answers the min queries when a client asks them, not after it has seen the entire input. But there exists a better solution.
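For what it’s worth, the same bounds fall out of an even plainer structure than an indexed priority queue: a queue paired with a std::multiset. This substitution and all names below are mine, not the approach described above, but the complexities match – O(log N) enqueue/dequeue and O(1) min:

```cpp
#include <cassert>
#include <queue>
#include <set>
using namespace std;

// Queue with min: the multiset's smallest key is always at begin().
template <typename T>
class QueueWithMultisetMin {
private:
    queue<T> Q;
    multiset<T> M;  // multiset, not set: duplicates in the queue must be kept

public:
    void enqueue(const T& x) {
        Q.push(x);
        M.insert(x);
    }

    T dequeue() {
        if (Q.empty())
            throw "queue empty";
        T front = Q.front(); Q.pop();
        M.erase(M.find(front));  // erase one copy only; erase(key) would drop all equal keys
        return front;
    }

    T getMin() {
        if (M.empty())
            throw "queue empty";
        return *M.begin();
    }

    int size() { return Q.size(); }
    bool empty() { return Q.empty(); }
};
```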

It was really difficult to come up with the right solution, and I don’t think it’s possible to discover it in an interview setting without getting any hints, so I can’t really describe a thought process. Instead, let’s go through the hints.

**Hint 1: How would we implement a stack that supports the min operation?**

Well, that’s easy. All elements in a stack are pushed and popped off the same end. We can just store pairs of elements (the pushed element and the current minimum) on the stack. When we want to push a new element on the stack, we need only look at the top pair to get the minimum of all the elements in the stack and calculate the new minimum. A query for minimum also need only look at the top pair. Here’s the simple implementation:

```cpp
template <typename T>
class StackWithMin {
private:
    stack< pair<T, T> > S;

public:
    void push(T& x) {
        S.push(pair<T, T>(x, S.empty() ? x : min(x, S.top().second)));
    }

    T pop() {
        if (S.empty())
            throw "stack empty";
        pair<T, T> top = S.top(); S.pop();
        return top.first;
    }

    T getMin() {
        if (S.empty())
            throw "stack empty";
        return S.top().second;
    }

    int size() {
        return S.size();
    }

    bool empty() {
        return S.empty();
    }
};
```

But it’s unclear how that helps us. Storing the current minimum worked with stacks because elements enter and leave through the same end. The same approach does not seem to extend to queues, well, until you see hint 2.

**Hint 2: How would we implement a queue with two stacks?**

Wow! I never thought there could be any legitimate purpose in implementing a queue with two stacks except for it being a decade-old interview question. So how do we implement a queue with two stacks? We use one stack, say S1, for answering dequeue requests and the other, say S2, for enqueue requests. When S1 becomes empty, we pop all the elements off S2 and push them into S1. Using two stacks in this way gives us FIFO behavior. Here’s a simple implementation:

```cpp
template <typename T>
class QueueWithTwoStacks {
private:
    stack<T> S1, S2;

public:
    void enqueue(T& x) {
        S2.push(x);
    }

    T dequeue() {
        if (S1.empty()) {
            while (!S2.empty()) {
                T top = S2.top(); S2.pop();
                S1.push(top);
            }
        }
        if (S1.empty())
            throw "queue empty";
        T top = S1.top(); S1.pop();
        return top;
    }

    int size() {
        return S1.size() + S2.size();
    }

    bool empty() {
        return S1.empty() && S2.empty();
    }
};
```

Note that this doesn’t have worst-case O(1) complexity for each dequeue operation. It offers an amortized O(1) time complexity for each dequeue operation, which means that for a sequence of N enqueue/dequeue operations, the total complexity is O(N), which is still pretty good.
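To see why the amortized bound holds: each element is pushed and popped at most twice (once per stack), so any sequence of N enqueues and N dequeues performs at most 4N underlying stack operations. A quick empirical check of this claim – the counter and the struct name are mine; the structure mirrors QueueWithTwoStacks above:

```cpp
#include <cassert>
#include <stack>
using namespace std;

// Minimal two-stack queue, instrumented to count individual stack operations.
struct CountingQueue {
    stack<int> S1, S2;
    long ops = 0;  // incremented on every underlying stack push or pop
    void enqueue(int x) {
        S2.push(x); ops++;
    }
    int dequeue() {
        if (S1.empty()) {
            while (!S2.empty()) {        // transfer: one pop and one push per element
                S1.push(S2.top()); S2.pop();
                ops += 2;
            }
        }
        if (S1.empty()) throw "queue empty";
        int top = S1.top(); S1.pop(); ops++;
        return top;
    }
};
```

Running N enqueues followed by N dequeues, the counter never exceeds 4N even though an individual dequeue may trigger a long transfer.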

It should be pretty clear now how we can combine these two hints to create a queue that supports min operation. We can implement the queue using two stacks, with each stack supporting min operation. The overall minimum for the queue is just the minimum of the two stacks. Here’s a simple implementation:

```cpp
template <typename T>
class QueueWithMin {
private:
    stack< pair<T, T> > S1, S2;

public:
    void enqueue(T& x) {
        S2.push(pair<T, T>(x, S2.empty() ? x : min(x, S2.top().second)));
    }

    T dequeue() {
        if (S1.empty()) {
            while (!S2.empty()) {
                pair<T, T> top = S2.top(); S2.pop();
                top.second = S1.empty() ? top.first : min(top.first, S1.top().second);
                S1.push(top);
            }
        }
        if (S1.empty())
            throw "empty queue";
        pair<T, T> top = S1.top(); S1.pop();
        return top.first;
    }

    T getMin() {
        if (empty())
            throw "empty queue";
        return S1.empty() ? S2.top().second : (S2.empty() ? S1.top().second : min(S1.top().second, S2.top().second));
    }

    bool empty() {
        return S1.empty() && S2.empty();
    }

    int size() {
        return S1.size() + S2.size();
    }
};
```

Pretty cool, huh! Enqueue and min operations have worst case O(1) time complexity while dequeue has amortized O(1) complexity.

A friend of mine described another solution for this problem, which is much easier to discover on your own. Remember how we calculated the largest rectangle in a histogram using a stack? Before pushing a bar onto the stack, we got rid of all the larger bars. That’s exactly what we need here. Effectively, the auxiliary data structure maintains a non-decreasing subsequence of the queue’s contents, with the smallest element always at the front. When the dequeued element equals the minimum element, we have to remove that front element (this is why the sequence is non-decreasing rather than strictly increasing: we need the duplicates). We can’t use a stack as the auxiliary data structure because we need to delete from both ends, but that’s exactly what a deque is for. Here’s a simple implementation:

```cpp
template <typename T>
class QueueWithMin {
private:
    queue<T> Q;
    deque<T> D;  // non-decreasing candidates for the minimum

public:
    void enqueue(T& x) {
        Q.push(x);
        while (!D.empty() && D.back() > x)
            D.pop_back();
        D.push_back(x);
    }

    T dequeue() {
        if (Q.empty())
            throw "queue empty";
        if (D.front() == Q.front())
            D.pop_front();
        T top = Q.front(); Q.pop();
        return top;
    }

    T getMin() {
        if (Q.empty())
            throw "queue empty";
        return D.front();
    }

    int size() {
        return Q.size();
    }

    bool empty() {
        return Q.empty();
    }
};
```

In this case, dequeue and min operations have worst case O(1) time complexity while enqueue has amortized O(1) complexity. Both these solutions are pretty similar.

Problem of the day for today is kth permutation: *Given numbers n and k, 1 <= k < INT_MAX, return the kth permutation of the set [1,2,…,n]*. For example, given n=3 and k=4, the permutations of [1,2,3] in order are:

- “123”
- “132”
- “213”
- “231”
- “312”
- “321”

The k=4th permutation is “231”. To simplify the output, a string concatenation of the numbers is returned.

How should we think about this problem? Some observations:

- The set [1,2,..,n] has n! permutations.
- There are (n-1)! permutations where 1 is in the first place. When we swap 1 and 2, we skip those (n-1)! permutations. More generally, if we swap the first element with the xth element, we skip (x-1)*(n-1)! permutations.

These two observations are enough to solve this problem. If we divide the (0-based) k by (n-1)!, we get the index of the element we need to swap the first element with. That element is then in its correct position for the kth permutation, and we can recurse on the remaining elements for the (k % (n-1)!)th permutation. For example, with n=3 and k=4 (0-based k=3): 3/2! = 1, so the element at index 1 (i.e., 2) comes first; recursing on [1,3] with k = 3%2! = 1 gives 1/1! = 1, so 3 comes next, and finally 1, yielding “231”.

Now there are some challenges in implementation:

- Calculating n! for all n? We only care about those factorials which are less than INT_MAX, which is true only for n <= 12. So we don’t need to worry about large factorials. We can compute them on the fly.
- How to maintain the set of numbers? We need to find the xth element from the start (suggesting a vector) but also maintain the remaining set in sorted order (suggesting a set). A valid trade-off is to use a vector and pay a linear cost for removing an element from the middle or shuffling elements around.

Here’s the recursive solution:

```cpp
int factorial(int n) {
    if (n > 12) return INT_MAX;
    int f = 1;
    for (int i = 2; i <= n; i++)
        f *= i;
    return f;
}

void getPermutationHelper(vector<int>& num, int k, stringstream& ss) {
    if (num.size() == 0) return;
    int f = factorial(num.size() - 1);
    int incr = k / f;
    ss << to_string(num[incr]);
    num.erase(num.begin() + incr);
    getPermutationHelper(num, k % f, ss);
}

string getPermutation(int n, int k) {
    k--; // 0-based indexing
    if (k >= factorial(n)) return ""; // error
    vector<int> num(n);
    for (int i = 0; i < n; ++i)
        num[i] = i+1;
    stringstream ss;
    getPermutationHelper(num, k, ss);
    return ss.str();
}
```

Tidbits:

- Concatenating n strings using repeated string concatenation would be an O(cn^{2}) operation, where c is the maximum component string size. Using a stringstream instead reduces it to O(cn).
- Passing the stringstream as an argument to the recursive function made it tail recursive, i.e., more space efficient if the compiler implements tail-call optimization (a single stack frame instead of n frames).

It’s equally easy to implement it iteratively:

```cpp
int factorial(int n) {
    if (n > 12) return INT_MAX;
    int f = 1;
    for (int i = 2; i <= n; i++)
        f *= i;
    return f;
}

string getPermutation(int n, int k) {
    k--; // 0-based indexing
    if (k >= factorial(n)) return ""; // error
    vector<int> num(n);
    for (int i = 0; i < n; i++)
        num[i] = i+1;
    stringstream perm;
    for (int i = 0; i < n; i++) {
        int fact = factorial(n-i-1);
        int incr = k / fact;
        int t = num[i+incr];
        for (int j = i+incr; j > i; j--)
            num[j] = num[j-1];
        num[i] = t;
        k %= fact;
        perm << to_string(num[i]);
    }
    return perm.str();
}
```

This time I’m shuffling elements around instead of erasing an element; both operations have the same time complexity. The overall complexity in both cases is O(n^{2} + cn). Interestingly, interviewbit classifies this problem under backtracking, but we are not doing any backtracking – we home in directly on the solution.

Problem of the day for today is Largest rectangle in a histogram: *Given n non-negative integers representing the histogram’s bar heights where the width of each bar is 1, find the area of largest rectangle in the histogram.*

As an example, given the 7 heights [6, 2, 5, 4, 5, 1, 6], the largest rectangle has area 3*4 = 12 (the three bars [5, 4, 5], capped at height 4).

If you want to try out this problem before looking at the solution, now is the time. I’ll try to explain my thought process for coming up with a solution.

The first insight is to determine which rectangles even need to be considered for the solution: those that cover a contiguous range of the input histogram (their width is an integer – there’s no point in covering half a bar) and whose height equals the minimum bar height in the range (the rectangle’s height cannot exceed the minimum bar height in the range, and considering a height less than the minimum is pointless, since raising it to the minimum gives a better solution). This greatly constrains the set of rectangles we need to consider. Formally, we need to consider only those rectangles with width = j-i+1 (0≤i≤j<n) and height = min(height[i..j]).

At this point, we can directly implement this solution. There are only n^2 choices for i and j. If we naively calculate the minimum height in the range [i..j], this will have time complexity O(n^3). Instead, we can keep track of the minimum height in the inner loop for j, leading to the following implementation with O(n^2) time complexity and O(1) auxiliary space complexity:

```cpp
int largestRectangleArea(vector<int> &A) {
    int maxArea = 0;
    for (int i = 0; i < A.size(); i++) {
        for (int j = i, mn = A[i]; j < A.size(); j++) {
            mn = min(mn, A[j]);
            maxArea = max(maxArea, (j-i+1) * mn);
        }
    }
    return maxArea;
}
```

We are still doing a lot of repeated work by considering all n^2 rectangles. There are only n possible heights. For each position j, we need to consider only 1 rectangle: the one with height = height[j] and width = k-i+1, where 0≤i≤j≤k<n, height[i..k] ≥ height[j], height[i-1] < height[j] and height[k+1] < height[j]. In the above example, for index 3, we need only consider the rectangle with height = 4 and width = 3 ([5, 4, 5]).

Put simply, we need to grow the rectangle from j towards left and right while the bar heights are greater than or equal to height[j]. If we do this naively, we’ll still end up with O(n^2) time complexity. If for any index j, we could find, in better than linear complexity, the first indices i (while going left from j) and k (while going right from j) where the bar height becomes smaller than height[j], we could solve this problem in complexity better than O(n^2).

There’s a trick to do this with a stack. We can precompute the left end of the rectangle situated at position i and having height = height[i] as follows:

```cpp
vector<int> posLeft(n);
stack<int> S;
for (int i = 0; i < n; i++) {
    while (!S.empty() && height[S.top()] >= height[i])
        S.pop();
    posLeft[i] = S.empty() ? 0 : S.top() + 1;
    S.push(i);
}
```

Each index is pushed onto the stack and popped off the stack at most once, so the overall time complexity of this procedure is O(n). To get an intuition for what it is doing, observe that each bar gets rid of the higher bars on its left, so that no bar on its right needs to consider those indices.

Similarly, we can precompute the right end of the rectangle for each position i. This gives us the following implementation with O(n) space and time complexity:

```cpp
int largestRectangleArea(vector<int> &A) {
    int n = A.size();
    vector<int> posLeft(n);
    stack<int> S;
    for (int i = 0; i < n; i++) {
        while (!S.empty() && A[S.top()] >= A[i])
            S.pop();
        posLeft[i] = S.empty() ? 0 : S.top() + 1;
        S.push(i);
    }
    vector<int> posRight(n);
    while (!S.empty()) S.pop();
    for (int i = n-1; i >= 0; i--) {
        while (!S.empty() && A[S.top()] >= A[i])
            S.pop();
        posRight[i] = S.empty() ? n-1 : S.top() - 1;
        S.push(i);
    }
    int maxArea = 0;
    for (int i = 0; i < n; i++)
        maxArea = max(maxArea, (posRight[i] - posLeft[i] + 1) * A[i]);
    return maxArea;
}
```

This should be a perfectly acceptable solution but we can do further optimizations. Observe that we don’t need to precompute and store both left and right ends. If we precompute the right ends, we can iterate from the left and compute the areas while computing the left ends (or we can do it the other way around).

```cpp
int largestRectangleArea(vector<int> &A) {
    int n = A.size();
    vector<int> posRight(n);
    stack<int> S;
    for (int i = n-1; i >= 0; i--) {
        while (!S.empty() && A[S.top()] >= A[i])
            S.pop();
        posRight[i] = S.empty() ? n-1 : S.top() - 1;
        S.push(i);
    }
    while (!S.empty()) S.pop();
    int maxArea = 0;
    for (int i = 0; i < n; i++) {
        while (!S.empty() && A[S.top()] >= A[i])
            S.pop();
        int area = A[i] * (posRight[i] + 1 - (S.empty() ? 0 : S.top() + 1));
        maxArea = max(maxArea, area);
        S.push(i);
    }
    return maxArea;
}
```

Oh, but there are still more optimizations we can do. Observe when an index gets popped off the stack: it happens when we find the first index to its right with a smaller height – which is exactly the right end of the rectangle. So we don’t need to precompute either the left or the right ends. Unfortunately, the space and time complexity is still O(n), but the solution is much prettier!

```cpp
int largestRectangleArea(vector<int> &A) {
    A.push_back(0); // sentinel to ensure all indices get popped off the stack
    stack<int> S;
    int maxArea = 0;
    for (int i = 0; i < A.size(); i++) {
        while (!S.empty() && A[S.top()] >= A[i]) {
            int height = A[S.top()];
            S.pop();
            int left = S.empty() ? 0 : S.top() + 1, right = i - 1;
            maxArea = max(maxArea, (right - left + 1) * height);
        }
        S.push(i);
    }
    return maxArea;
}
```

Today I was looking at problems related to linked lists. They tend to be asked quite a lot in interviews, but some of them look artificial, with their complexity arising from linked lists being a poor data structure for the problem (e.g., median of a sorted linked list). The problem of the day for today is Detecting a cycle in a linked list: *Given a linked list, return the node where the cycle begins or null if there is no cycle.* For example, in the list 1 → 2 → 3 → 4 → 5 with 5 linking back to 4, the node 4 should be returned.

I don’t want to just provide the solution but think through how an interviewer will describe the problem and how an interviewee may approach it. Typically, interviewers first describe a part of the problem and then keep adding to it. For this problem, an interviewer may start by just asking to check whether there’s a cycle in a linked list and, after that’s done, ask for the node where the cycle begins.

The obvious solution that an interviewee can be expected to come up with would use a hashset to store visited nodes and declare a cycle if it visits a previously visited node. Incidentally, it also solves the second part of the problem because the first node that is visited again is where the cycle begins. Here’s the simple implementation:

```cpp
/**
 * Definition for singly-linked list.
 * struct ListNode {
 *     int val;
 *     ListNode *next;
 *     ListNode(int x) : val(x), next(NULL) {}
 * };
 */
ListNode* detectCycle(ListNode* A) {
    unordered_set<ListNode*> seen;
    for (ListNode* i = A; i != NULL; i = i->next) {
        if (seen.find(i) != seen.end())
            return i;
        seen.insert(i);
    }
    return NULL;
}
```

For a linked list of N nodes, it has O(N) space and time complexity – pretty good, but now the interviewer would ask for a solution with O(1) auxiliary space complexity. At this point, the interviewee may require a hint or two, but for the first part of the problem, (s)he can be expected to come up with the following approach: use two pointers – *slow*, advanced one position per step, and *fast*, advanced two positions per step – and check whether they ever meet. It’s not that hard to come up with this approach for someone who hasn’t seen it before. If there is a cycle, both pointers will enter it, and while the fast pointer is moving away from the slow pointer at a speed of 1 position/step in one direction, it’s moving towards it at the same speed in the opposite direction, so they are guaranteed to meet. Once both pointers enter the cycle (of maximum length N), they have to meet after at most N-1 more steps, resulting in O(N) time complexity.
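For the first part on its own, the two-pointer check is only a few lines. A standalone sketch – `hasCycle` is my name for it, and the ListNode definition is restated from the comment above so the snippet is self-contained:

```cpp
#include <cassert>
#include <cstddef>

// Singly-linked list node, as defined earlier.
struct ListNode {
    int val;
    ListNode* next;
    ListNode(int x) : val(x), next(NULL) {}
};

// Returns true iff the list starting at head contains a cycle.
bool hasCycle(ListNode* head) {
    ListNode *slow = head, *fast = head;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;           // advances 1 position/step
        fast = fast->next->next;     // advances 2 positions/step
        if (slow == fast)
            return true;             // fast caught up with slow inside the cycle
    }
    return false;                    // fast fell off the end: no cycle
}
```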

Now for the second part of the problem – finding the node where the cycle begins – this is a little tricky. Someone who has seen this problem before would know that it is solved by a trick in Floyd’s cycle-finding algorithm. At this point, the interviewee may require a hint or two and can be expected to come up with this solution: after detecting a cycle using the above two-pointer approach, determine the length of the cycle, say k, and use two pointers – one at the start and another k positions ahead of the start – both advancing 1 position/step. They will meet at the node at the beginning of the cycle. This still has O(N) time complexity and is a perfectly acceptable solution. Here’s the implementation:

```cpp
ListNode* detectCycle(ListNode* A) {
    if (A == NULL || A->next == NULL || A->next->next == NULL) return NULL;
    ListNode *slow = A, *fast = A;
    while (fast->next && fast->next->next) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            break;
    }
    if (slow != fast) return NULL;
    int length = 1;
    for (fast = fast->next; fast != slow; fast = fast->next, length++);
    for (fast = A; length > 0; fast = fast->next, length--);
    for (slow = A; slow != fast; slow = slow->next, fast = fast->next);
    return slow;
}
```

There’s still one more optimization that can be done. We don’t need to calculate the length of the cycle.

Let x be the number of nodes before the cycle begins, y the distance from the beginning of the cycle to the meeting point, z the remaining distance around the cycle, and k = y+z the length of the cycle. Let n and m be the number of times the fast and slow pointers respectively go through the cycle before meeting. Since the fast pointer was moving at twice the speed of the slow pointer:

```
Distance traveled by fast pointer = 2 * distance traveled by slow pointer
x + y + n*k = 2 * (x + y + m*k)
x + y = (n - 2m) * k
```

This means x+y is a multiple of k. If we start moving one pointer, say a, from the start of the list and another, say b, from the meeting point, both at a speed of 1 position/step, then when a reaches the beginning of the cycle, having moved through x nodes, b will have moved through x nodes as well. But since b started y positions past the beginning of the cycle and x+y is a multiple of k, b is now exactly at the beginning of the cycle too, so they meet there.

This completes the proof. This solution also has the same O(N) time complexity but I don’t think someone who hasn’t seen this trick before can be expected to discover this in an interview. Here’s the implementation:

```cpp
ListNode* detectCycle(ListNode* A) {
    if (A == NULL || A->next == NULL || A->next->next == NULL) return NULL;
    ListNode *slow = A, *fast = A;
    while (fast->next && fast->next->next) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            break;
    }
    if (slow != fast) return NULL;
    for (fast = A; slow != fast; slow = slow->next, fast = fast->next);
    return slow;
}
```

Problem of the day for today is Longest palindromic substring: *Given a string, find its longest palindromic substring.* For example, given the string “ababab”, the longest palindromic substring would be “ababa”.

The obvious brute-force solution is to check each substring for being a palindrome. In a string of length N, since there are only N*(N+1)/2 substrings and checking each of them takes at most O(N) time, the overall time complexity is O(N^{3}).
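For completeness, the brute force looks something like this – a sketch with my own names, not something worth submitting:

```cpp
#include <cassert>
#include <string>
using namespace std;

// O(N^3): try every substring, check each one for being a palindrome.
string longestPalindromeBrute(const string& A) {
    int bestStart = 0, bestLen = A.empty() ? 0 : 1;
    for (int i = 0; i < (int)A.size(); i++) {
        for (int j = i; j < (int)A.size(); j++) {
            int l = i, r = j;
            while (l < r && A[l] == A[r]) { l++; r--; }  // two-pointer palindrome check
            if (l >= r && j - i + 1 > bestLen) {          // palindrome and longer than best so far
                bestStart = i;
                bestLen = j - i + 1;
            }
        }
    }
    return A.substr(bestStart, bestLen);
}
```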

It’s easy to see that the brute-force solution is doing a lot of repeated computation. If S[3..4] is not a palindrome, then S[2..5] and S[1..6] are not going to be palindromes and there’s no point in checking. Similarly, if S[3..4] is a palindrome, then for S[2..5] to be a palindrome, we need only check S[2] == S[5]. We can get rid of this repeated computation with the following dynamic programming formulation:

P(i, j) = true if i >= j else P(i+1, j-1) && S[i] == S[j]

We can keep track of the largest palindromic length and its starting position, leading to the following simple implementation:

```cpp
string longestPalindrome(string A) {
    int N = A.size();
    vector<vector<bool> > P(N+1, vector<bool>(N+1, true));
    int start = 0, len = 1;
    for (int k = 1; k < N; ++k) {
        for (int i = 1; i <= N-k; ++i) {
            P[i][i+k] = P[i+1][i+k-1] && (A[i-1] == A[i+k-1]);
            if (P[i][i+k] && k+1 > len)
                start = i-1, len = k+1;
        }
    }
    string result(A.begin() + start, A.begin() + start + len);
    return result;
}
```

This reduces the time complexity to O(N^{2}) but it requires O(N^{2}) additional space.

There’s another clever way to think about this problem: grow a palindromic substring from the center by expanding outwards while the characters on both sides match. We can determine the length of the palindromic substring at a given center in O(N), and there are only 2N-1 possible centers (one at each character and one between every two consecutive characters). This gives us O(N^{2}) time complexity with no additional space requirement. Here’s the implementation:

```cpp
int palindromeLengthFromCenter(string& A, int left, int right) {
    int originalLeft = left;
    while (left >= 0 && right < A.size() && A[left] == A[right]) {
        left--;
        right++;
    }
    return originalLeft - left;
}

string longestPalindrome(string A) {
    int maxLen = 1, maxStart = 0;
    for (int i = 0; i < A.size(); ++i) {
        int s1 = palindromeLengthFromCenter(A, i-1, i+1); // center is at position i
        int len1 = 1 + 2 * s1;
        if (len1 > maxLen)
            maxLen = len1, maxStart = i - s1;
        int s2 = palindromeLengthFromCenter(A, i, i+1); // center is between positions i and i+1
        int len2 = 2 * s2;
        if (len2 > maxLen)
            maxLen = len2, maxStart = i - s2 + 1;
    }
    string result(A.begin() + maxStart, A.begin() + maxStart + maxLen);
    return result;
}
```

This code is still doing some repeated computation. Consider the string “abababa”: once we have calculated the palindrome lengths centered at positions 1, 2, 3, 4 (1-based indexing), we can exploit the symmetry of palindromes around their center and no longer need to calculate the palindrome lengths for positions 5, 6, 7. This, coupled with a few other tricks, reduces the time complexity to O(N) in Manacher’s algorithm. I think it’s too complicated to remember or derive on your own, and the O(N^{2}) solution is good enough for interview purposes.