Today I was looking at problems related to linked lists. They tend to be asked quite a lot in interviews but some of them look artificial with their complexity arising from linked lists being a poor data structure for the problem (e.g., median of a sorted linked list). The problem of the day for today is Detecting cycle in a linked list: Given a linked list, return the node where the cycle begins or null if there is no cycle. For example, in the image below, 4 should be returned.
I don’t want to just provide the solution but think through how an interviewer will describe the problem and how an interviewee may approach it. Typically, interviewers fist describe a part of the problem and then keep adding to it. For this problem, an interviewer may start by just asking to check if there’s a cycle in a linked list or not and after that’s done, ask for the node where the cycle begins.
The obvious solution that an interviewee can be expected to come up with would use a hashset to store visited nodes and declare a cycle if it visits a previously visited node. Incidentally, it also solves the second part of the problem because the first node that is visited again is where the cycle begins. Here’s the simple implementation:
For a linked list of N nodes, it has O(N) space and time complexity – pretty good, but now the interviewer would ask for a solution with O(1) auxiliary space complexity. At this point, the interviewee may require a hint or two but for the first part of the problem, (s)he can be expected to come up with the following approach: using two pointers – slow advanced one position and fast advanced two positions in each step and checking if they ever meet. It’s not that hard to come up with this approach for someone who hasn’t seen this before. If there is a cycle, both slow and fast pointers will enter it and while the fast pointer is moving away from the slow pointer at the speed of 1 position/step in one direction, it’s moving towards it at the same speed in the opposite direction so they are guaranteed to meet. Once both the pointers enter the cycle (of maximum length N), they have to meet after at most N-1 more steps, resulting in O(N) time complexity.
Now for the second part of the problem – finding the node where the cycle begins, this is a little tricky. Someone who has seen this problem before would know that this is solved by a trick in the Floyd’s cycle-finding algorithm. At this point, the interviewee may require a hint or two and can be expected to come up with this solution: after detecting a cycle using the above two-pointer approach, determine the length of the cycle, say k, and use two pointers – one at the start and another k positions ahead of start, with both being advanced 1 position/step. They will meet at the node at the beginning of the cycle. This still has O(N) time complexity and is a perfectly acceptable solution. Here’s the implementation:
There’s still one more optimization that can be done. We don’t need to calculate the length of the cycle.
Let k = y+z be the length of the cycle, and n, m be the number of times the fast and slow pointers respectively go through the cycle before meeting. Since the fast pointer was moving at twice the speed of slow pointer,
Distance traveled by fast pointer = 2 * distance traveled by slow pointer x + y + n*k = 2 * (x + y + m*k) x + y = (n-2m) * k
This means x+y is a multiple of k. If we start moving one pointer, say a, from the start and another, say b, from the meeting point at the same speed of 1 position/step, when a reaches the beginning of the linked list, having moved through x nodes, b would have moved through x nodes as well, but since x+y is a multiple of k and b started y positions ahead of the beginning of the cycle, they both meet at the beginning of the cycle.
This completes the proof. This solution also has the same O(N) time complexity but I don’t think someone who hasn’t seen this trick before can be expected to discover this in an interview. Here’s the implementation: